Thanks for amplifying. I disagree with Thane on some things they said in that comment, and I don’t want to get into the details publicly, but I will say:
it’s worth looking at DeepSeek-V3 and what they did with a ~$5.6 million training run (obviously that’s still a nontrivial amount, and the CEO has said most of the cost of their training runs comes from research talent),
compute is still a bottleneck (which is why I’m looking to build an AI safety org to efficiently absorb funding/compute for this), but I think Thane isn’t acknowledging that some types of research require much more compute than others (though I agree research taste matters, which is also why DeepSeek’s CEO hires for cracked researchers; I just don’t think it’s an insurmountable wall),
“Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case.” Yes, seems really hard and a bottleneck...for humans and current AIs.
imo, AI models will soon enough become Omega Cracked at infra and at hyper-optimizing training/inference to keep costs down (which seems to be exactly what DeepSeek is especially good at)
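The ~$5.6M headline figure above can be sanity-checked from the numbers DeepSeek reported for V3 (these inputs come from their technical report, not from this thread: roughly 2.788M H800 GPU-hours, priced at an assumed $2 rental rate per GPU-hour):

```python
# Rough sanity check of DeepSeek-V3's quoted training cost.
# Inputs are the figures DeepSeek reported publicly (assumptions here,
# not claims from this thread): ~2.788M H800 GPU-hours at ~$2/GPU-hour.
gpu_hours = 2.788e6       # total H800 GPU-hours for the final training run
price_per_hour = 2.0      # assumed rental price, USD per GPU-hour
cost = gpu_hours * price_per_hour
print(f"${cost / 1e6:.3f}M")  # ≈ $5.576M, i.e. the widely quoted ~$5.6M
```

Note this covers only the final pre-training run at rental prices; it excludes prior research, ablations, and hardware ownership, which is consistent with the point above that most of the real cost sits in research talent.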
Is this because it would reveal private/trade-secret information, or is this for another reason?
Yes (all of the above)
If you knew it was legal to disseminate the information, and trade-secret/copyright/patent law didn’t apply, would you still not release it?
I mean that it’s a trade secret for what I’m personally building, and I’d also rather people didn’t just use it freely to advance frontier capabilities research.