yams
fwiw I think piano pedagogy is exactly the kind of thing where an entrenched regime has propagated a suboptimal approach relative to most people’s goals on the instrument (and that there’s maybe only some single-digit number of people in the country teaching outside of the small handful of dominant, not-especially-useful-to-most-people paradigms).
E.g., if what you want to do is play pop songs, a combination of ear training and a ‘simon says’ style app that reads midi off your keyboard and instructs you to play a ~random triad will basically get you there in ~100 hours of practice (assuming daily practice not to exceed ~3 hours/day). There are similarly straightforward training setups that I expect to be effective for other goals one may have on the instrument. I built ~all of my physical facility on the instrument in about three months of focused practice, and have similarly ‘cheated’ my way into my other capacities (I’m definitely missing things that other people 5 years into the instrument would have, but I also have a lot of things those folks don’t, and I prefer and deliberately pursued my skill profile over theirs).
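(To make the ‘simon says’ setup concrete, here’s a minimal sketch of the loop I have in mind, assuming a Python environment with the mido library and a connected MIDI keyboard; the chord vocabulary and the pass condition are illustrative, not a description of any app I actually used.)

```python
# Minimal 'simon says' triad-trainer sketch (assumes the mido library and a
# connected MIDI keyboard; the chord list and matching rule are illustrative).
import random
import mido

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def random_triad():
    """Pick a random root and quality; return a name and the target pitch classes."""
    root = random.randrange(12)
    quality, third = random.choice([('maj', 4), ('min', 3)])
    return f"{NOTE_NAMES[root]} {quality}", {root, (root + third) % 12, (root + 7) % 12}

def drill(rounds=20):
    with mido.open_input() as port:  # default MIDI input
        for _ in range(rounds):
            name, target = random_triad()
            print(f"Play: {name}")
            held = set()
            for msg in port:
                if msg.type == 'note_on' and msg.velocity > 0:
                    held.add(msg.note % 12)
                elif msg.type in ('note_off', 'note_on'):  # note_on with velocity 0 = release
                    held.discard(msg.note % 12)
                if held == target:
                    print("Correct!")
                    break

if __name__ == '__main__':
    drill()
```

(Matching on pitch classes keeps it octave-agnostic; a real app would obviously want feedback, timing, and spaced repetition on top of this.)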
I agree that few people who start playing piano as adults will ever play Rachmaninoff at competition level (but I think very few people who enter into the pedagogical system designed to meet that end actually have that goal in practice).
I also (I think, although you don’t say so outright) agree that some tasks require developing wholly other senses — new channels for phenomenal sensation or new ways of comparing phenomenal sensation in an existing channel (e.g. audiation and relative pitch) — and that those aren’t well-captured in Oli’s ontology.
Wow that is surprising! Even after considering the suite of caveats one applies to benchmarks as evidence, I am very surprised.
All things considered I think I still lean harder on self-reports from lab and non-lab technical staff regarding the elicitation delta, but I’m much less confident than before.
[I suspect we may have other less interesting disagreement about how economically useful current systems could be if more effort were put toward juicing them, but happy to talk about that some other time; just mentioning this for completeness or something.]
Really? I would expect valuations to briefly stall out and then continue to grow when it became clear that the labs have a big lead when it comes to elicitation, scaffolding, etc.
I would also expect existing big-tech valuations to grow in this scenario—just not startups (although maybe they get a bump in the short term).
Can you say more about why you expect this? Trying to see if the answer is [real disagreement] or [Oli has superior knowledge of economics] (and also learn something, in the latter case).
‘Major AI labs can only justify their high valuations by developing very powerful, very general AI systems’ is a claim I sometimes hear. That is, many seem to expect ‘if no AGI in n years, then the bubble pops’.
However, I think just revolutionizing tech is likely enough to justify current valuation levels (and maybe as much as 4x current valuations; maybe even more?), given the market caps of other large tech companies (even if we exclude NVIDIA, which we may want to do because their current market cap is more heavily tied to the AI boom than others). After all, these are still ‘only’ 12-figure valuations in a sector where many of the major players have broken a trillion. A 12-figure valuation is consistent with ‘future major player of the kind that already exists’ and not ‘potential god emperor of the solar system’.
Using these big tech companies as a reference point and assuming very limited further capabilities progress (e.g., no TAI, AGI, [your favorite way to talk about very general, ~human-level systems]), the major LLM companies still don’t obviously look overvalued to me. The tech pie by itself is big enough (and the current state of the tech looks, to me, sufficient to massively disrupt that entire sector) that if they capture enough of the sector (even if it takes a while), we won’t see a significant correction.
(there’s a nearby but distinct point about whether the fundamentals of these companies look sound in a conventional sense, which I don’t mean to weigh in on here)
[recovering from a concussion so apologies if this is especially poorly written or unclear]
I don’t dispute that some postmodernists would consider cultural relativism central to their worldview, and I think that instead of ‘mostly by its detractors’ I should have said ‘often by its detractors’.
I’m glad you’re open to using different language.
I think it’s a mistake to call the above postmodernism, and I’d be disappointed if your long-form treatment of the above point were framed that way.
I agree this position is part of a bundle that’s associated with postmodernism (mostly by its detractors!), but the use here feels conflationary, adversarial, mind-killing.
I would find this future post much more readable, enjoyable, and easier to fit in my model of the world (and of Oli) if you didn’t use this piece of language.
A reason might be that the composition of the investor pool and relationships with shareholders will be very different. I think this is kind of a motte though, and would require the OP to be making a more nuanced sort of argument.
(I really just mean ‘might be’; I haven’t thought enough about this to have much of a take, but this is something that occurred to me that I’m slightly surprised didn’t also occur to you.)
I think a lot about dice and cards, especially because I have the most trouble with probabilities that are <5 percent.
‘Number of consecutive perfect draws’ in Magic is very useful for me. E.g., the probability of x consecutive one-of-one draws in a draft is roughly 1/(20-30)^x. Imprecise, but it gives me any handle at all on pretty slim odds.
Similarly, dice rolls: number of consecutive critical successes, or number of critical successes among n attempts, or ‘critical success times rolling n on an additional n-sided die’.
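(If the arithmetic behind these handles is useful to anyone else, here’s a minimal sketch; the pack size and die sizes are illustrative assumptions, not rules from any particular game.)

```python
# Back-of-envelope versions of the handles above (pack size and die sizes are
# illustrative assumptions, not rules text from any particular game).
from math import comb

def consecutive_one_of_one_draws(x, pack_size=25):
    """P(x consecutive 'perfect' one-of-one draws), roughly 1/(20-30)^x."""
    return (1 / pack_size) ** x

def consecutive_crits(x, die_size=20):
    """P(x critical successes in a row on a d20)."""
    return (1 / die_size) ** x

def crits_among_n(k, n, die_size=20):
    """P(exactly k critical successes among n attempts) -- binomial."""
    p = 1 / die_size
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def crit_times_extra(die_size=20, extra=6):
    """P(critical success AND a specific face on an additional n-sided die)."""
    return (1 / die_size) * (1 / extra)

print(consecutive_one_of_one_draws(2))  # ~0.0016, i.e. ~1 in 625
print(consecutive_crits(3))             # 0.000125, i.e. 1 in 8,000
print(crits_among_n(2, 10))             # ~0.075, i.e. roughly 1 in 13
print(crit_times_extra())               # ~0.0083, i.e. 1 in 120
```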
I’ve played a lot of dice-heavy games and used dice to help resolve indecision since I was ~11, but I’ve only started treating putting non-bullshit probabilities on things as a serious skill very recently (maybe a year ago).
The post is meant to be somewhat agnostic on the question: conditional on having a map, here’s a common failure mode. It’s also meant to point in the direction of ‘reconsider the value of your map’.
Separately, I think I ~endorse your first comment, but I also think there are cases in which you should definitely have a map (e.g., you are attempting to achieve political ends). So I think your second comment is somewhat overstated.
You wrote this comment in an adversarial tone but I Just Agree With You.
Indeed, this is an alternate formulation of the thesis of my post, and even uses language I used when characterizing the post itself to someone in the office ~2 hours ago.
most of the things you said seemed like on average it would increase the amount I expected some kind of adversarial posture to make sense
I don’t understand this. Can you say more?
Meanings of political identities shift dramatically based on context, and you can’t manually confirm the beliefs of everyone present at your ‘gathering of people with x political identity’. To the extent that your political identity is based on Real Beliefs with Real Consequences, you should expect not to have much in common with many other people who declare the same identity when you move to a new place (or corner of the internet).
Example: In rural Southeast Texas, Confederate flags are a common sight, and my geometry teacher once told us about a cross burning he witnessed (which a few students murmured we really ought to bring back).
The majority of people genuinely hold at least one belief that, to many of my coastal-elite-descended friends, would seem comical. E.g., women should never have jobs and should rarely speak (especially in public), men with long hair are wanton or gay or trans or both, beating children (not like ‘spanking’ but like ‘anything short of broken bones’) is not only fine but your duty as a father, weed overdose not only can but will definitely kill you, megadoses of zinc can cure cancer, the covid vaccine is the mark of the beast from the Book of Revelation, high school football ought to be the most important thing in your life and, if it isn’t, you are not just odd but untrustworthy, and abortion doctors force-feed fetuses to geese to make foie gras for gay New Yorkers (of which the force feeding is the only ethical component).
Okay, I made up the last one, but the rest are actual positions I’ve heard espoused hundreds or thousands of times by people I met between the ages of 14 and 18.
Also many people talk like this, and everyone’s a ‘libertarian’.
My mom’s from a conservative California family with environmentalist sympathies, and we had something like 60 percent overlap in our views prior to the Texas move. However, I soon found that everyone around who wasn’t liable to drop one of these devastating truth bombs on me thought of themselves as somewhere to the left of Bernie Sanders and read 20th century Marxist writings in their free time. Often these people would think leftist voices at the national scale were somewhat silly or focused on the wrong things (e.g. identity politics), but they nonetheless considered themselves closer to those views than to the other views present in their environs. (There was a Democratic Party around, but it was very different from the national Democratic Party for reasons I won’t address here.)
I assume most readers are in an environment more similar to my current environment (Berkeley, CA) than to Lumberton, TX, and so I won’t explicate the delta.
I think there’s a lot of mind-killing that happens as a result of relying on a presumed shared vocabulary for political identities that does not exist. When I say something left-coded, my Rationalist Libertarian Interlocutor often reproaches me, and then as we talk more about it, they often conclude that I’m a ‘boring centrist like everybody else’ who uses the language of the left owing to some biographical quirk.
I submit instead that everyone’s sense of the political map is hopelessly warped due to biographical quirks, and that assuming an adversarial posture on the basis of someone’s declared political identity is often, and maybe even ~always, a mistake.
New reacts available only to paid users of LessWrong Premium (not you freeloaders) facilitate frictionless, borderline-telepathic communication.
‘I will NEVER change my mind’: Use this react to assert that you’re content with exactly how wrong you are (which is not at all), and that the case is permanently closed on this matter, so far as you’re concerned.1
‘EY Stamp of Approval’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky agrees with the contents of the comment, rendering it beyond reproach.
‘NOT EY Approved’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky disagrees with the contents of the comment, rendering it immensely reproachful. Users who accrue too many ‘NOT EY Approved’ reacts will have their accounts suspended (although actual thresholds here have yet to be set).2
‘May as well be AI’: Use this react when you’re indifferent to whether or not a statement was generated by AI because shit, it may as well be. You’re ignoring it either way.
‘Have you even read plane crash?’: Use this react when your interlocutor’s unfamiliarity with prior literature is clearly on display.
‘China Hawk’: Use this react to assert American supremacy and insist that its recipient is derelict in their duty to ensure the preeminence of the greatest country in the history of conscious life from here to the other side of the singularity.
‘Toilet’3
‘Sure, buddy’: We all know that the optimal amount of mental masturbation is non-zero. But some reach far beyond the zone of optimality and into the depths of their pants to produce truly monstrous works of self-gratification. Previously, one was compelled to express such an opinion obliquely on LessWrong, or else shatter the decorum of the space and open oneself up to similar critiques. In Beta, we now have the power to quietly acknowledge the reality of the situation, without derailing the gratification itself.
———————————————————————————————
1 This has replaced the ‘I beseech thee’ emoji, which never worked anyway.
2 Note that both EY-invoking reacts are invisible to anyone logged into the @-EliezerYudkowsky LessWrong account, or from any IP address that has ever been logged into that account. No point in having the man himself weigh in when so many LessWrong users are so well practiced at speaking on his behalf!
3 Beta users haven’t settled on how this react ought to be deployed. I’ve seen utilizations ranging from ‘This post belongs in the toilet’ to ‘I enjoyed reading this on the toilet.’
I don’t have a real explanation, but I’ve been interested in this, since it feels like the LLM is doing something like the opposite of what writers intend to do (at least in effect). As if there’s some portion of language space that invites engagement, or trips an alarm in the reader that says ‘there’s something in this!’ Human writers swim toward that portion of the space; LLMs swim away from it.
[I would be unsurprised to find I have not expressed this well.]
I found this post pretty disappointing in its argumentation, for the reasons you describe, and I fairly strongly support its conclusion.
high potential upside for alignment
I like AE Studio!
Can you give an at-all-more-concrete operationalization of this?
Who’s evaluating the proposals and where’s the best public-facing analog of their views on alignment / the criteria applied to evaluate research?
Same question as above, but this time for ‘whoever decides whether this program gets scaled’.
Can you give an example of prior work (e.g., a paper from Anthropic’s safety teams) that would have been competitive for this program?
For instance, if the higher-ups are thinking ‘the problem is that we need more reliable LLMs on roughly the current instantiation of LLMs’, that’s very different from ‘the problem is we need to align superhuman coders that will build the aligned AI, plausibly on some other substrate’, or ‘the problem is we need to identify an alternative architecture that’s fundamentally easier to align than LLMs’.
The people writing these proposals are likely fitting whatever piece you work on for them into a larger picture, and I’m wondering which of the competing larger pictures are advantaged in the application phase.
I was comparing to other video posts by Sanders.
I was comparing to a broader activation of Eliezer’s audience vs any given tweet.
Outperforming the ‘average’ is the wrong standard for ‘blowing up’. ‘Blowing up’ would be ‘outperforming all the recent similar artifacts’, at least as I intended it in my original post.
Meta: it feels pretty strange to have used an underspecified colloquial term, to have walked back the applicability of that term as I intended to use it, and then to be told I’m wrong for walking it back. The point I cared about capturing in that edit is ‘this tweet didn’t do as well as I expected when it first dropped.’ That’s a claim about my own expectations.
Depends on the reference class. As of the past year or so, 1m eyes on a piece of AI safety content isn’t crazy, especially for a video on Twitter, where my impression is the criteria for what counts as a ‘view’ are pretty liberal. Like, plausibly the video has been viewed in full (much) less than half that many times.
Separately, videos posted by that account seem to routinely get ~1m views, so not outperforming that baseline with content from an external collaborator is a little disappointing from a raw metrics perspective! Naively you’d hope to get the combined weight of your respective audiences, which seems to have only somewhat happened here.
When I posted this I think I expected we’d get to 2m views in the first couple days (weakly outperforming other Sanders Twitter content). I think with a different video, that could have happened.
Still an exciting crossover episode.
How would you respond to the counterarguments?