[Mod note: I edited out your email from the comment, to save you from getting spam email and similar. If you really want it there, feel free to add it back! :) ]
Mod here: most of the team were away over the weekend so we just didn’t get around to processing this for personal vs frontpage yet. (All posts start as personal until approved to frontpage.) About to make a decision in this morning’s moderation review session, as we do for all other new posts.
Jake himself has participated in both Zika and Shigella challenge trials.
Your civilisation thanks you
Cool idea and congrats on shipping! Installed it now and am trying it. One piece of user feedback: I found having to wait for the full reply a bit high-friction. Maybe you could stream responses in chunks? (I did this for a GPT-to-Slack app once. You just can’t do it letter-by-letter, because you’ll get rate limited.)
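To sketch what I mean by "chunks": instead of one edit per token, accumulate the stream and only push an update once enough new text has arrived. This is a minimal illustrative sketch (the function name and the 80-character threshold are my own choices, not from any particular app); each yielded value is the full message so far, suitable for an edit-in-place call like Slack's `chat.update`:

```python
from typing import Iterable, Iterator

def batch_stream(tokens: Iterable[str], min_chars: int = 80) -> Iterator[str]:
    """Accumulate streamed tokens; yield the full message so far each
    time at least `min_chars` new characters have arrived.

    Yielding full snapshots (rather than deltas) fits APIs that edit a
    message in place, so you make far fewer calls than letter-by-letter
    updates would, avoiding rate limits.
    """
    text = ""
    pending = 0  # characters received since the last yield
    for tok in tokens:
        text += tok
        pending += len(tok)
        if pending >= min_chars:
            yield text
            pending = 0
    if pending:  # flush whatever arrived after the last full batch
        yield text
```

Usage would look like `for snapshot in batch_stream(llm_token_stream): slack_update(snapshot)`, where `slack_update` is whatever edit-message call your bot framework exposes.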
If that’s your belief, I think you should edit in a disclaimer to your TL;DR section, like “Gemini and GPT-4 authors report results close to or matching human performance at 95%, though I don’t trust their methodology”.
Also, the numbers aren’t “non-provable”: anyone could just replicate them with the GPT-4 API! (Modulo dataset contamination considerations.)
Humans achieve over 95% accuracy, while no model surpasses 50% accuracy. (2019)
A series on benchmarks does seem very interesting and useful—but you really gotta report more recent model results than from 2019!! GPT-4 reportedly scores 95.3% on HellaSwag, making that initial claim in the post very misleading.
Ah! I investigated and realise what the bug is. (Currently, only the single dialogue main author can archive it, not the other authors.) Will fix!
You can go to your profile page and press the “Archive” icon, which appears when you hover to the right of a dialogue.
Yeah, I’m interested in features in this space!
Another idea is to implement an algorithm similar to Twitter’s Community Notes: identify comments that have gotten upvotes from people who usually disagree with each other, and highlight those.
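The core of that idea can be sketched in a few lines. This is a toy version under my own assumptions (the vote format, the overlap threshold, and the crude agreement score are all illustrative; the real Community Notes algorithm uses matrix factorization, not pairwise counts):

```python
from itertools import combinations

def bridging_comments(votes: dict[str, dict[str, int]], min_overlap: int = 2) -> set[str]:
    """votes maps user -> {comment_id: +1 (upvote) or -1 (downvote)}.

    Returns comments upvoted by at least one pair of users who, across
    the comments they both voted on, disagree more often than they
    agree -- a crude stand-in for the "bridging" idea.
    """
    bridging: set[str] = set()
    for u, v in combinations(votes, 2):
        shared = set(votes[u]) & set(votes[v])
        if len(shared) < min_overlap:
            continue  # too little overlap to estimate (dis)agreement
        # +1 for each comment they voted the same way on, -1 otherwise
        agreement = sum(1 if votes[u][c] == votes[v][c] else -1 for c in shared)
        if agreement < 0:  # this pair usually disagrees...
            # ...so anything they BOTH upvoted is a bridging candidate
            bridging |= {c for c in shared if votes[u][c] == votes[v][c] == 1}
    return bridging
```

A comment surfaces only when it earns upvotes across the usual divide, which is exactly the signal you'd want to highlight.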
Oops, somehow didn’t see there was actually a market baked into your question
I’d also be interested in “Will there be a publicly revealed instance of a pause in either deployment or development, as a result of a model scoring High or Critical on a scorecard, by Date X?”
Made a Manifold market
Might make more later, and would welcome others to do the same! (I think one could ask more interesting questions than the one I asked above.)
Heads up, we support LaTeX :)
Use Ctrl-4 to open the LaTeX prompt (or Cmd-4 if you’re on a Mac). Open a centred LaTeX popup using Ctrl-M (or Cmd-M). If you’ve written some maths in normal writing and want to turn it into LaTeX, highlight the text and then hit the LaTeX editor button, and it will turn straight into LaTeX.
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI.
Without commenting on how often people do or don’t bet, I think overall betting is great and I’d love to see more of it!
I’m also excited by how much of it I’ve seen since Manifold started gaining traction. So I’d like to give a shout out to LessWrong users who are active on Manifold, in particular on AI questions. Some I’ve seen are:
Jaime Sevilla Molina
Good job everyone for betting on your beliefs :)
There are definitely more folks than this: feel free to mention, in the comments, more folks you want to give kudos to (though please don’t dox anyone whose name on either platform is pseudonymous and doesn’t match the other).
LLM summaries aren’t yet non-hallucinatory enough that we’ve felt comfortable putting them on the site, but we have run some internal experiments on this.
Yep. Will set myself a reminder for 6 months from now!
They get a list of topics I’ve written/commented on, but so far as I can see I don’t have any way to see that list
Yeah, users can’t currently see that list for themselves (unless of course you create a new account, upvote yourself, and then look at the matching page through that account!).
However, the SQL for this is actually open source, in the function getUserTopTags: https://github.com/ForumMagnum/ForumMagnum/blob/master/packages/lesswrong/server/repos/TagsRepo.ts
What we show is “the tags a user commented on in the last 3 years, sorted by comment count, and excluding a set of tags that I deemed less interesting to show to other users, for example because they were too general (World Modeling, …), too niche (Has Diagram, …) or too political (Drama, LW Moderation, …)”.
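For anyone who'd rather not read the SQL, the described logic restates straightforwardly in Python. This is just an illustrative sketch of the description above, not the actual implementation (the real query and the full exclusion list live in `getUserTopTags` in `TagsRepo.ts`; the data shapes here are my own assumptions):

```python
from collections import Counter
from datetime import datetime, timedelta

# Illustrative exclusion set, taken from the examples above; the real
# (longer) list lives in getUserTopTags in TagsRepo.ts.
EXCLUDED_TAGS = {"World Modeling", "Has Diagram", "Drama", "LW Moderation"}

def top_tags(comments: list[dict], now: datetime, limit: int = 5) -> list[str]:
    """Each comment is a dict like {"tags": [...], "posted_at": datetime}.

    Counts the tags of comments from the last 3 years, drops excluded
    tags, and returns the most-commented-on tags first.
    """
    cutoff = now - timedelta(days=3 * 365)
    counts: Counter[str] = Counter()
    for comment in comments:
        if comment["posted_at"] < cutoff:
            continue  # older than 3 years: ignored
        for tag in comment["tags"]:
            if tag not in EXCLUDED_TAGS:
                counts[tag] += 1
    return [tag for tag, _ in counts.most_common(limit)]
```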
(Sidenote, but you probably want to fix it: https://bristolaisafety.org/ appears to be down, as of the posting of this message)
I use Cursor, Copilot, sometimes GPT-4 in the chat, and also Hex.tech’s built-in SQL shoggoth.
I would say the combination of all of those helps a huge amount, and I think it’s been key in letting me go from pre-junior to junior dev over the last few months. (That is, from not being able to make any site changes without painstaking handholding, to leading and building much of the Dialogue matching feature and associated work. I still had a lot of help from teammates, but less in a “they need to carry things over the finish line for me” way, and more “I’m able to build features of this complexity, and they help out as collaborators”.)
PR review and advice from senior devs on the team have also been key, and much appreciated.
Yeah, that reminds me of this thread https://www.lesswrong.com/posts/P32AuYu9MqM2ejKKY/so-geez-there-s-a-lot-of-ai-content-these-days