Advameg, Inc. CEO
Founder, city-data.com
https://twitter.com/LechMazur
Author: County-level COVID-19 machine learning case prediction model.
Author: AI assistant for melody composition.
I’ve just created an NYT Connections benchmark: 267 puzzles, 3 prompts for each, uppercase and lowercase.
Results:
GPT-4 Turbo: 31.0
Claude 3 Opus: 27.3
Mistral Large: 17.7
Mistral Medium: 15.3
Gemini Pro: 14.2
Qwen 1.5 72B Chat: 10.7
Claude 3 Sonnet: 7.6
GPT-3.5 Turbo: 4.2
Mixtral 8x7B Instruct: 4.2
Llama 2 70B Chat: 3.5
Nous Hermes 2 Yi 34B: 1.5
Partial credit is given if the puzzle is not fully solved.
Only one attempt is allowed per puzzle, 0-shot. Humans get 4 attempts and a hint when they are one step away from solving a group.
Gemini Advanced is not yet available through the API.
(Edit: I’ve added bigger models from together.ai and from Mistral)
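For anyone curious about the mechanics, per-puzzle partial credit can be computed along these lines (a simplified sketch, not the exact harness; the proportional-credit rule and the names below are just illustrative):

```python
# Sketch: score one Connections attempt with partial credit.
# Assumes credit proportional to the number of groups matched exactly.

def score_puzzle(guessed_groups, true_groups):
    """Return a score in [0, 1]: the fraction of groups solved exactly.

    Each argument is a list of four groups of four words. Comparison is
    case-insensitive, since puzzles are presented in both uppercase and
    lowercase variants.
    """
    def norm(group):
        return frozenset(w.strip().lower() for w in group)

    truth = {norm(g) for g in true_groups}
    correct = sum(1 for g in guessed_groups if norm(g) in truth)
    return correct / len(true_groups)

# Example: two of four groups right -> 0.5 credit for this puzzle.
guess = [["bass", "flounder", "sole", "pike"],
         ["angel", "devil", "saint", "sinner"],
         ["bat", "ball", "glove", "ring"],
         ["cap", "necklace", "bracelet", "watch"]]
truth = [["BASS", "FLOUNDER", "SOLE", "PIKE"],
         ["ANGEL", "DEVIL", "SAINT", "SINNER"],
         ["BAT", "BALL", "GLOVE", "CAP"],
         ["RING", "NECKLACE", "BRACELET", "WATCH"]]
print(score_puzzle(guess, truth))  # 0.5
```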
“Anonymous” and 540B parameters, hmm… I’m sure it’s not from the company named after an even larger number.
GSM8K = grade school math word problems
DROP = reading comprehension benchmark requiring discrete reasoning over paragraphs
OpenBookQA = question-answering dataset modeled after open book exams for assessing human understanding of a subject. 5,957 multiple-choice elementary-level science questions
ANLI-A3 = adversarial benchmark designed to be challenging to current state-of-the-art models
Fox News’ Peter Doocy uses all his time at the White House press briefing to ask about an assessment that “literally everyone on Earth will die” because of artificial intelligence: “It sounds crazy, but is it?”
https://twitter.com/therecount/status/1641526864626720774
Related development: https://www.nature.com/articles/d41586-021-01968-y
“Meanwhile, an academic team has developed its own protein-prediction tool inspired by AlphaFold 2, which is already gaining popularity with scientists. That system, called RoseTTaFold, performs nearly as well as AlphaFold 2, and is described in a Science paper also published on 15 July[2].”
There is a new study out that found that 40% of Copilot’s code contributions in high-risk scenarios were vulnerable: https://arxiv.org/abs/2108.09293
Generated data can be low quality yet indistinguishable from real data. Unless your classifier has access to more data or is better in some other way (e.g., larger, better architecture), you won’t know. In fact, if you could tell without labeling that generated data is bad, why would you generate it in the first place? I’ve seen this in practice in my own project.
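A toy simulation of that point (everything here is made up for illustration; the `edge` knob stands in for having more data or a better architecture than the generator):

```python
# Toy model: filtering generated samples with a classifier only raises
# average quality to the extent the classifier has a real edge over the
# generator. With edge=0 the filter's scores are pure noise.
import random

random.seed(0)

def generate():
    """Stand-in generator: each sample has a hidden true quality in [0, 1]."""
    return random.random()

def classifier_score(true_quality, edge):
    """Stand-in classifier: edge=0 sees nothing the generator doesn't;
    edge=1 sees true quality perfectly."""
    return edge * true_quality + (1 - edge) * random.random()

def mean_kept_quality(edge, n=100_000, keep_frac=0.5):
    samples = [generate() for _ in range(n)]
    ranked = sorted(samples, key=lambda q: classifier_score(q, edge), reverse=True)
    kept = ranked[: int(n * keep_frac)]
    return sum(kept) / len(kept)

for edge in (0.0, 0.5, 1.0):
    print(f"edge={edge}: mean quality of kept half ~ {mean_kept_quality(edge):.2f}")
# edge=0.0 -> ~0.50 (filtering changed nothing); edge=1.0 -> ~0.75
```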
In Miami there absolutely are explicit payments for models to join tables. This can lead to all of them leaving on the dot at, say, 3:00 AM, since that’s how long they were required to stay at the club to earn their pay. NYC has a different dynamic.
As you probably know, there are multiple theoretically interesting ML ideas that achieve very good results on MNIST. Have you tried more challenging image recognition benchmarks, such as CIFAR-100, or some non-CV benchmark? Since you posted your code, I wouldn’t mind spending a bit of time looking over what you’ve accomplished. However, MNIST is now considered pretty much a toy benchmark (and I don’t consider PI-MNIST to be a better one), so it will likely be an obstacle to getting others to look at your work in depth; the results will be considered quite preliminary. Another practical point: using C and CUDA kernels also makes the code less accessible to a good percentage of researchers.
Following your forecast’s closing date, MATH has reached 84.3% as per this paper if counting GPT-4 Code Interpreter: https://arxiv.org/abs/2308.07921v1
I wouldn’t recommend this talk to someone unfamiliar with the AI risk arguments, and I think promoting it would be a mistake. Yudkowsky came across better on Lex Fridman’s podcast. A few more Rational Animations-style AI risk YouTube videos would be more effective.
“Squiggle Maximizer” and “Paperclip Maximizer” have to go. They’re misleading names for an AI pursuing a utility function orthogonal to human values, and they make the concept seem like a joke when communicating with the general public. Better to use a different term, preferably something that represents a goal that’s valuable to humans. All funny-sounding insider jargon should be avoided cough notkilleveryoneism cough.
Nanotech is too science-fictiony and distracting. More realistic near-term scenarios (hacks of nuclear facilities like Stuxnet to control energy, out-of-control trading causing world economies to crash and leading to a full-on nuclear war, large-scale environmental disaster that’s lethal to humans but not machines, gain-of-function virus engineering, controlling important people through blackmail) would resonate better and emphasize the fragility of human civilization.
The chess analogy (“you will lose, but I can’t tell you exactly how”) is effective. It’s challenging to illustrate to people how something can be significantly more intelligent than they are, and this analogy helps convey that by reminding them how easily they lose to computers.
Ethereum’s market cap is 47% of Bitcoin’s. While you can argue that the market cap of cryptocurrencies is arbitrary, the price of one coin is even more arbitrary.
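To spell that out with round illustrative numbers (not live figures): market cap is price times circulating supply, so the per-coin price ratio by itself tells you almost nothing.

```python
# Market cap = price per coin * circulating supply. Per-coin price depends
# on how many units each protocol happens to issue, so cross-coin price
# comparisons are meaningless. Round illustrative numbers only.
btc_supply, btc_price = 19_000_000, 30_000    # ~19M BTC
eth_supply, eth_price = 120_000_000, 2_250    # ~120M ETH

btc_cap = btc_supply * btc_price
eth_cap = eth_supply * eth_price
print(f"ETH/BTC price ratio:      {eth_price / btc_price:.3f}")  # 0.075
print(f"ETH/BTC market-cap ratio: {eth_cap / btc_cap:.3f}")      # ~0.474
```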
Yudkowsky argues his points well in longer formats, but he could make much better use of his Twitter account if he cares about popularizing his views. Despite having Musk respond to his tweets, his posts are very insider-like, with no chance of becoming widely impactful. I’m not sure whether he is present on other social media, and I understand that there are some health issues involved, but a YouTube channel would also be helpful if he hasn’t completely given up.
I do think it is a fact that many people involved in AI research and engineering, such as his example of Chollet, have simply not thought deeply about AGI and its consequences.
I’d like to, but it’ll have to wait until I’m finished with a commercial project where I’m using them, or until I replace these techniques with something else in my code. I’ll post a reply here once I do. I’d expect somebody else to discover at least one of them in the meantime; they’re not some stunning insights.
From what I gather, Alameda wasn’t worth quite as much but he had a larger stake in it. Both appear to be worthless now. There is also a U.S. subsidiary ftx.us, which according to them “is a separate entity with separate management personnel, tech infrastructure, and licensing.” Some calculations I’ve seen put SBF’s net worth below $1 billion now and I think it’s probable that he’ll have to deal with some big legal issues.
Baidu and Pony.ai have permits for fully driverless robotaxis in China: https://www.globaltimes.cn/page/202303/1287492.shtml
Gwern, have you actually tried Bing Chat yet? If it is GPT-4, then it’s a big disappointment compared to how unexpectedly good ChatGPT was. It fails on simple logic and math questions, just like ChatGPT. I don’t find the ability to retrieve text from the web to be too impressive—it’s low-hanging fruit that was long expected. It’s probably half-baked simply because Microsoft is in a hurry: they have limited time to gain market share before Google integrates Bard.
There have been a few papers with architectures showing performance that matches transformers on smaller datasets with scaling that looks promising. I can tell you that I’ve switched from attention to an architecture loosely based on one of these papers because it performed better on a smallish dataset in my project but I haven’t tested it on any standard vision or language datasets, so I don’t have any concrete evidence yet. Nevertheless, my guess is that indeed there is nothing special about transformers.
I don’t think PI-MNIST SOTA is really a thing. The OP even links to the original dropout paper from 2014, which shows this. MNIST SOTA is much less of a thing than it used to be but that’s at 99.9%+, not 98.9%.
Some anecdotal evidence: in the last few months I was able to improve on three 2021 conference-published, peer-reviewed DL papers. In each case, the reason I was able to do it was that the authors did not fully understand why the technique they used worked and obviously just wrote a paper around something that they experimentally found to be working. In addition, there are two pretty obvious bugs in a reasonably popular optimization library (100+ github stars) that reduce performance and haven’t been fixed or noticed in “Issues” for a long time. Seems that none of its users went step-by-step or tried to carefully understand what was going on.
What all four of these have in common is that they are still actually working, just not optimally. Their experimental results are not fake. This does not fill me with hope for the future of interpretability.
It won’t be able to multiply 5-digit integers (middle digits will be wrong).
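This is easy to check yourself with a minimal harness (a sketch only; `query_model` is a hypothetical stand-in for whichever model API you test):

```python
# Sketch: test an LLM on 5-digit multiplication against exact arithmetic.
# `query_model` is a hypothetical stand-in; wire it to a real API call.
import random

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model's API here")

random.seed(42)
trials, errors = 20, 0
for _ in range(trials):
    a = random.randint(10_000, 99_999)
    b = random.randint(10_000, 99_999)
    reply = query_model(f"Compute {a} * {b}. Reply with only the number.")
    if reply.strip().replace(",", "") != str(a * b):
        errors += 1  # the middle digits are usually where it goes wrong,
                     # since they accumulate the most partial products and carries
print(f"{errors}/{trials} incorrect")
```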