TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output the BIG-bench canary string, indicating that Google likely trained on a broad set of benchmark data.
To my understanding, you only observe this effect for prompts that indicate or imply the current late-2025 date. Gemini completes such prompts as if “that must be hypothetical writing”, because in the vast majority of its training data, 2025 was in the future (and end-2025 was always hypothetical). I think it is more accurate to say that “Gemini 3 goes off the rails when it sees a prompt indicating it was written in 2025, because in its training data, everything that implied a 2025 date was a fictional scenario” (that’s also true for 2.5). Or did you manage to elicit this effect with a prompt from which the current, after-training-cutoff date can’t be inferred?
As mentioned in my other comment, the reason an LLM would behave like this is that during the period when all of its training data was written, end-2025 was a future date. So this is apparently something that needs to be trained out, which was not done in the case of Gemini 3 (at least when used via AI Studio). One way to reduce the behavior is to put “today is <date>” into the system prompt, but even then, the model apparently spends an inordinate number of tokens validating and pondering that date.