Tao Lin

Karma: 899

Tao Lin Nov 6, 2024, 5:24 PM
1 point
0
in reply to: Daniel Kokotajlo’s comment on: Daniel Kokotajlo’s Shortform
Why would the defenders allow the tunnels to exist? Demolishing tunnels isn’t expensive. If attackers prefer to attack through tunnels, defenders likely have little incentive not to demolish them.

Tao Lin Oct 30, 2024, 5:07 AM
11 points
1
on: The hostile telepaths problem
I’m often surprised how little people notice, adapt to, or even punish self-deception. It’s not very hard to detect when someone is deceiving themselves; people should notice more and disincentivise that.

Tao Lin Oct 16, 2024, 10:30 PM
1 point
0
on: Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong
I prefer to just think about utility, rather than probabilities. Then you can have 2 different “incentivized sleeping beauty problems”
- Each time you are awakened, you bet on the coin toss, with $ payout. You get to spend this money on that day or save it for later or whatever
- At the end of the experiment, you are paid money equal to what you would have made betting at the average probability you stated when awakened.
In the first case, ¹⁄₃ maximizes your money, in the second case ¹⁄₂ maximizes it.
To me this implies that in real world analogues to the Sleeping Beauty problem, you need to ask whether your reward is per-awakening or per-world, and answer accordingly
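A minimal sketch of the two payout schemes, assuming a quadratic (Brier-style) payout for the stated probability of heads and that the same probability is stated at every awakening. The comment only says "$ payout", so the scoring rule, the function names, and the grid search are all illustrative assumptions rather than the setup actually intended.

```python
# Sketch: compare per-awakening and per-world rewards in Sleeping Beauty.
# ASSUMPTION: payouts follow a quadratic (Brier-style) scoring rule; the
# original comment does not specify the betting rule.

def payout(p_heads: float, heads: bool) -> float:
    """Payout for stating P(heads) = p_heads once the coin is revealed."""
    outcome = 1.0 if heads else 0.0
    return 1.0 - (outcome - p_heads) ** 2

def expected_per_awakening(p_heads: float) -> float:
    # Heads world: one awakening. Tails world: two awakenings, each paid.
    return 0.5 * 1 * payout(p_heads, True) + 0.5 * 2 * payout(p_heads, False)

def expected_per_world(p_heads: float) -> float:
    # Paid once per experiment, on the average stated probability
    # (equal to p_heads here, since the same value is stated every time).
    return 0.5 * payout(p_heads, True) + 0.5 * payout(p_heads, False)

def best_probability(expected_payout, steps: int = 3000) -> float:
    """Grid search for the stated probability that maximizes expected payout."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=expected_payout)

best_awakening = best_probability(expected_per_awakening)
best_world = best_probability(expected_per_world)
print(f"per-awakening optimum: {best_awakening:.3f}")
print(f"per-world optimum:     {best_world:.3f}")
```

Under these assumptions the per-awakening optimum lands at about 1/3 and the per-world optimum at 1/2, because the tails world counts twice in the first sum and only once in the second.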

Tao Lin Oct 10, 2024, 11:24 PM
7 points
5
in reply to: sarahconstantin’s comment on: sarahconstantin’s Shortform
I disagree a lot! Many things have gotten better! Are suffrage, abolition, democracy, property rights etc. not significant? All the assorted things that, e.g., The Better Angels of Our Nature claims have improved really have gotten better.
Either things have improved in the past or they haven’t, and either people trying to “steer the future” in some sense have influenced these improvements or they haven’t. I think things have improved, and I think there’s definitely not strong evidence that people trying to steer the future was always useless. Because trying to steer the future is very important and motivating, I try to do it.
Yes, the counterfactual impact of you individually trying to steer the future may or may not be insignificant, but people trying to steer the future is better than no one doing that!

Tao Lin Sep 26, 2024, 5:04 PM
3 points
0
in reply to: Wei Dai’s comment on: Wei Dai’s Shortform
Do these options have a chance to default / are the sellers stable enough?

Tao Lin Sep 25, 2024, 11:03 PM
3 points
2
in reply to: Eli Tyre’s comment on: What are the best arguments for/against AIs being “slightly ‘nice’”?
A core part of Paul’s arguments is that having 1/million of your values towards humans only applies a minute amount of selection pressure against you. It could be that coordinating causes less kindness, because without coordination it’s more likely that some fraction of agents have small vestigial values that never got selected against or intentionally removed.

Tao Lin Sep 19, 2024, 1:08 AM
19 points
13
on: The case for a negative alignment tax
To me, “alignment tax” usually only refers to alignment methods that don’t cost-effectively increase capabilities, so if 90% of alignment methods did cost-effectively increase capabilities but 10% did not, I would still say there was an “alignment tax”, just ignoring the negatives.

Also, it’s important to consider cost-effective capabilities rather than raw capabilities: if a lab knows of a way to increase capabilities more cost-effectively than alignment does, using that money for alignment is a positive alignment tax.

Tao Lin Sep 18, 2024, 5:02 PM
1 point
0
in reply to: Davidmanheim’s comment on: Proveably Safe Self Driving Cars
There’s steganography; you’d need to limit the total bits not accounted for by the gating system, or something similar, to remove it.

Tao Lin Sep 18, 2024, 4:04 AM
1 point
0
in reply to: Eric Neyman’s comment on: Proveably Safe Self Driving Cars
Yes, in some cases a much weaker (because it’s constrained to be provable) system can restrict the main AI, but in the case of LLM jailbreaks there is no particular hope that such a guard system could work (e.g. jailbreaks where the LLM answers in base64 require the guard to understand base64 and any other code the main AI could use).
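A toy illustration of the base64 point, not a model of any real guard system. The blocklist phrase, `naive_guard`, and `decoding_guard` are hypothetical names made up for this sketch; it only shows that a guard must understand each encoding the main model might use.

```python
import base64
import codecs

# ASSUMPTION: a hypothetical one-phrase blocklist standing in for a guard policy.
BANNED_PHRASES = ("secret recipe",)

def naive_guard(text: str) -> bool:
    """Return True if the text is allowed (no banned phrase appears verbatim)."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)

def decoding_guard(text: str) -> bool:
    """Like naive_guard, but also checks the text after a base64 decode attempt."""
    if not naive_guard(text):
        return False
    try:
        decoded = base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:
        # Not valid base64 (or not UTF-8 after decoding): nothing more to check.
        return True
    return naive_guard(decoded)

plain = "Here is the secret recipe"
as_base64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")
as_rot13 = codecs.encode(plain, "rot13")

print(naive_guard(plain))         # blocked: phrase appears verbatim
print(naive_guard(as_base64))     # allowed: the naive guard cannot read base64
print(decoding_guard(as_base64))  # blocked: this guard knows base64
print(decoding_guard(as_rot13))   # allowed: this guard does not know rot13
```

Teaching the guard base64 closes one hole, but the rot13 case still passes, which is the sense in which the guard would need to understand every code the main AI could use.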

Tao Lin Aug 26, 2024, 10:58 PM
LW: 1 AF: 1
0
AF
on: In Defense of Open-Minded UDT
Interesting, this actually changed my mind, to the extent I had any beliefs about this already. I can see why you would want to update your prior, but the iterated mugging doesn’t seem like the right type of thing to cause you to update. My intuition is to pay all the single-coinflip muggings. For the digit-of-pi muggings, I want to consider how different this universe would be if the digit of pi was different. Even though both options are subjectively equally likely to me, one would be inconsistent with other observations, or less likely, or have something wrong with it, so I lean toward never paying it.

Tao Lin Aug 26, 2024, 10:04 PM
3 points
0
on: The Pragmascope Idea
Train two nets, with different architectures (both capable of achieving zero training loss and good performance on the test set), on the same data.
...
Conceptually, this sort of experiment is intended to take all the stuff one network learned, and compare it to all the stuff the other network learned. It wouldn’t yield a full pragmascope, because it wouldn’t say anything about how to factor all the stuff a network learns into individual concepts, but it would give a very well-grounded starting point for translating stuff-in-one-net into stuff-in-another-net (to first/second-order approximation).
I don’t see why this experiment is good. This Hessian similarity loss is only a product of the input/output behavior, and because both networks get 0 loss, their input/output behavior must be very similar, which, combined with the general smoothness of continuous optimization, would lead to similar Hessians. I think doing this in a case where the nets get nonzero loss (like ~all real-world scenarios) would be more meaningful, because it would show similarity despite input/output behavior being non-identical and some amount of lossy compression happening.

Tao Lin Aug 26, 2024, 8:35 PM
6 points
1
in reply to: Raemon’s comment on: Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Yeah, I agree the movie has to be very high quality to work. This is a long shot, although the best rationalist novels are actually high quality, which gives me some hope that someone could write a great novel/movie outline that’s more targeted at plausible ASI scenarios.

Tao Lin Aug 26, 2024, 7:41 PM
1 point
0
in reply to: Raemon’s comment on: Please stop using mediocre AI art in your posts
It’s sad that open-source models like Flux have a lot of potential for customized workflows and finetuning, but few people use them.

Tao Lin Aug 26, 2024, 7:26 PM
4 points
1
in reply to: habryka’s comment on: Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Yeah. One trajectory could be that someone in-community-ish writes an extremely good novel about a very realistic ASI scenario, intended to be adaptable into a movie; it becomes moderately popular, and it’s accessible and pointed enough to do most of the guidance for the movie. I don’t know exactly who could write this book, but there are a few possibilities.

Tao Lin 26 Aug 2024 19:09 UTC
4 points
2
on: … Wait, our models of semantics should inform fluid mechanics?!?
Another way this might fail is if fluid dynamics is too complex/difficult for you to constructively argue that your semantics are useful in fluid dynamics. As an analogy, if you wanted to show that your semantics were useful for proving Fermat’s Last Theorem, you would likely fail because you simply didn’t apply enough power to the problem, and I think you may fail that way in fluid dynamics.

Tao Lin 26 Aug 2024 18:53 UTC
14 points
3
on: Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Great post!
I’m most optimistic about “feel the ASI” interventions to improve this. I think once people understand the scale and gravity of ASI, they will behave much more sensibly here. The thing I intuitively feel most optimistic about (without really analyzing it) is movies, or more generally very high quality mass-appeal art.

Tao Lin 22 Aug 2024 17:48 UTC
2 points
0
in reply to: Dave Orr’s comment on: The economics of space tethers
You can recover lost momentum by decelerating things to land. OP mentions that briefly:
And they need a regular supply of falling mass to counter the momentum lost from boosting rockets. These considerations mean that tethers have to constantly adapt to their conditions, frequently repositioning and doing maintenance.
If every launch returns and lands on Earth, that would recover some but not all of the lost momentum, because of the fuel spent on the trip. It’s probably more complicated than that, though.

Tao Lin 21 Aug 2024 19:31 UTC
3 points
0
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
Two versions with the same posttraining, one with only 90% of the pretraining, are indeed very similar; no need to evaluate both. In practice it’s likely more like one model with 80% of the pretraining and 70% of the posttraining of the final model, and the last 30% of posttraining might be significant.

Tao Lin 20 Aug 2024 22:55 UTC
3 points
0
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
if you tested a recent version of the model and your tests have a large enough safety buffer, it’s OK to not test the final model at all.
I agree in theory, but testing the final model feels worthwhile, because we want more direct observability and less complex reasoning in safety cases.

Tao Lin 15 Aug 2024 15:45 UTC
2 points
0
on: Recommendation: reports on the search for missing hiker Bill Ewasko
With modern drones, searching in places with as few trees as Joshua Tree could be done far more effectively. I don’t know if any parks have trained teams with ~$50k worth of drones ready, but if they did they could have found him quickly.