SatvikBeri

Karma: 981
• For the orthogonal decomposition, don’t you need two scalars? E.g. . For example, in , let Then , and there’s no way to write as

• My favorite book, by far, is Functional Programming in Scala. This book has you derive most of the concepts from scratch, to the point where even complex abstractions feel like obvious consequences of things you’ve already built.

If you want something more Haskell-focused, a good choice is Programming in Haskell.

• I didn’t downvote, but I agree that this is a suboptimal meme – though the prevailing mindset of “almost nobody can learn Calculus” is much worse.

As a datapoint, it took me about two weeks of obsessive, 15 hour/day study to learn Calculus to a point where I tested out of the first two courses when I was 16. And I think it’s fair to say I was unusually talented and unusually motivated. I would not expect the vast majority of people to be able to grok Calculus within a week, though obviously people on this site are not a representative sample.

• Yes, roughly speaking, if you multiply the VC dimension by n, then you need n times as much training data to achieve the same performance. (More precise statement here: https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension#Uses) There are also a few other bounds you can get based on VC dimension. In practice these bounds are way too large to be useful, but an algorithm with much higher VC dimension will generally overfit more.
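To make the underlying definition concrete, here is a minimal sketch of shattering, the notion VC dimension is built on. It uses the class of 1-D threshold classifiers (which has VC dimension 1); the function names are illustrative, not from any library:

```python
# Sketch: VC dimension via shattering, for the hypothesis class of
# threshold classifiers h_t(x) = 1 if x > t. This class shatters any
# single point but no pair of points, so its VC dimension is 1.

def achievable_labelings(points):
    """All labelings of `points` realizable by some threshold t."""
    pts = sorted(points)
    # One candidate threshold below all points, one between each
    # consecutive pair, one above all points: these cover every
    # distinct behavior a threshold can have on `points`.
    candidates = [pts[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    candidates += [pts[-1] + 1.0]
    return {tuple(int(x > t) for x in points) for t in candidates}

def is_shattered(points):
    """True if thresholds realize all 2^n labelings of `points`."""
    return len(achievable_labelings(points)) == 2 ** len(points)

print(is_shattered([0.0]))       # → True: a single point is shattered
print(is_shattered([0.0, 1.0]))  # → False: the labeling (1, 0) is unreachable
```

A richer hypothesis class can realize more labelings of a sample, which is exactly why it needs more data to pin down.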

• A different view is to look at the search process for the models, rather than the model itself. If model A is found from a process that evaluates 10 models, and model B is found from a process that evaluates 10,000, and they otherwise have similar results, then A is much more likely to generalize to new data points than B.

The formalization of this concept is called VC dimension and is a big part of Machine Learning Theory (although arguably it hasn’t been very helpful in practice): https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension
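A toy simulation of the search-process point (all numbers illustrative): we score purely random “models” on noise, so every model’s true expected performance is zero, then compare the best of the first 10 against the best of all 10,000.

```python
import random

# Each "model" is just a seeded noise source; its in-sample score is
# the mean of 100 draws from N(0, 1), so the true expected score of
# every model is exactly 0.

def in_sample_score(model_seed, n=100):
    rng = random.Random(model_seed)
    return sum(rng.gauss(0, 1) for _ in range(n)) / n

scores = [in_sample_score(seed) for seed in range(10_000)]
best_of_10 = max(scores[:10])
best_of_10000 = max(scores)

print(f"best of 10:     {best_of_10:.3f}")
print(f"best of 10,000: {best_of_10000:.3f}")
```

The larger search is guaranteed to report an in-sample score at least as high, and the extra performance comes purely from selection, which is exactly the overfitting risk described above.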

• It’s a combination. The point is to throw out algorithms/parameters that do well on backtests when the assumptions are violated, because those are much more likely to be overfit.

• As an example, consider a strategy like “on Wednesdays, the market is more likely to have a large move, and signal XYZ predicts big moves accurately.” You can encode that as an algorithm: trade signal XYZ on Wednesdays. But the algorithm might make money on backtests even if the assumptions are wrong! By examining the individual components rather than just whether the algorithm made money, we get a better idea of whether the strategy works.
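A sketch of what component-wise validation might look like, on hypothetical synthetic data. Both assumptions are baked into the data generator here, so both checks pass by construction; on real data, either check could fail even while the combined strategy happened to make money in a backtest.

```python
import random

# Hypothetical daily data: (is_wednesday, signal, realized move).
rng = random.Random(42)
days = []
for i in range(2_000):
    wednesday = (i % 5 == 2)
    scale = 2.0 if wednesday else 1.0     # assumption 1 baked in
    move = rng.gauss(0, scale)
    signal = move + rng.gauss(0, 0.5)     # assumption 2 baked in
    days.append((wednesday, signal, move))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def wednesdays_move_more(days):
    """Assumption 1: average absolute move is larger on Wednesdays."""
    wed = mean(abs(m) for w, s, m in days if w)
    rest = mean(abs(m) for w, s, m in days if not w)
    return wed > rest

def signal_predicts_direction(days):
    """Assumption 2: the signal's sign usually matches the move's sign."""
    hit_rate = mean(1.0 if (s > 0) == (m > 0) else 0.0
                    for w, s, m in days)
    return hit_rate > 0.6

print(wednesdays_move_more(days), signal_predicts_direction(days))
```

Each check tests one claim in isolation, so a failing assumption shows up directly instead of being hidden inside aggregate P&L.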

• Yes, avoiding overfitting is the key problem, and you should expect almost anything to be overfit by default. We spend a lot of time on this (I work w/ Alexei). I’m thinking of writing a longer post on preventing overfitting, but these are some key parts:

• Theory. Something that makes economic sense, or has worked in other markets, is more likely to work here.

• Components. A strategy made of 4 components, each of which can be independently validated, is a lot more likely to keep working than one black box.

• Measuring strategy complexity. If you explore 1,000 possible parameter combinations, that’s less likely to work than if you explore 10.

• Algorithmic decision making. Any manual part of the process introduces a lot of possibilities for overfitting.

• Abstraction & reuse. The more you reuse things, the fewer degrees of freedom you have with each idea, and therefore the lower your chance of overfitting.

• I think the prior that bubbles usually pop is incorrect. We tend to call something a bubble in retrospect, after it’s popped.

But if you try to define bubbles with purely forward-looking measures, like a period of unusually high growth, they’re more frequently followed by periods of unusually slow growth than by rapid decline. For example, Amazon’s stock would pass just about any test of a bubble at most points in its history.

I expect something similar with education: spending will likely remain high, but grow more slowly than it did in the last 20 years. That’s especially true because of the structure of student loans: people can’t really just default.

But to answer the more direct question: assuming that there is a rapid drop in education spending, how could we profit from it? Vocational schools seem like the most obvious bet, e.g. to become a programmer, dental assistant, massage therapist, electrician, and so on.

Certification services that manage to develop a reputation will become strong as well, e.g. SalesForce certificates are pretty valuable.

You could directly short lenders such as Sallie Mae.

Recruitment agencies that specialize in placing recent college graduates will likely suffer.

Management consulting firms rely heavily on college graduates, and so do hedge funds to a lesser extent.

• #6:

Assume WLOG that x ≤ f(x). Then by monotonicity, we have x ≤ f(x) ≤ f²(x) ≤ … If this chain were all strictly increasing, then we would have more distinct elements than L contains. Thus there must be some n such that f(fⁿ(x)) = fⁿ(x). By induction, fᵐ(x) = fⁿ(x) for all m ≥ n.

#7:

Assume x ≤ f(x) and construct a chain similarly to (6), indexed by the ordinals, taking least upper bounds at limit ordinals. If all the inequalities were strict, we would have an injection from the ordinals to L, which is impossible since L is a set.

#8:

Let F be the set of fixed points. Any subset S of F must have a least upper bound x in L. If x is a fixed point, done. Otherwise, consider x*, the fixed point obtained by iterating f on x as in (7); note that x ≤ f(x), since for every q in S we have q = f(q) ≤ f(x), making f(x) an upper bound of S. For any q in S, we have q ≤ x ≤ x*. Thus x* is an upper bound of S in F. To see that it is the least upper bound, assume we have some other upper bound b of S in F. Then x ≤ b, and since f(b) = b, every iterate of f applied to x stays ≤ b, so x* ≤ b.

To get the lower bound, note that we can flip the inequalities in L and still have a complete lattice.
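As a quick brute-force sanity check of the claim in #8, here is a sketch on a toy complete lattice, P({0,1,2}) ordered by inclusion, with the illustrative monotone map f(S) = S ∪ {0} (any monotone map would do):

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def f(s):
    return s | {0}

L = powerset([0, 1, 2])

# f is monotone: S ⊆ T implies f(S) ⊆ f(T).
assert all(f(s) <= f(t) for s in L for t in L if s <= t)

fixed = [s for s in L if f(s) == s]  # exactly the sets containing 0

# Every nonempty subset of the fixed points has a least upper bound
# that is itself a fixed point (here: the union, which contains 0).
for r in range(1, len(fixed) + 1):
    for subset in combinations(fixed, r):
        assert frozenset().union(*subset) in fixed

print(sorted(map(sorted, fixed)))  # → [[0], [0, 1], [0, 1, 2], [0, 2]]
```

The fixed points here form the sublattice of sets containing 0, which is itself a complete lattice, as #8 predicts.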

#9:

P(A) clearly forms a complete lattice where the least upper bound of any set of subsets is their union, and the greatest lower bound is the intersection.

To see that injections are (strictly) monotonic, assume S ⊊ T and that f is an injection. For any function, S ⊆ T implies f(S) ⊆ f(T). If x is in T − S and f(x) is in f(S), that implies f(x) = f(y) for some y in S with y ≠ x, which is impossible since f is injective. Thus S ↦ f(S) is (strictly) monotonic.

Now suppose g: B → A is also an injection. Let C be the set of all points of A not in the image of g, and let A′ = C ∪ (g∘f)(C) ∪ (g∘f)²(C) ∪ … Note that no element of C is in the image of g, so any element of A′ that does lie in the image of g must be in (g∘f)ⁿ(C) for some n ≥ 1. Then A − A′ = g(B − f(A′)). On one hand, every element of A not contained in A′ is in the image of g by construction, say g(b); and b is not in f(A′), since otherwise g(b) would be in (g∘f)(A′) ⊆ A′. So A − A′ ⊆ g(B − f(A′)). On the other hand, if g(b) is in A′, then g(b) = (g∘f)(a) for some a in A′, so b = f(a) by injectivity; hence g(B − f(A′)) contains no element of A′, i.e. g(B − f(A′)) ⊆ A − A′. QED.

#10:

We form two bijections using the sets from (9), one between A′ and B′, the other between A − A′ and B − B′, where B′ = f(A′).

Any injection is a bijection between its domain and image. Since B′ = f(A′) and f is an injection, f is a bijection between A′ and B′, where we can assign each element b of B′ to the unique a in A′ such that f(a) = b. Similarly, g is a bijection between B − B′ and A − A′, since (9) gives A − A′ = g(B − B′). Combining them, we get a bijection on the full sets.
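The construction in (9) and (10) can be run concretely. Here is a sketch for the illustrative injections f(n) = 2n and g(n) = 2n + 1 on A = B = ℕ (neither is surjective, so A′ is non-trivial); membership in A′ is decidable by repeatedly peeling off g∘f, which here is n ↦ 4n + 1:

```python
# f(n) = 2n injects ℕ into ℕ (image: evens); g(n) = 2n + 1 injects
# ℕ into ℕ (image: odds). C = ℕ − image(g) = the even numbers, and
# A′ = C ∪ (g∘f)(C) ∪ (g∘f)²(C) ∪ …

def in_A_prime(n):
    # Peel off (g∘f)(n) = 4n + 1 while possible; n is in A′ exactly
    # when the peeled-down value lands in C (i.e. is even).
    while n % 4 == 1:
        n = (n - 1) // 4
    return n % 2 == 0

def h(n):
    # The combined bijection from (10): f on A′, g⁻¹ on A − A′.
    return 2 * n if in_A_prime(n) else (n - 1) // 2

# Spot-check injectivity and coverage on an initial segment: every
# m < 499 has its preimage (m // 2 or 2m + 1) inside range(1000).
values = [h(n) for n in range(1000)]
assert len(set(values)) == 1000
assert set(values) >= set(range(499))
```

For example h(0) = 0 and h(1) = 2 (both via f, since 0 and 1 are in A′), while h(3) = 1 (via g⁻¹, since 3 is not in A′).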

Kalman Filter for Bayesians

22 Oct 2018 17:06 UTC
57 points

Systemizing and Hacking

23 Mar 2018 18:01 UTC
104 points

Inference & Empiricism

20 Mar 2018 15:47 UTC
87 points
• I’m specifically giving up games that encourage many short check-ins, e.g. most phone games and idle games. Binges aren’t a big issue for me; they tend to give me joy and renewal. But frequent check-in games make me less happy and less productive.

• “Prefer a few large, systematic decisions to many small ones.”

1. Pick what percentage of your portfolio you want in various assets, and rebalance quarterly, rather than making regular buying/selling decisions.

2. Prioritize once a week, and by default do whatever’s next on the list when you complete a task.

3. Set up recurring hangouts with friends at whatever frequency you enjoy (e.g. weekly). Cancel or reschedule on an ad-hoc basis, rather than scheduling ad-hoc.

4. Rigorously decide how you will judge the results of experiments, then run a lot of them cheaply. Machine Learning example: pick one evaluation metric (might be a composite of several sub-metrics and rules), then automatically run lots of different models and do a deeper dive into the 5 that perform particularly well.

5. Make a packing checklist for trips, and use it repeatedly.

6. Figure out what criteria would make you leave your current job, and only take interviews that plausibly meet those criteria.

7. Pick a routine for your commute, e.g. listening to podcasts. Test new ideas at the routine level (e.g. podcasts vs books).

8. Find a specific method for deciding what to eat – for me, this is querying system 1 to ask how I would feel after eating certain foods, and picking the one that returns the best answer.

9. Accepting every time a coworker asks for a game of ping-pong, as a way to get exercise, unless I am about to enter a meeting.

10. Always suggesting the same small set of places for coffee or lunch meetings.
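Item 1 above can be sketched as a small function (all tickers, prices, and weights are hypothetical): given current holdings and target weights, compute the one batch of trades that restores the targets at a quarterly rebalance.

```python
def rebalance(holdings, prices, target_weights):
    """Return {asset: units to buy (+) or sell (-)} to hit targets."""
    values = {a: holdings[a] * prices[a] for a in holdings}
    total = sum(values.values())
    trades = {}
    for asset, weight in target_weights.items():
        target_value = total * weight
        trades[asset] = (target_value - values[asset]) / prices[asset]
    return trades

holdings = {"stocks": 10.0, "bonds": 40.0}   # $1,000 each at these prices
prices = {"stocks": 100.0, "bonds": 25.0}
targets = {"stocks": 0.6, "bonds": 0.4}

print(rebalance(holdings, prices, targets))
# → {'stocks': 2.0, 'bonds': -8.0}: buy $200 of stocks, sell $200 of bonds
```

The trades net to zero dollars by construction, so the whole quarterly decision reduces to running one function and executing its output.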