Year 4 Computer Science student
find me anywhere in linktr.ee/papetoast
Can someone explain why they disagree? I don’t see a particularly obvious reason.
Reading through all the responses the one thing that sticks out is Gemini-2.5 really, really wants to write the first character in caps.
what it produces in these unmonitored runs takes more work for me to clean up than just iterating with Claude directly
It may be that the type of work we are doing differs, then.
Also, you don’t seem too bothered that running claude code implies a responsibility to review the code soon-ish (or have your local codebase grow increasingly messy). The fact that I don’t need to worry about state with PR agents means it is more affordable to spin up more attempts, and because more attempts can be run simultaneously, each individual attempt can be of lower quality, as long as the best attempt is good. Deciding that the code is garbage and not worth any time cleaning up is much faster than cleaning it up, so in general I don’t find the initial read-through of the n attempts to take that much time. In the end I still only spin up codex on desktop if I think the task has a reasonable chance of being done well, which really depends on the specific task size/difficulty/type (bug fix, refactor, additions). It’s also likely that claude code works better for you because you’re more experienced and can basically tell claude exactly what to do when it’s stuck.
I like to use the PR agents in some cases. (But I still manually check out those branches and rebase, split the commits, or rewrite some stuff)
spin off tasks when I’m on mobile
it is easier to do multiple parallel attempts on the same task when I know the output will probably suck. And not gonna lie, OpenAI’s codex cloud has very lenient compute limits, so I also feel like I’m saving money this way.
they live in (other people’s) containers, so I don’t need to worry about multiple agents colliding with each other. I know git worktrees exist, but juggling which worktree is on which branch turns out to be somewhat annoying too.
They are good for queueing up tasks that I don’t expect to have the bandwidth to start working on today. I can make the agents do the PR today and forget about them until a few days later.
LW uses graphql. You can follow the guide below for querying if you’re unfamiliar with it.
https://www.lesswrong.com/posts/LJiGhpq8w4Badr5KJ/graphql-tutorial-for-lesswrong-and-effective-altruism-forum (For step 3 it seems like you now want to hover over output_type instead of input)
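For the unfamiliar, a query boils down to POSTing JSON to the `/graphql` endpoint. Here is a minimal sketch; the endpoint is from the tutorial above, but the exact query shape (field and argument names) may have drifted, so verify it against the GraphiQL explorer first.

```python
# Minimal sketch of querying the LessWrong GraphQL API.
# Endpoint per the linked tutorial; the query shape is an assumption to verify.
import json
import urllib.request

QUERY = """
{
  posts(input: {terms: {limit: 3}}) {
    results {
      title
      pageUrl
    }
  }
}
"""

def fetch_posts(endpoint: str = "https://www.lesswrong.com/graphql") -> dict:
    """POST the query as JSON and return the decoded response."""
    payload = json.dumps({"query": QUERY}).encode()
    req = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same pattern works for the EA Forum by swapping the endpoint.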
How I use AI for coding.
I wrote this in like 10 minutes for quick sharing.
I am not a full-time coder; I am a student who codes like 15-20 hours a week.
Investing too much time in writing good prompts makes little sense. I go with the defaults and add nudges as needed. (See one of my AGENTS.md files at the end)
Mainly codex (cloud) and Cursor. Claude Code works, but being able to easily revert is helpful, so Cursor is better.
I still try out claude code for small pieces of edits, but it doesn’t feel worth it.
I have no idea why people like claude code so much; a CLI is inferior to a GUI.
Using Cursor means I don’t need a separate git worktree for each agent, as long as I get them to work on different parts of the codebase.
Mobile coding is real and very convenient with codex (cloud), but I still review and edit on desktop.
Using multiple agents is possible, but usually it’s one big feature plus multiple smaller background edits.
Or multiple big features using codex cloud, and delay review to a later time.
Codex cloud is good, but it only generates one commit per PR; often I need to split it up manually. I am eyeing other cloud agent solutions but haven’t tried them seriously yet.
Current prompt for one of the python projects
## Code Style
- 120-character lines
- Type hints are a must
- **Don't use Python 3.8 typings**: Never import `List`, `Tuple` or other deprecated classes from `typing`, use `list`, `tuple` etc. instead, or import from `collections.abc`
- Do not use `from __future__ import annotations`, use forward references in type hints instead. `TYPE_CHECKING` should be used only for imports that would cause circular dependencies.
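A minimal example of the style these rules ask for (module and names are illustrative, not from any real project):

```python
# Illustrative style example: modern generics, no `typing.List`,
# no `from __future__ import annotations`, forward refs as strings.
from collections.abc import Sequence
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only for imports that would otherwise be circular (hypothetical module).
    from mypkg.models import User

def mean(values: Sequence[float]) -> float:
    """Built-in generics (list, dict, tuple) are used directly in hints."""
    return sum(values) / len(values)

def load_user(user_id: int) -> "User":  # string forward reference at runtime
    raise NotImplementedError
```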
## Documentation and Comments
Add code comments sparingly. Focus on why something is done, especially for complex logic, rather than what is done. Only add high-value comments if necessary for clarity or if requested by the user. Do not edit comments that are separate from the code you are changing. NEVER talk to the user or describe your changes through comments.
### Using a new environment variable
When using a new environment variable, add it to `.env.example` with a placeholder value, and optionally a comment describing its purpose. Also add it to the `Environment Variables` section in `README.md`.
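For instance, adding a hypothetical `API_KEY` variable (the name and description are illustrative) would touch both files:

```
# .env.example
# API key for the external service
API_KEY=your-api-key-here
```

and in `README.md`:

```markdown
## Environment Variables
- `API_KEY`: API key for the external service
```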
### Using deal
We only use the exception handling features of deal. Use `@deal.raises` to document expected exceptions for functions/methods. Do not use preconditions/postconditions/invariants.
Additionally, we assume `AssertionError` is never raised, so `@deal.raises(AssertionError)` is not allowed.
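A sketch of the intended usage, assuming the `deal` package is installed (a no-op fallback is included only so the snippet runs standalone; the function and exception choices are illustrative):

```python
# Sketch of documenting expected exceptions with deal.
# Falls back to a no-op decorator if `deal` is not installed.
try:
    import deal
    raises = deal.raises
except ImportError:
    def raises(*_exceptions):
        def decorator(func):
            return func
        return decorator

@raises(KeyError, ValueError)  # the only exceptions this function may raise
def parse_port(config: dict) -> int:
    port = int(config["port"])  # KeyError if missing, ValueError if not an int
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

With the real `deal` decorator, raising anything outside the declared set is a contract violation, which is how the rule gets enforced rather than just documented.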
## Testing Guidelines
To be expanded.
Mocking is heavily discouraged. Use test databases, test files, and other real resources instead of mocks wherever possible.
Allowed pytest markers:
- `@pytest.mark.integration`
- `@pytest.mark.slow`
- `@pytest.mark.docker`
- builtin ones like `skip`, `xfail`, `parametrize`, etc.
We do not use
- `@pytest.mark.unit`: all tests are unit tests by default
- `@pytest.mark.asyncio`: we use `pytest-asyncio` which automatically handles async tests
- `@pytest.mark.anyio`: we do not use `anyio`
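Putting the marker rules together, a test module would look like this (requires pytest; the test bodies are illustrative placeholders):

```python
# Illustrative use of the allowed markers; no unit/asyncio/anyio markers.
import pytest

@pytest.mark.slow
@pytest.mark.docker
def test_container_roundtrip():
    # Would spin up a real container rather than mocking the client.
    assert True

@pytest.mark.parametrize("raw,expected", [("1", 1), ("2", 2)])
def test_parse(raw, expected):
    assert int(raw) == expected
```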
### Running Tests
Use `uv run pytest …` instead of simply `pytest …` so that the virtual environment is activated for you.
## Asking for Help
- Refactoring:
As a command-line only tool, you do not have access to helpful IDE features like “Refactor > Rename Symbol”. Instead, you can ask the user to rename variables, functions, classes, or other symbols by providing the current name and the new name. It is important that you don’t rename public variables yourself, as you might miss some occurrences of the symbol across the codebase.
## Information
Finding dependencies: we use `pyproject.toml`, not `requirements.txt`. Use `uv add <package>` to add new dependencies.
(Note that the Asking for Help section is basically useless. It was experimental and I never got asked lol)
I don’t doubt the conclusion, but I think we would be buying (life expectancy minus age) life-years instead of 1 life.
Are you guys talking about tin foil for small lights that some appliances emit? For windows I don’t understand why not just use a curtain.
It is a bit unintuitive to me that hallucinations are made-up inputs, but it does make sense.
Hard to tell from just what you said; mind sharing the conversation?
I apologize. I should have searched before talking.
Side note: This seems like a completely different topic from your top level comment. Kind of weird to start a mostly tangential argument inside an unresolved argument thread.
You’d still be better off creating the 1M x 100 world than the (1M + 1) x (100 - ε) world.
Where does (1M + 1) come from?
In the post, Ben mentions the manufacturer doing hundreds of experiments, not millions. Of course, in the limiting case the smallest quality drop can and will be observed, but I believe Ben is not talking about that.
Even if we use the 1M base figure, it doesn’t explain why it is +1 rather than e.g. +1000.
You are assuming that the ice cream manufacturer is trying to maximise aggregate utility, which seems obviously false to me.
Alternatively, what about matching people by browser history? If there is a way to avoid data security and privacy concerns (ha!) then there are actually a lot of advantages.
I have recently learned that Fully Homomorphic Encryption (doing calculations on encrypted data) 1. exists and 2. is usable in a small scale.
https://bozmen.io/fhe
https://bozmen.io/fhe-current-apps (FHE Real-world Applications)
Current FHE has 1,000x to 10,000x computational overhead compared to plaintext operations. On the storage side, ciphertexts can be 40 to 1,000 times larger than the original. It’s like the internet in 1990: technically awesome, but limited in practice.
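To make those multipliers concrete, here is the back-of-envelope arithmetic; the overheads are the figures quoted above, while the baseline timing and record size are made-up examples.

```python
# What the quoted FHE overheads mean in practice.
# Multipliers from the article; baselines are hypothetical examples.
plaintext_op_ms = 1.0                 # assumed plaintext operation time
compute_overhead = (1_000, 10_000)    # quoted computational overhead range
ciphertext_expansion = (40, 1_000)    # quoted ciphertext size expansion range

# A 1 ms plaintext operation becomes 1-10 seconds under FHE.
encrypted_lo_s = plaintext_op_ms * compute_overhead[0] / 1000
encrypted_hi_s = plaintext_op_ms * compute_overhead[1] / 1000

# A 1 KB plaintext record becomes 40 KB to ~1 MB of ciphertext.
plaintext_kb = 1.0
ciphertext_kb = (plaintext_kb * ciphertext_expansion[0],
                 plaintext_kb * ciphertext_expansion[1])
```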
What do you mean by “retraction”? Do you just mean an opposite statement “sharks are older than trees” --> “sharks are not older than trees”, or do you mean something more specific?
Assuming just a general contrasting statement, my gut feeling is that 1. this heuristic is true for certain categories of statements but generates wrong intuition for others, and 2. when the heuristic works, it is rarely for memetic reasons; it is just the signal-to-noise ratio of the subjects.
Currently I am thinking about counterexamples from statements that roughly equate to a recommendation, like “Twitch is the best streaming platform” (I know it isn’t a very fitting memetic statement), which heuristically sounds plausibly true to me because 1. I know there is a very small number of streaming platforms, and 2. people who talk about this are likely to know what they are talking about.
the commenter’s job to pay via microtransactions, and maybe the author can tip back if they like it via a Flattr-ish.
Yes, though I feel like it is worse than the author or forum paying, because of incentives. There are other possible splits, like the commenter paying for failed comments and the author/forum paying for those that pass.
Monthly subscription is also possible yeah. I had this in mind and swept it under “the forum paying for the model and getting the cost back from elsewhere”.
privacy concerns
you misunderstood; I meant that some people probably don’t want their account to be traceable to their real identity, so any monetary transaction is problematic unless it’s crypto
rate limits seems like a must
but maybe it can be quite lax if a cheap model is used
So you now need to pay before you post?
privacy concerns
and comments are disabled when you’re out of funds? natural consequence but lol.
Long comments that don’t pass on the first try likely motivate the comment author to a. jailbreak or b. let another LLM rewrite it, rather than drafting a new one.
I would like it more if it is the forum paying for the model, and getting the cost back from elsewhere.
You can also pay $10/mo to Kagi and get different filter presets (“lenses”). Is it worth the price for you? idk.
I think the Quick Takes feed needs the option to sort by newest. It makes no sense that I get fed the same posts 3-7 times in an unpredictable order if I read the feed once per day.
You may have misunderstood. I am asking about why your reply got −5 agreement vote despite it seeming correct to me, nothing related to the other comments.