Maybe you include this in “stack overflow substitute”, but the main thing I use LLMs for is to understand well known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create seperate conversation branches with questions I have (to save money and have the context be closer. Didn’t evaluate if this actually makes the LLM better.). 4) Once I think I understood the concept or part of the concept I explain it to GPT. (Really I do this all the time during the entire process.) 5) The LLM (hopeful) corrects me if I am wrong (it seems it detects mistakes more often than not).
The last part of the conversation can then looks like this:
User: Ah, now I think I understand. You have mechanism that shifts by a fixed ammount (powers of 2). Each of these multiplexers can be set up such that the data flows through them, or though an ID wire, into the next multiplexer. Now by controlling whether the value flows through the multiplexer or the ID wire, we can “sum up the shifts”. Specifically if we have shifters of 1, 2, 4, …, 32, then we can achive any shift (for 64-bit numbers) by activating the appropirate multiplexers. This design has not 64^64 gates, but instead 64^6 gates. 64 gates for each multiplexer.
AI: Exactly, you’ve got the concept! The design you described efficiently uses a series of multiplexers to perform shifts by fixed amounts that are powers of two (1, 2, 4, …, 32 for a 64-bit number). By selecting which multiplexers to activate, you can achieve any desired shift amount.
I had probably ~200,000 words worth of conversation with LLMs, mainly in this format.
I am not sure what next leap you are talking about. But I intuit based on some observations that GPT-4o is much better for this than GPT-3 (you might talk about more recent “leaps”). (Didn’t test o1 extensively because it’s so expensive).
Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you’d get it wrong?
(and if yes, was it “a few times” or “statistically significant” kinda test, please?)
Why don’t you run the test yourself seems very easy?
Yes it does catch me when I am saying wrong things quite often. It also quite often says things that are not correct and I correct it, and if I am right it usually agrees immediately.
Interesting—the first part of the response seems to suggest that it looked like I was trying to understand more about LLMs… Sorry for confusion, I wanted to clarify an aspect of your worflow that was puzzling to me. I think I got all info for what I was asking about, thanks!
FWIW, if the question was an expression of actual interest and not a snarky suggestion, my experience with chatbots has been positive for brainstorming, dictionary “search”, rubber ducking, description of common sense (or even niche) topics, but disappointing for anything that requires application of commons sense. For programmming, one- or few-liner autocomplete is fine for me—then it’s me doing the judgement, half of the suggestions are completely useless, half are fine, and the third half look fine at first before I realise I needed the second most obvious thing this time.. but it can save time for the repeating part of almost-repeating stuff. For multi file editing,, I find it worse than useless when it feels like doing code review after a psychopath pretending to do programming (AFAICT all models can explain everything most stuff correctly and then write the wrong code anyway .. I don’t find it useful when it tries to appologize later if I point it out or to pre-doubt itself in CoT in 7 paragraphs and then do it wrong anyway) - I like to imagine as if it was trained on all code from GH PRs—both before and after the bug fix… or as if it was bored, so it’s trying to insert drama into a novel about my stupid programming task, when the second chapter will be about heroic AGI firefighting the shit written by previous dumb LLMs...
I don’t use it to write code, or really anything. Rather I find it useful to converse with it. My experience is also that half is wrong and that it makes many dumb mistakes. But doing the conversation is still extremely valuable, because GPT often makes me aware of existing ideas that I don’t know. Also like you say it can get many things right, and then later get them wrong. That getting right part is what’s useful to me. The part where I tell it to write all my code is just not a thing I do. Usually I just have it write snippets, and it seems pretty good at that.
Overall I am like “Look there are so many useful things that GPT tells me and helps me think about simply by having a conversation”. Then somebody else says “But look it get’s so many things wrong. Even quite basic things.” And I am like “Yes, but the useful things are still useful that overall it’s totally worth it.”
Maybe you include this in “stack overflow substitute”, but the main thing I use LLMs for is to understand well known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create seperate conversation branches with questions I have (to save money and have the context be closer. Didn’t evaluate if this actually makes the LLM better.). 4) Once I think I understood the concept or part of the concept I explain it to GPT. (Really I do this all the time during the entire process.) 5) The LLM (hopeful) corrects me if I am wrong (it seems it detects mistakes more often than not).
The last part of the conversation can then looks like this:
I had probably ~200,000 words worth of conversation with LLMs, mainly in this format.
I am not sure what next leap you are talking about. But I intuit based on some observations that GPT-4o is much better for this than GPT-3 (you might talk about more recent “leaps”). (Didn’t test o1 extensively because it’s so expensive).
Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you’d get it wrong?
(and if yes, was it “a few times” or “statistically significant” kinda test, please?)
Why don’t you run the test yourself seems very easy?
Yes it does catch me when I am saying wrong things quite often. It also quite often says things that are not correct and I correct it, and if I am right it usually agrees immediately.
Interesting—the first part of the response seems to suggest that it looked like I was trying to understand more about LLMs… Sorry for confusion, I wanted to clarify an aspect of your worflow that was puzzling to me. I think I got all info for what I was asking about, thanks!
FWIW, if the question was an expression of actual interest and not a snarky suggestion, my experience with chatbots has been positive for brainstorming, dictionary “search”, rubber ducking, description of common sense (or even niche) topics, but disappointing for anything that requires application of commons sense. For programmming, one- or few-liner autocomplete is fine for me—then it’s me doing the judgement, half of the suggestions are completely useless, half are fine, and the third half look fine at first before I realise I needed the second most obvious thing this time.. but it can save time for the repeating part of almost-repeating stuff. For multi file editing,, I find it worse than useless when it feels like doing code review after a psychopath pretending to do programming (AFAICT all models can explain
everythingmost stuff correctly and then write the wrong code anyway .. I don’t find it useful when it tries to appologize later if I point it out or to pre-doubt itself in CoT in 7 paragraphs and then do it wrong anyway) - I like to imagine as if it was trained on all code from GH PRs—both before and after the bug fix… or as if it was bored, so it’s trying to insert drama into a novel about my stupid programming task, when the second chapter will be about heroic AGI firefighting the shit written by previous dumb LLMs...I don’t use it to write code, or really anything. Rather I find it useful to converse with it. My experience is also that half is wrong and that it makes many dumb mistakes. But doing the conversation is still extremely valuable, because GPT often makes me aware of existing ideas that I don’t know. Also like you say it can get many things right, and then later get them wrong. That getting right part is what’s useful to me. The part where I tell it to write all my code is just not a thing I do. Usually I just have it write snippets, and it seems pretty good at that.
Overall I am like “Look there are so many useful things that GPT tells me and helps me think about simply by having a conversation”. Then somebody else says “But look it get’s so many things wrong. Even quite basic things.” And I am like “Yes, but the useful things are still useful that overall it’s totally worth it.”
Maybe for your use case try codex.