Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you’d get it wrong?
(and if yes, was it “a few times” or “statistically significant” kinda test, please?)
Why don’t you run the test yourself? It seems very easy.
Yes, it catches me quite often when I am saying wrong things. It also quite often says things that are not correct; I correct it, and if I am right it usually agrees immediately.
Interesting—the first part of the response seems to suggest it looked like I was trying to understand more about LLMs… Sorry for the confusion; I wanted to clarify an aspect of your workflow that was puzzling to me. I think I got all the info I was asking about, thanks!
FWIW, if the question was an expression of actual interest and not a snarky suggestion: my experience with chatbots has been positive for brainstorming, dictionary “search”, rubber ducking, and descriptions of common-sense (or even niche) topics, but disappointing for anything that requires the application of common sense. For programming, one- or few-liner autocomplete is fine for me—then it’s me doing the judgement. Half of the suggestions are completely useless, half are fine, and the third half look fine at first, before I realise I needed the second most obvious thing this time… but it can save time on the repeating part of almost-repeating stuff. For multi-file editing, I find it worse than useless: it feels like doing code review after a psychopath pretending to do programming (AFAICT all models can explain most stuff correctly and then write the wrong code anyway… I don’t find it useful when it apologizes later once I point it out, or pre-doubts itself in CoT for 7 paragraphs and then does it wrong anyway). I like to imagine it was trained on all code from GH PRs—both before and after the bug fix… or that it was bored, so it’s trying to insert drama into a novel about my stupid programming task, where the second chapter will be about heroic AGI firefighting the shit written by previous dumb LLMs...
I don’t use it to write code, or really much of anything. Rather, I find it useful to converse with. My experience is also that half of it is wrong and that it makes many dumb mistakes. But the conversation is still extremely valuable, because GPT often makes me aware of existing ideas that I didn’t know about. Also, like you say, it can get many things right and then later get them wrong. The getting-it-right part is what’s useful to me. Telling it to write all my code is just not a thing I do; usually I just have it write snippets, and it seems pretty good at that.
Overall I am like, “Look, there are so many useful things that GPT tells me and helps me think about, simply by having a conversation.” Then somebody else says, “But look, it gets so many things wrong. Even quite basic things.” And I am like, “Yes, but the useful things are still useful enough that overall it’s totally worth it.”
Maybe try Codex for your use case.