I think it's likely that if you turn the filter up high enough to actually get useful results, you're going to reject so many questions that people will get angry.
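To put a number on that, here's a rough sketch, assuming a hypothetical filter that scores each answer and rejects anything under a cutoff. The scores are made-up uniform noise, purely to illustrate the shape of the tradeoff:

```python
import random

random.seed(0)

# Hypothetical: 1000 answers, each scored by some correctness filter.
# Uniform noise stands in for whatever a real filter would assign.
scores = [random.random() for _ in range(1000)]

for cutoff in (0.5, 0.8, 0.95, 0.99):
    rejected = sum(s < cutoff for s in scores) / len(scores)
    print(f"cutoff {cutoff}: rejects {rejected:.0%} of answers")
```

The stricter the cutoff, the larger the fraction of questions that gets bounced back at users.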
That's an interesting idea; I'd have to actually see it work, since wrong answers can still be syntactically probable without the semantics the reader sees in the text.
Plausible output is plausible because it resembles the input text; to distinguish this case you have to be able to reason about what the words actually mean. Pattern matching alone is not good enough, because you can write two sentences that mean exactly opposite things yet are still strongly correlated with each other except for the sense of a conjunction or a negation. I have observed exactly this behavior.
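A toy version of what I mean, as a sketch (plain bag-of-words cosine similarity, chosen for illustration, not anything a real system necessarily uses):

```python
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Cosine similarity over raw bag-of-words token counts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

s1 = "the patch fixes the vulnerability in the parser"
s2 = "the patch never fixes the vulnerability in the parser"
# Opposite claims, yet nearly identical by surface statistics:
print(f"{cosine(s1, s2):.2f}")  # ~0.97
```

One negation flips the meaning completely and barely moves the statistics. Anything that only sees surface correlation can't tell these apart.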
@dalias @petergleick Because you wrote in https://hachyderm.io/@dalias/115041256970829150 that the creators could have made LLMs behave in a way that, as far as I can tell, is not actually possible for LLMs. So I'm assuming that's what you meant, and that's the basis of my responses. If that's not what you meant... if you meant they should have created some other kind of system that isn't a large language model... then I apologize for being confused, because that's not how it read to me.
@dalias @petergleick the language model is kind of like a statistical transformation of the training data. It can be tweaked by the prompt, but it still cannot do anything that requires any kind of understanding.
The creators of these bullshit machines do not have sufficient understanding or control to do that. The machines do not know whether the output they are producing is "well correlated" or not, because they don't reason; they can only produce continuations of the prompt that resemble the "training" data.
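Here's a deliberately tiny sketch of what "continuations that resemble the training data" means, with a bigram table standing in for a real model. The mechanics of an actual LLM are vastly more elaborate, but notice that no step below evaluates truth:

```python
import random
from collections import defaultdict

# Toy "training data": the model can only ever echo its statistics.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# For each word, record every word that followed it in training.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def continue_prompt(prompt: str, n: int = 5) -> str:
    words = prompt.split()
    for _ in range(n):
        # The only operation available: pick something that followed the
        # current word in training. Nothing here checks truth or meaning.
        words.append(random.choice(follows[words[-1]]))
    return " ".join(words)

print(continue_prompt("the dog"))  # e.g. "the dog sat on the mat ."
```

Scale the table up by many orders of magnitude and you get fluency, but you never get a step that checks whether the output is true.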
The ONLY thing this software does is hallucinate plausible responses. If it's accurate, it's only by chance.
That's a classic failure mode. The training data has no examples of a novel, meaningless statement followed by "no" or "I don't know", so "no" and "I don't know" are not responses in the database and don't get selected.
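Same toy bigram stand-in as above (the real selection machinery is fancier, but the point survives): a response that never follows anything in the training data has probability zero and can never be picked:

```python
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def prob_next(word: str, candidate: str) -> float:
    """P(candidate | word) under the bigram counts; 0.0 if never seen."""
    opts = follows[word]
    return opts.count(candidate) / len(opts) if opts else 0.0

# "no" never follows anything in training, so it can never be selected,
# no matter how little sense the prompt made:
print(prob_next("mat", "no"))  # 0.0
print(prob_next("mat", "."))   # 1.0: the fluent continuation always wins
```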
> Like, sure, if it's your hobby and you don't really care about its impact on the world, you do you?
I think the assumption should be that if they're not getting paid for it, they're amateurs by every possible definition, so yeh, you shouldn't be hassling them about whether they're a solid part of the supply chain.
I think this should be classified with all the rat studies that haven't corrected for what rats are really capable of.
This guy actually did a study on what it took to keep rats from using external cues to run mazes, and it involved freshly sterilized floors for each run to eliminate odor cues, building the maze on a sand table so vibrations didn't cue them in, and so on.
If you can't identify why you're getting different results on different versions of Debian, that means your understanding of your code or of the problem is incomplete, and you should resolve that before publishing.