One of these is not like the others. Here’s how I think about it: many processes can be thought of as generating a large number of leads and then screening them to find the good ones. In classic AI this is a generate-and-test algorithm. It’s vital that your testing works, or you will get bad answers.
Using AI for the “generate” phase is not nearly as bad as using it for the screening phase, provided that your tests are good. And we do know how to test our code, don’t we?
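To make the split concrete, here is a minimal generate-and-test sketch in Python. Everything in it is made up for illustration: `sloppy_generator` is a hypothetical stand-in for an unreliable source of leads (an AI, say), and `is_nontrivial_factor_of_91` is a toy screen that happens to be exact. The generator is allowed to be mostly wrong because the screen is the only thing that decides what gets through.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def generate_and_test(
    candidates: Iterable[T], passes: Callable[[T], bool]
) -> Iterator[T]:
    """Yield only the candidates that the screen accepts."""
    for candidate in candidates:
        if passes(candidate):
            yield candidate


def sloppy_generator() -> Iterator[int]:
    """Hypothetical untrusted generator: proposes many leads, most of them bad."""
    yield from range(1, 50)


def is_nontrivial_factor_of_91(n: int) -> bool:
    """Toy trusted screen: an exact check, so bad leads cannot slip through."""
    return 1 < n < 91 and 91 % n == 0


good = list(generate_and_test(sloppy_generator(), is_nontrivial_factor_of_91))
print(good)
```

If you swap the screen for something that accepts everything, such as `lambda _: True`, every bad lead comes straight out the other end. That is the failure mode to worry about when the screening phase is the unreliable part.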