@Iris I'm psych adjacent. Regarding a project to replace an existing system, which has research projects attached.
Heavy sanitisation of input and output, multiple varied stakeholders giving feedback (and this feedback is valued and acted on), an open corpus of prompt data we control, clear scope limits, professional hired to 'break' things.
Variation/opacity matters less when the LLM isn't the centre of the process.
Context and specifics unfortunately limited by NDA (and the fact I'm blinded to the algorithm's details).
All the other projects I'm involved with have made a deliberate decision to run screaming from any products not hosted locally.