@RedRobyn @josh @aral Overreliance on these tools is also something we test for in this type of eval, and yes, we see it a lot in clinical AI.