I should mention I didn't have any success with prompt injection attacks - the guardrails were definitely better, though my red team skills are limited.
I did get it to critique the A' domain from a competing domain's perspective. Again, the critiques sounded very convincing, but were not semantically or technically accurate.
Where it clearly failed utterly was on the semantics, knowledge & reasoning side. All missing.
Which is *exactly* the point I was making: it doesn't do what you think it does.