AI, as we know it, is fundamentally about clustering, and sometimes about finding additional cluster members in randomness. An LLM does essentially this. It cannot model why something exists or what might cause loss to that thing's operator. All it can do is find patterns, or signals, and identify them as members of a cluster, some of which will be labelled bad.
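A minimal sketch of that view, assuming signals can be embedded as feature vectors: new observations are assigned to whichever previously learned cluster they sit nearest to, and clusters carry labels like "bad". The centroids, vectors, and labels below are hypothetical illustrations, not any real detection system.

```python
import numpy as np

# Hypothetical centroids learned from past examples; one cluster is
# labelled bad (a known malicious pattern), the other benign.
centroids = {
    "benign": np.array([0.1, 0.2, 0.1]),
    "bad":    np.array([0.9, 0.8, 0.7]),
}

def classify(signal: np.ndarray) -> str:
    """Assign a signal to the nearest cluster by Euclidean distance.

    Note what this does NOT do: it has no model of why the signal
    exists or what loss it might cause. It only measures similarity
    to previously seen patterns.
    """
    return min(centroids, key=lambda label: np.linalg.norm(signal - centroids[label]))

# A new observation that merely resembles the "bad" cluster gets
# flagged, with no reasoning about intent or consequence.
print(classify(np.array([0.85, 0.75, 0.8])))  # -> "bad"
```

The point of the toy is the limitation it makes visible: everything the classifier "knows" is distance to old examples, which is exactly the pattern-matching described above.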
This resembles how pentesting often works: a reproducible application of known techniques laid out in advance.