@resing @jimgar I'm not convinced it's possible to train a usable LLM without including copyrighted material in the raw pretraining data.
As such, I personally think it's a necessary evil, to avoid a monopoly on LLM technology belonging to organizations that are willing to train on crawled data.