@resing @jimgar I'm saying I'm not sure it's possible to build a useful LLM without including copyrighted data in the training set
The ethics of this entire field are incredibly murky - I wrote about that last year https://simonwillison.net/2022/Aug/29/stable-diffusion/#ai-vegan