While I am 100% convinced that Dall-E and Stable Diffusion are committing copyright infringement, I’m not so sure about ChatGPT.
I’ve been exploring several different products and services to help with my writing, which has exposed me to many more of the intricacies of natural language models. From everything I’ve seen, ChatGPT doesn’t work anywhere near the same way as Stable Diffusion or Dall-E.
Language models are basically prediction engines. They break words down into “tokens,” which in English average about 4 characters (but not always), and they predict the pattern of tokens that would follow.
Since every word is made up of one or more tokens, and every sentence is in turn a sequence of those tokens, the only thing ChatGPT is really doing is predicting the expected next sequence of tokens (and so words and sentences) in response to a given query.
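To make the “tokens” idea concrete, here is a rough sketch in Python. The vocabulary and the `toy_tokenize` function are made up for illustration, and the greedy longest-match splitting is a simplification. It is not ChatGPT’s real vocabulary or tokenizer, just a way to see text turn into word pieces of different lengths.

```python
# Toy vocabulary of word pieces. This is invented for illustration and is
# NOT the vocabulary any real model uses.
TOY_VOCAB = {"The", " quick", " brown", " fox", " jump", "s",
             " over", " the", " lazy", " dog"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text greedily into the longest known pieces.

    Any character that matches no piece becomes a one-character token.
    """
    tokens = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


tokens = toy_tokenize("The quick brown fox jumps")
print(tokens)
# ['The', ' quick', ' brown', ' fox', ' jump', 's']
```

Notice that “jumps” comes out as two tokens (“ jump” and “s”), which is why a token is not the same thing as a word or a letter.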
And that’s where the training comes in: loading those probability matrices with lots and lots of text.
So if someone were to ask ChatGPT to finish the sentence “The quick brown fox…”, it will respond with “…jumps over the lazy dog,” because it’s seen lots and lots of references to that same phrase.
But as far as ChatGPT is concerned, it could just as easily be “Lorem Ipsum Dolor Sit Amet” too. It doesn’t really care.
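Here is a minimal sketch of that idea in Python. It assumes a tiny made-up training corpus and uses a plain lookup table of word counts, with whole words standing in for tokens. The names `train`, `complete` and `CONTEXT` are my own. Real models like ChatGPT use neural networks with learned weights rather than a count table, so treat this as a stand-in for the principle, not a description of how ChatGPT is built.

```python
from collections import Counter, defaultdict

CONTEXT = 2  # how many previous words the toy model looks at (an assumption)


def train(corpus):
    """Count which word follows each run of CONTEXT words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - CONTEXT):
            context = tuple(words[i:i + CONTEXT])
            counts[context][words[i + CONTEXT]] += 1
    return counts


def complete(counts, prompt, max_words=10):
    """Keep appending the most frequently seen next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = counts.get(tuple(words[-CONTEXT:]))
        if not followers:
            break  # the model has never seen this context
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Made-up training text: each phrase is just a pattern that gets counted.
corpus = (["The quick brown fox jumps over the lazy dog"] * 3
          + ["Lorem ipsum dolor sit amet"] * 3)
model = train(corpus)

print(complete(model, "The quick brown fox"))
# the quick brown fox jumps over the lazy dog
print(complete(model, "Lorem ipsum"))
# lorem ipsum dolor sit amet
```

The table treats both phrases exactly the same way. It has no idea that one is English and the other is filler text, it only knows which words it has seen following which.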