I wonder if any of that can be found in a Git, Subversion, or CVS archive somewhere. Like, how much of its output is just regurgitated from open source projects in its training data?