Anyway, please fuzz your codebase. If you're deploying something built around content created by users (or by the tools they use) and frequently exchanged between them, make sure EVERY part of it is fuzzed, that you measure code coverage, and that you can verify every if statement and conditional is actually being exercised.
Then, when you're done with that, take your entire dataset, build something that takes a random chunk from each file and splices them all together into one file, and throw that input at your test suite.
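Here's a rough sketch of that splicing idea in Python. It assumes your dataset is a directory of sample files and that the thing under test can be wrapped in a function taking raw bytes. The names `load_corpus`, `splice_corpus`, `run_spliced_inputs`, `parse` and `max_chunk` are all made up for illustration, not from any particular library. Treat it as a starting point, not a finished fuzzer.

```python
import random
from pathlib import Path
from typing import Callable, List, Tuple


def load_corpus(directory: str) -> List[bytes]:
    """Read every regular file under `directory` (recursively) as raw bytes."""
    return [p.read_bytes() for p in sorted(Path(directory).rglob("*")) if p.is_file()]


def splice_corpus(samples: List[bytes], rng: random.Random, max_chunk: int = 64) -> bytes:
    """Take one random chunk from every non-empty sample and join them in shuffled order."""
    chunks = []
    for data in samples:
        if not data:
            continue  # nothing to take from an empty file
        start = rng.randrange(len(data))           # random offset inside this sample
        length = rng.randint(1, max_chunk)         # random chunk length, 1..max_chunk bytes
        chunks.append(data[start:start + length])  # slicing past the end is safe in Python
    rng.shuffle(chunks)                            # so chunks don't appear in corpus order
    return b"".join(chunks)


def run_spliced_inputs(
    samples: List[bytes],
    parse: Callable[[bytes], object],  # hypothetical wrapper around the code under test
    iterations: int = 1000,
    seed: int = 0,
) -> List[Tuple[bytes, Exception]]:
    """Feed spliced inputs to `parse` and collect the ones that blow up."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    failures = []
    for _ in range(iterations):
        blob = splice_corpus(samples, rng)
        try:
            parse(blob)
        except Exception as exc:  # narrow this to exceptions your code should never raise
            failures.append((blob, exc))
    return failures
```

Usage would look something like `run_spliced_inputs(load_corpus("samples/"), my_parser)`. The fixed seed is deliberate: when something crashes you want to regenerate the exact same blob, and you should save the failing inputs so they become regression tests.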