EU AI Office presents its proposal for a training data summary template for general-purpose AI models. For scraped data, nothing more detailed than the top 5-10% of domain names (not URLs). Given that platforms host many rights holders under the same domain, and that tiny rights holders are unlikely to show up among the top domains, not sure this helps. Slides via Contexte: https://www.contexte.com/actualite/medias/la-commission-sapprete-a-reveler-la-structure-du-modele-de-resume-pour-lia-et-le-droit-dauteur_215024.html