@hipsterelectron @ireneista @adrienne the PDF.js project is actually an interesting one to experiment with; among other things, it handles a lot of the document-level stuff and lets you hook in your own logic to manage the page-by-page conversion of a PDF to text: “here’s a pile of text and graphic objects with metadata about each one, feel free to iterate over them and give us back a string when you’re done!”, etc.
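
To make the "iterate them and give us back a string" part concrete, here's a rough sketch of the kind of logic you might hook in. It doesn't import PDF.js at all: `TextItemLike` is a hypothetical cut-down version of the per-item metadata (just the text run and its transform matrix), and `itemsToText` and its `tolerance` parameter are names made up for illustration. The line-breaking heuristic is just one assumption about how you might want to do it, not how PDF.js does anything internally.

```typescript
// Hypothetical shape: a small subset of the metadata reported per text item.
interface TextItemLike {
  str: string;          // the text run itself
  transform: number[];  // [a, b, c, d, x, y]; the last two give the position
}

// Hypothetical helper: turn one page's items into a string, starting a new
// line whenever the baseline (y) moves by more than `tolerance` units.
function itemsToText(items: TextItemLike[], tolerance = 2): string {
  const lines: string[] = [];
  let current = "";
  let lastY: number | null = null;

  for (const item of items) {
    const y = item.transform[5] ?? 0;
    if (lastY !== null && Math.abs(y - lastY) > tolerance) {
      // The baseline moved: close the current line and start another.
      lines.push(current);
      current = "";
    }
    current += item.str;
    lastY = y;
  }

  // Flush the final line, if any items were seen at all.
  if (lastY !== null) lines.push(current);
  return lines.join("\n");
}

// Made-up sample items standing in for one page of a document.
const sampleItems: TextItemLike[] = [
  { str: "Hello ", transform: [1, 0, 0, 1, 10, 700] },
  { str: "world", transform: [1, 0, 0, 1, 40, 700] },
  { str: "Next line", transform: [1, 0, 0, 1, 10, 680] },
];
const sampleText = itemsToText(sampleItems);
```

With the real library you'd get the items per page from `page.getTextContent()` (after `getDocument(...).promise` and `pdf.getPage(n)`) and feed `textContent.items` into something like this; everything about grouping, ordering, and spacing is then yours to decide.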