@ireneista @hipsterelectron @adrienne yyyyyup. PDF really truly is a standard meant to reproduce a visual design; PDF to text, even without OCR, uses wild techniques like “get the XY coordinates of every word on the page and extrapolate ‘sentences’ using hope and heuristics”
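
For anyone curious what "hope and heuristics" looks like in practice, here is a minimal sketch, not any particular library's method. It assumes the words have already been pulled out with their coordinates. The `Word` type, the `words_to_lines` function, and the `y_tolerance` value are all made up for illustration, and `y` is assumed to grow downward from the top of the page (real PDF coordinates usually start at the bottom-left).

```python
from typing import NamedTuple


class Word(NamedTuple):
    # Hypothetical record for one extracted word; not from any real PDF library.
    x: float   # left edge of the word
    y: float   # baseline position, assumed to grow downward from the page top
    text: str


def words_to_lines(words, y_tolerance=2.0):
    """Guess text lines: words with nearly equal y share a line, ordered by x."""
    lines = []  # each entry is [reference_y, [words on that line]]
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # The heuristic: "close enough" baselines count as the same line.
        if lines and abs(word.y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([word.y, [word]])
    # Within each guessed line, read left to right.
    return [
        " ".join(w.text for w in sorted(line_words, key=lambda w: w.x))
        for _, line_words in lines
    ]


sample = [
    Word(120, 100.4, "world"),
    Word(72, 100.0, "Hello"),
    Word(72, 114.0, "second"),
    Word(110, 113.6, "line"),
]
print(words_to_lines(sample))  # prints ['Hello world', 'second line']
```

Even this toy version shows where it goes wrong: two columns side by side share the same `y`, so they get merged into one "line", and a slightly raised footnote marker can land on the wrong one.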