[top comment on HN thread] So just take pics of the pages and convert the pics back to a PDF.

[first sub-comment] A motivated publisher could embed codes by subtly altering the spacing or the color differences between adjacent characters, so that they would survive most color or greyscale conversions. A seemingly innocuous frame drawn around a photo could be made larger or smaller by, say, one millimeter, effectively encoding one bit, so with enough pages they could identify one copy of a book among billions. Unfortunately there's no way to be 100% sure that a complex document doesn't contain some form of embedded code.

[second sub-comment] Easier to just strip out the metadata.
https://social-coop-media.ams3.cdn.digitaloceanspaces.com/media_attachments/files/107/688/129/541/792/457/original/8eb65aca095b33e0.jpg
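
To make the frame-size idea from the first sub-comment concrete, here is a minimal sketch of how one bit per page could add up to a per-copy identifier. Everything in it is an assumption for illustration: the 50 mm nominal width, the 1 mm offset, and the function names are hypothetical, and nothing here describes what any real publisher does.

```python
# Hypothetical values: a nominal frame width and the offset that marks a 1 bit.
BASE_MM = 50.0
DELTA_MM = 1.0


def pages_needed(num_copies: int) -> int:
    """Smallest number of one-bit pages that can tell num_copies apart."""
    return max(1, (num_copies - 1).bit_length())


def encode_copy_id(copy_id: int, num_pages: int) -> list[float]:
    """Return one frame width per page; page i carries bit i of copy_id."""
    if not 0 <= copy_id < 2 ** num_pages:
        raise ValueError("copy_id does not fit in num_pages bits")
    return [BASE_MM + DELTA_MM * ((copy_id >> page) & 1)
            for page in range(num_pages)]


def decode_copy_id(widths: list[float]) -> int:
    """Recover the copy ID by thresholding each measured frame width."""
    threshold = BASE_MM + DELTA_MM / 2
    return sum(1 << page for page, width in enumerate(widths)
               if width > threshold)


if __name__ == "__main__":
    pages = pages_needed(8_000_000_000)
    widths = encode_copy_id(123_456_789, pages)
    print(pages, decode_copy_id(widths))
```

Under these assumptions, 33 marked pages are enough to distinguish 8 billion copies, because 2^33 is larger than 8 billion. Photographing the pages keeps the frame proportions, so a mark like this would survive the approach in the top comment, and stripping metadata would not touch it at all.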