@hyde I would probably run pdftops or pdftohtml over it, see whether and how they mark up the highlighted parts, and then extract those with the usual plaintext filters.
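Here is a rough sketch of the pdftohtml route. It assumes poppler's pdftohtml is on your PATH, and that highlighted text, if it survives the conversion at all, ends up under its own fontspec (for example a different color). The idea is to dump the PDF as XML and group the text runs by style, so you can see which group, if any, holds the highlighted parts. The function names are made up for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import defaultdict


def pdf_to_xml(pdf_path):
    """Convert a PDF to pdftohtml's XML format and return it as a string.

    Hypothetical helper; assumes poppler's pdftohtml is installed.
    -xml: XML instead of HTML, -i: skip images, -stdout: print instead of writing files.
    """
    result = subprocess.run(
        ["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def text_by_style(xml_text):
    """Group text runs by their (size, family, color) font style.

    Assumption: highlighted text, if pdftohtml keeps it at all, gets its
    own fontspec. Annotation-style highlights may be dropped entirely.
    """
    root = ET.fromstring(xml_text)
    # fontspec elements map a font id to its size, family and color.
    styles = {
        spec.get("id"): (spec.get("size"), spec.get("family"), spec.get("color"))
        for spec in root.iter("fontspec")
    }
    runs = defaultdict(list)
    for text in root.iter("text"):
        # itertext() also collects text nested in <b>, <i> and <a> children.
        content = "".join(text.itertext()).strip()
        if content:
            runs[styles.get(text.get("font"))].append(content)
    return dict(runs)
```

Something like `for style, texts in text_by_style(pdf_to_xml("paper.pdf")).items(): print(style, texts[:3])` should show whether one style lines up with the highlights. If none does, the highlights are probably annotations that pdftohtml does not render, and this route will not help. If one does, print that group and run grep/sed over it as usual.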