Between the personal data migration project and the long-running Spidergram crawler work, I've been spending a lot of time fussing with HTML parsing and extraction methods, particularly trying to iron out user-friendly ways to represent complex extraction and transformation operations.
For Node, Cheerio is a great Swiss Army knife: a jQuery-like API for DOM foolery. It has a very promising 'extract' method discussed in its docs, but frustratingly it hasn't shipped yet. https://cheerio.js.org/docs/advanced/extract