@SwiftOnSecurity have you ever looked at codesearch (csearch / cindex)? same kind of idea, except fancier.
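during indexing (cindex) it builds a database of trigrams, every three-character sequence in each file. at search time csearch turns your regex into a trigram query against that index, so the real regex only runs over the few files that could possibly match. that's how you can grep a huge codebase almost instantly.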
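rough idea of the trick, as a toy Python sketch (not the actual cindex/csearch code, which is Go and works differently in detail). `trigrams`, `build_index` and `search` are hypothetical names. big simplifying assumption: the real csearch derives the trigram query from the regex automatically, while here you pass a literal substring that every match must contain, and it uses that literal's trigrams to narrow the candidate files before running the full regex.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Inverted index: map each trigram to the set of file names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for tri in trigrams(text):
            index[tri].add(name)
    return index


def search(index, docs, literal, pattern):
    """Toy search. `literal` is a substring every match must contain (an
    assumption standing in for csearch's automatic regex-to-trigram step).
    Only files holding all of its trigrams are scanned with the full regex."""
    tris = trigrams(literal)
    if tris:
        # .get avoids inserting empty entries into the defaultdict
        candidates = set.intersection(*(index.get(t, set()) for t in tris))
    else:
        candidates = set(docs)  # literal too short to filter: scan everything
    regex = re.compile(pattern)
    return sorted(name for name in candidates if regex.search(docs[name]))


# usage with made-up file contents
docs = {
    "a.c": "strcpy(buf, input);",
    "b.c": "strncpy(buf, input, n);",
    "c.c": "memcpy(dst, src, len);",
}
idx = build_index(docs)
# all three files contain "cpy(", but only a.c and b.c match the regex
print(search(idx, docs, "cpy(", r"str\w*cpy\("))
```

the index lookup is cheap set intersection, and the expensive regex only touches the candidates. that's why the one-time indexing cost pays off when you search the same tree over and over.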
I use it all the time in source code assessments: a ~1hr indexing run saves me far more than that over a few weeks of work.