Conversation

Notices

Embed this notice
alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 03:54:09 JST alcinnz

Say you've compiled a regular expression to a "DFA" labelled graph. As described the other day. How'd you optimize it to match the same patterns in as little memory as possible?
These algorithms have decent performance, but you'll want to make full use of the results to pay back the time spent optimizing!
There's 3 main optimizations to perform:
1. Remove unreachable nodes
2. Remove nodes which won't reach an end-state.
3. Merge duplicate nodes.
1/4?

In conversation about 7 months ago from floss.social permalink
- Embed this notice
  Sergey Shandar (functionalscript@techhub.social)'s status on Sunday, 29-Dec-2024 04:09:52 JST Sergey Shandar
  in reply to
  
  @alcinnz deduplicate deeply equal nodes. If you use content-addressable language, don't need to perform this step.
  
  In conversation about 7 months ago permalink
- Embed this notice
  alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:23:56 JST alcinnz
  in reply to
  - Sergey Shandar
  Removing unreachable nodes is essentially a garbage collection (GC) pass: traverse the graph to get the set of all reachable nodes, the remainder are unreachable & should be removed!
  Perform such a GC on the inverse-graph (all edges pointing the opposite way) to collect the "dead" nodes which can't reach an end-state!
  The tricky bit is merging duplicate nodes... (unless as @functionalscript notes, you defer to a content-addressed runtime) There's a couple algorithms for this!
  2/4?
  
  In conversation about 7 months ago permalink
- Embed this notice
  alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:35:30 JST alcinnz
  in reply to
  
  Hoppcraft's Algorithm starts by assuming all nodes are equivalent. It operates by preceding to seek out counter-examples. Upon finding one, gathering all nodes which share the same counter-example, & thus splits the possibly-duplicate set.
  A separate set is tracked of the possibly-equivalent subsets left to be considered. Once it found all counter-examples in all the possibly-equivalent subsets... We're done!
  Takes O(n*log log n), very performant for the job!
  3/5?
  
  In conversation about 7 months ago permalink
- Embed this notice
  alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:46:16 JST alcinnz
  in reply to
  
  Then there's Moore's algorithm which flags obviously non-equivalent nodes, before propagating those flags backwards. Any pairs of nodes not flagged afterwords are merged as equivalent. Has poor performance in some degenerate cases, & seemingly isn't well documented.
  Or there's Brzozowski's Algorithm, though I wouldn't recommend it on a commodity PC. At least not yet.
  I did use it as a core component in my hardware-browser hypothetical, to which it was well suited!
  4/4 on DFA Minimization!
  
  In conversation about 7 months ago permalink
- Embed this notice
  alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:53:56 JST alcinnz
  in reply to
  
  Tangentially related to DFAs is the KMP String-Searching Algorithm. Which locates the first index (or all indices) of a "needle" text in a "haystack" text.
  The naive way is to iterate over each index & check if its an answer, which takes O(n*m) time! We can achieve the task in O(n+m) time via the KMP Algorithm!
  Iterate over the needle to find where its prefix reoccurs, storing results in a table. After the end of the prefix, consult the in-progress table for previous prefixes to check.
  1'/2'
  
  In conversation about 7 months ago permalink
- Embed this notice
  alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:59:41 JST alcinnz
  in reply to
  
  Now when we're iterating over the haystack, thus table allows us to continue searching from the index where the needle mismatched, as opposed to going back to the index after where we started the comparison!
  Thus giving us that O(n+m) as O(n*m) performance!
  In practice though... We tend to use the naive algorithm! In practice it takes surprising scale for the win to be worth it!
  2'/2' Fin for today!
  
  In conversation about 7 months ago permalink
- Embed this notice
  alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 12:40:08 JST alcinnz
  in reply to
  
  Incidentally Brzozowski's Algorithm is:
  1. Inverse the DFA graph, at which point it may now be non-determinstic.
  2. Convert the that inverse graph into a DFA.
  3. Uninvert it!
  I've got a hypothetical for evaluating NFAs (with function calls) I love toying with! Though realistically there'd be limits on the NFAs it can evaluate, so I'd have a handful of optimization passes (including Brzozowski's) to bring the NFA within those limits!
  5/4!
  
  In conversation about 7 months ago permalink

Public

Conversation

Notices

Feeds