GNU social JP
  • FAQ
  • Login
GNU social JPは日本のGNU socialサーバーです。
Usage/ToS/admin/test/Pleroma FE
  • Public

    • Public
    • Network
    • Groups
    • Featured
    • Popular
    • People

Conversation

Notices

  1. Embed this notice
    alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 03:54:09 JST alcinnz alcinnz

    Say you've compiled a regular expression to a "DFA" labelled graph. As described the other day. How'd you optimize it to match the same patterns in as little memory as possible?

    These algorithms have decent performance, but you'll want to make full use of the results to pay back the time spent optimizing!

    There's 3 main optimizations to perform:

    1. Remove unreachable nodes
    2. Remove nodes which won't reach an end-state.
    3. Merge duplicate nodes.

    1/4?

    In conversation about 6 months ago from floss.social permalink
    • Embed this notice
      Sergey Shandar (functionalscript@techhub.social)'s status on Sunday, 29-Dec-2024 04:09:52 JST Sergey Shandar Sergey Shandar
      in reply to

      @alcinnz deduplicate deeply equal nodes. If you use content-addressable language, don't need to perform this step.

      In conversation about 6 months ago permalink
    • Embed this notice
      alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:23:56 JST alcinnz alcinnz
      in reply to
      • Sergey Shandar

      Removing unreachable nodes is essentially a garbage collection (GC) pass: traverse the graph to get the set of all reachable nodes, the remainder are unreachable & should be removed!

      Perform such a GC on the inverse-graph (all edges pointing the opposite way) to collect the "dead" nodes which can't reach an end-state!

      The tricky bit is merging duplicate nodes... (unless as @functionalscript notes, you defer to a content-addressed runtime) There's a couple algorithms for this!

      2/4?

      In conversation about 6 months ago permalink
    • Embed this notice
      alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:35:30 JST alcinnz alcinnz
      in reply to

      Hoppcraft's Algorithm starts by assuming all nodes are equivalent. It operates by preceding to seek out counter-examples. Upon finding one, gathering all nodes which share the same counter-example, & thus splits the possibly-duplicate set.

      A separate set is tracked of the possibly-equivalent subsets left to be considered. Once it found all counter-examples in all the possibly-equivalent subsets... We're done!

      Takes O(n*log log n), very performant for the job!

      3/5?

      In conversation about 6 months ago permalink
    • Embed this notice
      alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:46:16 JST alcinnz alcinnz
      in reply to

      Then there's Moore's algorithm which flags obviously non-equivalent nodes, before propagating those flags backwards. Any pairs of nodes not flagged afterwords are merged as equivalent. Has poor performance in some degenerate cases, & seemingly isn't well documented.

      Or there's Brzozowski's Algorithm, though I wouldn't recommend it on a commodity PC. At least not yet.
      I did use it as a core component in my hardware-browser hypothetical, to which it was well suited!

      4/4 on DFA Minimization!

      In conversation about 6 months ago permalink
    • Embed this notice
      alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:53:56 JST alcinnz alcinnz
      in reply to

      Tangentially related to DFAs is the KMP String-Searching Algorithm. Which locates the first index (or all indices) of a "needle" text in a "haystack" text.

      The naive way is to iterate over each index & check if its an answer, which takes O(n*m) time! We can achieve the task in O(n+m) time via the KMP Algorithm!

      Iterate over the needle to find where its prefix reoccurs, storing results in a table. After the end of the prefix, consult the in-progress table for previous prefixes to check.

      1'/2'

      In conversation about 6 months ago permalink
    • Embed this notice
      alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 06:59:41 JST alcinnz alcinnz
      in reply to

      Now when we're iterating over the haystack, thus table allows us to continue searching from the index where the needle mismatched, as opposed to going back to the index after where we started the comparison!
      Thus giving us that O(n+m) as O(n*m) performance!

      In practice though... We tend to use the naive algorithm! In practice it takes surprising scale for the win to be worth it!

      2'/2' Fin for today!

      In conversation about 6 months ago permalink
    • Embed this notice
      alcinnz (alcinnz@floss.social)'s status on Sunday, 29-Dec-2024 12:40:08 JST alcinnz alcinnz
      in reply to

      Incidentally Brzozowski's Algorithm is:

      1. Inverse the DFA graph, at which point it may now be non-determinstic.
      2. Convert the that inverse graph into a DFA.
      3. Uninvert it!

      I've got a hypothetical for evaluating NFAs (with function calls) I love toying with! Though realistically there'd be limits on the NFAs it can evaluate, so I'd have a handful of optimization passes (including Brzozowski's) to bring the NFA within those limits!

      5/4!

      In conversation about 6 months ago permalink

Feeds

  • Activity Streams
  • RSS 2.0
  • Atom
  • Help
  • About
  • FAQ
  • TOS
  • Privacy
  • Source
  • Version
  • Contact

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.