Removing unreachable nodes is essentially a garbage collection (GC) pass: traverse the graph to get the set of all reachable nodes, the remainder are unreachable & should be removed!
Perform such a GC on the inverse-graph (all edges pointing the opposite way) to collect the "dead" nodes which can't reach an end-state!
The tricky bit is merging duplicate nodes... (unless as @functionalscript notes, you defer to a content-addressed runtime) There's a couple algorithms for this!
2/4?