@rygorous@barrelshifter I think SimplifyDemandedBits was an area where GCC didn’t do much optimization at the time, so leaning heavily into it made LLVM look more compelling in microbenchmarks.
@resistor@barrelshifter Either way, I like the way the funnel shift solution worked out. (And will always be grateful for Sanjay doing the legwork!)
They're a good normal form, funnel shift<->rotate (where applicable) is canonical and trivial in both directions, they can be formed early (which avoids destroying the pattern), they're reasonably common in target ISAs on their own right, and are still pretty straightforward to lower where they're not available
@rygorous@barrelshifter I think this and the ADD -> OR thing are downstream of a bit of an obsession Chris had for SimplifyDemandedBits transformations in the early days.
@resistor@barrelshifter The ADD->OR thing is another frequent monkey wrench especially wrt matching things that could be addressing modes, yeah.
It's not a worthless transformation for larger-than-native-register integers (where not having cross-limb carries is a solid win) but for <=pointer size, I'd argue it hurts more than it helps.
@barrelshifter this is now years ago, but I vividly recall how much resistance there was to adding rotates to LLVM IR too because "can't you just build them out of simpler shifts?"
(in short: yes, but not reliably for variable shifts; the patterns to match for rotates with variable shift amount are quite fragile and easily - and frequently - broken by unrelated opts in the middle end)
@regehr@resistor@barrelshifter actual funnel shift as an exposed operation is much rarer, especially the variable-shift-amount variant, because it absolutely needs a 3-operand architecture on the datapath. GPUs usually pervasively have that, and it's fairly common to see native funnel shift operations exposed on GPUs. (e.g. AMD has v_alignbit_b32, NV has SHF https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#maxwell-pascal)
@regehr@resistor@barrelshifter If you want to just shift right or just shift left, that's straightforward, but if you need both (plus variants like arithmetic shifts), you end up making two or more full shifters, which is not great.
The main standard solutions are: 1. build a rotator, reduce everything to rotate+masking 2. set up into unidirectional funnel shift 3. data reversal shifter: optional bit reverse, uni-directional shifter, optional bit reverse.
@rygorous@resistor@barrelshifter is there more written up about this anywhere? I guess I haven't thought about funnel shift that deeply but I kind of regarded it as a fun, quirky rotate. sometimes our superoptimizers come up with clever uses for it, which I enjoy.