@regehr @resistor @barrelshifter If you want to just shift right or just shift left, that's straightforward, but if you need both (plus variants like arithmetic shifts), you end up making two or more full shifters, which is not great.
The main standard solutions are:
1. build a rotator, reduce everything to rotate+masking
2. set up the inputs for a unidirectional funnel shifter
3. data reversal shifter: optional bit reverse, unidirectional shifter, optional bit reverse
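To make the funnel-shift option concrete, here is a minimal C sketch of how the usual 32-bit shift variants could all be routed through one right-only funnel shifter. The function names (`funnel_shr`, `lsr32`, and so on) are made up for illustration, and this models the idea in software rather than describing any particular hardware design.

```c
#include <stdint.h>

/* Right-only funnel shift: form the 64-bit value hi:lo, shift it right
   by n (taken mod 32), and keep the low 32 bits. */
static inline uint32_t funnel_shr(uint32_t hi, uint32_t lo, unsigned n)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (n & 31));
}

/* Logical shift right: zeros come in from the top. */
static inline uint32_t lsr32(uint32_t x, unsigned n)
{
    return funnel_shr(0, x, n);
}

/* Arithmetic shift right: copies of the sign bit come in from the top. */
static inline uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t sign = (x >> 31) ? 0xFFFFFFFFu : 0u;
    return funnel_shr(sign, x, n);
}

/* Rotate right: the value itself comes in from the top. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return funnel_shr(x, x, n);
}

/* Shift left by n: place x one bit below the top half (hi = x >> 1,
   lo = x << 31), then shift right by 31 - n. This keeps the shift
   amount in 0..31 even when n == 0. */
static inline uint32_t shl32(uint32_t x, unsigned n)
{
    return funnel_shr(x >> 1, x << 31, 31 - (n & 31));
}

/* Rotate left by n is rotate right by (32 - n) mod 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return funnel_shr(x, x, (32 - (n & 31)) & 31);
}
```

The only thing that changes between variants is what gets fed into the two inputs and how the shift amount is adjusted; the shifter itself only ever moves bits one way.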
@regehr @resistor @barrelshifter actual funnel shift as an exposed operation is much rarer, especially the variable-shift-amount variant, because it absolutely needs a 3-operand architecture on the datapath. GPUs usually have that pervasively, and it's fairly common to see native funnel shift operations exposed on GPUs. (e.g. AMD has v_alignbit_b32, NV has SHF https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#maxwell-pascal)
@barrelshifter this is now years ago, but I vividly recall how much resistance there was to adding rotates to LLVM IR too because "can't you just build them out of simpler shifts?"
(in short: yes, but not reliably for variable shifts; the patterns to match for rotates with variable shift amount are quite fragile and easily - and frequently - broken by unrelated opts in the middle end)
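For a sense of what "building rotates out of simpler shifts" looks like at the source level, here is a small C sketch of two common ways to write a variable rotate-left. The function names are hypothetical. Whether a given compiler turns either form into a single rotate instruction depends on the compiler, its version, and what earlier passes did to the expression, which is the fragility described above.

```c
#include <stdint.h>

/* Naive form. In C this is undefined behavior for n == 0, because it
   shifts a 32-bit value right by 32. Only valid for 1 <= n <= 31. */
static inline uint32_t rotl32_naive(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Masked form. Both shift amounts are reduced mod 32, so the result is
   well defined for every n, including 0 and values >= 32. */
static inline uint32_t rotl32_masked(uint32_t x, unsigned n)
{
    return (x << (n & 31)) | (x >> ((0u - n) & 31));
}
```

Both forms spread one logical operation across two shifts, a subtraction or negation, and an OR, so any optimization that rewrites just one of those pieces can leave a shape a rotate matcher no longer recognizes.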
@resistor @barrelshifter The ADD->OR thing is another frequent monkey wrench especially wrt matching things that could be addressing modes, yeah.
It's not a worthless transformation for larger-than-native-register integers (where not having cross-limb carries is a solid win) but for <=pointer size, I'd argue it hurts more than it helps.
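The ADD->OR rewrite rests on a simple identity: when two values have no set bits in common, no carries can occur, so `a + b == a | b`. A tiny C sketch, with hypothetical function names, of an index computation in both forms:

```c
#include <stdint.h>

/* Scaled index plus a small offset, written as an add. This is the
   shape that maps naturally onto a base + scale + offset address. */
static inline uint32_t index_add(uint32_t i, uint32_t k)
{
    return (i << 3) + (k & 7);
}

/* Same value written as an OR. The low 3 bits of (i << 3) are always
   zero and (k & 7) only has those bits, so there are no carries. */
static inline uint32_t index_or(uint32_t i, uint32_t k)
{
    return (i << 3) | (k & 7);
}
```

The two functions always return the same value, but once the expression is in OR form, anything that wants to match it as an address computation has to prove the no-common-bits property again to get back to the add.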
@resistor @barrelshifter Either way, I like the way the funnel shift solution worked out. (And will always be grateful for Sanjay doing the legwork!)
They're a good normal form: funnel shift<->rotate (where applicable) is canonical and trivial in both directions, they can be formed early (which avoids destroying the pattern), they're reasonably common in target ISAs in their own right, and they're still pretty straightforward to lower where they're not available.
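As a rough illustration of the "trivial in both directions" and "straightforward to lower" points, here is a C sketch of 32-bit funnel shifts built from plain shifts. The semantics are meant to follow my understanding of LLVM's `llvm.fshl` / `llvm.fshr` intrinsics (concatenate `a:b`, shift by the amount mod 32, keep the high or low half); the C function names are made up for this example.

```c
#include <stdint.h>

/* Funnel shift left: high 32 bits of (a:b) << (c mod 32).
   The (b >> 1) >> (31 - c) split keeps every shift amount in 0..31,
   so c == 0 works without a special case. */
static inline uint32_t fshl32(uint32_t a, uint32_t b, unsigned c)
{
    c &= 31;
    return (a << c) | ((b >> 1) >> (31 - c));
}

/* Funnel shift right: low 32 bits of (a:b) >> (c mod 32). */
static inline uint32_t fshr32(uint32_t a, uint32_t b, unsigned c)
{
    c &= 31;
    return ((a << 1) << (31 - c)) | (b >> c);
}

/* A rotate is a funnel shift with both inputs equal. */
static inline uint32_t rotl32_via_fshl(uint32_t x, unsigned c)
{
    return fshl32(x, x, c);
}

static inline uint32_t rotr32_via_fshr(uint32_t x, unsigned c)
{
    return fshr32(x, x, c);
}
```

Going from rotate to funnel shift is just duplicating the operand, and going back is just noticing that both operands are the same value.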
under these conditions, when overclocked a bit, once the machine has "warmed up", seems to have around a 1/10000 chance of actually storing the contents of CL instead of CH to memory.
(this was "fun" to debug.)
The workaround: when we detect Raptor Lake CPUs, we now do
  shr ecx, 8
  mov [rDest + <index>], cl
instead. This takes more FE and uop bandwidth, but this loop is mainly latency-limited, and this is off the critical path.
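For readers less used to x86 byte registers, here is a small C model of why the workaround stores the same byte. It assumes, based on the description of CL being stored instead of CH, that the original code stored CH directly; the helper names are hypothetical and the model says nothing about the hardware fault itself.

```c
#include <stddef.h>
#include <stdint.h>

/* CL is bits 0..7 of ECX, CH is bits 8..15. */
static inline uint8_t reg_cl(uint32_t ecx) { return (uint8_t)ecx; }
static inline uint8_t reg_ch(uint32_t ecx) { return (uint8_t)(ecx >> 8); }

/* Assumed original form: store CH directly. ECX is left unchanged. */
static inline uint32_t store_ch(uint8_t *dest, size_t index, uint32_t ecx)
{
    dest[index] = reg_ch(ecx);
    return ecx;
}

/* Workaround form: shr ecx, 8 and then store CL. Same byte reaches
   memory, but ECX is modified, so the old low byte is gone afterwards. */
static inline uint32_t store_shifted_cl(uint8_t *dest, size_t index, uint32_t ecx)
{
    ecx >>= 8;
    dest[index] = reg_cl(ecx);
    return ecx;
}
```

The extra shift is the price for never issuing a store from a high-byte register, which is consistent with the note above about spending more front-end and uop bandwidth off the critical path.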
A customer managed to get a fairly consistent repro for transient decode errors by overclocking an i7-14700KF by about 5% from stock settings ("performance" multiplier 56->59).
It took weeks of back and forth and forensic debugging to figure out what actually happens, but TL;DR: the observed decode errors are all consistent with a single instruction misbehaving.
When I was a small kid, floppy disks were actually floppy (5.25") and contained actual spinning disks, floppy disk drives contained the thing that drove the disks (i.e. the motor), and hard disk drives had hard disks (metal platters) and the thing that drove them all in one package. Fair enough.
Not long after, we got 3.5" floppies that had a hard plastic shell and were not actually floppy anymore. Still called them floppies.
So a 3.5" floppy drive is actually a drive for a disk in a 3.5" hard shell but whatever.
Then we got other magnetic removable storage formats that were also all hard-shelled but whatever.
Then audio CDs got adapted for data storage and we got CD-ROMs with CDs (and later DVDs) that were actually disc-shaped, unlike floppies where the actual storage medium was disc-shaped but the overall package wasn't.
and then we got solid-state storage and the entire terminology is just terminally bonkers now
we have "disks" that are neither diskettes nor disc-shaped (the actual chips are rectangular), "disk drives" that don't interact with dis[ck]s and don't drive anything (not in a mechanical sense anyway), and the "solid-state" part refers to no moving parts but of course the stuff with moving parts is also all in a solid state of matter, generally
in short, two of the three words in "solid-state drive" refer to the thing it's got in common with literally all the other competing storage technologies and the remaining word describes the one thing it doesn't actually do, definitionally
Santa celebrates 50 years "out of list-making business"
NORTH POLE. May 12, 2025
Santa is celebrating 50 years without the naughty/nice list. "We stopped doing it in 1975 since it felt out of touch with the times then; these days, with frequent data breaches, privacy regulations, COPPA, GDPR... frankly it feels like a liability nightmare, I don't think anyone would even seriously consider doing this now. I mean, a naughty list leak. Can you imagine?"