Well, color me very confused. In a CPU bound workload two cores on the same socket have substantially different performance (32% slowdown). If I just migrate the running process between the cores, performance changes immediately.
This is on 2x Xeon 5215 system.
I checked that it's not thermals, cpu frequency/boost and the system is idle.
Here's the odd part: The biggest difference evident in perf counters is
a 2.5x difference in icache_64b.iftag_stall, with ~same icache_64b.iftag_miss.