Yamaha: "Can I interest you in any of our products? We make grand pianos, reed organs, guitars, synthesizers-" Businessman: "Hm, not really what we're looking for, but we're interested in building a gaming device. If you're making synths, you might know where to get the sound chips." Y: "Oh we make those ourselves as well." B: "Oh my, this is a fantastic deal! I'm going to buy a boat with my bonus!" Y: "You're not going to believe this..."
@regehr It's not that x86 couldn't do that, but you'd need to dive even deeper into history, and P5 level is honestly about the lowest anyone still wants to go.
You could do 286-or-less but that's 16-bit x86 and tooling for that is essentially extinct at this point. You're stuck with old compilers etc.
@regehr Every serious study (both from independent researchers and from vendors themselves) that I've ever seen (and I'm up to 5 or so at this point), broadly, supports this, with some caveats.
It's not "no difference", but for server/application cores, what differences there are are typically somewhere in the single-digit % range. You can always find pathological examples, but typically it's not that much.
There is a real cost to x86's many warts but it's mostly in design/validation cost and toolchains.
@regehr Some more details:
- The design/validation (D/V) and toolchain costs are amortized. Broadly speaking, the bigger your ecosystem/market share, the bigger your ability to absorb that cost.
- This holds for what ARM would call "application" cores; oversimplifying a bit, it's essentially a constant overhead on the design that adds some extra area and pipe stages. It's more onerous for smaller cores, but you need to be really small.
@regehr Eventually, there's nowhere left to hide. For applications where you'd use say an ARM Cortex-M0 or a bare-bones minimal RV32I CPU, I'm not aware of anything x86 past or present that would really make sense.
Intel did "Quark" a while back which I believe was either a 486 or P5 derivative, so still something like a 5-stage pipelined integer core. If you want to go even lower than that, I don't think anyone has (or wants to do) anything.
@steve @regehr Anyway, take that with whatever amount of salt you want, but Intel and AMD both are strongly incentivized to seriously look at this.
They for sure would prefer to sell you x86s because they have decades of experience with that, but they're looking at what it costs them to do it both in capex and in how much it hurts the resulting designs.
And for the latter, the consistent answer has been "a bit, but not much".
@regehr @steve Anecdotally, there are at least 3 companies (Intel, AMD, Centaur) that do this on the regular, and one of them (Centaur) is quite small as such things go.
I wouldn't want to do it either, but the other thing you gotta keep in mind is that the CPU core, while important, is only part of a SoC and ISA has very little impact on the "everything else".
@regehr @steve For example, it's a goddamn NIGHTMARE doing a high-performance memory subsystem for absolutely anything.
This whole "shared memory" fiction we're committed to maintaining is a significant drag on all HW, but HW impls of it are just in another league perf-wise than "just" building message-passing and trying to work around it in SW (lots have tried, but there's little code for it and it's a PITA), so we're kind of stuck with it.
@regehr @steve Basically almost everything that _all_ major ISAs pretend is true about memory at the ISA level is an expensive lie, but one that ~ALL the SW depends on. :)
@regehr @steve To wit: virtual memory is a lie, by design. Uniform memory is a lie. Shared instruction/data memory is a lie. Coherent caches are a lie, caches would rather be _anything_ else. Buses are a lie. Memory-mapped IO is IO lying about being memory. Oh and the data bits and wires are small and shitty enough now that they started lying too and everything is slowly creeping towards ECCing all the things.
@chrisvest @whitequark it's not an array of SPARC processors, it's a bunch of actual NPU and DSP tiles each with a SPARC-derived microcontroller as its frontend - same way that say GPUs usually have some random microcontroller concoction as part of their command processing frontend, but that's not what's doing the math
@chrisvest @whitequark Something needs to parse your command buffers, handle scheduling, initiate interrupts and deal with other plumbing etc., and it's not going to be your matrix multiply unit.
@chrisvest @whitequark honestly something derived from an actual normal ISA you've heard of sounds positively sane, these random microcontrollers have a long and proud history of running the most obscure stuff imaginable because usually someone picked whatever was cheap out of a bin 25 years ago.
E.g. Intel's Management Engine firmware used to run (long ago) on ARC cores https://en.wikipedia.org/wiki/ARC_(processor) - literally derived from the SNES SuperFX chip. Not making this up.