Med bay? Fuck that. No more chickening out.
If I build a Sci-Fi ship there's just gonna be a Min Bay and a Max Bay
Yamaha: "Can I interest you in any of our products? We make grand pianos, reed organs, guitars, synthesizers-"
Businessman: "Hm, not really what we're looking for, but we're interested in building a gaming device. If you're making synths, you might know where to get the sound chips."
Y: "Oh we make those ourselves as well."
B: "Oh my, this is a fantastic deal! I'm going to buy a boat with my bonus!"
Y: "You're not going to believe this..."
@regehr It's not that x86 couldn't do that, but you'd need to dive even deeper into history, and P5 level is honestly about the lowest anyone still wants to go.
You could do 286-or-less but that's 16-bit x86 and tooling for that is essentially extinct at this point. You're stuck with old compilers etc.
@regehr @steve This is one of the bigger reasons for why ISA doesn't matter more.
Broadly, your uArch is only as good as its data movement, because that shit is what's really expensive, not the logic gates.
It's things like:
- how good is your entire memory subsystem
- how good is your bypass network
- how good are your register files
etc.
It's not like you can't make mistakes in the ISA that will really kill your design, you can. That's what happened to VAX.
@regehr @steve Also, re: ISA efficiency, I like re-posting this, by now, rather old image that shows you what the score really is.
This was on the Xeon Phis but the general trend holds to this day. (Source: https://people.eecs.berkeley.edu/~ysshao/assets/papers/shao2013-islped.pdf p. 3) NB this is an in-order core with 512b vector units.
@regehr Every serious study (both from independent researchers and from vendors themselves) that I've ever seen (and I'm up to 5 or so at this point), broadly, supports this, with some caveats.
It's not "no difference", but for server/application cores, what differences there are are typically somewhere in the single-digit %. You can always find pathological examples, but typically it's not that much.
There is a real cost to x86's many warts but it's mostly in design/validation cost and toolchains.
@regehr Some more details:
- the D/V and toolchain costs are amortized. Broadly speaking, the bigger your ecosystem/market share, the bigger your ability to absorb that cost.
- This holds for what ARM would call "application" cores; oversimplifying a bit, it's essentially a constant overhead on the design that adds some extra area and pipe stages. It's more onerous for smaller cores, but you need to be really small.
@regehr Eventually, there's nowhere left to hide. For applications where you'd use say an ARM Cortex-M0 or a bare-bones minimal RV32I CPU, I'm not aware of anything x86 past or present that would really make sense.
Intel did "Quark" a while back which I believe was either a 486 or P5 derivative, so still something like a 5-stage pipelined integer core. If you want to go even lower than that, I don't think anyone has (or wants to do) anything.
@steve @regehr Anyway, take that with whatever amount of salt you want, but Intel and AMD both are strongly incentivized to seriously look at this.
They for sure would prefer to sell you x86s because they have decades of experience with that, but they're looking at what it costs them to do it both in capex and in how much it hurts the resulting designs.
And for the latter, the consistent answer has been "a bit, but not much".
@regehr @steve Anecdotally, there's at least 3 (Intel, AMD, Centaur) companies that do this on the regular, and one of them (Centaur) is quite small as such things go.
I wouldn't want to do it either, but the other thing you gotta keep in mind is that the CPU core, while important, is only part of a SoC and ISA has very little impact on the "everything else".
@regehr @steve For example, it's a goddamn NIGHTMARE doing a high-performance memory subsystem for absolutely anything.
This whole "shared memory" fiction we're committed to maintaining is a significant drag on all HW, but HW impls of it are just in another league perf-wise than "just" building message-passing and trying to work around it in SW (lots have tried, but there's little code for it and it's a PITA), so we're kind of stuck with it.
@regehr @steve Basically almost everything that _all_ major ISAs pretend is true about memory at the ISA level is an expensive lie, but one that ~ALL the SW depends on. :)
@regehr @steve To wit: virtual memory is a lie, by design. Uniform memory is a lie. Shared instruction/data memory is a lie. Coherent caches are a lie, caches would rather be _anything_ else. Buses are a lie. Memory-mapped IO is IO lying about being memory. Oh and the data bits and wires are small and shitty enough now that they started lying too and everything is slowly creeping towards ECCing all the things
@regehr I agree that this is the way I would like most code to be written but codegen when written this way has been very spotty for me at best.
@regehr Yes, I can see the argument for BE byte order but IBM 0-is-MSB bit order is, just... no.
As for "but BE is so clean!", I'm gonna leave this here: (From the Cell SPE docs)
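To make the bit-order complaint concrete, here's a rough sketch of what 0-is-MSB numbering does to you. The helper names (`msb0_to_lsb0`, `bit_is_set_msb0`) are made up for illustration and are not from any IBM or Cell documentation; the only thing assumed is the convention itself, where bit 0 is the most significant bit.

```python
def msb0_to_lsb0(bit_index: int, width: int = 32) -> int:
    """Convert an IBM-style bit number (0 = MSB) to the usual one (0 = LSB).

    Hypothetical helper for illustration only.
    """
    if not 0 <= bit_index < width:
        raise ValueError("bit index out of range for this width")
    return width - 1 - bit_index


def bit_is_set_msb0(value: int, bit_index: int, width: int = 32) -> bool:
    """Test a bit of `value` using 0-is-MSB numbering (hypothetical helper)."""
    return (value >> msb0_to_lsb0(bit_index, width)) & 1 == 1


# With 0-is-LSB numbering, bit n always has weight 2**n.
# With 0-is-MSB numbering, the weight of "bit 0" depends on the register width.
weight_of_bit0_in_32 = 1 << msb0_to_lsb0(0, 32)
weight_of_bit0_in_64 = 1 << msb0_to_lsb0(0, 64)
```

The annoying part is the last two lines: the same bit number means a different weight as soon as the register width changes, so every bit number in the docs only makes sense together with a width.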
@chrisvest @whitequark it's not an array of SPARC processors, it's a bunch of actual NPU and DSP tiles each with a SPARC-derived microcontroller as its frontend - same way that say GPUs usually have some random microcontroller concoction as part of their command processing frontend, but that's not what's doing the math
@chrisvest @whitequark Something needs to parse your command buffers, handle scheduling, initiate interrupts and deal with other plumbing etc., and it's not going to be your matrix multiply unit.
For the AMD GPU equivalent, https://gpuopen.com/download/micro_engine_scheduler.pdf "The GPU frontend has three micro-processors meant to execute scheduling, compute and gfx firmware".
It's the things that run the infamous GPU Mystery Firmware Blobs™.
@chrisvest @whitequark honestly something derived from an actual normal ISA you've heard of sounds positively sane, these random microcontrollers have a long and proud history of running the most obscure stuff imaginable because usually someone picked whatever was cheap out of a bin 25 years ago.
E.g. Intel's Management Engine firmware used (long ago) to run on ARC cores https://en.wikipedia.org/wiki/ARC_(processor) - literally derived from the SNES SuperFX chip. Not making this up.
Remember that for everyone ready to rock, there are thousands standing by to sediment
@erincandescent wait less for the UART
Abstraction maker, abstraction breaker. FUN FACT: things I prefix with FUN FACT are sometimes fun and sometimes factual, but very rarely both.