@Codeki @PopulistRight @snappler @RustyCrab @Inginsub you can run the model slowly for the cost of a consumer card, about $5k or thereabouts.
meanwhile the cards deepseek used to train it, aftermarket h100s, run around $30k each, not counting the other gear
nvidia prices its AI-specific top-line chips at $60k each, and everyone is just supposed to buy hundreds of them at once. openai allegedly had something like 150k individual units for some reason
now factor in having to "retire" entire setups every 3 years because nvidia made a better card.
suddenly nvidia's revenue stream is bloated 1000x for no reason. now that the news is out and the first-tier clients (server builders like Super Micro) can't sell their subscription services, nvidia is headed for the trash.
just look at their own cooked-up projections, i doubt they even shipped that many chips. investor ponzi
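Rough back-of-the-envelope on those figures, just so the scale is clear (these are only the numbers thrown around in this post: ~150k units, $30k aftermarket / $60k list per card, 3-year refresh, $5k consumer rig; none of it is audited pricing):

```python
# Toy capex comparison using only the figures quoted in this thread.
H100_AFTERMARKET = 30_000      # USD per card, aftermarket price mentioned above
H100_LIST        = 60_000      # USD per card, quoted list price for top AI parts
UNITS            = 150_000     # alleged OpenAI-scale fleet size
REFRESH_YEARS    = 3           # "retire" cadence when a new generation ships
CONSUMER_RIG     = 5_000       # USD, single consumer-card setup that runs the model slowly

fleet_low  = UNITS * H100_AFTERMARKET
fleet_high = UNITS * H100_LIST

print(f"fleet capex: ${fleet_low/1e9:.1f}B - ${fleet_high/1e9:.1f}B")
print(f"annualized over a {REFRESH_YEARS}-year refresh: "
      f"${fleet_low/REFRESH_YEARS/1e9:.1f}B - ${fleet_high/REFRESH_YEARS/1e9:.1f}B per year")
print(f"roughly {fleet_low // CONSUMER_RIG:,}x the cost of the $5k setup")
```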
-
@IAMAL_PHARIUS @Codeki @PopulistRight @RustyCrab @Inginsub Given the 50 series uplift is only about 8%, they were definitely counting on 24/7 A100s (or H100s in China) burning out and on a continued fever pitch from Big Tech and get-rich-quick startups. I think they intended the Digits platform to sort of placate the prosumer home user/trainer; it's basically GDDR in a little shell with a mediocre processor. NVidia seemed to intentionally resist bumping VRAM on the consumer cards for this reason, especially considering the increasing VRAM use of AAA games.
If anyone wants to run real R1 at home (not the Ollama distills), grab a few Macs with their unified RAM and you can do network-distributed inference of a decent-quality quant with llama.cpp. Since only about 37B parameters are active per token, it should run at a decent clip as long as you can get the whole thing loaded up.
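A rough sketch of what that setup looks like, driven from Python. The worker addresses, port, and GGUF filename are placeholders, and it assumes llama.cpp was built with -DGGML_RPC=ON on every machine:

```python
# Sketch of driving llama.cpp's RPC-based distributed inference.
# Assumes each worker Mac is already running llama.cpp's rpc-server
# (built with -DGGML_RPC=ON), e.g.:  ./build/bin/rpc-server -H 0.0.0.0 -p 50052
# Hostnames, port, and model path below are placeholders.
import subprocess

WORKERS = ["192.168.1.11:50052", "192.168.1.12:50052"]   # the other Macs on the LAN
MODEL   = "DeepSeek-R1-Q4_K_M.gguf"                      # a decent-quality quant

cmd = [
    "./build/bin/llama-cli",
    "-m", MODEL,
    "--rpc", ",".join(WORKERS),   # offload layers to the RPC workers
    "-ngl", "99",                 # push as many layers off the host as possible
    "-p", "Explain MoE routing in one paragraph.",
]
subprocess.run(cmd, check=True)
```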
-
@IAMAL_PHARIUS @Codeki @PopulistRight @snappler @RustyCrab @Inginsub I think what's especially pernicious about the scheme is the complete lack of apparent elegance in the hardware solutions on offer. Nvidia's original tensor cores *happened* to be good for AI, but their actual capabilities were clearly targeting scientific/HPC. There's been scarce improvement in any area except deepening the block.
There are plenty of ASICs which outperform Nvidia's offerings on a per-watt basis, but they can't compete with Nvidia's wafer purchasing power or the very wide industry adoption of PyTorch and CUDA. Nvidia's shipped libraries have repeatedly been shown to be inefficient and haphazard, and doing things the Nvidia way, while typically good enough, is rarely the way to extract maximum performance. But you can't really tap into the available resources without blindly relying on Nvidia's "trust me bro" middleware.
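To make that concrete, a minimal sketch (assuming a CUDA-capable box with PyTorch installed): a plain matmul dispatches into closed cuBLAS/cuDNN kernels, and the tuning surface you're handed is a handful of opaque backend flags, not the kernels themselves.

```python
# Minimal illustration of how deep the closed middleware goes: a plain matmul
# on an Nvidia GPU lands in cuBLAS, and the only "tuning" PyTorch exposes is a
# few black-box backend switches over Nvidia's proprietary libraries.
import torch

if torch.cuda.is_available():
    # The sanctioned knobs: opaque flags, not kernel source.
    torch.backends.cuda.matmul.allow_tf32 = True   # let cuBLAS use TF32 tensor cores
    torch.backends.cudnn.benchmark = True          # let cuDNN autotune its own kernels

    a = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, 4096, device="cuda")

    # Profiling only *names* the closed kernels; you can't inspect or modify them,
    # only pick different black-box code paths.
    with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CUDA]) as prof:
        (a @ b).sum().item()
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
```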
I call it pernicious because the entire industry buying into this one vendor is pigeonholing millions of man-hours of engineering into Nvidia's black-box toolchain.
Nvidia knows this is a gravy train. They have no intention of meaningfully innovating or fixing their flawed foundation. Maybe it's partly because doing things right could break over a decade of CUDA and now TensorFlow work, but I think it's more likely that Nvidia is in a downward talent spiral. Nobody there now worked on laying the groundwork or architecting the original IP blocks. Nvidia's total reliance on proprietary code for its value-add features and workflows means any deviation from the current path is a massive Pandora's box.
I know this thread is more about the "AI" side of things, and how the infinite money printer depended on infinite growth in demand for Nvidia's miraculous one true computational solution, and how it turns out Nvidia is as much in the way as it is helping. But I thought it was worth dredging up the past to point out that Nvidia's entire valuation today is built on an IP block that *happens* to be *competent* at AI, and at this point Nvidia can't risk a pivot. I know the models coming out of China still used Nvidia, but the first chink in the armor was the fact that you don't need a gigawatt of GPU power to compete with o1. I think the next one will be an IP block built specifically for this kind of work - likely a hybrid of FPGA-in-memory and matrix ASIC-in-memory, able to keep propagating forward in memory by staging FPGAs rather than bouncing back and forth between memory levels and program levels. Nvidia can't up and invent any of this; they're already behind and even lost their ARM bid.
You'd think that companies spending hundreds of billions on leadership in a field would not want their entire business model hinged on a single vendor lmao
-
@shippoaster @Codeki @Inginsub @PopulistRight @RustyCrab @snappler I blame asml
-
@shippoaster @IAMAL_PHARIUS @Codeki @Inginsub @PopulistRight @RustyCrab I got called a shill like ten years ago for simply explaining that a lot of NVidia's dominance wasn't superior hardware in itself, but developer support. They offered, and still offer, easy libraries to lazy/overburdened AAA game development teams that let them implement useful, flashy features with minimal friction, and they also have TONS of people available to answer developer questions. AMD/ATI did not. So AMD languished and fell behind because their lower sales meant no ability to create impactful libraries or technologies that were cross-platform. FSR has been their most successful one to date and it's very shit, and the new FSR4 is AMD-only, killing any momentum there. That's on the game side, but consumer ubiquity is important for becoming the de facto choice for a lot of data center uses. It's why Ryzen has been slow to enter the datacenter.
On the other side, the research/AI side, we have the problem of all the people being college-educated morons who work in Python and don't actually code in the way any programmer or engineer thinks about it. This has infected a lot of things, but it came baked in with the researchers. That helped CUDA become the de facto standard and made sure that anything that wasn't as easy and reliable just fell by the wayside, because who is ordering 500 AMD cards to try to run whatever AMD's half-working, 5x slower brand of CUDA is this week?
So here we are. No one is really discussing custom hardware beyond Nvidia's lazily slapped-together rackmounts, which barely count. In some ways, I don't blame them. I'm not suggesting they're doing it out of wisdom, but the idea that transformers are the end of the road is pretty premature, so purpose-building hardware to catch the wave before it crashes is a good way to go bankrupt. BitNet's ternary architecture could be very valuable if it ever gets proven out. It could lead to very fast, very small models on relatively cheap custom hardware (one day).
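For anyone curious why ternary is interesting for hardware, a toy sketch (loosely modeled on the absmean quantization described in the BitNet b1.58 write-ups, not a faithful reimplementation): once weights are constrained to {-1, 0, +1}, the matrix "multiply" collapses into signed adds.

```python
# Toy illustration of why BitNet-style ternary weights are hardware-friendly:
# with weights in {-1, 0, +1}, the matrix "multiply" is just adds and subtracts.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))        # full-precision weights (toy size)
x = rng.normal(size=16)             # one activation vector

# Absmean-style ternary quantization: scale by mean |w|, round, clip to {-1, 0, 1}.
gamma = np.abs(W).mean()
W_t = np.clip(np.round(W / (gamma + 1e-8)), -1, 1).astype(np.int8)

# "Matmul" without multiplies: add x where the weight is +1, subtract where it's -1.
y = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W_t])
y = gamma * y                        # one rescale per layer

print("max abs error vs full precision:", np.abs(y - W @ x).max())
```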