@supersid333 @LukeAlmighty @NEETzsche @Shadowman311 It's an absolute shit show. I have gotten it to work with my Vega 64, but it revolves around finding the right bitsandbytes build, the right PyTorch build, and digging up the last version of ROCm that supports the card from scattered documents and downloads on their own damn website. Some models don't even run because of some banal unsupported instruction. And AMD has worse support for older cards in this regard than Nvidia does. After I got it running I refuse to touch or update my install of oobabooga, because I don't want to smash my CRT over my head for a week trying to get it functioning again.
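For anyone else fighting this, here's a minimal sanity check I'd run after pinning versions. This is just a sketch: it assumes you've already found a ROCm build of PyTorch, and the exact wheel/ROCm pairing for a gfx900 card like the Vega 64 is the part you have to dig up yourself.

```python
# Quick sanity check that a pinned PyTorch/ROCm combo actually sees the card.
# ROCm builds of PyTorch report through the torch.cuda API.
import torch

print(torch.__version__)          # ROCm wheels carry a +rocmX.Y suffix
print(torch.version.hip)          # HIP version baked into the build; None on CUDA builds
print(torch.cuda.is_available())  # True if the GPU is actually usable

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    # Tiny matmul on the GPU to smoke-test that kernels really run,
    # since "device found" and "kernels work" are two different fights.
    x = torch.randn(256, 256, device="cuda")
    print((x @ x).sum().item())
```

If that matmul crashes with an unsupported-instruction or HSA error, the wheel was built for a different gfx target and you're back to hunting builds.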
A friend and I went halvsies on a used Dell server with four SXM2 slots for machine learning projects, and surprisingly you can find Tesla P100s for $50 on eBay. Most are reaching their limits on memory failures, but for the price ¯\_(ツ)_/¯ (we've already had one die to that). It has some of the same problems I detailed above, but it is quite nice; I recommend it. The big problem with the specific server we got is that it's a 1U server: nobody sells the damn heatsinks, so we had to machine our own. We also had to cut piping slots into the sides of the heatsinks to run automotive brake lines carrying coolant, to keep temps below 80°C or extreme throttling would occur.
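If you're babysitting dying $50 P100s like these, a rough watchdog sketch along these lines helps catch both problems early. This assumes the nvidia-ml-py (pynvml) bindings and driver-level ECC reporting; the 80°C flag matches the throttle point above, the 5-second poll is arbitrary.

```python
# Rough watchdog for used SXM2 P100s: watch temperature and ECC error counts.
# Assumes the nvidia-ml-py (pynvml) package and an NVML-capable driver.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        for i, h in enumerate(handles):
            temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
            # Volatile uncorrected ECC errors since the last driver load.
            # A climbing count is how a worn-out card announces it's dying.
            ecc = pynvml.nvmlDeviceGetTotalEccErrors(
                h,
                pynvml.NVML_MEMORY_ERROR_TYPE_UNCORRECTED,
                pynvml.NVML_VOLATILE_ECC,
            )
            flag = " <-- throttle territory" if temp >= 80 else ""
            print(f"GPU{i}: {temp}C, uncorrected ECC errors: {ecc}{flag}")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()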
TL;DR: AMD fucking blows for ML, and it was all over when Blender dropped GPU rendering on AMD cards on Linux five years ago. I fucking hate it. Probably just me due to the age of my hardware. ROCm can suck my cock.