12VHPWR is going great
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Wednesday, 12-Feb-2025 08:46:15 JST Graham Sutherland / Polynomial
Doughnut Lollipop 【記録係】:blobfoxgooglymlem: (tk@bbs.kawa-kun.com)'s status on Wednesday, 12-Feb-2025 08:47:05 JST Doughnut Lollipop 【記録係】:blobfoxgooglymlem:
@gsuberland @doskel Specifying thicker wires would do a lot to help that, but that wouldn't "increase shareholder value." :blobfoxupsidedown:
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:17:43 JST Graham Sutherland / Polynomial
don't worry, it doesn't stay at 128 degrees!
it goes up. like... 150 plus.
:umu: :umu: (a1ba@suya.place)'s status on Friday, 14-Feb-2025 06:17:52 JST :umu: :umu:
@gsuberland 12 volts hot power
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:18:26 JST Graham Sutherland / Polynomial
so to be clear the problem here is 100% the GPU and nvidia being fucking idiots.
you've got 500W going through this cable at 12V. so that's a bit over 40A (500 / 12 ≈ 41.7).
the cable has a bunch of 12V lines and a bunch of GND lines. if the current is shared across those lines, it's all good. they're well within spec.
but what happens is if you put them all in parallel, whichever cable has a slightly lower resistance will carry more current, resulting in most of the current going down one or two wires and getting hot.
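A minimal sketch of the current-divider arithmetic behind that post. The 500W and 12V figures are from the thread; the six 12V wires and the per-wire resistance values are illustrative assumptions, not measurements.

```python
def wire_currents(total_current_a, resistances_ohm):
    """Split a total current across parallel wires.

    All wires see the same voltage drop, so each wire's share is
    proportional to its conductance (1 / R).
    """
    conductances = [1.0 / r for r in resistances_ohm]
    total_conductance = sum(conductances)
    return [total_current_a * g / total_conductance for g in conductances]


POWER_W = 500.0    # figure from the thread
VOLTAGE_V = 12.0   # figure from the thread
total_a = POWER_W / VOLTAGE_V  # about 41.7 A

# Hypothetical per-wire resistances (wire plus both contacts), in ohms.
matched = [0.010] * 6
mismatched = [0.010, 0.010, 0.050, 0.050, 0.050, 0.050]

balanced = wire_currents(total_a, matched)
skewed = wire_currents(total_a, mismatched)

print("matched:   ", [round(i, 1) for i in balanced])
print("mismatched:", [round(i, 1) for i in skewed])
```

With matched wires every line carries the same share; with the mismatched set, the two low-resistance wires take most of the load.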
Wolf480pl (wolf480pl@mstdn.io)'s status on Friday, 14-Feb-2025 06:21:21 JST Wolf480pl
@gsuberland @ignaloidas @phenidone
side note: is it normal that NVIDIA is dictating the power delivery design in such detail? Wasn't it supposed to be the board partners' job to design power delivery and cooling for the cards?
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:22 JST Graham Sutherland / Polynomial
@ignaloidas @phenidone the 3090 and prior manage this fine by balancing across split high side power domains - if one leg is compromised it won't boot, and you don't get massive imbalances. the Asus ROG 4-series cards added per line shunts to detect bad connections (they were required to use nvidia's single combined high side design so this was the next best thing) and warn the user / refuse to power on to protect against the issue. we have the capability to be safe. nvidia just didn't do it.
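A rough sketch of what a per-line shunt check could look like. This is not any vendor's firmware: the function names, the shunt value, and both thresholds are hypothetical and only illustrate the idea of measuring each line and refusing to power on when one is overloaded or badly out of balance.

```python
def line_currents(shunt_voltages_v, shunt_ohm):
    """Convert the voltage measured across each line's shunt into a current (I = V / R)."""
    return [v / shunt_ohm for v in shunt_voltages_v]


def power_on_allowed(currents_a, max_line_a, max_ratio_to_mean):
    """Return (allowed, reason) for a set of per-line currents."""
    mean_a = sum(currents_a) / len(currents_a)
    hottest_a = max(currents_a)
    if hottest_a > max_line_a:
        return False, f"line over limit: {hottest_a:.1f} A > {max_line_a:.1f} A"
    if mean_a > 0 and hottest_a / mean_a > max_ratio_to_mean:
        return False, f"imbalance: hottest line is {hottest_a / mean_a:.1f}x the mean"
    return True, "ok"


SHUNT_OHM = 0.001   # hypothetical 1 milliohm shunt per line
MAX_LINE_A = 9.0    # hypothetical per-line limit
MAX_RATIO = 1.5     # hypothetical allowed hottest-to-mean ratio

# Hypothetical readings: one healthy set, one resembling "20A down two, 2A down the rest".
healthy = line_currents([0.0070, 0.0069, 0.0071, 0.0068, 0.0070, 0.0069], SHUNT_OHM)
faulty = line_currents([0.020, 0.020, 0.002, 0.002, 0.002, 0.002], SHUNT_OHM)

print(power_on_allowed(healthy, MAX_LINE_A, MAX_RATIO))
print(power_on_allowed(faulty, MAX_LINE_A, MAX_RATIO))
```

The check only detects and locks out; it does not rebalance anything, which matches the "next best thing" framing above.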
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:23 JST Graham Sutherland / Polynomial
@ignaloidas @phenidone ultimately though I'm firmly of the belief that when burning people and their expensive equipment is the impact of the risk, you need secondary safety controls (active current balancing, lockout when sufficiently bad connections occur) to account for bad connectors or user error. 12VHPWR has problems but the cards should be protecting against these dangerous failure cases regardless, and especially given that it's a known problem.
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:24 JST Graham Sutherland / Polynomial
@ignaloidas @phenidone I have my suspicions that there is a sneaky runaway effect from thermal expansion increasing contact pressure, in part due to the ridiculously high current density at the connector, and cards with proper current balancing are keeping it in check. but I can't test that theory without hands on the gear and I don't have the cash right now (not even for buying the same connectors and some cables, unfortunately)
Ignas Kiela (ignaloidas@not.acu.lt)'s status on Friday, 14-Feb-2025 06:21:26 JST Ignas Kiela
@gsuberland@chaos.social @phenidone@mstdn.social I'm honestly surprised that the imbalance is that big, that requires the resistance on the rest of the cables to be like 5x bigger than the ones with high current, at least from my playing with a circuit simulator. That's a huge difference.
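One way to reproduce that kind of estimate without a circuit simulator is to sweep the resistance ratio in a simple parallel-wire model. The split into two low-resistance wires and four high-resistance wires, and the roughly 41.7A total (500W at 12V), are assumptions for illustration.

```python
def hot_and_cold_currents(total_a, n_hot, n_cold, ratio):
    """Per-wire currents when the n_cold wires have `ratio` times
    the resistance of the n_hot wires, all in parallel."""
    # Work in units of the low-resistance wire's conductance.
    total_conductance = n_hot * 1.0 + n_cold / ratio
    hot = total_a / total_conductance
    cold = hot / ratio
    return hot, cold


TOTAL_A = 500.0 / 12.0  # assumed total current, about 41.7 A

sweep = {
    ratio: hot_and_cold_currents(TOTAL_A, n_hot=2, n_cold=4, ratio=ratio)
    for ratio in (1, 2, 5, 10, 20)
}

for ratio, (hot, cold) in sweep.items():
    print(f"R ratio {ratio:>2}x: {hot:5.1f} A per low-R wire, {cold:4.1f} A per high-R wire")
```

In this model the ratio of currents between two wires equals the inverse ratio of their resistances, since every wire sees the same voltage drop.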
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:27 JST Graham Sutherland / Polynomial
@phenidone it's right in the der8auer video, you can go watch him take the clamp measurements. 20A down two cables, <2A down the rest.
William (phenidone@mstdn.social)'s status on Friday, 14-Feb-2025 06:21:29 JST William
@gsuberland I'mma not believe you on that 350% number unless you've got direct measurements to prove it.
What mechanism could cause that imbalance, noting that the cable resistances are going to be very similar to each other and the thermal effect provides negative feedback?
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:30 JST Graham Sutherland / Polynomial
one note on the der8auer video: he mentions that the power headroom on the 5090 is just 15%, due to the high power draw and only a single connector being used, and states his opinion that they should've gone for two connectors to offer more headroom.
I agree with this in general (15% cuts it too fine), but it's important to contextualise the problem here: assuming perfect sharing, two connectors would give you a headroom of 130%.
the magnitude of current imbalance within the cable is 350%.
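The headroom comparison can be worked through in a few lines. The 15% figure is from the post; reading "350%" as one wire carrying 3.5x its even share, and assuming the same imbalance would persist with two connectors, are both assumptions made only for this sketch.

```python
SINGLE_HEADROOM = 0.15   # from the post: one connector leaves 15% headroom
IMBALANCE_FACTOR = 3.5   # assumption: hottest wire carries 3.5x its even share


def headroom(n_connectors, single_headroom=SINGLE_HEADROOM):
    """Headroom with n identical connectors and perfect sharing."""
    return n_connectors * (1.0 + single_headroom) - 1.0


def hottest_wire_load(n_connectors, imbalance=IMBALANCE_FACTOR,
                      single_headroom=SINGLE_HEADROOM):
    """Hottest wire's current as a multiple of its rating.

    With perfect sharing each wire runs at draw / capacity of its rating;
    the imbalance factor multiplies that for the worst wire.
    """
    capacity_over_draw = n_connectors * (1.0 + single_headroom)
    return imbalance / capacity_over_draw


for n in (1, 2):
    print(f"{n} connector(s): headroom {headroom(n):.0%}, "
          f"hottest wire at {hottest_wire_load(n):.2f}x its rating")
```

Under these assumptions the second connector raises headroom from 15% to 130%, yet the worst wire still ends up above its rating, which is the point of the comparison.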
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:31 JST Graham Sutherland / Polynomial
and more technical details of the 5090 shunt issue here:
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:32 JST Graham Sutherland / Polynomial
source of the photo and lots more details here:
Graham Sutherland / Polynomial (gsuberland@chaos.social)'s status on Friday, 14-Feb-2025 06:21:33 JST Graham Sutherland / Polynomial
and this is a 100% known problem. and it's solved using active current balancing on the GPU side. you put shunt resistors in series with the lines, measure the current, and actively shift the current draw in real time to keep everything balanced. (you can do this with an ideal diode-OR controller, or by separating the high-side feeds into separate sets of VRM phases)
nVidia *has* done this on some prior cards. but they reduced the shunt count here, running stuff in parallel, and this is the result.
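A toy model of why splitting the high-side feeds helps, compared with running everything in parallel. It assumes each power domain draws an equal share of the load and does not model the control loop itself; the wire resistances, the round 40A total, and the domain groupings are hypothetical.

```python
def split_by_conductance(total_a, resistances_ohm):
    """Current divider: each parallel wire's share is proportional to 1 / R."""
    conductances = [1.0 / r for r in resistances_ohm]
    total_conductance = sum(conductances)
    return [total_a * g / total_conductance for g in conductances]


def split_domains(total_a, resistances_ohm, domains):
    """Per-wire currents when the load is divided into separate domains.

    `domains` is a list of lists of wire indices. Each domain is assumed to
    draw an equal share of the total; wires inside a domain are in parallel.
    """
    currents = [0.0] * len(resistances_ohm)
    per_domain_a = total_a / len(domains)
    for wires in domains:
        shares = split_by_conductance(per_domain_a, [resistances_ohm[i] for i in wires])
        for i, share in zip(wires, shares):
            currents[i] = share
    return currents


TOTAL_A = 40.0                      # round figure for illustration
WIRES_OHM = [0.010] + [0.050] * 5   # hypothetical: one wire with much lower resistance

shared = split_by_conductance(TOTAL_A, WIRES_OHM)                      # all in parallel
paired = split_domains(TOTAL_A, WIRES_OHM, [[0, 1], [2, 3], [4, 5]])   # three domains
per_line = split_domains(TOTAL_A, WIRES_OHM, [[i] for i in range(6)])  # one domain per wire

for name, currents in (("shared", shared), ("paired", paired), ("per-line", per_line)):
    print(f"{name:>8}: worst wire {max(currents):.1f} A")
```

In this sketch the worst wire carries less as the domains get finer, because a wire can only hog current within its own domain.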