2021-12-20 10:09:14
Hilariously Stupid and Epic Bugs Part 3 - Exploding PSU
This is not a bug per se, but a chain of events and my personal hilarious stupidity that led to funny consequences.
The whole 3090 power issues mumbo jumbo. GPU and PSU manufacturers provide little (if any) proper (not marketing BS) guidance on actual fundamentals behind how you should connect your devices to a PSU.
The 3090 is known to consume from 300 to 540 W (sic), and it typically has only 2 8-pin power connectors (sometimes 3). If I am not mistaken, the recommended load is about 150W for a single 8-pin PCIE power connector. The PCIE slot itself can also deliver around 50 - 70W of power. So if your GPU consumes around 300W, there is nothing to worry about, provided you connect it WITH 2 PCIE POWER CABLES. There was a similar argument about the 1080 Ti, but I believe people eventually settled that 1 cable was kind of ok. The A10, on the other hand, consumes ... exactly 150W, wow.
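The arithmetic above can be written down as a rough sketch. The function names are made up for illustration, and the figures are the ones quoted in this post (150W per 8-pin connector, and the conservative 50W end of the slot estimate), not measured values:

```python
# Rough power budget for a GPU, using the figures quoted in the post.
# Assumptions: 150 W per 8-pin connector, 50 W from the PCIE slot
# (the conservative end of the 50 - 70 W estimate).
EIGHT_PIN_W = 150
SLOT_W = 50


def power_budget(num_8pin: int, slot_w: int = SLOT_W) -> int:
    """Total power available if every 8-pin input has its own cable."""
    return num_8pin * EIGHT_PIN_W + slot_w


def headroom(draw_w: float, num_8pin: int) -> float:
    """Positive means spare capacity, negative means over budget."""
    return power_budget(num_8pin) - draw_w


for draw in (300, 540):
    for connectors in (2, 3):
        spare = headroom(draw, connectors)
        status = "ok" if spare >= 0 else "OVER BUDGET"
        print(f"{draw} W draw, {connectors} x 8-pin: {spare:+} W ({status})")
```

With these assumptions a 300W card fits within 2 connectors, while the 540W peak does not fit even within 3, which is the whole reason to care about margins.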
nvidia-smi --query-gpu=index,timestamp,power.draw,clocks.sm,clocks.mem,clocks.gr --format=csv -l 1
0, 2021/12/20 06:46:04.941, 275.42 W, 1830 MHz, 9501 MHz, 1830 MHz
0, 2021/12/20 06:46:05.959, 284.80 W, 1152 MHz, 9501 MHz, 1725 MHz
0, 2021/12/20 06:46:06.974, 287.45 W, 1845 MHz, 9501 MHz, 1845 MHz
0, 2021/12/20 06:46:07.987, 278.15 W, 1172 MHz, 9501 MHz, 1755 MHz
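To track this over time, readings like the ones above can be summarized with a few lines of Python. This is a minimal sketch: the sample is just the four rows from this post, the `parse_power` helper is a hypothetical name, and it assumes the column order from the query above (power draw is the third field).

```python
import csv
import io

# The four sample rows from the post, in the same column order as the query:
# index, timestamp, power.draw, clocks.sm, clocks.mem, clocks.gr
SAMPLE = """\
0, 2021/12/20 06:46:04.941, 275.42 W, 1830 MHz, 9501 MHz, 1830 MHz
0, 2021/12/20 06:46:05.959, 284.80 W, 1152 MHz, 9501 MHz, 1725 MHz
0, 2021/12/20 06:46:06.974, 287.45 W, 1845 MHz, 9501 MHz, 1845 MHz
0, 2021/12/20 06:46:07.987, 278.15 W, 1172 MHz, 9501 MHz, 1755 MHz
"""


def parse_power(text: str) -> list[float]:
    """Extract power draw in watts from the CSV rows (third column)."""
    draws = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) < 3:
            continue  # blank or truncated line
        field = row[2].strip()
        if not field.endswith(" W"):
            continue  # header row or a value that is not in watts
        draws.append(float(field[:-2]))
    return draws


draws = parse_power(SAMPLE)
print(f"samples={len(draws)} max={max(draws):.2f} W "
      f"mean={sum(draws) / len(draws):.2f} W")
```

In practice you would feed it the saved output of the command instead of the hardcoded sample.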
Also of importance are the bullshit cooling settings shipped by some manufacturers (e.g. the turbo fan does not spin faster than 50%), but let us leave this can of worms for now. I had no idea this was an issue, I was just lucky to buy proper GPUs. Just an extra 20-30% of fan speed solves the thermal issues, lol. You do not even need 100%, the 3090 works under full load at ~70C.
So I went to a data center for regular maintenance of our servers (just visually check all connectors, blow off the dust, this kind of stuff). A quick adventure, in and out, 20 minutes! What could go wrong?
When I arrived I realized that I had forgotten about the whole 8 + 8 thing. You see, I wanted to replace some 1080 Tis with 3090s. So I needed more cables.
Turned out, my cables were stored at my older flat where my relatives live. And they were ... mixed. There are nice pouches provided by Leadex, but my relatives were kind enough to take the pouches out of the boxes. No problem, you can check the cables with a multimeter.
After waiting for the pouches to be delivered ... I saw that the cables from 2 pouches were visually identical, and they had forgotten to include a multimeter. I had heard stories about PSUs exploding if you mixed cables EVEN WITHIN the same model and year.
So, YOLO? And yes, the PSU exploded. Good thing the butcher's bill did not include anything else, and I had a spare PSU for some reason (with precariously inviting cables that split a single PCIE port into 8 + 8 connectors - a disaster waiting to happen). But it was as hilarious as it was nerve-racking.
By the way, Leadex, WTF? You ship newer PSUs with PCIE power cables that go from 1 PCIE port to 2 * 8 pin connectors! This is a disaster waiting to happen.
The conclusion:
- Label your fucking PSU boxes and pouches and store them under the bed;
- Take an extra layer of precautions all of the time;
- YOLO works only a handful of times;
- Back up your data. Even RAID 10 is not a backup;
- Modern high-end hardware is surprisingly robust and fault-tolerant;
- Rely on fundamentals;
- Have at least some power margin;
- Use better components;
Another funny situation on the same topic (not mine). Turns out that even if your cables start smoldering (in this particular case it was a shitty PSU), your PSU, GPU and MB may be just fine.
#epic_bug_moments
Alexander, 07:09