3090 driver handicap?

josh · Feb 6, 2021

larrysb said:
I've certainly setup workstation with 4x GPU. I used the same Mobo that Nvidia did in their DGX series workstations to build them. I even use the same EVGA 1600W power supply. (Corsair Carbide 540 case is excellent for this by the way)

I used the Titan V in that 4x format as well. However, not only was NVLink software-disabled on these (the fingers are there, and electrically connected!), they were also severely clock limited by the driver in GPU compute mode. I even wrote Jensen an email about it, and he responded!

The RTX series with the two card limited NVLink puts a bit of a damper on the number of cards you can really make use of, even if you have the slots and PLX'd PCIe lanes to work with. In a workstation situation, you might as well go with 2x GPU if you need it.

I hit limits on vram though as our model complexity increased. Honestly, I could make use of the 48gb vram cards.

I can also only pull so many amps from a single outlet too. Then there's the heat problem and noise problem. Even 2 RTX titans put out a lot of heat.

Where the Quadros begin to shine is the scaling beyond 2 GPU. If you go to a distributed compute model with Horovod or other methods, it can be useful to utilize multiple workstations when they're available. With the availability of high speed network cards of 25gb or better and direct RDMA, the scaling of GPU and storage can be pretty good on a "mini-fabric".

On the partner cards, I've found them to not be so reliable in many cases. They're designed for gamers and tend to have overclocking of everything available, different throttling and fan curves and honestly, poor cooling solutions giving way to RGB lighting and plastic dress up kits. A lot of cost optimization goes on to the reference designs so they can eek out a little more margin in competitive markets. Not all the FE cards are that great either. But the last couple of generations have been generally better than the AIB's in the long run. I've taken the occasional problem child GPU out of service and run them in a game demo and start to see artifacts on screen in many cases, or glitches and mystery crashes. Early on in our process, we thought, "hey it is the same GPU on this card as the other and it is $x.xx cheaper!" But you live and learn that you never get more than you pay for, and you often get less.

The Quadros we've used have been generally very reliable. ECC memory is a plus in my book. More conservative clocking and throttling tends to make them more reliable. Last thing you want after 50 hours of model training is a GPU related crash or situations that lead you to suspect memory corruption. (BTDT) I just wish they were not so ripping expensive.

On the new generation of RTX A6000, the virtualization is an interesting prospect. There are many cases where it would be nice to have one honking big GPU that can be virtualized out to several workloads, especially for development. Even on a single workstation I can think of good reasons to do that. Would like to play with that and see how well it works with several containerized workloads with a vGPU each. Sharing a GPU through virtualization could be useful in other HPC applications, especially on a high-speed local network.

I readily agree that scaling beyond 4 physical GPU's in a typical office room, much less in a single computer chassis can be pretty tough. I became real popular with my office landlord for popping the circuit breakers from the load imposed by multiple GPU workstations.

Have you tried any of the Turbos at all? There's no reason why ASUS would suddenly decide to produce Turbos if the Gigabytes weren't doing well in the professional market.

larrysb · Feb 6, 2021

josh said:
Have you tried any of the Turbos at all? There's no reason why ASUS would suddenly decide to produce Turbos if the Gigabytes weren't doing well in the professional market.

Yes, of course.

What makes you think I haven't?

I think there are two markets for today's blower cards: Gamers building in small form factor cases and the "pro-sumer" and entry-level pro.

When I was running 1080ti's (all blower cards of course back then), the 4-way configuration was more important. The Titan V were disappointing (due to clock crippling) since we hadn't migrated to fp16 yet. The 1080ti were beating them on fp32 workloads.

I've dropped down to 2x GPU's and 25gb network card in my 4-way x16-lane PLX'd workstations. The RTX Link is only two-way anyway. Fan cards like the RTX Titan are fine spaced out in an airflow case.

All I'm saying is the Quadro level cards have advantages if you need to scale up beyond a single computer with a pair of consumer graphic cards in it.

josh · Feb 6, 2021

larrysb said:
Yes, of course. What makes you think I haven't?

I think there are two markets for today's blower cards: Gamers building in small form factor cases and the "pro-sumer" and entry-level pro.

When I was running 1080ti's (all blower cards of course back then), the 4-way configuration was more important. The Titan V were disappointing (due to clock crippling) since we hadn't migrated to fp16 yet. The 1080ti were beating them on fp32 workloads.

I've dropped down to 2x GPU's and 25gb network card in my 4-way x16-lane PLX'd workstations. The RTX Link is only two-way anyway. Fan cards like the RTX Titan are fine spaced out in an airflow case.

All I'm saying is the Quadro level cards have advantages if you need to scale up beyond a single computer with a pair of consumer graphic cards in it.

Just wondering if your statement that AIB manufacturers have underdelivered includes the 3090 Turbos as those are not really bought by "gamers"

larrysb · Feb 6, 2021

josh said:
Just wondering if your statement that AIB manufacturers have underdelivered includes the 3090 Turbos as those are not really bought by "gamers"

I have not put hands upon a 3090 blower style card, or any 3090 at this point. I'd buy one if I could find one.

My comments about board partners is historical across the last few generations.

balnazzar · Feb 7, 2021

larrysb said:
When I was running 1080ti's (all blower cards of course back then), the 4-way configuration was more important. The Titan V were disappointing (due to clock crippling) since we hadn't migrated to fp16 yet. The 1080ti were beating them on fp32 workloads.

The 1080ti (like any other pascal card) were perfectly capable of fp16 training. The speedup was modest (~10%) but you still enjoyed the vram 'doubling'.
I know, they should have been heavily crippled in fp16, but it was not so. Convergence was fine too, the same rate of any Turing card.

balnazzar · Feb 7, 2021

larrysb said:
I have not put hands upon a 3090 blower style card, or any 3090 at this point. I'd buy one if I could find one.

I have two of them and I cannot complain. Maximum temp observed was 72C after a 20 minutes run with gpu_burn. Fan at 74%.
In the same conditions, and same case, one single RTX Titan went to 86C and throttled, while producing *more* noise. That noise was somewhat higher-pitched with respect to the 3090 turbo, which is noisy, but in a gentler manner.
They got 2-slot spacing:

Note that the case has three intake 14cm fans, but they run at very low speed. Interestingly, if I make them go full speed, the gpu temps decrease by just 1C - 2C.

josh · Feb 7, 2021

balnazzar said:
I have two of them and I cannot complain. Maximum temp observed was 72C after a 20 minutes run with gpu_burn. Fan at 74%.
In the same conditions, and same case, one single RTX Titan went to 86C and throttled, while producing *more* noise. That noise was somewhat higher-pitched with respect to the 3090 turbo, which is noisy, but in a gentler manner.
They got 2-slot spacing:

View attachment 17456

Note that the case has three intake 14cm fans, but they run at very low speed. Interestingly, if I make them go full speed, the gpu temps decrease by just 1C - 2C.

Have you looked at the memory junction on the VRAM? Apparently the throttling isn't from the core temps but the memory. I've seen that they actually run much higher than "safe" GDDR6X max of 95C even at 50+C core temps.

Don't mind me asking, what board are you using? Dual CPU in ATX format that will also take 2x3090 with spacing is pretty rare.

funkywizard · Feb 7, 2021

water cooled cards are best if your chassis can accommodate it, especially for multiple cards in one system.

balnazzar · Feb 8, 2021

josh said:
Have you looked at the memory junction on the VRAM? Apparently the throttling isn't from the core temps but the memory. I've seen that they actually run much higher than "safe" GDDR6X max of 95C even at 50+C core temps.

That's true. But the throttling temps for gddr6x on 3090s should be 110C, not 95C. Mine run at ~90C, with an observed maximum at 93C.
If one is still uncomfortable with these temps, here is a DIY solution:

However, note that Pugetsystems tested four 3090 turbo crammed in 4x configuration, upon heavy rendering workloads. They did not observe any performace drop.
Some AIB open-air cards just suck since their backplate has been badly engineered and/or has not been put in contact with the memory chips on the backside.

That said, I think that Nvidia chose gddr6 (non-x) for the a6000 exactly for this reason.

josh said:
Don't mind me asking, what board are you using? Dual CPU in ATX format that will also take 2x3090 with spacing is pretty rare.

Mine is an Asus pro ws c621-64L sage/10G. It's a CEB board, single socket lga3647.
But if you need ATX and a lot of slots for spacing your cards the way you desire, buy a ROMED8-2T or some c422 board with seven slots.

funkywizard said:
water cooled cards are best if your chassis can accommodate it, especially for multiple cards in one system.

If you want less noise and very low temps, yes.. Liquid is a solution. Actually Alphacool makes a specific backplate for the 3090 turbo, priced around 140eur/usd.
People here and there have observed that one 240mm radiator suffices in keeping a 3090 at full steam below 60C.
One single 420mm (3x140mm) does the same for two of them.

funkywizard · Feb 8, 2021

Also, there are factory water cooled cards, which will be a lot easier for a typical customer to get up and running than a custom water block, and the cost shouldn't be much higher than normal cards.

balnazzar · Feb 8, 2021

Actually OT, but I had a X11SPA previously. Bought it for its PLX pcie switch. More lanes with a scalable.

It turned out that board did not possess any sleep/hibernation capability (ACPI s3 and s4), so I had to return it. The same stands for *any* epyc board, incidentally.

That's kind of a deal-breaker for workstation users. A workstation is not an office pc that one can just turn off.. A dozen of stateless docker containers are a PITA to restart from scratch every day. You want to turn it on and find it your work exactly as you left it.
On the other hand, one hardly wants to leave a workstation on 24/7. It's not a server. Electricity bill, unnecessary dust accumulation and wear upon the components, etc..

I'd very like @Patrick to test whether such suspend functionalities are present or not when he reviews workstation platforms. And, why not, even server platforms. There is plenty of people that would make use of an epyc board for a WS.

balnazzar · Feb 8, 2021

funkywizard said:
Also, there are factory water cooled cards, which will be a lot easier for a typical customer to get up and running than a custom water block, and the cost shouldn't be much higher than normal cards.

Yes, that's a solution. But you get two radiators rather than one, fixed-length tubes, and often weaker pumps.

josh · Feb 8, 2021

balnazzar said:
That's true. But the throttling temps for gddr6x on 3090s should be 110C, not 95C. Mine run at ~90C, with an observed maximum at 93C.
If one is still uncomfortable with these temps, here is a DIY solution:
View attachment 17460
However, note that Pugetsystems tested four 3090 turbo crammed in 4x configuration, upon heavy rendering workloads. They did not observe any performace drop.
Some AIB open-air cards just suck since their backplate has been badly engineered and/or has not been put in contact with the memory chips on the backside.

That said, I think that Nvidia chose gddr6 (non-x) for the a6000 exactly for this reason.

I have considered the little heat spreaders. Do you have any idea where to place them on the Gigabyte Turbos? I got one of them today and I had to point a fan at the backplate to keep it around 92-94C.

Mine is an Asus pro ws c621-64L sage/10G. It's a CEB board, single socket lga3647.
But if you need ATX and a lot of slots for spacing your cards the way you desire, buy a ROMED8-2T or some c422 board with seven slots.

Ah my mistake, I thought it was dual socket but it's just the massive cooler!

If you want less noise and very low temps, yes.. Liquid is a solution. Actually Alphacool makes a specific backplate for the 3090 turbo, priced around 140eur/usd.
People here and there have observed that one 240mm radiator suffices in keeping a 3090 at full steam below 60C.
One single 420mm (3x140mm) does the same for two of them.

I got a unit with local warranty, don't think I want to void the warranty on it by tearing out the heatsink. I also have a bit of PTSD from my WC loop leaking a couple of years ago and spraying coolant all over the parts. Not sure when I'll be comfortable getting back into it.

balnazzar · Feb 8, 2021

josh said:
I have considered the little heat spreaders. Do you have any idea where to place them on the Gigabyte Turbos? I got one of them today and I had to point a fan at the backplate to keep it around 92-94C.

Yes. Just look here:

http://imgur.com/a/M9QFmy3

This is a 3090 turbo disassembled. Just guide yourself using the screw holes as reference.

josh said:
Ah my mistake, I thought it was dual socket but it's just the massive cooler!

It's just the Noctua 92mm for lga3647

josh said:
I got a unit with local warranty, don't think I want to void the warranty on it by tearing out the heatsink. I also have a bit of PTSD from my WC loop leaking a couple of years ago and spraying coolant all over the parts. Not sure when I'll be comfortable getting back into it.

May I ask about the cause of such leak? And which parts did you use for your custom loop?

balnazzar · Feb 8, 2021

Note that they put the serial number sticker just where it bothers you. I'm not sure about the warranty if you remove it.
On the other hand, you shouldn't leave it there.

balnazzar · Feb 8, 2021

balnazzar said:
Note that they put the serial number sticker just where it bothers you. I'm not sure about the warranty if you remove it.
On the other hand, you shouldn't leave it there.

Put the heatsinks immediately below the sticker. I think the backplate will conduct the heat acceptably, due to the short distance.
Be sure of using quality thermal tape or thermal glue.

josh · Feb 8, 2021

balnazzar said:
Yes. Just look here:

http://imgur.com/a/M9QFmy3

This is a 3090 turbo disassembled. Just guide yourself using the screw holes as reference.

Thanks! Do you think small heatsinks are better or larger ones? Bit difficult to fit the larger ones on cards packed 2 slots apart.

May I ask about the cause of such leak? And which parts did you use for your custom loop?

One of the joints popped out. Never use soft tubing.

Curiously I've had my Turbo crash my entire system (full power off with no ability to power on again) twice when mem temps stayed at 100C for more than 30 mins. Would need to let it rest for a couple of mins before it worked again. Core temps were just 56C. Keeping the mem temps 2C lower at 98C eliminates the problem entirely for some strange reason.

larrysb · Feb 8, 2021

On the RTX Titan, I see power limit throttling before thermal throttling most of the time.

This is one of mine. It's a Asus X99 WS-E motherboard, Xeon E5-2689v4, 128gb RAM, 2 RTX Titans, ConnectX5 25gb ethernet card in it. The Noctua cooler is super quiet and even under a full-power Prime95+AVX work load never goes above 50c. It is a Corsair Carbide 540 case, so tons of room and airflow. 1600w EVGA power supply. The 25gbe network card has a small fan directed on it. I would not go back to water cooling unless I had no other choice. Air is much better for long-term reliability.

The Asus board has a thermal sensor input, and I attached the sensor to the backplate of GPU #1. As the GPU heats up, the front 140mm fans kick up rapidly and automatically.

I've had 4x GPU's in the chassis in the past. I had to make special right-angle adapters for some of the connectors on the bottom edge of the board.

But I've found that 2x RTX and the 25gbe network to other machines makes a good solution for distributed. I've scaled it to as many as 4 workstations with a total of 8 GPU's with fairly good scaling. It would scale better with Quadro cards and RDMA/GPU-direct. With the consumer cards, it can do RDMA to the other computer's memory, but not directly to the GPU memory between systems.

The noise level is pretty decent. You can easily sit next to it while it is under full load. The heat level is pretty amazing, especially when two or more are running at the same time.

I've run these systems for weeks at a time with no problems.

josh · Feb 8, 2021

larrysb said:
On the RTX Titan, I see power limit throttling before thermal throttling most of the time.

This is one of mine. It's a Asus X99 WS-E motherboard, Xeon E5-2689v4, 128gb RAM, 2 RTX Titans, ConnectX5 25gb ethernet card in it. The Noctua cooler is super quiet and even under a full-power Prime95+AVX work load never goes above 50c. It is a Corsair Carbide 540 case, so tons of room and airflow. 1600w EVGA power supply. The 25gbe network card has a small fan directed on it. I would not go back to water cooling unless I had no other choice. Air is much better for long-term reliability.

The Asus board has a thermal sensor input, and I attached the sensor to the backplate of GPU #1. As the GPU heats up, the front 140mm fans kick up rapidly and automatically.

I've had 4x GPU's in the chassis in the past. I had to make special right-angle adapters for some of the connectors on the bottom edge of the board.

But I've found that 2x RTX and the 25gbe network to other machines makes a good solution for distributed. I've scaled it to as many as 4 workstations with a total of 8 GPU's with fairly good scaling. It would scale better with Quadro cards and RDMA/GPU-direct. With the consumer cards, it can do RDMA to the other computer's memory, but not directly to the GPU memory between systems.

The noise level is pretty decent. You can easily sit next to it while it is under full load. The heat level is pretty amazing, especially when two or more are running at the same time.

I've run these systems for weeks at a time with no problems.

View attachment 17489

Is there a reason why you don't just run the fans 24/7. Also, is it a good idea to blow the CPU fan downwards to circulate air across the backplate of the first GPU?

larrysb · Feb 9, 2021

josh said:
Is there a reason why you don't just run the fans 24/7. Also, is it a good idea to blow the CPU fan downwards to circulate air across the backplate of the first GPU?

It is a workstation and doesn't run GPU loads 24x7. The fans ramp up to the heat load as needed, automatically. Some of my workloads are CPU based too. Automatically managed fans are much better for a lot of reasons.

The GPU backplates don't really cool all that much. Never found any advantage to arranging the CPU fan that way to help the cards. But the CPU cooler is very effective in the normal orientation.

3090 driver handicap?

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

mmm.... bandwidth.

Active Member

mmm.... bandwidth.

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member