I've certainly set up workstations with 4x GPU. I used the same mobo that Nvidia did in their DGX series workstations to build them. I even used the same EVGA 1600W power supply. (The Corsair Carbide 540 case is excellent for this, by the way.)
I used the Titan V in that 4x format as well. However, not only was NVLink software-disabled on those cards (the fingers are there, and electrically connected!), they were also severely clock-limited by the driver in GPU compute mode. I even wrote Jensen an email about it, and he responded!
The RTX series, with NVLink limited to two cards, puts a bit of a damper on the number of cards you can really make use of, even if you have the slots and PLX'd PCIe lanes to work with. In a workstation situation, you might as well go with 2x GPU if you need it.
I hit limits on VRAM, though, as our model complexity increased. Honestly, I could make use of the 48 GB VRAM cards.
I can also only pull so many amps from a single outlet. Then there's the heat problem and the noise problem. Even two Titan RTXs put out a lot of heat.
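The outlet limit is easy to show with napkin math. A quick sketch, assuming a US 120 V / 15 A circuit and the 80% continuous-load rule (the voltage and breaker rating are assumptions; adjust for your wiring):

```python
# Rough napkin math on why multi-GPU boxes strain wall circuits.
# Assumes a US 120 V outlet on a 15 A breaker; the common 80% rule
# caps continuous draw at 12 A. Numbers are illustrative only.

def amps(watts, volts=120.0):
    """Current drawn at a given wall voltage."""
    return watts / volts

psu_watts = 1600            # one EVGA 1600W supply at full load
breaker_budget = 15 * 0.80  # continuous limit on a 15 A breaker

print(f"One 1600 W PSU at full load: {amps(psu_watts):.1f} A")  # ~13.3 A
print(f"Continuous breaker budget:   {breaker_budget:.1f} A")   # 12.0 A
```

One fully loaded 1600 W supply already exceeds the continuous budget of a single 15 A circuit, so a second workstation on the same run is a breaker trip waiting to happen.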
Where the Quadros begin to shine is in scaling beyond 2 GPUs. If you go to a distributed compute model with Horovod or other methods, it can be useful to utilize multiple workstations when they're available. With the availability of high-speed network cards at 25 Gb/s or better and GPUDirect RDMA, the scaling of GPU and storage can be pretty good on a "mini-fabric".
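The core of Horovod's scaling is an allreduce that averages gradients across workers each step. A toy, single-process sketch of just that averaging (plain Python, not Horovod's actual NCCL/MPI ring implementation; the worker count and gradient values are made up):

```python
# Toy illustration of the allreduce-average Horovod applies to
# gradients. Real Horovod runs this over NCCL/MPI across machines;
# here the "workers" are just lists in one process.

def allreduce_mean(worker_grads):
    """Average per-parameter gradients across all workers.

    worker_grads: list of gradient vectors, one per worker.
    Returns the averaged vector every worker would receive.
    """
    n_workers = len(worker_grads)
    return [
        sum(param_grads) / n_workers
        for param_grads in zip(*worker_grads)  # iterate parameter-wise
    ]

# Four "workstations", each holding a local gradient for 3 parameters.
grads = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]
print(allreduce_mean(grads))  # [2.0, 2.0, 2.0]
```

The ring topology matters because each worker only talks to its neighbors, so the bandwidth cost per worker stays roughly constant as you add nodes, which is why a 25 Gb/s mini-fabric can hold up reasonably well.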
On the partner cards, I've found them to be not so reliable in many cases. They're designed for gamers and tend to have overclocking of everything, different throttling and fan curves, and honestly, poor cooling solutions that give way to RGB lighting and plastic dress-up kits. A lot of cost optimization gets applied to the reference designs so they can eke out a little more margin in competitive markets. Not all the FE cards are that great either, but the last couple of generations have been generally better than the AIBs in the long run. I've taken the occasional problem-child GPU out of service, run it in a game demo, and in many cases started seeing on-screen artifacts, or glitches and mystery crashes. Early on in our process, we thought, "hey, it's the same GPU on this card as the other, and it's $x.xx cheaper!" But you live and learn: you never get more than you pay for, and you often get less.
The Quadros we've used have been generally very reliable. ECC memory is a plus in my book. More conservative clocking and throttling tends to make them more reliable. The last thing you want after 50 hours of model training is a GPU-related crash, or a situation that leads you to suspect memory corruption. (BTDT) I just wish they weren't so ripping expensive.
On the new-generation RTX A6000, the virtualization is an interesting prospect. There are many cases where it would be nice to have one honking big GPU that can be virtualized out to several workloads, especially for development. Even on a single workstation I can think of good reasons to do that. I'd like to play with it and see how well it works with several containerized workloads, each with its own vGPU. Sharing a GPU through virtualization could be useful in other HPC applications, especially on a high-speed local network.
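For the containerized case, the poor man's approximation (no vGPU license needed) is pinning whole physical GPUs to containers with the NVIDIA Container Toolkit; a sketch, assuming Docker 19.03+ and the toolkit installed on the host. Actually slicing one card into several vGPUs requires NVIDIA's separately licensed vGPU software stack, which this does not do:

```shell
# Sketch: one whole physical GPU per container via the NVIDIA
# Container Toolkit. This is GPU *assignment*, not vGPU slicing;
# true vGPU partitioning needs NVIDIA's licensed host software.

# Container 1 sees only GPU 0
docker run --rm --gpus '"device=0"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Container 2 sees only GPU 1
docker run --rm --gpus '"device=1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

Each container's `nvidia-smi` should report a single visible device, which is usually enough isolation for parallel development workloads on one box.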
I readily agree that scaling beyond 4 physical GPUs in a typical office room, much less in a single computer chassis, can be pretty tough. I became real popular with my office landlord for popping the circuit breakers with the load imposed by multiple GPU workstations.