For better or for worse, I figured it out: it's the Amphenol G846A24211T4EU with mating receptacle G842AH242T4EU. Of course, the receptacle is out of stock everywhere, so the adventure continues...
Anyone know the part number of the front panel header on the S5652? Tyan seems to have switched to a higher-density connector on the most recent boards...
If you're just getting started, I would not go SFF. You pay a massive premium over a normal card (the 4090 is something like four times faster than the RTX 4000 Ada...), and performance is crap because you're limited to 70W.
If you must use a 70W card for personal reasons, get an RTX 2000 Ada. You...
So this could be a memory issue... I have a consumer Alder Lake system that is overclocked via an external clockgen, and it really does not like POSTing. After a ton of fiddling around, it ended up being a memory issue - even with manual timings, there was something causing the DRAM training to fail...
I feel like the lack of ReBAR and bifurcation is not the board's fault, given they never told you these features were supported. As for the 10GbE dropping, once every several days means it's not a thermal problem, since nothing on the board should have a thermal time constant of several days.
Actually building a cluster of 4090s would be incredibly confusing, because the GPUs only have an x16 link, which is shared between intra-node and inter-node comms. During FSDP, each GPU sends 8 * (number of parameters) bytes per iteration; the theoretical upload speed is 32 GB/sec, so if you...
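To put rough numbers on that, here's the back-of-the-envelope version (the 7B parameter count is just an example I picked, not anything specific to this thread):

```python
# Back-of-the-envelope FSDP comms estimate; illustrative numbers only.
num_params = 7e9              # example: a 7B-parameter model
bytes_per_param = 8           # per the 8 * (number of parameters) figure above
pcie_upload_bytes_per_s = 32e9  # theoretical x16 upload, bytes/sec

comm_bytes = bytes_per_param * num_params          # 56 GB sent per GPU per iteration
comm_seconds = comm_bytes / pcie_upload_bytes_per_s  # ~1.75 s of pure transfer time

print(f"{comm_bytes / 1e9:.0f} GB per iteration, "
      f"{comm_seconds:.2f} s at the theoretical link speed")
```

That transfer time has to be overlapped with (or added on top of) compute every single step, which is why the shared x16 link matters so much.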
I wouldn't be so sure about that. Especially as hardware gets discontinued, the more obscure variants are worth more. I bet the SFF guys would pay top dollar for a dual slot 4090 from a US seller!
TinyLlama should be available on HF as well (it's internally just a LlamaForCausalLM). I wouldn't bother with the optimized trainers until you prove out that you have your dataset wrangled and your model architecture defined - HF trainer is slow but very reliable and for an exotic domain like...
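If it helps, this is roughly what I mean by starting with the plain HF trainer; a minimal sketch, and the checkpoint id plus the toy dataset are my assumptions, not something from your setup:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Assumed checkpoint id; TinyLlama loads as a plain LlamaForCausalLM under the hood.
model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stand-in corpus; swap in your real domain data here.
texts = ["example document one", "example document two"]
train_dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

args = TrainingArguments(
    output_dir="tinyllama-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    logging_steps=10,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Once this trains end to end on your data, then it's worth swapping in a faster trainer.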
How many tokens per second do you get per GPU? As a reference, I get about 8K tokens per second training phi-1.5 using Huggingface's device_map="auto" (so only one GPU runs at a time) on two V100-16GB. 4090s should be about twice as fast.
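For comparison, this is roughly how I get that number; a timing sketch where the model id and batch shape are my assumptions:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" splits the layers across the two V100s, but only one GPU is
# busy at a time for a given microbatch, hence the modest throughput.
model_id = "microsoft/phi-1_5"  # assumed HF id for phi-1.5
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

batch_size, seq_len = 4, 1024   # example batch shape
batch = torch.randint(0, tokenizer.vocab_size, (batch_size, seq_len), device=model.device)

start = time.time()
out = model(input_ids=batch, labels=batch)
out.loss.backward()
torch.cuda.synchronize()
step_time = time.time() - start

print(f"{batch_size * seq_len / step_time:.0f} tokens/sec")
```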
I've never seen an NCCL error give wrong results before -...
The loss of performance on the SYS-4028 is probably because the host is limited to PCIe 3.0 speeds, so your gradient reductions are slower. Have you measured scaling going from 4 to 8 cards on your existing hardware? I somewhat suspect that, due to improper topology, you will have less than...
Use case? A lot depends on whether you need x16 to each GPU. Training does, and it furthermore benefits from a topology where all 8 GPUs share an x16 uplink to the CPUs via two levels of PCIe switches. That's an expensive setup - you need to put down $3K apiece for the special Chinese dual-slot...