For better or for worse, I figured it out: it's the Amphenol G846A24211T4EU with mating receptacle G842AH242T4EU. Of course, the receptacle is out of stock everywhere, so the adventure continues...
Anyone know the part number of the front panel header on the S5652? Tyan seems to have switched to a higher-density connector on the most recent boards...
If you're just getting started, I would not go SFF. You pay a massive premium over a normal card (the 4090 is something like four times faster than the RTX 4000 Ada...), and performance is crap because you're limited to 70W.
If you must use a 70W card for personal reasons, get an RTX 2000 Ada. You...
So this could be a memory issue... I have a consumer Alder Lake system that is overclocked via an external clockgen, and it really does not like POSTing. After a ton of fiddling around, it ended up being a memory issue - even with manual timings, there was something causing the DRAM training to fail...
I feel like the lack of ReBAR and bifurcation is not the board's fault, given they never told you these features were supported. As for the 10GbE dropping, once every several days means it's not a thermal problem, since nothing on the board should have a thermal time constant of several days.
Actually building a cluster of 4090s would be incredibly confusing, because the GPUs only have an x16 link, which is shared between intra-node and inter-node comms. During FSDP, each GPU sends 8 * (number of parameters) bytes per iteration; the theoretical upload speed is 32 GB/sec, so if you...
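To put rough numbers on that, here's the back-of-the-envelope version (the 7B parameter count is just an example I picked, not anything specific to this thread):

```python
# Back-of-the-envelope FSDP comms estimate; illustrative numbers only.
num_params = 7e9              # example: a 7B-parameter model
bytes_per_param = 8           # per the 8 * (number of parameters) figure above
pcie_upload_bytes_per_s = 32e9  # theoretical x16 upload, bytes/sec

comm_bytes = bytes_per_param * num_params          # 56 GB sent per GPU per iteration
comm_seconds = comm_bytes / pcie_upload_bytes_per_s  # ~1.75 s of pure transfer time

print(f"{comm_bytes / 1e9:.0f} GB per iteration, "
      f"{comm_seconds:.2f} s at the theoretical link speed")
```

That transfer time has to be overlapped with (or added on top of) compute every single step, which is why the shared x16 link matters so much.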
I wouldn't be so sure about that. Especially as hardware gets discontinued, the more obscure variants are worth more. I bet the SFF guys would pay top dollar for a dual slot 4090 from a US seller!
TinyLlama should be available on HF as well (it's internally just a LlamaForCausalLM). I wouldn't bother with the optimized trainers until you prove out that you have your dataset wrangled and your model architecture defined - HF trainer is slow but very reliable and for an exotic domain like...
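If it helps, this is roughly what I mean by starting with the plain HF trainer; a minimal sketch, and the checkpoint id plus the toy dataset are my assumptions, not something from your setup:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Assumed checkpoint id; TinyLlama loads as a plain LlamaForCausalLM under the hood.
model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stand-in corpus; swap in your real domain data here.
texts = ["example document one", "example document two"]
train_dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

args = TrainingArguments(
    output_dir="tinyllama-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    logging_steps=10,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Once this trains end to end on your data, then it's worth swapping in a faster trainer.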
How many tokens per second do you get per GPU? As a reference, I get about 8K tokens per second training phi-1.5 using Huggingface's device_map="auto" (so only one GPU runs at a time) on two V100-16GB. 4090s should be about twice as fast.
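For comparison, this is roughly how I get that number; a timing sketch where the model id and batch shape are my assumptions:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" splits the layers across the two V100s, but only one GPU is
# busy at a time for a given microbatch, hence the modest throughput.
model_id = "microsoft/phi-1_5"  # assumed HF id for phi-1.5
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

batch_size, seq_len = 4, 1024   # example batch shape
batch = torch.randint(0, tokenizer.vocab_size, (batch_size, seq_len), device=model.device)

start = time.time()
out = model(input_ids=batch, labels=batch)
out.loss.backward()
torch.cuda.synchronize()
step_time = time.time() - start

print(f"{batch_size * seq_len / step_time:.0f} tokens/sec")
```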
I've never seen an NCCL error give wrong results before -...
The loss of performance on the SYS-4028 is probably because the host is limited to PCIe 3.0 speeds, so your gradient reductions are slower. Have you measured scaling going from 4 to 8 cards on your existing hardware? I somewhat suspect that, due to improper topology, you will have less than...
Use case? A lot depends on whether you need x16 to each GPU. Training does, and it furthermore benefits from a topology where all 8 GPUs share an x16 uplink to the CPUs via two levels of PCIe switches. That's an expensive setup - you need to put down $3K apiece for the special Chinese dual-slot...