Recent content by bayleyw

  1. Small Form Factor GPU for local LLM and light training

    Should perform slightly worse than a 4060, but with twice the RAM which is great for language modeling.
  2. Tyan S5652 front panel header

    For better or for worse, I figured it out: it's Amphenol G846A24211T4EU, with mating receptacle G842AH242T4EU. Of course, the receptacle is out of stock everywhere, so the adventure continues...
  3. Tyan S5652 front panel header

    Anyone know what part number the front panel header is on the S5652? Tyan seems to have switched to a higher density connector on the most recent boards...
  4. Small Form Factor GPU for local LLM and light training

    If you're just getting started I would not go SFF. You pay a massive premium over a normal card (the 4090 is something like four times faster than the RTX 4000 Ada...) and performance is crap because you're limited to 70W. If you must use a 70W card for personal reasons, get a RTX 2000 Ada. You...
  5. Findings with Asrock W790-WS, Xeon SPR-SP D0, and BCLK OC (and lack thereof)

    So this could be a memory issue... I have a consumer Alder Lake system that is overclocked via an external clockgen, and it really does not like POSTing. After a ton of fiddling around, it ended up being a memory issue - even with manual timings there was something causing the DRAM training to fail...
  6. 8x 4090 motherboard recommendations

    Definitely not going to hold 8x 450W 4090's...
  7. Gigabyte MF51-ES0 for an SSD NAS?

    I feel like the lack of ReBAR and bifurcation is not the board's fault, given they never told you these features were supported. As for the 10GbE dropping, once every several days means it's not a thermal problem, since nothing on the board should have a thermal time constant of several days.
  8. 8x 4090 motherboard recommendations

    Right, but if you're only training at 4k context it doesn't help you.
  9. 8x 4090 motherboard recommendations

    RingAttention is for long sequence lengths; I don't think it works well below a 16K-ish sequence length?
  10. 8x 4090 motherboard recommendations

    Actually building a cluster of 4090's would be incredibly confusing, because the GPUs only have an x16 link which is shared between intra-node comms and inter-node comms. During FSDP each GPU sends 8 * (number of parameters) bytes per iteration; the theoretical upload speed is 32GB/sec so if you...
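    The bandwidth arithmetic in that post can be sketched as follows. The 8 bytes/parameter per iteration and the ~32 GB/s upload figure are taken from the post; the 7B-parameter example model is my own assumption for illustration:

    ```python
    # Rough lower bound on per-GPU FSDP communication time per iteration,
    # using the figures from the post: ~8 bytes moved per parameter per
    # iteration, and ~32 GB/s of theoretical upload bandwidth on the x16 link.

    def fsdp_comm_seconds(num_params: float,
                          bytes_per_param: float = 8.0,
                          upload_gb_per_s: float = 32.0) -> float:
        """Bandwidth-bound lower bound on communication time, in seconds."""
        total_bytes = num_params * bytes_per_param
        return total_bytes / (upload_gb_per_s * 1e9)

    # Example (assumed model size): a 7B-parameter model moves ~56 GB per
    # iteration, i.e. at least ~1.75 s on a 32 GB/s link - and that link is
    # shared with any inter-node traffic.
    t = fsdp_comm_seconds(7e9)
    ```
    
    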
  11. 8x 4090 motherboard recommendations

    I wouldn't be so sure about that. Especially as hardware gets discontinued, the more obscure variants are worth more. I bet the SFF guys would pay top dollar for a dual slot 4090 from a US seller!
  12. 8x 4090 motherboard recommendations

    TinyLlama should be available on HF as well (it's internally just a LlamaForCausalLM). I wouldn't bother with the optimized trainers until you prove out that you have your dataset wrangled and your model architecture defined - HF trainer is slow but very reliable and for an exotic domain like...
  13. 8x 4090 motherboard recommendations

    How many tokens per second do you get per GPU? As a reference, I get about 8K tokens per second training phi-1.5 using Huggingface's device_map="auto" (so only one GPU runs at a time) on two V100-16GB. 4090s should be about twice as fast. I've never seen an NCCL error give wrong results before -...
  14. 8x 4090 motherboard recommendations

    The loss of performance on the SYS-4028 is probably because the host is limited to PCI-e 3.0 speeds, so your gradient reductions are slower. Have you measured scaling going from 4 to 8 cards on your existing hardware? I somewhat suspect that due to improper topology, you will have less than...
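    A quick sketch of why the PCIe 3.0 host link matters for gradient reductions. The ~16 GB/s and ~32 GB/s x16 figures are approximate usable rates I'm assuming, not from the post; the 2 GB gradient size is a made-up example (roughly a 1B-parameter model in fp16):

    ```python
    # Why a PCIe 3.0 host link roughly doubles gradient-reduction time vs 4.0.
    # Assumed approximate usable x16 bandwidths (not measured figures):
    PCIE3_X16_GB_S = 16.0   # ~16 GB/s
    PCIE4_X16_GB_S = 32.0   # ~32 GB/s

    def reduction_seconds(grad_bytes: float, link_gb_per_s: float) -> float:
        """Bandwidth-bound lower bound for moving gradients over the host link."""
        return grad_bytes / (link_gb_per_s * 1e9)

    # Example: 2 GB of fp16 gradients (a ~1B-parameter model, assumed).
    slow = reduction_seconds(2e9, PCIE3_X16_GB_S)   # 0.125 s
    fast = reduction_seconds(2e9, PCIE4_X16_GB_S)   # 0.0625 s
    ```

    This is only the bandwidth term; real all-reduce time also depends on topology, which is why measuring 4-to-8 card scaling on the existing hardware is the right first step.
    
    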
  15. 8x 4090 motherboard recommendations

    Use case? A lot depends on whether you need x16 to each GPU. Training does, and furthermore benefits from a topology where all 8 GPUs share an x16 uplink to the CPUs via two levels of PCIe switches. That's an expensive setup - you need to put down $3K apiece for the special Chinese dual slot...