Search results

  1. Small Form Factor GPU for local LLM and light training

    Should perform slightly worse than a 4060, but with twice the VRAM, which is great for language modeling.
  2. Tyan S5652 front panel header

    For better or for worse, I figured it out: it is Amphenol G846A24211T4EU, with mating receptacle G842AH242T4EU. Of course, the receptacle is out of stock everywhere, so the adventure continues...
  3. Tyan S5652 front panel header

    Anyone know the part number of the front panel header on the S5652? Tyan seems to have switched to a higher-density connector on the most recent boards...
  4. Small Form Factor GPU for local LLM and light training

    If you're just getting started, I would not go SFF. You pay a massive premium over a normal card (the 4090 is something like four times faster than the RTX 4000 Ada...) and performance is crap because you're limited to 70W. If you must use a 70W card for personal reasons, get an RTX 2000 Ada. You...
  5. Findings with Asrock W790-WS, Xeon SPR-SP D0, and BCLK OC (and lack thereof)

    So this could be a memory issue... I have a consumer Alder Lake system that is overclocked via an external clockgen, and it really does not like POSTing. After a ton of fiddling around, it ended up being a memory issue - even with manual timings there was something causing the DRAM training to fail...
  6. 8x 4090 motherboard recommendations

    Definitely not going to hold 8x 450W 4090s...
  7. Gigabyte MF51-ES0 for an SSD NAS?

    I feel like the lack of ReBAR and bifurcation is not the board's fault, given they never told you these features were supported. As for the 10GbE dropping, once every several days means it's not a thermal problem, since nothing on the board should have a thermal time constant of several days.
  8. 8x 4090 motherboard recommendations

    Right, but if you're only training at 4K context, it doesn't help you.
  9. 8x 4090 motherboard recommendations

    RingAttention is for long sequence lengths; I don't think it works well for anything less than 16K-ish sequences?
  10. 8x 4090 motherboard recommendations

    Actually building a cluster of 4090s would be incredibly confusing, because the GPUs only have an x16 link, which is shared between intra-node comms and inter-node comms. During FSDP each GPU sends 8 * (number of parameters) bytes per iteration; the theoretical upload speed is 32GB/sec, so if you...
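
    A quick back-of-envelope sketch of that comm budget (assuming the 8 bytes/parameter figure above and a 32 GB/sec uplink; the model sizes are just illustrative):

        # Per-iteration FSDP traffic per GPU over the host link,
        # using the figures quoted above.
        def fsdp_comm_time(num_params: float, link_bytes_per_sec: float = 32e9) -> float:
            """Seconds per iteration spent purely on FSDP traffic."""
            bytes_per_iter = 8 * num_params  # 8 bytes/parameter, as above
            return bytes_per_iter / link_bytes_per_sec

        for params in (1.1e9, 7e9, 13e9):  # TinyLlama-ish, 7B, 13B
            print(f"{params / 1e9:4.1f}B params -> "
                  f"{fsdp_comm_time(params):5.2f} s/iter of pure comms")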
  11. 8x 4090 motherboard recommendations

    I wouldn't be so sure about that. Especially as hardware gets discontinued, the more obscure variants are worth more. I bet the SFF guys would pay top dollar for a dual slot 4090 from a US seller!
  12. 8x 4090 motherboard recommendations

    TinyLlama should be available on HF as well (it's internally just a LlamaForCausalLM). I wouldn't bother with the optimized trainers until you prove out that you have your dataset wrangled and your model architecture defined - HF Trainer is slow but very reliable, and for an exotic domain like...
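
    For reference, a minimal plain-Trainer loop along those lines (a sketch: the hub id and corpus path are placeholders, so double-check the actual TinyLlama id on HF):

        # Minimal causal-LM fine-tune with the plain HF Trainer.
        from datasets import load_dataset
        from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                  DataCollatorForLanguageModeling, Trainer,
                                  TrainingArguments)

        model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder - check the hub
        tok = AutoTokenizer.from_pretrained(model_id)
        tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token
        model = AutoModelForCausalLM.from_pretrained(model_id)

        # "corpus.txt" stands in for your own wrangled dataset.
        ds = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
        ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
                    remove_columns=["text"])

        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                                   gradient_accumulation_steps=8, bf16=True,
                                   logging_steps=10),
            train_dataset=ds,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
        )
        trainer.train()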
  13. 8x 4090 motherboard recommendations

    How many tokens per second do you get per GPU? As a reference, I get about 8K tokens per second training phi-1.5 using Hugging Face's device_map="auto" (so only one GPU runs at a time) on two V100-16GB. 4090s should be about twice as fast. I've never seen an NCCL error give wrong results before -...
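
    Roughly how I'd measure that (an illustrative harness, not from the thread; the hub id is an assumption):

        # Rough tokens/sec measurement for naive model-parallel training.
        import time
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "microsoft/phi-1_5"  # assumed hub id
        tok = AutoTokenizer.from_pretrained(model_id)
        tok.pad_token = tok.eos_token
        model = AutoModelForCausalLM.from_pretrained(
            model_id, device_map="auto",  # layers split across GPUs, one active at a time
            torch_dtype=torch.float16)

        batch = tok(["some training text"] * 4, return_tensors="pt",
                    padding=True).to(model.device)
        labels = batch["input_ids"].clone()

        n_steps, n_tokens = 10, 0
        start = time.time()
        for _ in range(n_steps):
            loss = model(**batch, labels=labels).loss
            loss.backward()      # no optimizer step - we only want throughput
            model.zero_grad()
            n_tokens += batch["input_ids"].numel()
        print(f"{n_tokens / (time.time() - start):,.0f} tokens/sec")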
  14. 8x 4090 motherboard recommendations

    The loss of performance on the SYS-4028 is probably because the host is limited to PCI-e 3.0 speeds, so your gradient reductions are slower. Have you measured scaling going from 4 to 8 cards on your existing hardware? I somewhat suspect that, due to improper topology, you will have less than...
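
    Worth measuring directly. A quick all-reduce benchmark like this (my own harness, single node; launch with torchrun --nproc_per_node=<gpus>) will show the reduction bandwidth you actually get:

        # Quick NCCL all-reduce bandwidth check.
        import time
        import torch
        import torch.distributed as dist

        dist.init_process_group("nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)  # single node: rank == local rank

        x = torch.randn(64 * 1024 * 1024, device="cuda")  # 256 MB of fp32
        dist.all_reduce(x)  # warm-up
        torch.cuda.synchronize()

        n = 20
        start = time.time()
        for _ in range(n):
            dist.all_reduce(x)
        torch.cuda.synchronize()
        elapsed = (time.time() - start) / n

        if rank == 0:
            world = dist.get_world_size()
            gb = x.numel() * 4 / 1e9
            # ring all-reduce moves ~2*(N-1)/N of the buffer per GPU
            busbw = gb * 2 * (world - 1) / world / elapsed
            print(f"{elapsed * 1e3:.1f} ms per {gb:.2f} GB all-reduce, "
                  f"~{busbw:.1f} GB/s bus bw")
        dist.destroy_process_group()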
  15. 8x 4090 motherboard recommendations

    Use case? A lot depends on whether you need x16 to each GPU. Training does, and furthermore benefits from a topology where all 8 GPUs share an x16 uplink to the CPUs via two levels of PCIe switches. That's an expensive setup - you need to put down $3K apiece for the special Chinese dual slot...
  16. New Homelab build, should I go 14700k or 14900k?

    Heh, that's a big change from "the ram usage will be pretty low" a few posts up. If your requirements are 512GB RAM and 16 lanes of NVMe, then yes, you are forced onto an enterprise platform. Insofar as you accept that performance is bad anyway compared to the desktop platforms, Skylake-SP is not a...
  17. New Homelab build, should I go 14700k or 14900k?

    Oof, did you buy the Epyc already? The 7313 is a full 2GHz and one major generation (~15% IPC deficit) behind the 7950X in single-core performance, and ~1GHz behind in multicore. It's not a good use of money unless you are looking for RDIMM support and lots of PCI-e lanes - the target use case is high...
  18. New Homelab build, should I go 14700k or 14900k?

    7950X. Much more efficient than a 14900K, no E-cores to cause you scheduler trouble, and no ISA disadvantage, since the consumer Intel parts do not support AVX-512. But we need more details. Is your Jupyter work GPU-accelerated? How much RAM do your databases need? The consumer parts only...
  19. SXM2 over PCIe

    Titan V/Quadro GV100 are the last fp64-capable cards with display outputs, so they have some value for scientific simulations, especially if you're a researcher running commercial software that doesn't like living on the cloud. For language modeling, in rough order of viability: 3090/3090 Ti...
  20. Learning self hosted AI/machine learning, budget server build questions

    Language models can be partitioned across multiple GPUs *with the caveat* that only one GPU is active at any one time. This is a huge caveat, because for regular mortals (and even minor startups) this caps your memory bandwidth at about 1 Tbyte/sec and therefore puts an upper limit on your token...
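
    That ceiling is easy to estimate: single-stream decoding has to read every weight once per token, so (my arithmetic, assuming fp16 weights and the ~1 Tbyte/sec figure above):

        # Upper bound on decode speed when only one GPU is active at a time.
        def max_tokens_per_sec(num_params: float, bytes_per_param: float = 2.0,
                               bandwidth: float = 1e12) -> float:
            """tokens/sec <= bandwidth / bytes of weights read per token."""
            return bandwidth / (num_params * bytes_per_param)

        for p in (7e9, 13e9, 70e9):
            print(f"{p / 1e9:3.0f}B fp16 -> <= {max_tokens_per_sec(p):6.1f} tokens/sec")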