Recent content by bayleyw

  1. SXM2 over PCIe

    were the GPUs that were falling off the bus the ones connected to the slot further from the CPU by chance?
  2. Proof that Asrock (and SM, others) are/were gatekeeping Milan on cheaper boards for no reason

    I wonder how stable it is. The 1st gen Epyc systems uniformly lack official Milan support; maybe there is some electrical gremlin that rears its head under heavy I/O load or something.
  3. GPU recommendation for mass text processing, summarizing and data analysis, serving API requests etc

    the 22G 2080 Ti are on aliexpress; they are hard to find on ebay (there is one ebay vendor who claims to be in palo alto, but he is frequently OOS). I shouldn't be so hard on amd; after all, we need competition in the industry :rolleyes: and amd is getting better at supporting compute use...
  4. GPU recommendation for mass text processing, summarizing and data analysis, serving API requests etc

    under $500: 2080 Ti 22G ($430)
    best deal: used 3090 (about $750)
    you want a relatively recent nvidia card with tensor cores, ideally ampere or newer, but for now turing also works. you also need enough vram to hold the model, so about 5 bits per parameter, plus the KV cache, which can vary...
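    a quick back-of-the-envelope sketch of that sizing rule, assuming the ~5 bits per parameter figure above (roughly a 4-bit quant plus overhead) and a placeholder KV-cache size, since that term varies with context length and batch size:

        # rough VRAM estimate for local LLM inference (illustrative numbers)
        def vram_estimate_gb(params_billion, bits_per_param=5.0, kv_cache_gb=2.0):
            # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
            weights_gb = params_billion * bits_per_param / 8
            return weights_gb + kv_cache_gb

        # a 34B model: 34 * 5 / 8 = 21.25 GB of weights plus ~2 GB of KV cache,
        # which is why 22G and 24G cards are the sweet spot for that size class
        print(f"{vram_estimate_gb(34):.1f} GB")  # ~23.2 GB
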
  5. SXM2 over PCIe

    do you get any MCEs in dmesg before the devices drop off the bus? if so, that's definitely telltale of bad PCIe signaling.
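    a minimal sketch of that check on a Linux host (the grep patterns are illustrative, not exhaustive):

        # scan dmesg for machine-check and PCIe error lines (run as root)
        import re
        import subprocess

        log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
        for line in log.splitlines():
            if re.search(r"machine check|mce|aer|pcieport", line, re.IGNORECASE):
                print(line)
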
  6. SXM2 over PCIe

    ok, thanks. so about $1200 all in for the GPUs and trimmings except the case, or $1900 if you add the waterblocks (are there any cheaper than the Bykskis?)
  7. SXM2 over PCIe

    if you go by 'median trustworthy-looking' ebay/taobao pricing, it is $750 for the GPUs, $300 for the board not counting shipping, and $200ish for the heatsinks shipped from china (or $300 from the US), $1250 total. also, I am not sure if folks here have actually used an sxm2-based system (i've dealt...
  8. SXM2 over PCIe

    $800 for the GPUs, plus AOM-SXMV, risers, heatsinks, fans, and whatever contraption you build to mechanically hold it all together, gets pretty close to 4x 2080 Ti (the $150 V100s aren't repeatable right now), and I'd definitely pay a bit more to get 88GB instead of 64GB. for 60 bucks more you can...
  9. SXM2 over PCIe

    somewhat off topic, but speaking of mixed precision on a budget, check out these beauties (eBay link): for $165 you get 2x16+1x8 out of each socket at 3-slot spacing, which lets you build hives of 2x NVLINK plus shared-memory communication to a NIC (which is not as good as RDMA, but at least you...
  10. SXM2 over PCIe

    the user I was replying to is clearly interested in mixed-precision AI work (transformers and stable diffusion)... I did have a lively debate a few posts up with an fp64 user though :rolleyes:
  11. SXM2 over PCIe

    probably better off with 2x 3090 at the same price. quoting myself from earlier: and also, training resnet50 at high batch size is not a good benchmark in 2024 and basically offers no information on what real-world performance is like on modern models; the network is (1) fully convolutional...
  12. L40S vs RTX 6000 ADA - for LLMs

    sure, if you have a revenue model that supports finetuning (some do), but many (most?) genAI apps are inference-only
  13. L40S vs RTX 6000 ADA - for LLMs

    you don't host your own inference servers not because it's not scalable, but because it's not elastic. you have to buy sufficient hardware to support your target latency at peak load, which means off-peak your *extremely expensive* servers are idling. also, from a business standpoint you're putting...
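    a toy illustration of the peak-vs-average economics (every number here is made up; the point is the utilization ratio, not the dollar figures):

        # effective cost per busy GPU-hour when provisioning for peak load
        peak_gpus = 10            # hypothetical capacity needed for latency targets at peak
        avg_busy_gpus = 3         # hypothetical average concurrent demand over a day
        cost_per_gpu_hour = 2.00  # hypothetical amortized cost of one self-hosted GPU

        utilization = avg_busy_gpus / peak_gpus
        effective_cost = cost_per_gpu_hour / utilization

        print(f"utilization: {utilization:.0%}")                   # 30%
        print(f"effective $/busy GPU-hour: {effective_cost:.2f}")  # 6.67

    at 30% utilization you pay over 3x the sticker rate per useful GPU-hour, which is exactly the gap elastic capacity closes.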
  14. SXM2 over PCIe

    does Volta actually show significant efficiency gains over Pascal? TSMC 12nm was an optimized version of TSMC 16nm, not a shrink, so I would expect 2x P100 at 150W each to outperform 1x V100 at 300W.
  15. SXM2 over PCIe

    you were saying that you wanted Volta because it had tensor cores that you might use in the future; I'm arguing that by the time you get around to using the tensor cores they might not be supported anymore. if you don't need tensor cores, P100 is a very cost-effective way to get fp64, if you...