Ring Attention uses a modified (online, blockwise) softmax to split the KV cache across devices. It also does a better job than FSDP here, since every device keeps computing on its local KV block while the blocks are passed around the ring, so communication overlaps with computation instead of idling devices.
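
A minimal single-process sketch of that pattern, assuming no causal mask and no real collectives: the sequence is split into per-"device" query and KV shards, KV shards rotate around the ring, and each device merges partial results with the online-softmax rescaling trick. The function name, shapes, and shard count are illustrative, not the paper's implementation.

```python
# Ring-attention pattern, simulated in one process: each "device" i owns one
# query block; KV blocks rotate around the ring; partial softmax statistics
# (running max m, denominator l, numerator acc) are merged incrementally.
import numpy as np

def ring_attention(q, k, v, num_devices):
    d = q.shape[-1]
    q_blocks = np.split(q, num_devices)   # per-device query shards
    k_blocks = np.split(k, num_devices)   # per-device KV shards
    v_blocks = np.split(v, num_devices)

    outputs = []
    for i, q_i in enumerate(q_blocks):
        m = np.full(q_i.shape[0], -np.inf)   # running row-wise max of logits
        l = np.zeros(q_i.shape[0])           # running softmax denominator
        acc = np.zeros_like(q_i)             # running unnormalised numerator
        for step in range(num_devices):
            j = (i + step) % num_devices     # KV shard currently at device i
            s = q_i @ k_blocks[j].T / np.sqrt(d)     # local attention logits
            m_new = np.maximum(m, s.max(axis=-1))    # updated running max
            scale = np.exp(m - m_new)                # rescale old statistics
            p = np.exp(s - m_new[:, None])           # unnormalised local probs
            l = l * scale + p.sum(axis=-1)
            acc = acc * scale[:, None] + p @ v_blocks[j]
            m = m_new
        outputs.append(acc / l[:, None])     # normalise after all shards passed
    return np.concatenate(outputs)

# Sanity check against full-sequence softmax attention.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
s = q @ k.T / np.sqrt(8)
ref = np.exp(s - s.max(-1, keepdims=True))
ref = (ref / ref.sum(-1, keepdims=True)) @ v
assert np.allclose(ring_attention(q, k, v, 4), ref)
```

In the real distributed version the inner loop's shard lookup is a send/recv to ring neighbours, so each device only ever holds one KV block at a time.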

arxiv.org

Striped Attention: Faster Ring Attention for Causal Transformers
To help address the growing demand for ever-longer sequence lengths in transformer models, Liu et al. recently proposed Ring Attention, an exact attention algorithm capable of overcoming per-device memory bottlenecks by distributing self-attention across multiple devices. In this paper, we...
