Does anyone here have all 8 memory channels populated? If so, could you run llama-bench? My quad-channel results are below. I only got around 12-13 t/s with dual channel, so I think the number of memory channels affects performance a lot.
32 threads with half cores disabled
'/Documents/llama.cpp/build/bin/llama-bench' --numa distribute -t 32 -m '/Downloads/llama-2-7b.Q4_0.gguf' -p 0 -n 128,256,512
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg128 | 21.92 ± 0.83 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg256 | 22.97 ± 0.13 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg512 | 22.47 ± 0.02 |
104 threads with all cores enabled
'/Documents/llama.cpp/build/bin/llama-bench' --numa distribute -t 104 -m '/Downloads/llama-2-7b.Q4_0.gguf' -p 0 -n 128,256,512
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 104 | tg128 | 24.22 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 104 | tg256 | 23.95 ± 0.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU | 104 | tg512 | 23.33 ± 0.01 |
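If anyone does post 8-channel numbers, it would also help to confirm how many DIMM slots are actually populated. A rough sketch for checking that on Linux, assuming dmidecode and numactl are installed (exact field names can vary between dmidecode versions, so treat the count as approximate):

# rough count of populated DIMM slots; empty slots report "No Module Installed"
sudo dmidecode -t memory | grep "Size:" | grep -vc "No Module Installed"
# show the NUMA node layout that --numa distribute spreads work across
numactl --hardware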