GPU Memory Bandwidth Benchmark


CyklonDX

Who is Ian? Are you referring to me? Because I have all those cards and my name is Ian. I am “Ian&Steve” on Einstein.
Yes

V100 - 2500/3 = 833s per task
I presume that if you enable ECC and retest, it will be slower and closer to the Titan V.
(Wouldn't a 3080 Ti be faster than a Titan V, if the CUDA app tracks memory bandwidth that closely?)
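
A minimal sketch of the arithmetic in the V100 line, assuming the 2500 s figure is the wall-clock time for a batch of 3 tasks running concurrently on one GPU (the post does not spell this out). The function name is made up for illustration.

```python
def seconds_per_task(batch_seconds: float, concurrent_tasks: int) -> float:
    """Effective time per task when several tasks share one GPU."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return batch_seconds / concurrent_tasks


# Assumed reading of "2500/3": 3 concurrent tasks finishing in 2500 s.
v100 = seconds_per_task(2500, 3)
print(f"V100: {v100:.0f} s per task")
```

The same helper can be reused for any other card once its batch time and task count are known.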


//(I would still recommend trying the tensor cores. I presume they will give you a greater performance uplift if the results can pass validation. As I recall, there is also the "TMA" feature for speeding up memory transfers, though I'm not sure it works on anything but Hopper.)

You can also try this one and see whether it works on Ampere/Volta.

This part of the code doesn't state that you need Hopper (it should work on Ampere), and I think this is what you want.

(samples)
[Attached image: 1740493727792.png]

and a vid about it
 

gsrcrxsi

I presume that if you enable ECC and retest, it will be slower and closer to the Titan V.
(Wouldn't a 3080 Ti be faster than a Titan V, if the CUDA app tracks memory bandwidth that closely?)
Enabling or disabling ECC doesn't meaningfully change computation speed with Einstein, from what I've seen, but it does seem to use more power when it is turned on. That is based on my testing with the A100-Drive system.

Yes, a 3080 Ti is faster than a Titan V for Einstein, but it uses about 2x the power. The Titan V is much more efficient, which is why I use it (and V100s) instead. The 3080 Ti is a little faster than a V100 (~10%) for Einstein when all settings and the app are the same.


//(I would still recommend trying to do tensors I presume it will give you greater performance uplift if you can pass through validation - tensors as i recall can also speed up memory "TMA" tho not sure if it works with anything but hopper.)
I'm not sure how much of the computation is matrix multiplication, so I'm not sure the tensor cores can be used that much. I would assume that the compiler and/or scheduler would recognize any matching computations and send them to the tensor cores, wouldn't it? The tensor cores on the Titan V and V100 don't have as much flexibility as those on Ampere and up.

I'll take a look at the other resources you linked to
 

CyklonDX

tensors on the Titan V and V100 dont have as much flexibility as on Ampere and up.
It looks like there are no benefits on Volta except for FP16 on the Tensor Cores; only Ampere and up actually show good gains.

*(But I would note that the 4090/5090 have locked tensor performance, likely to stop them from cutting into NVIDIA's Tesla/Quadro products.)


scheduler would recognize any matching computations and send them to the tensors wouldnt it?
It's not enabled by default; you would need to call it explicitly and enable mixed precision.
TF32 isn't really FP32. As I understand it, there are a few different approaches to using tensor cores, and the most common GEMM path takes two FP16 operands and accumulates the result in FP32. So it might be impossible to reach a precision level that the validator accepts.

I'm not sure how it really works in code. NVIDIA supplies at least a couple of different tensor modes that produce different numbers (sparsity, accumulation types and so on), so each might have its own uses and caveats.
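
To put a rough number on the precision concern: TF32 keeps FP32's 8 exponent bits but only 10 of its 23 mantissa bits. The sketch below emulates that on the CPU by clearing the low 13 mantissa bits of a float32. It is an illustration only: it truncates, whereas real hardware may round differently, and the helper names are made up here.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32_truncated(x: float) -> float:
    """Keep sign, 8 exponent bits and the top 10 mantissa bits of a float32.

    Assumption: plain truncation stands in for whatever rounding the
    tensor cores actually apply.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 of the 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x = 1.0 + 2.0 ** -12  # exactly representable in FP32
print(to_float32(x))         # FP32 keeps the small term
print(to_tf32_truncated(x))  # a 10-bit mantissa drops it
```

With a 10-bit mantissa the relative step between neighbouring values is about 2^-10 (roughly 1e-3), against 2^-23 for FP32, which is why results may not pass a validator tuned for FP32 output.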
 

name stolen

Apple M2 Mac mini, base with 16GB

Code:
[0] Platform name: Apple vendor:Apple version:OpenCL 1.2 (Dec 13 2024 23:08:33) <-- selected
    [0] Device Name: Apple M2
           Type: 2 (GPU: 4, CPU: 2)
           Vendor: Intel
           Max Compute Units: 8
           Global Memory: 17179869184
           Max Clock Frequency: 2400
           Max Alloc. Memory: 4294967296
           Local Memory: 32768
           Available: 1
    [1] Device Name: Apple M2 <-- selected
           Type: 4 (GPU: 4, CPU: 2)
           Vendor: Apple
           Max Compute Units: 10
           Global Memory: 11453251584
           Max Clock Frequency: 1000
           Max Alloc. Memory: 2147483648
           Local Memory: 32768
           Available: 1
allocated 11408506880 bytes 10.62 GB
Running Bench test on:Apple M2
Chunk:   0 (   0- 128)MB Speed: 91.23 GByte/s OK
Chunk:   1 ( 128- 256)MB Speed: 91.25 GByte/s OK
Chunk:   2 ( 256- 384)MB Speed: 90.50 GByte/s OK
Chunk:   3 ( 384- 512)MB Speed: 91.43 GByte/s OK
Chunk:   4 ( 512- 640)MB Speed: 91.32 GByte/s OK
Chunk:   5 ( 640- 768)MB Speed: 90.70 GByte/s OK
Chunk:   6 ( 768- 896)MB Speed: 90.85 GByte/s OK
Chunk:   7 ( 896-1024)MB Speed: 90.94 GByte/s OK
Chunk:   8 (1024-1152)MB Speed: 90.67 GByte/s OK
Chunk:   9 (1152-1280)MB Speed: 91.31 GByte/s OK
Chunk:  10 (1280-1408)MB Speed: 90.17 GByte/s OK
Chunk:  11 (1408-1536)MB Speed: 90.93 GByte/s OK
Chunk:  12 (1536-1664)MB Speed: 90.18 GByte/s OK
Chunk:  13 (1664-1792)MB Speed: 91.25 GByte/s OK
Chunk:  14 (1792-1920)MB Speed: 90.70 GByte/s OK
Chunk:  15 (1920-2048)MB Speed: 90.54 GByte/s OK
Chunk:  16 (2048-2176)MB Speed: 90.29 GByte/s OK
Chunk:  17 (2176-2304)MB Speed: 90.57 GByte/s OK
Chunk:  18 (2304-2432)MB Speed: 90.56 GByte/s OK
Chunk:  19 (2432-2560)MB Speed: 90.55 GByte/s OK
Chunk:  20 (2560-2688)MB Speed: 90.59 GByte/s OK
Chunk:  21 (2688-2816)MB Speed: 91.19 GByte/s OK
Chunk:  22 (2816-2944)MB Speed: 90.99 GByte/s OK
Chunk:  23 (2944-3072)MB Speed: 90.72 GByte/s OK
Chunk:  24 (3072-3200)MB Speed: 91.50 GByte/s OK
Chunk:  25 (3200-3328)MB Speed: 91.36 GByte/s OK
Chunk:  26 (3328-3456)MB Speed: 90.44 GByte/s OK
Chunk:  27 (3456-3584)MB Speed: 90.81 GByte/s OK
Chunk:  28 (3584-3712)MB Speed: 90.46 GByte/s OK
Chunk:  29 (3712-3840)MB Speed: 90.73 GByte/s OK
Chunk:  30 (3840-3968)MB Speed: 91.35 GByte/s OK
Chunk:  31 (3968-4096)MB Speed: 91.36 GByte/s OK
Chunk:  32 (4096-4224)MB Speed: 91.07 GByte/s OK
Chunk:  33 (4224-4352)MB Speed: 90.79 GByte/s OK
Chunk:  34 (4352-4480)MB Speed: 90.98 GByte/s OK
Chunk:  35 (4480-4608)MB Speed: 91.12 GByte/s OK
Chunk:  36 (4608-4736)MB Speed: 90.96 GByte/s OK
Chunk:  37 (4736-4864)MB Speed: 90.78 GByte/s OK
Chunk:  38 (4864-4992)MB Speed: 90.73 GByte/s OK
Chunk:  39 (4992-5120)MB Speed: 90.79 GByte/s OK
Chunk:  40 (5120-5248)MB Speed: 90.58 GByte/s OK
Chunk:  41 (5248-5376)MB Speed: 90.88 GByte/s OK
Chunk:  42 (5376-5504)MB Speed: 90.96 GByte/s OK
Chunk:  43 (5504-5632)MB Speed: 90.60 GByte/s OK
Chunk:  44 (5632-5760)MB Speed: 91.09 GByte/s OK
Chunk:  45 (5760-5888)MB Speed: 90.86 GByte/s OK
Chunk:  46 (5888-6016)MB Speed: 91.28 GByte/s OK
Chunk:  47 (6016-6144)MB Speed: 91.24 GByte/s OK
Chunk:  48 (6144-6272)MB Speed: 91.26 GByte/s OK
Chunk:  49 (6272-6400)MB Speed: 90.24 GByte/s OK
Chunk:  50 (6400-6528)MB Speed: 91.17 GByte/s OK
Chunk:  51 (6528-6656)MB Speed: 91.16 GByte/s OK
Chunk:  52 (6656-6784)MB Speed: 90.62 GByte/s OK
Chunk:  53 (6784-6912)MB Speed: 90.33 GByte/s OK
Chunk:  54 (6912-7040)MB Speed: 91.45 GByte/s OK
Chunk:  55 (7040-7168)MB Speed: 91.20 GByte/s OK
Chunk:  56 (7168-7296)MB Speed: 91.58 GByte/s OK
Chunk:  57 (7296-7424)MB Speed: 91.27 GByte/s OK
Chunk:  58 (7424-7552)MB Speed: 90.78 GByte/s OK
Chunk:  59 (7552-7680)MB Speed: 91.01 GByte/s OK
Chunk:  60 (7680-7808)MB Speed: 90.46 GByte/s OK
Chunk:  61 (7808-7936)MB Speed: 91.08 GByte/s OK
Chunk:  62 (7936-8064)MB Speed: 90.83 GByte/s OK
Chunk:  63 (8064-8192)MB Speed: 91.03 GByte/s OK
Chunk:  64 (8192-8320)MB Speed: 91.36 GByte/s OK
Chunk:  65 (8320-8448)MB Speed: 90.72 GByte/s OK
Chunk:  66 (8448-8576)MB Speed: 91.07 GByte/s OK
Chunk:  67 (8576-8704)MB Speed: 90.58 GByte/s OK
Chunk:  68 (8704-8832)MB Speed: 90.07 GByte/s OK
Chunk:  69 (8832-8960)MB Speed: 91.15 GByte/s OK
Chunk:  70 (8960-9088)MB Speed: 89.66 GByte/s OK
Chunk:  71 (9088-9216)MB Speed: 89.95 GByte/s OK
Chunk:  72 (9216-9344)MB Speed: 90.05 GByte/s OK
Chunk:  73 (9344-9472)MB Speed: 90.31 GByte/s OK
Chunk:  74 (9472-9600)MB Speed: 90.86 GByte/s OK
Chunk:  75 (9600-9728)MB Speed: 90.48 GByte/s OK
Chunk:  76 (9728-9856)MB Speed: 90.34 GByte/s OK
Chunk:  77 (9856-9984)MB Speed: 90.25 GByte/s OK
Chunk:  78 (9984-10112)MB Speed: 89.96 GByte/s OK
Chunk:  79 (10112-10240)MB Speed: 89.74 GByte/s OK
Chunk:  80 (10240-10368)MB Speed: 89.80 GByte/s OK
Chunk:  81 (10368-10496)MB Speed: 89.65 GByte/s OK
Chunk:  82 (10496-10624)MB Speed: 89.86 GByte/s OK
Chunk:  83 (10624-10752)MB Speed: 90.11 GByte/s OK
Chunk:  84 (10752-10880)MB Speed: 90.04 GByte/s OK
A run on the "Intel" device (-p 0 -d 0) produced ~35GB/s. The 90GB/s+ above is a run on the "Apple" device. Not sure if this is referring to APIs or emulation, guessing APIs? Again it's an M2 Mac mini - definitely no Intel device onboard.
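
As a sanity check on the log above, the "allocated" line and the number of chunks are consistent with binary units. This is a small sketch, assuming the tool reports GB as 2^30 bytes and walks the buffer in 128 MiB chunks; both are inferred from the printed numbers, not from the tool's source.

```python
CHUNK_BYTES = 128 * 1024 * 1024  # assumed chunk size (128 MiB), inferred from the log


def describe_allocation(allocated_bytes: int) -> tuple:
    """Return (size in units of 2**30 bytes, number of full 128 MiB chunks)."""
    gib = allocated_bytes / 2 ** 30
    full_chunks = allocated_bytes // CHUNK_BYTES
    return gib, full_chunks


# Value taken from the "allocated 11408506880 bytes" line of the M2 log.
gib, chunks = describe_allocation(11408506880)
print(f"{gib:.3f} GB, {chunks} chunks (0..{chunks - 1})")
```

For the M2 run this gives 85 chunks, matching the last line of the log (Chunk: 84, ending at 10880 MB).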
 

CyklonDX

I can't say for sure since I'm unfamiliar with Apple's architecture, but it is likely a case of dedicated memory versus shared system memory.
(The Mac's ARM CPU might be detected as an OpenCL device, so dev 0 could just be your CPU.)
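
The log gives a hint here: each device prints "Type: N (GPU: 4, CPU: 2)", and the device labelled "Intel" reports Type 2. A small sketch of decoding that field, using the cl_device_type bit values from the OpenCL headers (the function name is made up for illustration):

```python
# Bit values of cl_device_type as defined in the OpenCL headers.
CL_DEVICE_TYPE_FLAGS = {
    1: "DEFAULT",
    2: "CPU",
    4: "GPU",
    8: "ACCELERATOR",
    16: "CUSTOM",
}


def decode_device_type(value: int) -> list:
    """Return the names of all device-type bits set in `value`."""
    return [name for bit, name in CL_DEVICE_TYPE_FLAGS.items() if value & bit]


print(decode_device_type(2))  # device [0] in the M2 log
print(decode_device_type(4))  # device [1] in the M2 log
```

By that reading, device 0 is the CPU exposed through OpenCL and device 1 is the GPU, whatever the vendor string says.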
 

splifingate

Apple M2 Mac mini, base with 16GB

Code:
[0] Platform name: Apple vendor:Apple version:OpenCL 1.2 (Dec 13 2024 23:08:33) <-- selected
    [0] Device Name: Apple M2
           Type: 2 (GPU: 4, CPU: 2)
           Vendor: Intel
           Max Compute Units: 8
           Global Memory: 17179869184
           Max Clock Frequency: 2400
           Max Alloc. Memory: 4294967296
           Local Memory: 32768
           Available: 1
    [1] Device Name: Apple M2 <-- selected
           Type: 4 (GPU: 4, CPU: 2)
           Vendor: Apple
           Max Compute Units: 10
           Global Memory: 11453251584
           Max Clock Frequency: 1000
           Max Alloc. Memory: 2147483648
           Local Memory: 32768
           Available: 1
allocated 11408506880 bytes 10.62 GB
Running Bench test on:Apple M2
Chunk:   0 (   0- 128)MB Speed: 91.23 GByte/s OK
Chunk:   1 ( 128- 256)MB Speed: 91.25 GByte/s OK
Chunk:   2 ( 256- 384)MB Speed: 90.50 GByte/s OK

------8<

Chunk:  84 (10752-10880)MB Speed: 90.04 GByte/s OK
A run on the "Intel" device (-p 0 -d 0) produced ~35GB/s. The 90GB/s+ above is a run on the "Apple" device. Not sure if this is referring to APIs or emulation, guessing APIs? Again it's an M2 Mac mini - definitely no Intel device onboard.
[using a M2 Max Studio 64GB]

Default poclmembench:

Bash:
splifingate@splifingate-Mac-Studio ~ % /Users/Shared/poclmembench ; exit;
[0] Platform name: Apple vendor:Apple version:OpenCL 1.2 (Mar  7 2025 21:03:19) <-- selected
    [0] Device Name: Apple M2 Max <-- selected
           Type: 2 (GPU: 4, CPU: 2)
           Vendor: Intel
           Max Compute Units: 12
           Global Memory: 68719476736
           Max Clock Frequency: 2400
           Max Alloc. Memory: 17179869184
           Local Memory: 32768
           Available: 1
    [1] Device Name: Apple M2 Max
           Type: 4 (GPU: 4, CPU: 2)
           Vendor: Apple
           Max Compute Units: 30
           Global Memory: 51539607552
           Max Clock Frequency: 1000
           Max Alloc. Memory: 9663676416
           Local Memory: 32768
           Available: 1
allocated 68719476736 bytes 64.00 GB
Running Bench test on:Apple M2 Max
Chunk:   0 (   0- 128)MB Speed: 95.57 GByte/s OK
Chunk:   1 ( 128- 256)MB Speed: 87.41 GByte/s OK
Chunk:   2 ( 256- 384)MB Speed: 79.97 GByte/s OK
Chunk:   3 ( 384- 512)MB Speed: 76.44 GByte/s OK
Chunk:   4 ( 512- 640)MB Speed: 72.13 GByte/s OK
Chunk:   5 ( 640- 768)MB Speed: 75.49 GByte/s OK
Try:

% poclmembench -d1

Bash:
splifingate@splifingate-Mac-Studio ~ % /Users/Shared/poclmembench -d1
[0] Platform name: Apple vendor:Apple version:OpenCL 1.2 (Mar  7 2025 21:03:19) <-- selected
    [0] Device Name: Apple M2 Max
           Type: 2 (GPU: 4, CPU: 2)
           Vendor: Intel
           Max Compute Units: 12
           Global Memory: 68719476736
           Max Clock Frequency: 2400
           Max Alloc. Memory: 17179869184
           Local Memory: 32768
           Available: 1
    [1] Device Name: Apple M2 Max <-- selected
           Type: 4 (GPU: 4, CPU: 2)
           Vendor: Apple
           Max Compute Units: 30
           Global Memory: 51539607552
           Max Clock Frequency: 1000
           Max Alloc. Memory: 9663676416
           Local Memory: 32768
           Available: 1
allocated 51539607552 bytes 48.00 GB
Running Bench test on:Apple M2 Max
Chunk:   0 (   0- 128)MB Speed: 439.17 GByte/s OK
Chunk:   1 ( 128- 256)MB Speed: 450.21 GByte/s OK
Chunk:   2 ( 256- 384)MB Speed: 441.55 GByte/s OK
Chunk:   3 ( 384- 512)MB Speed: 446.26 GByte/s OK
Chunk:   4 ( 512- 640)MB Speed: 445.17 GByte/s OK
Chunk:   5 ( 640- 768)MB Speed: 447.11 GByte/s OK
Not bad for a c.2017 script ;)
 

p1415

*RTX 3090*. Note, does not exhibit the "slow memory at end" like RTX 3080 Ti or RTX 4070 as explained at the beginning of the thread.


Code:
Running Bench test on:NVIDIA GeForce RTX 3090
Chunk:   0 (   0- 128)MB Speed: 825.95 GByte/s OK
Chunk:   1 ( 128- 256)MB Speed: 825.79 GByte/s OK
Chunk:   2 ( 256- 384)MB Speed: 824.05 GByte/s OK
Chunk:   3 ( 384- 512)MB Speed: 823.72 GByte/s OK
Chunk:   4 ( 512- 640)MB Speed: 824.92 GByte/s OK
Chunk:   5 ( 640- 768)MB Speed: 825.74 GByte/s OK
Chunk:   6 ( 768- 896)MB Speed: 827.38 GByte/s OK
Chunk:   7 ( 896-1024)MB Speed: 819.30 GByte/s OK
Chunk:   8 (1024-1152)MB Speed: 822.69 GByte/s OK
Chunk:   9 (1152-1280)MB Speed: 818.28 GByte/s OK
Chunk:  10 (1280-1408)MB Speed: 819.08 GByte/s OK
Chunk:  11 (1408-1536)MB Speed: 824.10 GByte/s OK
Chunk:  12 (1536-1664)MB Speed: 821.83 GByte/s OK
Chunk:  13 (1664-1792)MB Speed: 822.21 GByte/s OK
Chunk:  14 (1792-1920)MB Speed: 823.78 GByte/s OK
Chunk:  15 (1920-2048)MB Speed: 823.02 GByte/s OK
Chunk:  16 (2048-2176)MB Speed: 820.21 GByte/s OK
Chunk:  17 (2176-2304)MB Speed: 822.10 GByte/s OK
Chunk:  18 (2304-2432)MB Speed: 823.02 GByte/s OK
Chunk:  19 (2432-2560)MB Speed: 826.45 GByte/s OK
Chunk:  20 (2560-2688)MB Speed: 822.96 GByte/s OK
Chunk:  21 (2688-2816)MB Speed: 822.04 GByte/s OK
Chunk:  22 (2816-2944)MB Speed: 821.88 GByte/s OK
Chunk:  23 (2944-3072)MB Speed: 822.26 GByte/s OK
Chunk:  24 (3072-3200)MB Speed: 824.65 GByte/s OK
Chunk:  25 (3200-3328)MB Speed: 820.59 GByte/s OK
Chunk:  26 (3328-3456)MB Speed: 819.73 GByte/s OK
Chunk:  27 (3456-3584)MB Speed: 820.96 GByte/s OK
Chunk:  28 (3584-3712)MB Speed: 819.13 GByte/s OK
Chunk:  29 (3712-3840)MB Speed: 820.91 GByte/s OK
Chunk:  30 (3840-3968)MB Speed: 825.85 GByte/s OK
Chunk:  31 (3968-4096)MB Speed: 826.45 GByte/s OK
Chunk:  32 (4096-4224)MB Speed: 822.58 GByte/s OK
Chunk:  33 (4224-4352)MB Speed: 822.31 GByte/s OK
Chunk:  34 (4352-4480)MB Speed: 824.32 GByte/s OK
Chunk:  35 (4480-4608)MB Speed: 821.83 GByte/s OK
Chunk:  36 (4608-4736)MB Speed: 823.34 GByte/s OK
Chunk:  37 (4736-4864)MB Speed: 827.16 GByte/s OK
Chunk:  38 (4864-4992)MB Speed: 828.03 GByte/s OK
Chunk:  39 (4992-5120)MB Speed: 818.12 GByte/s OK
Chunk:  40 (5120-5248)MB Speed: 826.56 GByte/s OK
Chunk:  41 (5248-5376)MB Speed: 826.34 GByte/s OK
Chunk:  42 (5376-5504)MB Speed: 820.37 GByte/s OK
Chunk:  43 (5504-5632)MB Speed: 818.22 GByte/s OK
Chunk:  44 (5632-5760)MB Speed: 820.75 GByte/s OK
Chunk:  45 (5760-5888)MB Speed: 819.62 GByte/s OK
Chunk:  46 (5888-6016)MB Speed: 826.94 GByte/s OK
Chunk:  47 (6016-6144)MB Speed: 823.45 GByte/s OK
Chunk:  48 (6144-6272)MB Speed: 822.64 GByte/s OK
Chunk:  49 (6272-6400)MB Speed: 822.48 GByte/s OK
Chunk:  50 (6400-6528)MB Speed: 820.75 GByte/s OK
Chunk:  51 (6528-6656)MB Speed: 819.56 GByte/s OK
Chunk:  52 (6656-6784)MB Speed: 822.37 GByte/s OK
Chunk:  53 (6784-6912)MB Speed: 821.61 GByte/s OK
Chunk:  54 (6912-7040)MB Speed: 820.69 GByte/s OK
Chunk:  55 (7040-7168)MB Speed: 819.67 GByte/s OK
Chunk:  56 (7168-7296)MB Speed: 822.75 GByte/s OK
Chunk:  57 (7296-7424)MB Speed: 823.29 GByte/s OK
Chunk:  58 (7424-7552)MB Speed: 823.61 GByte/s OK
Chunk:  59 (7552-7680)MB Speed: 819.89 GByte/s OK
Chunk:  60 (7680-7808)MB Speed: 824.86 GByte/s OK
Chunk:  61 (7808-7936)MB Speed: 821.56 GByte/s OK
Chunk:  62 (7936-8064)MB Speed: 823.51 GByte/s OK
Chunk:  63 (8064-8192)MB Speed: 822.31 GByte/s OK
Chunk:  64 (8192-8320)MB Speed: 820.80 GByte/s OK
Chunk:  65 (8320-8448)MB Speed: 820.26 GByte/s OK
Chunk:  66 (8448-8576)MB Speed: 819.35 GByte/s OK
Chunk:  67 (8576-8704)MB Speed: 825.95 GByte/s OK
Chunk:  68 (8704-8832)MB Speed: 824.21 GByte/s OK
Chunk:  69 (8832-8960)MB Speed: 823.61 GByte/s OK
Chunk:  70 (8960-9088)MB Speed: 819.89 GByte/s OK
Chunk:  71 (9088-9216)MB Speed: 823.83 GByte/s OK
Chunk:  72 (9216-9344)MB Speed: 824.81 GByte/s OK
Chunk:  73 (9344-9472)MB Speed: 822.31 GByte/s OK
Chunk:  74 (9472-9600)MB Speed: 824.54 GByte/s OK
Chunk:  75 (9600-9728)MB Speed: 823.78 GByte/s OK
Chunk:  76 (9728-9856)MB Speed: 822.96 GByte/s OK
Chunk:  77 (9856-9984)MB Speed: 823.61 GByte/s OK
Chunk:  78 (9984-10112)MB Speed: 825.36 GByte/s OK
Chunk:  79 (10112-10240)MB Speed: 824.97 GByte/s OK
Chunk:  80 (10240-10368)MB Speed: 817.74 GByte/s OK
Chunk:  81 (10368-10496)MB Speed: 819.62 GByte/s OK
Chunk:  82 (10496-10624)MB Speed: 820.53 GByte/s OK
Chunk:  83 (10624-10752)MB Speed: 822.15 GByte/s OK
Chunk:  84 (10752-10880)MB Speed: 824.10 GByte/s OK
Chunk:  85 (10880-11008)MB Speed: 819.94 GByte/s OK
Chunk:  86 (11008-11136)MB Speed: 817.80 GByte/s OK
Chunk:  87 (11136-11264)MB Speed: 821.56 GByte/s OK
Chunk:  88 (11264-11392)MB Speed: 819.40 GByte/s OK
Chunk:  89 (11392-11520)MB Speed: 822.48 GByte/s OK
Chunk:  90 (11520-11648)MB Speed: 823.13 GByte/s OK
Chunk:  91 (11648-11776)MB Speed: 820.69 GByte/s OK
Chunk:  92 (11776-11904)MB Speed: 819.83 GByte/s OK
Chunk:  93 (11904-12032)MB Speed: 817.21 GByte/s OK
Chunk:  94 (12032-12160)MB Speed: 822.31 GByte/s OK
Chunk:  95 (12160-12288)MB Speed: 819.94 GByte/s OK
Chunk:  96 (12288-12416)MB Speed: 821.13 GByte/s OK
Chunk:  97 (12416-12544)MB Speed: 817.80 GByte/s OK
Chunk:  98 (12544-12672)MB Speed: 825.08 GByte/s OK
Chunk:  99 (12672-12800)MB Speed: 816.03 GByte/s OK
Chunk: 100 (12800-12928)MB Speed: 818.44 GByte/s OK
Chunk: 101 (12928-13056)MB Speed: 822.64 GByte/s OK
Chunk: 102 (13056-13184)MB Speed: 819.83 GByte/s OK
Chunk: 103 (13184-13312)MB Speed: 821.88 GByte/s OK
Chunk: 104 (13312-13440)MB Speed: 821.94 GByte/s OK
Chunk: 105 (13440-13568)MB Speed: 816.99 GByte/s OK
Chunk: 106 (13568-13696)MB Speed: 814.86 GByte/s OK
Chunk: 107 (13696-13824)MB Speed: 820.05 GByte/s OK
Chunk: 108 (13824-13952)MB Speed: 821.45 GByte/s OK
Chunk: 109 (13952-14080)MB Speed: 816.78 GByte/s OK
Chunk: 110 (14080-14208)MB Speed: 820.53 GByte/s OK
Chunk: 111 (14208-14336)MB Speed: 818.28 GByte/s OK
Chunk: 112 (14336-14464)MB Speed: 817.96 GByte/s OK
Chunk: 113 (14464-14592)MB Speed: 813.86 GByte/s OK
Chunk: 114 (14592-14720)MB Speed: 819.83 GByte/s OK
Chunk: 115 (14720-14848)MB Speed: 819.40 GByte/s OK
Chunk: 116 (14848-14976)MB Speed: 815.24 GByte/s OK
Chunk: 117 (14976-15104)MB Speed: 822.58 GByte/s OK
Chunk: 118 (15104-15232)MB Speed: 823.99 GByte/s OK
Chunk: 119 (15232-15360)MB Speed: 822.75 GByte/s OK
Chunk: 120 (15360-15488)MB Speed: 823.02 GByte/s OK
Chunk: 121 (15488-15616)MB Speed: 820.16 GByte/s OK
Chunk: 122 (15616-15744)MB Speed: 819.03 GByte/s OK
Chunk: 123 (15744-15872)MB Speed: 821.72 GByte/s OK
Chunk: 124 (15872-16000)MB Speed: 819.62 GByte/s OK
Chunk: 125 (16000-16128)MB Speed: 822.75 GByte/s OK
Chunk: 126 (16128-16256)MB Speed: 816.19 GByte/s OK
Chunk: 127 (16256-16384)MB Speed: 818.22 GByte/s OK
Chunk: 128 (16384-16512)MB Speed: 820.59 GByte/s OK
Chunk: 129 (16512-16640)MB Speed: 819.56 GByte/s OK
Chunk: 130 (16640-16768)MB Speed: 818.22 GByte/s OK
Chunk: 131 (16768-16896)MB Speed: 823.02 GByte/s OK
Chunk: 132 (16896-17024)MB Speed: 820.86 GByte/s OK
Chunk: 133 (17024-17152)MB Speed: 820.53 GByte/s OK
Chunk: 134 (17152-17280)MB Speed: 822.21 GByte/s OK
Chunk: 135 (17280-17408)MB Speed: 820.53 GByte/s OK
Chunk: 136 (17408-17536)MB Speed: 818.60 GByte/s OK
Chunk: 137 (17536-17664)MB Speed: 821.67 GByte/s OK
Chunk: 138 (17664-17792)MB Speed: 823.67 GByte/s OK
Chunk: 139 (17792-17920)MB Speed: 817.63 GByte/s OK
Chunk: 140 (17920-18048)MB Speed: 815.71 GByte/s OK
Chunk: 141 (18048-18176)MB Speed: 822.64 GByte/s OK
Chunk: 142 (18176-18304)MB Speed: 821.94 GByte/s OK
Chunk: 143 (18304-18432)MB Speed: 824.16 GByte/s OK
Chunk: 144 (18432-18560)MB Speed: 820.16 GByte/s OK
Chunk: 145 (18560-18688)MB Speed: 819.08 GByte/s OK
Chunk: 146 (18688-18816)MB Speed: 820.75 GByte/s OK
Chunk: 147 (18816-18944)MB Speed: 822.42 GByte/s OK
Chunk: 148 (18944-19072)MB Speed: 821.94 GByte/s OK
Chunk: 149 (19072-19200)MB Speed: 822.75 GByte/s OK
Chunk: 150 (19200-19328)MB Speed: 822.80 GByte/s OK
Chunk: 151 (19328-19456)MB Speed: 820.75 GByte/s OK
Chunk: 152 (19456-19584)MB Speed: 823.18 GByte/s OK
Chunk: 153 (19584-19712)MB Speed: 823.18 GByte/s OK
Chunk: 154 (19712-19840)MB Speed: 822.91 GByte/s OK
Chunk: 155 (19840-19968)MB Speed: 821.29 GByte/s OK
Chunk: 156 (19968-20096)MB Speed: 818.65 GByte/s OK
Chunk: 157 (20096-20224)MB Speed: 823.23 GByte/s OK
Chunk: 158 (20224-20352)MB Speed: 819.03 GByte/s OK
Chunk: 159 (20352-20480)MB Speed: 821.99 GByte/s OK
Chunk: 160 (20480-20608)MB Speed: 820.26 GByte/s OK
Chunk: 161 (20608-20736)MB Speed: 725.27 GByte/s OK
Chunk: 162 (20736-20864)MB Speed: 820.64 GByte/s OK
Chunk: 163 (20864-20992)MB Speed: 821.45 GByte/s OK
Chunk: 164 (20992-21120)MB Speed: 824.38 GByte/s OK
Chunk: 165 (21120-21248)MB Speed: 815.02 GByte/s OK
Chunk: 166 (21248-21376)MB Speed: 824.43 GByte/s OK
Chunk: 167 (21376-21504)MB Speed: 819.83 GByte/s OK
Chunk: 168 (21504-21632)MB Speed: 818.92 GByte/s OK
Chunk: 169 (21632-21760)MB Speed: 823.72 GByte/s OK
Chunk: 170 (21760-21888)MB Speed: 823.99 GByte/s OK
Chunk: 171 (21888-22016)MB Speed: 819.35 GByte/s OK
Chunk: 172 (22016-22144)MB Speed: 822.15 GByte/s OK
Chunk: 173 (22144-22272)MB Speed: 825.63 GByte/s OK
Chunk: 174 (22272-22400)MB Speed: 823.52 GByte/s OK
Chunk: 175 (22400-22528)MB Speed: 817.96 GByte/s OK
Chunk: 176 (22528-22656)MB Speed: 819.08 GByte/s OK
Chunk: 177 (22656-22784)MB Speed: 815.45 GByte/s OK
Chunk: 178 (22784-22912)MB Speed: 822.10 GByte/s OK
Chunk: 179 (22912-23040)MB Speed: 824.05 GByte/s OK
Chunk: 180 (23040-23168)MB Speed: 817.69 GByte/s OK
Chunk: 181 (23168-23296)MB Speed: 821.45 GByte/s OK
Chunk: 182 (23296-23424)MB Speed: 824.59 GByte/s OK
Chunk: 183 (23424-23552)MB Speed: 822.31 GByte/s OK
Chunk: 184 (23552-23680)MB Speed: 826.66 GByte/s OK
Chunk: 185 (23680-23808)MB Speed: 810.11 GByte/s OK
Chunk: 186 (23808-23936)MB Speed: inf GByte/s OK
Chunk: 187 (23936-24064)MB Speed: inf GByte/s OK
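
Long logs like this are easier to compare after a quick summary. The sketch below parses the "Chunk:" lines, averages the finite speeds, and lists chunks that are noticeably slow or report a non-finite value such as "inf". The 5% threshold and the function name are arbitrary choices for illustration, and the script makes no claim about why a chunk reports "inf".

```python
import math
import re

# Matches lines such as "Chunk: 161 (20608-20736)MB Speed: 725.27 GByte/s OK".
CHUNK_RE = re.compile(
    r"Chunk:\s*(\d+)\s*\(\s*(\d+)-\s*(\d+)\)MB Speed:\s*(\S+) GByte/s"
)


def summarize(log_text: str, drop_fraction: float = 0.05) -> dict:
    """Summarize per-chunk speeds from pasted benchmark output."""
    speeds = {}
    for m in CHUNK_RE.finditer(log_text):
        speeds[int(m.group(1))] = float(m.group(4))  # float("inf") parses fine
    finite = {i: s for i, s in speeds.items() if math.isfinite(s)}
    if not finite:
        raise ValueError("no finite chunk speeds found")
    mean = sum(finite.values()) / len(finite)
    slow = sorted(i for i, s in finite.items() if s < mean * (1 - drop_fraction))
    non_finite = sorted(i for i in speeds if i not in finite)
    return {"mean": mean, "slow": slow, "non_finite": non_finite}


# A few lines copied from the RTX 3090 log above.
sample = """\
Chunk: 160 (20480-20608)MB Speed: 820.26 GByte/s OK
Chunk: 161 (20608-20736)MB Speed: 725.27 GByte/s OK
Chunk: 162 (20736-20864)MB Speed: 820.64 GByte/s OK
Chunk: 186 (23808-23936)MB Speed: inf GByte/s OK
"""
print(summarize(sample))
```

Pasting the full log into `sample` gives one average per card plus the chunk indices worth a second look.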
 

Syntax

Hi, I would like to invite anyone on both Windows and Linux to run the memory benchmark tool and share their results.

Regards
./poclmembench -p 0 -d 0
[0] Platform name: NVIDIA CUDA vendor:NVIDIA Corporation version:OpenCL 3.0 CUDA 12.7.33 <-- selected
[0] Device Name: NVIDIA GeForce RTX 3090 <-- selected
Type: 4 (GPU: 4, CPU: 2)
Vendor: NVIDIA Corporation
Max Compute Units: 82
Global Memory: 25299976192
Max Clock Frequency: 1800
Max Alloc. Memory: 6324994048
Local Memory: 49152
Available: 1
allocated 25232932864 bytes 23.50 GB
Running Bench test on:NVIDIA GeForce RTX 3090
Chunk: 0 ( 0- 128)MB Speed: 803.46 GByte/s OK
Chunk: 1 ( 128- 256)MB Speed: 800.59 GByte/s OK
Chunk: 2 ( 256- 384)MB Speed: 807.96 GByte/s OK
Chunk: 3 ( 384- 512)MB Speed: 804.60 GByte/s OK
Chunk: 4 ( 512- 640)MB Speed: 796.58 GByte/s OK
Chunk: 5 ( 640- 768)MB Speed: 798.68 GByte/s OK
Chunk: 6 ( 768- 896)MB Speed: 811.13 GByte/s OK
Chunk: 7 ( 896-1024)MB Speed: 799.04 GByte/s OK
Chunk: 8 (1024-1152)MB Speed: 805.02 GByte/s OK
Chunk: 9 (1152-1280)MB Speed: 810.89 GByte/s OK
Chunk: 10 (1280-1408)MB Speed: 804.95 GByte/s OK
Chunk: 11 (1408-1536)MB Speed: 803.72 GByte/s OK
Chunk: 12 (1536-1664)MB Speed: 808.22 GByte/s OK
Chunk: 13 (1664-1792)MB Speed: 810.36 GByte/s OK
Chunk: 14 (1792-1920)MB Speed: 798.36 GByte/s OK
Chunk: 15 (1920-2048)MB Speed: 804.79 GByte/s OK
Chunk: 16 (2048-2176)MB Speed: 808.52 GByte/s OK
Chunk: 17 (2176-2304)MB Speed: 806.81 GByte/s OK
Chunk: 18 (2304-2432)MB Speed: 797.57 GByte/s OK
Chunk: 19 (2432-2560)MB Speed: 811.15 GByte/s OK
Chunk: 20 (2560-2688)MB Speed: 801.23 GByte/s OK
Chunk: 21 (2688-2816)MB Speed: 800.64 GByte/s OK
Chunk: 22 (2816-2944)MB Speed: 806.50 GByte/s OK
Chunk: 23 (2944-3072)MB Speed: 804.92 GByte/s OK
Chunk: 24 (3072-3200)MB Speed: 801.49 GByte/s OK
Chunk: 25 (3200-3328)MB Speed: 805.80 GByte/s OK
Chunk: 26 (3328-3456)MB Speed: 809.73 GByte/s OK
Chunk: 27 (3456-3584)MB Speed: 803.48 GByte/s OK
Chunk: 28 (3584-3712)MB Speed: 803.48 GByte/s OK
Chunk: 29 (3712-3840)MB Speed: 814.50 GByte/s OK
Chunk: 30 (3840-3968)MB Speed: 803.50 GByte/s OK
Chunk: 31 (3968-4096)MB Speed: 801.46 GByte/s OK
Chunk: 32 (4096-4224)MB Speed: 806.71 GByte/s OK
Chunk: 33 (4224-4352)MB Speed: 807.96 GByte/s OK
Chunk: 34 (4352-4480)MB Speed: 798.47 GByte/s OK
Chunk: 35 (4480-4608)MB Speed: 809.83 GByte/s OK
Chunk: 36 (4608-4736)MB Speed: 805.03 GByte/s OK
Chunk: 37 (4736-4864)MB Speed: 800.03 GByte/s OK
Chunk: 38 (4864-4992)MB Speed: 807.27 GByte/s OK
Chunk: 39 (4992-5120)MB Speed: 808.60 GByte/s OK
Chunk: 40 (5120-5248)MB Speed: 801.69 GByte/s OK
Chunk: 41 (5248-5376)MB Speed: 801.01 GByte/s OK
Chunk: 42 (5376-5504)MB Speed: 813.22 GByte/s OK
Chunk: 43 (5504-5632)MB Speed: 799.86 GByte/s OK
Chunk: 44 (5632-5760)MB Speed: 803.02 GByte/s OK
Chunk: 45 (5760-5888)MB Speed: 814.20 GByte/s OK
Chunk: 46 (5888-6016)MB Speed: 802.96 GByte/s OK
Chunk: 47 (6016-6144)MB Speed: 800.19 GByte/s OK
Chunk: 48 (6144-6272)MB Speed: 812.16 GByte/s OK
Chunk: 49 (6272-6400)MB Speed: 808.97 GByte/s OK
Chunk: 50 (6400-6528)MB Speed: 799.32 GByte/s OK
Chunk: 51 (6528-6656)MB Speed: 807.85 GByte/s OK
Chunk: 52 (6656-6784)MB Speed: 805.51 GByte/s OK
Chunk: 53 (6784-6912)MB Speed: 801.09 GByte/s OK
Chunk: 54 (6912-7040)MB Speed: 803.40 GByte/s OK
Chunk: 55 (7040-7168)MB Speed: 810.59 GByte/s OK
Chunk: 56 (7168-7296)MB Speed: 804.90 GByte/s OK
Chunk: 57 (7296-7424)MB Speed: 802.88 GByte/s OK
Chunk: 58 (7424-7552)MB Speed: 810.09 GByte/s OK
Chunk: 59 (7552-7680)MB Speed: 801.82 GByte/s OK
Chunk: 60 (7680-7808)MB Speed: 801.10 GByte/s OK
Chunk: 61 (7808-7936)MB Speed: 802.47 GByte/s OK
Chunk: 62 (7936-8064)MB Speed: 811.16 GByte/s OK
Chunk: 63 (8064-8192)MB Speed: 802.61 GByte/s OK
Chunk: 64 (8192-8320)MB Speed: 811.66 GByte/s OK
Chunk: 65 (8320-8448)MB Speed: 805.46 GByte/s OK
Chunk: 66 (8448-8576)MB Speed: 804.29 GByte/s OK
Chunk: 67 (8576-8704)MB Speed: 807.64 GByte/s OK
Chunk: 68 (8704-8832)MB Speed: 806.64 GByte/s OK
Chunk: 69 (8832-8960)MB Speed: 801.41 GByte/s OK
Chunk: 70 (8960-9088)MB Speed: 801.85 GByte/s OK
Chunk: 71 (9088-9216)MB Speed: 809.68 GByte/s OK
Chunk: 72 (9216-9344)MB Speed: 801.69 GByte/s OK
Chunk: 73 (9344-9472)MB Speed: 799.21 GByte/s OK
Chunk: 74 (9472-9600)MB Speed: 811.36 GByte/s OK
Chunk: 75 (9600-9728)MB Speed: 803.43 GByte/s OK
Chunk: 76 (9728-9856)MB Speed: 800.34 GByte/s OK
Chunk: 77 (9856-9984)MB Speed: 803.32 GByte/s OK
Chunk: 78 (9984-10112)MB Speed: 809.08 GByte/s OK
Chunk: 79 (10112-10240)MB Speed: 798.77 GByte/s OK
Chunk: 80 (10240-10368)MB Speed: 807.56 GByte/s OK
 

CyklonDX

*RTX 3090*. Note, does not exhibit the "slow memory at end" like RTX 3080 Ti or RTX 4070 as explained at the beginning of the thread.


Code:
Running Bench test on:NVIDIA GeForce RTX 3090

Chunk: 186 (23808-23936)MB Speed: inf GByte/s OK
Chunk: 187 (23936-24064)MB Speed: inf GByte/s OK
Note the post above: you might have a faulty or disabled ROP on your 3090.