Memory (config) diagnosis tool

Rand__

Well-Known Member
Mar 6, 2014
4,494
878
113
Does anyone know of a tool (any regular OS) that will show current memory configuration?

In particular I'd be looking for channel interleaving, maximum bandwidth, clock, latencies and all that.

Background is that I want to run an NVDIMM module and nobody can tell me how BIOS & OS will actually handle this
- the reason for the uncertainty is that at boot time the module is a regular RDIMM module, and only after driver support has been loaded does it get taken out of the memory pool and into a special storage device pool.
  • So how does the system react to that?
  • Does it change the previously decided interleaving pattern?
  • Does it add/remove bandwidth...?
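For reference, channel interleaving commonly spreads consecutive cache lines round-robin across the channels. A toy sketch of that mapping (an assumption for illustration - real BIOS interleaving policies vary, and 64-byte lines / 6 channels are just this system's plausible values):

```shell
# Toy model of round-robin channel interleaving: consecutive 64-byte
# cache lines are spread across the channels, so line N lands on
# channel N mod 6. Real BIOS policies may differ.
line_size=64
channels=6
for addr in 0 64 128 192 4096; do
  echo "addr $addr -> channel $(( (addr / line_size) % channels ))"
done
```

With this scheme, sequential reads hit all channels in turn, which is where the aggregate bandwidth comes from.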
 

Rand__

Maybe CPU-Z will do the trick, need to read up on what the info means...
[screenshot: upload_2019-10-25_12-23-11.png]
 

Rand__

Just installed the NVDIMM module, and with 7 modules now installed it reads this:

[screenshot: upload_2019-10-25_19-3-19.png]


Looks like Windows didn't detect the NVDIMM properly, so it's treated as memory.
7 modules should not have a hex-channel layout though, so not sure how well suited CPU-Z is.
Will try yours :)

Edit:
NVDIMM works fine ...

[screenshot: upload_2019-10-25_19-21-4.png]

So CPU-Z is not really telling the truth (or is confused).

PassMark does not work since Windows can't get SPD info from my memory modules (neither PassMark nor CPU-Z can) - possibly due to the Skylake board or some weird setting of the Intel board.

Will run pmbw next

edit 2 - there is still an unrecognized component, possibly related to the NVDIMM... maybe I need to run a test install of Windows Server 2016 (which supports NVDIMMs natively) to ensure it's not a driver issue...
 

alex_stief

Active Member
May 31, 2016
624
189
43
35
CPU-Z is not ideal for detecting stuff beyond consumer-grade hardware. For maximum bandwidth, there is only one tool: MEMORY BANDWIDTH: STREAM BENCHMARK PERFORMANCE RESULTS
Well technically, there are more. But you know what I mean ;)
No idea about latency and interleaving modes though, so I subscribe here to see what others come up with. AIDA64 has built-in benchmarks for latency + bandwidth of caches and memory. But just like CPU-Z, I would not trust it with this kind of hardware.
I feel like getting reliable information from your own benchmarking in this area would require a computer science degree.
 

Rand__

btw I meant that memory page, not your spoiler. My attention span isn't that bad!
Ah, good that you mention that, I actually double-checked whether the spoiler was that long/boring/complicated that ppl would drop off after half of it ;)

Well, then I stand corrected. It requires more than a CS degree :D

Edit: you might want to branch out, since your question digs rather deep: Software Tuning, Performance Optimization & Platform Monitoring
:)

The question is whether I actually need to dig deeper or not.
Interesting at this point was only how NVDIMM-Ns are treated once OS driver support has picked them up as a storage device instead of a memory device.
Now ideally a tool could tell me and I wouldn't need to run benchmarks to determine that for myself (don't really want to open another Pandora's box next to ZFS mirror [ssd|nvme] scaling [issues] and trying to get 100Gb networking to run properly [on my weird HW])


Edit:
I actually cancelled pmbw after 30 minutes, it's probably not smart to run with 320 GB memory size (leftover from some new module tests). Will replace these with 8 GB modules, should run faster then
 

alex_stief

You can tweak the settings of pmbw, e.g. the maximum number of cores or the amount of memory tested. No need to change the hardware just to shorten the test. I would tell you what the parameters are called, but I would have to look them up too

here goes nothing
Code:
$ ./pmbw -h
Usage: ./pmbw [options]
Options:
  -f <match>     Run only benchmarks containing this substring, can be used multile times. Try "list".
  -M <size>      Limit the maximum amount of memory allocated at startup [byte].
  -p <nthrs>     Run benchmarks with at least this thread count.
  -P <nthrs>     Run benchmarks with at most this thread count (overrides detected processor count).
  -Q             Run benchmarks with quadratically increasing thread count.
  -s <size>      Limit the _minimum_ test array size [byte]. Set to 0 for no limit.
  -S <size>      Limit the _maximum_ test array size [byte]. Set to 0 for no limit.
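Based on that help text, a shortened run that skips the cache-resident array sizes and caps the thread count might look like this (the `./pmbw` path and the exact byte limits are assumptions; adjust to your build and module sizes):

```shell
# Cap at 8 threads (-P) and only test array sizes between 1 GiB (-s)
# and 4 GiB (-S), so cache performance is skipped and only main
# memory is measured. Flags as listed in pmbw's help output above.
./pmbw -P 8 -s 1073741824 -S 4294967296
```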
 

Rand__

Oh my,
this is going to take a while. I moved down to 8 GB (single module) and even with a test size of 4 GB it's been running for 5 hrs now ...
Might need to turn that down a notch or two (it's running with -P 36 threads atm [autodetected]) to get it done in a reasonable time
 

Rand__

Ran through 1 to 6 (+1) memory modules, limiting to 4 GB array size and 8 threads...
Can't find any major differences implying a modified bandwidth... very weird

8GB memory, single module:
[screenshot: upload_2019-10-27_15-12-9.png]

48 GB memory, 6 modules:
[screenshot: upload_2019-10-27_15-12-50.png]

Have attached two results files if anyone wants to look at other measurements.

It might be that extra BW becomes available with more threads - will run with more threads next...
 

Attachments

alex_stief

The memory controllers Intel currently puts in their high-end CPUs deliver very low bandwidth per core. So no big surprise that you don't see much of an improvement going from single- to hexa-channel memory, while testing with a single thread.

Since you are only interested in memory performance, you can exclude the smaller array sizes to shorten the runs further. These only test cache performance.
Going further down this road, you will eventually end up with the stream benchmark ;)
 

Rand__

Well o/c that was only the sample I took from the files, the 8-thread charts are in there too.

But you might have a point, so running the full test now (thread related).

And not sure I want to go the STREAM way - I simply want to know whether the extra NVDIMM will have a negative impact on day-to-day operations, but until now I have not seen *any* impact, not even between the *single*- and *hex*-channel setups (which is not the expected behavior in my opinion).
 

alex_stief

If I had to guess how NVDIMM is handled: the lower memory address space is populated with regular memory, with channel interleaving as usual. This would explain why you see no performance penalty whatsoever. And then the address space above that gets mapped to the NVDIMMs. This way, one could avoid the performance penalty associated with this type of memory for as long as possible. Only when the system actually needs the full amount of memory does it touch the NVDIMMs.
Of course, over time this could still lead to performance degradation, as more and more memory is used for caching/buffers. Flushing the caches should solve that.

Just a guess though.
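The guess above can be sketched as a toy address map (pure assumption, just mirroring the description: the 48 GB of interleaved RDIMM space first, the NVDIMM range above it; 64-byte lines and 6 channels assumed):

```shell
# Assumed layout: physical addresses below the RDIMM capacity stay
# interleaved across 6 channels; anything above maps to the NVDIMM.
rdimm_bytes=$((48 * 1024 * 1024 * 1024))
classify() {
  if [ "$1" -lt "$rdimm_bytes" ]; then
    echo "RDIMM, channel $(( ($1 / 64) % 6 ))"
  else
    echo "NVDIMM region"
  fi
}
classify 4096                       # low address: interleaved RDIMM
classify $(( rdimm_bytes + 4096 ))  # above 48 GB: NVDIMM
```

Under this layout, benchmarks that fit in the lower region would never touch the NVDIMM, which would match the unchanged pmbw numbers.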
 

Rand__

The question I have is who actually decides on module interleaving.
My guess is the BIOS. The BIOS sees 6+1 memory modules in my case, since while the NVDIMM is detected as such, without actual driver support it's treated as a regular DIMM module, so the BIOS cannot exempt it completely...
So at which point would I lose interleaving if the OS does not load the driver... ? :)