I fired up a DL385 G7 with dual Opteron 6238s last night (2012 BIOS), compiled xmr-stak, and sure enough, under Ubuntu 16.04 the best I could squeeze out was around 500 H/s using 12 threads pinned to the even-numbered cores. Under Windows 10, xmr-stak gets 670 H/s with the exact SAME config.
However, if I add 2 extra threads per CPU (on cores 1, 5, 13 and 17, in low power mode), I jump to 830 H/s. Add ONE more thread on either CPU and I drop back down to 750 H/s.
So for me, with HT off, Windows with 16 threads (8 per CPU, 2 of which run in low power mode) seems to be the magic number.
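For anyone wanting to reproduce the layout, here is a minimal sketch that prints the thread list. It assumes the OS numbers the first CPU's cores 0-11 and the second CPU's cores 12-23, and that the "cpu_threads_conf" keys (low_power_mode, no_prefetch, affine_to_cpu) match what your xmr-stak build generates in cpu.txt. The no_prefetch value and the build_threads helper are my own assumptions, so compare against your generated file before pasting.

```python
import json

# Hypothetical helper: builds "cpu_threads_conf" entries for xmr-stak.
# Key names are assumed from xmr-stak's cpu.txt; verify against your build.
def build_threads(regular_cores, low_power_cores, no_prefetch=True):
    threads = []
    for core in regular_cores:
        # one regular thread pinned to this core
        threads.append({"low_power_mode": False,
                        "no_prefetch": no_prefetch,
                        "affine_to_cpu": core})
    for core in low_power_cores:
        # one low power mode thread pinned to this core
        threads.append({"low_power_mode": True,
                        "no_prefetch": no_prefetch,
                        "affine_to_cpu": core})
    return threads

# Assumed numbering: cores 0-11 on CPU 1, cores 12-23 on CPU 2.
regular = list(range(0, 24, 2))   # the 12 even-numbered cores
low_power = [1, 5, 13, 17]        # 2 extra low power threads per CPU
threads = build_threads(regular, low_power)

print('"cpu_threads_conf" :')
print(json.dumps(threads, indent=4) + ",")
```

That works out to 16 threads total, 8 on each CPU, with 2 per CPU in low power mode.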
I will say that on Linux I was getting MALLOC errors, even though I was running as root and had large pages set up. Go figure.
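One thing worth ruling out on the Linux side is whether enough huge pages are actually free for that thread layout. This is only a rough sketch under stated assumptions. It assumes a 2 MiB CryptoNight scratchpad per hash, two scratchpads per low power mode thread (since it hashes two at once), and the default 2 MiB x86-64 huge page size. It only compares page counts, so it doesn't prove huge pages are what caused the errors. pages_needed and meminfo_value are hypothetical helper names.

```python
SCRATCHPAD_MIB = 2   # CryptoNight scratchpad per hash
HUGE_PAGE_MIB = 2    # assumed default x86-64 huge page size

def pages_needed(regular_threads, low_power_threads):
    # low power mode threads hash two at once, so they need two scratchpads
    mib = SCRATCHPAD_MIB * (regular_threads + 2 * low_power_threads)
    return -(-mib // HUGE_PAGE_MIB)  # ceiling division

def meminfo_value(text, key):
    # parse a "Key:   value [unit]" line from /proc/meminfo-style text
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key:
            return int(rest.split()[0])
    raise KeyError(key)

needed = pages_needed(12, 4)  # 12 regular + 4 low power threads
print("huge pages needed (approx.):", needed)

try:
    with open("/proc/meminfo") as f:
        free = meminfo_value(f.read(), "HugePages_Free")
    print("HugePages_Free:", free, "-> enough" if free >= needed else "-> too few")
except (OSError, KeyError):
    print("could not read HugePages_Free (not Linux, or no hugetlb support?)")
```

For the 16-thread layout this comes to roughly 20 huge pages, before any other allocations the miner makes.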
It's been running for a while now and has settled at around 844 H/s.