Intel Xeon D-1500 Series Discussion


jgreco

New Member
Sep 7, 2013
28
16
3
Putting on my FreeNAS cap:

1) RAIDZ1 is not recommended unless the data is not particularly important; RAIDZ1 loses its redundancy the moment a drive fails, and any UREs of pool data hit in that state are unrepairable, unless they happen to land on metadata (which ZFS keeps extra copies of).

2) 10Gbps with FreeNAS is a little dicey even on a well-resourced (read: MUCH larger) platform, and one needs to be very careful about defining things. It is possible to get over a Gbps out of a small setup. On the other hand, I can make a large setup crawl at a few Mbps by applying pathological traffic patterns (think, especially, random seeking of small blocks). Defining the use case is critical.

2a) Your underlying pool is incapable of 10Gbps. Today's fast drives can sustain perhaps 225MB/sec; three of those pounding out sequential data at top speed would be 675MBytes/sec, or about 5.4Gbps, well shy of 10Gbps, and experience suggests that even the 675MB/sec is extremely wishful thinking (see the quick arithmetic sketch at the end of this post).

3) To a certain extent, an environment with parallelism is beneficial in that you are more likely to be able to hit your pool's actual I/O limits. It is important to define what sort of protocol you are using to access the filer, and what sort of concurrency there might be.

I'd bet you'd see two to four Gbit/sec for ISO or other large sequential file storage as long as you kept occupancy and fragmentation down to reasonable numbers (say, under 80% occupancy and under 20% fragmentation), tuned for large TCP buffers on both sides, and used a Chelsio T420 or T520 on the FreeNAS side, along with a well-designed machine and OS on the client side. I would be pleased to be underestimating that. :)
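For what it's worth, here is the back-of-envelope math behind point 2a and the TCP buffer tuning, as a rough Python sketch. The 225MB/sec per drive and the three data drives come from the post above; the ~1ms LAN round-trip time used for the buffer sizing is just an illustrative assumption:

per_drive_mb_s = 225                      # assumed sustained sequential rate per drive
data_drives = 3                           # data drives in the hypothetical RAIDZ1 pool
pool_mb_s = per_drive_mb_s * data_drives  # 675 MB/s
pool_gbit_s = pool_mb_s * 8 / 1000        # ~5.4 Gbit/s, well shy of 10GbE (1250 MB/s)
print(f"pool ~{pool_mb_s} MB/s (~{pool_gbit_s:.1f} Gbit/s) vs 10GbE at 1250 MB/s")

link_bytes_per_s = 10e9 / 8               # 10 Gbit/s expressed in bytes per second
rtt_s = 0.001                             # assumed ~1 ms LAN round trip
bdp_bytes = link_bytes_per_s * rtt_s      # bandwidth-delay product: data in flight
print(f"TCP buffer needed per connection ~{bdp_bytes / 1024:.0f} KiB")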
 

J--

Active Member
Aug 13, 2016
199
52
28
41
Any datapoints on the max IOPS and throughput if you were to populate all of the onboard SATA?
 

jgreco

New Member
Sep 7, 2013
28
16
3
Sorry, no, that's kind of like asking "how fast could my car go."

Performance on a low-clock ZFS platform will always be somewhat limited, so I can generally say anything built on the Pentium D-1508 will be towards the slower side. But filling all your ports with a bunch of 4TB 2.5" laptop HDDs in a big RAIDZ3 is going to be very slow compared to filling them with a bunch of 850 Pros in mirrors, which will be expensive but definitely fairly zippy as far as these things go; the IOPS potential of the SSD build could easily be a hundred times greater if built correctly.

The D-1508 is very challenging for a software storage platform like ZFS where the parity computations are being done by the CPU. With only two cores on the 1508, you can be burning one doing RAIDZ parity and burning the other doing Samba. Even if you're optimistic and looking at that as threads, four threads at 2.2GHz isn't that much. Getting rid of the parity calculations and other overhead by moving to mirrors and increasing potential random IOPS by going to SSD means that you can have more parallelism in the system. Having multiple clients accessing the system simultaneously is more likely to result in an optimal situation where you're making full use of the potential of the NAS.
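To put rough numbers on the mirrors-vs-RAIDZ point, here is a hedged rule-of-thumb sketch in Python: treat each vdev as delivering roughly one member drive's worth of random IOPS (a common ZFS approximation, not a measurement), and plug in illustrative per-device IOPS figures:

drives = 8                      # assume 8 onboard SATA ports populated
hdd_iops = 100                  # assumed random IOPS for a 2.5" laptop HDD
ssd_iops = 10_000               # assumed random IOPS for a SATA SSD like an 850 Pro

raidz3_vdevs = 1                # all 8 drives in a single RAIDZ3 vdev
mirror_vdevs = drives // 2      # 4 two-way mirror vdevs

hdd_raidz3_pool_iops = raidz3_vdevs * hdd_iops     # ~100 IOPS
ssd_mirror_pool_iops = mirror_vdevs * ssd_iops     # ~40,000 IOPS
print(hdd_raidz3_pool_iops, ssd_mirror_pool_iops,
      ssd_mirror_pool_iops // hdd_raidz3_pool_iops)  # ratio well over 100x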
 
  • Like
Reactions: SSS

evancox10

New Member
Nov 12, 2015
8
2
3
124
Any datapoints on the max IOPS and throughput if you were to populate all of the onboard SATA?
Sorry, no, that's kind of like asking "how fast could my car go."


I think that the max SATA throughput question is valid. There could be a bottleneck between the "South Bridge" in the Xeon-D SiP, where the SATA ports hang off, and the main die, such that 6 SSDs going full bore could saturate the link. Six SATA3 ports have an effective transfer rate of 28.8 Gbps (you have to take off 20% due to 8b/10b encoding), which is about equivalent to 3.65 PCIe Gen3 lanes (~7.88 Gbps/lane). I know a lot of the consumer motherboards with PCIe M.2 slots connected off the SB were running into bandwidth limits because the connection to the CPU was basically only 4x PCIe Gen3.
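Re-running that arithmetic as a quick sanity check (the only inputs are the published line rates and encoding overheads):

sata3_line_gbit = 6.0
sata3_effective_gbit = sata3_line_gbit * 8 / 10    # 8b/10b -> 4.8 Gbps per port
six_ports_gbit = 6 * sata3_effective_gbit          # 28.8 Gbps aggregate
pcie3_lane_gbit = 8.0 * 128 / 130                  # 128b/130b -> ~7.88 Gbps per Gen3 lane
print(f"{six_ports_gbit:.1f} Gbps ~= {six_ports_gbit / pcie3_lane_gbit:.2f} Gen3 lanes")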

But in general, you need more context, specific workloads, specific drives, etc., before you can answer that.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
@evancox10 The difference versus the M.2 on consumer motherboards is that the Xeon D SATA ports basically run through an on-die PCH.
 

jgreco

New Member
Sep 7, 2013
28
16
3
I think that the max SATA throughput question is valid. There could be a bottleneck between the "South Bridge" in the Xeon-D SiP, where the SATA ports hang off, and the main die, such that 6 SSDs going full bore could saturate the link. Six SATA3 ports have an effective transfer rate of 28.8 Gbps (you have to take off 20% due to 8b/10b encoding),
Not to put too fine a point on it, but in the context of the question being asked, why would you care? If you're building a NAS, you're going to be limited by the network speeds. Even if you managed to get two 10Gbps connections with LACP and a well-balanced high traffic workload, there's no chance you're going to get "6 SSD's going full bore". Plus there isn't the CPU oomph to do any sort of significant software defined storage (ZFS etc) at those speeds, and using simpler technology (ext3/FFS without all the overhead, and NFS or Samba to serve) will generally start to bottleneck well before the 20Gbps, so you're not even really being limited by the network speeds, but by the overall platform's inappropriateness to the task.
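Rough numbers behind the point that six SSDs won't be going full bore over the network, with an assumed ~550MB/sec sequential ceiling per SATA SSD (illustrative, not measured):

ssds = 6
per_ssd_mb_s = 550                      # assumed SATA3 SSD sequential ceiling
storage_mb_s = ssds * per_ssd_mb_s      # 3300 MB/s of raw device throughput
lacp_links = 2
network_mb_s = lacp_links * 10_000 / 8  # 2500 MB/s for 2x 10GbE, best case
print(storage_mb_s, network_mb_s, storage_mb_s > network_mb_s)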

That's not to say that it isn't a mildly interesting question, but in the larger picture you probably need to at least move upwards on cores for it to even begin to matter. Even the midrange Xeon D's are I/O-rich, CPU-shy parts. It's easy to build contention scenarios especially if you "go wide" by taking something like the 2116 on one of the X10SDV storage boards:

[Image: X10SDV block diagram]

Fine, so there are sixteen 6Gbps ports there, and if you were to SSD them all, theoretically you have around ~8GBytes/sec of storage crossing ~4GBytes/sec of PCIe, but can you actually generate even that much traffic in a meaningful scenario (meaning via processing or file service, not dd'ing /dev/zero)?
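A quick sketch of that contention scenario (the ~500MB/sec per SSD and the x4 Gen3 uplink are assumptions for illustration; the point is only that the ports can nominally outrun the link):

ports = 16
per_port_mb_s = 500                    # assumed realistic per-SSD throughput
storage_mb_s = ports * per_port_mb_s   # ~8000 MB/s, matching the ~8GBytes/sec above
pcie_x4_gen3_mb_s = 4 * 985            # ~3940 MB/s usable, roughly the ~4GBytes/sec above
print(storage_mb_s, pcie_x4_gen3_mb_s, round(storage_mb_s / pcie_x4_gen3_mb_s, 1))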

If you really want to be able to max out your peripheral devices, the Xeon D probably isn't the right choice. These things are aimed more at being "the new Atom," and at that they do some jaw-droppingly amazing stuff - for an Atom/Avoton replacement.

I really like the Xeon D as a pragmatic choice for deployments where you don't need dozens of cores and all the drama, but it is still a limited platform, and a lot of that limit is in the form of the CPU itself, and most of the rest is in the form of the limited expansion options. For the things you're likely to use this platform for, you're probably going to smash into CPU as being an issue before any realistic chance of hitting I/O bottlenecks.
 
  • Like
Reactions: Stux and evancox10

evancox10

New Member
Nov 12, 2015
8
2
3
124

I agree with all your points; it's an academic question for a NAS. However, I think you missed that you were responding to two completely different users: J--, who asked about using all six SATA ports, is not running a NAS application but rather some database/VM workload, with a CPU much beefier than the 1508. (He/she talked about it in another thread.)
 

J--

Active Member
Aug 13, 2016
199
52
28
41
Ha, thanks for reading my other thread. :)

I was just asking for theoretical speeds to try to judge what the PCH is capable of, irrespective of CPU. I wouldn't use this for a DB server; that would be a bit ambitious.
 

evancox10

New Member
Nov 12, 2015
8
2
3
124
@evancox10 The difference versus the M.2 on consumer motherboards is that the Xeon D SATA ports basically run through an on-die PCH.
I think the SATA, Gen 2 PCIe, and USB ports are on a separate die integrated into the package. Source:

Intel gives the ‘v3’ treatment to new 14 nm Xeon D processors

Edit: Point being, there's still some interconnect, and if all they did was put an existing die in the package it's going to be the same interconnect as previously.
 
Last edited:

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
That is a strange article. The block diagram shown and provided by Intel has the PCH integrated in the SoC. The article also makes it sound like the 10GbE is fully integrated; the 10GbE ports do not have integrated PHYs, for example, so the full solution is not integrated.

BTW if you want some more of the slides Intel provided - Intel Xeon D SoC - Changing the low end with Broadwell-DE

...and more detail on the PCH
Broadwell-DE / Intel Xeon D-1540 SoC PCH Information

We did the majority of the early Broadwell-DE reviews/ pieces and made public some of the major bugs in early silicon.

One thing we have not done is physically opened the package. @William - maybe a fun project de-lidding a package and snapping some photos?

I do not think you are going to be PCH SATA speed limited.
 
  • Like
Reactions: William

evancox10

New Member
Nov 12, 2015
8
2
3
124
Cool, thanks for the info Patrick. Also I'm familiar with your excellent coverage, having read most of it! Thank you for that, by the way :)

The PCH-as-a-separate-die thing seems to be something they aren't exactly trumpeting, but it's been confirmed; see the post by the author halfway down this comment thread:

Intel's 1st Xeon SoC Twists ARM | EE Times

Not that it really affects the end user either way; it doesn't make the product any less great! I think I'll be getting Supermicro's mini-tower with the 1541 soon.
 
  • Like
Reactions: Patrick

RobertC

New Member
Apr 8, 2016
17
4
3
57
Has anyone booted this with any of the Solaris clones, like SmartOS, Illumos, or OpenIndiana? I know these do not support USB3 (it is currently in alpha) and am concerned that the USB 2 ports are implemented via the USB3 controller.

If you HAVE booted this from a flash stick, does the booted OS allow you to use a standard USB keyboard, mouse, and a USB2 audio adapter?

Thanks in advance!
 

Laugh|nGMan

Member
Nov 27, 2012
36
7
8
Guys! Anyone have info for Broadwell-DE with Windows Server 2016 / Windows 10:
1. Idle watts with the Balanced power plan, and idle CPU frequency?
2. Idle watts with the Performance power plan, and idle CPU frequency?

Just curious how it compares with Sandy Bridge/Ivy Bridge ATX-size 6-8 core platforms. I still don't see any point in upgrading: the Broadwell-DE CPU is not swappable, there's only a pair of PCIe slots for expansion, and it's pricey. Somehow I feel that with Broadwell-DE my hands are tied, with no room to expand.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
This just made my Google Keep list. I may use WS2016 Essentials on a few machines.

I do think, with NVMe pricing becoming affordable, that if you are thinking of building a larger platform, the E5 V4 route is the way to go: more RAM expansion, more PCIe lanes, more SATA 3.0 ports, etc.

At the low end (if you never want more than 128GB / 8 cores), I think the Xeon D makes a lot of sense. If you are power constrained (1U, 1A, 110V hosting), then the Xeon D is a category killer.

With Windows Server 2016 licensing moving to a per-core model, higher-clocked E5 V4s are going to be preferable to the Xeon D, save in the WS2016 Essentials role.
 

evancox10

New Member
Nov 12, 2015
8
2
3
124
Hey, so I noticed there has been some discussion of PCIe bifurcation support in this thread on the SuperMicro boards. What is the conclusion of that? Do you need an "active" adapter with a PLX chip? Or can you split the lanes passively? If you can split them passively, how many lanes can you split them into?

Since you have a full PCIe x16 slot, I'm wondering how you could split that into 4 individual PCIe x4 M.2 slots. With something like this maybe
Use a PLX chip adapter.
Hey, so I just noticed here that the latest BIOS update (1.1c) for the SuperMicro X10SDV platform contains this:
Expose all supported bifurcation combination for PCIe slot 7.
Anyone know what the available options are? Sounds like you may not need a PLX adapter now, if the chipset supports bifurcation. Even just two M.2 would be nice.
 

WeekendWarrior

Active Member
Apr 2, 2015
356
145
43
56
The 0Q at the end of the part number is different.

Here is a slightly out of date guide on the Samsung part numbers
http://www.samsung.com/global/busin...ort/part_number_decoder/DDR3_SDRAM_Module.pdf
It does not have the current DDR4 numbering scheme but you can see the general logic on what is significant about the part numbers.

In your case, nothing is different. Judging by what I could see, they made a small change in the factory at some point and all the newer ones are CPB0Q. Probably just something like switching to a larger die size or a newer process to lower defect rates. If it makes manufacturing more efficient, it just means lower prices as they get better at producing this stuff. The new part number is likely just so they can tell them apart for warranty tracking.

Spec sheets show zero functional differences. So they will work exactly the same as older ones.
Came across your helpful post when searching for Samsung DDR4-related naming info, and your DDR3 link led me to Samsung's DDR4 naming explanation FYI: http://www.samsung.com/semiconductor/global/file/product/DDR4_Product_guide_May15.pdf
 

WenleZ

New Member
Aug 9, 2016
4
2
3
37
Chicago
www.wenlez.com
+1

Would love to know if the onboard sata controller is passthrough-able in ESXi 5.5 or 6!
Yes, you can pass through the onboard Intel SATA controller. You must install ESXi either on a USB drive or on an NVMe SSD.
What you need to do is:
add the following to /etc/vmware/passthrough.map (or passthru.map in ESXi 6.5):

# INTEL Lynx Point AHCI
# fields: vendor-id device-id reset-method fptShareable
8086 8c02 d3d0 false
 
  • Like
Reactions: Stux

IamSpartacus

Well-Known Member
Mar 14, 2016
2,515
650
113
Nice tip!