Compilation server

orki

New Member
Oct 4, 2021
12
3
3
If your code relies on several levels of pointer indirection, it seems that the processor will always be invalidating the cache contents...
So, main memory bandwidth will be a critical issue.

About the "template metaprogramming", it seems you are using the compiler to do two different things at the same time: write the code that you want and compile it.

Have you considered doing the metaprogramming yourself?
My code does not have much pointer indirection, precisely to avoid this problem. The description above was of the kernel (which was UhClem's comparison above), whose APIs tend to use a lot of pointer indirections. The speed of compiled binaries is a fun optimization problem, which is the reason it is a hobby project. But the server requirements are for *compiling* the code itself, not for the compiled binaries, as many of the compiled binaries do not even run on x86.

Veering way off course of this topic: Template metaprogramming in C++ is the closest I can get to Lisp reader macros, while still in full control of object memory layout. As a hobby project, writing code to write code is way more fun than just writing code ;)
 

cageek

Member
Jun 22, 2018
46
34
18
What is the reason for recommending DDR4? Is it because of future proofing against memory bandwidth issues?
For me, it's just looking at the higher core counts and Passmark scores available with the v3 and v4 processors. There were some v3 e5's that took DDR3, but they were few and far between. The scores are here:
I can't seem to get a direct link. Search for '[Dual CPU] Intel Xeon E5-2' on the multi-processors chart, sort by cpu mark.
 

RTM

Well-Known Member
Jan 26, 2014
754
273
63
What is the reason for recommending DDR4? Is it because of future proofing against memory bandwidth issues?
For me it is more about not investing money in old tech. DDR4 can possibly be installed in more recent hardware, whereas with DDR3 you are more or less limited to Xeon v2's (and some combinations of v3 + special motherboards, but just ignore that).
To some extent it is also for "security" reasons: I read somewhere a while back that some (not all) DDR4 is immune to rowhammer attacks.
 

WANg

Well-Known Member
Jun 10, 2018
1,107
691
113
43
New York, NY
For me it is more about not investing money in old tech. DDR4 can possibly be installed in more recent hardware, whereas with DDR3 you are more or less limited to Xeon v2's (and some combinations of v3 + special motherboards, but just ignore that).
To some extent it is also for "security" reasons: I read somewhere a while back that some (not all) DDR4 is immune to rowhammer attacks.
That depends on whether the DIMM/SODIMM implements TRR (Target Row Refresh). It's not a JEDEC standard and DIMM vendors don't mention it (unless it's some type of premium feature that you are expected to pay for), and even then, there is a paper which suggests that it doesn't really mitigate the issue. I would say that you should buy DDR4 machines for the simple reason of not throwing good money after bad (or paying today's dollars for yesterday's tech).
 

WANg

Longtime lurker, but https://forums.servethehome.com/ind...nt-email-with-room-to-grow.34145/#post-314910 finally impelled me to make an account here. Advice on identifying a machine for a compilation server would be much appreciated.

Use case for a development server:
  • Slow desktop development machine (i5-6400 with 16GB RAM)
  • Compilation times are very slow (about 1-5hrs depending on project, all personal; both memory and CPU limited)
  • Would like a beefier machine for compilation and for LSP server
  • Not particularly space constrained, as basement has plenty of space along with other servers (see below)
  • Development server likely to be used for 10-15 hours only per week (in ~1hr sessions) for personal projects
  • Would like to minimize idle power draw; ideally would go into hibernation or the like when not used
  • Would like at least 64GB of memory
  • Only care about Linux support
  • Price ideally less than 200 USD, located in Illinois

My home network is fairly straightforward and should be lightly loaded and should not have any impact on the choice, but in case it does:
  • Wired 1Gbps connections: Dell 3020 SFF (Opnsense, in the process of switching to VyOS for better CLI support) to HP 2530-24G + 2 Aruba IAP-205 (with redundant Dell 3020 SFF + HP 2530-24G combo as backup in case of component failure)
  • Bunch of Raspberry Pis and HP T620s used as music boxes (via Mopidy)
  • A single NUC + SilverStone RS431U with 4 spinning disks acting as backup server, DLNA server and NFS disk server for music boxes
  • The usual phones/TVs/desktops/laptops but nothing significant (no real video usage beyond Netflix streaming a couple of times a week)
Well, it won't be for less than 200 USD. The RAM alone (assuming 2x32GB DDR4 UDIMM or SODIMM for 64GB, and who knows about that) will run in the ballpark of at least 225 or so, or possibly less if you get something (server-like) that can use more, lower-density units (8x8GB or 4x16GB; 4x16GB will be around 160-200 in the US).

Well, here's an idea from left field: consider a Lenovo ThinkStation P330 SFF (not the Tiny) with a decent Coffee Lake CPU, or a Lenovo P52 mobile workstation laptop with a 6-core i7-8850H (both around 600-700 USD on evilbay for a complete/working machine, but you can do better if you work the hustle). Both have 4 DDR4 DIMM/SODIMM slots (128GB RAM max, both should support ECC if you need it). It's Coffee Lake, so it is barely on the Win11 acceptable CPU list (so you'll see institutions trading up), and there's a significant bump between the quad-core Skylake/Kaby Lake parts and their 6-core Coffee Lake counterparts, which makes it worthwhile (kinda).
Get a few 16GB DDR4 units (not that expensive), which when populated in all 4 slots will give you 64GB of RAM. When you need to go above 64GB, just incrementally buy 32GB SODIMMs and swap them in. The bonus is that after you are done compiling things, you just suspend it/put it to sleep; it's just a standard desktop/laptop. Easy on the power, easy to deal with, and none of the headaches of having to buy old Sandy/Ivy Xeon machines and make them fit your situation (Haswells and Broadwells are okay for the most part if used for Linux).
 

orki

Well, here's an idea from left field: consider a Lenovo ThinkStation P330 SFF (not the Tiny) with a decent Coffee Lake CPU, or a Lenovo P52 mobile workstation laptop with a 6-core i7-8850H (both around 600-700 USD on evilbay for a complete/working machine, but you can do better if you work the hustle). Both have 4 DDR4 DIMM/SODIMM slots (128GB RAM max, both should support ECC if you need it). It's Coffee Lake, so it is barely on the Win11 acceptable CPU list (so you'll see institutions trading up), and there's a significant bump between the quad-core Skylake/Kaby Lake parts and their 6-core Coffee Lake counterparts, which makes it worthwhile (kinda).
This is similar to RTM's idea, unless I'm missing something, for roughly the same cost. In terms of both memory bandwidth and CPU performance, the Dell T7810 workstation with dual CPUs would have roughly double the performance at the cost of (much?) higher power consumption. The advantage of the P330 would be a pre-built system with virtually no additional work needed. Is this correct?
 

WANg

This is similar to RTM's idea, unless I'm missing something, for roughly the same cost. In terms of both memory bandwidth and CPU performance, the Dell T7810 workstation with dual CPUs would have roughly double the performance at the cost of (much?) higher power consumption. The advantage of the P330 would be a pre-built system with virtually no additional work needed. Is this correct?
Well, 2 sides of the same coin, really. With @RTM's approach, you go for more, older cores (either 1 or 2 sockets of multi-core Haswell, and 4 server-grade DDR4 DIMM slots per socket), and you throw more cores at the problem. My approach is to use fewer but newer cores (Coffee Lake). There are merits to both, but it’s a question of whether your build process is “embarrassingly parallel” and can keep the older cores busy, or whether code branching/dependencies will essentially favor fewer but faster cores.
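To put rough numbers on the "more older cores vs. fewer newer cores" trade-off, here is a minimal Amdahl's-law sketch. The parallel fractions, the core counts, and the 1.3x per-core speed ratio are made-up assumptions for illustration, not measurements of any machine mentioned in this thread.

```python
def build_time(serial_time, parallel_fraction, cores, per_core_speed=1.0):
    """Estimated wall-clock build time under Amdahl's law.

    serial_time:       time on one baseline core (any unit)
    parallel_fraction: share of the work that can run concurrently (0..1)
    cores:             number of cores available
    per_core_speed:    speed of one core relative to the baseline (assumed)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1 or per_core_speed <= 0:
        raise ValueError("cores must be >= 1 and per_core_speed must be > 0")
    serial_part = serial_time * (1.0 - parallel_fraction)
    parallel_part = serial_time * parallel_fraction / cores
    return (serial_part + parallel_part) / per_core_speed


if __name__ == "__main__":
    # Hypothetical inputs: a 120-minute single-core build, 28 older cores
    # versus 6 newer cores that are assumed to be 1.3x faster each.
    for p in (0.50, 0.90, 0.99):
        many_old = build_time(120, p, cores=28, per_core_speed=1.0)
        few_new = build_time(120, p, cores=6, per_core_speed=1.3)
        print(f"parallel fraction {p:.2f}: "
              f"28 older cores {many_old:6.1f} min, "
              f"6 newer cores {few_new:6.1f} min")
```

Under these assumptions the fewer, faster cores only win when a large share of the build is serial; once most of the work parallelizes, core count dominates.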
The Precision 7810 is a beast of a machine (basically like a 3U server on its side), while the P330 is much smaller (it’s like a 7 liter desktop box) and the P52 laptop smaller yet. Unless you load it up, though, the Precision deskside is not going to be all that loud.

At the end of the day, it’s really about buying computing after someone else already paid for the amortization…the P330/P52s are probably off-lease from an engineering school or large firm, much like the 7810. Both are pre-made and you can slap parts in to suit your needs. The question is, of course, what you value more: cheaper and more raw CPU firepower upfront in exchange for more bulk and quicker obsolescence, or less CPU costing more upfront but being easier to carry around and staying relevant for longer. Of course, don’t forget about the challenges of running power and cooling to a big machine like the 7810…I remember my grad school days when someone from the department put a couple of Precision desksides onto the same circuit as the rest of the department, just to have the breaker trip because an adjunct went ahead and microwaved a burrito in the break room. Sometimes your home electrical grid might be your biggest determinant…followed by whether you have any significant others who might take exception to fan noise during busy times.
 

UhClem

Active Member
Jun 26, 2012
137
54
28
NH, USA
Does the fact that you're now discussing (considering?) CPUs with (only) 2-channel mem (vs 4-), and smaller cache (12M vs 25+M) [[these would be CoffeeLake vs e5-26xx v3/4]] mean that your valgrind results are in? Does valgrind, or its results, enable you to account for differences in cache size/behavior [6254 vs e5-26xx], and single compile vs 4+, wrt assessing your mem bandwidth needs?

How many concurrent compiles do you need? want?
So, probably that # +1, or 2, CPU cores ??
 

RTM

There is another advantage of the T7810: Maximum RAM capacity
Both the fact that it supports RDIMMs and the number of slots it has (8; the T7910 has more, but is not cheap) mean it can support a much greater amount of memory (if it matters).

I doubt power draw will be too bad without a large GPU (but probably not the first choice for 24/7 operation).

Anyway to throw yet another hat into the ring, here is another contender that I at least think is really cool:

Asus PN51 (think a NUC with an 8-core Ryzen; it's a mobile CPU so probably not the greatest single-threaded performance, but still very nice):
Cost: $660

RAM: 2x 32GB SODIMM DDR4 3200MHz
Cost: $210

Total: $870
 

orki

Does the fact that you're now discussing (considering?) CPUs with (only) 2-channel mem (vs 4-), and smaller cache (12M vs 25+M) [[these would be CoffeeLake vs e5-26xx v3/4]] mean that your valgrind results are in? Does valgrind, or its results, enable you to account for differences in cache size/behavior [6254 vs e5-26xx], and single compile vs 4+, wrt assessing your mem bandwidth needs?

How many concurrent compiles do you need? want?
So, probably that # +1, or 2, CPU cores ??
The valgrind results are more complicated than I expected. It reports an L3 cache miss rate of 0.1% rd and 0.6% wr, which is not realistic: valgrind modeled the entire L3 cache of the Xeon Gold 6254 as available to a single compilation. When running multiple compilations in parallel, this will clearly not be true. Still, it is indicative of much lower memory bandwidth pressure than I expected. gcc's internals must be organized better than I expected in spite of the GIMPLE representation.
The turnaround time is about 20 hours for each valgrind run; this is starting to feel more like work than fun :(
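One hedged way to turn last-level miss counts like these into a bandwidth figure: each last-level miss moves roughly one cache line to or from DRAM, so misses x line size / run time gives a crude estimate of memory traffic. The miss counts and run time below are placeholders, not numbers from the runs described above, and the 64-byte line size is an assumption that matches typical x86 parts. The time should be the native (non-valgrind) compile time, since the simulator slows execution down enormously.

```python
def estimated_dram_traffic_gb_s(ll_read_misses, ll_write_misses,
                                native_seconds, line_bytes=64, jobs=1):
    """Crude DRAM traffic estimate in GB/s (1e9 bytes/s).

    ll_read_misses / ll_write_misses: last-level cache miss counts for ONE
        compile job, as reported by a cache simulator
    native_seconds: wall-clock time of that job when run natively
    line_bytes:     cache line size (assumed 64 bytes)
    jobs:           identical jobs assumed to run concurrently; this ignores
                    the extra misses caused by jobs sharing the same L3
    """
    if native_seconds <= 0 or jobs < 1:
        raise ValueError("native_seconds must be > 0 and jobs must be >= 1")
    total_bytes = (ll_read_misses + ll_write_misses) * line_bytes
    return total_bytes * jobs / native_seconds / 1e9


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    one = estimated_dram_traffic_gb_s(1_000_000_000, 500_000_000, 100.0)
    many = estimated_dram_traffic_gb_s(1_000_000_000, 500_000_000, 100.0,
                                       jobs=16)
    print(f"one job: {one:.2f} GB/s, sixteen jobs: {many:.2f} GB/s")
```

Because the simulated job had the whole L3 to itself, the multi-job figure is best read as a lower bound.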

I'd like to have 16-20 parallel compiles going. The bazel compilation graph for one of my projects indicates that, at least in theory, 30 units could be compiled in parallel. However, many of them at the widest level are fairly small (compilation time of 2 minutes or less) as most of the interesting code is templated.
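A rough way to check how many workers a build graph like this can actually use is to simulate greedy scheduling of the dependency DAG with a fixed number of parallel jobs. The tiny graph in the example, its target names, and its durations are invented for illustration; they are not taken from the actual bazel graph.

```python
import heapq


def simulate_build(durations, deps, jobs):
    """Greedy list scheduling of a build DAG on `jobs` parallel workers.

    durations: {target: compile time}
    deps:      {target: iterable of targets it depends on}
    Returns the total wall-clock time (makespan).
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    remaining = {t: len(set(deps.get(t, ()))) for t in durations}
    dependents = {t: [] for t in durations}
    for target in durations:
        for dep in set(deps.get(target, ())):
            dependents[dep].append(target)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []  # min-heap of (finish_time, target)
    now = 0.0
    done = 0
    while ready or running:
        # Start as many ready targets as there are free workers.
        while ready and len(running) < jobs:
            target = ready.pop(0)
            heapq.heappush(running, (now + durations[target], target))
        # Advance time to the next completion and release its dependents.
        now, finished = heapq.heappop(running)
        done += 1
        for target in dependents[finished]:
            remaining[target] -= 1
            if remaining[target] == 0:
                ready.append(target)
    if done != len(durations):
        raise ValueError("dependency cycle detected")
    return now


if __name__ == "__main__":
    # Hypothetical graph: one slow base target, eight small templated
    # units that depend on it, and a final link step.
    durations = {"base": 10, "link": 5}
    deps = {"link": []}
    for i in range(8):
        name = f"unit{i}"
        durations[name] = 2
        deps[name] = ["base"]
        deps["link"].append(name)
    for jobs in (1, 4, 8, 16):
        print(f"{jobs:2d} jobs -> {simulate_build(durations, deps, jobs)} min")
```

In this toy graph nothing improves past 8 workers, because the widest level only has 8 short targets and the slow base target sits on the critical path. Feeding in real per-target timings would show whether the same applies to a graph that is 30 wide at its widest level.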
 

orki

There is another advantage of the T7810: Maximum RAM capacity
Both the fact that it supports RDIMMs and the number of slots it has (8; the T7910 has more, but is not cheap) mean it can support a much greater amount of memory (if it matters).
I ran out of patience and decided to tempt fate by buying an HP Z840 | 8 GB / 1.00 TB | DUAL Xeon E5-2680 v4 | K5000 workstation from a seller with only a 98.9% positive rating on evilBay for a total of about 550 USD (including taxes). Unless the description is grossly wrong, this is a Broadwell-EP that is slightly faster than the E5-2697 v3 from your earlier post. The monster motherboard supports quite a few DIMM slots, and the PSU should be appropriately sized for both processors already. The only thing left is memory; HP's website indicates that these Z840s take DDR4-2133, but I have a hard time believing that they cannot take at least DDR4-2400. Any ideas?
 

orki

I don't really need or want the NVidia Quadro K5000; it appears to sell for 150-200 USD on evilBay, about the same as 64GB of the right kind of RAM. It'd be nice to figure out a way to barter it for 64GB DDR4-2400 RDIMMs. Other than the forums here, is there anywhere else such a swap might be made to work?
 

WANg

I ran out of patience and decided to tempt fate by buying an HP Z840 | 8 GB / 1.00 TB | DUAL Xeon E5-2680 v4 | K5000 workstation from a seller with only a 98.9% positive rating on evilBay for a total of about 550 USD (including taxes). Unless the description is grossly wrong, this is a Broadwell-EP that is slightly faster than the E5-2697 v3 from your earlier post. The monster motherboard supports quite a few DIMM slots, and the PSU should be appropriately sized for both processors already. The only thing left is memory; HP's website indicates that these Z840s take DDR4-2133, but I have a hard time believing that they cannot take at least DDR4-2400. Any ideas?
Oh yeah, that's a good call if you have the room for it. The Broadwell-EPs are 14nm lithography so they should idle lower than their Haswell cousins. Those 16 RDIMM slots are mighty tasty, for sure. My guess is that the machine probably shipped with 4x2GB RDIMMs (2 units for each CPU). Depending on what you have on hand and the quoted pricing on the RAM, you could get to 64GB and more at a very reasonable price.
 

RTM

I ran out of patience and decided to tempt fate by buying an HP Z840 | 8 GB / 1.00 TB | DUAL Xeon E5-2680 v4 | K5000 workstation from a seller with only a 98.9% positive rating on evilBay for a total of about 550 USD (including taxes). Unless the description is grossly wrong, this is a Broadwell-EP that is slightly faster than the E5-2697 v3 from your earlier post. The monster motherboard supports quite a few DIMM slots, and the PSU should be appropriately sized for both processors already. The only thing left is memory; HP's website indicates that these Z840s take DDR4-2133, but I have a hard time believing that they cannot take at least DDR4-2400. Any ideas?
Looks like a pretty good deal, especially if you can sell that GPU.

Definitely go with DDR4-2400 (or better). To achieve maximum memory bandwidth you may want to ensure that each CPU has 4 DIMMs connected (fewer should work just fine though).
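For a sense of scale, theoretical peak DRAM bandwidth is roughly populated channels x transfer rate x 8 bytes (each channel has a 64-bit data bus). The sketch below applies that formula. It assumes one DIMM per channel on a 4-channel socket and ignores real-world efficiency, so treat the numbers as upper bounds rather than what a compile job will actually see.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak DRAM bandwidth for one socket, in GB/s (1e9 bytes/s).

    mega_transfers_per_s: the number in the DDR4 name, e.g. 2400 for DDR4-2400
    channels:             memory channels that actually have a DIMM installed
    bus_bytes:            bytes per transfer per channel (64-bit bus = 8 bytes)
    """
    if mega_transfers_per_s <= 0 or channels < 1:
        raise ValueError("speed must be > 0 and channels must be >= 1")
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for speed in (2133, 2400):
        for populated in (1, 2, 4):
            gb_s = peak_bandwidth_gb_s(speed, populated)
            print(f"DDR4-{speed}, {populated} channel(s) populated: "
                  f"{gb_s:.1f} GB/s per socket")
```

The step from DDR4-2133 to DDR4-2400 is about 12%, while leaving channels empty halves or quarters the peak, which is why filling all four channels per CPU matters more than the speed grade.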

Generally speaking I also do not recommend buying DIMMs with a capacity of less than 16GB each (so I am recommending getting 128GB if within budget), at least unless they are really cheap. It seems to me that there is less of a market for 8GB and smaller DIMMs, so it is better to invest in larger sticks (which will also allow greater maximum capacity).
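Whether 64GB or 128GB is the right amount depends mostly on how much memory each compile job peaks at. A minimal sketch of that sizing follows; the 4GB-per-job peak, the 8GB reserve for the OS and page cache, and the 56 hardware threads are illustrative assumptions, so measure the real per-job peak before relying on it.

```python
def max_parallel_jobs(total_ram_gb, peak_gb_per_job, hw_threads, reserve_gb=8):
    """How many compile jobs fit, limited by RAM and by hardware threads.

    total_ram_gb:    installed memory
    peak_gb_per_job: assumed peak resident memory of one compile job
    hw_threads:      hardware threads available for compilation
    reserve_gb:      memory kept free for the OS and page cache (assumed)
    """
    if peak_gb_per_job <= 0 or hw_threads < 1:
        raise ValueError("peak_gb_per_job must be > 0 and hw_threads >= 1")
    usable = max(0, total_ram_gb - reserve_gb)
    by_memory = int(usable // peak_gb_per_job)
    # Always allow at least one job, even on a memory-starved machine.
    return max(1, min(hw_threads, by_memory))


if __name__ == "__main__":
    # Hypothetical: template-heavy translation units peaking at 4GB each.
    for ram in (64, 128):
        jobs = max_parallel_jobs(ram, peak_gb_per_job=4, hw_threads=56)
        print(f"{ram}GB RAM -> up to {jobs} parallel compile jobs")
```

With these made-up numbers, 64GB falls short of a 16-20 job target while 128GB leaves headroom.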