Supermicro Xeon-D 1587 Squid Performance


tssrshot

Member
Mar 18, 2015
58
8
8
Omaha, NE
We're looking to repurpose an excess server: a Supermicro box with a Xeon D-1587 (16c/32t). It was bought on another budget with excess everything: maxed-out RAM (128GB), a 480GB NVMe SSD, and a 1TB SATA SSD. We have a lot (2,200 iPads) of internal mobile clients that download a large database from the Internet, approx. 32GB each per month (the content changes monthly).

Anyone with experience with Squid or the Xeon-D see any big glaring mistakes here? Obviously I am going to tune the caching toward memory where possible, and define some dynamic-download specifics to make sure the files cache properly, but am I missing something stupid like Squid's poor threading? The network itself is solid: the server would have a 10Gb network connection and fast Wi-Fi throughout the usage area. It's the WAN I need to solve with caching, but not if the processing/IO is going to die in the process.
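For anyone curious, here's roughly the sort of memory-first tuning I have in mind. This is just a sketch: the sizes are illustrative and the `.zip` URL pattern is a placeholder for whatever the app actually requests. None of it is tested on this box yet.

```conf
# squid.conf sketch: favor RAM, since the box has 128GB
cache_mem 96 GB
# Large enough that the monthly ~32GB database can live in the memory cache
maximum_object_size_in_memory 36 GB

# Allow the ~32GB database file to be cached at all (default is tiny)
maximum_object_size 40 GB

# Disk cache on the 480GB NVMe SSD for whatever falls out of RAM
cache_dir aufs /var/cache/squid 400000 16 256

# Force-cache the vendor's "dynamic" download; the .zip pattern is a
# placeholder, the real match depends on what the app requests
refresh_pattern -i \.zip$ 43200 100% 43200 override-expire ignore-reload ignore-private
```

The refresh_pattern is the part I meant by "dynamic download specifics": without it, the vendor's cache-control headers would likely prevent Squid from keeping the file.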

Bryan
 
Last edited:

Patrick

Administrator
Staff member
Dec 21, 2010
12,511
5,792
113
If you are just using it for caching, my sense is that the box is overkill. On the other hand, if you have it, why not.
 
  • Like
Reactions: _alex and gigatexal

tssrshot

Well...overkill never got me in trouble. :) I appreciate the response Patrick.

I live in fear of worker threads and core disparity. I'm not super-savvy, so I'm putting this up to you masters. (I'm a networking specialist.)

Does it matter that the 32GB comes all at once? At a random date somewhere in a 28-30 day window, an entire new database drops that they all have to download at once. Sounds weird, but I'd have to go private to discuss the business I'm in. However, our WAN link is only 25MB download, so you can see my dilemma.

Bryan
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,625
2,043
113
For clarity: are 2,200 iPads each downloading their own unique 32GB DB? All at once? At different times? You said databases but then want to cache, so I'm not sure how caching would help at all if they're unique.

Why are they each downloading it? Why don't you download it once on a locally hosted system and provide THAT to the iPads as the download source?

Not that I know your biz, just some other thoughts/ideas.
 

capn_pineapple

Active Member
Aug 28, 2013
356
80
28
Why don't you download it once on a locally hosted system and provide THAT to the ipads as the download source?
That's precisely what one could use a Squid cache for :) I use it for Steam and MS update caching to do essentially the same thing.
 

tssrshot

Member
Mar 18, 2015
58
8
8
Omaha, NE
For clarity are 2200 iPads downloading their own unique 32GB DB? At once? Different times? You said databases but then want to cache, so I'm not sure how it'd help at all if they're unique?

Why are they each downloading it? Why don't you download it once on a locally hosted system and provide THAT to the ipads as the download source?

Not that I know your biz, just some other thoughts/ideas.
It's ~32GB, but Squid would detect it as identical. It typically serves from memory on our test server, but we only have about 20 iPads using that one.

The application only trusts and uses this "authoritative" Internet-based source. That's great because a lot of our iPads float around the world and have good connectivity, but about 85% of them are at home, tearing up our tiny Internet connection. :)

It's like a database that contains information about airfields and how to fly into them, so it's identical for every person. However, it cycles every 28-30 days, and as users discover this, they download it and beat up the Internet link.

They can't fly the next day if they can't download it in time, so sometimes they need it that morning and it's time-critical.
 
  • Like
Reactions: T_Minus

T_Minus

That's precisely what one could use squid cache for :) I use it for Steam and MS update caching to essentially do the exact same thing.
I understand that.

However, if this DB is used by another application, it might be much simpler to set up a cron job to download the DB to an internal network location, re-configure the app to use that, and not need to rely on Squid, extra CPU, etc. ;) was my point.
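As a sketch of what I mean (the URL and paths here are made up; the real source would be whatever the vendor publishes):

```conf
# Hypothetical crontab entry: check nightly for a new copy and
# mirror it into the internal web server's document root.
# wget -N only re-downloads when the remote file is newer.
0 2 * * *  wget -N -P /var/www/dbmirror https://vendor.example.com/current/database.zip
```

The catch is the second half: the app then has to be pointed at the internal copy, which may or may not be configurable.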
 

capn_pineapple

if this DB utilizes another application it might be much simpler to setup a cron to download the db to a 'internal network' location.
Ah, yeah, I get what you mean. Usually apps like this are hard-coded to pull from a specific set of vendor servers (especially if it's an iPad app).
 

T_Minus

Ah, yeah, I get what you mean. Usually apps like this are hard coded to pull from a specific set of vendor servers (especially if it's an ipad app).
Oh, trust me, I get you ;) I'm on the development side of 'apps' and 'websites' that work like this (I guess most do require outside sources of info, ha ha), and I strongly push customers toward 'internal' handling rather than relying on 3rd-party caching and/or "maybe it will work" situations ;)

Now, getting their internal developers to actually use a more efficient system that may require a config change now and then... that's something else ;) ha ha.

That's what's nice about changing tech... it can change the way we look at, and solve, problems :D :D
 
  • Like
Reactions: capn_pineapple

DavidRa

Infrastructure Architect
Aug 3, 2015
329
152
43
Central Coast of NSW
www.pdconsec.net
Couple of options.

First, if all these devices are in people's homes, they shouldn't be smashing your connection - they should be going direct. That might make a cloud cache (a VPS or similar) a better choice of cache/distribution points (if you find you have to use a proxy). If you've deployed VPN profiles to the iPads, perhaps reconfigure to permit going direct if that's possible?

Second, you could host the file on an internal web server and create your own internal DNS zones with the appropriate records for the app download - there's a minor niggle to solve there with getting it checked/updated every day, but it would seem to be doable. If it needs to be SSL, you could try putting an internally-generated and trusted cert on that webserver, but if the app pins the cert, that won't work. Still testable.
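As a sketch of the DNS part (the hostname and internal IP here are placeholders; the real name is whatever the app resolves for its download):

```conf
# dnsmasq on the internal DNS: answer the vendor's download hostname
# with the address of the internal web server instead of the real one
address=/downloads.vendor.example.com/10.0.0.50
```

Devices on campus would get the internal copy transparently; devices elsewhere would resolve the real vendor address as usual.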

I think the proxy will work - I don't recall any major gotchas with Squid current or recent past. Worst case, you deploy 4 VMs and let them peercache; then either load balance them or use DNS round robin.
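The peering setup for that worst case is only a few lines per VM; hostnames here are placeholders:

```conf
# On squid1: treat the other three VMs as cache siblings, so a local
# MISS is checked against the peers (ICP on 3130) before hitting the WAN
cache_peer squid2.internal sibling 3128 3130 proxy-only
cache_peer squid3.internal sibling 3128 3130 proxy-only
cache_peer squid4.internal sibling 3128 3130 proxy-only
```

Each VM gets the equivalent block naming the other three; proxy-only keeps the siblings from duplicating each other's disk cache.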
 

tssrshot

Couple of options.

First, if all these devices are in people's homes, they shouldn't be smashing your connection - they should be going direct. That might make a cloud cache (a VPS or similar) a better choice of cache/distribution points (if you find you have to use a proxy). If you've deployed VPN profiles to the iPads, perhaps reconfigure to permit going direct if that's possible?

Second, you could host the file on an internal web server and create your own internal DNS zones with the appropriate records for the app download - there's a minor niggle to solve there with getting it checked/updated every day, but it would seem to be doable. If it needs to be SSL, you could try putting an internally-generated and trusted cert on that webserver, but if the app pins the cert, that won't work. Still testable.

I think the proxy will work - I don't recall any major gotchas with Squid current or recent past. Worst case, you deploy 4 VMs and let them peercache; then either load balance them or use DNS round robin.
Awesome reply. Thank You. Sorry for the delay, I got bogged into some terrible travel assignments.

So the content, we don't own. If a user is at home, that's fine: they can go direct, no VPN necessary. Most people probably have faster Internet at home anyway. However, I need to get them "square" if they are at our campus. Typically, they forget until the morning before they fly out (aviation business), so the speed and our lack of bandwidth are the general impetus. It seems to work pretty well in a small-scale test (using older hardware), but I had to add some Squid dynamic-caching entries. Luckily, it's all HTTP, so I could avoid the WPAD vs. self-signed CA battle.

My main reason for the question is that I am totally out of my element reading about Squid's use of worker threads et al., and I don't know whether this particular hardware (low clock speeds, high core count) would get bogged down in the business of managing so many smaller transactions. Luckily, this hardware is maxed out with 128GB of memory for a cache and 480GB of SSD, so there's plenty of room to keep things close-hold. If Squid is hanging loose transferring 35GB worth of ZIP/GZIP files, is that thread locked? Even with 16c/32t, what happens if 33 little jerks hit update at the same time?
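From what I've read so far, the SMP side looks like a one-liner, though I haven't tried it on this box (the worker count is illustrative):

```conf
# Squid 3.2+ SMP mode: spawn multiple worker processes so concurrent
# client downloads spread across cores instead of queuing on one process
workers 8
```

One caveat I've seen mentioned: with SMP workers, a ufs/aufs cache_dir can't be shared between workers (each worker needs its own, or you switch to the rock store), so that part would need testing too.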

Thank you!

Bryan
 

Stux

Member
May 29, 2017
30
10
8
46
If the box can push a gig or two per second and 33 tablets try at once, they'll each get around 30MB/s and take 15-30 minutes to download the database.

That's 33 users and a thread per user; it shouldn't have a problem.

I hope you have the 10gbe uplink in use.
 
  • Like
Reactions: tssrshot

tssrshot

Able to push a gig or two per second, if 33 tablets try, they'll get 30MB/s and will take 15-30 minutes to download the database.

33 users and a thread per user. It shouldn't have a problem.

I hope you have the 10gbe uplink in use.
Definitely 10Gb in use. This goes straight to our biggest 10Gb distribution switch via an SFP+ twinax cable; all of the campus buildings are within one or two 10Gb hops from there. The APs are all Cisco multi-gig with 2.5Gb uplinks and Wave 2... this network actually goes too fast, which is why I worry.

It almost crosses my mind to duplicate the Squid box and cross-link the two, so each uses the other as a remote cache when there's no local HIT. It would double my storage space, double my thread counts, etc. I don't expect the worst, but I have 2,200 identical mobile devices with the same use case... and it will only continue to grow.

What happens to users 33 through infinity? Do they just hang waiting for threads to finish, or does everybody share in the slowness as each new person jumps onboard?
 

mstone

Active Member
Mar 11, 2015
505
118
43
46
Well, actually it pretty much is delta. Every 28 days all of the data is verified and its reset with a new date
"delta" would mean you only download changes. Now I might be naive, but in my experience airports don't move around all that much. So I would expect that once you have the full dataset, there just isn't that much that changes in the course of a month. By downloading just the changes and applying the changes to the original set, it should be possible to generate something indistinguishable from downloading the entire new set (while moving a lot fewer bits).
 

DavidRa

"delta" would mean you only download changes. Now I might be naive, but in my experience airports don't move around all that much. So I would expect that once you have the full dataset, there just isn't that much that changes in the course of a month. By downloading just the changes and applying the changes to the original set, it should be possible to generate something indistinguishable from downloading the entire new set (while moving a lot fewer bits).
I rather suspect that everything is marked "newly updated" because the date changes; the "unit of synchronisation" is the map object with its properties.

So if you ask "what's changed since 1-Aug" for example, you get "Map 0000001 has changed, updated 14-Sep; Map 0000002 has changed, updated 14-Sep, Map 0000003 has changed, updated 14-Sep ..."

This means you have to download all 30GB to get 50kB of changed dates. The file format has no concept of changed and unchanged blocks. Even worse might be "I have file dated 1-Aug, what's the current file date" --> "14 Sep" --> "Download new file".
 

mstone

I rather suspect that everything is marked "newly updated" because the date changes; the "unit of synchronisation" is the map object with its properties.

So if you ask "what's changed since 1-Aug" for example, you get "Map 0000001 has changed, updated 14-Sep; Map 0000002 has changed, updated 14-Sep, Map 0000003 has changed, updated 14-Sep ..."

This means you have to download all 30GB to get 50kB of changed dates. The file format has no concept of changed and unchanged blocks. Even worse might be "I have file dated 1-Aug, what's the current file date" --> "14 Sep" --> "Download new file".
It's almost as though we've come full circle back to "bug the source of the download to come up with a binary delta format". I understand that the current format is stupid; that's what should get fixed. :) If there's only 50kB of changes, you should only be downloading 50kB, not 30GB. This isn't exactly a new problem, and instead of trying to come up with crazy workarounds like caching servers, the vendor should just provide something that makes more sense.