Squid Cache -- General Questions/Info


T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
See the last thread about caching Linux-related updates: squid cache -- caching updates?

The plan for me, as of now, is to run pfSense + Squid on a single piece of hardware. I'm looking into using the Squid cache/proxy for general web resources, images, etc... hopefully I can tweak/customize the config to handle images + vids on Facebook too.

Cache Storage - Primarily in RAM, but I'm seeing more guides/info about people using enterprise (write-intensive-rated) SSDs or NVMe for the cache... has anyone done this with an SSD? Is it worth using ~8GB of RAM for the cache plus 128GB of SSD? Or am I better off loading up with 32GB of RAM and using 24GB of that for the cache? I don't really see the latency as much of an issue, since without a cache the request has to go out over the internet anyway, but maybe it's a bigger deal with how Squid works?
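For reference, the RAM-vs-SSD split maps onto a handful of squid.conf directives; a minimal sketch, with all sizes purely illustrative:

```
cache_mem 8192 MB                      # RAM for hot objects (the ~8GB option)
maximum_object_size_in_memory 512 KB   # keep only small objects in RAM
cache_dir aufs /var/squid/cache 131072 16 256   # ~128GB on the SSD
maximum_object_size 512 MB             # allow larger files (PDFs etc.) on disk
```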

Squid Hardware - Will Squid introduce latency for all requests, cached or not? Is it sensitive to a low-frequency CPU, especially if it's single-threaded and/or sharing one core with pfSense?


I won't have a ton of users on the system at once (it's for home); it's more for bandwidth savings, to mitigate the occasional random slowdown when a single user requests resources that consume the entire available bandwidth in short, small bursts... I'll be configuring it to cache larger files too, so it can handle scenarios like re-loading PDFs.

Beyond the above, if you want to share your overall Squid experience, please do, and if you have any tips/tricks you learned along the way, I'm all ears :) :)
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
It's been a good long while since I used squid for anything like this on a large scale, but to hopefully answer most of your questions:

Are you planning on MITM-ing with the proxy? Squid won't cache most stuff over HTTPS without additional tweakery, but with the amount of traffic going over HTTPS these days you're likely going to need to MITM to make it worthwhile anyway. If you're also planning on inspecting traffic for malware or other IDS red flags, this is pretty much mandatory.
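By way of illustration, the MITM "tweakery" in modern Squid (3.5+) is SSL-bump; a minimal sketch, with the CA path and helper location as placeholders, and the caveat that Squid must be built with OpenSSL and certificate-generator support:

```
# Requires Squid built with --with-openssl and the ssl_crtd helper.
http_port 3128 ssl-bump cert=/usr/local/etc/squid/ca.pem \
    generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
```

Clients then need the CA certificate in their trust stores, or every HTTPS site will throw warnings.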

RAID10 used to be strongly recommended for Squid back in the days of platter-based drives, since its load was mostly random (and probably at least 75% reads). If you're using a moderately decent SSD that can cope with a sustained load, I'd skew towards using more SSD and less RAM rather than the other way around.

Don't think I ever saw Squid max out a single CPU, but you can spawn multiple workers if you need to (I seem to remember it being single-threaded, but dedicated SMP code has apparently been merged for some time now). Assuming you're using a >1 CPU device, it shouldn't interfere with pfSense or other processes unless you run out of CPU resources. YMMV of course, since all of this is highly dependent on what you're getting Squid to process, number of users, WAN bandwidth, etc.
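A minimal sketch of the SMP knobs in question (the worker count and core numbers are illustrative, and cpu_affinity_map needs a reasonably recent Squid):

```
workers 2                                        # spawn two SMP worker processes
cpu_affinity_map process_numbers=1,2 cores=2,3   # keep them off the core pfSense uses
```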

Yes, Squid will introduce latency into any and all HTTP requests; you basically have to do some testing to see whether the time saved by not pulling down the same images and scripts every time outweighs the additional latency. For all but the fastest WANs (assuming no packet analysis, MITM or IDS), I imagine it would.

However, if your real problem is occasional spikes where one user is hogging all the down/up bandwidth, I think QoS and/or rate-limiting individual connections might be a less complex way to achieve better throughput for everyone. For instance, I've got limiters set up on my home pfSense so that no single IP on the LAN can use more than 80% of the up/download bandwidth.
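pfSense's limiters are configured in the GUI, but under the hood they're FreeBSD dummynet pipes; the per-host cap described above looks roughly like this in raw ipfw (bandwidth and addresses are examples):

```
# Cap each LAN host at ~80% of a 100 Mbit link, per direction.
# The masks make dummynet create one dynamic queue per host.
ipfw pipe 1 config bw 80Mbit/s mask dst-ip 0xffffffff   # per-host download
ipfw pipe 2 config bw 80Mbit/s mask src-ip 0xffffffff   # per-host upload
ipfw add pipe 1 ip from any to 192.168.1.0/24 in
ipfw add pipe 2 ip from 192.168.1.0/24 to any out
```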

If you're looking to save bandwidth for things like apt, dedicated tools like apt-cacher-ng are generally better suited than squid as per your linked thread. For windows updates there's a fairly extensive guide here although I'm not sure how current it is.
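For the apt case, pointing clients at an apt-cacher-ng box is a one-line config drop; the hostname is an example (3142 is apt-cacher-ng's default port):

```
# /etc/apt/apt.conf.d/00proxy on each client
Acquire::http::Proxy "http://cachebox.lan:3142";
```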
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Thanks for the breakdown @EffrafaxOfWug, exactly what I was looking for :)

I enabled it yesterday and have been playing around with it; right now it's not doing HTTPS. I've set the limits really low to see how it does and whether we get a decent cache-hit rate... I'm noticing a very good hit rate on eBay, but I don't know if that alone justifies it ;) I really want to configure it to cache Facebook and HTTPS; then we'll likely have a better hit rate.

What do people consider a good hit rate %? Or should it be judged against network capacity compared to utilization? I mean, 5% that saves 5GB is one thing, but 5% that saves 500GB is another ;)
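The point about absolute savings can be made concrete with a trivial sketch (numbers are made up):

```python
def gb_saved(total_traffic_gb: float, byte_hit_ratio: float) -> float:
    """Bandwidth the cache actually saved, in GB."""
    return total_traffic_gb * byte_hit_ratio

# The same 5% byte-hit ratio means very different things at different volumes:
print(gb_saved(100, 0.05))     # prints 5.0
print(gb_saved(10_000, 0.05))  # prints 500.0
```

In other words, the raw hit-rate percentage only matters in proportion to how much traffic actually flows through the proxy.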
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Hit Rate this morning is 22% so far.

I alone should be able to swing that pretty well (small home network), since I look at the same sites throughout the day, all day, every day... LOL. eBay, forums, etc., full of images.

I'm thinking once I get HTTPS working, even caching a small % of Facebook images/resources will help bump that number up a bit too.

We'll see over a few weeks whether it's workable with the added latency or not; I'm going to run it, then disable it and compare :)

In the meantime I'll be replacing the router hardware, going from a J1900 w/4GB RAM & 16GB mSATA to a 4-core Rangeley w/8GB RAM & 2x 100GB SATA SSDs to start. We'll see if I want to add more drives later; the plan is one for the OS/pfSense and one for Squid/logs.

Right now @EffrafaxOfWug I'm not doing threat management/detection, but I plan to play around with that once I get the Rangeley system up and running. With Squid and minimal traffic I've seen the CPU spike 10% (not much, but worrisome given the minimal load on the J1900).

I've also been adding some local IPs/hosts (like the Roku and TV devices) to ignore/bypass the proxy; they totally skewed my cache hits yesterday, and I didn't feel like manually erasing the logs ;) It's ridiculous that you can't "clear logs" in the GUI, or at least set a new start date for the Squid reports.
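For reference, the device/domain exclusions can also be expressed in squid.conf itself (the addresses and domains here are examples); in a transparent setup the Roku-type devices are usually exempted in the firewall NAT rule instead so they go direct:

```
# Never cache streaming/update traffic even when it is proxied.
acl nocache dstdomain .pandora.com .windowsupdate.com
cache deny nocache

# Example device addresses to keep out of the proxy stats entirely.
acl streamers src 192.168.1.60 192.168.1.61
http_access deny streamers   # or bypass them at the firewall instead
```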
 

mstone

Active Member
Mar 11, 2015
Mostly squid at this point is a complete waste of time because:
1) most content is HTTPS
2) the most frequently accessed static content is all cached browser-side
3) the latency tax on every. single. request. hurts more in the real world than any speedup from caching
4) content is increasingly dynamic or otherwise uncacheable, even if plaintext

When considering MITM on HTTPS, just don't. It's basically impossible to do this effectively and still maintain the security of HTTPS transactions.

(edited to add #4)
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
Mostly squid at this point is a complete waste of time because:
1) most content is HTTPS
2) the most frequently accessed static content is all cached browser-side
3) the latency tax on every. single. request. hurts more in the real world than any speedup from caching

When considering MITM on HTTPS, just don't. It's basically impossible to do this effectively and still maintain the security of HTTPS transactions.
Yeah, I really wish I could efficiently cache more overall. I haven't attempted HTTPS, but here are the last couple of days:

10th - 6% hit rate
11th - 11.5% hit rate
12th - 6% hit rate so far

I can't really tell a difference browsing, except on eBay; as long as the images aren't on a CDN or HTTPS, eBay is actually faster.

The actual hit % is lower: I've blocked some systems from using the proxy (Roku), and ignored some IPs/domains that serve big or non-cacheable files, like MS updates, Google "alive" checks, Pandora, etc. On the day I didn't exclude the Roku device, the Pandora endpoints, etc., the cache hit rate was like 0.02%, but when you ignore streaming and update services it's more like 6%. If I could get it to cache Facebook images alone, I'm sure that would double.
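For anyone who wants the raw number rather than the report GUI's, the hit rate can be pulled straight from access.log; a rough sketch assuming Squid's default (native) log format, where the fourth field is the result code (TCP_HIT/200, TCP_MISS/200, ...):

```python
# Hit-result codes in Squid's native access.log format.
HIT_CODES = ("TCP_HIT", "TCP_MEM_HIT", "TCP_IMS_HIT", "TCP_REFRESH_HIT")

def hit_rate(lines):
    """Fraction of requests served from cache, from access.log lines."""
    hits = total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        result = fields[3].split("/")[0]  # e.g. "TCP_HIT" from "TCP_HIT/200"
        total += 1
        if result in HIT_CODES:
            hits += 1
    return hits / total if total else 0.0
```

Feeding it `open("/var/squid/logs/access.log")` over a chosen time window would give a per-day figure without touching the report tool.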
 

mstone

Active Member
Mar 11, 2015
Honestly, seeing a speedup for eBay sounds more like the placebo effect; if there is cacheable content, it should be in the browser cache already.

At this point you should really be asking "to what end?" -- if you're ignoring most of the volume of web traffic, what's the benefit of a 10% cache rate on whatever small portion is left?
 

amalurk

Active Member
Dec 16, 2016
I have heard of a small ISP doing Squid caching at the level of a large apartment complex or college dorm they had an exclusive contract to provide connectivity to. But that was years ago, before widespread HTTPS; no idea if they're still doing it now and running a man-in-the-middle on their customers' secure connections. And it might make sense for these guys: How a group of neighbors created their own Internet service

But does this really make sense with widespread https now and for just a single household?
 

Evan

Well-Known Member
Jan 6, 2016
Given that I have great connections most of the time, maybe my view isn't practical for everybody, but generally I don't see an advantage to a proxy for normal web traffic, HTTP or HTTPS; if a user is hitting the same data, it will be cached locally anyway.

What would be really useful is super-simple-to-deploy caching for Microsoft and Apple software updates! That would be gold.
 

EffrafaxOfWug

Radioactive Member
Feb 12, 2015
When considering MITM on HTTPS, just don't. It's basically impossible to do this effectively and still maintain the security of HTTPS transactions.
I'd disagree with this. Assuming you're doing it internally, it just means you need to trust your internal CA as much as you do the third party site you're connecting to; for a home network where you control the CA there's nothing lost by using SSL MITM other than the (very marginally) increased attack surface of the proxy. If you don't trust your own internal network then you shouldn't be using it at all.

Advantage to using a proxy for me is that:
1) it's then a doddle to enforce default deny on outgoing HTTP connections to enforce proxy-only connections
2) analysis of proxy traffic can then be easily performed in one place

At work, it's company policy and a pre-req for GDPR that all traffic traversing the geographical WANs is checked and audited to identify privileged or personally identifiable data being sent from one place to another (or worse, out to the internet).
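For a home setup, the internal CA for SSL-bump is a couple of openssl commands; paths and the subject name are examples. The resulting ca.pem (key and cert combined) is what Squid's http_port cert= option points at, and the DER copy is what you'd import into client trust stores:

```shell
# Generate a self-signed CA valid for one year; ca.pem holds both
# the private key and the certificate, which is what Squid expects.
openssl req -new -newkey rsa:2048 -sha256 -days 365 -nodes -x509 \
    -subj "/CN=Home Squid CA" -keyout ca.pem -out ca.pem

# DER-encoded copy of just the certificate, for importing on clients.
openssl x509 -in ca.pem -outform DER -out ca.der
```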
 
Nov 13, 2022
Long and short, from someone who ran Squid when the internet was mostly HTTP: it does NOTHING for a single client. You really need an office's or company's worth of computers all trying to get the same content for it to be worth running a cache in addition to what all browsers already do. Even then, the main benefit was that the local network was faster than the internet connection; it reduced office bandwidth use over the connection, AND the chance that someone else in the office would view the exact same content was higher, so the cache hit rate would be high.

There is zero point in doing this at home. 2-5-ish home users will not see any benefit if the home internet is already fast. Even then, you'd get more benefit from the browser cache, since it can cache content marked private AND HTTPS content that a proxy cannot.
 

alaricljs

Active Member
Jun 16, 2023
On the technical network side of things... How do you have multiple IPs? When I stood up virtual NICs (SR-IOV) I found out that while I could still use the non-virt device, it couldn't talk to the virt devices without extra routing help whereas the vNICs were all sitting on a vSwitch by default.
 
Nov 13, 2022
Squid can be great to cache Fedora/Debian repositories

If you have MULTIPLE servers of the same release, sure, that MIGHT help a little, but it sounds like the OP has a small home network. Even for that use case you'd need several servers for it to make any sense versus just going direct on a modern high-speed connection.