Proxy Server to archive visited websites


LL0rd

New Member
Feb 25, 2020
Hey guys, I have a problem I'd like to solve, and maybe someone can help me with it. I need some kind of caching proxy with a permanent cache, something like archive.org.

The main problem I'm facing is that sometimes I'm reading an article on a website and accidentally click on a link in the article that leads me to another page. But when I go back, the article is gone: it's been removed or is behind a paywall. For example: yesterday I was looking at an apartment in a foreclosure auction. Today it's no longer on the site, and even the direct link from my history doesn't work.

So I'd like to have a history where I can see each served site as it was on a specific date. Is there a solution for this? Years ago I did a similar setup for my school with squid and squidGuard, which kept a log of the accessed sites. But it didn't save the content that was actually served.
 

i386

Well-Known Member
Mar 18, 2016
Germany
~10 years ago I used HTTrack to archive some sites; I'm not sure how it holds up with modern sites and frameworks like Angular.
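For reference, a basic HTTrack run looks like the sketch below (the URL, output directory, and filter are placeholders, not anything specific to your case). It mirrors a site into a local folder you can browse offline:

    # Mirror the site into /archive/example, staying on the example.com host
    httrack "https://www.example.com/" -O "/archive/example" "+*.example.com/*" -v

Keep in mind it only saves what the crawler actually fetches, so content that is rendered client-side by JS may be missing from the copy.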
 

oneplane

Well-Known Member
Jul 23, 2021
You can configure squid to store the pages. But the problem with modern pages is that they aren't always simply stored as-is: dynamically loaded content might not be part of the initial page and might not be cached the way you expect it to be. And since a lot of JS frameworks use cache busters to work around bad browser behaviour or to allow hot-reloads during updates, you can even end up in a situation where your proxy has the right content, but the request URL is automatically changed based on things like date/time, so the cached copy is never hit.

Perhaps an experiment to see how well it still works is your best path forward.
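If you want a concrete starting point for that experiment, a squid.conf fragment along these lines makes squid cache as aggressively as it can (the cache path and sizes are just example values, and the refresh_pattern overrides deliberately ignore the servers' HTTP caching headers; also note that HTTPS traffic won't be cached at all unless you set up SSL bumping):

    # 20 GB on-disk cache; adjust path and size as needed
    cache_dir ufs /var/spool/squid 20000 16 256
    maximum_object_size 512 MB

    # Keep everything as long as possible, even when the server says not to
    refresh_pattern . 1440 100% 525600 override-expire override-lastmod ignore-reload ignore-no-store ignore-private

Even then, squid's store is a cache with eviction, not a permanent archive: objects get replaced as the disk fills, and there's no way to request "the page as it was on date X". So this approximates what you're after rather than fully solving it.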