Proxy Server to archive visited websites


LL0rd

New Member
Feb 25, 2020
Hey guys, I have a problem I'd like to solve, and maybe someone can help me with it. I need some kind of caching proxy with a permanent cache, something like archive.org.

The main problem I'm facing is that sometimes I'm reading an article on a website and accidentally click on a link in the article that leads me to another page. But when I go back, the article is gone: it's been removed or is behind a paywall. For example: yesterday I was looking at an apartment in a foreclosure auction. Today it's no longer on the site, and even the direct link from my history doesn't work.

So I'd like to have a history where I can see each served site as it was on a specific date. Is there a solution for this? Years ago I did a similar setup for my school with squid and squidGuard, which kept a log of the accessed sites. But it didn't save the content that was actually served.
 

i386

Well-Known Member
Mar 18, 2016
Germany
~10 years ago I used HTTrack to archive some sites; I'm not sure how it holds up with modern sites and frameworks like Angular.
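For reference, a basic HTTrack run looks like the sketch below (the URL, output directory, and filter are placeholders, not anything specific to your case). It mirrors a site into a local folder you can browse offline:

    # Mirror the site into /archive/example, staying on the example.com host
    httrack "https://www.example.com/" -O "/archive/example" "+*.example.com/*" -v

Keep in mind it only saves what the crawler actually fetches, so content that is rendered client-side by JS may be missing from the copy.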
 

oneplane

Well-Known Member
Jul 23, 2021
You can configure squid to store the pages. But the problem with modern pages is that they aren't always simply stored as-is: dynamically loaded content might not be part of the initial page and might not be cached the way you expect it to be. And since a lot of JS frameworks use cache busters to work around bad browser behaviour or to allow hot-reloads during updates, you can even end up in a situation where your proxy has the right content, but the request URL is automatically changed based on things like date/time, so the cached copy is never hit.

Perhaps an experiment to see how well it still works is your best path forward.
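If you want a concrete starting point for that experiment, a squid.conf fragment along these lines makes squid cache as aggressively as it can (the cache path and sizes are just example values, and the refresh_pattern overrides deliberately ignore the servers' HTTP caching headers; also note that HTTPS traffic won't be cached at all unless you set up SSL bumping):

    # 20 GB on-disk cache; adjust path and size as needed
    cache_dir ufs /var/spool/squid 20000 16 256
    maximum_object_size 512 MB

    # Keep everything as long as possible, even when the server says not to
    refresh_pattern . 1440 100% 525600 override-expire override-lastmod ignore-reload ignore-no-store ignore-private

Even then, squid's store is a cache with eviction, not a permanent archive: objects get replaced as the disk fills, and there's no way to request "the page as it was on date X". So this approximates what you're after rather than fully solving it.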