Forum not reachable today 4/22/2017


zeynel

Dream Large, Live Larger
Nov 4, 2015
505
116
43
48
I had some problems reaching the forum today. Anyone else?
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,516
5,811
113
@zeynel - I am guessing earlier today?

We now have 10x the bandwidth to the hosting environment. As a result, both the botnets and Google Bot can now hit the forums much faster and that blew up the nginx access log to the point that it filled the disk.

Fixed this morning. The longer-term fix is re-platforming the forums, which I need to do. I am also setting less verbose logging for the access logs.
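One way to keep an access log from eating the disk like this is to stop logging crawler hits and buffer the remaining writes. A minimal nginx sketch (not necessarily what STH deployed; the `map` belongs in the `http` block):

```nginx
# Hypothetical sketch: skip access-log entries for crawler user agents
# and buffer the rest so log writes no longer grow unbounded.
map $http_user_agent $loggable {
    default       1;
    ~*Googlebot   0;   # don't log crawler hits at all
}

server {
    access_log /var/log/nginx/access.log combined
               buffer=64k flush=5s if=$loggable;
}
```

Aggressive logrotate settings (size-based rotation plus compression) are the other common belt-and-suspenders fix.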
 
Reactions: pgh5278

PigLover

Moderator
Jan 26, 2011
3,186
1,545
113
Patrick said: (quoting the post above)
Remote syslog...
 
Reactions: T_Minus

Patrick

Administrator
Staff member
PigLover said: Remote syslog...
Yeah, well, Googlebot alone just started generating 150x as much traffic per day (not kidding).

That remote syslog will happen as part of the forums re-platform.
 

PigLover

Moderator
It does help. The syslog server can fill up and die, and as long as you have the other end set up correctly, the "worker" hosts don't hang or die.
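A minimal sketch of what this looks like with rsyslog on the web host (hostnames and queue names invented): forward log traffic to a remote collector through a disk-assisted queue, so the sender keeps running even when the collector fills up or dies.

```
# Hypothetical /etc/rsyslog.d/forward.conf sketch (legacy syntax).
# The disk-assisted queue is what keeps this host healthy when the
# collector goes away: lines buffer locally and delivery retries
# instead of blocking the sender.
$ActionQueueType LinkedList        # async in-memory queue...
$ActionQueueFileName fwdq          # ...spilling to disk under this name
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionResumeRetryCount -1         # retry forever, never drop the action
*.* @@logs.example.com:514         # @@ = TCP forwarding
```

On the nginx side, `access_log syslog:server=...` can feed this directly, which is presumably the piece that lands with the re-platform.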
 

Patrick

Administrator
Staff member
@PigLover yes. Something that needs to get done. But here is an example of just the forums' crawling from Google: from ~6k pages/day up to 1 million. I went through the logs and did reverse DNS lookups, and it is Googlebot.

[Attached image: upload_2017-4-22_13-39-32.png — Googlebot crawl volume chart]
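A sketch of the log spelunking Patrick describes, assuming the default nginx "combined" format (client IP is the first field): tally requests per IP so the heaviest crawler stands out, then reverse-resolve the winner.

```shell
# Hypothetical sketch: count requests per client IP in an nginx
# access log so the top talkers stand out.
LOG=${LOG:-/var/log/nginx/access.log}

top_talkers() {
    awk '{print $1}' "$LOG" | sort | uniq -c | sort -rn | head -5
}

if [ -r "$LOG" ]; then
    top_talkers
fi

# Verification step: the PTR record for a genuine Googlebot IP ends in
# googlebot.com, and that hostname resolves back to the same IP, e.g.:
#   host 66.249.66.1
```

The forward-confirmed reverse DNS check matters because anything can claim "Googlebot" in its user agent; the PTR round-trip is what Google recommends for telling the real crawler from impostors.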
 
Reactions: T_Minus

Patrick

Administrator
Staff member
Wow. That's really bad behavior by the crawler.
Based on what I have been seeing, I almost think it is hitting the "new posts" page constantly. Very strange.

I mean, from a hardware/network side it is not an issue for us to service. But imagine if you had a small VPS/AWS instance.

It is about 1 Mbps of traffic during those peaks.
 

T_Minus

Build. Break. Fix. Repeat
Feb 15, 2015
7,641
2,058
113
In that case you'd set the max crawl rate in GWT. Still, that seems like an insane amount of crawling for a site this size... I wonder if there's a boatload of duplicate content / missing canonicals due to the forum software? I.e., forum posts/threads may be perfect, but maybe guides, users, etc. type pages are not set up properly and are spawning tons of duplicate content.
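If missing canonicals are the culprit, the usual fix is a rel=canonical tag on each duplicate view pointing at the primary URL, so crawlers collapse the variants into one page. A hypothetical example (URL invented):

```
<!-- Hypothetical: emitted on a filtered or paginated duplicate view,
     telling crawlers which URL is the primary copy -->
<link rel="canonical" href="https://forums.example.com/threads/example-thread.12345/" />
```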
 

Patrick

Administrator
Staff member
This is getting scary. 1.9 million pages/day and 20GB is huge. I am not sure how a small site would survive this.
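A back-of-envelope check of those two figures (1.9M pages and 20GB per day) works out to roughly 10 KB per page and just under 2 Mbps of sustained transfer, which squares with the ~1 Mbps peak figure mentioned earlier:

```shell
# Sanity-check the stated numbers: 1.9M pages/day, 20GB/day.
awk 'BEGIN {
    pages = 1.9e6; bytes = 20e9
    printf "avg page size: %.1f KB\n", bytes / pages / 1024      # ~10.3 KB
    printf "avg bandwidth: %.2f Mbps\n", bytes * 8 / 86400 / 1e6 # ~1.85 Mbps
}'
```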
[Attached image: upload_2017-4-22_23-10-9.png — crawl stats showing 1.9M pages/day and 20GB of transfer]
 

Patrick

Administrator
Staff member
OK, changed robots.txt to block the #1 offender. It did 260MB of nginx access log writes in 14 hours. Will see how this change goes.
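The post doesn't say which crawler or path topped the list, so the names below are invented, but a robots.txt block of this shape is the usual move:

```
# Hypothetical robots.txt -- agent and path names invented for
# illustration; the post does not say what was actually blocked.
User-agent: SomeAbusiveBot
Disallow: /

# e.g. keep all crawlers off a constantly-changing "new posts" view
User-agent: *
Disallow: /find-new/
```

Note that Googlebot's own rate can't be throttled this way (Google ignores Crawl-delay); its limit is set from Search Console, the GWT setting T_Minus mentioned above.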