MAJOR EVENT - Please Read

Patrick

Administrator
Staff member
Dec 21, 2010
12,113
5,133
113
Hi everyone.

I absolutely hate to have to break this one, but I did something bad and I need everyone to be aware of what happened.

Earlier today, I was going through the reports, and someone had reported one of @hmartin's posts as spam. It was not spam. I misclicked, and it removed something like 188 of his posts.

XenForo has a "restore" function, but apparently his posts were permanently deleted in the process (while likes, badges, and such were not).

I asked XenForo support whether there was a way to get the posts back via a SQL query or something. We do regular backups, so I figured the XF team would have something like a "restore user XYZ's items from database X and import them into Y" tool. I was wrong.

As a result, we were basically at the point where every one of his posts was gone, along with entire threads, including replies from others on the board.

So the choice was:
1. Leave things as-is, keeping today's posts but effectively erasing the contributions of someone who has been a member since 2017, or
2. Restore from the backup taken just after midnight Pacific time and get his content back. This caused about 3-4 minutes of downtime today and removed 10 hours or so of posts.

I made the decision to do the restore. In the process, I opened a browser tab for every public post made today, since I know this impacts folks. I will be sending all of the content I have to people via PM today so it can be reposted.

My sincere apologies to the STH community, and especially to @hmartin. This was 100% my fault for clicking the wrong radio button before drinking my coffee. No excuses.

On the plus side, let this be a reminder of why it is good to have backups!
 

BLinux

cat lover server enthusiast
Jul 7, 2016
2,556
1,003
113
artofserver.com

Patrick

@BLinux

Working now on getting all of the interim content I have a browser backup of out to folks.

Not a great day, but the alternative impact was much worse.
 

BLinux

This is just a thought, not something I've tried and tested in this type of application (forum web app + database + ??), so I don't know the implications: how about putting the database files on ZFS with rolling snapshots at "frequent" intervals? I do this with my Minecraft servers, so if a glitch happens and parts of my world get deleted or destroyed, or whatever crazy thing happens, I just stop the server, roll back to the snapshot from 10 minutes ago, and restart the server. I do this with other storage too, and it has saved several of my Minecraft worlds, as well as gotten me out of those "oh ****, I just rm -rf'd the wrong path!" situations...
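A minimal Python sketch of the rolling-snapshot idea, assuming a cron job or timer calls it every 10 minutes. It only builds the `zfs snapshot`, `zfs destroy`, and `zfs rollback -r` command lines and decides which old snapshots to prune. The dataset `tank/minecraft`, the `auto-` prefix, the retention count, and all function names are hypothetical. Actually executing the commands (for example with `subprocess.run(cmd, check=True)`) is left out on purpose.

```python
from datetime import datetime, timedelta

# Hypothetical dataset and naming scheme; adjust for your own pool.
DATASET = "tank/minecraft"
PREFIX = "auto-"


def snapshot_name(when: datetime) -> str:
    """Zero-padded timestamps sort lexically in chronological order."""
    return f"{DATASET}@{PREFIX}{when:%Y%m%d-%H%M}"


def snapshot_cmd(when: datetime) -> list[str]:
    return ["zfs", "snapshot", snapshot_name(when)]


def snapshots_to_prune(existing: list[str], keep: int) -> list[str]:
    """Return our automatic snapshots older than the newest `keep` ones.

    Snapshots without the automatic prefix (e.g. manual ones) are never pruned.
    """
    ours = sorted(s for s in existing if s.startswith(f"{DATASET}@{PREFIX}"))
    if keep <= 0:
        return ours
    return ours[:-keep]


def destroy_cmds(names: list[str]) -> list[list[str]]:
    return [["zfs", "destroy", name] for name in names]


def rollback_cmd(name: str) -> list[str]:
    # -r also destroys any snapshots newer than `name`, which zfs requires
    # when rolling back past the most recent snapshot.
    return ["zfs", "rollback", "-r", name]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0)  # placeholder times, 10 minutes apart
    names = [snapshot_name(start + timedelta(minutes=10 * i)) for i in range(6)]
    print(snapshot_cmd(start))
    print(destroy_cmds(snapshots_to_prune(names, keep=3)))
    print(rollback_cmd(names[-2]))  # "the snapshot from 10 minutes ago"
```

Keeping the prune decision separate from running anything makes it easy to dry-run: print the destroy list before letting a script delete snapshots.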
 

Patrick

@BLinux

That is effectively what we do, just at the level of the entire DB VM: it gets snapshotted, then pushed to a remote server. On these, the interval is set to 12 hours. By the time I got done with XF support not giving me a good answer, we were about 2 hours from the next backup.

If sending everyone PMs about their lost threads is teaching me anything, it is that 12 hours is probably too long.

Still, it would have impacted several hundred posts if I had not done the restore, and the pain of the restore would only get worse the longer I waited.
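The 12-hour point is a simple data-loss-window calculation: a restore loses everything written since the last snapshot, up to a full interval in the worst case. A minimal Python sketch, assuming snapshots on a fixed schedule; the placeholder times (a 00:05 "just after midnight" snapshot, a 10:15 restore) and the function names are illustrative, not STH's actual schedule.

```python
from datetime import datetime, timedelta


def last_snapshot(at: datetime, first: datetime, interval: timedelta) -> datetime:
    """Most recent scheduled snapshot at or before `at` (schedule: first + k * interval)."""
    periods = (at - first) // interval  # timedelta // timedelta -> int
    return first + periods * interval


def data_lost(at: datetime, first: datetime, interval: timedelta) -> timedelta:
    """Everything written after the last snapshot is gone after a restore."""
    return at - last_snapshot(at, first, interval)


if __name__ == "__main__":
    first = datetime(2024, 1, 1, 0, 5)      # placeholder "just after midnight" snapshot
    restore = datetime(2024, 1, 1, 10, 15)  # placeholder restore time
    for hours in (12, 4, 1):
        lost = data_lost(restore, first, timedelta(hours=hours))
        print(f"{hours:>2} h interval: {lost} of posts lost")
```

With the same restore time, shortening the interval from 12 hours to 1 hour shrinks the loss from about 10 hours to about 10 minutes, at the cost of more frequent snapshot and transfer jobs.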
 

hmartin

Active Member
Sep 20, 2017
248
175
43
34
@Patrick, sorry to have caused you problems!

I reported my own thread as a duplicate, since BeTeP correctly pointed out that the topic I started had been previously discussed.

I wasn't trying to erase history, I just thought less clutter more better.

I really appreciate the effort you went to restore my content!!
 

Patrick

"Do you have a backup / copy of the text? I can't really remember what I wrote..."

I sent everyone the browser backups I had. I specifically remember this one because I think I lost the text in the process. There were 2-3 posts I was not able to save, and this was one of them.
 

MiniKnight

Well-Known Member
Mar 30, 2012
3,001
911
113
NYC
I'd look at it the other way: you've just successfully done a real-life DR (disaster recovery) exercise with single-digit minutes of downtime and very little completely lost. You've come a long way.
 

amalurk

Active Member
Dec 16, 2016
222
65
28
99
Patrick said:
I asked XenForo support whether there was a way to get the posts back via a SQL query or something. We do regular backups, so I figured the XF team would have something like a "restore user XYZ's items from database X and import them into Y" tool. I was wrong.

This should always be fishable from a backup if you know enough SQL and know the database structure well enough, which they should. But yeah, it could take hours even for a very knowledgeable person, so I understand why they won't provide that kind of support, even though they could. I just hate it when support people tell you something cannot be done, versus "we don't provide that support" or "it isn't feasible because of prohibitive time/cost" or whatever.
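The "fish it out of a backup" approach can be sketched with Python's built-in `sqlite3`: attach the backup as a second database and copy back only the affected user's rows that are missing from the live copy. The `post` table and its columns are a hypothetical, heavily simplified schema, not XenForo's (XenForo runs on MySQL/MariaDB, and a real recovery would likely also have to deal with threads, counters, and other related tables), so treat this as an illustration of the approach only.

```python
import os
import sqlite3
import tempfile

# Hypothetical, heavily simplified schema for illustration only.
SCHEMA = """CREATE TABLE post (
    post_id   INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    message   TEXT NOT NULL
)"""


def restore_user_posts(live: sqlite3.Connection, backup_path: str, user_id: int) -> int:
    """Copy one user's posts from a backup file into the live DB.

    Rows whose post_id already exists in the live DB are skipped rather than
    overwritten, so posts made after the backup survive. Returns rows inserted.
    """
    live.execute("ATTACH DATABASE ? AS backup", (backup_path,))
    try:
        cur = live.execute(
            """INSERT INTO main.post (post_id, thread_id, user_id, message)
               SELECT b.post_id, b.thread_id, b.user_id, b.message
               FROM backup.post AS b
               WHERE b.user_id = ?
                 AND b.post_id NOT IN (SELECT post_id FROM main.post)""",
            (user_id,),
        )
        inserted = cur.rowcount
        live.commit()
        return inserted
    except Exception:
        live.rollback()  # end the open transaction so DETACH can succeed
        raise
    finally:
        live.execute("DETACH DATABASE backup")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backup_path = os.path.join(tmp, "backup.db")
        backup = sqlite3.connect(backup_path)
        backup.execute(SCHEMA)
        backup.executemany(
            "INSERT INTO post VALUES (?, ?, ?, ?)",
            [(1, 10, 7, "old post"), (2, 10, 8, "reply"), (3, 11, 7, "another old post")],
        )
        backup.commit()
        backup.close()

        live = sqlite3.connect(":memory:")
        live.execute(SCHEMA)
        live.executemany(
            "INSERT INTO post VALUES (?, ?, ?, ?)",
            [(2, 10, 8, "reply"), (4, 10, 9, "posted after the backup")],
        )
        live.commit()
        print(restore_user_posts(live, backup_path, user_id=7))  # rows copied back
        live.close()
```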
 

psc

Member
Jun 30, 2019
34
6
8
Reading this after your direct email, two things are immediately obvious:
1) You handled a very bad situation very well. Kudos both for being open about what happened and for the work you've put in to minimise disruption.
2) You've found a serious gap in XenForo's capabilities, and they need to respond to that. You probably aren't the first, and certainly won't be the last, user to click the wrong button... I'd be interested to hear how they handle this.
 

infoMatt

Active Member
Apr 16, 2019
210
90
28
Patrick said:
That is effectively what we do, just at the level of the entire DB VM: it gets snapshotted, then pushed to a remote server. On these, the interval is set to 12 hours. By the time I got done with XF support not giving me a good answer, we were about 2 hours from the next backup.

I don't know which DBMS you are using under the hood, but PostgreSQL, for example, has a very useful feature called the Write-Ahead Log (WAL), and it can be configured to split (and archive) the log files by size or by timeout. I imagine most DBMS platforms have something similar. In those cases, you can restore the last known good backup and replay the transaction log until a couple of seconds before the unlucky event.
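In PostgreSQL the relevant settings are `archive_mode` and `archive_command` (ship each completed WAL segment somewhere safe), `archive_timeout` (force a segment switch after a set time even if the segment is not full), and, at restore time, `recovery_target_time`. The toy Python model below is not how PostgreSQL implements this; it only illustrates the idea of starting from a base backup and replaying logged changes up to a target time just before the mistake. `LogRecord`, `point_in_time_restore`, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    at: datetime
    op: str                      # "put" or "delete" in this toy model
    key: int                     # e.g. a post id
    value: Optional[str] = None


def point_in_time_restore(base: Dict[int, str], log: List[LogRecord],
                          target: datetime) -> Dict[int, str]:
    """Start from a base backup and replay log records strictly before `target`.

    Assumes `log` holds only changes made after `base` was taken.
    """
    state = dict(base)  # leave the backup itself untouched
    for rec in sorted(log, key=lambda r: r.at):
        if rec.at >= target:
            break  # stop just before the bad event
        if rec.op == "put":
            state[rec.key] = rec.value
        elif rec.op == "delete":
            state.pop(rec.key, None)
        else:
            raise ValueError(f"unknown op: {rec.op!r}")
    return state


if __name__ == "__main__":
    base = {1: "old post"}  # placeholder backup taken at midnight
    log = [
        LogRecord(datetime(2024, 1, 1, 9, 0), "put", 2, "morning post"),
        LogRecord(datetime(2024, 1, 1, 10, 0), "delete", 1),  # the misclick
        LogRecord(datetime(2024, 1, 1, 10, 30), "put", 3, "later post"),
    ]
    print(point_in_time_restore(base, log, datetime(2024, 1, 1, 9, 59, 58)))
    # {1: 'old post', 2: 'morning post'}
```

As with real point-in-time recovery, changes logged after the target (post 3 here) are still lost, but only those made after the mistake rather than everything since the last full backup.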

But that said, kudos to you (and your whole team!) for all the great work you do behind the scenes every day, and for this very thread. Not many people would have admitted it so clearly or been so proactive with the impacted users. ;)
 

Patrick

@amalurk and @infoMatt I totally get the database angle. I think it was possible, but once I knew that the fix was not "here is a script, change a value here and here" or "sure, it will be $x and fixed within an hour," the next question was how to mitigate quickly.

I actually like discussing it when bad things happen to our hosting. I think it is an important aspect of the site.

@MiniKnight it was probably ~5 minutes of downtime. Most of that was because I had not realized we needed to restore the web front-end as well, so the two big components were restored sequentially rather than in parallel. I also get a little crazy and will make a full VM clone *just in case* I screw something up.
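The sequential-versus-parallel point is simple arithmetic: restores run one after another add up, while restores run side by side take about as long as the slowest one. A minimal Python sketch using `concurrent.futures`; the component names, the `restore` callable, and the durations are hypothetical placeholders, not STH's actual tooling or timings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def restore_all(components: list[str], restore: Callable[[str], str]) -> dict[str, str]:
    """Run one (hypothetical) restore job per component concurrently.

    `f.result()` re-raises any exception from a job, so a failed restore
    is not silently ignored.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(components))) as pool:
        futures = {name: pool.submit(restore, name) for name in components}
        return {name: f.result() for name, f in futures.items()}


def estimated_downtime(minutes: dict[str, float], parallel: bool) -> float:
    """Sequential restores add up; parallel ones take as long as the slowest."""
    return max(minutes.values()) if parallel else sum(minutes.values())


if __name__ == "__main__":
    print(restore_all(["db-vm", "web-vm"], lambda name: f"{name} restored"))
    illustrative = {"db-vm": 3.0, "web-vm": 2.0}  # made-up durations in minutes
    print(estimated_downtime(illustrative, parallel=False))
    print(estimated_downtime(illustrative, parallel=True))
```

In practice the `restore` callable would wrap whatever the hypervisor or backup tool provides; a safety VM clone like the one described above would run before `restore_all`, not inside it.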