
Archived

This topic is now archived and is closed to further replies.

Mimic

11/5/2015 Regarding Recent Downtime


Hi Everyone,


As you are aware, our servers recently (Monday, November 2nd, 2015) suffered major service outages due to a mishap with our database(s). I am making this announcement to clear up some misconceptions about the downtime. I’ll start with the cause of the crash, then cover what was done to recover from it and what we will be doing to prevent such occurrences in the future.


What caused the downtime?

Unfortunately, a rogue SQL query was run that filled up the disk space on one of our SSDs, causing data corruption on the disk. At first we believed the only issue was the full SSD, so we cleared up some space and were able to run temporarily before other issues arose.

After a few hours our database went down again due to the corruption, forcing us to fall back to systems that are not reliant on our database. Admins and bans on our servers were moved from SourceBans to configuration files, and any and all plugins attempting to connect to the database were disabled, including Store, CTBans, etc. Servers such as Warcraft 3 took the brunt of the downtime, as they are very reliant on a functioning database to store levels and experience. TeamSpeak was down temporarily until we were able to run it off SQLite and rebuild the server from scratch. Almost all of our web services were down, including the forums, SourceBans, MOTD, and our Game Server Panel.


Throughout this time we attempted to be very open about problems occurring with our database through an advertisement message on our servers, and a temporary web page explaining the downtime.


How did we recover?

To put it simply, we only recovered our current database thanks to the expertise of Centran and Driz. These two were invaluable: they were able to recover a lot of our information and reconfigure our database while the rest of Staff worked toward keeping services running and informing the public about outages and downtime. Unfortunately, our set-up did not have a reliable system for automated backups, meaning we would have lost a considerable amount of data had it not been for them.


Once our database was recovered, we were able to start various services again, including TeamSpeak, SourceBans, the Forums, Open Game Panel, Store, CTBans, and Warcraft 3, which means we’re basically at the same point we were prior to the downtime. Please let us know if there is anything strange going on with any of the services above.


What will we do in the future?

Our plan for the future is to create automated backups of our database locally, then sync the backed-up database(s) to an off-site location to ensure we always have a recent backup. Redundancy comes in the form of the local backup and the off-site backup, allowing us to recover from a disaster like this more easily in the future regardless of the point of failure.


As it stands, we have the infrastructure in place for the local backups; however, we are still in the process of determining a good host for our off-site database backups and will release an announcement when we have created a system that works.
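
For the curious, here is a minimal sketch of how a local-then-off-site backup flow like the one described above could look. It is not the actual setup: it assumes a finished SQL dump file already exists, it is written in Python, and every name in it (`store_backup`, `prune_backups`, `sync_offsite`, the `db-*.sql` file naming) is hypothetical. The off-site location is stood in for by a second directory, since the real host has not been chosen yet.

```python
import shutil
from datetime import datetime
from pathlib import Path


def store_backup(dump_file: Path, backup_dir: Path, now: datetime) -> Path:
    """Copy a finished database dump into backup_dir under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"db-{now:%Y%m%d-%H%M%S}.sql"
    shutil.copy2(dump_file, target)
    return target


def prune_backups(backup_dir: Path, keep: int) -> list:
    """Delete all but the newest `keep` local backups; return the deleted paths."""
    # Timestamped names sort oldest-first, so the tail of the list is the newest.
    backups = sorted(backup_dir.glob("db-*.sql"))
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return stale


def sync_offsite(backup_dir: Path, offsite_dir: Path) -> None:
    """Copy any local backup that is missing from the second location.

    ASSUMPTION: offsite_dir is a plain directory standing in for a remote host.
    """
    offsite_dir.mkdir(parents=True, exist_ok=True)
    for path in backup_dir.glob("db-*.sql"):
        dest = offsite_dir / path.name
        if not dest.exists():
            shutil.copy2(path, dest)
```

Pruning the local copies keeps the backups themselves from filling the disk, while the sync step in this sketch never deletes anything off-site, so the second location keeps a longer history than the local one.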


Current Status

Currently most of our services are online and we should be able to return to normalcy. Happy gaming!

-Mimic, Chairman of the JCS


pls edit, monday was not november 5th

thnks.

 

Thanks for all the hard work to get the servers back up guys, its all bulletfrauds fault.

 

Whatchu talking about, he just got the year wrong. This clearly happened 3 years ago ):


War3 probably being the most heavily dependent on the database, this could have potentially been very terrifying. The War3 server has been backing up daily thanks to thorgot (one of the many reasons he deserves the legend title he received). It turns out because of recent changes to the War3 server, the script had stopped functioning and I had not sat down to fix it in a couple weeks (the old "oh it's not like anything bad ever happens I'll deal with that when I get more time"). We were in a position to lose 3 weeks of data (October 11th is the exact date we had a backup from).

We had plans (get hours played from gameme from Oct 11 to whatever date we fixed shit and automagically distribute level banks based on that with a plugin; skitt has experience with something similar from the store mishap, so it wouldn't be too hard), but that is really not optimal.

 

Thanks to the full backups that will be occurring, our War3 backups should largely become unnecessary. For the sake of redundancy though, we will continue to do daily backups. I fixed the script today (which I couldn't have done without thorgot, I'm SQL illiterate), but I'm waiting for a couple pieces to be in place before it will be running daily again.
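
A backup script that quietly stops running, as happened here, can be caught with a small freshness check. The following is only a rough sketch in Python, not the actual War3 script: the function names, the directory layout, and the 36-hour threshold are all assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the newest file in backup_dir, or None if empty."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_stale(backup_dir: Path, max_age_hours: float = 36.0,
                    now: Optional[float] = None) -> bool:
    """True if there is no backup at all or the newest one is older than the limit.

    ASSUMPTION: 36 hours is an arbitrary cushion for a daily backup schedule.
    """
    age = newest_backup_age(backup_dir, now)
    return age is None or age > max_age_hours * 3600
```

Run on a schedule, a check like this would report a problem within a day or two of the script breaking, rather than three weeks later.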

 

This sums up staff chat Monday night: http://puu.sh/l7v7v/5654a0d1ba.png

 
Many people will probably never understand how much work went in from a handful of people (namely Driz and Centran) to make everything right again. No amount of praise can do it justice.
 
(honorable mention to Bulletford, who helped clean up server plugins, get things going again, and handle a lot of "side tasks" for Driz)

 Remember, remember! 

    The deuce of November, 

    The database treason and plot; 

    I know of no reason 

    Why the database treason 

    Should ever be forgot! 


    Guy Bullet and his companions 

    Did the scheme contrive, 

    To blow the Forums and Servers

    All up alive. 

    Threescore queries, laid below, 

    To prove old sG's overthrow. 

    But, by driz and centran's providence, him they caught, 

    With a dark query (or so he thought).

    An upvote and a stake 

    For King Mimic's sake! 

    If you won't give me one, 

    I'll take two, 

    The better for me, 

    And the worse for you. 

    An infraction point, an infraction point, to hang the bulletford, 

    A penn'orth of cheese pizza to choke him, 

    A pint of sql to wash it down, 

    And a jolly good trolling to burn him. 

    Holloa, boys! holloa, boys! make the posts whiz! 

    Holloa, boys! holloa boys! God save the driz! 

    Hip, hip, hooor-r-r-ray!


I dont believe this for a fucking second

 

I don't believe you didn't ruin our last two attempts at a DarkRP server either. :(


that's just the general term now, "the database was bulletford'd" doesn't sound as good

officially, it's "klark'd" regardless of who did it

