Why does the forum seem to go down every 8 days?

knightmb

100 kW
Joined
May 7, 2006
Messages
1,071
Location
Franklin, TN
LOL, well, you may have noticed that about every 8 days the forum goes down for about 3 hours, then comes back up like nothing happened.

Turns out the DB server that I recently upgraded was given some bad RAM. So about every 8 days, like clockwork, the DB server locks up, needs a reboot, then works fine again for another 8 days.

The funny thing is, it's the nightly backup that is causing this. So it goes down at 4:00 AM and doesn't come back up until I wake up and notice all the alarms going off. I reboot the server, and it's happy again for another 8 days.

So, since this is really starting to bug me a lot, the website may go down again today for about 15 minutes as I turn off the server and swap out some RAM in it, hopefully curing this problem. The DB server's normal uptime was always infinity before, so it crashing every week is a bad thing. :evil:

Just an FYI
 
Yep, noticed this evening it was down a while.

DK
 
:lol: Hey Knight.. you should apply for work at my day job's head office. Their solution would have been to assign one of the DBAs some OT, sitting in front of the server and waiting for it to crash at hour X of the 8th day, so he can reboot it and then go home.

I noticed a regular downtime but not the 8 day interval.. trying to keep up with the flow of posts on here every hour on the hour leaves one feeling detached when the site goes down lol..
 
Yeah, that's actually a separate issue. Some visitors open up about 10,000 connections to the forum, so I end up either banning their account + IP or clearing out the firewall states so that all of those open sessions are removed. I don't notice until a little alarm goes off (because the website becomes inaccessible then, even though the server is working just fine).
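For anyone curious how you'd spot that kind of visitor, here is a rough Python sketch, not what actually runs on the forum's firewall. It assumes you already have a list of remote IPs, one entry per open connection; the `heavy_clients` name, the `limit` of 1000, and the sample addresses are all made up for illustration.

```python
from collections import Counter


def heavy_clients(remote_ips, limit=1000):
    """Return {ip: open_connections} for every IP holding more than `limit`."""
    counts = Counter(remote_ips)
    return {ip: n for ip, n in counts.items() if n > limit}


# Made-up sample: one client holding 10,000 connections, two normal ones.
sample = ["203.0.113.7"] * 10_000 + ["198.51.100.2"] * 3 + ["192.0.2.9"]
print(heavy_clients(sample))  # {'203.0.113.7': 10000}
```

Whatever comes back from that is the candidate list for a ban or a state flush; the actual blocking would still happen in the firewall.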

The DB issue is very annoying because the only fix so far is to remove the new RAM, but I have to get some other RAM to replace it with first and cross my fingers that it will work for at least a solid week without problems.
 
If you want to give it a try, you could make sure all the timings are set correctly. Then run Orthos for a while, and if it doesn't crash, you know you're stable.
 
Well, I thought it was the RAM, but it turns out the system is overheating. Normally it just sits at 2% to 3% CPU load all the time (mostly everything is done in RAM); it's only during the backups that it crashes. The backups push the CPU to around 100% for about 5 minutes, and that's when it locks up. It's mainly just gzip eating CPU (at a very low nice level), but yeah, the CPU is getting too hot.
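One thing worth spelling out: a low nice level only changes who gets the CPU first. On an otherwise idle box, gzip will still run a core flat out. If you wanted the backup itself to run cooler, one option is to compress in chunks and sleep in between. The sketch below is only an illustration of that idea, not what the backup here does; `compress_throttled`, `chunk_size`, and `pause` are names and values I made up.

```python
import gzip
import time


def compress_throttled(src, dst, chunk_size=1 << 20, pause=0.05):
    """gzip `src` into `dst`, sleeping `pause` seconds after every chunk."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            time.sleep(pause)  # idle gap so the CPU is not pegged continuously
```

The trade-off is a longer backup window in exchange for a lower average load. It does nothing about the underlying cooling problem.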

So I take the system down this morning and just pop off the giant heat sink. Turns out there is no thermal paste between the CPU and heat sink. So after slapping Acer around, I grab the silver paste, slop it on there, mash the heat sink back down and give it another try. I thought it was the RAM because that's the only thing that has changed since last year (upgraded from 1GB to 2GB, basically). What has actually happened is that the weather has gotten warmer, so the room where the server lives has gone from 72F to around 78F, and that is enough to lock up the computer, at least the way it was before. So on top of some more paste, I've installed a fan in the front and back of the tower to help circulate the air inside. I just have a little monitor hooked in running 'top' so I can see if it's going OK CPU and memory wise. Probably after the forum traffic dies down, I'll write a script that just burns CPU in wasted cycles to see if I can get it to overheat under load again.

If it makes it past 15 minutes of constant CPU wasting, then it's good to go since the backups never take more than 5 minutes anyway.
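A CPU-wasting script like that can be tiny. Here's a minimal Python sketch of the idea, not the script actually used on the server; the `burn` name is made up, and it only loads one core, so you'd start one copy per core to load the whole CPU.

```python
import time


def burn(seconds):
    """Spin in a tight loop for `seconds`; return how many iterations ran."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pointless work, just to keep the core busy
    return iterations
```

Calling `burn(15 * 60)` would cover the 15-minute test. If the box is still answering when it returns, it survived.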
 