Why we had big downtime 04/22/18 and intermittent downtime since the upgrade

neptronix

Hi all. When we moved from phpbb 3.0 to 3.2, we decided to use the newest version of the database server on AWS.
It turned out that this version has a slow memory leak that would cause the server to stall for an hour and then reboot, on about a 3-day cycle.

We doubled the memory on the database, which made the system last about a month, but it would then stall for 2-3 hours before rebooting as the memory filled up. Obviously not a solution.

I tried tweaking all kinds of database settings and also upgraded to a newer minor version that claimed to fix the problem, but neither approach worked. I had planned on downgrading the database next week, but noticed that the system had stalled and rebooted twice in the last week and was hung this morning, so I took the system down and did the downgrade. Total downtime was around 6 hours, mostly because our database is just enormous and takes eons to process.

Sorry for the lack of a heads-up on the downtime. I figured Sunday morning was an ideal time to tackle the problem, since the system was down already.

Here's hoping that the problem is cured once and for all. If not, there are more tricks up my sleeve, but I've already blown a total of ~18 hours on this problem, so ... cross your fingers for me, fellas. :)
 
THANKS Nep. :)
 
Yep, the proper response to a memory-leaking program is fixing the memory leak. As you found out, throwing more memory at it simply prolongs the time until memory exhaustion.

So if it's the database... do you think the latest stable "working" version would work? Pretty sure the best database systems are designed not to leak. Or are there security concerns with running a not-the-newest version?
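
A rough sketch of why extra memory only buys time, assuming a steady leak on top of a fixed working set. The function name and every number here are hypothetical and are not the forum's real configuration:

```python
def days_until_exhaustion(total_gb, baseline_gb, leak_gb_per_day):
    """Days until a steady leak eats the headroom above the normal working set.

    Assumes a constant leak rate and a fixed baseline, which is a simplification.
    """
    if leak_gb_per_day <= 0:
        raise ValueError("leak rate must be positive")
    headroom_gb = max(total_gb - baseline_gb, 0.0)
    return headroom_gb / leak_gb_per_day


# Hypothetical figures for illustration only.
small = days_until_exhaustion(total_gb=8, baseline_gb=6, leak_gb_per_day=0.5)
large = days_until_exhaustion(total_gb=16, baseline_gb=6, leak_gb_per_day=0.5)
print(small, large)  # prints: 4.0 20.0
```

Under these assumptions, doubling the RAM multiplies the headroom by more than two, so the crash comes later, but it still comes.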
 
swbluto said:
Yep, the proper response to a memory-leaking program is fixing the memory leak. As you found out, throwing more memory at it simply prolongs the time until memory exhaustion.

So if it's the database... do you think the latest stable "working" version would work? Pretty sure the best database systems are designed not to leak. Or are there security concerns with running a not-the-newest version?

Initially we thought it was some kind of DDoS because it was happening so rapidly with the lower memory, which was weird because I have reliably hardened many Linux servers running this exact OS. The extra memory was partly a test to check for a memory leak, because I'd already tried some very restrictive DDoS mitigations and they had made no difference.

Amazingly, it's the latest major version of the database that had the memory leak. And I notice in the release notes that they are gradually patching one memory leak after another. Mind you, this is a major vendor, so I had a feeling they'd eventually sort it out. Amazon has a note about this and offers its own fork of the database as a way around the leak, but I decided not to take up that offer, lest we end up with vendor lock-in.

We are already vendor locked in with phpbb due to our massive amount of content.. that's bad enough :lol:
 
"Mind you, this is a major vendor..." Watt. Like manufacturers of the 20th-Century horseless carriage in a 21st-Century ("crowded") "urban" world? Surely you jest Sir. :lol:
 
Is it MySQL as the database?
In my experience, if the server is dedicated to database work, the actual memory setting of MySQL doesn't affect performance much. The kernel will load every block of disk storage it can into its own cache (the page cache) and serve it from RAM instead of from disk. That is essentially the same as configuring the DB software to use all the available RAM, because on a dedicated DB server the kernel has nothing else to use that RAM for.
Say the server has 96 GB of RAM and the database software is set to use 16 GB. When you reboot the Linux server, watch it with "vmstat 30" and see how around 80 GB of the free memory gets eaten up by the kernel caching the disk blocks that MySQL reads while serving DB requests.
It doesn't make much difference in my experience and benchmarking, but for the very best performance you should decide whether you want the kernel to manage the cached DB data or to force the DB to use as much of the RAM as possible. The worst thing you can do is split the RAM half and half between the DB and free RAM, because the DB software and the kernel will most likely each cache an identical copy of the same data, effectively halving the RAM that is actually doing useful work.
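
The double-caching point can be put into numbers with a small sketch. It assumes the worst case (whatever fits in both caches is fully duplicated), that all RAM not given to the DB ends up as kernel cache, and it ignores OS overhead. The function name is made up for illustration:

```python
def unique_cached_gb(total_ram_gb, db_buffer_gb):
    """Worst-case amount of distinct data held in RAM.

    Assumption: the DB buffer and the kernel cache hold the same blocks
    wherever they can, so the smaller of the two is pure duplication.
    """
    if not 0 <= db_buffer_gb <= total_ram_gb:
        raise ValueError("db_buffer_gb must be between 0 and total_ram_gb")
    kernel_cache_gb = total_ram_gb - db_buffer_gb
    duplicated_gb = min(db_buffer_gb, kernel_cache_gb)
    return total_ram_gb - duplicated_gb


for db_buffer in (16, 48, 90):
    print(db_buffer, unique_cached_gb(96, db_buffer))
# prints:
# 16 80
# 48 48
# 90 90
```

In this simplified model the 48/48 split is the low point: only half of the 96 GB holds distinct data, while leaning hard either way keeps most of it useful.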
 
Yeah, without revealing anything about our infrastructure, i'm aware of what you're talking about :)

But we had an out-of-control memory leak in the database, which would push it into swap, where it would stall for hours and then reboot itself. The newer version of the database wasn't responding to the various parameters I set to control memory usage, which normally work on other versions.

I'm seeing what looks like a positive trend in the database memory usage though. We will know whether the fix held in a few days. If not, more drastic measures will be needed :evil:
 
Thanks for fixing the forum. Looks good from here.

Lots of problems and solutions here:
https://www.phpbb.com/community/index.php

Wonder what server the phpbb.com people use for their forum? Let's do some research.

endless-sphere.com
Name Server: NS-415.AWSDNS-51.COM
Name Server: NS-1669.AWSDNS-16.CO.UK
Name Server: NS-1320.AWSDNS-37.ORG
Name Server: NS-997.AWSDNS-60.NET

phpbb.com
Name Server: ERIC.NS.CLOUDFLARE.COM
Name Server: ERIN.NS.CLOUDFLARE.COM

marty's server
Name Server: NS5.SECURESERVER.NET
Name Server: NS6.SECURESERVER.NET

My thoughts:
AWS is Amazon Web Services. I looked at https://aws.amazon.com/ and I am overwhelmed. Too much to comprehend. No phone number for questions?

CLOUDFLARE.COM: I know nothing about them. Located in San Francisco, CA. They have a phone number for sales. The Free plan looks interesting. I should try moving one of my websites over there.

SECURESERVER.NET is GoDaddy. Call them any time for any question. I am happy. Had a delusional idea of becoming a reseller. Having major Dreamweaver confusion :(
 
Thanks for trying to be helpful but you're thinking a couple levels out of the scope of the problem.
I am the guy you call when your Linux server has gone tits up. There is no 'call a friend' option for these kinds of problems :lol:

Anyway, i'm seeing what looks like a promising trend for the memory consumption on the database. We will know if the fix held within a few days, but our DB was crashing on a 1-2 day cycle and we're on day 3 now, so :mrgreen:
 
Yahoo!!!

[Attached image: 2018-04-25 11_58_12-RDS · AWS Console.png]

As you can see, the database starts out with a lot of free RAM and then uses more and more memory to cache our huge database as time goes on. But then it levels off after 3 days. It was not doing this before: the free-memory curve just went down to zero megabytes, and then into swap.

But for the last 12 hours, memory usage has stayed within +/- 3 megabytes. I'm calling this fixed. :mrgreen:
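
For anyone who wants to check a graph like this automatically, here is a minimal sketch of a "has it levelled off?" test. The +/- 3 MB band comes from the post above; the function name, the window size, and the sample readings are invented for illustration and are not real data from the console:

```python
def has_plateaued(samples_mb, window, tolerance_mb=3.0):
    """True if the last `window` readings all sit within +/- tolerance_mb of their mean."""
    if window < 2 or len(samples_mb) < window:
        return False  # not enough data to judge
    recent = samples_mb[-window:]
    mean = sum(recent) / window
    return all(abs(reading - mean) <= tolerance_mb for reading in recent)


# Made-up free-memory readings in megabytes.
leaking = [4000, 3000, 2000, 1000, 500, 0]
stable = [4000, 3000, 2200, 2001, 1999, 2000, 2002]

print(has_plateaued(leaking, window=4))  # prints: False
print(has_plateaued(stable, window=4))   # prints: True
```

A leak keeps the curve heading toward zero, so the recent readings never fit in the band; a healthy cache fills up and then stays put.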
 