Hi Folks
(I hope Andy doesn’t mind me posting about a VPS with a provider over the road from him. He knows that the eventual intention is to bring all of my business over! I am tempted to “solve” this problem by migrating the offending WordPress site to my BitFolk box, but I suspect the problem would just follow it.)
I have a server where Apache is falling over more often than I’d like. I’ve not previously been big on server monitoring, so I don’t have huge amounts of data that could point to the problem. (I’m changing that with as much time as I have available.)
I’ll try to keep the story brief:
This is happening on an Ubuntu 12.04 box. It runs Varnish and Apache with many virtual hosts, as well as MySQL and Nagios. One virtual host is a WordPress site; another is a Drupal 7 site that has not had much publicity yet, but could attract “enemies” when it does [1]. Not much customisation was done to the standard install, but some TCP tweaking was done (mostly by the hosting company) because the VPS was set up to expect massive traffic.
For the most part, I configure Drupal sites and then take on the hosting. I was asked to take on the hosting of a WordPress site. Not wanting to run my sites and the WP site as the same user, I set up a second Apache instance for the WordPress site, with Varnish forwarding to the appropriate Apache instance.
Some time later, the second Apache instance started segfaulting far too often, as the error logs showed. It couldn’t run for more than about an hour before it fell over. This started around the time (but not exactly when) I installed an extra PHP module at the user’s request [2]. The site stayed down until Apache was restarted, so as a temporary measure I wrote a very crude cron job to restart Apache automatically as needed.
Dealing with the core dump and gdb seemed like too much effort, so I decided to try mpm_itk [3] instead (which is actually a better fit with my longer-term plans). This worked swimmingly for about a week. However, over the past three nights all of my sites have fallen over at about 2am, with nothing being served until I woke up and restarted Apache. Some distinguishing features are:
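A restart job like this is usually more reliable when it is driven by an HTTP health check rather than by checking whether the Apache process exists, because a hung Apache can still have live processes. The following is a minimal Python sketch, not the script actually used: `is_healthy`, `main` and `DEFAULT_URL` are hypothetical names, and the default URL is an assumption (with Varnish in front, the second Apache instance would probably listen on its own port).

```python
import http.client
import urllib.request

# Hypothetical default; point this at the port the second Apache instance listens on.
DEFAULT_URL = "http://127.0.0.1/"

# Bypass any http_proxy settings: the check should talk to Apache directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a non-error HTTP status within `timeout` seconds."""
    try:
        # The default opener raises HTTPError (an OSError subclass) for error statuses.
        with _OPENER.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout (the "hung Apache" case), or a malformed/error response.
        return False


def main(argv: list[str]) -> int:
    """Exit status for cron: 0 if healthy, 1 if not, so a wrapper can decide to restart."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    return 0 if is_healthy(url) else 1
```

Saved as a script ending in `sys.exit(main(sys.argv))`, it could be called from cron as `python3 check.py http://127.0.0.1:8080/ || <restart command>`; the script name, port and restart command are placeholders. The timeout is the important design choice: it turns a server that accepts connections but never answers into a failure, which a process check would miss.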
* This happens at roughly the same time each day.
* All other services on the server are responsive; it’s just Apache that is bound up.
* CPU loads are next to nothing.
* Nothing shows up in the Apache logs during the downtime.
* Nagios reports many more processes than usual on the system.
I’m getting a bit far from my comfort zone now, but this smells of slowloris to me (a denial-of-service technique that ties up Apache workers by sending requests very slowly).
I’ve since enabled server-status, so if the issue comes up again I can look at it before restarting Apache. Watching it now, a few things stand out:
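mod_status’s machine-readable page (`/server-status?auto`) includes a `Scoreboard:` line with one character per worker slot. A slowloris-style attack typically shows up as many slots stuck in the `R` (reading request) state while the CPU stays idle. Below is a minimal Python sketch for tallying that line; `scoreboard_counts` and `looks_like_slow_readers` are hypothetical names, and the 50% threshold is an arbitrary assumption, not a tuned value.

```python
from collections import Counter

# Scoreboard keys as documented for Apache mod_status.
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}

IDLE_STATES = ("waiting for connection", "open slot with no current process")


def scoreboard_counts(auto_output: str) -> Counter:
    """Count worker states from the 'Scoreboard:' line of server-status?auto output."""
    for line in auto_output.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_KEYS.get(ch, f"unknown ({ch})") for ch in board)
    raise ValueError("no Scoreboard line found")


def looks_like_slow_readers(counts: Counter, threshold: float = 0.5) -> bool:
    """Heuristic (assumed threshold): most busy slots are stuck reading requests."""
    busy = sum(n for state, n in counts.items() if state not in IDLE_STATES)
    return busy > 0 and counts["reading request"] / busy >= threshold
```

Feed it the text fetched from the status page during an outage. A board dominated by `reading request` would support the slowloris theory, whereas one dominated by `keepalive (read)` would point back at the KeepAlive settings instead.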
* wp-login.php is getting hit quite a bit on the WordPress vhost from different IPs.
* KeepAlive doesn’t seem to be working as I understand it: why would there be Varnish connections that appear to be open and waiting for hundreds or thousands of seconds? (I’m looking at the SS column, i.e. seconds since the start of the most recent request.)
* I was running my Drupal cron jobs too often (embarrassingly so) via cron and wget, and cron runs for the same vhost seemed to be overlapping [4]. I’ve slowed this down.
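Overlapping cron runs can be prevented by taking an exclusive, non-blocking lock before starting the job, so a run that finds the previous one still going simply skips. Here is a minimal Python sketch using `fcntl.flock` (Unix only); `run_exclusively` is a hypothetical helper, and the lock path and Drush command in the usage note are placeholders.

```python
import fcntl
import subprocess
import sys
from typing import Optional


def run_exclusively(lock_path: str, command: list[str]) -> Optional[int]:
    """Run `command` unless another run holds `lock_path`.

    Returns the command's exit code, or None if this run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Exclusive and non-blocking: raises at once if a previous run still holds the lock.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            print(f"skipping: {lock_path} is still locked by another run", file=sys.stderr)
            return None
        # The lock is held while the command runs and released when the file is closed.
        return subprocess.run(command).returncode
```

For example, `run_exclusively("/tmp/example-cron.lock", ["drush", "--root=/var/www/example", "cron"])`, with one lock file per vhost; the path and site root are hypothetical. Because the lock belongs to the open file, it is also released if the job crashes, so no stale lock file needs cleaning up.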
Any thoughts? What would you do differently?
(OK, so the story wasn’t brief — apologies!)
Thanks
Ben [5]
[1] [2] My suspicion is that these are red herrings.
[3] Is this a daft thing to do?
[4] I’ll sort out separately why this might have happened, but I’ll move to managing this via Drush rather than wget.
[5] I am a site builder first, and an admin second, so please excuse the things I’ve done that are clearly sub-optimal.
Is anyone else suffering from this pingback ddos at the moment? My server is
only a low spec one and keeps being brought to its knees by it. Now I've had
the time to actually look at the logs and work out what is happening I'm in the
process of putting some form of protection in there, although the quick fixes
seem to impact functionality. I was just wondering what anyone else had done,
assuming others have been impacted too. Not having read in full detail yet, I'm
wondering why this is just a WordPress issue. Wouldn't you be able to do the
same thing with any blog or CMS that uses pingback?
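To see which hosts are involved, one option is to count, per client IP, requests to `xmlrpc.php` (WordPress’s XML-RPC endpoint, through which pingbacks arrive) and requests whose User-Agent starts with `WordPress/` (the prefix WordPress’s own HTTP client sends). A minimal Python sketch, assuming Apache’s standard "combined" log format; `pingback_suspects` is a hypothetical name.

```python
import re
from collections import Counter

# Apache "combined" format (an assumption about this server's LogFormat):
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def pingback_suspects(lines):
    """Return (xmlrpc_by_ip, wp_agent_by_ip) counters from combined-format log lines."""
    xmlrpc_by_ip, wp_agent_by_ip = Counter(), Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            continue  # skip lines in other formats
        if m["path"].startswith("/xmlrpc.php"):
            # Someone is talking to this site's XML-RPC (pingback) endpoint.
            xmlrpc_by_ip[m["ip"]] += 1
        if m["agent"].startswith("WordPress/"):
            # A WordPress site's HTTP client is requesting pages from this server.
            wp_agent_by_ip[m["ip"]] += 1
    return xmlrpc_by_ip, wp_agent_by_ip
```

For example, `with open("/var/log/apache2/access.log") as f: xmlrpc, agents = pingback_suspects(f)`, then `xmlrpc.most_common(10)`; the log path is the Debian/Ubuntu default and per-vhost logs may live elsewhere. Keeping the two counters separate distinguishes a site being used to send pingbacks from a site being hit by them.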
--
Paul Tansom | Aptanet Ltd. | http://www.aptanet.com/ | 023 9238 0001
=============================================================================
Registered in England | Company No: 4905028 | Registered Office: Ralls House,
Parklands Business Park, Forrest Road, Denmead, Waterlooville, Hants, PO7 6XP
Hi all,
Forgive me if this is a little off-topic, but this list seems like the
(rare) kind of forum read by people who might have already thought
about this problem and maybe even come up with solutions:
How do you ensure that your online data is handled correctly if you die?
Regards,
Adam
Hi,
I’ve updated my VPS to Wheezy. Unfortunately, I can’t get it to boot.
The Xen console returns:
Booting instance: travelkazoo
Using config file "/etc/xen/travelkazoo.conf".
Error: Boot loader didn't return any data!
I booted into rescue, mounted my root partition, chrooted, mounted /sys, /proc, and /dev/pts, and tried to figure out where I went wrong.
I am not very familiar with Xen. Am I supposed to run xm create in /etc/xen? My previous Squeeze system’s backups do not have a travelkazoo.conf file in /etc/xen, and xm create /etc/xen/travelkazoo.conf doesn’t work. xend does not exist, even though I have xen-linux-system-686-pae installed, which is supposed to include it, so it obviously cannot run.
The installed Xen utils are version 4.1, as is xen-hypervisor-4.1-i386 (the architecture is i386). update-grub2 works fine, /boot/grub2/grub.cfg exists and looks plausible, and I’ve moved /etc/grub.d/10_linux to 50_linux so that 20_linux_xen comes first when I run update-grub2. I’ve been searching like mad and can’t find anything else to try.
Can someone please give me a hint about what I’m missing, or do I need to reinstall Squeeze from scratch and restore from backup? :(
Here are all the relevant installed packages:
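To confirm what the regenerated grub.cfg actually offers after renaming the scripts in /etc/grub.d, the menu entry titles can be listed in file order. A minimal Python sketch; `menu_titles` is a hypothetical helper, and it assumes the default `GRUB_DEFAULT=0`, under which the first entry listed is the default one.

```python
import re

# Matches the quoted title at the start of a GRUB 2 menu entry line, e.g.
#   menuentry 'Some title' --class debian {
MENUENTRY_RE = re.compile(r"""^\s*menuentry\s+(['"])(.*?)\1""")


def menu_titles(grub_cfg_text: str) -> list[str]:
    """Return menuentry titles in the order they appear in grub.cfg."""
    titles = []
    for line in grub_cfg_text.splitlines():
        m = MENUENTRY_RE.match(line)
        if m:
            titles.append(m.group(2))
    return titles
```

For example, `menu_titles(open("/boot/grub2/grub.cfg").read())` using the path mentioned above; the first title printed is the entry that the rename was meant to put first.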
ii  libc6-xen:i386                    2.13-38+deb7u1         i386  Embedded GNU C Library: Shared libraries [Xen version]
ii  libxen-4.1                        4.1.4-3+deb7u1         i386  Public libs for Xen
ii  libxenstore3.0                    4.1.4-3+deb7u1         i386  Xenstore communications library for Xen
ii  linux-image-2.6-xen-686           2.6.32+29              i386  Linux 2.6 for modern PCs (meta-package), Xen dom0 support
ii  linux-image-2.6.32-5-xen-686      2.6.32-48squeeze4      i386  Linux 2.6.32 for modern PCs, Xen dom0 support
rc  linux-modules-2.6.18-6-xen-686    2.6.18.dfsg.1-26etch2  i386  Linux 2.6.18 modules on i686
ii  xen-hypervisor-4.0-amd64          4.0.1-5.11             i386  The Xen Hypervisor on AMD64
ii  xen-hypervisor-4.1-i386           4.1.4-3+deb7u1         i386  Xen Hypervisor on i386
ii  xen-linux-system-3.2.0-4-686-pae  3.2.57-3+deb7u2        i386  Xen system with Linux 3.2 on modern PCs (meta-package)
ii  xen-linux-system-686-pae          3.2+46                 i386  Xen system with Linux for modern PCs (meta-package)
ii  xen-system-i386                   4.1.4-3+deb7u1         i386  Xen System on i386 (meta-package)
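When reading a `dpkg -l` listing like the one above, the first two letters of each row matter: `ii` means the package is installed, while `rc` means it was removed and only its configuration files remain. A minimal Python sketch that groups rows by that code; it assumes one package per row (as `dpkg -l` prints on a wide terminal), and `group_by_status` and `STATUS_MEANING` are hypothetical names.

```python
import re
from collections import defaultdict

# A dpkg -l row starts with a desired-action letter and a current-status letter,
# optionally followed by an error flag "R", then name, version and architecture.
ROW_RE = re.compile(r"^([uihrp][ncHUFWti])R?\s+(\S+)\s+(\S+)\s+(\S+)")

STATUS_MEANING = {
    "ii": "installed",
    "rc": "removed, configuration files remain",
}


def group_by_status(rows):
    """Group (package, version) pairs by status; non-package rows are skipped."""
    groups = defaultdict(list)
    for row in rows:
        m = ROW_RE.match(row)
        if m is None:
            continue  # header lines, or description text wrapped onto its own line
        status, name, version = m.group(1), m.group(2), m.group(3)
        groups[STATUS_MEANING.get(status, status)].append((name, version))
    return dict(groups)
```

Grouping this way separates packages that are really present from leftovers whose configuration files are all that remain.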
Hi,
I'm not sure whether there is a problem, but for the last couple of months
I've been seeing about one Nagios alert a week, plus various VPN problems,
which suggest there may be intermittent connectivity issues within the
cluster.
I'm most commonly seeing SSH warnings that my VPS is uncontactable, even
though I was logged in during some of them. Today I had a Nagios warning
that my backup file age had reached 10 hours. I've also recently seen my
VPN dropping out much more frequently than usual.
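One way to gather evidence for intermittent problems like these is to probe the SSH port from a machine outside the cluster and record when connections fail, so the timestamps can be compared with the Nagios alerts. A minimal Python sketch; `tcp_reachable` and `probe` are hypothetical names, and the host, port and one-minute interval are assumptions.

```python
import socket
import time
from datetime import datetime, timezone


def tcp_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable network, or DNS failure.
        return False


def probe(host: str, port: int = 22, interval: float = 60.0, rounds: int = 60) -> list[str]:
    """Probe host:port `rounds` times, `interval` seconds apart; return UTC timestamps of failures."""
    failures = []
    for i in range(rounds):
        if not tcp_reachable(host, port):
            failures.append(datetime.now(timezone.utc).isoformat(timespec="seconds"))
        if i < rounds - 1:
            time.sleep(interval)
    return failures
```

Running the same probe from two different outside locations would help separate a problem near the VPS from one on the probing side.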
Has anybody else experienced similar symptoms, or is it just me?
Thanks,
Paul.