[I tried sending this at roughly 3am this morning, and forgot yet again that replying to this list by default replies to the sender.]

My subconscious woke me up just a minute or two after I got a Pingdom alert that the site was down. I wish my mind would just let me sleep properly!

This has now happened at 2:29 on the 22nd, 2:19 on the 24th and 2:44 on the 25th. Coincidence? I think not. My feeling is that an application is being attacked. The only thing that could come from within is an errant Drupal cron job, but as of a few hours ago, they run at the start of the hour (staggered), and so would not have brought it down towards the end of the hour.

Now that I’m awake right when this has happened, I can’t log in to the server via SSH. I would normally raise an emergency ticket with the hosting company [1], but given the history, I’m almost interested to see if it comes back up by itself in a few hours.

On 25 Jul 2014, at 01:10, Ian <ian@lovingboth.com> wrote:

I may be misunderstanding something, but why not run WP with the same
Apache? (Apart from the way it would currently bring down everything else!)

I didn’t really want to take this hosting on, but it was a personal favour for a very loyal client that I do D7 for. They rebuilt their WP site after their previous one was hacked.

I just don’t trust WP, and it seemed right to be wary of exposing my D7 vhosts to a WP vhost that was a known target.

Do you think it's crashing because of the load before anything gets
written to the access logs?

No, it’s not crashing. When I have been able to log in in the past, the system has been full of Apache processes doing nothing, with nothing in the logs. This is what made me think slowloris: it looks as though lots of connections are being opened and never allowed to finish.
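
If I can get a shell while it is happening, a rough way to test the slowloris theory would be to count established connections per remote address. This is only a sketch: it assumes the usual column layout of `ss -tn` (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and that Apache listens on port 80. The function name `count_peers` is mine.

```python
from collections import Counter


def count_peers(ss_output, local_port=80):
    """Count established TCP connections per remote IP for one local port.

    Expects text in the layout printed by `ss -tn` (an assumption):
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not an established socket.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        # The port is whatever follows the last colon (also works for [::1]:80).
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[peer.rsplit(":", 1)[0]] += 1
    return counts


# Hypothetical sample input, using documentation-only addresses.
sample = """State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.0.2.10:80 198.51.100.7:51234
ESTAB 0 0 192.0.2.10:80 198.51.100.7:51235
ESTAB 0 0 192.0.2.10:22 203.0.113.5:40000
"""
for ip, n in count_peers(sample).most_common(10):
    print(n, ip)
```

A handful of addresses each holding dozens of idle connections would fit slowloris; a flat spread across many addresses would point somewhere else.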

See the wiki article on WordPress and use a fail2ban jail that looks for
any access to wp-login.php and bans the IP address for more than a
handful of accesses in a few minutes. If it's only legitimately accessed
from known whitelisted addresses, you can set it to ban on a single access.

I think that is the next step, yes.
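
For my own benefit, the rule such a jail applies is roughly the following. This is a sketch of the logic only, not fail2ban itself (which would do this through a filter plus its `maxretry`, `findtime` and `bantime` settings). It assumes Apache's combined log format and log lines in time order; `offenders`, `max_hits` and `window` are names I have made up.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Client IP, timestamp and request path from a combined-format log line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)')


def offenders(lines, max_hits=5, window=timedelta(minutes=5)):
    """Return IPs with more than max_hits wp-login.php requests inside window."""
    recent = defaultdict(deque)  # per-IP timestamps still inside the window
    banned = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "wp-login.php" not in m.group(3):
            continue
        ip = m.group(1)
        when = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        hits = recent[ip]
        hits.append(when)
        # Forget hits that have aged out of the sliding window.
        while when - hits[0] > window:
            hits.popleft()
        if len(hits) > max_hits:
            banned.add(ip)
    return banned
```

With `max_hits=0` this becomes the "ban on a single access" variant, which only makes sense once the legitimate addresses are whitelisted.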

Any thoughts? What would you do differently?

Have a cron job that checks if the second Apache is running and, if not,
starts it again.

(Just to be clear, I’m now running a single Apache with mpm_itk. I didn’t say it in the past, but I’d also tried nginx/php-fpm, and I had to keep restarting the php-fpm handler in that case, too!)
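
If I did go back to a separate instance, the watchdog could be as small as the sketch below. The process name `apache2` and the `service apache2 start` command are assumptions for a Debian-style box; a second instance would need its own names. It also only checks that a process exists, so it would not catch the "alive but doing nothing" state I have actually been seeing.

```python
import subprocess


def apache_running(process_name="apache2"):
    """True if a process with exactly this name exists.

    pgrep exits with status 0 when at least one process matches.
    """
    result = subprocess.run(["pgrep", "-x", process_name],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0


def start_apache():
    """Start the service (assumed command; adjust for the init system)."""
    subprocess.run(["service", "apache2", "start"], check=True)


def ensure_running(is_running, start):
    """Call start() only if is_running() is false; return True if started."""
    if is_running():
        return False
    start()
    return True


def main():
    if ensure_running(apache_running, start_apache):
        print("Apache was not running; start requested")
```

A cron entry would run a small script that calls `main()` every minute or so.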

Stay up until 2am and have a look at what's happening :)

:)

Time to go back to sleep before the little person wakes me up, again!

B

[1] They have fairly aggressive firewall settings, so I can’t be sure if the machine is genuinely uncontactable, or if they have triggered a temporary ban. And I can’t see the Nagios emails until the machine is reachable.