On Sun, 14 Mar 2010 08:51:12 +0000
Andy Smith <andy(a)bitfolk.com> wrote:
Hello,
This very long email is about possible pro-active measures I could
take to prevent customers being compromised by SSH dictionary
attacks. The first part is just a recap of how we got here and what
happens. If you want to make it shorter by skipping that, then skip
down to the paragraph which begins with "Being compromised by an SSH
dictionary attack..."
Every so often I get an alert about the number of outbound SSH
connections being much higher than usual. e.g. 100,000 instead of 4.
I then investigate, and most of the time it's harmless -- someone
doing a pentest or some sort of remote monitoring.
But once every month or two, it's a customer's VPS which has been
compromised and turned into an SSH scanning bot. As it happens,
there have been two instances of this in the last month.
When BitFolk first started, this sort of thing happened about three
times before I decided I'd have to have monitoring of it so it
wouldn't go on for days on end.
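(As an aside, for anyone wanting to do something similar on their own
host: one minimal way to count new outbound SSH connections on Linux,
purely as an illustrative sketch and not necessarily how BitFolk's
monitoring actually works, is an iptables counting rule that a cron
job reads back:

    # A rule with no target just counts matching packets; the guest
    # interface name "vif1.0" here is a placeholder.
    iptables -I FORWARD -i vif1.0 -p tcp --dport 22 -m state --state NEW

    # Read the counters periodically, e.g. from cron, and alert if the
    # count of new outbound SSH connections jumps from single figures
    # to tens of thousands:
    iptables -L FORWARD -v -n -x | grep 'dpt:22'
)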
When this happens and I catch it in the act, I have to turn off the
customer's network, because their VPS is actively attacking remote
sites, often at a rate of several hundred thousand SSH connections
per hour. Aside from the hosts they may compromise, there will be
abuse reports to deal with and possible damage to BitFolk's
reputation, not to mention very large bandwidth usage that the
customer is unlikely to want to pay for.
I can't just block tcp/22 outbound because at this point the VPS is
compromised and the priority (after stopping it) is understanding
how it happened. Leaving the network working allows attackers more
opportunity to clean up after themselves and/or do more damage.
So, if you're the unlucky customer then at this point your VPS has
no network, you've received an email (assuming your email is
accessible from outside your VPS) about what's happened and you're
going to have to log in to the console and/or rescue environment to
try to work out how it happened. The rescue environment has access to
your block devices and has unrestricted network access, but I'm not
going to be able to turn networking back on for your VPS itself.
If the customer can work out what happened, how it happened, and
convince me that root access was not obtained, then I can turn their
network back on. If they can't convince me that root access was not
obtained then there's generally no option but to re-image the VPS.
I'm really loath to re-image a VPS that's been compromised where we
don't know how, though, because it will likely just happen again.
The customer has to work it out, and that's maybe something they're
not used to doing, and/or don't want to be doing, while their
service is down the whole time.
Unfortunately I can't do it for them or help in any great detail,
because this is a time-consuming task and it's the sysadmin's
responsibility, i.e. the customer's.
Being compromised by an SSH dictionary attack is far and away the
most common thing that I see, and reacting to it by monitoring the
rate of outbound SSH connections is no longer good enough. Reacting
to it and cleaning up after it wastes too much time, and when a
customer has it happen repeatedly, at some point I have to say
enough is enough and not turn their service back on at all.
Do you think there are any pro-active measures that would be
acceptable to VPS customers? Typical ways to foil SSH dictionary
attacks:
1) Only use strong passwords.
2) Don't use passwords at all, only keys.
3) Disable root login.
4) Restrict the list of usernames that are valid, in combination
with (1) and (3).
5) Install DenyHosts or Fail2Ban.
6) Move sshd to another port.
More?
I can't do anything about (1), since the password the VPS comes with
is strong; presumably it gets changed to something weaker later on.
A lot of people have trouble setting up SSH keys and I would guess
that very few customers have them before they get a VPS, so setting
it up out of the box to require keys would be rather limiting. So
that's (2) out.
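(For reference, the mechanics of key-only access are small: two
commands on the customer's own machine plus one sshd_config change.
"user@vps.example.com" below is just a placeholder:

    # On the customer's own machine:
    ssh-keygen -t rsa                   # generate a key pair
    ssh-copy-id user@vps.example.com    # install the public key on the VPS

    # Then, in /etc/ssh/sshd_config on the VPS, refuse passwords entirely
    # and restart sshd:
    PasswordAuthentication no
)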
(3) is already the case for Ubuntu of course, but not for any of the
other distributions offered. I haven't kept track of how many
compromises have been of root rather than some other user, but
disabling root access by SSH and requiring some other username seems
a reasonable starting point, and would at least limit the damage.
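(For anyone who wants to do that themselves right now, it's a single
directive in /etc/ssh/sshd_config followed by a reload of sshd:

    PermitRootLogin no
    # then reload, e.g. on Debian/Ubuntu:
    /etc/init.d/ssh reload
)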
(4) sounds too confusing unless I stated in the setup email that SSH
has been locked down to accept only the one non-root user that the
VPS was set up with.
Doing (5) in the base images would be controversial but I think
actually very effective. Would it be an outrage for you to find
DenyHosts or Fail2Ban installed for you?
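(For anyone who hasn't used it, Fail2Ban's stock sshd filter needs
very little configuration. Something along these lines in
/etc/fail2ban/jail.local, assuming a Debian-style /var/log/auth.log,
bans an address for ten minutes after five failed logins; the numbers
are only illustrative:

    [ssh]
    enabled  = true
    port     = ssh
    filter   = sshd
    logpath  = /var/log/auth.log
    maxretry = 5
    bantime  = 600
)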
Moving the SSH port is probably the most effective of all, but I
think people would find it really confusing, and wouldn't accept it,
to find that their SSH was provisioned on e.g. port 2022 to begin
with.
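(If it were ever done, it's one directive on the server and a small
stanza in the customer's ~/.ssh/config to hide the oddity again; the
hostname and alias below are placeholders, and 2022 is just the
example port from above:

    # /etc/ssh/sshd_config on the VPS:
    Port 2022

    # ~/.ssh/config on the customer's own machine:
    Host myvps
        HostName vps.example.com
        Port 2022

After which a plain "ssh myvps" works the same as before.)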
Any thoughts?
A horribly long thread, so I don't know whether anything similar to
this has been mentioned yet.
I actually wrote an article not too long back for Linux Format,
regarding basic security for a server... As this is available online
now, it might be a useful resource.
http://www.linuxformat.co.uk/includes/download.php?PDF=LXF121.tut_adv.pdf