Re: [bitfolk] DNS hiccups

Author: Keith Williams
Date:  
Subject: Re: [bitfolk] DNS hiccups

--27KoNqt0fmcl1zj/
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi Joseph,

On Tue, Jun 26, 2012 at 11:14:43AM +0100, Joseph Heenan wrote:
> here's the graph for my VPS:
>
> http://f8lure.mouselike.org/archived_graphs/button.heenan.me.uk_day25.png
>
> The number of huge spikes (and some packet loss, shown red) on this
> surprised me. Would this kind of result be expected?


About a month ago I was made aware of a problem with occasional
spikes of high latency, and on looking into it, it became apparent
that it had actually been the case for a long time - perhaps years -
without anyone really noticing.

What you're seeing is one or two packets out of every couple of
hundred being delayed somewhere, sometimes for hundreds of
milliseconds.
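
For anyone who wants to check their own ping data for the same pattern, here is a minimal sketch of how such spikes could be counted. It is not the tooling behind the graphs in this thread: the function name `summarize_rtts`, the use of `None` to mark a lost packet, and the "5 times the median" threshold are all assumptions made purely for illustration.

```python
from statistics import median


def summarize_rtts(rtts_ms, spike_factor=5.0):
    """Summarize a list of ping round-trip times in milliseconds.

    A value of None marks a lost packet (an assumed convention).
    A reply counts as a "spike" when its RTT exceeds spike_factor
    times the median RTT; the factor is an arbitrary illustrative
    choice, not a measured property of any network.
    """
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    if not replies:
        # Nothing came back, so there is no baseline to compare against.
        return {"sent": len(rtts_ms), "lost": lost,
                "median_ms": None, "spikes": []}
    baseline = median(replies)
    spikes = [r for r in replies if r > spike_factor * baseline]
    return {"sent": len(rtts_ms), "lost": lost,
            "median_ms": baseline, "spikes": spikes}


if __name__ == "__main__":
    # Made-up sample: 200 probes, two delayed replies and one loss.
    samples = [10.0] * 197 + [250.0, 400.0, None]
    print(summarize_rtts(samples))
```

Comparing against the median rather than the mean keeps a handful of very slow replies from dragging the baseline upwards, which matters when only one or two packets in a couple of hundred are affected.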

It isn't restricted to your VPS, or to any one BitFolk server. It
seems to be affecting all VPSes, but as I say, it has been doing so
for a very long time. Here's a graph that exemplifies the issue:

http://www.thinkbroadband.com/ping/share/9b7cf0ba2197b53c0aeb0f3cff42fb7e.html

Since then I've been trying to work out where it's actually
happening, and this has been a long and ongoing process.

Firstly, it *is* restricted to BitFolk. Other things hosted at the
same colo are not seeing it. Here's something else in the same rack
as some BitFolk nodes, connected to the same switches:

http://www.thinkbroadband.com/ping/share/cc1418a68757c0f78c674ca6cd0beabe.html

That led me to wonder if it could be some form of overloading of
BitFolk's VM hosting nodes. I think I have now discounted that
possibility though, because I have been emptying the node "curacao"
to the point where it now has just two VPSes left on it, one of
which is the "pingtest" VPS above, which still shows the issue. So
it's hard to believe that overloading is the cause.

Then I wondered about proxy ARP. I worked with our colo provider to
restrict the number of IP addresses that their routers would ARP
for, and we examined packet traces for ARP activity, but that proved
to be fruitless.

So next, is it a problem inherent to Xen? Well, the "penguin" graph
above is a Xen-based VPS running on hardware similar to BitFolk's,
which was set up by me in a virtually-identical way to how I set up
BitFolk's nodes, and it doesn't show the problem.

That's where we are at the moment, and I'm continuing to work on
this. By tomorrow I'll have moved the last remaining customer off
curacao and then I'll move that node into a different VLAN with
other (non-BitFolk) servers that aren't currently experiencing this
problem, to see what happens.

I'm afraid I can't give you any ETA on when this might be fixed as I
still don't know exactly what the problem is. I will keep you
informed of progress.

Cheers,
Andy

-- 
http://bitfolk.com/ -- No-nonsense VPS hosting

--27KoNqt0fmcl1zj/

--27KoNqt0fmcl1zj/--


From: Joseph Heenan <joseph@???>
Date: Wed, 27 Jun 2012 12:20:57 +0100
To: users@???
Subject: Re: [bitfolk] vps ping response times


Hi Andy,

On 26/06/2012 19:58, Andy Smith wrote:
[snip]
> I'm afraid I can't give you any ETA on when this might be fixed as I
> still don't know exactly what the problem is. I will keep you informed
> of progress. Cheers, Andy

That's a very detailed response - thanks!

Joseph



From: Andy Smith <andy@???>
Date: Sat, 7 Jul 2012 13:05:37 +0000
To: users@???
Subject: [bitfolk] Proving that you are you



--jmkJtp15SxLq1SbD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello,

Today a customer popped up on IRC saying that they had broken their
VPS and couldn