Re: [bitfolk] Lenny to Squeeze

Author: urb59
Subject: Re: [bitfolk] Lenny to Squeeze

From andy@??? Sun Jan 29 15:29:38 2012
Date: Sun, 29 Jan 2012 15:29:37 +0000
From: Andy Smith <andy@???>
To: users@???
Message-ID: <20120129152937.GU32046@???>
Subject: [bitfolk] DNS refresh and expire values, alerting

Hello,

Long email about DNS timers and alerting based on them. Unless you
have domains on BitFolk's secondary DNS platform you probably won't
care about this, and even then you still probably don't care unless
you've been receiving alerts about them. Turn back now!

Still here? OK.

I've recently implemented zone age alerts for domains on BitFolk's
secondary DNS service. They send an alert when the copy of a zone on
BitFolk's nameservers is too old. This
saves me having to read logs and open a support ticket to advise
customers that the zone transfers are failing, so I'm all in favour
of that.

The definition of "too old" differs from domain to domain, because it
depends on two values in each domain's SOA record: refresh and expire.
The refresh value tells secondary servers how often to check in with
the primary.

The expire value tells secondary servers how long they should keep
treating their copy of the zone as valid without successful contact
with the primary. If there is no contact with the primary for the
whole expire period, the secondary server stops serving the domain
and returns SERVFAIL for every query.
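Both values can be read straight out of the SOA record. Below is a minimal Python sketch that pulls the timers out of SOA RDATA in presentation format (MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM, as printed by e.g. `dig +short SOA`). It assumes every timer is a plain number of seconds; the `parse_soa()` name and the example record are illustrative, not anything BitFolk actually uses.

```python
from typing import NamedTuple


class SOATimers(NamedTuple):
    """Numeric fields of an SOA record; the timers are in seconds."""
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa(rdata: str) -> SOATimers:
    """Hypothetical helper: parse SOA RDATA in presentation format.

    Expected layout: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM.
    Assumes plain integers only (no "1d"/"4w" style unit suffixes).
    """
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    # Skip MNAME and RNAME; the remaining five fields are integers.
    serial, refresh, retry, expire, minimum = (int(f) for f in fields[2:])
    return SOATimers(serial, refresh, retry, expire, minimum)


# Illustrative record using RIPE-style timer values.
soa = parse_soa("ns1.example.com. hostmaster.example.com. "
                "2012012901 86400 7200 3600000 172800")
print(soa.refresh, soa.expire)  # 86400 3600000
```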

So, based on the above, a domain's zone on the secondary servers
should never be "older" than its refresh value. If it is older, then
at least one refresh attempt has failed. If its age approaches
expire, then the domain is in danger of no longer being served.

At the moment I have decided to send a warning alert on 150% of
refresh and a critical alert on 50% of expire.
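The 150%/50% rule amounts to a small classifier. This is a minimal Python sketch, not BitFolk's actual alerting code: it assumes a zone's "age" is the number of seconds since the secondary last checked in successfully with the primary, and the `zone_age_status()` name is hypothetical.

```python
WARNING_FACTOR = 1.5   # warning alert at 150% of refresh
CRITICAL_FACTOR = 0.5  # critical alert at 50% of expire


def zone_age_status(age: float, refresh: int, expire: int) -> str:
    """Hypothetical check: classify a secondary zone's age (seconds
    since the last successful contact with the primary, an assumed
    definition) against the zone's SOA refresh and expire values."""
    # Test the more severe condition first.
    if age >= CRITICAL_FACTOR * expire:
        return "CRITICAL"
    if age >= WARNING_FACTOR * refresh:
        return "WARNING"
    return "OK"


# A zone with RIPE-style timers that hasn't refreshed for 40 hours.
print(zone_age_status(40 * 3600, 86400, 3600000))  # WARNING
```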

RIPE recommends 86400 seconds (one day) for refresh and 3600000
seconds (1000 hours; almost 6 weeks) for expire:

    http://www.ripe.net/ripe/docs/ripe-203


RFC1912 (1996) recommends one day for refresh and 2-4 weeks for
expire:

    http://www.faqs.org/rfcs/rfc1912.html


So let's say you go with RIPE's recommendations. You'd receive a
warning alert once your secondary DNS setup had been broken for 36
hours, and a critical alert if it was still broken after 500 hours
(almost 3 weeks). 500 hours after that, your domain would stop being
served by the secondary servers.
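The arithmetic can be checked with a few lines of Python. The hypothetical `thresholds()` helper just applies the 150%-of-refresh and 50%-of-expire rule; the RFC 1912 lines pair its 2-4 week expire range with the one-day refresh mentioned above.

```python
HOUR = 3600
DAY = 24 * HOUR


def thresholds(refresh: int, expire: int) -> tuple[float, float]:
    """Hypothetical helper: (warning, critical) alert thresholds in
    seconds, i.e. 150% of refresh and 50% of expire."""
    return 1.5 * refresh, 0.5 * expire


# RIPE-203 recommendations: refresh 86400 s, expire 3600000 s.
warn, crit = thresholds(86400, 3600000)
print(warn / HOUR)              # 36.0 hours until the warning
print(crit / HOUR)              # 500.0 hours until the critical alert
print((3600000 - crit) / HOUR)  # 500.0 more hours until expiry

# RFC 1912 expire range (2 and 4 weeks) with a one-day refresh:
# prints weeks, warning threshold in hours, critical threshold in days.
for weeks in (2, 4):
    warn, crit = thresholds(DAY, weeks * 7 * DAY)
    print(weeks, warn / HOUR, crit / DAY)  # 2 36.0 7.0, then 4 36.0 14.0
```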

That seems reasonable.

Finally getting around to the point of this email: what do you think
I should do about problematic SOA values that customers have chosen?

For example, there are some domains currently on BitFolk's servers
where the refresh and expire are both set to 300 seconds (5
minutes). Ignoring what happens with alerts for a moment, that means
that every 5 minutes the secondary servers check the primary, and if
that fails even once, the domain will return SERVFAIL for all
queries until contact is made again.
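Plugging the 300-second values into the same 150%/50% rule makes the problem concrete: the warning threshold is 450 seconds, but the critical threshold is only 150 seconds, shorter than the refresh interval itself. Assuming age is measured from the last successful check, such a zone would sit past the critical threshold for half of every healthy refresh cycle. The hypothetical `soa_timer_problems()` below sketches how such values could be flagged; its checks are just that arithmetic, not an existing BitFolk tool.

```python
def thresholds(refresh: int, expire: int) -> tuple[float, float]:
    """(warning, critical) thresholds: 150% of refresh, 50% of expire."""
    return 1.5 * refresh, 0.5 * expire


def soa_timer_problems(refresh: int, expire: int) -> list[str]:
    """Hypothetical sanity check on a customer's SOA timers."""
    problems = []
    if expire <= refresh:
        problems.append("expire <= refresh: one failed refresh can "
                        "expire the zone (SERVFAIL)")
    warn, crit = thresholds(refresh, expire)
    if crit < refresh:
        problems.append("critical threshold is below refresh: a healthy "
                        "zone trips it between checks")
    if crit <= warn:
        problems.append("critical alert fires at or before the warning")
    return problems


for problem in soa_timer_problems(300, 300):
    print(problem)  # all three problems are reported for 300/300
```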

I can't see the point of such a fragile setting; it looks like a
mistake to me. This isn't just DNS purism saying, "ooh, I don't like
your non-standard values!" It will actually cause breakage very
easily. But perhaps it is not for me to reason why.

Those domains have been like that for a long time and I assume no
one has noticed. It must have caused some problems any time the
primary na