y domain zone age alerts. They
send alerts when the zone on BitFolk's nameservers is too old. This
saves me having to read logs and open a support ticket to advise
customers that the zone transfers are failing, so I'm all in favour
of that.
The definition of "too old" differs on a per-domain basis. There are
two relevant values in the SOA record of a DNS domain: refresh and expire.
The refresh value tells secondary servers how often to check in
with the primary.
The expire value tells secondary servers how long they should
consider themselves valid for without successful contact with the
primary. If there is no contact with the primary for the expire
period then the secondary server stops serving the domain and
returns SERVFAIL for every query.
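For reference, refresh and expire are the fourth and sixth fields of the
SOA record data, after the primary nameserver, the contact mailbox and
the serial. Here is a minimal Python sketch of pulling them out of the
one-line form that "dig +short SOA" prints. The function name and the
example.com values are made up for illustration; this is not how
BitFolk's checks are implemented.

```python
from typing import NamedTuple


class SoaTimers(NamedTuple):
    serial: int
    refresh: int  # seconds between checks against the primary
    retry: int    # seconds to wait before retrying a failed refresh
    expire: int   # seconds a secondary may keep serving without contact
    minimum: int


def parse_soa_rdata(rdata: str) -> SoaTimers:
    """Parse 'mname rname serial refresh retry expire minimum'."""
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    # fields[0] and fields[1] are the primary nameserver and contact mailbox.
    return SoaTimers(*(int(f) for f in fields[2:]))


# Hypothetical record, not a real customer zone.
soa = parse_soa_rdata(
    "ns.example.com. hostmaster.example.com. 2012012901 86400 7200 3600000 172800"
)
print(soa.refresh, soa.expire)
```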
So, based on the above, a DNS domain should never be "older" than
refresh. If it is older then that means that at least one refresh
attempt failed. If the age approaches expire then the domain is in
danger of not being served.
At the moment I have decided to send a warning alert on 150% of
refresh and a critical alert on 50% of expire.
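To make those thresholds concrete, here is a rough Python sketch of the
rule. It is an illustration only, not the actual check: the function
name is hypothetical, "age" is assumed to be the number of seconds since
the secondary last refreshed the zone successfully, and I am assuming
the alert fires when the age reaches the threshold exactly.

```python
def zone_age_state(age: int, refresh: int, expire: int) -> str:
    """Return 'CRITICAL', 'WARNING' or 'OK' for a zone age in seconds."""
    # Integer comparisons avoid floating point:
    # 2 * age >= expire       is the same as  age >= 50% of expire
    # 2 * age >= 3 * refresh  is the same as  age >= 150% of refresh
    if 2 * age >= expire:
        return "CRITICAL"
    if 2 * age >= 3 * refresh:
        return "WARNING"
    return "OK"
```

Critical is tested first, so a zone whose expire is small relative to
its refresh goes straight to critical without ever showing a warning.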
RIPE recommends 86400 (one day) for refresh and 3600000 (1000 hours;
almost 6 weeks) for expire:
http://www.ripe.net/ripe/docs/ripe-203
RFC1912 (1996) recommends one day for refresh and 2-4 weeks for
expire:
http://www.faqs.org/rfcs/rfc1912.html
So let's say you go with RIPE's recommendations. You'd receive
a warning alert after your secondary DNS setup was broken for 36 hours,
and you'd receive a critical alert if it was still broken after 500
hours (almost 3 weeks). 500 hours after that, your domain stops
being served on the secondary servers.
That seems reasonable.
Finally getting around to the point of this email: what do you think
I should do about problematic SOA values that customers have chosen?
For example, there are some domains currently on BitFolk's servers
where the refresh and expire are both set to 300 seconds (5
minutes). Ignoring what happens with alerts for a moment, that means
that every 5 minutes the secondary servers check the primary, and if
that fails even once, the domain will return SERVFAIL for all
queries until contact is made again.
I can't understand what the use is of such a fragile setting; it
looks erroneous to me. This isn't just DNS purism saying, "ooh, I
don't like your non-standard values!" It will actually cause
breakage very easily. But perhaps it is not for me to reason why.
Those domains have been like that for a long time and I assume no
one has noticed. It must have caused some problems any time the
primary nameserver was unreachable by the secondary servers. But
arguably that is not my problem.
When combined with this new alerting though, what happens is that
the zone goes 5 minutes between refreshes, and 2.5 minutes into each
interval a critical alert fires because the age is already half way
to expire (5 minutes). All being well there should be a recovery
~2.5 minutes later when the next refresh succeeds. In reality these
times will vary because BitFolk's Nagios doesn't check DNS every few
minutes; it's more like every hour or more.
That is the most extreme example of this problem, but there are a few
other domains in there where refresh and expire have been set to the
same value. It will lead to a cycle of alert and then recovery,
forever.
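The condition behind that cycle can be written down. In normal operation
the zone age climbs from 0 up to refresh and then resets, so if half of
expire is no larger than refresh, the critical threshold is reached
before the next refresh is even due. A rough Python sketch of flagging
such values follows; the function name and the wording of the messages
are hypothetical, and the exact cut-offs are my assumption rather than a
settled policy.

```python
def unwise_soa(refresh: int, expire: int) -> list[str]:
    """Return human-readable problems with a zone's refresh/expire values."""
    problems = []
    if expire <= refresh:
        # The zone expires no later than the first scheduled refresh,
        # so one failed refresh is enough to stop it being served.
        problems.append("expire <= refresh: one failed refresh expires the zone")
    if expire <= 2 * refresh:
        # 50% of expire falls within a normal refresh interval,
        # so the critical alert fires and recovers on every cycle.
        problems.append("expire <= 2 * refresh: critical alert will flap")
    return problems


print(unwise_soa(300, 300))        # both problems reported
print(unwise_soa(86400, 3600000))  # RIPE-style values: no problems
```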
So, what do you think I should do?
I'm not willing to give up on the alerts because I think most people
would like to know when their DNS setup is broken (or in danger of
being broken), and it saves me having to personally interact to tell
people this. Intentional DNS breakage is not my problem, but
answering/opening support tickets is.
Alerting can be disabled on a per-domain basis. Currently only by
asking support, but eventually you'll be able to flip that on the
Panel¹.
So how about having the Panel warn on the web page about what are
considered unwise SOA values, and just allow the alerts to be
disabled if for some reason this sort of fragile DNS setup is
intentional?
Cheers,
Andy
¹
https://panel.bitfolk.com/dns/#toc-secondary-dns
-- 
http://bitfolk.com/ -- No-nonsense VPS hosting
From keithwilliamsnp@??? Sun Jan 29 17:14:21 2012
In-Reply-To: <20120129152937.GU32046@???>
References: <20120129152937.GU32046@???>
Date: Sun, 29 Jan 2012 17:14:12 +0000
Message-ID: <CAMe3QpNockhJfp5oYVTjEuF8bxhkZOgqAig-xOebUoc-k8u58w@???>
From: Keith Williams <keithwilliamsnp@???>
To: users@???