WOW. Thanks so much, Andy. I did see on Google that someone had had that problem, but then discovered that the "rogue" processes were from dnsmasq, which had been installed at some time in the past. I knew that could not be the case for me, as this is a new install.
It must be some sort of BIND 9 error, because it SHOULD report a failure to bind. I have just read the complete BIND 9 manual again - LOL
Perhaps I need to write a little script to detect and kill all processes using port 53 when bind is stopped. ;-)
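
A minimal sketch of the detection half of such a script, in Python and assuming Linux's /proc layout. The function names are made up for illustration, and it only reports PIDs rather than killing anything, so the decision to kill stays with whoever runs it. It needs root to see sockets owned by other users' processes.

```python
import os
import re

# /proc/<pid>/fd entries for sockets are symlinks of the form "socket:[<inode>]"
SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def socket_inodes_on_port(proc_net_text, port):
    """Return the inodes of sockets whose local port is `port`.

    `proc_net_text` is the content of /proc/net/udp, /proc/net/tcp or
    one of their IPv6 variants (listening and connected sockets alike).
    """
    inodes = set()
    for line in proc_net_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        # local_address is "<hex address>:<hex port>"
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            inodes.add(int(fields[9]))  # tenth column is the socket inode
    return inodes


def pids_holding(inodes):
    """Return the sorted PIDs that have any of `inodes` open as a socket."""
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or not permitted to look
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            match = SOCKET_LINK.fullmatch(target)
            if match and int(match.group(1)) in inodes:
                pids.add(int(entry))
    return sorted(pids)


def pids_on_port(port=53):
    """Return the PIDs of all processes holding a socket on local `port`."""
    inodes = set()
    for name in ("udp", "udp6", "tcp", "tcp6"):
        try:
            with open(os.path.join("/proc/net", name)) as handle:
                inodes |= socket_inodes_on_port(handle.read(), port)
        except OSError:
            continue
    inodes.discard(0)  # inode 0 means no owning process (e.g. TIME_WAIT)
    return pids_holding(inodes)
```

Calling pids_on_port(53) after stopping the service should return an empty list; anything left over is a candidate stray.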

On Wed, 24 Jul 2019 at 13:06, Andy Smith <andy@bitfolk.com> wrote:
On Tue, Jul 23, 2019 at 10:41:04PM +0000, Andy Smith wrote:
> Hello,
>
> On Tue, Jul 23, 2019 at 11:23:48PM +0100, Keith Williams wrote:
> > it did load   Jul 23 22:57:53 westnorfolk named[22233]: zone
> > keiths-place.co.uk/IN: loaded serial 2019072335
>
> Weird. How come that wasn't in the logs before?
>
> named does have an apparmor profile by the way, but I can't see
> anything wrong with it:
>
>     https://salsa.debian.org/dns-team/bind9/blob/debian/master/debian/extras/apparmor.d/usr.sbin.named
>
> and certainly I would expect it to complain loudly if it had been
> prevented from loading a zone file.
>
> I'm getting a bit stumped but if I were you I would be stripping out
> all of that query restriction and forwarding stuff and having the
> simplest configuration possible.
>
> I take it that the increased logging verbosity has not helped at
> all?

It was really irritating me that we couldn't get this working, when
bind9 is software I like to think I know quite well, so I asked
Keith if I could take a look.

I'm pleased to say that I've worked out what was going wrong,
although not why it happened, nor why it was so difficult to spot.
It's also still a bit annoying that I only worked it out by chance.

In exasperation I was searching around for the symptoms (zone loaded
but server thinks it is still NOTAUTH) and saw this:

    http://www.linuxsa.org.au/pipermail/linuxsa/2017-November/097564.html
    "Turns out the re-starting was leaving processes alive. Killed
    them manually and re-started and zones are transfering"

sooo I had a look:

andy@westnorfolk:~$ sudo systemctl stop bind9
andy@westnorfolk:~$ ps awux | grep bind
root      7097  0.0  0.1  49252  3680 pts/9    T    12:07   0:00 sudo vi /etc/bind/named.conf.local
root      7098  0.0  0.1  30820  3784 pts/9    T    12:07   0:00 vi /etc/bind/named.conf.local
root      9638  0.0  0.1  49252  3660 pts/9    T    12:38   0:00 sudo vi /var/lib/bind/keiths-place.co.uk.hosts
root      9639  0.0  0.1  30820  3712 pts/9    T    12:38   0:00 vi /var/lib/bind/keiths-place.co.uk.hosts
root      9912  0.0  0.1  49252  3612 pts/9    T    12:41   0:00 sudo vi /etc/bind/named.conf.options
root      9913  0.0  0.1  30820  3780 pts/9    T    12:41   0:00 vi /etc/bind/named.conf.options
andy      9999  0.0  0.0  12780   936 pts/9    S+   12:45   0:00 grep bind
root     11064  0.0  0.6 278668 19216 ?        Ssl  Jul23   0:00 /usr/sbin/named -c /etc/bind/named.conf
root     16870  0.0  0.6 279448 19776 ?        Ssl  Jul23   0:00 /usr/sbin/named -c /etc/bind/named.conf
root     20717  0.0  0.6 279968 20448 ?        Ssl  Jul23   0:01 /usr/sbin/named -c /etc/bind/named.conf
root     27705  0.0  0.7 281528 22720 ?        Ssl  Jul23   0:02 /usr/sbin/named -c /etc/bind/named.conf

WHY ARE THERE STILL 4 PROCESSES??

andy@westnorfolk:~$ sudo kill 11064 16870 20717 27705
Jul 24 12:46:04 westnorfolk named[16870]: shutting down
Jul 24 12:46:04 westnorfolk named[11064]: shutting down
Jul 24 12:46:04 westnorfolk named[16870]: no longer listening on ::#53
Jul 24 12:46:04 westnorfolk named[11064]: no longer listening on ::#53
Jul 24 12:46:04 westnorfolk named[16870]: no longer listening on 127.0.0.1#53
Jul 24 12:46:04 westnorfolk named[11064]: no longer listening on 127.0.0.1#53
Jul 24 12:46:04 westnorfolk named[16870]: no longer listening on 85.119.82.237#53
Jul 24 12:46:04 westnorfolk named[11064]: no longer listening on 85.119.82.237#53
Jul 24 12:46:04 westnorfolk named[27705]: shutting down
Jul 24 12:46:04 westnorfolk named[27705]: no longer listening on ::#53
Jul 24 12:46:04 westnorfolk named[27705]: no longer listening on 127.0.0.1#53
Jul 24 12:46:04 westnorfolk named[27705]: no longer listening on 85.119.82.237#53
Jul 24 12:46:04 westnorfolk named[20717]: shutting down
Jul 24 12:46:04 westnorfolk named[20717]: no longer listening on ::#53
Jul 24 12:46:04 westnorfolk named[20717]: no longer listening on 127.0.0.1#53
Jul 24 12:46:04 westnorfolk named[20717]: no longer listening on 85.119.82.237#53
Jul 24 12:46:04 westnorfolk named[11064]: exiting
Jul 24 12:46:04 westnorfolk named[20717]: exiting
Jul 24 12:46:04 westnorfolk named[16870]: exiting
Jul 24 12:46:04 westnorfolk named[27705]: exiting

All of those processes (16870, 11064, 27705, 20717) believed they were
listening on port 53 of the main IP address, even though systemd thought
there was no bind9 service running.

andy@westnorfolk:~$ sudo systemctl start bind9
Jul 24 12:46:16 westnorfolk named[10005]: general: info: zone keiths-place.co.uk/IN: loaded serial 2019072401
Jul 24 12:46:16 westnorfolk named[10005]: notify: info: zone keiths-place.co.uk/IN: sending notifies (serial 2019072401)
Jul 24 12:46:19 westnorfolk named[10005]: xfer-out: info: client 85.119.80.222#46773 (keiths-place.co.uk): transfer of 'keiths-place.co.uk/IN': AXFR started (serial 2019072401)
Jul 24 12:46:19 westnorfolk named[10005]: xfer-out: info: client 85.119.80.222#46773 (keiths-place.co.uk): transfer of 'keiths-place.co.uk/IN': AXFR ended

Success!

Those other 4 processes seem like they date from some time
yesterday, clearly before this server was correctly configured to be
authoritative for those zones.

I don't know how those named processes came to be running so I'm not
going to immediately say systemd is at fault for not knowing to shut
them down.

I would like to know why, through multiple restarts, bind9 has been
saying it's listening on the interfaces when it can't be, because
something else is already doing so. Is there a sysctl or socket
option which allows multiple silent port bindings without the second
and later apps getting an error?
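
For what it's worth, Linux (3.9 and later) does have a socket option with this effect: SO_REUSEPORT. If every socket sets it before bind() and they all belong to the same effective user, several of them can bind the same address and port with no error. Whether named sets it on this host is an assumption that nothing in this thread confirms. The Python sketch below only demonstrates the kernel behaviour on a loopback port chosen by the kernel; the helper names are made up for illustration.

```python
import errno
import socket


def bind_udp(port, reuseport):
    """Bind a UDP socket to 127.0.0.1:<port>, optionally with SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if reuseport:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    try:
        sock.bind(("127.0.0.1", port))
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_succeeds(reuseport):
    """Report whether a second socket can bind a port that is already in use."""
    first = bind_udp(0, reuseport)  # port 0: let the kernel pick a free port
    port = first.getsockname()[1]
    try:
        second = bind_udp(port, reuseport)
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise
        return False  # the usual "Address already in use" failure
    else:
        second.close()
        return True  # both sockets were bound to the same port, silently
    finally:
        first.close()
```

With reuseport=False the second bind fails with EADDRINUSE, which is the loud failure one would normally expect; with reuseport=True it succeeds, and the kernel spreads incoming datagrams across the bound sockets, so a stale process can end up answering some of the queries.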

If nothing non-default has been set in this area then I think there
is a bug in bind9 because that was far too hard to diagnose.

I suppose we could possibly have noticed that the complaint logging
from /usr/sbin/named was from a process that wasn't one we had
launched.
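
A rough sketch of that check in Python, with a made-up helper name: collect the PIDs that appear in named's syslog lines and flag any that are not the PID systemd reports for the service. It assumes the "named[PID]:" tag format seen in the log lines above.

```python
import re

# Matches the syslog tag, e.g. "named[16870]:"
NAMED_TAG = re.compile(r"\bnamed\[(\d+)\]:")


def stray_named_pids(log_lines, main_pid):
    """Return the sorted PIDs that logged as named but are not `main_pid`."""
    pids = set()
    for line in log_lines:
        match = NAMED_TAG.search(line)
        if match:
            pids.add(int(match.group(1)))
    pids.discard(main_pid)
    return sorted(pids)
```

The main PID could come from "systemctl show -p MainPID --value bind9", assuming a systemd recent enough to support --value. Any PID this function returns is a named that systemd does not know about.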

Cheers,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting