On Wed, Jul 24, 2019 at 12:05:56PM +0000, Andy Smith wrote:
All of those processes (16870, 11064, 27705, 20717) thought they were
listening on port 53 of the main IP, even while systemd thinks there is
no bind9 service running.
Here's where 27705 came into being:
Jul 23 07:23:25 westnorfolk named[26462]: shutting down
Jul 23 07:23:25 westnorfolk named[26462]: stopping command channel on 127.0.0.1#953
Jul 23 07:23:25 westnorfolk named[26462]: stopping command channel on ::1#953
Jul 23 07:23:25 westnorfolk named[26462]: no longer listening on ::#53
Jul 23 07:23:25 westnorfolk named[26462]: no longer listening on 127.0.0.1#53
Jul 23 07:23:25 westnorfolk named[26462]: no longer listening on 85.119.82.237#53
Jul 23 07:23:25 westnorfolk named[26462]: exiting
Jul 23 07:23:25 westnorfolk rndc[27676]: rndc: connect failed: 127.0.0.1#953: connection refused
Jul 23 07:23:25 westnorfolk systemd[1]: bind9.service: Control process exited, code=exited status=1
Jul 23 07:23:25 westnorfolk systemd[1]: bind9.service: Unit entered failed state.
Jul 23 07:23:25 westnorfolk systemd[1]: bind9.service: Failed with result 'exit-code'.
Jul 23 07:23:29 westnorfolk named[27705]: starting BIND 9.10.3-P4-Debian <id:ebd72b3> -c /etc/bind/named.conf
So since 07:23 yesterday this has been running and snarfing up all
the transfer requests, which is why none of the later configuration
changes seemed to make any difference.
The normal named command line when run under systemd looks like
this:
andy@westnorfolk:~$ ps awux | grep named
bind 10022 0.0 0.7 288940 23924 ? Ssl 12:47 0:00 /usr/sbin/named -f -u bind
so I suspect that these other processes were started in some other
way - command line maybe? So that would be why systemd doesn't know
about them. I still think that bind should have complained about not
being able to bind to, e.g. 85.119.82.237#53 but perhaps it didn't
know it was unable to (silent failure)?
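
One way to check that suspicion is to compare every named process in
the ps output against the PID that systemd tracks for the unit. What
follows is a minimal Python sketch, assuming the "ps awux" column layout
shown above. The function name stray_named_pids is made up for
illustration, and the second sample line (binary path and resource
figures) is invented rather than copied from the machine. The main PID
could be read with "systemctl show -p MainPID bind9".

```python
def stray_named_pids(ps_output: str, main_pid: int) -> list:
    """Return PIDs of named processes in `ps awux` output other than main_pid."""
    strays = []
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unparseable
        argv0 = fields[10].split()[0]
        if argv0.rsplit("/", 1)[-1] != "named":
            continue  # e.g. the "grep named" process itself
        pid = int(fields[1])
        if pid != main_pid:
            strays.append(pid)
    return strays


# First line is from the ps output above; the second is illustrative only.
SAMPLE = """\
bind 10022 0.0 0.7 288940 23924 ? Ssl 12:47 0:00 /usr/sbin/named -f -u bind
bind 27705 0.0 0.7 288940 23924 ? Ssl Jul23 0:02 /usr/sbin/named -c /etc/bind/named.conf
andy 10101 0.0 0.0 12784 968 pts/0 S+ 12:48 0:00 grep named
"""

strays = stray_named_pids(SAMPLE, main_pid=10022)
print(strays)
```

Anything this reports is a named that systemd did not start as the
unit's main process, so "systemctl stop bind9" would not be expected to
touch it.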
To test this, turn off bind9 and run something that will hold the port
(a socat server copying everything it receives to the terminal):
andy@westnorfolk:~$ sudo systemctl stop bind9
andy@westnorfolk:~$ sudo socat -v tcp-l:53,fork -
It's definitely got the port:
andy@westnorfolk:~$ sudo lsof -p 10688
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
socat 10688 root 5u IPv4 642785 0t0 TCP *:domain (LISTEN)
And it sees traffic:
[another machine]
$ nc 85.119.82.237 53
hello
^C
[back on Keith's VM]
2019/07/24 13:35:41.546776 length=6 from=0 to=5
hello
hello
Start bind9 again while socat is holding the port:
andy@westnorfolk:~$ sudo systemctl start bind9
andy@westnorfolk:~$ sudo systemctl status bind9
● bind9.service - BIND Domain Name Server
Loaded: loaded (/lib/systemd/system/bind9.service; enabled; vendor preset: enabled
Active: active (running) since Wed 2019-07-24 13:39:02 BST; 5s ago
Docs: man:named(8)
Process: 10592 ExecStop=/usr/sbin/rndc stop (code=exited, status=0/SUCCESS)
Main PID: 10780 (named)
Tasks: 5 (limit: 4915)
CGroup: /system.slice/bind9.service
└─10780 /usr/sbin/named -f -u bind
andy@westnorfolk:~$ sudo grep 85.119.82.237#53 /var/log/syslog
Jul 24 13:39:02 westnorfolk named[10780]: listening on IPv4 interface eth0, 85.119.82.237#53
Why didn't it cry about not being able to bind 85.119.82.237:53?
So right now we have bind9 thinking it's running fine but it will
never see a zone transfer request because this socat process is
hogging port 53.
Is this normal? I am used to daemons giving up when they can't
exclusively bind.
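
The behaviour I'm used to can be shown with plain sockets. This is a
minimal Python sketch, not BIND's code: it uses loopback and an
ephemeral port instead of port 53 so that it needs no root, it sets no
SO_REUSEADDR, and try_bind is just a helper name chosen for the example.

```python
import errno
import socket


def try_bind(host: str, port: int) -> str:
    """Try to bind and listen on host:port, reporting the outcome."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return "bound"
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return "address in use"
        raise
    finally:
        sock.close()


# Stand-in for the socat process: grab a free port and listen on it.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
port = holder.getsockname()[1]

# A second listener on the same address and port is refused by the kernel.
result_while_held = try_bind("127.0.0.1", port)

# Once the holder goes away the same bind succeeds.
holder.close()
result_after_release = try_bind("127.0.0.1", port)

print(result_while_held, result_after_release)
```

A daemon that hits the first case would normally log the error and
either exit or carry on without that listener, which is the message I
expected to find in syslog.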
Interestingly if I kill the socat, other servers now see
"connection refused", i.e. named hasn't tried to bind port 53 again
(or never did).
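
That check from another host boils down to a TCP connect that either
completes or is refused. Below is a rough Python sketch of such a probe,
again on loopback with an ephemeral port so it can be run anywhere;
probe_tcp is a hypothetical helper, and against the real server the
target would be 85.119.82.237 port 53.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"


# Something is listening: the connect completes.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
while_listening = probe_tcp("127.0.0.1", port)

# Nothing is listening any more: the kernel refuses the connection.
listener.close()
after_close = probe_tcp("127.0.0.1", port)

print(while_listening, after_close)
```

"refused" after the socat is killed means no process holds a TCP
listener on that address and port at that moment, whatever named has
logged.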
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting