Introduction
Over the past few years I’ve seen a number of cases where Unix systems have suffered serious outages caused by the loss of a primary name server. Affected systems become very slow, and servers that use Samba or a remote name service such as Centrify often appear to hang.
The main reason is the way Unix performs DNS lookups: the resolver queries the primary name server first, then tries the secondary, and so on. Because the resolver is stateless, every successive lookup goes to the primary server first, even when it is not responding. The resolver moves on to the second server only after the lookup timeout has expired, so every process that needs DNS resolution stalls for that long on each lookup.
On machines that perform a reasonable volume of DNS lookups, this eventually consumes a large amount of system resources as requests block and accumulate, and in some cases it has resulted in servers running out of physical memory.
Using a BIND cache to reduce the problem…
One solution is to use name service caching daemons, but experience has shown these can be troublesome. Samba, for instance, does not work correctly with the Sun nscd.
The simple and reliable solution is to install a local caching name server: a lightweight BIND install configured to forward requests to the primary and secondary (and any other) name servers, listening only on localhost, with zone transfers etc. disabled for security reasons. The line nameserver 127.0.0.1 is then added to the server's /etc/resolv.conf to ensure it is used. Since BIND obeys “time to live” cache times, there is no impact on name resolution accuracy.
On failure of a primary name server, the local caching name server will most likely already hold the required address. If it does not, it queries the forwarding servers and caches the result, preventing delays on future lookups.
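As a rough sketch, the resulting /etc/resolv.conf might look like the following. The 192.0.2.x addresses are placeholders from the documentation address range, standing in for whatever primary and secondary servers the machine already uses; they are assumptions for illustration, not values from a real system.

```
# Local caching name server, tried first
nameserver 127.0.0.1
# Existing primary and secondary name servers (placeholder addresses)
nameserver 192.0.2.1
nameserver 192.0.2.2
```

Many resolver implementations only honour the first three nameserver lines, so if the file already lists three servers it is worth checking that none of them is silently dropped once 127.0.0.1 is added at the top.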
Caching BIND Config
The named.conf file for BIND is shown below. The forwarders section should contain the list of name servers from /etc/resolv.conf, and the resolv.conf file should have nameserver 127.0.0.1 added before the other name servers.
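options {
    // Accept queries on the loopback interface only.
    listen-on { 127.0.0.1; };
    // Working directory; the relative paths below resolve against it.
    directory "/var/named";
    dump-file "logs/named_dump.db";
    // Refuse all zone transfer requests.
    allow-transfer { none; };
    // Upstream servers: the name servers listed in /etc/resolv.conf.
    forwarders {
        // LOCAL-FORWARDERS
    };
    // Always use the forwarders; never resolve from the root servers directly.
    forward only;
};
logging {
    channel "mainlog" {
        file "logs/named.log" versions 3 size 1m;
        print-category yes;
        print-severity yes;
        print-time yes;
    };
    channel "querylog" {
        file "logs/query.log" versions 2 size 1m;
        print-category yes;
        print-severity yes;
        print-time yes;
    };
    category queries {
        // Uncomment the next line to log query messages.
        // querylog;
        null;
    };
    category default { mainlog; };
};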
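For illustration, this is roughly what the forwarders section looks like once the LOCAL-FORWARDERS placeholder has been filled in. The two addresses are again hypothetical, taken from the documentation address range, and should be replaced with the name servers from the machine's own /etc/resolv.conf.

```
forwarders {
    192.0.2.1;    // existing primary name server (placeholder address)
    192.0.2.2;    // existing secondary name server (placeholder address)
};
```

If the named-checkconf utility is installed alongside BIND, running it against the finished named.conf should catch syntax errors before the daemon is started.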