FreeBSD, OpenLDAP and the boot process
This happens to me from time to time, so I need to write the solution down somewhere as soon as I find it.
Here at the office I have a FreeBSD intranet server that acts as the authentication server for the local network. User information is stored in an LDAP database (OpenLDAP), and every service that performs any kind of user authentication or identification connects to it to retrieve that information. That means local system accounts, Samba accounts, svn and trac accounts, and more.
It is not my first time setting up such a server, and I've set up LDAP authentication backends for other kinds of servers (mail servers, web servers, etc.), but it is the first time I've had to deal with this issue.
The server took too long to boot, because every daemon launched before slapd (the OpenLDAP daemon), such as named, sendmail or mountd, tried to connect to the LDAP database to look up the user associated with the daemon. As slapd wasn't running yet, each one kept retrying over and over until some kind of timeout let the boot process continue with the rest of the daemons.
And that was unacceptable. Can you imagine a server that takes more than 5 minutes to boot?
The first thing that came to mind was the FreeBSD boot order: perhaps changing it so that slapd starts before any other daemon could be a solution...
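A quick way to check that suspicion is to look at the order in which the rc scripts would run. The sketch below is only an illustration: `starts_before` is a hypothetical helper (not part of FreeBSD) that reads a list of script paths, one per line, in the format that `rcorder` prints, and reports whether one script comes before another.

```shell
# starts_before FIRST SECOND
# Hypothetical helper: reads one script path (or name) per line on stdin
# and succeeds only if FIRST appears earlier in the list than SECOND.
starts_before() {
    awk -v first="$1" -v second="$2" '
        {
            n = split($0, parts, "/")          # keep only the script name
            name = parts[n]
            if (name == first  && !f) f = NR   # line where FIRST shows up
            if (name == second && !s) s = NR   # line where SECOND shows up
        }
        END { exit !(f && s && f < s) }        # exit 0 only if both found, in order
    '
}

# Possible usage on the server (paths assumed to be the usual rc.d directories):
#   rcorder /etc/rc.d/* /usr/local/etc/rc.d/* 2>/dev/null | starts_before slapd named \
#       && echo "slapd starts before named" \
#       || echo "named starts before slapd (or one of them is missing)"
```

If the check shows named, sendmail or mountd ahead of slapd, that would match the delays seen at boot.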
Changing the boot order could work, but I wasn't sure it was the root of the problem. I mean, it was quite strange that something like that hadn't made much more noise on the internet. So I searched Google for a while, and I ended up finding an email on the FreeBSD ports mailing list that covers exactly the same problem.
Someone there pointed to a possible solution: modifying the RC scripts to force slapd to be launched before anything else (yup, my idea!). But, reading carefully, I found something very interesting there:
Additionally you could set "bind_policy soft" in ${LOCALBASE}/etc/nss_ldap.conf to let nss_ldap return in case of connection problems to slapd instead of waiting forever.
That's it! I mean, supposedly having something like:
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: src/etc/nsswitch.conf,v 1.1 2006/05/03 15:14:47 ume Exp $
#
group: files ldap
group_compat: nis
hosts: files dns
networks: files
passwd: files ldap
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
in /etc/nsswitch.conf should keep LDAP queries from hanging there forever (which in fact seemed to be my problem). User and group information should be retrieved first from local files (/etc/group, /etc/passwd, etc.) and only then from the LDAP database. But apparently that's not working at all (I have no idea why).
Well, after a quick look at /usr/local/etc/nss_ldap.conf I realized that I had two options that could be the reason behind the boot delay:
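To double-check which sources are configured for a given database, and in which order, something like the following could help. It is a minimal sketch: `nss_sources` is a made-up helper name, and it only handles simple `database: source source` lines like the ones shown above.

```shell
# nss_sources DATABASE [FILE]
# Hypothetical helper: prints the sources listed for DATABASE, in lookup order.
# FILE defaults to /etc/nsswitch.conf.
nss_sources() {
    awk -v db="$1:" '
        /^[[:space:]]*#/ { next }                          # skip comment lines
        $1 == db { $1 = ""; sub(/^ +/, ""); print; exit }  # drop the "database:" field
    ' "${2:-/etc/nsswitch.conf}"
}

# Possible usage:
#   nss_sources passwd
#   nss_sources group
```

With the file above, both `passwd` and `group` should list `files` first and `ldap` second.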
bind_timelimit 120
bind_policy hard
It seems that bind_policy hard makes nss_ldap keep retrying the LDAP connection over and over until bind_timelimit is reached (which in my case was 120 seconds, that is, 2 minutes). That means every failed connection to the slapd daemon took 2 minutes to give up... awful.
After changing those options to:
bind_timelimit 10
bind_policy soft
everything went back to normal, and the server took only a few seconds to start.
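For the record, the change can also be scripted. This is only a sketch under a few assumptions: `set_nss_ldap_option` is a hypothetical helper, each option lives on its own `key value` line, and a `.bak` copy next to the file is acceptable as a backup.

```shell
# set_nss_ldap_option FILE KEY VALUE
# Hypothetical helper: sets KEY to VALUE in FILE, keeping a FILE.bak backup.
# Replaces an existing "KEY ..." line, or appends one if KEY is not set.
set_nss_ldap_option() {
    file=$1 key=$2 value=$3
    cp "$file" "$file.bak" || return 1
    awk -v key="$key" -v value="$value" '
        $1 == key { print key, value; seen = 1; next }   # rewrite the existing line
        { print }                                        # keep everything else as is
        END { if (!seen) print key, value }              # append when it was missing
    ' "$file.bak" > "$file"
}

# Possible usage (as root):
#   set_nss_ldap_option /usr/local/etc/nss_ldap.conf bind_timelimit 10
#   set_nss_ldap_option /usr/local/etc/nss_ldap.conf bind_policy soft
```

Commented-out lines such as `#bind_policy hard` are left untouched, because their first field does not match the key.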
UPDATE: It seems that setting bind_policy to soft wasn't a good idea after all. It solved the boot problem, but it caused some other problems, like users not being associated with their groups in the LDAP database. FreeBSD seems to know that those groups exist in the LDAP database (running getent group shows all of them, including the usernames associated with each group), but running id as a user who belongs to some groups defined in the LDAP database reports that the user is only in the groups defined in /etc/group. Setting bind_policy back to hard solved that problem, but now I'm back to the boot problem again...
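To make that mismatch easier to spot, the two views can be compared side by side. Again, this is just a sketch: `groups_listing_user` is a hypothetical helper that reads group(5)-style lines (`name:passwd:gid:member,member,...`), such as the output of getent group, and prints the groups that list a given user as a member.

```shell
# groups_listing_user USER
# Hypothetical helper: reads group(5)-format lines on stdin and prints the
# name of every group whose member list contains USER.
groups_listing_user() {
    awk -F: -v user="$1" '
        {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) { print $1; break }
        }
    '
}

# Possible usage ("alice" is a placeholder username):
#   getent group | groups_listing_user alice | sort   # what the group database says
#   id -Gn alice | tr ' ' '\n' | sort                 # what the system resolves for the user
```

The output of id also includes the user's primary group, which may not list the user as a member, so a one-line difference there is expected. Any LDAP group that shows up only in the first list would be an instance of the problem described above.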