glibc resolv hacking

This page is only marginally useful, and only for those still dead set on using descendants of glibc 2.1. If you're using glibc 2.2 or later, the BIND 8 port that Andreas Jaeger and I worked on is already integrated, and you have totally thread-safe name resolution, so don't sweat it.


Old notes, before glibc 2.2 was released

The libresolv library distributed with glibc 2.1 suffers from a fundamental defect when linked into threaded programs: it's not thread safe. In practice, this means that if your threaded program is linked against glibc 2.1.0, 2.1.1, or 2.1.2, and it calls gethostbyname_r() from more than one context (i.e. two calls to it can be in progress at the same time), your program may see unusual results. The most common symptom is that some threads get wedged inside a poll() in gethostbyname_r() and never return.
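To make the failing pattern concrete, here is a minimal sketch of a program shape that has more than one gethostbyname_r() call in progress at once. It assumes the GNU six-argument form of gethostbyname_r(); the names lookup_job, lookup_thread, resolve_concurrently, and MAX_LOOKUP_THREADS, and the 8192-byte scratch buffer, are made up for illustration and are not part of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* gethostbyname_r() is a GNU extension */
#endif
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_LOOKUP_THREADS 16 /* hypothetical cap, chosen for the sketch */

/* Hypothetical per-thread job record; not part of any library API. */
struct lookup_job {
    const char *name; /* host name to resolve */
    int found;        /* set to 1 if the lookup produced an entry */
};

static void *lookup_thread(void *arg)
{
    struct lookup_job *job = arg;
    struct hostent storage;
    struct hostent *result = NULL;
    char buf[8192]; /* caller-supplied scratch space; the size is a guess */
    int herr = 0;

    /* Every thread brings its own storage, so the call is supposed to be
     * safe to run alongside the same call in other threads. */
    int rc = gethostbyname_r(job->name, &storage, buf, sizeof buf,
                             &result, &herr);
    job->found = (rc == 0 && result != NULL);
    return NULL;
}

/* Start one thread per name so the lookups overlap, then wait for all of
 * them. Returns the number of threads that came back from their lookup. */
int resolve_concurrently(const char *const *names, int count)
{
    pthread_t tids[MAX_LOOKUP_THREADS];
    struct lookup_job jobs[MAX_LOOKUP_THREADS];
    int started = 0;
    int returned = 0;

    assert(count >= 0 && count <= MAX_LOOKUP_THREADS);

    for (int i = 0; i < count; i++) {
        jobs[i].name = names[i];
        jobs[i].found = 0;
        if (pthread_create(&tids[i], NULL, lookup_thread, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            returned++;
    }
    return returned;
}
```

Build it with something like "cc -pthread" and call resolve_concurrently() with a handful of names. With a thread-safe resolver, every thread comes back and the function returns the number of names it was given. With an affected libresolv, a wedged thread would show up as a pthread_join() that never returns.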

A correctness fix for this problem is in glibc 2.1.3, but it is almost worse than the problem: it forces serialization of all name server queries. That eliminates the race condition in the original code, but it also eliminates any parallelism between sleep-intensive and potentially high-latency name lookups. This causes the performance of many lookup-intensive programs to (technically speaking) suck.
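To see why serialization hurts, here is a small simulation. It is not the glibc 2.1.3 code: it assumes the net effect is roughly that of a single global lock around every lookup, and it stands in for a slow name server query with a plain sleep. All names in it (resolver_lock, sim_job, fake_lookup, run_lookups_ms, SIM_THREADS) are invented for the sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() and clock_gettime() */
#endif
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define SIM_THREADS 4 /* number of simulated concurrent lookups */

/* Stand-in for "one lookup at a time"; an assumption, not glibc's code. */
static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;

struct sim_job {
    long latency_ms; /* how long each fake lookup sleeps */
    int serialize;   /* nonzero: hold resolver_lock around the lookup */
};

/* Pretend to wait on a name server by sleeping for latency_ms. */
static void fake_lookup(long latency_ms)
{
    struct timespec ts;
    ts.tv_sec = latency_ms / 1000;
    ts.tv_nsec = (latency_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

static void *sim_thread(void *arg)
{
    const struct sim_job *job = arg;

    if (job->serialize)
        pthread_mutex_lock(&resolver_lock);
    fake_lookup(job->latency_ms);
    if (job->serialize)
        pthread_mutex_unlock(&resolver_lock);
    return NULL;
}

/* Run SIM_THREADS fake lookups at once and return the elapsed wall-clock
 * time in milliseconds. */
long run_lookups_ms(int serialize, long latency_ms)
{
    pthread_t tids[SIM_THREADS];
    struct sim_job job;
    struct timespec t0, t1;
    int started = 0;

    assert(latency_ms >= 0);
    job.latency_ms = latency_ms;
    job.serialize = serialize;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < SIM_THREADS; i++) {
        if (pthread_create(&tids[i], NULL, sim_thread, &job) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    return (long)(ns / 1000000LL);
}
```

Comparing run_lookups_ms(1, 20) with run_lookups_ms(0, 20) shows the cost: the serialized run cannot finish in less than SIM_THREADS times the latency, while the parallel run can overlap the waits and typically finishes in roughly one latency.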

Last summer (June-August 1999), I undertook to port the current BIND 8.2 codebase to glibc, to replace the existing resolver library (which is based on BIND 4). BIND 8 includes a number of threading-friendly features, like thread-specific resolver state, that should keep the lookup process both safe and relatively efficient (allowing full parallelism). The result was combined with some work by Andreas Jaeger (aj@arthur.rhein-neckar.de) and integrated into the glibc 2.2 code base, which is just now (July 2000) starting to enter the pre-release stabilization stage.

At one point, I also maintained a backport of our changes to glibc 2.1. However, it proved too much of a big, hairy, ugly nightmare: the whole resolver interface is munged between BIND 4 and BIND 8, and placing the result in a system-wide library is a (technically speaking) bad idea (it tended to break programs like sendmail). So I hid the patch.

So if you're dealing with either mysterious gethostbyname_r() hangups or abysmal lookup rates in your threaded program, here's what I recommend:

  • Go to www.isc.org and get the latest "released" version of the BIND 8 distribution. (As of this writing, the version to get is 8.2.2-P5.)
  • Follow the appropriate build instructions. The build process will create, among other things, an object code archive file called libbind_r.a.
  • When linking your threaded programs, drop the "-lresolv" option from the link command line and link against the libbind_r.a archive instead.
  • Everything should work just hunky-dory.

If that's too much work for you, well, you have two other options.

  1. Wait for glibc 2.2.
  2. Pay me to dust off my patch and make it work again.