[unisog] Mac OS X 10.4.x "DHCP client sometimes remains BOUND after sending DHCPDISCOVER" bug

Irwin Tillman irwin at princeton.edu
Tue Jan 30 21:37:19 GMT 2007


At Princeton we've been seeing IP address conflicts due to an issue in Mac OS X 10.4.x.
As I suspect other schools with high DHCP lease churn rates from Mac OS X 10.4.x
clients may experience the same bug, I thought I'd post the details for you.

----

Since September 2005 (yes, 2005) I've been seeing a DHCP client issue from
Mac OS X 10.4.x systems at Princeton University, where I maintain DHCP service.
I call it the Mac OS X 10.4.x "DHCP client sometimes remains BOUND after sending 
DHCPDISCOVER" bug.

I reported it to Apple (Apple Bug Reporter Problem ID 4904550).
Apple has examined it and confirmed that Mac OS X does indeed behave
this way, but they believe the behavior is correct (i.e. consistent
with RFC 2131).  I believe the behavior violates RFC 2131.
If the behavior is part of an implementation of 'Detection of Network Attachment 
in IPv4 (DNAv4)', then it also violates RFC 4436.

Depending on how your DHCP server operates, this behavior may result
in Macs using IP addresses no longer leased to them, interfering
with network service to other devices.  (This is dependent on 
the DHCP server behavior; some DHCP servers can tolerate a client
that malfunctions in this way.)

If you don't monitor for this particular problem (i.e. that your DHCP clients
are using IP addresses no longer leased to them), you may not be
aware of the problem.  You might hear sporadic complaints from victims
when they are leased an IP address that is "stolen" by another device
(i.e. the malfunctioning Mac OS X 10.4.x device).  But given that the victim
may work around the problem by requesting a new DHCP lease, you may 
not receive many complaints from your customers; they may just chalk it up
to things being flaky.

If you have other network equipment monitoring DHCP traffic to infer the IP
addresses leased to clients, this behavior may also result in that equipment's
conclusions differing from the clients'. (Again, this depends on 
whether that equipment can tolerate a client that malfunctions in this way.)

Apple currently believes that the behavior of the Mac OS X 10.4.x client
is correct; I was not able to convince them that the behavior is incorrect.
As a result, if you are affected by this problem, you may either choose to endure
the problem or replace the affected facilities (e.g. DHCP servers or DHCP-snooping
equipment) with others that tolerate the incorrect Mac OS X 10.4.x behavior.

Below is a (lengthy) technical description.

Irwin Tillman
OIT Network Systems / Princeton University

--

Mac OS X 10.4.x "DHCP client sometimes remains BOUND after sending DHCPDISCOVER" bug
January 30 2007

* Technical Overview:

Some time after obtaining a DHCP lease (entering the DHCP BOUND state), the
client sends one or more DHCPDISCOVER packets. This implies the client has
returned to the DHCP INIT state, relinquishing the old DHCP lease. In some
cases, the client ignores all offers sent in response to the DHCPDISCOVER
packet(s), or all those offers never reach the client (e.g. are dropped by the
network). However, instead of remaining in the DHCP INIT state, the client
continues to act as if the old DHCP lease is still in the BOUND state. It keeps
using the IP address from the old lease (even trying to RENEW and later REBIND
the old lease).

Because other DHCP clients may be leased the IP address after the first client
relinquishes its lease on the address, the first client's continued use of that
IP address interferes with service to those other clients.
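
To make the timing concrete, here is a minimal sketch of how long a
malfunctioning client might keep using an address it has relinquished. It
assumes the RFC 2131 default timers (T1 at 0.5 and T2 at 0.875 of the lease
duration); a server may override these. The function names are hypothetical,
chosen only for this illustration.

```python
from datetime import datetime, timedelta

# RFC 2131 default timer fractions (assumption: the server does not
# override them with the renewal/rebinding time options).
T1_FRACTION = 0.5
T2_FRACTION = 0.875


def lease_timeline(granted_at, lease_seconds):
    """Return the T1, T2 and expiry instants for a lease (hypothetical helper)."""
    return {
        "t1": granted_at + timedelta(seconds=lease_seconds * T1_FRACTION),
        "t2": granted_at + timedelta(seconds=lease_seconds * T2_FRACTION),
        "expires": granted_at + timedelta(seconds=lease_seconds),
    }


def exposure_window(granted_at, lease_seconds, discover_at):
    """Time during which the client may keep using the old address:
    from its DHCPDISCOVER until the old lease would have expired."""
    expires = lease_timeline(granted_at, lease_seconds)["expires"]
    return max(expires - discover_at, timedelta(0))
```

For a one-hour lease granted at 12:00 and a DHCPDISCOVER at 12:05, this gives
a window of 55 minutes in which another client could be leased the same address.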

I can positively confirm that the problem began no later than September 2005. At
that time, Mac OS X 10.4.2 was the latest version of the OS available. Prior to
then, I did not see the Mac OS X clients here exhibit this problem. Based on
that, I believe the issue was introduced into Mac OS X in version 10.4.2 or
earlier. Given that 10.4 was released in late April 2005, and the usage at our
institution would tend to make it difficult to notice over the Summer (when most
of our customers are away), I can imagine the problem may have been introduced
as far back as version 10.4 or 10.4.1.

At Princeton, the problem has slowly grown from a rare occurrence to a frequent
problem. That's due to the growth in number of Mac OS X 10.4.x systems at our
site, and due to the higher DHCP lease churn rate for each client (associated
with the increasing use of wireless laptops that connect and disconnect often).
At first I detected only a few incidents per month throughout our entire
institution; by now I generally see several each day.

In many of these cases, our support staff have examined the malfunctioning Macs,
and found no apparent problems. The devices appeared to be properly configured
to use DHCP, with no special circumstances. (E.g. there was no second device forging the
first one's hardware address or DHCP Client Identifier, no VM software running a
separate DHCP client instance, no use of Apple's (or a third party) NAT
software. There was nothing to indicate that there was a second DHCP client
instance running using the same DHCP Client Identifier on the same network. It really
appeared to be just simple Mac OS X DHCP clients, running current (at the
time) versions of 10.4.x.)

After exhibiting the malfunction, the device may go weeks or months before
exhibiting the malfunction (stealing an IP address) again...or it may happen
again a few hours later. I know of no way to force any one device to reproduce
the problem; I am only able to detect the problem the day after the fact, when I
see "stolen IP address" problems by reviewing daily logs of unexpected DHCP
server transactions and comparing them to IP usage data drawn from router ARP
cache snapshots.
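
The cross-check described above could be sketched roughly as follows. This is
not the tooling used at Princeton; the data shapes (a dict of active leases and
a list of ARP cache entries) and the function name are assumptions made for
illustration, and the addresses are from the documentation range.

```python
def find_stolen_addresses(active_leases, arp_snapshot):
    """Compare DHCP lease data against a router ARP cache snapshot.

    active_leases: dict mapping IP address -> MAC address holding the lease.
    arp_snapshot:  iterable of (ip, mac) pairs seen in the router ARP cache.

    Returns a sorted list of (ip, mac_in_use, mac_leased_to) for every
    address used by a device other than the current leaseholder.
    mac_leased_to is None when the address is not leased at all.
    """
    leases = {ip: mac.lower() for ip, mac in active_leases.items()}
    findings = []
    for ip, mac in arp_snapshot:
        mac = mac.lower()
        leased_to = leases.get(ip)
        if leased_to != mac:
            findings.append((ip, mac, leased_to))
    return sorted(findings, key=lambda finding: finding[0])


leases = {
    "192.0.2.10": "00:11:22:33:44:55",
    "192.0.2.11": "00:AA:BB:CC:DD:EE",
}
arp = [
    ("192.0.2.10", "00:11:22:33:44:55"),  # leaseholder, fine
    ("192.0.2.11", "00:17:f2:00:00:01"),  # someone else is using a leased address
    ("192.0.2.12", "00:17:f2:00:00:02"),  # address in use with no lease at all
]
suspects = find_stolen_addresses(leases, arp)
```

In practice the snapshot would need to be restricted to addresses in the DHCP
pools, since statically configured devices legitimately have no lease.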

--

* The Packets

In more detail, the DHCP packets (and IP usage) I see from the malfunctioning 
Mac OS X clients are:

1) The DHCP client obtains a lease (reaches the DHCP BOUND state).

2) The client may renew the lease 0 or more times. These renewals might happen
   at the expected time T1, or might happen within seconds of the client
   reaching the BOUND state.
 
3) Before the lease is due to expire, the client broadcasts a DHCPDISCOVER
   packet. 

   Since the client is still attached to the same network, and is still
   using the same DHCP Client Identifier, this implies the client has entered the
   DHCP INIT state, implicitly relinquishing the old lease.

   Sometimes this is just a few seconds after the client entered
   the BOUND state; sometimes it is minutes or hours later.
   It appears to happen before time T1, when the lease would have become due for renewal.

4) One or more DHCP server(s) respond to the client with DHCPOFFER(s).

5) The client does not accept any of the offers; it sends no DHCPREQUEST.
   I.e. it never proceeds from the DHCP SELECTING state to REQUESTING.

6) Optionally, the client retransmits the DHCPDISCOVER packet several times over
   the next minute. If so, the DHCP server(s) respond with DHCPOFFER(s), which
   the client again ignores.

7) In almost all cases, the client continues to use the IP address from the
   lease it relinquished earlier.  That is, it continues to answer IP ARP
   requests for the IP address that was part of the relinquished lease.
   We can see this in snapshots taken of our IP router ARP caches.
   It continues to transmit IP packets with this value as the IP source address.

8) Optionally, at the time the relinquished lease would have reached time T1, the
   client tries to RENEW the relinquished lease. The DHCP server that had
   granted the relinquished lease responds with a DHCPNAK. The client continues
   to use the IP address, and may try to renew the relinquished lease additional
   times until time T2.

   Sometimes these DHCPREQUESTs are also malformed. Specifically, the DHCP 'Server
   IP Address Option' is 0, the DHCP 'Requested IP Address Option' is 0, and the
   'ciaddr' field is 0. (There is no case where a DHCP client should send a
   DHCPREQUEST packet with that set of characteristics.)

   Throughout this time (from the time the relinquished lease would have reached
   time T1 until it would have reached time T2), the client may also sometimes send
   DHCPDISCOVER packets, receive DHCPOFFERs, and ignore the offers.

9) Optionally, at the time the relinquished lease would have reached time T2, the
   client tries to REBIND the relinquished lease. The DHCP server that had
   granted the relinquished lease responds with a DHCPNAK. Other DHCP servers do
   not respond. The client continues to use the IP address, and may try to
   rebind the relinquished lease additional times until the time the
   relinquished lease was to have expired.

   Sometimes, these DHCPREQUESTs are also malformed. Specifically, the DHCP 'Server
   IP Address Option' is 0, the DHCP 'Requested IP Address Option' is 0, and the
   'ciaddr' field is 0. (There is no case where a DHCP client should send a
   DHCPREQUEST packet with that set of characteristics.)

   Throughout this time (from the time the relinquished lease would have reached
   time T2 until it would have expired), the client may also sometimes send
   DHCPDISCOVER packets, receive DHCPOFFERs, and ignore the offers.

10) The problem ends in one of these ways:

    If the client is offline (e.g. disconnected from the network) at the time the
    relinquished lease was due to expire, when it next reconnects it starts in the
    DHCP INIT state and works properly. (It sends DHCPDISCOVER(s), receives
    DHCPOFFER(s), proceeds to SELECTING and BOUND, and uses the IP address obtained
    via its new lease.)
    
    Alternatively, if the client is connected to the network at the time the
    relinquished lease was due to expire, at that time the client stops using the IP
    address from the relinquished lease, enters the DHCP INIT state, and works
    properly. (It sends DHCPDISCOVER(s), receives DHCPOFFER(s), proceeds to
    SELECTING and BOUND, and uses the IP address obtained via its new lease.)
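
The malformed DHCPREQUESTs mentioned in steps 8 and 9 can be recognized
mechanically. As I read RFC 2131 section 4.3.2, a DHCPREQUEST falls into one of
three shapes depending on which of the server identifier option, the requested
IP address option, and 'ciaddr' are filled in. The sketch below encodes that
reading; the function name is hypothetical, and it treats a missing option and
a value of 0.0.0.0 as equivalent, which is an assumption.

```python
def _is_set(address):
    """True if an option/field carries a real address (not absent, not zero)."""
    return address not in (None, "", "0.0.0.0")


def classify_dhcprequest(server_id, requested_ip, ciaddr):
    """Classify a DHCPREQUEST by which fields are filled in.

    server_id:    value of the server identifier option, or None
    requested_ip: value of the requested IP address option, or None
    ciaddr:       value of the 'ciaddr' header field
    """
    has_server = _is_set(server_id)
    has_requested = _is_set(requested_ip)
    has_ciaddr = _is_set(ciaddr)

    if has_server and has_requested and not has_ciaddr:
        return "SELECTING"            # answering a DHCPOFFER
    if not has_server and has_requested and not has_ciaddr:
        return "INIT-REBOOT"          # verifying a remembered address
    if not has_server and not has_requested and has_ciaddr:
        return "RENEWING/REBINDING"   # extending an existing lease
    return "MALFORMED"                # no valid client state produces this


# The packets described above: all three values are zero.
verdict = classify_dhcprequest(None, None, "0.0.0.0")
```

Renewing and rebinding requests have the same field layout; they differ only in
being unicast versus broadcast, which this sketch does not examine.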



-------

* Apple's Take

Apple has indicated to me that the Mac OS X DHCP client may indeed transition directly from
the DHCP INIT state to the DHCP BOUND state.  

The situation they describe is:

a) A client is in the DHCP BOUND state.

b) The DHCP client enters the INIT-REBOOT state, e.g. as a result of
a sleep/wake cycle or link state change.

c) The DHCP client sends a DHCPREQUEST.

d) The DHCP client receives neither a DHCPACK nor a DHCPNAK.

(I note that in our case, the DHCPREQUEST may indeed have reached the DHCP server and
the DHCP server sent a DHCPACK.  Presumably this never reached the client.) 

e) Their DHCP client chooses to stop using the old lease, although there
is still time remaining on the lease.

(I note that RFC 2131 allows the client to continue using the old lease if it wishes,
until the old lease expires.)
Instead, their client chooses to return to the DHCP INIT state.

f) The DHCP client sends a DHCPDISCOVER.

g) The DHCP client does not receive a DHCPOFFER.

(I note that in our case, the DHCPDISCOVER did indeed reach the DHCP servers,
and the DHCP servers sent DHCPOFFERs to the client.  Presumably none of the
multiple DHCPOFFERs reached the client (e.g. all are dropped by the network),
or the client has gone selectively deaf to just these offers.)

h) The DHCP client sends an ARP request to the IP router that was its default
gateway in the old lease.  The router responds, and the client receives the response.

i) The DHCP client goes back to the DHCP BOUND state and resumes using the
old lease.  It will enter RENEWING and REBINDING state as usual.



I see two problems with the client behavior Apple has described:

* It doesn't explain why the DHCPREQUEST packets I observe in steps 8 and 9 
are sometimes malformed.

* More importantly, the client transitions directly from the DHCP INIT state
to the BOUND state.  I don't believe that's permitted in DHCP.

Specifically, Figure 5 in RFC 2131 contains the state transition diagram
for DHCP clients.  (There have been some changes due to later RFCs, but
none that are directly relevant to this matter.)

It makes it clear that the only time a DHCP client may send a DHCPDISCOVER
is when it is in the DHCP INIT state, and that there is no way for a client
in the DHCP INIT state to get to the DHCP BOUND state without the server granting
it a lease.  There's no provision in the RFC for a client attached to a single subnet
to "go back" to using an old lease on that subnet.  Once a client identified by a
unique (client identifier, subnet) tuple sends a DHCPDISCOVER, any old lease
identified by that (client identifier, subnet) tuple is no longer valid.
The client has abandoned any old lease identified by that tuple.
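
The argument from the state diagram can be checked mechanically. Below is a
simplified transcription of the client transitions in RFC 2131 Figure 5, with a
search for the states reachable from INIT when no DHCPACK is ever received. The
event names are labels invented for this sketch, and discard-only transitions
(e.g. ignoring a stray DHCPOFFER while BOUND) are omitted.

```python
from collections import deque

# (current state, event) -> next state, after RFC 2131 Figure 5 (simplified).
TRANSITIONS = {
    ("INIT", "send_discover"): "SELECTING",
    ("SELECTING", "recv_offer"): "SELECTING",
    ("SELECTING", "send_request"): "REQUESTING",
    ("REQUESTING", "recv_ack"): "BOUND",
    ("REQUESTING", "recv_nak"): "INIT",
    ("INIT-REBOOT", "send_request"): "REBOOTING",
    ("REBOOTING", "recv_ack"): "BOUND",
    ("REBOOTING", "recv_nak"): "INIT",
    ("BOUND", "t1_expires"): "RENEWING",
    ("RENEWING", "recv_ack"): "BOUND",
    ("RENEWING", "recv_nak"): "INIT",
    ("RENEWING", "t2_expires"): "REBINDING",
    ("REBINDING", "recv_ack"): "BOUND",
    ("REBINDING", "recv_nak"): "INIT",
    ("REBINDING", "lease_expires"): "INIT",
}


def reachable_without_ack(start):
    """Breadth-first search over transitions not triggered by a DHCPACK."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (src, event), dst in TRANSITIONS.items():
            if src == state and event != "recv_ack" and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def states_allowed_to_send_discover():
    """States from which the diagram lets a client send DHCPDISCOVER."""
    return {src for (src, event) in TRANSITIONS if event == "send_discover"}


no_ack_states = reachable_without_ack("INIT")
```

In this table, BOUND is not among the states reachable from INIT without a
DHCPACK, and INIT is the only state that sends a DHCPDISCOVER.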

--

* DNAv4

The client behavior may be Apple's implementation of "Detection
of Network Attachment in IPv4 (DNAv4)" (RFC 4436).

However, Apple's client is not behaving the way that RFC describes.

Specifically, RFC 4436 (section 2.2, paragraph 1) states that a client
in the situation we're seeing (the client has an "operable routable IPv4 address")
should broadcast a DHCPREQUEST message from the INIT-REBOOT state.
(It says to broadcast a DHCPDISCOVER from the INIT state if the client
doesn't have an operable routable IPv4 address on any network, but that's not
the case here.)

So if what Apple's doing is an implementation of DNAv4, it's not doing it right.
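
Expressed as code, the rule as I read it from RFC 4436 section 2.2 is a single
branch. This sketch covers only that one decision, not the rest of DNAv4, and
the function names are hypothetical.

```python
def expected_first_message(has_operable_routable_address):
    """Return (client state, DHCP message) the client should start with."""
    if has_operable_routable_address:
        # Client still holds a usable address: verify it.
        return ("INIT-REBOOT", "DHCPREQUEST")
    # Client has no usable address on any network: start from scratch.
    return ("INIT", "DHCPDISCOVER")


def consistent_with_rule(has_operable_routable_address, observed_message):
    """True if the observed first message matches the rule above."""
    _, expected = expected_first_message(has_operable_routable_address)
    return observed_message == expected


# The case seen here: the client has an operable routable address,
# yet the first message observed is a DHCPDISCOVER.
observed_ok = consistent_with_rule(True, "DHCPDISCOVER")
```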

--

* Summary

Apple's stance is that a DHCP client can indeed go back to resume using the old lease;
that sending a DHCPDISCOVER doesn't imply that the client has abandoned that lease.

I believe that's not right; it appears to me that it is not permitted by
the state transition diagram (Figure 5) in RFC 2131.  And if what they're
doing is DNAv4, RFC 4436 also makes it clear that what their client is
doing is wrong.

So I believe that the behavior introduced in the Mac OS X 10.4.x DHCP client is not correct.

If your DHCP server makes use of the DHCPDISCOVER to decide that the
old lease identified by (client identifier, subnet) has been abandoned by the client,
this client behavior will cause a problem.  That's because as far as the server
is concerned, the old lease has been abandoned by the client, but the client
proceeds to use that IP address.    Your server may lease that IP address
to another device.  
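
Here is a toy model of that failure, assuming a server that frees a client's
old lease as soon as it sees a DHCPDISCOVER from that client. The class and
its methods are hypothetical and collapse the offer/selection exchange into a
single step; real servers differ in many details.

```python
class ToyDhcpServer:
    """Minimal lease table keyed by client identifier on one subnet."""

    def __init__(self, pool):
        self.free = list(pool)   # addresses available to lease
        self.leases = {}         # client identifier -> leased address

    def on_discover(self, client_id):
        """Treat DHCPDISCOVER as abandonment of any lease held by client_id."""
        old_address = self.leases.pop(client_id, None)
        if old_address is not None:
            self.free.append(old_address)

    def on_request(self, client_id):
        """Grant the next free address to client_id."""
        address = self.free.pop(0)
        self.leases[client_id] = address
        return address


def demo():
    server = ToyDhcpServer(["192.0.2.10"])
    mac_address = server.on_request("mac")        # Mac reaches BOUND
    server.on_discover("mac")                     # Mac sends DHCPDISCOVER, ignores offers
    victim_address = server.on_request("victim")  # server leases the freed address
    # The Mac keeps using mac_address even though the server has moved on.
    return mac_address, victim_address


mac_address, victim_address = demo()
```

Both devices end up using 192.0.2.10: the Mac because it never stopped, the
victim because the server legitimately leased it.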

And if you have equipment that monitors DHCP traffic
to learn which IP addresses are leased to clients, the conclusions
reached by that equipment will not always match those of Mac OS X 10.4.x clients.

--







