Detecting NAT Devices using sFlow

Peter Phaal, sFlow.org

Overview

Unauthorized NAT (Network Address Translation) devices can be a significant security problem. Typically, the NAT device appears to the network administrator as an end host and authenticates itself onto the network. However, it then provides unrestricted access to any number of hosts that connect to it, either directly or, more troublingly, over wireless (Wi-Fi, 802.11). Wi-Fi is a particular problem because it extends access to the network over a considerable distance, allowing unauthorized use without even entering the building.

Reliably detecting NAT devices is difficult because they are virtually indistinguishable from legitimate hosts. This paper describes how the detailed, pervasive traffic monitoring provided by sFlow (RFC 3176) can be used to identify NAT devices on a network.

Technique

Figure 1 shows a simplified network topology. The firewall connects the router to the Internet, and the router is connected to two distribution switches. In this network the administrative policy is for host computers to be connected directly to the distribution switches, as Host C is. Two hosts, A and B, are instead connected to a distribution switch through an illicit NAT router. Both distribution switches send a continuous stream of sFlow data to the sFlow Analyzer. The challenge is to find the NAT router using the sFlow traffic measurements.

Figure 1: Network with NAT Router

The NAT detection technique is based on two observations about the IP TTL (Time To Live) field.

  1. Host operating systems have characteristic initial TTL values. This property of individual operating system implementations of TCP/IP is well known and can be used as part of a "fingerprint" to identify the operating system that a host is running merely by examining its traffic. The technique is well described in "Passive OS Fingerprinting: Details and Techniques" by Toby Miller.
  2. NAT devices or gateways decrement the TTL on packets that they forward.
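Taken together, the two observations let the number of routers a packet has crossed be estimated from its TTL alone. The Python sketch below rounds an observed TTL up to the nearest common initial value and treats the difference as the hop count. The function name `infer_hops` and the list of initial TTLs (32, 64, 128, 255) are assumptions for illustration; real fingerprinting databases contain more values.

```python
# Assumed, incomplete list of common initial TTLs (illustration only).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def infer_hops(observed_ttl, initial_ttls=COMMON_INITIAL_TTLS):
    """Estimate (initial_ttl, hops) for a TTL observed on the wire.

    Assumes the sender used the smallest listed initial TTL that is not
    below the observed value, and that each router decremented it by one.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    initial = min(t for t in initial_ttls if t >= observed_ttl)
    return initial, initial - observed_ttl


if __name__ == "__main__":
    for ttl in (128, 127, 64, 63):
        initial, hops = infer_hops(ttl)
        print(f"observed TTL {ttl}: initial {initial}, {hops} router hop(s)")
```

For example, `infer_hops(127)` returns `(128, 1)`: one router between the sender and the observation point. The rounding rule becomes ambiguous when initial values are close together (60 and 64, for instance), which is one reason the script in Figure 2 uses a coarser test.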

sFlow provides a stream of sampled packet headers captured at the two switches. These headers can be decoded to extract the IP source address and TTL of each sampled packet.
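sflowtool performs this decoding itself, but the field positions are simple enough to sketch. The hypothetical function below assumes the sampled header begins with an untagged Ethernet II frame carrying IPv4 (no VLAN tag, no IPv6); it reads the TTL from byte 8 and the source address from bytes 12 to 15 of the IP header, and also extracts the TCP source port that the analysis later uses.

```python
import ipaddress
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def decode_sampled_header(frame: bytes):
    """Return (src_ip, ttl, tcp_src_port) from a sampled Ethernet frame.

    Sketch only: assumes an untagged Ethernet II frame carrying IPv4 and
    returns None for anything else. tcp_src_port is None for non-TCP
    packets or when the sampled header is too short to contain it.
    """
    if len(frame) < 14 + 20:  # Ethernet header + minimal IPv4 header
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    version, header_len = ip[0] >> 4, (ip[0] & 0x0F) * 4
    if version != 4 or header_len < 20:
        return None
    ttl, protocol = ip[8], ip[9]  # TTL is byte 8, protocol is byte 9
    src = str(ipaddress.IPv4Address(ip[12:16]))  # source address, bytes 12-15
    port = None
    if protocol == IPPROTO_TCP and len(ip) >= header_len + 2:
        (port,) = struct.unpack_from("!H", ip, header_len)  # first TCP field
    return src, ttl, port
```

In practice sflowtool already prints these values as `srcIP`, `IPTTL` and `TCPSrcPort`, which is what the findnat.awk script consumes.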

Suppose all the hosts run the Windows operating system; each host would then generate IP packets with an initial TTL of 128. Since the TTL is decremented each time a packet traverses a router, a packet from Host C seen at the firewall would always have a TTL of 127. Similarly, a packet from Host C seen by the other switch (Switch 10.10.49.1) would also have a TTL of 127. However, the switch connecting Host C to the network (Switch 10.10.67.1) should always see a TTL of 128. The algorithm for detecting NAT routers relies on the observation that switches directly connected to a host, or in the same subnet as a host, will always see packets from that host with a TTL that is characteristic of the host operating system.

In this example, the sFlow Analyzer would see a TTL of 127 when examining packets sampled by switch 10.10.49.1 that apparently originated from "host" 10.10.49.204. The TTL values in packets from Hosts A and B are decremented by the NAT router before they are passed to the switch, revealing the existence of the router.
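The rule can be expressed as a short Python sketch. It assumes, as in the example, that every host has a native TTL of 128, and it hard-codes the two edge switches and their subnets; the function name `is_suspect` and the configuration names are hypothetical.

```python
import ipaddress

# Hypothetical configuration: each edge switch (sFlow agent) and the subnet
# of hosts it connects directly, matching the example network.
EDGE_SUBNETS = {
    "10.10.49.1": ipaddress.ip_network("10.10.49.0/24"),
    "10.10.67.1": ipaddress.ip_network("10.10.67.0/24"),
}

# Assumption carried over from the example: every host's initial TTL is 128.
NATIVE_TTL = 128


def is_suspect(agent: str, src: str, ttl: int) -> bool:
    """True if an edge switch sampled a packet from one of its own local
    addresses whose TTL has already been decremented by some router."""
    subnet = EDGE_SUBNETS.get(agent)
    if subnet is None:
        return False  # not an edge switch; decremented TTLs are expected here
    if ipaddress.ip_address(src) not in subnet:
        return False  # source is remote, so it legitimately crossed the router
    return ttl != NATIVE_TTL
```

With this configuration, `is_suspect("10.10.49.1", "10.10.49.204", 127)` is `True`, while a packet from a hypothetical Host C address such as 10.10.67.50, sampled by switch 10.10.67.1 with a TTL of 128, is not flagged.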

The effectiveness of this algorithm is easily demonstrated using sFlow data from a production network.

Experiment

The sflowtool utility can be used to decode sFlow packets and feed the results into a script. The findnat.awk script shown in Figure 2 implements the NAT discovery algorithm and identifies likely NAT hosts.

#!/usr/bin/gawk -f
#
# findnat.awk
#
# Script to identify likely NAT devices using sFlow.
# Requires GNU awk (gawk): the bitwise and() function is a gawk extension.
# Usage:
#   sflowtool | ./findnat.awk
# generates results of the form:
#   10.10.158.204 127 62216 002078142a2f
#   where:
#     10.10.158.204 is a likely NAT device
#     127 is the TTL seen in packets from the device
#     62216 is a source TCP port number seen in a packet from the device
#     002078142a2f is the MAC address of the device
#
# Copyright (c) InMon Corp. 2003 ALL RIGHTS RESERVED

#
# Initialize constants
#

BEGIN {
  # Specify edge switches and subnets
  agents["10.10.49.1"] = "10.10.49.0/24";
  agents["10.10.67.1"] = "10.10.67.0/24";

  # Convert mask bits to mask address
  masks[0] = "0.0.0.0";
  masks[1] = "128.0.0.0";
  masks[2] = "192.0.0.0";
  masks[3] = "224.0.0.0";
  masks[4] = "240.0.0.0";
  masks[5] = "248.0.0.0";
  masks[6] = "252.0.0.0";
  masks[7] = "254.0.0.0";
  masks[8] = "255.0.0.0";
  masks[9] = "255.128.0.0";
  masks[10] = "255.192.0.0";
  masks[11] = "255.224.0.0";
  masks[12] = "255.240.0.0";
  masks[13] = "255.248.0.0";
  masks[14] = "255.252.0.0";
  masks[15] = "255.254.0.0";
  masks[16] = "255.255.0.0";
  masks[17] = "255.255.128.0";
  masks[18] = "255.255.192.0";
  masks[19] = "255.255.224.0";
  masks[20] = "255.255.240.0";
  masks[21] = "255.255.248.0";
  masks[22] = "255.255.252.0";
  masks[23] = "255.255.254.0";
  masks[24] = "255.255.255.0";
  masks[25] = "255.255.255.128";
  masks[26] = "255.255.255.192";
  masks[27] = "255.255.255.224";
  masks[28] = "255.255.255.240";
  masks[29] = "255.255.255.248";
  masks[30] = "255.255.255.252";
  masks[31] = "255.255.255.254";
  masks[32] = "255.255.255.255";

  # Octet multipliers for converting IP addresses to integers
  b1 = 256;
  b2 = 256 * b1;
  b3 = 256 * b2;
}

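#
# The following actions apply to each line of sflowtool output.
# Each pattern compares the first field exactly, so that longer keys that
# merely contain the same text (for example agentSubId) are not matched.
#

# The agent (switch) reporting this flow sample
$1 == "agent"  {agent = $2;}

# The source MAC address decoded from the flow sample
$1 == "srcMAC" {mac = $2;}

# The source IP address decoded from the flow sample
$1 == "srcIP"  {src = $2;}

# The IP TTL decoded from the flow sample
$1 == "IPTTL"  {ttl = $2;}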

# The TCP source port decoded from the flow sample
$1 == "TCPSrcPort" {
  port = $2;

  # The TCP port number is the last field we need, so perform the analysis.
  # Is this an edge switch? (Testing membership avoids creating empty entries.)
  if(agent in agents) {
    localSubnet = agents[agent];

    # Is this a packet from a host on the subnet local to the agent?
    split(localSubnet, parts, "/");
    subnet = parts[1];
    bits = parts[2];
    mask = masks[bits];

    split(mask, parts, ".");
    submask = parts[1] * b3 + parts[2] * b2 + parts[3] * b1 + parts[4];

    split(subnet, parts, ".");
    subagt = and(parts[1] * b3 + parts[2] * b2 + parts[3] * b1 + parts[4], submask);

    split(src, parts, ".");
    subaddr = and(parts[1] * b3 + parts[2] * b2 + parts[3] * b1 + parts[4], submask);

    if(subaddr == subagt) {
      # Is the TTL characteristic of an end host? Native TTLs are 1, 255 or
      # even, so an odd value implies a routing hop before this switch.
      if((ttl != 255) && (ttl != 1) && (ttl % 2)) {
        # A likely NAT device; report each address only once
        if(!(src in host)) {
          entry = src " " ttl " " port " " mac;
          print entry;
          host[src] = entry;
        }
      }
    }
  }
}

Figure 2: findnat.awk

The script is configured with the IP addresses of the distribution switches and the subnets containing their directly connected hosts. It refines the basic algorithm in two ways. First, it only considers TCP traffic. This helps eliminate false positives caused by the traceroute tool, which deliberately varies the TTL in order to identify the routers on a path; traceroute uses ICMP or UDP packets. The second refinement concerns the determination of a host's native TTL. Empirically, native TTL values are either 1, 255 or particular even numbers in between. Rather than enumerate all the known values (such as 60, 64 and 128), the discriminator simply tests whether a packet's TTL is 1, 255 or even; any other (odd) value is taken as evidence that the packet has already passed through a router.
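The two refinements amount to a pair of small predicates. The Python sketch below mirrors the test in findnat.awk; the function names are hypothetical.

```python
IPPROTO_TCP = 6


def ttl_looks_routed(ttl: int) -> bool:
    """findnat.awk's discriminator: native TTLs are 1, 255 or even, so any
    other (odd) value suggests the packet has crossed a router."""
    return ttl not in (1, 255) and ttl % 2 == 1


def should_report(protocol: int, ttl: int) -> bool:
    """Apply the discriminator to TCP samples only, so traceroute probes
    (ICMP or UDP with deliberately small TTLs) are ignored."""
    return protocol == IPPROTO_TCP and ttl_looks_routed(ttl)
```

The parity test only catches an odd number of extra hops: a packet that crossed two routers (TTL 126 from an initial 128) would pass as native. At an edge switch, the case of interest is a single extra hop, the NAT device itself.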

sflowtool | ./findnat.awk
10.10.49.204 127 62216 002078142a2f
10.10.67.126 127 1088 0004806dd700
10.10.67.121 127 1038 0004806dd700

Figure 3: findnat.awk results

Figure 3 shows the result of running the script. It identifies host 10.10.49.204 as a likely NAT router, reporting a TTL of 127 and a source TCP port of 62216. The high port number is further evidence of a NAT router, since many NAT routers assign very high port numbers to avoid clashing with well-known server ports. The script also reports two other apparent NAT routers, 10.10.67.126 and 10.10.67.121. However, both of these "routers" have the same MAC address, 0004806dd700. A NAT router rewrites the source address of every inside host to its own single address, so two different source addresses behind one MAC address suggest that there is indeed a router, but that it is not performing a NAT function and that 10.10.67.126 and 10.10.67.121 are in fact host addresses.
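The MAC address check can be automated by grouping the script's output. The sketch below assumes the four-field line format printed by findnat.awk; the function name and verdict strings are hypothetical. An ordinary router with only one active host behind it would also show a single address, so the verdict is a hint rather than proof.

```python
from collections import defaultdict


def classify_by_mac(report_lines):
    """Group findnat.awk output lines ("ip ttl port mac") by MAC address.

    A NAT device rewrites every inside host to its own single address, so
    one IP per MAC is consistent with NAT; several IPs behind one MAC point
    to an ordinary router forwarding for hosts with their own addresses.
    """
    ips_by_mac = defaultdict(set)
    for line in report_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        ip, _ttl, _port, mac = fields
        ips_by_mac[mac].add(ip)
    verdicts = {}
    for mac, ips in ips_by_mac.items():
        verdict = "likely NAT" if len(ips) == 1 else "router, not NAT"
        verdicts[mac] = (verdict, sorted(ips))
    return verdicts


if __name__ == "__main__":
    figure3 = [
        "10.10.49.204 127 62216 002078142a2f",
        "10.10.67.126 127 1088 0004806dd700",
        "10.10.67.121 127 1038 0004806dd700",
    ]
    for mac, (verdict, ips) in classify_by_mac(figure3).items():
        print(mac, verdict, " ".join(ips))
```

Run on the Figure 3 lines, it labels 002078142a2f as a likely NAT device and 0004806dd700 as a router that is not performing NAT.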

It would be interesting to know how many active hosts there are behind the NAT router. Steven M. Bellovin's paper "A Technique for Counting NATted Hosts" (AT&T Labs) describes a technique for estimating the number of hosts behind a NAT router by examining the IP Id values in a series of packets. Because many TCP/IP stacks assign IP Id values from a simple counter, each host generates its own increasing sequence of Id values, which a typical NAT router passes through unchanged.

sflowtool -t | tcpdump -v -N -r - src host 10.10.49.204
12:19:41.000000 10.10.49.204.61993 > 10.199.201.37.http: . [tcp sum ok] ack 4292552168 win 64240 (DF) (ttl 127, id 40255, len 40)
12:20:23.000000 10.10.49.204.62017 > 10.199.201.37.http: . [tcp sum ok] ack 1 win 64240 (DF) (ttl 127, id 40756, len 40)
12:21:37.000000 10.10.49.204.62216 > 10.54.226.252.http: S [tcp sum ok] 1820371619:1820371619(0) win 16384 <mss 1460,nop,nop,sackOK> (DF) (ttl 127, id 57601, len 48)
12:21:40.000000 10.10.49.204.61993 > 10.199.201.37.http: . [tcp sum ok] ack 4293696808 win 64240 (DF) (ttl 127, id 41427, len 40)
12:22:06.000000 10.10.49.204.61993 > 10.199.201.37.http: . [tcp sum ok] ack 4293946004 win 64240 (DF) (ttl 127, id 41671, len 40)
12:23:10.000000 10.10.49.204.62017 > 10.199.201.37.http: . [tcp sum ok] ack 1633741 win 64240 (DF) (ttl 127, id 42293, len 40)
12:24:00.000000 10.10.49.204.61993 > 10.199.201.37.http: . [tcp sum ok] ack 4294795172 win 64240 (DF) (ttl 127, id 42632, len 40)
12:24:24.000000 10.10.49.204.61993 > 10.199.201.37.http: SFRPW [bad tcp cksum 1748!] 2115643160:2115643180(20) ack 972685314 win 46144 urg 3585 (DF) (ttl 125, id 42893, len 40)
12:25:05.000000 10.10.49.204.62259 > 10.200.222.45.http: P [bad tcp cksum 6be7!] ack 1 win 19652 (DF) (ttl 127, id 1673, len 40)
12:25:53.000000 10.10.49.204.61993 > 10.199.201.37.http: . [tcp sum ok] ack 940080 win 64240 (DF) (ttl 127, id 43799, len 40)
12:25:56.000000 10.10.49.204.61993 > 10.199.201.37.http: . [bad tcp cksum 579d!] ack 1019539 win 17520 (DF) (ttl 125, id 43846, len 40)
12:25:56.000000 10.10.49.204.62017 > 10.199.201.37.http: E [bad tcp cksum 8380!] 2120652494:2120652506(12) win 11414 urg 5633 (DF) (ttl 125, id 43859, len 40)
12:26:05.000000 10.10.49.204.62259 > 10.200.222.45.http: R [tcp sum ok] 3206002624:3206002624(0) win 0 (DF) (ttl 127, id 1731, len 40)

Figure 4: tcpdump results

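Figure 4 shows the result of a tcpdump trace examining packets from the NAT router (10.10.49.204). There appear to be three distinct sequences of IP Id values: one running from 40255 to 43859, a second consisting of the single value 57601, and a third containing 1673 and 1731.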

Figure 5: IP Id Value vs. Time

The chart in Figure 5 plots Id values as a function of time and clearly shows the three separate sequences, indicating that at least three hosts were active behind the NAT router during the trace.
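Separating the sequences can also be approximated in code. The sketch below is a simplified greedy clustering that illustrates the idea, not Bellovin's algorithm; the function name `split_id_sequences` and the `max_gap` threshold of 2000 are assumptions chosen to fit this small trace.

```python
def split_id_sequences(ids, max_gap=2000):
    """Greedily split IP Id values, in capture order, into increasing runs.

    A simplified illustration of the idea behind Bellovin's technique, not
    his algorithm. Each Id joins the run whose last value it follows most
    closely (forward distance modulo 2**16, between 1 and max_gap);
    otherwise it starts a new run. max_gap is an assumed threshold.
    """
    runs = []
    for ident in ids:
        best_run, best_dist = None, None
        for run in runs:
            dist = (ident - run[-1]) % 65536  # forward distance, allowing wrap-around
            if 0 < dist <= max_gap and (best_dist is None or dist < best_dist):
                best_run, best_dist = run, dist
        if best_run is None:
            runs.append([ident])
        else:
            best_run.append(ident)
    return runs


if __name__ == "__main__":
    # IP Id values from the Figure 4 trace, in capture order.
    figure4_ids = [40255, 40756, 57601, 41427, 41671, 42293, 42632,
                   42893, 1673, 43799, 43846, 43859, 1731]
    for run in split_id_sequences(figure4_ids):
        print(run)
```

On the Figure 4 values it produces three runs, consistent with the three sequences described above: ten values from 40255 to 43859, the single value 57601, and 1673 followed by 1731. Because sFlow samples only a fraction of packets, the Id gaps between samples depend on the sampling rate and traffic volume, so `max_gap` would need tuning on real data.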

Conclusion

The network-wide packet header information provided by sFlow makes it relatively easy to detect NAT devices throughout the network. It would be possible to defeat this detection technique by creating a NAT gateway that did not decrement the IP TTL. However, it is likely that the detail provided by sFlow monitoring would allow host fingerprinting techniques to be used to detect the presence of the NAT device, even in this case.