Cisco Blogs

Fosdem 2016, part 3: enter the netflow

- February 14, 2016

In the last part we used NBAR2 to classify traffic. To do this, the router needs to inspect every traffic flow it sees.

Of course, it would be interesting to get this information out of the router and into some logs. This is the ‘how we did this’ post, and it is a bit technical. I will discuss the results in a future post.

To export the information the router collects, we use netflow. In the past netflow was quite limited in what it could export, but with Flexible Netflow (FnF) one can select which fields to export. These are not limited to the normal TCP/IP tuples: extra information attached by NBAR or AVC can be exported as well. For example, from the performance monitors you can select the jitter statistics.

Very few examples of how to set this up are available, which is why I’m going into some detail here.

However, first a warning: netflow export using FnF is a feature mostly used by big Cisco Prime or IWAN setups. Only a few people configure it by hand, and doing so can be quite tricky. One problem you might commonly face is that the CLI will accept configurations which the underlying platform doesn’t support. This will seem to work, but it won’t. To find out whether the configuration worked, one needs to look at what the underlying platform, in this example the ASR 1006, did with the configuration commands, using show flow monitor <name> internal. If you see new entries like FNF errors/Invalid argument, then you know you did something the platform doesn’t like. To fix it you will need to deconfigure everything and start over.

You may be wondering, why would we want to export the NBAR2 flow information using netflow?

For Fosdem a vexing question has always been: “how many visitors do we have?”. To answer this question we wanted to capture the HTTP user-agent header field. It looks, for example, like “Dalvik/1.6.0 (Linux; U; Android 4.4.2; HTC One mini 2 Build/KOT49H)”, which in this case tells us the user is using Android version 4.4.2 on an HTC One mini 2. As most people will not have more than one smartphone with them, we can use this to guess the number of visitors.
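As an illustration of what can be pulled out of such a header, here is a minimal Python sketch that extracts the Android version and device model from a user-agent string shaped like the example above. The regular expression and the function name parse_android_user_agent are my own assumptions for this sketch; real user-agent strings vary a lot, and anything that does not match is simply reported as None.

```python
import re

# Assumed pattern for the platform section of an Android user-agent, e.g.
# "(Linux; U; Android 4.4.2; HTC One mini 2 Build/KOT49H)".
ANDROID_UA = re.compile(
    r"Android (?P<version>[\d.]+);\s*(?P<model>[^;)]+?)\s+Build/"
)


def parse_android_user_agent(user_agent):
    """Return (android_version, device_model), or None if the string does not match."""
    match = ANDROID_UA.search(user_agent)
    if match is None:
        return None
    return match.group("version"), match.group("model")


example = "Dalvik/1.6.0 (Linux; U; Android 4.4.2; HTC One mini 2 Build/KOT49H)"
print(parse_android_user_agent(example))  # ('4.4.2', 'HTC One mini 2')
```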

Collecting this information ‘per user’ or ‘per client’ would be great; however, netflow does not do this and only gives the information per flow. One way to identify a client could be the IP address of the device, but as the IP changes over time we need to collect the relationship between the IP and the MAC address of the client. The MAC address should not change and is unique per client. So we are going to try to correlate everything to this MAC address.

In order to collect the MAC address to IP relationship we use SNMP, walking the tree beneath 1.3.6.1.2.1.4.35.1.4 / ipNetToPhysicalPhysAddress from RFC 4293. We save this relationship and use it later to associate an IP at a particular time with a MAC address.
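The ‘IP at a particular time’ lookup can be sketched as follows. This is only an illustration of the idea, not the code we ran: the class IpMacHistory, its method names and the addresses in the usage example are all made up, and the SNMP walk itself is left out. Each poll result is stored with its timestamp, and a flow is later matched to the MAC address most recently seen for its IP at or before the flow’s time.

```python
import bisect
from collections import defaultdict


class IpMacHistory:
    """Remembers which MAC address held an IP address at each SNMP poll."""

    def __init__(self):
        # ip -> parallel, time-ordered lists of poll timestamps and MACs
        self._times = defaultdict(list)
        self._macs = defaultdict(list)

    def record(self, timestamp, ip, mac):
        """Store one (ip, mac) pair seen at `timestamp`; call in time order."""
        self._times[ip].append(timestamp)
        self._macs[ip].append(mac)

    def lookup(self, ip, timestamp):
        """Return the MAC most recently seen for `ip` at or before `timestamp`."""
        times = self._times.get(ip)
        if not times:
            return None
        index = bisect.bisect_right(times, timestamp) - 1
        if index < 0:
            return None
        return self._macs[ip][index]


# Made-up poll results: the same IP is handed to a second device later on.
history = IpMacHistory()
history.record(100, "192.0.2.10", "00:00:5e:00:53:01")
history.record(400, "192.0.2.10", "00:00:5e:00:53:02")
print(history.lookup("192.0.2.10", 250))  # 00:00:5e:00:53:01
```

Keeping the history per IP means a reassigned address does not get attributed to the wrong device, as long as the polls are frequent enough compared to the lease changes.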

To get the netflow itself we first need to describe what information we want to collect in a “flow record”:

flow record FR-IPv4
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 match ipv4 protocol
 collect counter bytes long
 collect counter packets
 collect timestamp sys-uptime first
 collect timestamp sys-uptime last
 collect interface input
 collect application http user-agent
 collect application name
 collect transport tcp flags

flow record FR-IPv6
 match ipv6 extension map
 match interface output
 match ipv6 protocol
 match ipv6 source address
 match ipv6 destination address
 match ipv6 traffic-class
 match ipv6 flow-label
 match transport source-port
 match transport destination-port
 collect transport tcp flags
 collect interface input
 collect counter bytes long
 collect counter packets
 collect timestamp sys-uptime first
 collect timestamp sys-uptime last
 collect application name
 collect application http user-agent

In this case we gather the standard tuple, interface information, flow statistics and the HTTP user-agent, which is the real information we are after.

Then we need to define an “exporter” to send this flow data to:

flow exporter FE-FM-v4
 destination <some IP>
 transport udp 9995
 export-protocol ipfix
 template data timeout 30
 option interface-table timeout 30
 option application-table timeout 30
 option metadata-version-table timeout 30
 option sub-application-table timeout 30
 option application-attributes timeout 30

We also need a similar exporter for IPv6 traffic, but crucially to a different port number, as one cannot export two different records to the same exporter.

From the content of the exporter one can see that we export in the IPFIX format and send the tables that are key to decoding the application and interface IDs.
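To give an idea of what arrives at the destination, here is a minimal Python sketch that splits one IPFIX message into its sets. The header layout (version 10, then length, export time, sequence number and observation domain ID) and the set IDs (2 for templates, 3 for options templates, 256 and up for data) follow the IPFIX specification, RFC 7011. The function names are my own, and the sketch assumes each UDP datagram carries exactly one message.

```python
import struct

IPFIX_HEADER = struct.Struct("!HHIII")  # version, length, export time, sequence, domain
SET_HEADER = struct.Struct("!HH")       # set ID, set length (header included)


def split_ipfix_message(datagram):
    """Return (header, [(set_id, set_body), ...]) for one IPFIX message."""
    version, length, export_time, sequence, domain = IPFIX_HEADER.unpack_from(datagram, 0)
    if version != 10:
        raise ValueError("not an IPFIX message (version %d)" % version)
    if length > len(datagram):
        raise ValueError("message is truncated")
    header = {
        "export_time": export_time,
        "sequence": sequence,
        "observation_domain": domain,
    }
    sets = []
    offset = IPFIX_HEADER.size
    while offset + SET_HEADER.size <= length:
        set_id, set_length = SET_HEADER.unpack_from(datagram, offset)
        if set_length < SET_HEADER.size or offset + set_length > length:
            raise ValueError("malformed set at offset %d" % offset)
        body = datagram[offset + SET_HEADER.size:offset + set_length]
        sets.append((set_id, body))
        offset += set_length
    return header, sets


def describe_set(set_id):
    """Name the kind of set a set ID stands for."""
    if set_id == 2:
        return "template"
    if set_id == 3:
        return "options template"
    if set_id >= 256:
        return "data"
    return "reserved"
```

The option tables configured on the exporter are described by options template sets, while the flow records themselves arrive in data sets whose ID refers back to a previously sent template.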

Then we configure a monitor to collect and send the data:

flow monitor FE-FM-IPv4
 exporter FE-FM-v4
 cache entries 1000000
 statistics packet protocol
 statistics packet size
 record FR-IPv4

Again we have a similar configuration for IPv6 traffic. The cache is important because it holds the connection table: by default, the monitor we are defining only sends the information when the flow is considered ‘done’, which is after a short timeout following a FIN packet, or after a longer timeout if there is simply no more activity for that flow.
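As a simplified model of this aging, the sketch below decides whether a cached flow would be exported at a given moment. It only covers the inactive and active timeouts that show flow monitor reports (15 and 1800 seconds on our router) and ignores any special handling of FIN packets. The function expiry_reason is made up for illustration and says nothing about how the platform implements this internally.

```python
def expiry_reason(first_seen, last_seen, now, inactive_timeout=15, active_timeout=1800):
    """Return why a cached flow would be exported at time `now`, or None to keep it.

    All times are in seconds on the same clock.
    """
    if now - last_seen >= inactive_timeout:
        return "inactive"  # no packets seen for a while: the flow is considered done
    if now - first_seen >= active_timeout:
        return "active"    # long-lived flow: export an intermediate record
    return None


print(expiry_reason(first_seen=0, last_seen=5, now=10))       # None
print(expiry_reason(first_seen=0, last_seen=5, now=20))       # inactive
print(expiry_reason(first_seen=0, last_seen=1795, now=1800))  # active
```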

Finally we attach this monitor to the internal interfaces with:

interface ...
 ip flow monitor FE-FM-IPv4 unicast input
 ip flow monitor FE-FM-IPv4 unicast output
 ipv6 flow monitor FE-FM-IPv6 unicast input
 ipv6 flow monitor FE-FM-IPv6 unicast output

We don’t monitor the outside interface as too much scanning traffic arrives there.

Here are some commands to verify that everything is working:

asr1k#show flow exporter FE-FM-v4 statistics
Flow Exporter FE-FM-v4:
  Packet send statistics (last cleared 1w2d ago):
    Successfully sent:         9076680               (11964321569 bytes)
    Reason not given:          1971                  (2389176 bytes)
...
asr1k#show flow record FR-IPv6
flow record FR-IPv6:
  Description:        User defined
  No. of users:       1
  Total field space:  80 bytes
  Fields:
...
asr1k#show flow monitor FE-FM-IPv6
Flow Monitor FE-FM-IPv6:
  Description:       User defined
  Flow Record:       FR-IPv6
  Flow Exporter:     FE-FM-v6
  Cache:
    Type:                 normal (Platform cache)
    Status:               allocated
    Size:                 1000000 entries
    Inactive Timeout:     15 secs
    Active Timeout:       1800 secs
    Trans end aging:   off
  Stats:
    protocol distribution
    size distribution

Finally we can look into the current contents of the cache:

asr1k#show flow monitor FE-FM-IPv4 cache format table
  Cache type:                               Normal (Platform cache)
  Cache size:                              1000000
  Current entries:                            1329
  High Watermark:                            27245

  Flows added:                            34290645
  Flows aged:                             34294826
    - Active timeout      (  1800 secs)       7907
    - Inactive timeout    (    15 secs)   34286919

IPV4 SRC ADDR    IPV4 DST ADDR    TRNS SRC PORT  TRNS DST PORT  IP PROT ...
===============  ===============  =============  =============  ======= ...
151.216.191.253  141.1.1.1                47594             53       17 ...

This is the source of the ‘currently we see <N> IPv4 flows’ numbers I reported on Twitter.

Creating this cache and collecting flows consumes memory on the QFP of the router; you should monitor this usage with show platform resources.

This is the data that the exporter sends to its configured destination. Decoding it proved a bit of a struggle, and I had to resort to writing a custom decoder in Perl to log the flows. Unfortunately, while the user-agent, IP and time information was recorded correctly, my Perl script had a bug, and the packet and byte counter information was not recorded properly.
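Part of what makes decoding fiddly is that the layout of the data records is not fixed: the exporter first sends templates that say which fields follow and how long each one is. The Python sketch below parses the body of a template set as laid out in RFC 7011. It is not a port of my Perl decoder, and the function name parse_template_set is made up. In the usage example, element 8 is the standard sourceIPv4Address field and enterprise number 9 is Cisco’s, but the vendor element ID 1000 is invented for illustration.

```python
import struct


def parse_template_set(body):
    """Parse the body of an IPFIX template set into {template_id: fields}.

    Each field is (enterprise_number_or_None, element_id, length); a length
    of 65535 marks a variable-length field.
    """
    templates = {}
    offset = 0
    while offset + 4 <= len(body):
        template_id, field_count = struct.unpack_from("!HH", body, offset)
        if template_id < 256:
            break  # template IDs start at 256; anything lower is padding
        offset += 4
        fields = []
        for _ in range(field_count):
            element_id, field_length = struct.unpack_from("!HH", body, offset)
            offset += 4
            enterprise = None
            if element_id & 0x8000:
                # Top bit set: a vendor-specific field, followed by the
                # 4-byte enterprise number of that vendor.
                (enterprise,) = struct.unpack_from("!I", body, offset)
                offset += 4
                element_id &= 0x7FFF
            fields.append((enterprise, element_id, field_length))
        templates[template_id] = fields
    return templates


# Template 256 with one standard field and one made-up vendor field.
body = (
    struct.pack("!HH", 256, 2)
    + struct.pack("!HH", 8, 4)
    + struct.pack("!HHI", 0x8000 | 1000, 65535, 9)
)
print(parse_template_set(body))  # {256: [(None, 8, 4), (9, 1000, 65535)]}
```

A decoder has to remember these templates per exporter, since a data set can only be interpreted once its template has been seen.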

In the next part we will learn what these flows teach us about the Fosdem 2016 network...
