Host performance statistics

From: Peter Phaal <peter.phaal@inmon.com>
Date: 03/08/10
Message-Id: <153F61CB-00B9-472B-B222-63979D3D0D16@inmon.com>

The implementation of sFlow on virtual switches places an sFlow agent in an ideal location to monitor the performance of physical and virtual machines, unifying network and system performance monitoring.

The scalability of sFlow's counter push mechanism provides an efficient way to monitor the large number of physical and virtual switches and servers in a data center. The number of virtual machines per server is going up; 20-40 virtual machines per physical machine is not unusual. Monitoring a data center with 10,000 physical switch ports might involve monitoring as many as 5,000 physical servers, 10,000 virtual switches, 200,000 virtual switch ports and 100,000 virtual servers. sFlow has the scalability needed to monitor the traffic and performance of all the physical and virtual switches and physical and virtual servers in this environment. Extending sFlow to add server performance monitoring is straightforward and would simplify management by providing a single, unified measurement system.
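The totals above follow from a few per-server ratios. As a quick check, the sketch below assumes 2 physical switch ports, 2 virtual switches and 20 virtual machines per server, and 2 virtual switch ports per virtual machine. These ratios are inferred from the quoted totals rather than stated anywhere, so treat them as assumptions.

```python
# Per-server ratios inferred from the totals quoted above (assumptions).
PHYSICAL_SWITCH_PORTS = 10_000
PORTS_PER_SERVER = 2        # physical switch ports used by each server
VSWITCHES_PER_SERVER = 2    # virtual switches hosted on each server
VMS_PER_SERVER = 20         # low end of the 20-40 range
VPORTS_PER_VM = 2           # virtual switch ports per virtual machine

physical_servers = PHYSICAL_SWITCH_PORTS // PORTS_PER_SERVER
virtual_switches = physical_servers * VSWITCHES_PER_SERVER
virtual_servers = physical_servers * VMS_PER_SERVER
virtual_switch_ports = virtual_servers * VPORTS_PER_VM

print("physical servers:    ", physical_servers)
print("virtual switches:    ", virtual_switches)
print("virtual servers:     ", virtual_servers)
print("virtual switch ports:", virtual_switch_ports)
```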

A relatively small number of metrics are typically used to monitor system performance. The following set is exported by Ganglia, a widely used open source system for monitoring cluster/grid performance:
http://ganglia.sourceforge.net/

   bytes_in       Number of bytes in per second            l,f
   bytes_out      Number of bytes out per second           l,f
   cpu_aidle      Percent of time since boot idle CPU      l
   cpu_idle       Percent CPU idle                         l,f
   cpu_intr
   cpu_nice       Percent CPU nice                         l,f
   cpu_num        Number of CPUs                           l,f
   cpu_rm
   cpu_speed      Speed in MHz of CPU                      l,f
   cpu_ssys
   cpu_system     Percent CPU system                       l,f
   cpu_user       Percent CPU user                         l,f
   cpu_vm
   cpu_wait
   cpu_wio
   disk_free      Total free disk space                    l,f
   disk_total     Total available disk space               l,f
   load_fifteen   Fifteen minute load average              l,f
   load_five      Five minute load average                 l,f
   load_one       One minute load average                  l,f
   location       GPS coordinates for host                 e
   machine_type
   mem_buffers    Amount of buffered memory                l,f
   mem_cached     Amount of cached memory                  l,f
   mem_free       Amount of available memory               l,f
   mem_shared     Amount of shared memory                  l,f
   mem_total      Amount of available memory               l,f
   part_max_used  Maximum percent used for all partitions  l,f
   pkts_in        Packets in per second                    l,f
   pkts_out       Packets out per second                   l,f
   proc_run       Total number of running processes        l,f
   proc_total     Total number of processes                l,f
   swap_free      Amount of available swap memory          l,f
   swap_total     Total amount of swap memory              l,f
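
To give a feel for how little is involved in gathering some of these values, here is a minimal sketch that reads a handful of them using only the Python standard library on a Unix-like host. It does not use the Ganglia library. The function name collect_host_metrics is hypothetical, the dictionary keys simply reuse the Ganglia metric names, and reporting disk space in bytes is an assumption of this sketch rather than the Ganglia unit.

```python
import os
import shutil


def collect_host_metrics(path="/"):
    """Return a few Ganglia-style metrics for the local host (sketch only)."""
    metrics = {"cpu_num": os.cpu_count()}

    # Load averages are not available on every platform.
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):
        one = five = fifteen = None
    metrics.update(load_one=one, load_five=five, load_fifteen=fifteen)

    # Disk figures are for the filesystem containing `path`, in bytes.
    usage = shutil.disk_usage(path)
    metrics["disk_total"] = usage.total
    metrics["disk_free"] = usage.free
    return metrics


if __name__ == "__main__":
    for name, value in sorted(collect_host_metrics().items()):
        print(f"{name:<14}{value}")
```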

Note: Ganglia defines a common set of metrics that can be collected from a wide variety of operating systems. Basing the sFlow counters on the Ganglia metrics builds on 10 years of work in defining a set of metrics that has proven to be effective in a wide range of systems. In addition, the Ganglia project has built a library for obtaining these metrics on different platforms. Basing the sFlow specification on the Ganglia metrics would allow an sFlow agent to leverage this library, greatly simplifying the task of collecting host statistics.

For virtual machines, the xenstat library defines a similar set of counters in a Xen environment, and VMware maintains similar performance counters.

It is relatively easy to come up with a set of sFlow counter structures to export this data. However, to unify network and system monitoring (i.e. to be able to associate the network traffic generated by a host with its performance counters) you need a common key.

Each physical/virtual machine is associated with one or more physical or virtual network adapters. Defining an sFlow structure to associate the adapter MAC addresses with the host performance counters provides the needed common key.
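
The role of the MAC address as a common key can be sketched as a simple join. The example below is illustrative only: the host names, MAC addresses, counter values and packet samples are made up, and the helper names index_by_mac and bytes_per_host are hypothetical. It attributes sampled traffic to the host that owns the source MAC, so that traffic totals sit alongside that host's performance counters.

```python
from collections import defaultdict

# Hypothetical host records: adapter MACs plus one performance counter.
hosts = {
    "vm-web-1": {"macs": ["00:16:3e:00:00:01"], "load_one": 0.42},
    "vm-db-1": {"macs": ["00:16:3e:00:00:02", "00:16:3e:00:00:03"],
                "load_one": 1.87},
}

# Hypothetical packet samples as a collector might see them.
samples = [
    {"src_mac": "00:16:3E:00:00:01", "bytes": 1500},
    {"src_mac": "00:16:3e:00:00:03", "bytes": 9000},
    {"src_mac": "00:16:3e:00:00:01", "bytes": 64},
    {"src_mac": "00:16:3e:00:00:ff", "bytes": 128},  # MAC of no known host
]


def index_by_mac(hosts):
    """Map each adapter MAC (lower case) to the host that owns it."""
    return {mac.lower(): name
            for name, host in hosts.items()
            for mac in host["macs"]}


def bytes_per_host(hosts, samples):
    """Sum sampled bytes per host; samples from unknown MACs are skipped."""
    owner = index_by_mac(hosts)
    totals = defaultdict(int)
    for sample in samples:
        name = owner.get(sample["src_mac"].lower())
        if name is not None:
            totals[name] += sample["bytes"]
    return dict(totals)


traffic = bytes_per_host(hosts, samples)
for name in hosts:
    print(name, "load_one =", hosts[name]["load_one"],
          "sampled bytes =", traffic.get(name, 0))
```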

/* Physical or virtual network adapter NIC/vNIC */
struct host_adapter {
   unsigned int ifIndex; /* ifIndex associated with adapter
                                Must match ifIndex of vSwitch
                                port if vSwitch is exporting sFlow
                                0 = unknown */
   mac mac_address<>; /* Adapter MAC address(es) */
}

/* Set of adapters associated with entity.
   A physical server will identify the physical network adapters
   associated with it and a virtual server will identify its virtual
   adapters. */
/* opaque = counter_data; enterprise = 0; format = 2001 */

struct host_adapters {
   host_adapter adapters<>; /* adapter(s) associated with entity */
}
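
As an illustration of what these structures might look like on the wire, here is a minimal sketch that encodes them using the standard XDR rules: big-endian 32-bit integers, a length prefix for variable-length arrays, and fixed-length opaque data padded to a multiple of four bytes. It assumes that mac is a 6-byte fixed-length opaque, as in sFlow version 5, and it leaves out the enclosing counter record header (format and length). The function names are hypothetical.

```python
import struct


def encode_mac(mac):
    """Encode one MAC as a 6-byte XDR fixed opaque, padded to 8 bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return raw + b"\x00\x00"


def encode_host_adapter(if_index, macs):
    """Encode ifIndex, then the MAC count, then each padded MAC."""
    header = struct.pack("!II", if_index, len(macs))
    return header + b"".join(encode_mac(m) for m in macs)


def encode_host_adapters(adapters):
    """Encode the adapter count followed by each host_adapter."""
    body = b"".join(encode_host_adapter(i, macs) for i, macs in adapters)
    return struct.pack("!I", len(adapters)) + body


# Two adapters: one with a known ifIndex, one unknown (0) with two MACs.
payload = encode_host_adapters([
    (3, ["00:16:3e:00:00:01"]),
    (0, ["00:16:3e:00:00:02", "00:16:3e:00:00:03"]),
])
print(len(payload), payload.hex())
```

With these inputs the payload is 44 bytes: 4 for the adapter count, 16 for the first adapter and 24 for the second.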

The basic mechanisms of sFlow (counter polling and random packet sampling) are extremely scalable and can be extended to collect the data needed to manage converged data center infrastructures. An integrated, end-to-end instrumentation system is needed for the management of complex operations, like virtual machine migration, that affect system, network and storage performance. Extending sFlow provides a multi-vendor solution to data center monitoring that addresses these requirements.

Peter
Received on Mon Mar 8 21:22:49 2010
