Correlating flow-based network measurements for service monitoring and network troubleshooting
The resilience of network services is continuously challenged by component failures, misconfigured devices, natural disasters, and malicious users. Network operators and service administrators therefore face the important but difficult task of carefully managing their infrastructure to ensure high availability. In this thesis, we contribute novel service monitoring and troubleshooting applications based on flow-based network measurements to help operators address this challenge. Flow-level measurement data such as IPFIX or NetFlow typically provides statistical summaries of the connections crossing a network, including the number of exchanged bytes and packets. Flow-level data can be collected by off-the-shelf hardware used in backbone networks, which allows Internet Service Providers (ISPs) to monitor large-scale networks with a limited number of sensors. However, the range of security and network management questions that can be answered directly from flow-based data is severely limited, because only a small amount of information is collected per connection. In this work, we overcome this problem by correlating and analyzing sets of flows across different dimensions such as time, address space, or user groups. The otherwise hidden information revealed by this correlation proves very beneficial for flow-based troubleshooting applications. Using this approach, we show how flow-based data can be used to effectively support mail administrators in fighting spam. In more detail, we demonstrate that certain spam filtering decisions performed by mail servers can be accurately tracked at the ISP level using flow-level data. We then argue that such aggregated knowledge from multiple e-mail domains not only allows ISPs to remotely monitor what their “own” servers are doing, but also lets them develop and evaluate new scalable methods for fighting spam.
To assist network operators with troubleshooting connectivity problems, we contribute FACT, a system that implements a flow-based approach to connectivity tracking. FACT helps network operators continuously track connectivity from their network towards remote autonomous systems, networks, and hosts. In contrast to existing solutions, our approach correlates only flow-level data, is capable of online data processing, and is highly efficient in alerting only about those events that actually affect the studied network. Most importantly, FACT precisely reports which address spaces are currently not reachable by which clients, information that is required to react efficiently to connectivity issues. To introduce such innovative and productive troubleshooting applications, we improved the entire value chain from low-level data processing all the way up to knowledge extraction.
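The idea of correlating flows to track connectivity can be illustrated with a minimal sketch. It is not the FACT implementation: it assumes, for illustration only, that a remote prefix counts as unreachable when internal clients send flows towards it and no flow in the reverse direction is observed for any of them. The `Flow` record, the `unreachable_prefixes` function, the IPv4-only handling, and the /24 aggregation level are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified unidirectional flow record (IPv4 only)."""
    src: str
    dst: str
    packet_count: int
    byte_count: int


def unreachable_prefixes(flows, internal_net, prefix_len=24):
    """Map each remote prefix that answered no client to the clients that tried it.

    Assumption for this sketch: an outbound flow counts as answered when a
    flow in the reverse direction (remote host to the same client) is present.
    """
    internal = ip_network(internal_net)

    def is_internal(addr):
        return ip_address(addr) in internal

    # (client, remote host) pairs for which an inbound reply flow was seen.
    answered = {
        (f.dst, f.src)
        for f in flows
        if is_internal(f.dst) and not is_internal(f.src)
    }

    attempts = defaultdict(set)   # remote prefix -> clients that sent traffic
    successes = defaultdict(set)  # remote prefix -> clients that got a reply
    for f in flows:
        if is_internal(f.src) and not is_internal(f.dst):
            prefix = ip_network(f"{f.dst}/{prefix_len}", strict=False)
            attempts[prefix].add(f.src)
            if (f.src, f.dst) in answered:
                successes[prefix].add(f.src)

    # Report only prefixes for which no client observed any reply.
    return {
        str(prefix): sorted(clients)
        for prefix, clients in attempts.items()
        if not successes.get(prefix)
    }


# Example input: one answered connection and two unanswered ones.
example_flows = [
    Flow("10.0.0.1", "192.0.2.5", 3, 180),
    Flow("192.0.2.5", "10.0.0.1", 3, 200),
    Flow("10.0.0.2", "198.51.100.7", 2, 120),
    Flow("10.0.0.3", "198.51.100.9", 1, 60),
]
report = unreachable_prefixes(example_flows, "10.0.0.0/8")
```

Grouping by remote prefix rather than by single host mirrors the abstract's point that operators need to know which address spaces are unreachable by which clients. A real system would additionally have to handle time windows, sampling, and one-way traffic that is legitimately unanswered.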