Authors: S. Yu, S. James, Y. Tian, W. Dou
Book Title: Reliable Knowledge Discovery
Date Accepted: September 15, 2011
Web traffic information, such as website popularity, flow and congestion, can provide useful insights that contribute to our understanding of cyberspace. Because of the volume of information, it is useful to aggregate the data flow at various sources over a given time interval and to base our analyses on the aggregated values. A key problem, however, is that such information can be distorted by the presence of illegitimate traffic, e.g. botnet recruitment scanning or DDoS attack traffic. An important consideration in web-related knowledge discovery, then, is the robustness of the aggregation method, which in turn may be affected by the reliability of the network traffic data. In this chapter, we present some similarity-based aggregation functions that are suited to the aggregation of traffic flows. As these functions rely on the similarity or distance between data inputs, we then present some recently developed information-theoretic indices that can be used to discriminate between illegitimate and benign traffic.