Apache2 Version Analysis: Data Visualisation

Posted by Bagpuss on January 21, 2011
Tags: honeynet, digital forensics, data visualisation, timeline

During a recent attempt at answering the Honeynet Log Mysteries Challenge, I wrote a series of reasoned analyses of the supplied Honeynet logging data. Unfortunately, my teaching workload prevented me from submitting a realistic answer to the challenge.

Inspired by the idea of applying the Scientific Method to Digital Forensics (see Casey2009 and Carrier2006) and by data visualisation (see Conti2007 and Marty2008), I set about applying the same principles to the analysis of the Log Mysteries data sets.

In the blog post Apache2 Version Analysis, we presented an argument that purported to provide an upper bound estimate on the version of Apache2 that was present on the Log Mysteries web server. As noted in that post, this version estimate contained a subtle error that needed to be located and fixed. In this article, we aim to rectify this situation by using a timeline to correctly estimate that Apache2 is at a revision < 596448 (i.e. tag release is ≤ 2.2.6). Under minimal additional assumptions, we can also deduce that Apache2 is at a revision ≥ 420983 (i.e. tag release is ≥ 2.2.3).

The Apache2 Version Analysis blog post presented a number of observations regarding our Log Mysteries data. These observations were then used to state and prove a number of propositions. A natural language argument was then used to estimate an upper bound on the version of Apache2 being used.

Unfortunately, this natural language argument obscures the presence of a subtle error. As our argument is essentially a timeline analysis (without the data visualisation!), we first use the Apache2 Subversion repository to build the following timeline rows:

  • a row to show when log events in apache2/www-*.log occurred
  • a row to show when revisions of mod_unique_id.c were released and active
  • a row to show when tagged versions of Apache2 were released and active.
This results in the following timeline:

Apache2 Timeline
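The "released and active" idea behind the mod_unique_id.c and tag rows can be sketched as half-open revision intervals: an entry becomes active at the revision where it appears and stays active until the next entry supersedes it. The minimal Python sketch below assumes each row is given as a mapping from a label to a Subversion revision number; the names `Interval`, `active_intervals` and `active_at` are hypothetical. Tag revision numbers would have to be read from the repository history (e.g. with `svn log`) and are not assumed here, and placing the log events on the same axis would additionally require matching their timestamps against commit dates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """One bar on a timeline row, active on the half-open range [start, end).

    end is None when nothing later supersedes the entry (still active).
    """

    label: str
    start: int
    end: Optional[int]

    def contains(self, revision: int) -> bool:
        return self.start <= revision and (self.end is None or revision < self.end)


def active_intervals(starts: dict[str, int]) -> list[Interval]:
    """Build a timeline row from {label: revision at which the entry appeared}.

    Each entry stays active until the next entry appears; the last is open-ended.
    """
    ordered = sorted(starts.items(), key=lambda item: item[1])
    row = []
    for i, (label, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else None
        row.append(Interval(label, start, end))
    return row


def active_at(row: list[Interval], revision: int) -> Optional[Interval]:
    """Return the entry of a row that is active at the given repository revision."""
    for interval in row:
        if interval.contains(revision):
            return interval
    return None


# Hypothetical mod_unique_id.c row, using only the three revisions quoted in this post.
mod_unique_id_row = active_intervals(
    {"r420983": 420983, "r596448": 596448, "r951895": 951895}
)
print(active_at(mod_unique_id_row, 500000).label)  # r420983
print(active_at(mod_unique_id_row, 700000).label)  # r596448
```

Modelling each revision as a half-open interval makes its longevity explicit: a file revision is not a single point on the timeline but a span that ends only when the next revision is committed.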
Using this timeline, we can now correct our original Apache2 version estimate:
  • We first construct an upper bound on the version of mod_unique_id.c used by our Apache2 web server as follows:
    • we know from our timeline that Apache2 must use a version of mod_unique_id.c at a revision < 951895 (this statement is justified by the fact that revision 951895 occurs after the last timestamped log event present in the files apache2/www-*.log and auth.log)
    • we know from our previous blog post that mod_unique_id.c cannot be using code from revision 596448 (this statement is justified by comparing our UNIQUE_ID timestamp value against the log event timestamps and realising that the two numbers are not of the same order of magnitude).
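    Thus, Apache2 is using a version of mod_unique_id.c at a revision ≤ 420983, the revision of mod_unique_id.c that precedes 596448 on our timeline. Moreover, because revision 596448 of mod_unique_id.c remains active for every Apache2 revision from 596448 up to (but not including) 951895, Apache2 itself must be at a revision < 596448 (i.e. tag release is ≤ 2.2.6).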
  • If we assume that mod_unique_id.c is at revision 420983, then we can also use our timeline to determine that 2.2.3 is a lower bound estimate on the version of the Apache2 web server (this is justified since our assumption implies that Apache2 is at a revision ≥ 420983, and the earliest tag in this revision range is 2.2.3).
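The two premises and the bounds drawn from them can be checked mechanically. The Python sketch below is hypothetical (the names `version_bounds`, `same_order_of_magnitude` and `earliest_tag_at_or_after` do not come from any existing tool). It assumes the revisions of mod_unique_id.c are supplied as a list, that a file revision stays active until the next change supersedes it, and that tag revision numbers would be read from the repository rather than hard-coded. `same_order_of_magnitude` only compares decimal digit counts and makes no assumption about how mod_unique_id.c encodes its timestamp.

```python
from __future__ import annotations

from typing import Optional


def same_order_of_magnitude(x: int, y: int) -> bool:
    """True when two positive integers have the same number of decimal digits."""
    return len(str(x)) == len(str(y))


def version_bounds(
    file_changes: list[int], cutoff: int, excluded: set[int]
) -> tuple[int, Optional[int]]:
    """Bound the mod_unique_id.c revision and the Apache2 revision.

    file_changes: revisions in which mod_unique_id.c changed.
    cutoff: a file revision committed after the last log event (first premise).
    excluded: file revisions ruled out by the UNIQUE_ID comparison (second premise).
    Returns (highest feasible file revision, exclusive upper bound on the
    Apache2 revision, or None if nothing supersedes that file revision).
    """
    changes = sorted(file_changes)
    feasible = [c for c in changes if c < cutoff and c not in excluded]
    if not feasible:
        raise ValueError("no revision of mod_unique_id.c satisfies both premises")
    best = max(feasible)
    # A file revision stays active until the next change supersedes it, so any
    # Apache2 revision at or after that next change would use a ruled-out revision.
    later = [c for c in changes if c > best]
    return best, (later[0] if later else None)


def earliest_tag_at_or_after(tags: dict[str, int], revision: int) -> Optional[str]:
    """Name of the earliest tag created at or after `revision`, if any."""
    candidates = sorted((rev, name) for name, rev in tags.items() if rev >= revision)
    return candidates[0][1] if candidates else None


best, apache_upper = version_bounds(
    [420983, 596448, 951895], cutoff=951895, excluded={596448}
)
print(best, apache_upper)  # 420983 596448
```

Passing a tag-to-revision mapping read from the repository, together with `best`, to `earliest_tag_at_or_after` would reproduce the lower-bound step of the second bullet.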

In modelling our problem, our original blog post introduced two sources of error:

  • it was ambiguous about the relationship between the file mod_unique_id.c and the rest of the Apache2 code base
  • it was ambiguous about the longevity of a source code revision.
Only by applying mathematical rigour (e.g. via a suitable formal logical system) can one hope to eliminate such modelling errors.
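As a sketch of what removing these two ambiguities might look like, the argument can be restated in plain set notation (a step towards, not yet, a full formal logical system). It assumes, as the argument does, that 420983, 596448 and 951895 are consecutive revisions in which mod_unique_id.c changed; the symbols S, a and f are introduced here purely for illustration.

```latex
Let $S = \{s_0 < s_1 < \dots < s_n\}$ be the revisions in which
\texttt{mod\_unique\_id.c} changed, and let $a \ge s_0$ be the revision of the
Apache2 code base. The file revision built into Apache2 (first ambiguity) and
its longevity (second ambiguity) are then fixed by
\[
  f(a) = \max\{\, s \in S : s \le a \,\},
  \qquad\text{so}\qquad
  f(a) = s_i \iff s_i \le a < s_{i+1} \quad (\text{taking } s_{n+1} = \infty).
\]
With $420983,\ 596448,\ 951895$ consecutive in $S$:
\[
\begin{aligned}
  f(a) < 951895 \;\wedge\; f(a) \ne 596448
    &\implies f(a) \le 420983 \\
    &\implies a < 596448,
\end{aligned}
\]
since $f(a) = 596448$ for every $a \in [596448, 951895)$ and
$f(a) \ge 951895$ for every $a \ge 951895$. Conversely, assuming
$f(a) = 420983$ gives $a \ge 420983$, which is the lower bound used above.
```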

Future blog posts will focus on using data visualisation and statistical analysis techniques to further analyse the Honeynet logging data.

Tools Used

BeeDocs Timeline 3D