Apache2 Version Analysis: Data Visualisation
During a recent attempt at answering the Honeynet Log Mysteries Challenge, I wrote a series of reasoned analyses for the supplied Honeynet logging data. Unfortunately, teaching workloads stopped me from submitting any realistic challenge answer.
Inspired by the idea of applying the Scientific Method to Digital Forensics (see Casey2009 and Carrier2006) and using data visualisation (see Conti2007 and Marty2008), I set about attempting to apply the same principles to analysing the Log Mysteries data sets.
In the blog post Apache2 Version Analysis, we presented an argument that purported to provide an upper-bound estimate on the version of Apache2 that was present on the Log Mysteries web server. It was pointed out, in that blog post, that this version estimate had a subtle error that needed to be located and fixed. In this article, we rectify this by using a timeline to correctly estimate that Apache2 is at a revision < 596448 (i.e. tag release is ≤ 2.2.6). Under minimal additional assumptions, we can also deduce that Apache2 is at a revision ≥ 420983 (i.e. tag release is ≥ 2.2.3).
The Apache2 Version Analysis blog post presented a number of observations regarding our Log Mysteries data. These observations were then used to state and prove a number of propositions. A natural language argument was then used to estimate an upper bound on the version of Apache2 being used.
Unfortunately, this natural language argument obscures the presence of a subtle error. As our argument is essentially a timeline analysis (without the data visualisation!), we use the Apache2 Subversion repository to first build the following timeline rows:
- a row to show when log events in apache2/www*.log occurred
- a row to show when revisions of mod_unique_id.c were released and active
- a row to show when tagged versions of Apache2 were released and active.
Apache2 Timeline
We first construct an upper bound on the version of mod_unique_id.c used by our Apache2 web server as follows:
- we know from our timeline that Apache2 must use a version of mod_unique_id.c at a revision < 951895 (this statement is justified by the fact that revision 951895 occurs after the last timestamped log event present in the files apache2/www*.log and auth.log)
- we know from our previous blog post that mod_unique_id.c cannot be using code from revision 596448 (this statement is justified by comparing our UNIQUE_ID timestamp value against the log event's timestamp and realising that the two numbers are not of the same order of magnitude)
- we know from our timeline that Apache2 must use a version of mod_unique_id.c at a revision ≤ 420983 and that Apache2 is at a revision < 596448 (i.e. tag release is ≤ 2.2.6).
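
The order-of-magnitude comparison used in the second step can be sketched in a few lines of Python. This is only an illustration of the check itself: the function name, the one-decade tolerance and both sample values are our own assumptions, and the numbers are hypothetical rather than taken from the Log Mysteries data.

```python
import math


def same_order_of_magnitude(a: int, b: int, tolerance: float = 1.0) -> bool:
    """Return True when two positive values differ by less than `tolerance` decades."""
    if a <= 0 or b <= 0:
        raise ValueError("both values must be positive")
    return abs(math.log10(a) - math.log10(b)) < tolerance


# Hypothetical values for illustration only (not from the challenge logs).
log_event_epoch = 1_271_000_000  # a log event time, in seconds since the Unix epoch
decoded_stamp = 3_500_000        # a made-up timestamp field decoded from UNIQUE_ID

if same_order_of_magnitude(log_event_epoch, decoded_stamp):
    print("stamp is plausibly an epoch time in seconds")
else:
    print("stamp and log event time differ by an order of magnitude or more")
```

A tolerance of one decade is deliberately coarse: the argument only needs to distinguish "roughly the same size" from "clearly different", not to match the two values exactly.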

If we assume that mod_unique_id.c is at revision 420983, then we can also use our timeline to determine that 2.2.3 is a lower bound estimate on the version of the Apache2 web server (this statement is justified since our assumption implies Apache2 is at a revision ≥ 420983, and the earliest tag in this interval is 2.2.3).
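
The bounds argument can also be written down as a small interval calculation. The sketch below is a minimal illustration, assuming we model the candidate Apache2 revisions as a half-open interval; the class RevisionBounds and its method names are hypothetical helpers of our own, and only the revision numbers come from the discussion above. Mapping the resulting interval onto tag releases still requires the timeline, since the tag revisions are not encoded here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RevisionBounds:
    """Half-open interval [lower, upper) of candidate Subversion revisions."""
    lower: Optional[int] = None  # inclusive lower bound, None means unbounded
    upper: Optional[int] = None  # exclusive upper bound, None means unbounded

    def require_below(self, rev: int) -> "RevisionBounds":
        # Tighten the exclusive upper bound.
        upper = rev if self.upper is None else min(self.upper, rev)
        return RevisionBounds(self.lower, upper)

    def require_at_least(self, rev: int) -> "RevisionBounds":
        # Tighten the inclusive lower bound.
        lower = rev if self.lower is None else max(self.lower, rev)
        return RevisionBounds(lower, self.upper)

    def contains(self, rev: int) -> bool:
        above = self.lower is None or rev >= self.lower
        below = self.upper is None or rev < self.upper
        return above and below


# Upper bound: the revision predates 951895 (after the last log event)
# and predates 596448 (ruled out by the UNIQUE_ID timestamp comparison).
upper_only = RevisionBounds().require_below(951895).require_below(596448)

# Lower bound: holds only under the assumption that mod_unique_id.c
# is at revision 420983.
with_assumption = upper_only.require_at_least(420983)

print(upper_only)
print(with_assumption)
```

Keeping the assumption as a separate step makes it explicit that the upper bound stands on its own, while the lower bound depends on the extra hypothesis about mod_unique_id.c.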
In modelling our problem, our original blog post introduced two sources of error:
- it was ambiguous about the relationship between the file mod_unique_id.c and the rest of the Apache2 code base
- it was ambiguous about the longevity of a source code revision.
Future blog posts will focus on using data visualisation and statistical analysis techniques to further analyse the Honeynet logging data.