Clock Descriptions

Posted by Bagpuss on April 26, 2011
Tags: honeynet, digital forensics, data visualisation, clock descriptions

During an attempt at answering the Honeynet Log Mysteries Challenge, I wrote a series of reasoned analyses for the supplied Honeynet logging data. Unfortunately, teaching workloads stopped me from submitting any realistic challenge answer.

Inspired by the idea of applying the Scientific Method to Digital Forensics (see Casey2009 and Carrier2006) and using data visualisation (see Conti2007 and Marty2008), I set about attempting to apply the same principles to analysing the Log Mysteries data sets.

With this final blog article, we shall use the logging events present in sanitized_log/apache2/www-*.log to build a reference clock description. In doing this, we follow An Improved Clock Model for Translating Timestamps by Florian Buchholz.

In Apache2 Version Analysis, we reverse engineered the UNIQUE_ID cookie value to reveal that the web server was listening for incoming HTTP connections at 10.0.1.14.

Here we assume that 10.0.1.14 and 10.0.1.2 are on the same subnet, and so are probably under a common administrative jurisdiction. Should the need arise, we would therefore be able to obtain additional information about 10.0.1.2 (e.g. clock synchronisation events, operating system version or installed applications). In this sense, we may take the view that 10.0.1.2's clock can be independently related to a trusted time source. By deriving timing relationships between these two machine clocks, we should then be able to relate 10.0.1.14's clock to a trusted time source.

Plotting the 10.0.1.2 RSS newsfeed requests for the URL /feed/, we can see a clear linear relationship (modulo what appear to be 3 periods of computer downtime) between the time at which each request is received by the web server and the index (position) of that request.

This visual observation further motivates our attempts at using 10.0.1.2's clock as a trusted time source.
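
As a rough illustration of how the (index, timestamp) pairs behind such a plot might be extracted, here is a minimal Python sketch. It assumes the log lines start with the client address and use Apache's usual bracketed timestamp format; the function name `feed_requests` and the sample lines are made up for illustration and are not taken from the challenge data.

```python
import re
from datetime import datetime

# Assumed layout: client, identity, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<stamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)

def feed_requests(lines, client="10.0.1.2", path="/feed/"):
    """Return (index, timestamp) pairs for the client's requests to path."""
    stamps = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("client") == client and match.group("path") == path:
            # Apache writes timestamps such as 19/Apr/2010:07:00:00 -0700
            stamps.append(
                datetime.strptime(match.group("stamp"), "%d/%b/%Y:%H:%M:%S %z")
            )
    return list(enumerate(stamps))

# Made-up sample lines, for illustration only.
SAMPLE = [
    '10.0.1.2 - - [19/Apr/2010:07:00:00 -0700] "GET /feed/ HTTP/1.1" 200 1024',
    '192.0.2.7 - - [19/Apr/2010:07:10:00 -0700] "GET /index.html HTTP/1.1" 200 512',
    '10.0.1.2 - - [19/Apr/2010:07:30:00 -0700] "GET /feed/ HTTP/1.1" 200 1024',
]
pairs = feed_requests(SAMPLE)
```

Plotting the second element of each pair against the first would give a graph of the kind described above.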

PubSub Timing Accuracy

On OS X 10.6 (here we have trusted 10.0.1.2's HTTP User-Agent string - see user-agent-string.info), newsfeed HTTP requests are generated using Apple's PubSub agent. By default, this agent performs a newsfeed update or refresh every 30 minutes (we have determined this value by inspecting vanilla installations of Mail.app and Safari under OS X 10.6).

The 10.0.1.14 log event timestamp is generated by 10.0.1.14's system clock when the log event is received (according to Capturing Timestamp Precision for Digital Forensics by Eugene Antsilevich, this is achieved via a call to gettimeofday()). The accuracy of this timestamp is related to:

  • the accuracy of the PubSub agent in triggering newsfeed HTTP update events
  • and network latency.

Given that we are assuming that 10.0.1.14 and 10.0.1.2 are on a common subnet, we further assume that network latency here is negligible. Thus, our timestamp accuracy boils down to the timing accuracy of the PubSub agent.

Ad hoc experiments suggest that PubSub is capable of fetching newsfeeds every $30 (\pm 0.1\%)$ minutes. Thus, in the remainder of this article, we will assume that PubSub provides us with an accurate periodic timing source.
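
For scale, $0.1\%$ of 30 minutes is 1.8 seconds. The following Python sketch shows one way such a periodicity check could be made from a list of observed arrival times. The function name and the sample arrival times are hypothetical and are not the measurements behind the figure quoted above.

```python
def worst_relative_deviation(arrival_seconds, period_seconds=30 * 60):
    """Largest relative deviation of any inter-arrival gap from the period."""
    gaps = [b - a for a, b in zip(arrival_seconds, arrival_seconds[1:])]
    return max(abs(gap - period_seconds) / period_seconds for gap in gaps)

# Made-up arrival times in seconds, for illustration only.
arrivals = [0.0, 1800.5, 3599.8, 5400.9]
deviation = worst_relative_deviation(arrivals)

# True when every gap is within 0.1% (1.8 seconds) of 30 minutes.
within_tolerance = deviation <= 0.001
```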

For each RSS request (say request number $n \ge 0$), let $t_{ref}(n)$ be the time determined using 10.0.1.2's clock. Due to the periodicity of 10.0.1.2's newsfeed requests we can represent this time via the equation:

$t_{ref}(n) = t_{offset} + \delta \times n$
where $\delta$ is the period with which these RSS requests are generated (using the analysis above, we shall take this value to be 30 minutes) and $t_{offset}$ is a starting, base or offset time value. In addition, each reference time $t_{ref}(n)$ corresponds to the timestamp $t_{host}(n)$ that 10.0.1.14 records for the same newsfeed request.
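
The reference clock equation can be written down directly. This is a minimal sketch that assumes times are held as seconds; the name `t_ref` mirrors the symbol above, and the offset value used in the example is arbitrary.

```python
DELTA_SECONDS = 30 * 60  # assumed refresh period of 30 minutes

def t_ref(n, t_offset, delta=DELTA_SECONDS):
    """Reference time of request number n: t_offset + delta * n."""
    if n < 0:
        raise ValueError("request number n must be >= 0")
    return t_offset + delta * n

# With an arbitrary offset of 100 seconds, request 4 falls two hours later.
example = t_ref(4, t_offset=100)
```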

Clock Descriptions

From An Improved Clock Model for Translating Timestamps (by Florian Buchholz), we have that clocks may be described in terms of their clock skew (which we learn is expected to be described by a linear relationship) and their synchronisation or time adjustment events.

In the absence of any further clock data for 10.0.1.2, we choose to fix $t_{offset}$ to be $t_{host}(0)$. We may now build a clock reference model for 10.0.1.14 by estimating its clock ticks as follows:

  • over the observed time period, we have that $t_{ref}$'s clock ticks correspond precisely with the order of 10.0.1.2 RSS logging events
  • by trusting 10.0.1.14 to accurately measure time duration and assuming that 10.0.1.2 generates an RSS request every 30 minutes (an inspection of 10.0.1.2 should allow one to provide an accurate alternative here), we can estimate how many clock ticks have occurred during each of our RSS refresh gaps.
As a result of these considerations, we may now implement a function (see lines 11 to 42 of Clock Description for 10.0.1.14) that calculates how many 10.0.1.2 clock ticks have occurred since $t_{host}(0)$. Using this function, we can now plot the resulting clock data as a graph.
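
The tick-counting idea in the two points above might be sketched as follows. This is not the Rails code referred to here. It is a hypothetical Python reconstruction which assumes that each gap between consecutive logged requests spans a whole number of 30 minute refresh periods (at least one), and rounds to the nearest whole number.

```python
def reference_ticks(host_seconds, period_seconds=30 * 60):
    """Estimate the cumulative number of refresh ticks at each logged request.

    host_seconds holds the logged timestamps in seconds, in log order.
    """
    ticks = [0]
    for previous, current in zip(host_seconds, host_seconds[1:]):
        # A gap much longer than one period (e.g. downtime) counts as
        # several missed ticks; every logged request is at least one tick.
        gap_ticks = max(1, round((current - previous) / period_seconds))
        ticks.append(ticks[-1] + gap_ticks)
    return ticks

# Made-up timestamps: three regular gaps and one gap of about four periods.
host = [0, 1737, 3474, 10422, 12159]
ticks = reference_ticks(host)
```

Multiplying each tick count by the refresh period and adding $t_{host}(0)$ then gives the corresponding $t_{ref}$ value.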

As the above graph has no visual indication of synchronisation events, we conclude that 10.0.1.14 has not adjusted its clock (e.g. via NTP) throughout the logging of events in sanitized_log/apache2/www-*.log. Thus, in order to produce a clock description for 10.0.1.14 (in terms of 10.0.1.2's clock), we need only measure its clock skew.

Using R, we may use linear regression to fit a straight line to this data, which yields the following equation (we work here to 3 decimal places):

$t_{host} = 44100000 + 0.965 \times t_{ref}$
From this equation we may now calculate 10.0.1.14's clock skew (i.e. the rate at which $t_{host}-t_{ref}$ varies with respect to $t_{ref}$) as follows:
  • we first plot $(44100000 - 0.035 \times t_{ref})$ vs $t_{ref}$ - this plot is chosen based on the following reasoning:
    $t_{host} - t_{ref} \sim t_{ref}$ (gradient $=$ clock skew)
    $t_{host} - t_{ref} = 44100000 + 0.965 \times t_{ref} - t_{ref}$ (substituting for $t_{host}$)
    $t_{host} - t_{ref} = 44100000 + (0.965 - 1) \times t_{ref}$ (by simplification)
    $t_{host} - t_{ref} = 44100000 - 0.035 \times t_{ref}$ (evaluating $0.965 - 1$)
  • finally, using R and linear regression we estimate that the gradient of this plot (i.e. our clock skew) is $-0.035$ (here, we ignore our error term of $\pm 10^{-15}$ as it can be taken to be a negligible quantity).
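
The regression itself was done in R. Purely as an illustration of the same calculation, the sketch below fits a least-squares line in plain Python and derives the skew as the fitted slope minus one. The data is synthetic, generated from the fitted equation above rather than from the challenge logs, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic points lying exactly on t_host = 44100000 + 0.965 * t_ref.
t_ref = [1800.0 * n for n in range(10)]
t_host = [44100000 + 0.965 * t for t in t_ref]

intercept, slope = fit_line(t_ref, t_host)

# The gradient of (t_host - t_ref) against t_ref is the slope minus one.
skew = slope - 1
```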

Having calculated our clock skew, we now estimate that for each $24$ hours that pass, 10.0.1.14's clock falls behind 10.0.1.2's clock by approximately $50$ minutes. We thus see that clock skew has a significant impact here on the accuracy of our logging data's timestamps.
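
As a rough check on this estimate, the sketch below converts a skew into drift per day. It also inverts the fitted line to translate a 10.0.1.14 timestamp into 10.0.1.2's time, an illustrative use that this article does not itself carry out. The function names are hypothetical, the constants are the rounded values from the fitted equation, and times are assumed to be in the same units as that equation.

```python
INTERCEPT = 44100000  # rounded intercept of the fitted line
SLOPE = 0.965         # rounded gradient of the fitted line

def drift_minutes_per_day(skew):
    """Minutes gained (positive) or lost (negative) per 24 hours."""
    return skew * 24 * 60

def host_to_ref(t_host):
    """Invert t_host = INTERCEPT + SLOPE * t_ref to recover t_ref."""
    return (t_host - INTERCEPT) / SLOPE

# With the skew rounded to three decimal places this is about -50.4 minutes.
drift = drift_minutes_per_day(SLOPE - 1)
```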

Thanks again to the Honeynet Project for organising and hosting these stimulating and engaging challenges.

Tools Used

Rails 3 used to model our data (see GitHub project for Rails application used in analysis)
JGR to visualise data and perform linear regression
Protovis 3.2 used to plot graphs in Rails application.