Estimating Apache2 Restarts

Posted by Bagpuss on March 08, 2011
Tags: honeynet, digital forensics, data visualisation, apache2, score board

During a recent attempt at answering the Honeynet Log Mysteries Challenge, I wrote a series of reasoned analyses of the supplied Honeynet logging data. Unfortunately, teaching workloads prevented me from submitting a complete challenge answer.

Inspired by the idea of applying the Scientific Method to Digital Forensics (see Casey2009 and Carrier2006) and using data visualisation (see Conti2007 and Marty2008), I set about attempting to apply the same principles to analysing the Log Mysteries data sets.

In Apache2 Version Analysis we showed how the UNIQUE_ID cookie value, which is present within each Apache2 logging event, can be reverse engineered to extract (amongst other things):

  • the Apache2 scoreboard thread ID
  • and the worker thread's system-level process ID.

In this article, we shall use this information (along with the logging data present in sanitized_log/apache2/www-*.log) to estimate the times at which the Apache2 process was restarted. We will close this article by checking our estimates against the Apache2 restart, stop, start and killall events that are present within sanitized_log/auth.log.
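To make the decoding step concrete, the following is a rough sketch of how a UNIQUE_ID value can be unpacked. It is based on the record layout documented for mod_unique_id (stamp, in_addr, pid, counter and, on threaded MPMs, thread_index, written in network byte order and encoded 6 bits per character using mod_unique_id's own 64-character alphabet); the field order and widths shown here are assumptions that should be verified against the mod_unique_id.c shipped with the Apache2 version under analysis:

```ruby
# mod_unique_id's encoding alphabet: A-Z, a-z, 0-9, '@' and '-'.
ALPHABET = ('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a + ['@', '-']

# Pack the record fields into 144 bits, then emit 24 six-bit characters.
# (Included here only so the decoder below can be exercised round-trip.)
def encode_unique_id(stamp, in_addr, pid, counter, thread_index)
  bits = [stamp, in_addr, pid].map { |v| v.to_s(2).rjust(32, '0') }.join
  bits << counter.to_s(2).rjust(16, '0')
  bits << thread_index.to_s(2).rjust(32, '0')
  bits.scan(/.{6}/).map { |chunk| ALPHABET[chunk.to_i(2)] }.join
end

# Reverse the encoding: 24 characters -> 144 bits -> named fields.
def decode_unique_id(id)
  bits = id.chars.map { |c| ALPHABET.index(c).to_s(2).rjust(6, '0') }.join
  {
    stamp:        bits[0, 32].to_i(2),   # request timestamp (seconds)
    in_addr:      bits[32, 32].to_i(2),  # server IP address
    pid:          bits[64, 32].to_i(2),  # worker process ID
    counter:      bits[96, 16].to_i(2),  # per-process request counter
    thread_index: bits[112, 32].to_i(2)  # scoreboard thread index
  }
end
```

The pid and thread_index fields recovered this way are exactly the two quantities used throughout the rest of this analysis.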

We start our analysis by plotting (on a scatter chart) the observed logging event times against the Apache2 worker thread process IDs (click to view the graph):

From this graph we observe the following:

  • process IDs cluster in horizontal lines - we suggest that these clusters indicate periods of stability or uptime for the Apache2 process;
  • within each horizontal process ID cluster, the worker thread IDs (visualised here by coloured dots) appear to spread out fairly evenly - we suggest that this indicates Apache2 load balancing incoming requests across its worker pool;
  • and unusual kinks appear to precede each horizontal process ID cluster - we suggest that these indicate Apache2 starting up and reserving process IDs for its worker thread pool.

Apache2 MPM Documentation

In reviewing the Apache2 documentation, we discover that a Multi-Processing Module (MPM) determines how the server handles concurrency: the worker MPM implements a hybrid multi-process, multi-threaded server, whilst the prefork MPM is purely process based.

By default, the prefork MPM module (i.e. each worker process uses exactly one thread to handle HTTP requests) is used on an Ubuntu operating system (see Apache2 Version Analysis: Ubuntu Packaging). Unless we discover anything to the contrary, we shall assume that a default Apache2 installation is used here.

The relevant Apache2 directives are:

  • StartServers (default value is 5) determines the number of worker threads created when Apache2 starts up. According to Performance Tuning: Process Creation, when Apache2 needs to spawn additional workers it does so at an exponentially increasing rate: one in the first second, doubling each second, up to a maximum of 32 new workers per second, until enough workers exist.
    When more than 4 worker threads per second are spawned, a message is written to ErrorLog. In reviewing sanitized_log/apache2/www-error.log, we cannot find any error messages indicative of such behaviour, and so conclude that the server never spawned at a rate of more than 4 worker threads per second.
  • MinSpareServers (default value is 5) determines the minimum number of idle worker threads. Should the number of idle worker threads fall below this value, then Apache2 will spawn new worker threads (at the exponentially increasing rate described above) until this threshold is met again.
  • MaxSpareServers (default value is 10) is used to determine the maximum number of idle worker threads. Should the number of idle worker threads ever exceed this value, then Apache2 will kill off the excess processes.
  • MaxClients (default value is 256) determines the maximum number of HTTP requests that the Apache2 server may simultaneously handle. If more than MaxClients requests are received, then the excess are normally queued until they can be processed by an idle or free worker thread. The number of pending connections allowed to be queued, before HTTP requests are dropped, is 511 by default (see ListenBacklog).
  • MaxRequestsPerChild (default value is 10000) determines how many requests a worker thread may serve before it is forcibly recycled (i.e. restarted). As the total number of Apache2 requests is 594, we clearly never hit this limit! Thus, no Apache2 worker threads are restarted due to this threshold being reached.
  • KeepAliveTimeout (default value is 15 seconds) is the number of seconds Apache2 will keep a connection open waiting for a subsequent request.
  • TimeOut (default value is 300 seconds) is the number of seconds Apache2 will wait for:
    • a GET request to be received
    • receipt of TCP packets on a POST or PUT request
    • or, ACKs between successive transmissions of TCP response packets.
    Once this threshold value is reached, Apache2 fails the HTTP request.
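For reference, the directives above might appear as follows in an Apache2 configuration file. This is a sketch only, populated with the default values named in this article, and should be checked against the actual configuration of the server under analysis:

```
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          256
    MaxRequestsPerChild 10000
</IfModule>
ListenBacklog    511
KeepAliveTimeout 15
TimeOut          300
```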

When seeking to clarify process ID behaviour for our Apache2 worker threads, the following timeline graph helps to substantiate our claim regarding Apache2's start-up behaviour (click to view the graph - warning: page is approx. 690KB):

Notice how, on our timeline, we have areas of rapid and incremental process ID activity that appear to correspond well with the kinks on our scatter graph. Also notice how the longer process timelines (e.g. for process IDs 464, 10537 and 16838) appear to substantiate the claim that horizontal process ID clustering is related to periods of Apache2 stability or uptime.

Apache2 Modelling

We can model an Apache2 server instance by simply modelling the scoreboard worker thread to process ID mapping (e.g. via a hash data structure whose keys are scoreboard worker thread IDs and whose values are process IDs).

Throughout Apache2's lifetime, we can expect our pool of scoreboard worker threads to remain fixed (unless the Apache2 directives above are changed, that is!), whilst their associated process ID values may vary.

So, by looking at how constant or stable process IDs are for each scoreboard worker thread, we are able to visualise:

  • the periods of Apache2 stability or uptime (these periods will appear as solid lines on our timeline)
  • and the periods within which all Apache2 worker threads have restarted (these periods appear as gaps on our timeline).
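Under this model, the restart estimation can be sketched in a few lines of Ruby. The event fields and the "every scoreboard slot has changed pid" criterion below are illustrative simplifications of the analysis described above, not the actual Rails code used:

```ruby
# Track the scoreboard mapping (thread_id => pid) over time, and emit an
# estimated restart interval once every known scoreboard slot has seen its
# pid change since the previous estimated restart.
def restart_intervals(events)
  pids    = {}  # thread_id => last observed pid
  changed = {}  # thread_id => time its pid was first seen to change
  intervals = []
  events.sort_by { |e| e[:time] }.each do |e|
    tid = e[:thread_id]
    if pids.key?(tid) && pids[tid] != e[:pid]
      changed[tid] ||= e[:time]
      # A full restart: every slot we know about has been recycled.
      if changed.keys.sort == pids.keys.sort
        intervals << [changed.values.min, changed.values.max]
        changed.clear
      end
    end
    pids[tid] = e[:pid]
  end
  intervals
end
```

For example, two scoreboard slots whose pids both change between t=10 and t=12 would yield the single estimated restart interval [10, 12].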

In order to explore the relationship between the Apache2 worker threads and the (system) process ID's, we can plot the following timeline (click to view the graph - warning: page is approx. 690KB):

Using the above timeline we may now estimate the time intervals during which Apache2 restarted, as follows (an Apache2 restart is here defined to be an event caused by restarting all of its worker threads):

Date              Time Interval       Observation
19th April 2010   18:59:44-19:06:54   Apache2 restarts
                  18:17:07-18:59:16   Restart of thread 1
20th April 2010   ≥ 00:48:16          Restart of threads 0-4 and 6-9
21st April 2010   16:35:05-01:35:12   Restart of thread 4
22nd April 2010   18:12:10-18:40:26   Apache2 restarts
23rd April 2010   17:56:20-18:21:15   Apache2 restarts
                  18:21:30-01:02:37   Restart of thread 8
                  18:24:55-18:25:27   Apache2 restarts
                  18:24:55-22:10:35   Restart of thread 4
                  18:25:33-18:26:31   Apache2 restarts
                  18:25:33-00:33:40   Restart of thread 2
                  18:28:14-18:31:24   Apache2 restarts

Comparing the above table's restart estimates against the known Apache2 restarts recorded in sanitized_log/auth.log allows the following timeline to be created (click to view the graph):

Using this data visualisation, we can now verify that all of our Apache2 restart estimates can be accounted for by the Apache2 restart events present within sanitized_log/auth.log.

In our final blog article, we shall look at how the work of Florian Buchholz (eg. see An Improved Clock Model for Translating Timestamps) can be used to measure logging event times relative to a suitable reference clock description.

Tools Used

  • Rails 3 used to model our data (see GitHub project for the Rails application used in this analysis)
  • JGR used to initially explore and visualise the data
  • Protovis 3.2 used to plot graphs in the Rails application.