Tagging and Timelines: Part 1

Posted by Bagpuss on February 17, 2011
Tags: honeynet, digital forensics, data visualisation, tagging

While recently attempting the Honeynet Log Mysteries Challenge, I wrote a series of reasoned analyses of the supplied Honeynet log data. Unfortunately, my teaching workload stopped me from submitting a realistic answer to the challenge.

Inspired by the idea of applying the Scientific Method to Digital Forensics (see Casey2009 and Carrier2006) and of using data visualisation (see Conti2007 and Marty2008), I set about applying the same principles to the analysis of the Log Mysteries data sets.

When analysing the auth.log sudo commands, we often want to group them by, for example, their intended system or user functionality. Most modern operating systems employ some form of package management system to maintain and update installed commands. If we can correctly relate commands to their corresponding packages, we can use package meta-information to extract descriptions of system and user functionality, which each command then inherits from its package.

In this blog post, we intend to tag and classify the auth.log sudo commands using information screen scraped from Debian's package tagging project, debtags (see the vocabulary file for a description of each Debian tag). In the next blog post, we plan to use these debtag-derived taggings to locate interesting events using a timeline.

In Apache2 Version Analysis: Ubuntu Packaging, we concluded that we were working with an Ubuntu server. Both Debian and Ubuntu use .deb-formatted packages and target a file system organised in a manner similar to the Linux Standard Base. In this article, however, we choose not to quantify the similarities and differences between these file systems under their respective package managers. Instead, we build an approximate model, using the Debian package management system to define a mostly correct map from commands to (Debian) package names. When a sudo command belongs to multiple Debian packages, we apply the principle of minimal common knowledge and take the intersection of all the packages' debtag lists as the result of the mapping.
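The intersection rule can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the helper name common_debtags and the package names are hypothetical, and the tag lists are invented sample data rather than tags scraped for this post.

```ruby
# Hypothetical helper: keep only the debtags that every candidate package
# agrees on (the "minimal common knowledge" for an ambiguous command).
def common_debtags(tag_lists)
  return [] if tag_lists.empty?

  # Array#& is set intersection; it keeps the order of the left operand.
  tag_lists.reduce { |common, tags| common & tags }
end

# Invented sample data: one command owned by two hypothetical packages.
candidates = {
  "package-a" => ["admin::configuring", "interface::commandline", "role::program"],
  "package-b" => ["interface::commandline", "role::program", "use::editing"]
}

p common_debtags(candidates.values)
# prints ["interface::commandline", "role::program"]
```

A consequence of this choice is that a command owned by packages with no tags in common ends up with no debtags at all, which errs on the side of saying too little rather than too much.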

Rails Implementation Notes:

A basic Rails application has been implemented to hold the sudo events parsed from sanitized_log/auth.log within a Sudo model. The rake task db:seed builds the underlying database, and test:units verifies that the database has been built correctly (the data on this page has been verified against the master copy using these unit tests).
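As a rough idea of what parsing a sudo event involves, the sketch below splits one auth.log sudo entry into its fields. It assumes the usual sudo syslog layout (user : TTY=... ; PWD=... ; USER=... ; COMMAND=...); the constant SUDO_ENTRY, the helper parse_sudo_entry and the sample line are all hypothetical and are not taken from the application or from the challenge logs.

```ruby
# Assumed layout of a sudo command entry in auth.log:
#   <timestamp> <host> sudo:  <user> : TTY=<tty> ; PWD=<dir> ; USER=<run-as> ; COMMAND=<cmd>
SUDO_ENTRY = /sudo:\s+(?<user>\S+) : TTY=(?<tty>\S+) ; PWD=(?<pwd>.+?) ; USER=(?<run_as>\S+) ; COMMAND=(?<command>.+)\z/

# Hypothetical helper: returns a hash of fields, or nil when the line is not
# a sudo command entry (for example sshd lines or sudo failure messages).
def parse_sudo_entry(line)
  match = SUDO_ENTRY.match(line.chomp)
  return nil unless match

  match.names.map(&:to_sym).zip(match.captures).to_h
end

# Invented sample line, for illustration only.
sample = "Mar 16 08:12:04 host sudo:    user1 : TTY=pts/0 ; PWD=/home/user1 ; " \
         "USER=root ; COMMAND=/usr/bin/tee /etc/example.conf"

entry = parse_sudo_entry(sample)
puts entry[:user]     # prints user1
puts entry[:pwd]      # prints /home/user1
puts entry[:command]  # prints /usr/bin/tee /etc/example.conf
```

Fields such as the user, the working directory and the command are the raw material for the per-user, per-directory and per-command frequencies discussed later in the post.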

Within this Rails application, the Sudo model uses:

  • the function package_lookup to define our command to Debian package name mapping by screen scraping dpkg
  • the function debtags_lookup to define a Debian package name to debtag list mapping by screen scraping debtags
  • the virtual attribute debian_tags to map the current model instance (i.e. an auth.log entry) to a list of associated debtags via the previous two functions.
In addition, the rake task tag:with:debtags is used to build up tagging relationships for commands, Debian packages and debtags, using the virtual attribute debian_tags. The Rails gem acts-as-taggable-on provides our tagging implementation here.
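To give a flavour of the scraping step, here is a minimal sketch of turning dpkg search output into package names. It assumes the text comes from the standard output of dpkg -S, where each line has the form "package[, package...]: /path"; the helper name packages_from_dpkg_search and the sample output are hypothetical, and the real package_lookup may well work differently.

```ruby
# Hypothetical helper: extract the owning package names from the standard
# output of `dpkg -S <path>` (error messages on stderr are not handled here).
def packages_from_dpkg_search(output)
  names = output.each_line.flat_map do |line|
    # Diversion notices are not ownership lines, so skip them.
    next [] if line.start_with?("diversion by")

    owners, separator, _path = line.chomp.partition(": ")
    next [] if separator.empty?

    # Several owners are comma separated; drop any ":arch" qualifier.
    owners.split(", ").map { |owner| owner.split(":").first }
  end
  names.uniq
end

# Invented sample output, for illustration only.
sample = <<~OUTPUT
  package-a, package-b: /usr/bin/example
  libexample:amd64: /usr/lib/libexample.so
OUTPUT

p packages_from_dpkg_search(sample)
# prints ["package-a", "package-b", "libexample"]
```

A path owned by more than one package yields several names, which is exactly the case where the intersection of debtag lists comes into play.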

Using this code framework, we can now extract frequency data (see the honeynet controller's index view) using the following ActiveRecord code pattern:

Sudo.tagged_with(tag_list).tag_counts_on(tag_context)
.map { |t| [t.name, t.count] }
where tag_list is the list of tags that we wish to filter on, and tag_context is one of the tagging contexts: commands, packages or debtags.
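The [name, count] pairs produced by this pattern can be turned into percentages with a small helper. The sketch below is plain Ruby and makes two assumptions: the helper name tag_percentages is hypothetical, and the denominator is passed in explicitly, because the choice of total (all sudo events, or only those matching a filter) depends on the question being asked. The counts are invented sample data.

```ruby
# Hypothetical helper: convert [name, count] pairs into percentages of a
# caller-supplied total, rounded to one decimal place.
def tag_percentages(tag_counts, total)
  return {} if total.zero?

  tag_counts.map { |name, count| [name, (100.0 * count / total).round(1)] }.to_h
end

# Invented sample data: two tags counted over eight events.
sample_counts = [["tag-a", 3], ["tag-b", 1]]

tag_percentages(sample_counts, 8).each do |name, percentage|
  puts "#{name}: #{percentage}%"
end
# prints:
#   tag-a: 37.5%
#   tag-b: 12.5%
```

Because one event can carry several tags within a context, percentages computed this way need not sum to 100.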

The taggings produced by this Rails application, along with their subsequent frequency analysis, can be seen by clicking on the image below:

Using this tagging frequency data, we are now able to make the following observations:

  • Command Usage Overview: 60.1% of all sudo commands involve restarts of Apache2, the tee command and Subversion; 76.5% of administrator sudo activity is due to configuration; 96.1% of sudo commands work with the HTML format and 73.1% work on files; 60.9% of all sudo commands are implemented in C, Perl or Python; 37.5% of network-related sudo commands are clients and 76.0% are server commands; 86.8% of all sudo commands involve user1 using root privileges, whilst 0.3% involve dhg (a user associated with the keywords psybnc and eggdrop - see below).
  • System Web Directory Location: Ubuntu and Debian both use /var/www as their default (public-facing) system web directory. Examining the other pwd directory names shows that 76.7% of all sudo commands occur within /opt/software/web, whereas only 0.2% occur within /var/www. This suggests that /opt/software/web, rather than the default location, is the web directory actually in use.
  • Keyword Analysis using Snort Rules: ad-hoc keyword searching of the Snort rules shows that the keyword psybnc is associated with an IRC bouncer program. Based on prior Honeynet challenges (e.g. see An Introduction to psyBNC 2.3.1 and Know your Enemy: Web Application Threats), both psybnc and eggdrop can also be associated with attacker kits used in the post-compromise phase of an attack. We use these observations as the basis for tagging log events as threats.
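The keyword-based threat tagging described in the last observation could look something like the following sketch. It is an assumption-laden illustration: the constant THREAT_KEYWORDS, the helper threat_tags, the tag name "threat" and the case-insensitive substring match are all hypothetical choices, and the sample commands are invented.

```ruby
# Keywords that the post associates with attacker kits.
THREAT_KEYWORDS = %w[psybnc eggdrop].freeze

# Hypothetical helper: returns ["threat", <matched keywords>...] when the
# command mentions any keyword (case-insensitively), otherwise an empty list.
def threat_tags(command)
  lowered = command.downcase
  matched = THREAT_KEYWORDS.select { |keyword| lowered.include?(keyword) }
  matched.empty? ? [] : ["threat"] + matched
end

# Invented sample commands, for illustration only.
p threat_tags("/bin/tar xzf psyBNC.tar.gz")
# prints ["threat", "psybnc"]
p threat_tags("/usr/bin/tee /etc/example.conf")
# prints []
```

Substring matching is deliberately crude; it will also flag benign commands that merely mention a keyword, so any hits are best treated as leads for closer inspection.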

In the next blog post, we plan to use these debtag-derived taggings to define timelines for use in locating interesting events.

Tools Used

Rails 3, used to model our data (see the GitHub project for the Rails application used in this analysis)
JGR, used to initially explore and visualise the data
Protovis 3.2, used to plot graphs in the Rails application