Thursday, July 16, 2009

Data Splunking

I’ve had my head down for the last couple of months, churning out code for that elusive framework I keep hinting at :) Right now I’m staying in a trailer with no TV, internet, or cell phone coverage, and I’ve never been more productive (says I).

Still, I thought I would pop up briefly to mention a cool IT tool that gives you a centralized, browser-based repository for searching across the many log files, event logs, and databases that inevitably end up scattered around any company’s data centres.

It’s called splunk. The name is clever: users get to spelunk into their data silos and see what’s there. It’s a simple, single-package install that runs on most desktop machines and servers. There’s a free version if you index less than 500 MB of data per day, and enterprises can pay to index larger volumes. I’m running the free version on my 64-bit Vista box and it indexes and searches like a little champ.

In my case I’ve been using it on my framework log files to help analyze bugs and performance bottlenecks. Here’s a screenshot of a search on the keyword “nhibernate” (NHibernate is an object-relational mapping library for .NET):

[Screenshot: Splunk Log Files]

As you can see, it quickly pops up all the logged events where NHibernate was called from my classes.
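For a sense of what a keyword search like that amounts to, here is a minimal Python sketch. It is not how splunk works internally (splunk builds an index rather than scanning files on every search), and the `*.log` file extension and the `search_logs` name are assumptions made up for the illustration.

```python
from pathlib import Path


def search_logs(folder, keyword):
    """Return (file name, line number, line) for each line containing keyword.

    The match is case-insensitive, so "nhibernate" also finds "NHibernate".
    Assumes the logs are plain-text files ending in .log (hypothetical).
    """
    needle = keyword.lower()
    hits = []
    for path in sorted(Path(folder).glob("*.log")):
        # errors="replace" keeps the scan going if a file has odd bytes in it.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Calling `search_logs(folder, "nhibernate")` on a log folder returns every matching line along with the file it came from, which is roughly the list of events a search results page shows.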

To get this to work, all I had to do was add an “Input” for splunk to index – in this case the full path to my log file folder.

As you would expect, it does lots of reporting. It breaks my log files down into fields, shown as columns. Examples include the custom C# properties I search on; the standard log file metadata such as source name, creation date, and file size; and even the SQL commands that NHibernate generates for me. I can filter on these columns for even more detailed breakdowns. In the next screenshot I am reporting on the Entity ID values I use to track my framework objects.

[Screenshot: Splunk Log Files - Report]
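The idea behind a report like this, pulling a field out of each event and counting by its value, can be sketched in a few lines of Python. The `EntityId=42` log format and the `count_by_entity_id` name are hypothetical; the actual format of my framework logs is not shown here, and splunk does this extraction for you.

```python
import re
from collections import Counter

# Hypothetical pattern: assumes each event logs something like "EntityId=42".
ENTITY_ID = re.compile(r"EntityId=(\d+)")


def count_by_entity_id(lines):
    """Count how many log lines mention each entity ID."""
    counts = Counter()
    for line in lines:
        match = ENTITY_ID.search(line)
        if match:  # lines without an entity ID are simply skipped
            counts[int(match.group(1))] += 1
    return counts
```

Calling `most_common()` on the returned counter then lists the entity IDs by how often they appear, busiest first.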

I like splunk because it’s a one-stop shop for me to analyze all my various bits of IT Operations information. There’s a slick AJAX web user interface, and so far performance seems fine for me on my little dev laptop. I find it solid, intuitive, and I don’t have to expend much effort to install, manage, or learn it.

There’s also a way to extend splunk through its Application Programming Interface (API). I haven’t had a look at that yet, but I plan to investigate when I have some free time.

I think any IT company should give splunk a test run.