Friday, November 06, 2009

SharePoint 2010 Likely To Offer App Store

This just in from ReadWriteWeb:

Microsoft will offer an application marketplace within Sharepoint 2010 that will integrate with third-party applications from its partner network. No date has been set for the marketplace launch, but it will evolve from "The Gallery", a feature that provides Sharepoint 2010 users access to templates…

Details are few about the application marketplace that will be offered through Sharepoint. But it does point to the increasing significance of third-party applications for the Sharepoint platform and how the service may evolve as cloud computing becomes more prevalent.

I predicted this a few weeks ago in my “Things To Get Excited About in SharePoint 2010” post. Here’s what I had to say:

Service Application Architecture – the Shared Service Provider was a good idea but it was a bit hard to use in practice. Under the new architecture, you can create Service Applications for things like Excel Services, Forms Services, Business Connectivity Services, and other services that you build or buy, and you can mix and match these in your farms as you like. The services get consumed by web front ends via a standard interface.

This should allow a lot of plug-and-play customization of farms. I’m even wondering if there is an opportunity for vendors here…create some services and expose them to clients from the cloud.

There are some other big changes like Claims Based Authentication and Solution sandboxing which are intriguing to me. The Solution sandboxing feature gives me this sneaking suspicion we will one day soon see a Microsoft SharePoint App Store where we can buy, download and run SharePoint solutions in our farms.

Magic Eight-ball now says: “You may rely on it”.

Wednesday, November 04, 2009

Hosting Clockwork Web Framework With Amazon

I’ve blogged a lot about my admiration for Amazon’s web services stack. I think they understand the web as well as any company in the world. I’ve always intended to investigate Amazon’s Elastic Compute Cloud (EC2), and since I needed hosting for my new Clockwork Web Framework, I decided to give it a try.

The reason I went with Amazon rather than a traditional hoster is that I have no idea what kind of interest there will be in the framework, and therefore cannot predict what the load on a web server will be. Amazon EC2 is designed for this kind of flexibility, and you pay per hour.

The Platform

I am running a small 32-bit Windows Server 2003 instance to begin with. It only has 1.7 GB of RAM. I can scale this up if I need to, or, more likely, run up another small instance and load balance the two using Amazon’s Elastic Load Balancer technology.
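Load balancing two instances simply means spreading incoming requests between them. As a rough illustration, here is a minimal round-robin selector in C#. The backend addresses in the usage are hypothetical, and Amazon's Elastic Load Balancer uses its own routing logic rather than this exact algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Generic round-robin selector: each call hands out the next backend in turn.
// Illustration only; this is not Elastic Load Balancer's actual algorithm.
public sealed class RoundRobinBalancer
{
    private readonly IList<string> _backends;
    private int _next = -1;

    public RoundRobinBalancer(IList<string> backends)
    {
        if (backends == null || backends.Count == 0)
            throw new ArgumentException("At least one backend is required.", nameof(backends));
        _backends = backends;
    }

    public string NextBackend()
    {
        // Interlocked keeps the counter safe under concurrent requests;
        // the unsigned cast keeps the index non-negative if the counter wraps.
        int ticket = Interlocked.Increment(ref _next);
        return _backends[(int)((uint)ticket % (uint)_backends.Count)];
    }
}
```

With two small instances behind it, requests simply alternate between them, so either server can be replaced without the other noticing.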

On this, I am using IIS 6, .NET 3.5, SQL Server 2005 Express, and PowerShell. Most of my files are kept on a permanent storage drive (more on this below) and served by IIS. To maximize speed and lower the CPU burden on the server, I have decided to use another Amazon technology, CloudFront.

CloudFront Content Delivery Network

CloudFront is a Content Delivery Network (CDN), like Akamai or Limelight. I use it to serve my images and resource files. Basically Amazon has edge servers all over the world with a copy of my images and resource files, and when users request them from my website, CloudFront automatically sends them a copy from the nearest location to them, making for some very fast download times.

To make this work, you have to use Amazon Simple Storage Service, or S3. This is a virtual file system: basically you have “buckets” of files that are served up when requests come in from the CloudFront “distributions”.

I’ve optimized it a bit by having two distributions: one for images and one for resources. A page that needs both will load faster, since the browser can download from the two distributions’ host names in parallel.

You can create CloudFront distributions through code, or through Amazon’s web management portal.

Create CloudFront Distribution

Create CloudFront Distribution - Completed

Since you can control the public URL of the distributions, you will notice, if you view the properties of the images on my website, that my images are served from “http://images.clockworkwf.com” and my resource files from “http://resources.clockworkwf.com”. In other words, I have full control over which host names I give them. Most people will never know these pictures are being served from Amazon.
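Pointing pages at the two distributions can be done with a small helper that picks a host name by file type. This is a hypothetical sketch: the `CdnUrls` class and its list of image extensions are assumptions, while the two host names are the ones given above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Maps a site-relative asset path to one of the two CDN host names.
// The set of "image" extensions is an assumption for illustration.
public static class CdnUrls
{
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".png", ".jpg", ".jpeg", ".gif", ".ico" };

    public static string ToCdnUrl(string relativePath)
    {
        string path = relativePath.TrimStart('/');
        string host = ImageExtensions.Contains(Path.GetExtension(path))
            ? "http://images.clockworkwf.com"
            : "http://resources.clockworkwf.com";
        return host + "/" + path;
    }
}
```

Centralizing the mapping in one place means a distribution can later be renamed or split without touching every page.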

I’ve noticed the website loads really quickly, so CloudFront seems to make a big difference.

EC2 Hosting Challenges

So that’s the high-level architecture. Using Amazon as a host has a number of implications I’d like to talk about.

Server Goes Up, Server Goes Down

To begin with, you have to assume that at any moment your server will go down. If your server dies, it vanishes, and you have to “spin up” another one, using the web interface or code. It’s very easy to do from the web console: just click “Launch Instance” and you can pick any server ranging from Ubuntu Linux to Windows Server 2003 R2 Enterprise 64-bit.

Launching a new instance of ec2 With CloudWatch

Although the server instances you can use have their own hard drive space on C: and D: drives, you have to treat that as transitory storage.

I’ve set up my system to use an Elastic Block Store (EBS) volume provided by Amazon. This is more permanent drive space that you pay for, and it can be attached to any server instance in the same Availability Zone. Think of it as a SAN (that’s probably what it is).

So I’ve got my database and web files on this EBS block, which I then mount to any server instance I’m currently running.

On the server instance, I simply point IIS web server to the EBS block files, and away we go.

The EBS can be any size you like, and you pay per GB per month. Right now I’m using 10GB since my log files and database don’t take up much room. I can add more space later if I need to.

Here’s a screenshot of that EBS volume, in the Amazon web console.

Allocate Elastic Block Storage Instance

Dynamic DNS Entries

Next problem: since the server can go down at any moment, DNS is a problem. If my server dies and I spin another one up, it will be given its own IP address, which my DNS entry for www.clockworkwf.com wouldn’t know about. So there might be a long delay while the DNS change propagates and cached records for the old IP address expire.

So, I’m using a Dynamic DNS service called Nettica. They have a management console where I can enter my various domain records and assign a short Time To Live (TTL), which means the DNS entries update frequently. So if my server dies, I can change the entry in Nettica to point to the new server’s IP address, and within a few seconds requests are going back to the right place.
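The reason a short TTL helps: resolvers cache an answer and keep returning it until the TTL runs out, so the TTL bounds how long clients can keep seeing the old IP address. A minimal C# model of that caching behaviour follows; the `TtlCache` class is hypothetical and is neither Nettica's code nor a real resolver.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of a caching DNS resolver: each answer is reused until its TTL expires.
public sealed class TtlCache
{
    private readonly Dictionary<string, (string Address, DateTime ExpiresAt)> _entries =
        new Dictionary<string, (string Address, DateTime ExpiresAt)>(StringComparer.OrdinalIgnoreCase);

    public void Store(string host, string address, TimeSpan ttl, DateTime now) =>
        _entries[host] = (address, now + ttl);

    // Returns the cached address, or null when the entry is missing or stale,
    // in which case a real resolver would query the authoritative server again.
    public string Lookup(string host, DateTime now)
    {
        if (_entries.TryGetValue(host, out var entry) && now < entry.ExpiresAt)
            return entry.Address;
        return null;
    }
}
```

With a 60-second TTL, a well-behaved resolver stops handing out the old address about a minute after it cached it. Some resolvers do not honour very short TTLs, so treat this as a best case.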

Nettica even allows me to control all of this through C# code. Going forward, I plan to write PowerShell server management scripts that can automatically spin up a new server on Amazon, determine its IP address, and register that with Nettica.
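A script like that boils down to a few ordered steps. The sketch below assumes two hypothetical interfaces, `ICloudHost` and `IDnsProvider`, standing in for whatever EC2 and Nettica client code ends up being used; none of their member names come from a real SDK.

```csharp
using System;

// Hypothetical abstractions over the hosting and DNS providers.
// None of these member names come from the AWS SDK or Nettica's API.
public interface ICloudHost
{
    string LaunchInstance(string imageId);                 // returns an instance id
    void AttachVolume(string volumeId, string instanceId); // e.g. the EBS data volume
    string GetPublicIp(string instanceId);
}

public interface IDnsProvider
{
    void UpdateARecord(string hostName, string ipAddress, int ttlSeconds);
}

public static class Recovery
{
    // Replace a dead web server: launch, re-attach persistent storage, repoint DNS.
    public static string ReplaceServer(ICloudHost cloud, IDnsProvider dns,
        string imageId, string dataVolumeId, string hostName)
    {
        string instanceId = cloud.LaunchInstance(imageId);
        cloud.AttachVolume(dataVolumeId, instanceId);
        string ip = cloud.GetPublicIp(instanceId);
        dns.UpdateARecord(hostName, ip, ttlSeconds: 60);
        return ip;
    }
}
```

A real script would also have to wait until the new instance is running before attaching the volume, and if a fixed IP address is assigned to the new instance, the DNS update could be skipped entirely.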

Incidentally, Amazon EC2 offers what it calls “Elastic IP Addresses”. Essentially you “rent” a fixed IP address, which can be dynamically associated with a server instance. So, in the short run this makes life easier for me: I have rented one, used it for my Nettica domain name record, and can assign this fixed IP to any new server instance.

Allocate IP Instance

Next problem: Disaster Recovery.

Disaster Recovery is even more important in the Amazon EC2 world than elsewhere, since again your instances could die at any moment. Not that they will, but the point is, they are “virtual” and Amazon isn’t making any promises (unless you buy a Service Level Agreement from them).

However, Amazon’s EC2 provides a level of DR by its very nature – you can spin up another machine in a small amount of time. Estimates for new Windows instances are about 20 minutes.
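A back-of-envelope estimate of how long the site could be down under this approach adds up detection time, launch time, and the DNS TTL. The five-minute detection time and one-minute TTL in the usage are assumptions; only the 20-minute launch estimate comes from the paragraph above.

```csharp
using System;

// Worst-case outage for the "spin up a replacement" strategy:
// time to notice the failure + time to launch + time for cached DNS to expire.
public static class RecoveryEstimate
{
    public static TimeSpan WorstCaseOutage(TimeSpan detection, TimeSpan launch, TimeSpan dnsTtl) =>
        detection + launch + dnsTtl;
}
```

With 5 minutes to notice, 20 minutes to launch, and a 60-second TTL, the worst case is about 26 minutes, which is dominated by the launch time rather than DNS.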

There’s also something called an Availability Zone. Roughly, it means “data centre”: each Amazon Region (US East, Europe, and so on) contains several isolated Availability Zones, so you can spread your servers across zones and even across Regions. So when that dinosaur-killing comet hits North America, the European Region keeps chugging.

Right now I’m not really doing much with my database, so DR isn’t such an issue. I have some protection since my files are on an EBS volume. However, eventually I’ll set up a second server in another Availability Zone and load balance the two.

Another Challenge: Price

Amazon Web Services are flexible, and you are charged per hour, for only what you use. This is an amazing model but it doesn’t work so well for website hosting, because of course your servers are supposed to be online 24/7, 365 days a year.

It’s hard to tell for sure what the annual bill will be, but for my small server instance (remember, only 1.7 GB of RAM) it will cost well over $1,000. That’s a lot more than shared space on a regular hosting provider. However, I’m willing to pay this for the flexibility I get, and also because I think Amazon Web Services are a strategic advantage, so the earlier I learn about them, the more business opportunities I might unlock.
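The arithmetic behind that estimate is simple, because an always-on server is billed for all 8,760 hours in a year. The sketch below uses a hypothetical rate of $0.12 per hour, and the optional storage term covers something like the 10 GB EBS volume; check current AWS pricing rather than relying on these figures.

```csharp
using System;

public static class HostingCost
{
    public const int HoursPerYear = 24 * 365; // 8,760 hours of always-on service

    // Annual cost of always-on instances plus persistent storage billed per GB-month.
    // All rates are caller-supplied inputs, not real AWS prices.
    public static decimal Annual(decimal hourlyRate, int instances,
                                 decimal storageGb = 0m, decimal ratePerGbMonth = 0m) =>
        hourlyRate * HoursPerYear * instances + storageGb * ratePerGbMonth * 12;
}
```

At $0.12 per hour, one instance comes to 0.12 × 8,760 = $1,051.20 a year before storage, bandwidth, or monitoring charges; adding a second load-balanced instance doubles the compute part.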

One good thing is that Amazon has been aggressively dropping its prices as it improves its services. Additionally, they have started offering “Reserved Instances”, basically a pre-pay option for 1-year and 3-year terms. Unfortunately these are only for Linux servers at the moment, but hopefully they will add them for Windows and then I can save money year on year.
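Whether a prepaid term pays off depends on how many hours you actually run the server. A minimal sketch of the break-even point, with every price a hypothetical input:

```csharp
using System;

public static class ReservedPricing
{
    // Hours of use after which paying an upfront fee plus a lower hourly rate
    // becomes cheaper than paying the on-demand rate. Prices are hypothetical inputs.
    public static decimal BreakEvenHours(decimal upfrontFee, decimal onDemandHourly, decimal reservedHourly)
    {
        decimal savingPerHour = onDemandHourly - reservedHourly;
        if (savingPerHour <= 0m)
            throw new ArgumentException("The reserved hourly rate must be lower than the on-demand rate.");
        return upfrontFee / savingPerHour;
    }
}
```

For example, a $300 upfront fee that cuts the hourly rate from $0.10 to $0.04 breaks even after 300 / 0.06 = 5,000 hours, well within the 8,760 hours of an always-on year.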

CloudWatch Monitoring

Amazon offers a web-based monitoring option for its server instances, called CloudWatch. I’ve started using it (for an additional fee) but I’m not sold on its usefulness yet, and I don’t think I’m using it to its full potential: it is supposed to help you manage server issues by monitoring thresholds.

ec2 Cloud Monitoring
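Threshold monitoring usually means alarming only when a metric stays above a limit for several consecutive samples, so a single spike does not trigger anything. A generic C# sketch of that rule follows; the method and its parameters are illustrative, and this is not Amazon's implementation.

```csharp
using System;
using System.Collections.Generic;

// Raise an alarm only when a metric stays above the threshold for
// `consecutive` samples in a row; any sample at or below it resets the run.
public static class ThresholdAlarm
{
    public static bool ShouldAlarm(IEnumerable<double> samples, double threshold, int consecutive)
    {
        int run = 0;
        foreach (double value in samples)
        {
            run = value > threshold ? run + 1 : 0;
            if (run >= consecutive) return true;
        }
        return false;
    }
}
```

For CPU utilization sampled every few minutes, requiring three consecutive readings above 90% filters out the brief spikes caused by things like application start-up.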

Managing S3 Files Using Cloudberry Explorer

I needed an easy way to create and manage my buckets, CloudFront distributions, and S3 files. I found Cloudberry Explorer, and downloaded the free version of it. I was able to drag and drop 1600 files from my Software Development Kit to the S3 bucket where I’m serving the resources. Super!

There’s a Pro version I might purchase which would allow me to set gzip compression and other properties on the files. This would help lower my bandwidth costs and speed up the transfer a bit.
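Compressing text assets before upload is straightforward with .NET's built-in `GZipStream`. This is a sketch under one important assumption: the compressed object must then be stored with a `Content-Encoding: gzip` header so browsers know to decompress it. The `GzipAssets` class is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipAssets
{
    // Compress a text asset (CSS, JavaScript) before uploading it.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // The GZipStream must be disposed first so the gzip footer is written;
            // MemoryStream.ToArray still works after the stream is closed.
            return output.ToArray();
        }
    }

    // Round-trip helper, handy for checking the uploaded bytes are valid gzip.
    public static string Decompress(byte[] data)
    {
        using (var input = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Repetitive text such as CSS and JavaScript usually shrinks considerably, which is where the bandwidth saving comes from.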

Here’s a screenshot of Cloudberry in action:

Cloudberry Amazon S3 Explorer

I love how easy it is to set up and use Amazon’s web services stack. I think they have a great business model for the Cloud, and they’re the company to beat. I’m willing to rely on them for the launch of Clockwork Web Framework, and so far I haven’t been disappointed.

Sunday, November 01, 2009

Introducing Clockwork Web Framework for .NET

In 2003, I read a book, “Making Space Happen”, by Paula Berinstein. It’s about the efforts of entrepreneurs to open up space to the public. It’s the kind of thing that gets my propeller-head spinning, and after reading it I resolved to create the best website on space travel on the internet.

So, I sat down in a park and within two hours I had covered several sheets of paper with scribbles and scrawls of what my website needed. I had notes on authentication, web components, search boxes, themes, dynamic images, language toggles, and all kinds of stuff.

Being a good little programmer, the more I designed, the more intricate the design became, and pretty soon I was knee-deep in code. Fast forward six years, and I have yet to write a single page of that space website!

But I do have a web framework :)

What It Is

Clockwork makes it easy to build powerful .NET web sites. It’s completely free, open source (under the Apache 2 license) and you can use it in proprietary or open source projects, as you like.

Some of the ways it makes web development easy:

  • Database-agnostic data access
  • Dynamically displays content in different languages
  • Leverages the .NET 3.5 framework, including the Provider Model, generics, LINQ, automatic properties, and more
  • Integrates with popular web services such as those provided by UserVoice, LinkedIn, Google and Yahoo!
  • Makes it really easy to use object-oriented design patterns like Dependency Injection / Inversion of Control, Repositories, and Specifications

Under the hood I use many popular components, including NHibernate for database access, Castle Windsor for Dependency Injection, and log4net for logging.
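For readers new to the Repository and Specification patterns mentioned above, here is a generic, self-contained C# sketch. It is not Clockwork's API: the class names are hypothetical, and an in-memory list stands in for an NHibernate-backed store. Because the specification holds an `Expression`, a LINQ provider could translate it into SQL rather than filtering in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Specification: a named, reusable query predicate.
public class Specification<T>
{
    public Expression<Func<T, bool>> Predicate { get; }
    public Specification(Expression<Func<T, bool>> predicate) => Predicate = predicate;
}

// Repository contract; an NHibernate-backed class could implement the same interface.
public interface IRepository<T>
{
    void Add(T item);
    IList<T> FindAll(Specification<T> spec);
}

// In-memory implementation that keeps the example self-contained.
public class InMemoryRepository<T> : IRepository<T>
{
    private readonly List<T> _items = new List<T>();
    public void Add(T item) => _items.Add(item);
    public IList<T> FindAll(Specification<T> spec) =>
        _items.AsQueryable().Where(spec.Predicate).ToList();
}

// Hypothetical entity, echoing the WebPage class used in the NHibernate post.
public class WebPage
{
    public string VirtualPath { get; set; } = "";
}
```

In a real application, a Dependency Injection container such as Castle Windsor would supply the `IRepository<T>` implementation, so calling code never depends on the concrete class.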

Although today marks the official public release, the framework is currently at version 3.x because I’ve been using earlier versions of it in production websites since 2004.

I’ve built Clockwork using as many web standards as I can find, as many of the latest .NET elements as possible, software best practices, and a lot of love and stubbornness.

What It Will Become

Well, it’s obviously too early to say. But I am committed to continuing to develop it, I have a long list of things I plan to add, and I’m hopeful a community of .NET developers will adopt it and push it into areas I can’t even imagine today.

Please take a minute to visit the website and learn more about it. I hope you find it helpful.

Many thanks,

Nick

Monday, October 19, 2009

Central Administration in SharePoint 2010

Here’s a quick lap around the new Central Administration console in SharePoint 2010.

New Central Administration Layout

Central Administration

The navigation structure is broken down a little more than in 2007. There is no more “Operations” and “Application Management” divide; instead, the new console is divided into the following sections:

  • Application Management: Manage site collections, web applications, content databases, and the new service applications
  • System Settings: Manage servers, features, solutions, and farm-wide settings
  • Monitoring: Track, view and report the health and status of your SharePoint farms
  • Backup and Restore: Perform backups and restores
  • Security: Manage settings for users, policy, and global security
  • Upgrade and Migration: Upgrade SharePoint, add licenses, enable Enterprise Features
  • General Application Settings: Anything that doesn’t fit into one of the other sections
  • Configuration Wizards: Wizards that help set up or modify the farm

This new layout is an improvement: the “Operations” and “Application Management” tabs in 2007 always felt a bit arbitrary, and it wasn’t always clear which tasks went where.

Monitoring

This is quite useful – basically you can take the heartbeat of SharePoint and its services via reports, and view problems and solutions. Here’s a screenshot of the interface:

Central Administration - Monitoring

There are only a couple of reports right now, which tell you which pages loaded the slowest, and which users are the most active. I imagine for release there will be many more.

Central Administration - Monitoring - Health Reports

The problem and solution report is very helpful in identifying which services are failing on which servers, and why. Notice that this report has detailed information about one of the failing services, in this case Visio, and links to remedy it.

Central Administration - Monitoring - Problem Report

Surfacing common errors in this way will go a long way to reducing the IT administrative burden of SharePoint. I hope Microsoft is active in populating this report engine (or provides a way for the community to modify it).

Usage logging settings are in here as well.

Service Applications

Central Administration - Application Management

These new plug-and-play replacements for the Shared Service Provider are major wins for the new SharePoint version. They allow an organization to really customize its farm based on its needs and even usage patterns. Services that need lots of performance and support can get it, while less useful services can have reduced resources or even be turned off altogether. Everybody’s SharePoint 2007 farm looked alike, but going forward it is likely that no two farms will be alike.

Of course, to manage this, Microsoft has to surface the available services and their settings in Central Administration. This screenshot gives an indication of just how many services can be used.

Central Administration - Manage Service Applications
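A rough mental model of this plug-and-play design: front ends depend only on a common interface, and each farm decides which implementations are provisioned. The types below are hypothetical and are not part of the SharePoint object model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract every plugged-in service exposes to the web front ends.
public interface IFarmService
{
    string Name { get; }
    string Handle(string request);
}

// Hypothetical per-farm registry: services can be added or removed,
// and callers only ever see the IFarmService interface.
public sealed class ServiceRegistry
{
    private readonly Dictionary<string, IFarmService> _services =
        new Dictionary<string, IFarmService>(StringComparer.OrdinalIgnoreCase);

    public void Register(IFarmService service) => _services[service.Name] = service;
    public bool Unregister(string name) => _services.Remove(name);

    public string Call(string name, string request) =>
        _services.TryGetValue(name, out var service)
            ? service.Handle(request)
            : throw new InvalidOperationException($"Service '{name}' is not provisioned in this farm.");
}
```

Turning a service off in one farm is then just a matter of not registering it, while the front-end code stays the same.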

Export Sites and Lists

Now you can export site and list data right from SharePoint! It’s straightforward with the new Backup and Restore section, which allows full Farm Backups and Restores along with far more granular backup. The backup can include full security including site users, as well as version history information for each item in the list.

I doubt this will replace the need for 3rd party backup software but it’s another tool for IT Admins.

Here I am backing up a Calendar from a site to file.

Central Administration - Site or List Export

The new service architecture of SharePoint is one of the most exciting things about it, and obviously required a bit of a Central Administration retooling. That provided an opportunity for some other quick wins, including a much more intuitive navigation structure and some neat monitoring tasks. More evidence that SharePoint 2010 is building on, but not replacing, the core strengths of 2007.

When SharePoint 2010 Met Web 2.0

One of the goals in SharePoint 2010 was to make it easier for users to update their information and pages without lots of postbacks, clicking, and delays. Accordingly, Microsoft has invested a lot in improving the web user interface.

One way they have done this is by adding the Office Ribbon concept to SharePoint. I think this has to be a first for a web application, and to be honest while I saw the value in Office 2007, I wasn’t sold on it for a web interface.

I think the major weakness of the Ribbon concept is that you can spend a fair amount of time trying to remember what command belongs to what tab. As well, it doesn’t always save clicks. More on that in a moment.

The other major investment Microsoft made is adding AJAX. This is a no-brainer and a hands-down winner for me. I’ve attached some screenshots to show how you would modify a page in the new UI.

Let’s imagine you want to modify a team site:

Step 1: You are in the Browse tab of the Ribbon (up top); choose the Edit tab.

New Team Site - Browse Ribbon

New Team Site - Edit Ribbon

To edit, click “Edit”, which is one of the buttons on the Edit tab. Then click on the area of the page you want, type some text in, and click Stop Editing. Are we saving clicks yet? :)

New Team Site - Edit Page

Well, not so far, but there weren’t any postbacks, so overall I think there’s some time saving here. An important benefit from a training perspective is that the server and the Office client products now share the same Ribbon-based user experience, which is a big win.

As well, there are some nice new options, including an XHTML converter. And did I mention this all works flawlessly in Firefox? Web standards, hooray!

You can also insert new web parts via the Insert section of the Edit Ribbon:

New Team Site - Insert Web Part

Of course, the context-based Ribbon experience continues when managing lists and libraries. Here’s a screenshot of the out of the box Shared Documents library’s two important ribbons, Documents and Library:

New Team Site - Shared Documents Library - Documents Ribbon

New Team Site - Shared Documents Library - Library Ribbon

Finally, tagging and sharing are major concepts in Web 2.0, and SharePoint 2010 addresses this by surfacing sharing activities through the Ribbon. Content can be easily tagged. Tags can be private or public and are automatically added to a suggested set so that users can share tags.

New Team Site - Share and Track Ribbon

New Team Site - My Tags

Tagging is also part of a user’s Activity Stream (not sure what the official term is). You can see on my profile that I tagged an element.

My Profile - Tags and Notes

I’m not showing it here, but there is also an Enterprise Metadata service that allows an organization to centrally control its taxonomy. So, now you can make peace between the “folksonomy” and “centralized taxonomy” gangs in your office!

All in all these UI improvements are icing on the SharePoint 2007 cake. I’m not sure they are enough by themselves to encourage SharePoint 2007 customers to upgrade (I think there are better reasons to upgrade), but somebody with 2003 or without SharePoint at all might now take the plunge. However, these are welcome additions to an already great product.

Although I’m not convinced the ribbon will save clicks, and will certainly take some retraining and familiarization time, it at least is consistent with the Office clients, making for tighter integration. The AJAX-style UI is a big win, and the inclusion of some interesting tagging and sharing features brings SharePoint up-to-date with the Web 2.0 world.

Things To Get Excited About In SharePoint 2010

Now that Microsoft’s lifted the TAP NDA and is presenting SharePoint 2010 publicly at the SharePoint Conference in Las Vegas, there will be a spurt of queued up blog posts on the net :)

Here are some things I’ve been very excited about, in no particular order. They are fairly developer-centric.

  • Ability to develop against the SharePoint dlls on a developer desktop! ‘Nuff said.
  • Developer Dashboard – makes it easy to see tracing information and web server details when you are working on a SharePoint site.
  • LINQ to SharePoint – this is some nice syntactic sugar that helps replace CAML a little bit. You can create strongly typed SharePoint entities using a utility called SPMetal and then query and manipulate the data in them using standard LINQ syntax. I was hopefully predicting this in another post.
  • Visual Studio 2010 integration – VS2010 will have a lot more tools to make SP2010 development a snap. SharePoint Project and Item Templates, Feature Designer, and Project Packaging, will hide most of the messy details of creating, packaging, and deploying a SharePoint solution from the developers.
  • Business Connectivity Services – the next level of the Business Data Catalogue. BCS uses External Content Types which look a bit like Content Types, and are defined in the new SharePoint Designer or in Visual Studio and then added to SharePoint using a definition file (a bit like the BDC currently works). Users can then create External Lists in their sites, which pull in the data from these external sources.
  • Client Object Model – an abstraction layer that allows developers to write code that will work in client .NET applications, JavaScript (for AJAX-type operations), and Silverlight. Basically this is a disconnected, batch-style API that sends its requests to a new SharePoint WCF service (client.svc) and handles requests and responses using XML and JSON.
  • SharePoint 2010 Designer – Whereas SPD 2007 was a warmed-over FrontPage, the new version has been rebuilt with a focus purely on SharePoint. The new navigation panel is great because it shows you a list of SharePoint objects, such as Entities, Lists, Master Pages, and Workflows. What’s great about this is it keeps you thinking about what you are trying to do in SharePoint, rather than where that command used to be hidden in SPD. Another big win is you can export your SPD changes as a .WSP file straight into Visual Studio for further customization.
  • The Office Ribbon makes it into SharePoint. The Ribbon kind of grew on me in Office 2007. I think it was a clever paradigm to surface many commands that used to be buried. Now the many SharePoint menus and Site Action dropdowns will coalesce into the Ribbon. I think this will make training and support a little easier. The big weakness of the Ribbon is that you often have to remember which tab the commands belong in. I found that was the case with the new SharePoint Ribbon but after a little while you get used to it, and it becomes faster to modify SharePoint pages.
  • STSADM is dead, long live PowerShell! Leveraging the great new scripting environment is a huge win for SharePoint. The ability to write .NET code to manipulate the command pipeline means we will start to see some very powerful “no-touch” deployment and management options for SharePoint.
  • More events – now you can find out when your web or list was created or deleted. This may sound like a small feature but this enables some provisioning and discovery scenarios that in SP2007 were not even possible!
  • Enterprise Metadata Manager. I’ve blogged a lot about the importance of governance and centralizing metadata. The new Enterprise Metadata Manager makes it easy to import and manage term sets, keywords, and tags.
  • Service Application Architecture – the Shared Service Provider was a good idea but it was a bit hard to use in practice. Under the new architecture, you can create Service Applications for things like Excel Services, Forms Services, Business Connectivity Services, and other services that you build or buy, and you can mix and match these in your farms as you like. The services get consumed by web front ends via a standard interface. This should allow a lot of plug-and-play customization of farms. I’m even wondering if there is an opportunity for vendors here…create some services and expose them to clients from the cloud.
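The “disconnected, batch-style” idea behind the Client Object Model can be sketched generically: operations are queued locally and only sent when the batch is executed, so several reads or updates cost a single round trip. The `BatchingClient` class is hypothetical; in the real client library the same pattern shows up as calls to `ClientContext.Load` followed by `ClientContext.ExecuteQuery`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical batching client: operations are queued and sent together.
public sealed class BatchingClient
{
    private readonly List<string> _pending = new List<string>();
    public int RoundTrips { get; private set; }

    public void Enqueue(string operation) => _pending.Add(operation);

    // Sends every queued operation in one (simulated) request and clears the queue.
    public IReadOnlyList<string> Execute()
    {
        if (_pending.Count == 0) return Array.Empty<string>();
        RoundTrips++;
        var sent = _pending.ToArray();
        _pending.Clear();
        return sent;
    }
}
```

The design choice matters most for JavaScript and Silverlight callers, where every round trip to the server is expensive compared with queuing work locally.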

There are some other big changes like Claims Based Authentication and Solution sandboxing which are intriguing to me. The Solution sandboxing feature gives me this sneaking suspicion we will one day soon see a Microsoft SharePoint App Store where we can buy, download and run SharePoint solutions in our farms.

Anyway, there’s a lot of exciting new stuff in SharePoint and I think SharePoint development is about to become really fun!

Monday, October 12, 2009

SharePoint: A Product and a Platform

SetFocus just published another of my articles for their Technical Articles section. This one is called “SharePoint: A Product and a Platform”, and discusses the implications of SharePoint as a software platform.

My conclusion is that the platform provides significant benefits, including a unified development environment and reduced maintenance, development, support, and training costs, but may increase the risk of vendor lock-in.

I’ve written for SetFocus before because I have a long association with them, dating back a decade. I had my Java certification training and first job placement through them. For the past year I’ve been developing and teaching parts of their SharePoint programming classes for the SharePoint Master’s Program (I’m instructing evening classes again starting this Saturday).

You can read more at http://www.setfocus.com/TechnicalArticles/Articles/sharepointproductandplatform.aspx. I hope you enjoy it and welcome your feedback!

P.S. The article is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License which means you can modify it and share it around!

Wednesday, August 19, 2009

NHibernate Performance Profiling with NHProf

NHibernate

I’ve been using NHibernate a lot recently. It’s an Object Relational Mapping (ORM) library that makes it easy to “map” between relational database tables and standard C# object models. The goal is to talk to databases by writing code like this:


// session is an open NHibernate ISession; WebPage is a mapped entity class
var query = session.CreateQuery("from WebPage p where p.VirtualPath like :path")
                   .SetString("path", "%pages%");
IList<WebPage> list = query.List<WebPage>();

Now behind the scenes, there’s a relational database somewhere – and transactions, and validation, and syntax parsing, and query analyzing, and all of that standard relational database stuff – but as a programmer I just need to know about my object model and it will return me a list of WebPage objects and I can easily use them in my code, update them, and delete them. Shiny!

NHibernate is a straight port of Hibernate, from the Java world where it originally evolved many years ago. So the concepts behind it have been field-tested in both Java and (now) .NET shops. This makes it a very robust ORM tool. Did I mention it is completely FREE?

While it’s amazing software, it comes with a big learning curve. There isn’t much documentation out there – and most of it is on blogs and wikis. I’ve bought Manning’s NHibernate In Action and that helps a bit. However, there isn’t much information on common performance and configuration traps.

Learning and Analyzing With NHProf

So I was glad to find out about NHProf, a profiling and analyzing tool for NHibernate created by one of NHibernate’s main developers (Oren Eini aka Ayende Rahien). His colleagues are Christopher Bennage and Rob Eisenberg.

Essentially the profiler is a slick-looking Windows Presentation Foundation executable that “records” your application as it writes statistical data to the NHibernate log file, then provides a graphical view of the various things that are going on under the hood.

The interface is well thought out, with only a few tabs and windows, so the information is easy to sort through. Here’s a screenshot of the main interface:

NHProf Main Interface

What I Like About It

Now to the things I really like about this software:

First, you can see the exact SQL query that NHibernate is generating. Straightforward, but critical. There is a related Stack Trace which allows you to jump to the part of your code where you executed this statement.

As well, you can view the rows that are returned by any query. This makes it easy to see exactly what data you are getting back – a much-needed sanity check at times :)

 NHProf View Results

Each NHibernate action is evaluated against known best practices (or bad practices) and you get “Alerts” that can provide more information on what to do (or not do).

For example, while running some recent queries, I received the following alert: “Too many cache calls per session”.

NHProf Alerts - Small 

This leads me to the final element that I LOVE – the “read more” and “NHibernate Guidance” features. Software is so complicated that I just want to get it working most of the time – but I know that if I really understood it, I would avoid a lot of bugs and future issues.

So what makes this software shine for me is the care that has gone into helping people learn NHibernate. By clicking “read more”, you go straight to a web page that teaches you about that particular error and ways to avoid it, including code samples!

 NHProf Alerts - Learn More

As well, there is a “Guidance” option that you can always access to learn about general NHibernate performance issues such as “Select N+1” or “Unbounded Result Set”. I’ve already applied the lessons from “Unbounded Result Set” and “Do Not Use Implicit Transactions” to my code and the result is much better performance and stability.

NHProf NHibernate Guidance
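To see why “Select N+1” hurts, here is a self-contained simulation that only counts queries against a hypothetical in-memory data source; it does not use NHibernate. Loading each blog’s posts one at a time costs one query for the list plus one per blog, while a single batched fetch needs two queries however many blogs there are. In NHibernate itself the usual remedies are join fetching or batch fetching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data source that counts how many "queries" it serves.
public sealed class CountingDataSource
{
    private readonly Dictionary<int, List<string>> _postsByBlog;
    public int QueryCount { get; private set; }

    public CountingDataSource(Dictionary<int, List<string>> postsByBlog) => _postsByBlog = postsByBlog;

    public List<int> GetBlogIds() { QueryCount++; return _postsByBlog.Keys.ToList(); }

    public List<string> GetPosts(int blogId) { QueryCount++; return _postsByBlog[blogId]; }

    // One query returning posts for many blogs at once (like a join or an IN clause).
    public Dictionary<int, List<string>> GetPostsForBlogs(IEnumerable<int> blogIds)
    {
        QueryCount++;
        return blogIds.ToDictionary(id => id, id => _postsByBlog[id]);
    }
}

public static class NPlusOneDemo
{
    public static int LazyLoad(CountingDataSource db)
    {
        foreach (int id in db.GetBlogIds()) db.GetPosts(id); // 1 + N queries
        return db.QueryCount;
    }

    public static int EagerLoad(CountingDataSource db)
    {
        db.GetPostsForBlogs(db.GetBlogIds()); // 2 queries regardless of N
        return db.QueryCount;
    }
}
```

With 10 blogs the lazy version issues 11 queries and the batched version 2; with 1,000 blogs the gap becomes 1,001 versus 2, which is why the profiler flags the pattern.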

One thing I would like is the ability to hover over a statement’s alert icon in the main window and see a tooltip with the alert message. At the moment you can see the icon showing the alert, but then have to click on the statement and then on the “Alerts” tab at the bottom to see what it’s for.

NHProf is still in final beta but I have been using it for about a month and have found it to be very stable. I just bought my copy – there is a discount right now before it hits RTM and I think it has already been worth the money.

I would recommend this to anybody using NHibernate.

Thursday, July 16, 2009

Data Splunking

I’ve had my head down for the last couple of months, churning out code for that elusive framework I keep hinting at :) Right now I’m staying in a trailer with no TV, internet, or cell phone coverage and I’ve never been more productive (says I).

Still, I thought I would pop up briefly to mention a cool IT tool that can provide you with a centralized, browser-based repository to search on all the millions of log files, event viewers, and databases that are inevitably scattered around any company’s data centres.

It’s called splunk. Its name is clever: users get to spelunk into their data silos and see what’s there. It’s a simple, single-package install that runs on most desktop machines and servers. There’s a free version if you index less than 500 MB of data per day, and enterprises can pay to index larger volumes. I’m running the free version on my 64-bit Vista box and it indexes and searches like a little champ.

In my case I’ve been using it on my framework log files to help analyze bugs and performance bottlenecks. Here’s a screenshot of a search on the keyword “nhibernate” (NHibernate is an Object Relational Mapping software):

Splunk Log Files

As you can see, it quickly pops up all the logged events where NHibernate was called from my classes.
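Keyword searches like this are fast because the text is indexed up front. A toy inverted index in C# shows the general technique; it is not splunk’s internal design, and the word separators are arbitrary choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index: maps each word (case-insensitively) to the numbers of the
// log lines containing it, so a keyword search avoids rescanning every line.
public sealed class LogIndex
{
    private static readonly char[] Separators = { ' ', '\t', ':', ',', '.', '[', ']', '(', ')', '=' };
    private readonly List<string> _lines = new List<string>();
    private readonly Dictionary<string, SortedSet<int>> _index =
        new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string line)
    {
        int lineNo = _lines.Count;
        _lines.Add(line);
        foreach (string word in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_index.TryGetValue(word, out var postings))
                _index[word] = postings = new SortedSet<int>();
            postings.Add(lineNo);
        }
    }

    public IEnumerable<string> Search(string keyword) =>
        _index.TryGetValue(keyword, out var postings)
            ? postings.Select(i => _lines[i])
            : Enumerable.Empty<string>();
}
```

Splitting on '.' means a logger name such as NHibernate.SQL is found by a search for “nhibernate”.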

To get this to work, all I had to do was add an “Input” for splunk to index – in this case the full path to my log file folder.

As you would expect, it does lots of reporting. It has broken down my log files into various columns. Examples of these columns are: custom C# properties I search on; the standard log file “stuff” such as the source name, date created, file size; even the sql commands that NHibernate generates for me. I can filter these columns for even more detailed breakdowns. In the next screenshot I am reporting on Entity ID values I use to track my framework objects.
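Turning log lines into filterable columns comes down to field extraction. A minimal sketch, assuming a `key=value` logging convention with optional quoted values; the `FieldExtractor` class is hypothetical and is not splunk’s extraction engine.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pulls key=value pairs out of a log line so they can be filtered like columns.
// The key=value convention is an assumption about the log format.
public static class FieldExtractor
{
    // A value is either a double-quoted string or a run of non-space characters.
    private static readonly Regex Pair = new Regex(@"(\w+)=(""[^""]*""|\S+)", RegexOptions.Compiled);

    public static Dictionary<string, string> Extract(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(line))
            fields[m.Groups[1].Value] = m.Groups[2].Value.Trim('"');
        return fields;
    }
}
```

Logging values such as an entity ID in this form is what makes column-style reports like the one below possible without custom parsing.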

Splunk Log Files - Report

I like splunk because it’s a one-stop shop for me to analyze all my various bits of IT Operations information. There’s a slick AJAX web user interface, and so far performance seems fine for me on my little dev laptop. I find it solid, intuitive, and I don’t have to expend much effort to install, manage, or learn it.

There’s also a way to extend splunk using its custom Application Programming Interface. I plan to investigate that when I have some free time but have not had a look yet.

I think any IT company should give splunk a test run.