I've been battling all week with space issues in Geocortex Optimizer. For those of you not familiar with it, Geocortex Optimizer is our new product that collects information about your ArcGIS Server installation and presents it to you in a number of ways so that you can make good management decisions about your GIS investment.

When you install Optimizer for the first time, it examines your system, looking for log files from both IIS and ArcGIS Server. It then reads these log files, massages them, and stores them in a SQL Server database for further analysis. The problem is that the amount of information collected can be huge, particularly if you have a busy site that has been running for a while. On one server we tested, we processed more than 8 million records in a few hours before we ran into a capacity bottleneck.

One of Optimizer's features is that it will not lose data if the SQL Server database it is using is down. When this happens, Optimizer quietly serializes to disk the data it would have written to the database. When the database becomes available again, it writes those saved records back. The problem I discovered is that .NET DataSets serialized to XML are really large. I considered using binary serialization, but then a coworker suggested that I serialize to a GZip stream instead of the regular file stream I had been using. This sounded promising, because database data usually compresses about 20 to 1. I had no idea how easy it would be. Check out the following snippet, taken from Optimizer:

using System.IO;
using System.IO.Compression;

// Write the DataSet's XML (schema included) through a GZip
// compression stream wrapped around the output file.
using (Stream stream = new FileStream(filename, FileMode.Create))
{
  using (Stream streamOut = new GZipStream(stream, CompressionMode.Compress))
  {
    ds.WriteXml(streamOut, XmlWriteMode.WriteSchema);
  }
}
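For completeness, here's a sketch of the reverse operation, reading the saved records back into a DataSet once the database is reachable again. This isn't the exact Optimizer code; the file name and the empty DataSet are placeholders, but GZipStream makes decompression just as simple:

using System.Data;
using System.IO;
using System.IO.Compression;

// Sketch only: "saved.xml.gz" is a placeholder, not the actual
// file name Optimizer uses for its saved records.
DataSet ds = new DataSet();
using (Stream stream = new FileStream("saved.xml.gz", FileMode.Open))
{
  using (Stream streamIn = new GZipStream(stream, CompressionMode.Decompress))
  {
    // ReadXml restores both the schema and the rows written earlier.
    ds.ReadXml(streamIn, XmlReadMode.ReadSchema);
  }
}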

These few lines of code resulted in a 40x space savings over uncompressed XML. I'd never used GZip streams before, but I'm beginning to think of all kinds of applications where they could really be useful.
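If you want to check the savings on your own data, a quick way is to write the same DataSet both ways and compare file sizes. A rough sketch (the file names here are placeholders, not anything from Optimizer):

// Write the DataSet once uncompressed and once through GZip.
ds.WriteXml("plain.xml", XmlWriteMode.WriteSchema);

using (Stream stream = new FileStream("compressed.xml.gz", FileMode.Create))
using (Stream streamOut = new GZipStream(stream, CompressionMode.Compress))
{
  ds.WriteXml(streamOut, XmlWriteMode.WriteSchema);
}

// Compare the on-disk sizes to get the compression ratio.
long plain = new FileInfo("plain.xml").Length;
long compressed = new FileInfo("compressed.xml.gz").Length;
Console.WriteLine("Ratio: {0:F1}x", (double)plain / compressed);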