I’ve received a bunch of compliments on the performance of the NanaimoMap MapGuide / Fusion application that the City of Nanaimo launched in beta last week.
There is a lot involved in making a web map perform well, especially if you are not leveraging tile caching. One part of the story is hardware, and I’m lucky enough to share space on a dual quad-core machine with 4GB of RAM and relatively fast disks. Another part is proper generalization of the vector data for display; there’s no point in carrying sub-micron precision on a map that will generally be displayed at 1:500 or smaller. And of course, there’s MapGuide’s inherent speed when properly configured. That list leaves out one of the most important parts, though: raster data.
Raster data is big, brutish and hard to work with, and optimizing raster access is often one of the most important parts of delivering a successful web map. Users have come to expect “satellite” imagery on their web maps, and they complain when it doesn’t perform as well as Google Maps. One of the best toolkits I have found for flipping and folding raster data is Frank Warmerdam’s FWTools, which wraps GDAL and several other utilities in a single, easy-to-use package.
My starting point consisted of:
- 79 TIFF + Worldfile images, 10cm resolution, about 1.1GB each
- 14 TIFF + Worldfile images, 30cm resolution, about 600MB each
So I was working with nearly 100GB of images, none of which were optimized for web-based display, and none of which contained the spatial reference information that the FDO Raster Provider (also based on GDAL!) works best with.
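As a quick sanity check on that figure, the total follows directly from the per-file sizes in the list. A minimal sketch; the 1.1GB and 600MB values are the approximate averages quoted above, not exact file sizes, and SOURCE_SETS and total_gb are hypothetical names:

```python
# Approximate source imagery volume, from the per-file sizes listed above.
# The sizes are rough averages, so the result is an estimate only.
SOURCE_SETS = {
    "10cm": (79, 1.1),  # (number of files, approx. GB per file)
    "30cm": (14, 0.6),
}


def total_gb(sets):
    """Sum file count * per-file size over every image set."""
    return sum(count * size_gb for count, size_gb in sets.values())


if __name__ == "__main__":
    print(f"{total_gb(SOURCE_SETS):.1f} GB")  # prints 95.3 GB
```

That works out to roughly 95GB, just shy of the round 100GB.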
The first thing I did was set up a batch process to optimize the individual images. This involved three steps:
1. Obtain a correct .prj file containing the WKT spatial reference information for my images. The easiest place for me to get this was SpatialReference.org, but you might just have one hanging around.
2. Reprocess the image into a tiled GeoTIFF with no compression and a relatively large internal block size, assigning the projection from the .prj file obtained above. The caret (^) is the DOS/Windows command-line continuation character:
gdal_translate ^
-co "TILED=YES" ^
-co "PROFILE=GEOTIFF" ^
-co "INTERLEAVE=BAND" ^
-co "BLOCKXSIZE=512" ^
-co "BLOCKYSIZE=512" ^
-a_srs utm83-10.prj ^
infile.tif ^
outfile.tif
You can obtain more information on gdal_translate and the GeoTIFF options on the GDAL website. Depending on your source data and intended use, other values could be more appropriate, and you really should experiment.
3. Build internal overviews (pyramids) in each image so that the full-resolution data does not need to be read when zoomed out. This is one of the easiest performance gains you can get, provided you can afford the extra disk space.
gdaladdo -r gauss outfile.tif 2 4 8 16 32 64 128
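The per-image steps above lend themselves to a batch script; the post doesn’t show the author’s own batch file, so this is just one way to write it. A minimal sketch in Python, assuming the source TIFFs (with their worldfiles alongside, which GDAL picks up automatically) sit in one directory and that the GDAL command-line tools are on the PATH. The helper names process_directory, build_commands and overview_overhead are hypothetical, and by default the script only collects the commands (dry_run=True) instead of running them. overview_overhead puts a number on the extra disk space: each 2x reduction holds a quarter of the previous level’s pixels, so the 2 to 128 levels add roughly a third to an uncompressed file.

```python
import subprocess
from pathlib import Path

# Creation options from step 2; tune the block size for your own data.
CREATION_OPTIONS = [
    "TILED=YES",
    "PROFILE=GEOTIFF",
    "INTERLEAVE=BAND",
    "BLOCKXSIZE=512",
    "BLOCKYSIZE=512",
]
# Overview reduction factors from step 3.
OVERVIEW_LEVELS = [2, 4, 8, 16, 32, 64, 128]


def build_commands(src: Path, dst: Path, prj: Path) -> list[list[str]]:
    """Return the gdal_translate and gdaladdo argument lists for one image."""
    translate = ["gdal_translate"]
    for opt in CREATION_OPTIONS:
        translate += ["-co", opt]
    translate += ["-a_srs", str(prj), str(src), str(dst)]
    addo = ["gdaladdo", "-r", "gauss", str(dst)] + [str(f) for f in OVERVIEW_LEVELS]
    return [translate, addo]


def process_directory(src_dir, dst_dir, prj, dry_run=True) -> list[list[str]]:
    """Optimize every .tif in src_dir into dst_dir, keeping the file names.

    With dry_run=True the commands are only collected, not executed.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    if not dry_run:
        dst_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for src in sorted(src_dir.glob("*.tif")):
        for cmd in build_commands(src, dst_dir / src.name, Path(prj)):
            collected.append(cmd)
            if not dry_run:
                # check=True stops the batch at the first failing GDAL call.
                subprocess.run(cmd, check=True)
    return collected


def overview_overhead(levels=OVERVIEW_LEVELS) -> float:
    """Extra storage for overviews as a fraction of the full-resolution size.

    A level with factor f shrinks both dimensions by f, so it holds 1/f**2
    of the pixels (ignoring rounding at the image edges).
    """
    return sum(1 / f**2 for f in levels)
```

For example, `process_directory("images", "optimized", "utm83-10.prj", dry_run=False)` would run both commands for every image; `overview_overhead()` returns about 0.333.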
Once this was done, I had a really decent set of fast images to work with, but they were only appropriate at large scales, where just one or a few of the images need to be opened for each map view. For smaller scales, I needed to reduce both the size of the images being processed and the number of files accessed on each fetch. I settled on a simple two-tier approach: load the individual images at scales larger than some fixed value, and a single overview image at scales smaller than that value.
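In map terms, a “larger” scale means a smaller scale denominator (1:500 is larger than 1:50,000), which is easy to get backwards when setting scale ranges. A minimal sketch of the two-tier rule, using a hypothetical 1:10,000 switch-over point and hypothetical layer names; the real threshold was found by experiment:

```python
# Hypothetical switch-over scale (1:10,000); find yours by experiment.
OVERVIEW_SWITCH_DENOMINATOR = 10_000


def raster_layer_for_scale(scale_denominator: float,
                           switch: float = OVERVIEW_SWITCH_DENOMINATOR) -> str:
    """Choose the raster layer to draw at a map scale of 1:scale_denominator.

    Zooming in makes the scale larger and the denominator smaller, so the
    full-resolution tiles are used below the switch value and the single
    overview image at or above it.
    """
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    return "individual_tiles" if scale_denominator < switch else "overview"
```

In MapGuide terms this amounts to giving the two raster layers adjoining, non-overlapping scale ranges that meet at the switch value.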
The only problem was that I did not have a suitable overview image. I wanted something relatively small, highly optimized, and with white fill in its nodata areas. Fortunately, GDAL and the awesome folks in the #gdal channel on freenode came to the rescue again, this time with a four-step process.
1. The first thing I needed to do was build a list of all of the images I wanted to include in the overview and feed it to the gdalbuildvrt command to build a single virtual image. You could do this manually, but I have the awesome GnuWin32 utilities installed, so I used these instead; they’re almost enough to make me not miss the days when I spent most of my time in Unix:
find images/ -name "*.tif" | xargs gdalbuildvrt -resolution highest all_images.vrt
2. Because I wanted a white background on my overviews, I then edited all_images.vrt, adding a <NoDataValue> element at the top of each of the three <VRTRasterBand> elements:
<VRTRasterBand dataType="Byte" band="1">
<NoDataValue>255</NoDataValue>
3. The gdalinfo command gave me the pixel dimensions of the virtual image, which I then divided down iteratively until I had a reasonable overview size to pass to gdal_translate’s -outsize option:
gdal_translate ^
-outsize 53120 14000 ^
-co "TILED=YES" ^
-co "PROFILE=GEOTIFF" ^
-co "INTERLEAVE=BAND" ^
-co "BLOCKXSIZE=512" ^
-co "BLOCKYSIZE=512" ^
all_images.vrt ^
all_images.tif
When this completed, I deleted the all_images.tif.aux.xml file because I did not want to carry the additional metadata that GDAL maintains in that file.
Be careful with sizes here: a classic TIFF file cannot exceed 4GB. If the applications that will read the file support BigTIFF, you can specify the -co "BIGTIFF=YES" option to generate larger files, but you’re likely better off generating an intermediate level of aggregated, resampled tiles instead.
4. The final step was to once again generate internal pyramids to allow for better performance at small scales:
gdaladdo -r gauss all_images.tif 2 4 8 16 32 64 128
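The hand edit in step 2 and the size juggling in step 3 are both easy to script. The sketch below uses only the Python standard library, and add_nodata, overview_tiff_bytes and CLASSIC_TIFF_LIMIT are hypothetical names. add_nodata inserts a <NoDataValue> as the first child of each top-level band of a VRT. overview_tiff_bytes estimates the uncompressed size of an image plus its internal pyramids, assuming three 8-bit bands, so you can check an -outsize choice against the classic TIFF 4GB limit before starting a long gdal_translate run. It ignores TIFF headers and the padding of partial edge tiles, so treat the result as a lower bound. Newer gdalbuildvrt releases also document a -vrtnodata option that may make the manual edit unnecessary; check the documentation for your version.

```python
import xml.etree.ElementTree as ET

# Classic TIFF uses 32-bit file offsets, which caps files at about 4 GiB.
CLASSIC_TIFF_LIMIT = 2**32


def add_nodata(vrt_xml: str, value: int = 255) -> str:
    """Return the VRT text with a NoDataValue on every top-level band.

    An existing NoDataValue is overwritten rather than duplicated; a new one
    is inserted as the band's first child. findall() only looks at direct
    children, so nested bands (e.g. inside a MaskBand) are left alone.
    """
    root = ET.fromstring(vrt_xml)
    for band in root.findall("VRTRasterBand"):
        nodata = band.find("NoDataValue")
        if nodata is None:
            nodata = ET.Element("NoDataValue")
            band.insert(0, nodata)
        nodata.text = str(value)
    return ET.tostring(root, encoding="unicode")


def overview_tiff_bytes(width, height, bands=3, bytes_per_sample=1,
                        levels=(2, 4, 8, 16, 32, 64, 128)):
    """Approximate uncompressed size of an image plus its internal overviews."""
    def level_bytes(w, h):
        return w * h * bands * bytes_per_sample

    total = level_bytes(width, height)
    for f in levels:
        # Each overview is 1/f of the size in each direction, rounded up.
        total += level_bytes(-(-width // f), -(-height // f))
    return total


if __name__ == "__main__":
    size = overview_tiff_bytes(53120, 14000)
    print(size, "bytes; fits in classic TIFF:", size < CLASSIC_TIFF_LIMIT)
```

Under those assumptions the 53120 x 14000 overview comes to roughly 2.97 billion bytes (about 2.8GiB) with its pyramids, which suggests BIGTIFF should not have been needed at that size.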
Once these two data sets were processed, I simply used MapGuide Maestro to make two raster data connections. For the first data connection, I added all of the individual TIFF images to a composite raster type, and Maestro generated a configuration document that tells MapGuide which image to access for a given extent. For the second data connection, I just pointed to the overview GeoTIFF. I then created layers for these, experimented until I found the scale where the overview image started looking pixelated, and set the layers’ view scale properties accordingly. There are some notes on working with rasters in the Maestro documentation.
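Maestro builds the composite raster’s index for you, and its configuration document has its own format, which this does not reproduce. Conceptually, though, the index just records each image’s footprint so that only the images overlapping the current view get opened. A minimal sketch of that idea in Python, assuming north-up (unrotated) images whose pixel dimensions you already know; Extent, extent_from_worldfile and images_for_view are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other: "Extent") -> bool:
        """True if the two boxes overlap (touching edges do not count)."""
        return (self.minx < other.maxx and other.minx < self.maxx and
                self.miny < other.maxy and other.miny < self.maxy)


def extent_from_worldfile(text: str, width: int, height: int) -> Extent:
    """Footprint of a north-up image from its worldfile and pixel size.

    Worldfile lines, in order: A (x pixel size), D and B (rotation terms),
    E (y pixel size, negative), C and F (map coordinates of the *centre*
    of the upper-left pixel).
    """
    a, d, b, e, c, f = (float(v) for v in text.split())
    if d or b:
        raise ValueError("rotated worldfiles are not handled in this sketch")
    minx = c - a / 2  # move from pixel centre to pixel edge
    maxy = f - e / 2  # e is negative, so this moves up half a pixel
    return Extent(minx, maxy + height * e, minx + width * a, maxy)


def images_for_view(index: dict[str, Extent], view: Extent) -> list[str]:
    """Names of the images whose footprint overlaps the requested view."""
    return sorted(name for name, ext in index.items() if ext.intersects(view))
```

For a tile set this size a linear scan is plenty fast; a much larger set would call for a proper spatial index.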
More performance could probably be gained by adding an intermediate level, where the coverage area is aggregated into larger tiles before being combined into one large overview image, but for the initial launch the performance was deemed good enough.
On my production server, I’m lucky enough to have a fast, high-spindle-count RAID shelf dedicated to storing these uncompressed TIFFs, and they scream off the disk. My test server is VMware-based, and disk performance and space are both at a premium. There, I still use the TIFF overview image, but at large scales I access a set of tiled MrSID files instead. This seemed like a decent compromise given the constraints, though decoding the MrSID files does seem to thrash the CPU a bit.
GDAL was one of the first open source geospatial applications I tried (not counting GRASS and MOSS), and it constantly comes in handy, whether I’m reprojecting, adding spatial reference information to images, or converting between formats.
Thanks to hobu (Howard Butler), FrankW (Frank Warmerdam) and EvenR (Even Rouault) from the #gdal IRC channel on freenode for helping me work my way to this solution. Amazing support!
-J