Live from FMEUC, it’s the Tim and Jason show!

OK, so better late than never. At the always-awesome FME User Conference, Tim Taylor and I did a short presentation on Nanaimo’s use of FME Server.

I think we did OK, but I definitely need to spend a bit more time polishing both my presentation and the slides next time.

Check out other great FME UC Videos on Safe’s user conference website. There are a lot of valuable videos, including case studies and technical presentations that show how your business processes could be improved by using FME.

As an aside, I count myself fortunate to live within driving distance of two of the best geospatial conferences in the world. In these times of tight budgets, I am incredibly grateful to be able to attend both the FME User Conference and GeoWeb.

-J

FWTools FTW … because GDAL FTW didn’t sound as cool!

I’ve received a bunch of compliments on the performance of the NanaimoMap MapGuide / Fusion application that the City of Nanaimo launched in beta last week.

There is a lot involved in making a web map perform, especially if you are not leveraging tile caching. One part of the story is hardware, and I’m lucky enough to share space on a dual quad-core machine with 4GB RAM and relatively fast disk. Another part is proper generalization of the vector data for display; no point in carrying sub-micron precision on a map that will generally be displayed at 1:500 or smaller. And of course, there’s MapGuide’s inherent speed when properly configured. This leaves out one of the most important parts though: raster data.

Raster data is big, brutish and hard to work with, and optimizing raster access is often one of the most important parts of delivering a successful web map. Users have come to expect “satellite” imagery on their web maps, and complain when it doesn’t perform as well as Google Maps. One of the best ways that I have found of flipping and folding raster data is Frank Warmerdam’s FWTools, which wraps GDAL and some other utilities in a single easy-to-use package.

My starting point consisted of:

  • 79 TIFF + Worldfile images, 10cm resolution, about 1.1GB each
  • 14 TIFF + Worldfile images, 30cm resolution, about 600MB each

So, I was working with about 100GB of images, none of which were optimized for web-based display, and which did not contain the spatial reference information that the FDO Raster Provider (also based on GDAL!) works best with.

The first thing I did was set up a batch process to optimize the individual images. This involved three steps:

1. Obtain a correct .prj file containing the WKT spatial reference information for my images. The easiest place for me to get this was SpatialReference.org, but you might just have one hanging around.

http://spatialreference.org/ref/epsg/26910/

2. Reprocess the image into a Tiled GeoTIFF, with no compression and a relatively large internal block size, and specifying the projection file obtained above. The caret (^) is the DOS line continuation character:

gdal_translate ^
-co "TILED=YES" ^
-co "PROFILE=GEOTIFF" ^
-co "INTERLEAVE=BAND" ^
-co "BLOCKXSIZE=512" ^
-co "BLOCKYSIZE=512" ^
-a_srs utm83-10.prj ^
infile.tif ^
outfile.tif

You can obtain more information on gdal_translate and the GeoTIFF options on the GDAL website. Depending on your source data and intended use, other values could be more appropriate, and you really should experiment.

3. Create internal pyramids in each image so that the entire image does not need to be fetched when zoomed out. This is one of the easiest performance gains you can get if you can afford the extra disk space.

gdaladdo -r gauss outfile.tif 2 4 8 16 32 64 128
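
The batch process itself is not shown here. As a rough sketch only, the two commands above could be generated for every image with a few lines of Python. The function names and the folder layout (all source .tif files sitting directly in one input folder) are assumptions for illustration; the options are copied from the commands above.

```python
from pathlib import Path

# Creation options copied from the gdal_translate call above.
CREATION_OPTIONS = ["TILED=YES", "PROFILE=GEOTIFF", "INTERLEAVE=BAND",
                    "BLOCKXSIZE=512", "BLOCKYSIZE=512"]
OVERVIEW_LEVELS = [2, 4, 8, 16, 32, 64, 128]


def build_commands(infile, out_dir, srs_file="utm83-10.prj"):
    """Return the gdal_translate and gdaladdo argument lists for one image."""
    infile = Path(infile)
    outfile = Path(out_dir) / infile.name
    translate = ["gdal_translate"]
    for option in CREATION_OPTIONS:
        translate += ["-co", option]
    translate += ["-a_srs", str(srs_file), str(infile), str(outfile)]
    addo = ["gdaladdo", "-r", "gauss", str(outfile)]
    addo += [str(level) for level in OVERVIEW_LEVELS]
    return translate, addo


def build_batch(in_dir, out_dir):
    """Build command pairs for every .tif directly inside in_dir (assumed layout)."""
    return [build_commands(p, out_dir) for p in sorted(Path(in_dir).glob("*.tif"))]
```

Each list can then be handed to subprocess.run(cmd, check=True), running the gdal_translate command before the gdaladdo command for the same image.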

Once this was done, I had a really decent set of fast images to work with, but these would only be appropriate to load at large scales when only one or a very few of the images need to be opened on each map view. For smaller scales, I needed to reduce the size of the images being processed, and also reduce the number of files being accessed on each fetch. I decided to go with a simple two-tier approach: Load the individual images at scales larger than some fixed value, and load a single overview image at scales smaller than that value.

The only problem was that I did not have an appropriate overview image. I wanted something that was relatively small, highly optimized, and which had white fill in its nodata areas. Fortunately GDAL and the awesome folks in the #gdal channel at freenode came to the rescue again, this time with four steps.

1. The first thing I needed to do was build a list of all of the images I wanted to have as part of the overview and feed these into the gdalbuildvrt command to build a single virtual image. You could do this manually, but I have the awesome GnuWin32 utilities installed so I used these instead; they’re almost enough to make me not miss the days when I spent most of my time in Unix:

find images/ -name "*.tif" | xargs gdalbuildvrt -resolution highest all_images.vrt
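
If you don't have find and xargs handy, one alternative (a sketch, not what was used here) is to write the image paths to a text file and pass that file to gdalbuildvrt with its -input_file_list option. The function names and the list file name are made up for illustration.

```python
from pathlib import Path


def write_file_list(image_dir, list_path):
    """Write one .tif path per line (searching subfolders) and return the paths."""
    paths = sorted(str(p) for p in Path(image_dir).rglob("*.tif"))
    Path(list_path).write_text("\n".join(paths) + "\n")
    return paths


def buildvrt_command(list_path, vrt_path="all_images.vrt"):
    """Argument list for gdalbuildvrt reading its inputs from a text file."""
    return ["gdalbuildvrt", "-resolution", "highest",
            "-input_file_list", str(list_path), str(vrt_path)]
```

Calling write_file_list("images", "image_list.txt") and then running the list returned by buildvrt_command("image_list.txt") should build the same virtual image as the one-liner above.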

2. Because I wanted a white background on my overviews, I then edited the all_images.vrt, adding a <NoDataValue/> section at the top of each of the three <VRTRasterBand /> sections:

<VRTRasterBand dataType="Byte" band="1">
<NoDataValue>255</NoDataValue>
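
I made this edit by hand, but it is easy to script. The following is a minimal sketch using Python's standard XML library; the function name is hypothetical, and it assumes the .vrt file has the usual layout with VRTRasterBand elements under the root.

```python
import xml.etree.ElementTree as ET


def add_nodata(vrt_xml, value=255):
    """Insert <NoDataValue> as the first child of every <VRTRasterBand>."""
    root = ET.fromstring(vrt_xml)
    for band in root.iter("VRTRasterBand"):
        # Skip bands that already carry a NoDataValue so reruns are harmless.
        if band.find("NoDataValue") is None:
            node = ET.Element("NoDataValue")
            node.text = str(value)
            band.insert(0, node)
    return ET.tostring(root, encoding="unicode")
```

Read all_images.vrt into a string, pass it through add_nodata, and write the result back out. Keep a copy of the original file until you have checked the output with gdalinfo.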

3. The gdalinfo command gave me the dimensions of the virtual image, each of which I then divided iteratively to give me reasonable overview dimensions which I could feed into gdal_translate.

gdal_translate ^
-outsize 53120 14000 ^
-co "TILED=YES" ^
-co "PROFILE=GEOTIFF" ^
-co "INTERLEAVE=BAND" ^
-co "BLOCKXSIZE=512" ^
-co "BLOCKYSIZE=512" ^
all_images.vrt ^
all_images.tif

When this completed, I deleted the all_images.tif.aux.xml file because I did not want to carry the additional metadata that GDAL maintains in that file.
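
The iterative division from step 3 can be sketched in Python as below. I picked my sizes by eye, so the max_side threshold is an assumption added for illustration, and the source dimensions in the usage note are invented so that they divide down to the -outsize values above.

```python
def overview_size(width, height, max_side):
    """Halve both dimensions until the longer side fits within max_side."""
    factor = 1
    while max(width, height) // factor > max_side:
        factor *= 2
    return width // factor, height // factor, factor
```

For example, a hypothetical 424960 x 112000 virtual image with max_side set to 60000 is divided by 8, which gives 53120 x 14000.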

Careful with sizes here. If you’re using an application that supports it, you can specify the -co "BIGTIFF=YES" option to generate files larger than 4GB, but you’re likely better off generating an intermediate level of aggregated and resampled tiles instead.
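
A quick back-of-the-envelope check tells you whether an uncompressed overview will stay under that limit. The sketch below assumes three 8-bit bands, as in the VRT above, and counts only pixel data; it ignores TIFF headers and the padding of partial tiles, so treat the result as a rough lower bound. The function names are made up.

```python
import math

CLASSIC_TIFF_LIMIT = 4 * 1024**3  # classic TIFF uses 32-bit file offsets


def estimate_bytes(width, height, bands=3, bytes_per_sample=1, overview_factors=()):
    """Rough uncompressed pixel payload, including any internal overviews."""
    pixels = width * height
    for factor in overview_factors:
        pixels += math.ceil(width / factor) * math.ceil(height / factor)
    return pixels * bands * bytes_per_sample


def needs_bigtiff(total_bytes):
    """True when the estimate reaches the classic TIFF size limit."""
    return total_bytes >= CLASSIC_TIFF_LIMIT
```

For the 53120 x 14000 overview above, the base image works out to about 2.2GB, and pyramid levels from 2 to 128 add roughly another third, so it fits in a classic TIFF. Doubling both dimensions would not.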

4. The final step was to once again generate internal pyramids to allow for better performance at small scales:

gdaladdo -r gauss all_images.tif 2 4 8 16 32 64 128

Once these two data sets were processed, I simply used MapGuide Maestro to make two raster data connections. For the first data connection, I added all of the individual TIFF images to a composite raster type, and Maestro generated a configuration document which allows MapGuide to know which image to access for a given extent. For the second data connection, I just pointed to the overview GeoTIFF. I then created layers for these, experimented until I found the scale where the overview image started looking pixelated, and set the layers’ view scale properties accordingly. There are some notes on working with rasters in the Maestro documentation.

More performance could probably be gained by having an intermediate level where the coverage area was aggregated into larger tiles before being combined into one large overview image, but for the initial launch this was deemed to have high enough performance.

On my production server, I’m lucky enough to have a fast, high-spindle-count RAID shelf dedicated to storing these uncompressed TIFFs, and they scream off the disk. My test server is VMware-based, and disk performance and space are both at a premium. In this case, I still used the TIFF overview map, but at large scales I access a set of tiled MrSID files instead. This seemed like a decent compromise given the constraints, but did seem to thrash the CPU a bit.

GDAL was one of the first open source geospatial applications I tried (not counting GRASS and MOSS) and is constantly coming in handy, whether I’m reprojecting, adding spatial reference information to images, or converting between formats.

Thanks to hobu (Howard Butler), FrankW (Frank Warmerdam) and EvenR (Even Rouault) from the #gdal IRC channel on freenode for helping me work my way to this solution. Amazing support!

-J

On the Shoulders of Giants?

I was recently reading a post by Gordon Luckett about how he’s been able to use Google Maps and Bing layers in MapGuide / Fusion maps. This is only possible because the Fusion project decided to build on top of OpenLayers, and recent builds of Fusion have enabled the OpenLayers commercial base maps.

This got me to thinking about the amount of work that the MapGuide project is leveraging every time you see a map. MapGuide directly includes about a dozen open source libraries. Many of these (such as FDO, GDAL, GD and Fusion) have their own stack of libraries that they depend on. With a bit of digging, I quickly ended up over 30–I’m sure I could have gone further–and this doesn’t even count the open source utilities such as GCC, Ant and SWIG that are integral to turning all of this code into something you can use.

I guess what I’m trying to say is that no matter how cool your code is, you’re really just the tip of the iceberg. We’re not standing on the shoulders of giants, we’re standing on the shoulders of thousands of regular people who have dedicated their time to help build this ecosystem. We have to make sure that we in turn enhance other projects where possible, and provide a solid base for those who come to build on our work in the future.

-J

NanaimoMap Testers Wanted

The City of Nanaimo is launching our new MapGuide Open Source / Fusion based map in beta. I’d love to see some feedback from testers, and to get help generating some real-world usage patterns. You can only do so much with canned load tests.

If you’ve got a few minutes to play with it, please join us here:

NanaimoMap Beta

It’s in beta because of the issues that will likely be shaken out by more widespread use, and because we have not yet built out the layers and search functionality required to match our current MapGuide 6.5 ActiveX-based mapping portal CityMap. This will be completed before the end of the year.

Thanks!

-J

P.S. This application was developed by DM Solutions Group. We’re running Fusion 1.1 with the latest test build (r4114) of MapGuide. We wouldn’t have been able to launch–even in beta–without some last minute fixes by Trevor Wekel of OTX Systems and Haris Kurtagic of SL King. From a personal perspective, these guys are both amazing to work with, moderately priced for the value they offer, and are great resources if you’re stuck with a problem in MapGuide core that you can’t fix on your own. As always, the opinions offered on this blog are my own, not necessarily those of my employer.

Do You See Spiders? Making Government Data Truly Open

The trend towards open government data is growing, with recent developments like Data.Gov and Vancouver’s Open3 motion, but these simply do not go far enough. In addition to publishing downloadable data and open interfaces, government needs to learn from successful commercial websites and bring their “Deep Web” data to the surface.

The internet search experience is constantly evolving. In the early days it was normal to search for a single keyword, be redirected to an authoritative website, and then explore that site to find what you were really looking for. As the search engines became smarter and publishers learned to expose their database records as individual web pages, people have learned to search for more specific information. For instance, searching for the name of a book will take you to an Amazon or Wikipedia entry for that book. Searching for the name of a current release movie will get you local show times and the name of the theatre it’s playing at.

Unfortunately, government has largely failed to recognise this change, and an entrenched tendency to develop stateful applications and portals is making the problem worse. As an example, try searching for US patent number 6368227. You will likely find a few results from ad-driven private websites that were re-publishing the government data, and maybe a broken link to the official patent search. Why bother publishing your information online if you are going to do so in a way that holds it apart from the web?

The good news is that fixing this problem is not hard. The search engines already assign a lot of authority to government sites, so you’re already a step ahead of commercial sites facing the same problem. Just follow a few simple suggestions that the rest of the web has already figured out for us:

  • Publish each well-formatted record to a consistent location on the web that will not change. This allows both people and search engines to come back to these records whenever they want.
     
  • Ensure that search engine spiders have a way of following basic hyperlinks to find this information. This can be either a simple paged set of results, or a more complex hierarchical system if the data allows.
     
  • Generate SiteMaps that link to all of your records as cheap insurance to make sure that the search engines can find all of your content. Be careful to pay attention to the maximum size, and break your data up into multiple sitemaps if necessary.
     
  • Make use of clear and logical metadata such as Title, Description and Canonical tags to ensure that both search engines and your prospective searchers can make sense of the results. Nothing worse than publishing a record with an HTML title like “32432-43A”. Nobody is going to click on that!
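
As a rough illustration of the sitemap suggestion, here is a minimal Python sketch that splits a list of record URLs into sitemap files and builds an index pointing at them. The 50,000 figure is the per-file URL limit in the sitemap protocol; the protocol also caps file size, which this sketch does not check. The function names and any example.gov URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file URL limit in the sitemap protocol


def chunk(urls, size=MAX_URLS):
    """Split the URL list into sitemap-sized pieces."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_xml(urls):
    """Render one <urlset> document for up to MAX_URLS record pages."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def sitemap_index_xml(sitemap_urls):
    """Render the <sitemapindex> that points at each sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = url
    return ET.tostring(root, encoding="unicode")
```

Write each chunk to its own file, publish the files at stable addresses, and list those addresses in the index so the search engines only need to be told about one URL.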
     

Do we still need to build applications? Absolutely! Sometimes free text search across the entire web does not offer enough granularity. Do we still need to make data and services available to third parties? Definitely! There are lots of smart people out there who can use our data to help make the world a better place. However, these are secondary to the single most effective way we have of giving citizens access to the data we maintain on their behalf. Our highest level of service can be delivered by being of the web, not just on the web.

Oh, and since this is a geospatial blog: Just because your data is in a GIS, don’t think you can avoid doing something about this. Spatial search is still nowhere near as powerful as general web search, but it’s getting better all the time. Government geodata needs to be published as web pages too.

For some concrete examples of the benefits of becoming part of the web, check out a slide show that I recently published as “Moving Beyond the Desk”. Make sure to turn on the speaker notes. If you don’t feel like watching the slides, just try searching Google for Mark Bate Statue or 2323 Rosstown Rd and see if you can find the City of Nanaimo’s data in the results. For technical information on the systems behind these results, see my previous posts on the public art project and the MapGuide GeoREST extension the City is using to publish property information.

-J

P.S. This post was prompted by James’ mention of the Moving Beyond the Desk slide show. Thanks James!