Monday, July 4, 2011

Comparison of Shapefiles with shpdiff

For my important coding projects I use a local Subversion repository and a file comparison utility such as diff (provided by the IDE) to keep track of my changes.

Some time ago I started thinking about a similar versioning system for Shapefiles, since in one of our projects GIS staff concurrently trace and update base data from satellite imagery. Eventually I found the shpdiff utility, which is exactly what I was looking for. These are my build steps, following a thread on the ShapeLib mailing list.

Download and extract the latest shapelib version and fetch the shpdiff.c file into the contrib directory (shpdiff.c can probably be found in other places as well):
wget "http://download.osgeo.org/shapelib/shapelib-1.3.0b2.zip"
unzip shapelib-1.3.0b2.zip
cd shapelib-1.3.0b2/contrib/
wget "ftp://ftp.soc.soton.ac.uk/pub/pwc101/slackware/slackbuilds/academic/shp2text/shpdiff.c"
The next step is to build the main utilities and the library from the top-level shapelib directory:
cd ..
make all
make lib
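Then I wanted to build the community-contributed tools. Before compiling, I made the following changes to the Makefile in the contrib directory to include the shpdiff utility (diff output; the lines marked < are from the modified Makefile, and the recipe line must start with a tab):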
18c18
< all: shpdxf shpproj dbfinfo shpcentrd shpdata shpwkb dbfinfo dbfcat shpinfo shpfix shpcat Shape_PointInPoly shpsort shpdiff
---
> all: shpdxf shpproj dbfinfo shpcentrd shpdata shpwkb dbfinfo dbfcat shpinfo shpfix shpcat Shape_PointInPoly shpsort
37,39d36
< 
< shpdiff: shpdiff.c $(SHPOBJ)
<  $(CC) $(CFLAGS) shpdiff.c ${SHPOBJ} $(LINKOPT) $(GEOOBJ) -o shpdiff
Finally, I built these tools as well, from within the contrib directory:
cd contrib/
make all
The changes to shputil.c and shpgeo.c described in that thread no longer seem to be necessary. To test shpdiff I used the amenities Shapefile from openstreetmap.la. I started by comparing the file with an unmodified copy:
./shpdiff amenities.shp amenities_modified.shp
Original Shapefile Type: Point, 1392 shapes 1392 database records
Comparison Shapefile Type: Point, 1392 shapes 1392 database records

NOTE: Using column NAME to identify records
Note the important hint from shpdiff: it compares features based on their "NAME" attribute (and then "STREET", "TOWN" etc.). Since OpenStreetMap features have a unique identifier, I wanted to compare the files based on the "OSM_ID" attribute instead. These are my changes in shpdiff.c around lines 173 and 208 (again diff output, with the modified file on the < side):
173,175d172
<     identifyKey = DBFGetFieldIndex( iDBF, "OSM_ID" );
<     if( identifyKey >= 0 )
<         goto gotkey;
208c205
<     if(identifyKey >= 0)
---
>     if(identifyKey)
If you want to use a different identifier attribute for the comparison, just change the field name and rebuild:
make shpdiff
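The second hunk is worth a note: a field index of 0 is a valid result, and only a negative value means "not found" (DBFGetFieldIndex returns -1 in that case), so the key has to be tested with >= 0 rather than for truthiness. The following is a minimal, self-contained sketch of that lookup logic, not the code from shpdiff.c: find_field is a hypothetical stand-in for DBFGetFieldIndex, and pick_key and the plain string arrays are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for shapelib's DBFGetFieldIndex():
 * returns the zero-based index of `name` in `fields`, or -1 if absent.
 * (Exact, case-sensitive match; an assumption made for this sketch.) */
int find_field(const char *const *fields, int nfields, const char *name)
{
    assert(fields != NULL && name != NULL);
    for (int i = 0; i < nfields; i++) {
        if (strcmp(fields[i], name) == 0)
            return i;
    }
    return -1;
}

/* Try each candidate key in order of preference and return the index
 * of the first one that exists, or -1 if none of them does. */
int pick_key(const char *const *fields, int nfields,
             const char *const *candidates, int ncandidates)
{
    for (int i = 0; i < ncandidates; i++) {
        int idx = find_field(fields, nfields, candidates[i]);
        if (idx >= 0)   /* index 0 is valid, so compare against 0 explicitly */
            return idx;
    }
    return -1;
}
```

With a field list that starts with OSM_ID, as in the shpdiff output in this post, and OSM_ID as the first candidate, pick_key returns 0. A plain if(identifyKey) would treat that 0 as "no key".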
Then I started to edit one of the files. First I deleted a feature; shpdiff outputs the following:
Original Shapefile Type: Point, 1392 shapes 1392 database records
Comparison Shapefile Type: Point, 1391 shapes 1391 database records

NOTE: Using column OSM_ID to identify records

Record 948: deleted from original
--------------------------------
OSM_ID: 513607024  
AMENITY:                                     
TOURISM: guest_house
NAME: Annivong 2
NAME_LO:                                                             
NAME_EN: Annivong 2
The output after adding a new feature (new OSM ID should be negative, right?):
Original Shapefile Type: Point, 1392 shapes 1392 database records
Comparison Shapefile Type: Point, 1393 shapes 1393 database records

NOTE: Using column OSM_ID to identify records

New record 1392 found
-------------------------------
OSM_ID: -1
AMENITY: restaurant
TOURISM:
NAME: restaurant
NAME_LO:
NAME_EN:
The output after moving a feature:
Original Shapefile Type: Point, 1392 shapes 1392 database records
Comparison Shapefile Type: Point, 1392 shapes 1392 database records

NOTE: Using column OSM_ID to identify records

Record 832:shape change
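For a point feature, a "shape change" boils down to the coordinates differing between the two files. Below is a minimal sketch of such a check, not taken from shpdiff.c: the Point struct, point_moved and the eps tolerance are hypothetical names, and whether shpdiff itself applies any tolerance is not covered here.

```c
#include <assert.h>

/* Hypothetical 2D point, standing in for a Point shape's coordinates. */
typedef struct {
    double x;
    double y;
} Point;

/* Returns 1 if the two points differ by more than eps on either axis,
 * 0 otherwise. Pass eps = 0.0 for an exact comparison. */
int point_moved(Point a, Point b, double eps)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;

    assert(eps >= 0.0);
    if (dx < 0.0) dx = -dx;   /* absolute difference without math.h */
    if (dy < 0.0) dy = -dy;
    return dx > eps || dy > eps;
}
```

A small non-zero eps can be useful when the editing software rewrites coordinates with slightly different rounding.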
Change an existing field value for feature 387, delete a value for feature 667 and add a value for feature 717:
Original Shapefile Type: Point, 1392 shapes 1392 database records
Comparison Shapefile Type: Point, 1392 shapes 1392 database records

NOTE: Using column OSM_ID to identify records

Record 387:
NAME_EN: Mittapharb Lao Barbecue >>> Mittaphab Lao Barbecue

Record 667:
NAME_EN: Ho Phra Keo >>> 

Record 717:
NAME_EN:  >>> Wat Kaognot
The output is quite self-explanatory, and you can find the records in the attribute table by their record number, as long as you don't sort the table:
Screenshot QGIS Attribute Table
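The kinds of differences reported above (deleted, new and changed records) can be pictured as matching records by their key. The following is a simplified, self-contained sketch of that idea and not the actual shpdiff implementation: the Record and DiffCount types and both functions are hypothetical, only one attribute is compared, and the lookup is a plain linear search.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified record: one key plus one attribute. */
typedef struct {
    long osm_id;
    const char *name_en;
} Record;

typedef struct {
    int deleted;
    int added;
    int changed;
} DiffCount;

/* Linear search for a key; returns the record index or -1. */
int find_by_key(const Record *recs, int n, long key)
{
    for (int i = 0; i < n; i++) {
        if (recs[i].osm_id == key)
            return i;
    }
    return -1;
}

/* Compare two record sets by key and report the differences. */
DiffCount diff_records(const Record *orig, int norig,
                       const Record *comp, int ncomp)
{
    DiffCount c = {0, 0, 0};
    assert(orig != NULL && comp != NULL);

    /* Records of the original: either missing or possibly changed. */
    for (int i = 0; i < norig; i++) {
        int j = find_by_key(comp, ncomp, orig[i].osm_id);
        if (j < 0) {
            printf("Record %d: deleted from original\n", i);
            c.deleted++;
        } else if (strcmp(orig[i].name_en, comp[j].name_en) != 0) {
            printf("Record %d:\nNAME_EN: %s >>> %s\n",
                   i, orig[i].name_en, comp[j].name_en);
            c.changed++;
        }
    }
    /* Records whose key only exists in the comparison file are new. */
    for (int j = 0; j < ncomp; j++) {
        if (find_by_key(orig, norig, comp[j].osm_id) < 0) {
            printf("New record %d found\n", j);
            c.added++;
        }
    }
    return c;
}
```

Because the matching is done on the key and not on the row position, it only works if the key is really unique, which is why OSM_ID is a better choice than NAME for OpenStreetMap data.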
For the sake of completeness, I should also mention the PostGIS Versioning scripts and QGIS plugin from Kappasys. It is not just a comparison tool but a complete versioning system, and it looks very promising and sophisticated. But since Shapefile is still the dominant GIS format in my working environment, I haven't tried it yet.

2 comments:

  1. shpdiff.c is no longer available on the mentioned ftp server.
    You can get it from http://uwmike.com/maps/shapefiles/shpdiff.c or http://www.carto.net/webrian/downloads/shpdiff.c

  2. wow! it's awesome!

    I would like to share my experience with comparison.
    https://github.com/gipong/compare2shp

    This comparison tool can highlight differences with geometry and visualize the results.
