Geocoding with OpenRefine & the GeoLocate API
Using the GeoLocate Community Server API to geocode esoteric text-based location descriptions with uncertainty estimates
About This Guide
This tutorial demonstrates how to use OpenRefine with the GeoLocate Community Server API to geocode location data that contains textual place descriptions rather than structured street addresses. The GeoLocate API is particularly well-suited for historical, museum, and natural history collections data where locations are described in narrative form, such as "5 miles west of Springfield along Route 66" or "near the confluence of the Green and Colorado Rivers."
Unlike standard address geocoding services, GeoLocate provides uncertainty values that estimate the geographic extent of ambiguity in the location description. This makes it invaluable for geocoding older spatial location information associated with 19th and 20th century fieldwork in biology, archaeology, geology, natural history, and other fields where precise coordinates were not originally recorded.
About the Dataset
For this exercise, we'll use the California Haunted Places dataset (ca_haunted_places.csv), which contains location descriptions of reportedly haunted locations across California. This dataset is ideal for demonstrating GeoLocate's capabilities because:
- Locations are described in natural language rather than structured addresses
- Many descriptions reference landmarks, relative directions, and distances
- Location precision varies widely across records
- Some locations are historical and may no longer exist
You can view the dataset here: ca_haunted_places.csv
Dataset Fields
The dataset includes:
- location: Text description of the haunted location
- city: City name
- state: State name
- state_abbrev: State abbreviation
- description: Full narrative description of the haunting
- longitude/latitude: Pre-existing coordinates (we'll compare our results to these)
Getting Ready
Download the Data
Download the California Haunted Places dataset (ca_haunted_places.csv).
Download and Install OpenRefine
If you haven't already, download and install OpenRefine:
https://openrefine.org/download
OpenRefine is open-source software that allows you to manipulate and augment data. It runs in your browser but operates locally on your machine. For this exercise, OpenRefine will help us submit URLs to the GeoLocate API repeatedly for all our records, saving us from manual copy-and-paste work.
What is the GeoLocate API?
The GeoLocate Community Server is a web-based geocoding service developed specifically for natural history collections and research data. Unlike standard geocoding services that expect street addresses, GeoLocate:
- Interprets natural language locality descriptions
- Handles relative distances and directions ("5 km NE of...")
- Works with historical place names and geographic features
- Provides uncertainty radius estimates for each geocoded point
- Returns geographic extent polygons showing possible location areas
The GeoLocate API is freely available for educational and research use through their Community Server.
Understanding Uncertainty in Geocoding
One of GeoLocate's most valuable features is its calculation of uncertainty radius values. When geocoding historical or textual location data, we rarely have absolute precision. Consider these examples:
- "Near San Francisco" - Could be anywhere within 10-50 miles
- "5 miles north of the Golden Gate Bridge" - More precise, but "5 miles" could mean 4.5-5.5 miles
- "123 Main Street, San Jose, CA" - Very precise, uncertainty might be 10-100 meters
GeoLocate quantifies this uncertainty by providing:
- Uncertainty Radius: Distance in meters representing the possible error
- Uncertainty Polygon: The geographic area within which the true location likely falls
This is crucial for:
- Historical specimen data: Museum records from the 1800s-1900s with vague locations
- Archaeological sites: "Near the old mill, south of town"
- Geological surveys: "Outcrop along the river, 2 miles upstream from bridge"
- Biological field notes: "Oak grove on hillside above valley floor"
Building Your Template Query
The GeoLocate API endpoint for the Community Server is:
https://www.geo-locate.org/webservices/geolocatesvcv2/glcwrap.aspx
Let's build a test query using one record from our haunted places dataset.
Example Location Description
Let's use the first record from our dataset:
- Location: "Ada Cemetery"
- City: "Ada"
- State: "Michigan"
Constructing the URL
The GeoLocate API uses these key parameters:
- locality: The location description text
- state: State name or abbreviation
- country: Country name
- fmt: Output format (we'll use json)
A complete query URL looks like this:
https://www.geo-locate.org/webservices/geolocatesvcv2/glcwrap.aspx?locality=Ada+Cemetery&state=Michigan&country=United+States&fmt=json
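Outside OpenRefine, the same URL can be assembled programmatically. The following is a minimal Python sketch, assuming only the endpoint and the four parameters listed above; the function name build_geolocate_url is a hypothetical helper, not part of any GeoLocate client library.

```python
from urllib.parse import urlencode

# Endpoint quoted in this tutorial.
GEOLOCATE_ENDPOINT = "https://www.geo-locate.org/webservices/geolocatesvcv2/glcwrap.aspx"


def build_geolocate_url(locality, state, country="United States"):
    """Return a GeoLocate query URL for one locality description (hypothetical helper)."""
    params = {
        "locality": locality,
        "state": state,
        "country": country,
        "fmt": "json",
    }
    # urlencode percent-encodes special characters and writes spaces as "+".
    return GEOLOCATE_ENDPOINT + "?" + urlencode(params)


url = build_geolocate_url("Ada Cemetery", "Michigan")
print(url)
```

Encoding the parameters with urlencode rather than by string concatenation keeps characters such as "&" or "#" in a locality description from breaking the query string.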
Test the Query
- Copy the URL above into your web browser
- Press Enter to submit the query
- You should receive JSON-formatted results
The response will look something like this:
{
"engineVersion" : "GLC:9.3|U:1.01374|eng:1.0",
"numResults" : 1,
"executionTimems" : 171.8821,
"resultSet" : { "type": "FeatureCollection",
"features": [
{ "type": "Feature",
"geometry": {"type": "Point", "coordinates": [-85.504733, 42.962303]},
"properties": {
"parsePattern" : "ADA CEMETERY",
"precision" : "High",
"score" : 87,
"uncertaintyRadiusMeters" : "Unavailable",
"uncertaintyPolygon" : "Unavailable",
"displacedDistanceMiles" : 0,
"displacedHeadingDegrees" : 0,
"debug" : ":GazPartMatch=False|:inAdm=True|:Adm=KENT|:NP=ADA CEMETERY|:KFID=|ADA CEMETERY"
}
}
],
"crs": { "type" : "EPSG", "properties" : { "code" : 4326 }}
}
}
Key Response Fields
- engineVersion: API engine/version string. JSON path: engineVersion
- numResults: Number of results returned (integer). JSON path: numResults
- executionTimems: Query execution time in milliseconds (number). JSON path: executionTimems
- resultSet.type: Typically "FeatureCollection" (GeoJSON wrapper). JSON path: resultSet.type
- resultSet.features: Array of Feature objects; usually use the first feature (features[0]). JSON path: resultSet.features[0]
- coordinates: Point coordinates as [longitude, latitude] (x, y). JSON path: resultSet.features[0].geometry.coordinates. Note: longitude = coordinates[0], latitude = coordinates[1]
- parsePattern: How GeoLocate parsed/normalized the locality string (often uppercase). JSON path: resultSet.features[0].properties.parsePattern
- precision: Quality rating (e.g., High, Medium, Low). JSON path: resultSet.features[0].properties.precision
- score: Confidence score (0-100). JSON path: resultSet.features[0].properties.score
- uncertaintyRadiusMeters: Estimated error radius in meters, or the string "Unavailable" when not provided. JSON path: resultSet.features[0].properties.uncertaintyRadiusMeters
- uncertaintyPolygon: WKT polygon describing the uncertainty area, or "Unavailable". JSON path: resultSet.features[0].properties.uncertaintyPolygon
- displacedDistanceMiles: Distance the returned point was displaced from the original interpreted location (miles). JSON path: resultSet.features[0].properties.displacedDistanceMiles
- displacedHeadingDegrees: Heading (degrees) for any displacement applied to the point. JSON path: resultSet.features[0].properties.displacedHeadingDegrees
- debug: Internal parse/debug string useful for troubleshooting matches. JSON path: resultSet.features[0].properties.debug
- crs: Coordinate reference system object (e.g., EPSG:4326). JSON path: resultSet.crs
Example GREL expressions for accessing these fields in OpenRefine (first feature):
- Longitude: value.parseJson().resultSet.features[0].geometry.coordinates[0]
- Latitude: value.parseJson().resultSet.features[0].geometry.coordinates[1]
- Uncertainty: value.parseJson().resultSet.features[0].properties.uncertaintyRadiusMeters
- Parse pattern: value.parseJson().resultSet.features[0].properties.parsePattern
- Score: value.parseJson().resultSet.features[0].properties.score
- Debug: value.parseJson().resultSet.features[0].properties.debug
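For readers who want to check the JSON paths outside OpenRefine, here is a small Python sketch that walks the same paths on a trimmed copy of the sample response. The function name first_feature_summary is hypothetical, and the sketch assumes the response contains at least one feature.

```python
import json

# Trimmed copy of the sample response shown earlier in this tutorial.
SAMPLE_RESPONSE = """
{
  "numResults": 1,
  "resultSet": {
    "type": "FeatureCollection",
    "features": [
      {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-85.504733, 42.962303]},
        "properties": {
          "parsePattern": "ADA CEMETERY",
          "precision": "High",
          "score": 87,
          "uncertaintyRadiusMeters": "Unavailable"
        }
      }
    ]
  }
}
"""


def first_feature_summary(response_text):
    """Pull the commonly used fields out of the first returned feature."""
    data = json.loads(response_text)
    feature = data["resultSet"]["features"][0]
    # GeoJSON order is [longitude, latitude], i.e. (x, y).
    longitude, latitude = feature["geometry"]["coordinates"]
    properties = feature["properties"]
    return {
        "longitude": longitude,
        "latitude": latitude,
        "precision": properties["precision"],
        "score": properties["score"],
        "uncertainty": properties["uncertaintyRadiusMeters"],
    }


summary = first_feature_summary(SAMPLE_RESPONSE)
print(summary)
```

Note that the uncertainty field comes back as the string "Unavailable" in this sample, so it should not be treated as a number without checking first.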
Setting Up OpenRefine
Import the Data
- Launch OpenRefine (it will open in your browser at http://127.0.0.1:3333)
- Click Create Project
- Choose This Computer and browse to ca_haunted_places.csv
- Click Next
- Review the data preview and click Create Project
Examine the Data
Look at the location, city, and state columns. Notice how locations vary:
- Some are specific: "Ada Cemetery"
- Some are vague: "North Adams Rd."
- Some include directions: "Gorman Rd. west towards Sand Creek"
- Some reference landmarks: "Ghost Trestle"
This variability makes GeoLocate ideal for this dataset.
Creating the API Call in OpenRefine
Step 1: Build the Base URL
We need to construct a URL that combines:
- The GeoLocate API endpoint
- Our location data from the location column
- Our city data from the city column
- Our state data from the state column
Step 2: Add a Column by Fetching URLs
- Click on the dropdown arrow next to the location column
- Select Edit column > Add column by fetching URLs...
- Name the new column: geolocate_json
Step 3: Create the Expression
In the Expression box, enter:
"https://www.geo-locate.org/webservices/geolocatesvcv2/glcwrap.aspx?locality=" +
escape(cells["location"].value, "url") +
"&state=" + escape(cells["state"].value, "url") +
"&country=United+States&fmt=json"
This expression:
- Starts with the base GeoLocate API URL
- Adds the locality parameter using the location column value
- Adds the state parameter using the state column value
- Adds a fixed country parameter
- Requests JSON format output
- Uses escape() to properly encode special characters for URLs
Step 4: Configure Throttling
Important: Set the Throttle delay to at least 200 milliseconds (0.2 seconds). This prevents overwhelming the GeoLocate server with requests.
Step 5: Run the Fetch
Click OK and OpenRefine will:
- Build a custom URL for each row
- Submit each URL to the GeoLocate API
- Store the JSON response in the new geolocate_json column
This may take several minutes depending on the size of your dataset.
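What OpenRefine does in this step can be approximated in a few lines of Python. This is a sketch under stated assumptions, not OpenRefine's actual implementation: rows are assumed to be dictionaries with "location" and "state" keys, and geocode_rows and fetch_text are hypothetical names. The fetch and sleep functions are parameters so the loop can be tried without touching the network.

```python
import time
import urllib.request
from urllib.parse import urlencode

GEOLOCATE_ENDPOINT = "https://www.geo-locate.org/webservices/geolocatesvcv2/glcwrap.aspx"


def fetch_text(url, timeout=30):
    """Download one URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def geocode_rows(rows, fetch=fetch_text, delay_seconds=0.2, sleep=time.sleep):
    """Build one URL per row, fetch it, and pause between requests."""
    responses = []
    for index, row in enumerate(rows):
        if index > 0:
            # Throttle: wait before every request except the first.
            sleep(delay_seconds)
        params = {
            "locality": row["location"],
            "state": row["state"],
            "country": "United States",
            "fmt": "json",
        }
        responses.append(fetch(GEOLOCATE_ENDPOINT + "?" + urlencode(params)))
    return responses
```

Calling geocode_rows(rows) with the defaults would send real requests, so try it on a handful of rows before running a full dataset.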
Parsing the JSON Response
Now we need to extract useful information from the JSON responses.
Extract Longitude
- Click the dropdown on geolocate_json column
- Select Edit column > Add column based on this column
- Name it: longitude
- Use this expression:
value.parseJson().resultSet.features[0].geometry.coordinates[0]
Extract Latitude
- Click the dropdown on geolocate_json column
- Select Edit column > Add column based on this column
- Name it: latitude
- Use this expression:
value.parseJson().resultSet.features[0].geometry.coordinates[1]
Extract Uncertainty Radius
- Click the dropdown on geolocate_json column
- Select Edit column > Add column based on this column
- Name it: uncertainty_meters
- Use this expression:
value.parseJson().resultSet.features[0].properties.uncertaintyRadiusMeters
Extract Precision Level
- Click the dropdown on geolocate_json column
- Select Edit column > Add column based on this column
- Name it: precision
- Use this expression:
value.parseJson().resultSet.features[0].properties.precision
Extract Match Score
- Click the dropdown on geolocate_json column
- Select Edit column > Add column based on this column
- Name it: score
- Use this expression:
value.parseJson().resultSet.features[0].properties.score
Interpreting Results
Understanding Uncertainty Values
Look at your new uncertainty_meters column. You'll see values ranging from small (high precision) to large (low precision):
- < 100 meters: Very precise, specific address or landmark
- 100-1000 meters: Good precision, named place or feature
- 1000-5000 meters: Moderate precision, general area
- > 5000 meters: Low precision, vague description
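The bands above can be applied mechanically once the values are exported. The Python sketch below is one possible reading of them: the function names are hypothetical, the handling of the exact boundary values (100, 1000, 5000) is an assumption because the list does not say which band they belong to, and the string "Unavailable" seen in the sample response is mapped to "unknown".

```python
def parse_uncertainty(raw):
    """Return the uncertainty in meters as a float, or None if it is not numeric."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # Covers None and non-numeric strings such as "Unavailable".
        return None


def uncertainty_band(raw):
    """Map an uncertainty value onto the bands listed in this tutorial."""
    meters = parse_uncertainty(raw)
    if meters is None:
        return "unknown"
    if meters < 100:
        return "very precise"
    if meters <= 1000:
        return "good"
    if meters <= 5000:
        return "moderate"
    return "low"


for value in [45, "730", 3000, 12000, "Unavailable"]:
    print(value, "->", uncertainty_band(value))
```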
Understanding Precision Levels
The precision column shows:
- High: Specific, well-defined location
- Medium: Reasonably well-defined location with some ambiguity
- Low: Vague or ambiguous location description
Understanding Match Scores
The score column (0-100) indicates confidence:
- 90-100: Excellent match
- 80-89: Good match
- 70-79: Fair match
- < 70: Poor match, review manually
Comparing Results
If your original dataset had coordinates (like our haunted places data), you can compare:
- Create a column to calculate distance between original and GeoLocate coordinates
- Identify records with large discrepancies
- Use uncertainty values to understand if differences are within expected error
Calculate Distance Difference
Add a column with this expression (approximate):
abs(cells["longitude"].value - value.parseJson().resultSet.features[0].geometry.coordinates[0]) * 111000
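This gives a rough east-west distance in meters. The factor 111000 (meters per degree) holds for longitude only at the equator; a degree of longitude shrinks with the cosine of the latitude, so at mid-latitudes the result overstates the true distance.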
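If the data is exported, a great-circle distance that uses both latitude and longitude is a more faithful comparison than a longitude-only difference. The following Python sketch uses the standard haversine formula on a spherical Earth; the function name and the mean-radius constant are choices made for this example, not something GeoLocate or OpenRefine provides.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude at the equator versus at 60 degrees north.
print(round(haversine_m(0, 0, 0, 1)))
print(round(haversine_m(60, 0, 60, 1)))
```

A record whose distance from the original coordinates is smaller than its uncertainty radius can reasonably be treated as agreeing with the original.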
Filtering and Quality Control
Filter by Precision
Use OpenRefine's faceting to filter results:
- Click dropdown on precision column
- Select Facet > Text facet
- Review counts of High/Medium/Low precision matches
- Filter to show only low precision matches for manual review
Filter by Uncertainty
- Click dropdown on uncertainty_meters column
- Select Facet > Numeric facet
- Adjust the slider to focus on high uncertainty records
- Review these records manually
Filter by Score
- Click dropdown on score column
- Select Facet > Numeric facet
- Focus on scores below 80 for review
Handling Failed Geocodes
Some records may fail to geocode. To identify them:
- Click dropdown on longitude column
- Select Facet > Customized facets > Facet by blank
- Review records that returned no coordinates
Common reasons for failure:
- Location name not found in GeoLocate's database
- Description too vague or ambiguous
- Typos or incorrect place names
- Historical names no longer recognized
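Failed geocodes can also be detected directly from the stored JSON rather than from blank coordinate columns. The sketch below is a hypothetical Python helper that sorts a raw response into four states; the status labels are invented for this example, and the assumption that a failed lookup yields an empty or missing features array has not been verified against the live service.

```python
import json


def geocode_status(response_text):
    """Classify a raw response as 'ok', 'no_results', 'invalid_json', or 'empty'."""
    if response_text is None or not response_text.strip():
        return "empty"
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(data, dict):
        return "invalid_json"
    result_set = data.get("resultSet")
    features = result_set.get("features") if isinstance(result_set, dict) else None
    # Treat a missing or empty features array as "nothing was matched".
    if not features:
        return "no_results"
    return "ok"


print(geocode_status('{"numResults": 0, "resultSet": {"features": []}}'))
```

Separating "no_results" from "invalid_json" and "empty" helps distinguish descriptions GeoLocate could not match from requests that never completed.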
Manual Review Strategies
For failed or low-quality geocodes:
- Check the original location and city fields
- Try simplifying the location description
- Use just the city name if location is too obscure
- Consider alternative geocoding services for specific addresses
Exporting Your Results
Once satisfied with your geocoding results:
- Click Export in the top-right corner
- Choose your format:
- Comma-separated values (CSV): For spreadsheets
- Tab-separated values (TSV): For import to GIS software
- Excel: For Microsoft Excel
- Save your enriched dataset
Use Cases for GeoLocate with Historical Data
Museum Collections
Natural history museums have millions of specimen records with locality descriptions like:
- "Collected 5 miles east of Yosemite Valley, 1892"
- "Along the Merced River below Vernal Falls"
- "Oak woodland, west slope Sierra Nevada"
GeoLocate interprets these descriptions and provides realistic uncertainty estimates, allowing researchers to:
- Map historical species distributions
- Track environmental change over time
- Identify sampling gaps in collections
Archaeological Records
Archaeological site descriptions often use landmarks and relative positions:
- "Midden site on bluff overlooking river confluence"
- "Rock shelter 2 km upstream from bridge"
- "Burial mound complex, south of old mission"
The uncertainty radii help researchers:
- Plan survey areas for site relocations
- Assess site density and clustering
- Protect site locations while sharing data
Geological Surveys
Historical geology field notes contain descriptions like:
- "Limestone outcrop along creek, 1.5 miles below dam"
- "Coal seam exposed in railroad cut near tunnel"
- "Fault scarp crossing valley floor east of town"
Uncertainty values help geologists:
- Revisit historical sample locations
- Integrate historical and modern datasets
- Assess data quality for different uses
Best Practices
Prepare Your Data
- Clean location descriptions: Remove excess punctuation and formatting
- Standardize state/country names: Use full names or standard abbreviations
- Separate address components: If you have them, use separate fields for better results
Optimize API Calls
- Use appropriate throttle delays: Don't overwhelm the server
- Cache results: Save the JSON responses to avoid re-geocoding
- Batch large datasets: Process in smaller chunks if needed
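Caching can be as simple as a JSON file that maps each query URL to the response text it produced. The following Python sketch illustrates the idea; the ResponseCache class and its file layout are hypothetical, and the fetch function is supplied by the caller.

```python
import json
import os


class ResponseCache:
    """Tiny on-disk cache mapping query URLs to raw response text."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self.entries = json.load(handle)

    def get_or_fetch(self, url, fetch):
        """Return the cached response for url, calling fetch(url) only on a miss."""
        if url not in self.entries:
            self.entries[url] = fetch(url)
            self._save()
        return self.entries[url]

    def _save(self):
        # Rewrite the whole file after each new entry; fine for small datasets.
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(self.entries, handle)
```

Because the cache is keyed by the full URL, re-running a batch after fixing a few locality descriptions only re-queries the rows whose URLs changed.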
Document Your Process
Record:
- Date of geocoding
- GeoLocate API version (if available)
- Any manual corrections made
- Criteria for accepting/rejecting matches
- How uncertainty values were used in analysis
Validate Results
- Spot check: Manually verify a sample of geocoded locations
- Use uncertainty values: Assess if precision meets your needs
- Compare to existing coordinates: If available
- Map results: Visual inspection often reveals errors
Advanced Techniques
Extracting Uncertainty Polygons
GeoLocate returns uncertainty polygons in WKT (Well-Known Text) format. To extract:
value.parseJson().resultSet.features[0].properties.uncertaintyPolygon
These polygons can be imported into GIS software for spatial analysis.
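If GIS software is not at hand, the polygon's vertices can be pulled out with a few lines of Python. This sketch assumes the value is a simple single-ring WKT string of the form POLYGON((x y, x y, ...)), as the text above describes; that format has not been checked against a live response here, so the hypothetical parse_wkt_polygon helper returns None for anything it does not recognize, including "Unavailable".

```python
import re


def parse_wkt_polygon(wkt):
    """Return the outer ring as a list of (x, y) float tuples, or None if unparseable."""
    if not isinstance(wkt, str):
        return None
    # Capture everything between "POLYGON((" and the first closing parenthesis.
    match = re.match(r"^\s*POLYGON\s*\(\(\s*(.*?)\s*\)", wkt, flags=re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    ring = []
    for pair in match.group(1).split(","):
        parts = pair.split()
        if len(parts) < 2:
            return None
        try:
            ring.append((float(parts[0]), float(parts[1])))
        except ValueError:
            return None
    return ring


print(parse_wkt_polygon("POLYGON((-85.51 42.96, -85.50 42.96, -85.50 42.97, -85.51 42.96))"))
print(parse_wkt_polygon("Unavailable"))
```

For anything beyond inspection, such as multi-ring polygons or area calculations, a dedicated geometry library or the GIS import route is the safer choice.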
Combining Multiple Geocoding Services
For best results, consider:
- Try GeoLocate first for natural language descriptions
- Fall back to address geocoders for structured addresses
- Use different services for different location types
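The fallback idea can be expressed as an ordered list of geocoders that is tried until one returns a result. The Python sketch below is purely illustrative: geocode_with_fallback is a hypothetical helper, and the two stand-in geocoders are stubs with made-up behavior rather than real service clients.

```python
def geocode_with_fallback(record, geocoders):
    """Try each (name, function) pair in order; return the first non-None result."""
    for name, geocoder in geocoders:
        result = geocoder(record)
        if result is not None:
            return name, result
    return None, None


# Stub geocoders for illustration only; real ones would call a service.
def stub_text_geocoder(record):
    # Pretend only cemetery descriptions can be matched.
    return (-85.504733, 42.962303) if "Cemetery" in record["location"] else None


def stub_address_geocoder(record):
    # Pretend anything starting with a street number can be matched.
    return (-121.89, 37.34) if record["location"][:1].isdigit() else None


services = [("text", stub_text_geocoder), ("address", stub_address_geocoder)]
print(geocode_with_fallback({"location": "Ada Cemetery"}, services))
print(geocode_with_fallback({"location": "123 Main Street"}, services))
```

Recording which service produced each coordinate, as the returned name does here, makes it easier to document the process afterwards.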
Handling International Locations
For locations outside the US:
- Specify the correct country parameter
- Be aware that GeoLocate coverage varies by country
- Consider regional geocoding services for better results
Troubleshooting
No Results Returned
Problem: JSON response is empty or contains no features
Solutions:
- Simplify location description
- Check spelling of place names
- Try using only city and state
- Verify country is correct
High Uncertainty Values
Problem: Uncertainty radius is very large (>10,000 meters)
Solutions:
- Add more specific location details if available
- Accept the uncertainty if description is inherently vague
- Consider whether precision meets your research needs
- Flag for manual research
API Rate Limiting
Problem: Requests are being blocked or timing out
Solutions:
- Increase throttle delay in OpenRefine
- Process data in smaller batches
- Contact GeoLocate for higher rate limits if needed
- Spread processing over multiple sessions
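When requests are scripted outside OpenRefine, blocked or timed-out requests are commonly handled by retrying with a growing delay. The Python sketch below shows one such pattern; fetch_with_retry is a hypothetical helper, the delays are arbitrary starting points, and nothing here reflects a documented GeoLocate rate limit.

```python
import time


def fetch_with_retry(url, fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on network errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError:
            # urllib's URLError and socket timeouts are both OSError subclasses.
            if attempt == max_attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... (scaled by base_delay) before trying again.
            sleep(base_delay * 2 ** attempt)
```

The fetch argument can be any function that takes a URL and returns text, so the same wrapper works with a cached or throttled fetcher.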
Parse Errors in OpenRefine
Problem: Expression returns errors or null values
Solutions:
- Check JSON structure has expected format
- Add null-checking to expressions: if(value != null, ...)
- Verify API responses are complete
- Review OpenRefine expression syntax
Additional Resources
Documentation
Related Tools
- GEOLocate Desktop: Standalone application with visual verification
- QGIS: For mapping and analyzing geocoded results
- GeoPick: For collaborative georeferencing projects
Community Resources
Support
- For GeoLocate questions: Contact through geo-locate.org
- For OpenRefine help: OpenRefine Community Forum
Tutorial prepared by the Stanford Geospatial Center