How geocoding sets the foundation for business insights
Geocoding an address might seem simple: you enter an address and the service returns its X and Y coordinates. But there’s more to it than meets the eye. Not all geocoders are created equal, and what seem like minor details can become costly mistakes, especially if you rely on geocoding results for your business processes. Furthermore, the wide variety of geocoding platforms can turn what appears to be an easy choice into a long and painful process.
When performing geocoding assessments for our customers, it’s common to see up to 20% of addresses return a false positive, meaning they are geocoded at the wrong location. This is caused by poor-quality input addresses, which end up geocoded anywhere from a few meters to hundreds of miles from the exact location. Whether it’s a delivery company that relies on these locations to deliver goods and optimize routes or an insurance company using its customers’ locations for policy underwriting, these geocoding errors can amount to multimillion-dollar losses annually.
Before you decide which geocoding solution to purchase, here’s a list of considerations to ensure you make the right choice for your business and receive strong and accurate results, every time:
The 6 pillars of a geocoding solution
1. Source data and coverage
Once the address is parsed and its different components are identified, the address database is searched for matching candidates. The completeness of an address database depends on its sources, their update frequency, the collection methods and the richness of the database in terms of aliases, deprecated names, etc.
Some geocoders rely on open-source datasets, others on commercial datasets, and some even aggregate multiple commercial data providers to arrive at a best-of-breed approach. Commercial data is built by providers such as Google, HERE or TomTom, who collect data on the ground to create a set of address points and street address ranges.
Given the size of the task to collect, verify and georeference every address in a country, the address point coverage varies considerably from one state to another and sometimes even within the same state. This is caused by the various addressing schemes that coexisted historically and that are still being unified. Furthermore, the time between data collection and its publication as well as the frequency of updates will greatly influence the reference data quality.
2. Fuzzy matching
Fuzzy matching capabilities allow you to enter an imperfect address and still receive a strong and relevant result by finding matches between non-identical strings, e.g. “montgomrey” = “Montgomery”. This can be critical if you don’t have a system in place to validate addresses and ensure the quality of the input.
On the other hand, fuzzy matching can be a double-edged sword. If the geocoder is too permissive, it might return false-positive results by modifying too many components. Some geocoders offer parameters that control the fuzzy matching and let you lock certain address components, like the state, so that they cannot be modified to obtain a match. This guarantees that 123 Main St, Missoula, MT, is not matched to 123 Main St, Boston, MA.
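To illustrate the trade-off described above, here is a minimal sketch of fuzzy matching with a locked component. It is not how any particular geocoder works: the `REFERENCE` records, the `best_match` function, the `locked` parameter and the 0.8 threshold are all hypothetical, and the similarity measure is simply Python’s standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical reference records; a real geocoder searches a far larger database.
REFERENCE = [
    {"street": "123 Main St", "city": "Missoula", "state": "MT"},
    {"street": "123 Main St", "city": "Boston", "state": "MA"},
    {"street": "45 Oak Ave", "city": "Montgomery", "state": "AL"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query, min_score=0.8, locked=("state",)):
    """Return the best candidate, or None if no candidate clears min_score.

    Components listed in `locked` must match exactly (ignoring case);
    street and city are compared fuzzily.
    """
    best, best_score = None, 0.0
    for cand in REFERENCE:
        # Skip candidates that would require changing a locked component.
        if any(query[k].lower() != cand[k].lower() for k in locked):
            continue
        # The weakest fuzzy component decides the candidate's score.
        score = min(similarity(query[k], cand[k]) for k in ("street", "city"))
        if score >= min_score and score > best_score:
            best, best_score = cand, score
    return best


# A misspelled city is still matched.
typo = {"street": "45 Oak Ave", "city": "montgomrey", "state": "AL"}
print(best_match(typo))

# With the state locked, a Montana query is never moved to Massachusetts.
wrong_state = {"street": "123 Main St", "city": "Bostn", "state": "MT"}
print(best_match(wrong_state))             # no acceptable candidate
print(best_match(wrong_state, locked=()))  # unlocked: Boston, MA is returned
```

In this sketch, removing the lock lets the matcher silently change the state, which is exactly the kind of false positive a locked component is meant to prevent.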
3. Data confidence and traceability
One common misconception is that the geocoding result should be blindly accepted. Numerous factors such as input data quality, address parsing, fuzziness and reference data can cause false-positive results. Providers report confidence in different ways, ranging from a simple “OK” to seemingly random strings such as S8HPNTSCZA and numeric scores like 0.8 or 97.
These values probably mean nothing to you at first glance, but they are critical to integrating geocoding into your business processes. Match codes tell you what was (or wasn’t) matched, allowing you to set rules and thresholds for automated processing. This way, you can decide whether a record continues through your automated workflow or is pulled out and flagged for manual verification. This is especially important if you rely on a fuzzier geocoding service.
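The rule-and-threshold idea can be sketched in a few lines. This is only an illustration: the `GeocodeResult` fields, the `route` function and the 0.9 threshold are hypothetical, and it assumes the provider’s score has already been normalized to a 0.0 to 1.0 scale and its match code decoded into a simple flag.

```python
from dataclasses import dataclass


@dataclass
class GeocodeResult:
    address: str
    score: float         # hypothetical confidence, normalized to 0.0-1.0
    state_matched: bool  # hypothetical flag decoded from a provider match code


def route(result: GeocodeResult, auto_threshold: float = 0.9) -> str:
    """Decide whether a record stays in the automated workflow."""
    if result.score >= auto_threshold and result.state_matched:
        return "accept"
    # Anything below the threshold, or with a modified state, goes to a person.
    return "manual_review"


records = [
    GeocodeResult("123 Main St, Missoula, MT", 0.97, True),
    GeocodeResult("123 Main St, Misoula", 0.80, True),
    GeocodeResult("123 Main St, Boston, MT", 0.95, False),
]
for record in records:
    print(record.address, "->", route(record))
```

The right threshold depends on the provider’s scoring scheme and on how costly a wrong location is for your process, so it should be calibrated on your own data.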
To add to the confusion, the level of detail included differs from one provider to another. Depending on the geocoding service, you can expect up to 100 fields in the response with information such as the point location (driveway, parcel centroid, building centroid, etc.) or the Census division ID.
4. Match rate and location accuracy
These two key indicators are closely related, since a matched address is worthless if it’s not precisely geocoded. Notwithstanding the other important considerations, the number of addresses that were recognized and precisely geocoded is what really matters for your business operations, because it determines the number of business decisions you can make with strong confidence.
Location accuracy describes the type of location that the latitude and longitude correspond to. This can be anything from an address point located at the building centroid to a street intersection or a city or state centroid. Obviously, the more precise the geocode is, the better your location insights will be.
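As a rough sketch of how the two indicators can be tracked together, the snippet below computes a match rate and a “precise” rate from a list of results. The `location_type` labels, the `PRECISE` set and the `summarize` function are hypothetical; each provider uses its own vocabulary for location types, and which types count as precise enough is a business decision.

```python
# Hypothetical labels; every provider names its location types differently.
PRECISE = {"address_point", "building_centroid", "parcel_centroid"}


def summarize(results):
    """Return match rate and precise rate for a list of geocoding results.

    Each result is either None (no match) or a dict with a 'location_type' key.
    """
    total = len(results)
    if total == 0:
        return {"match_rate": 0.0, "precise_rate": 0.0}
    matched = [r for r in results if r is not None]
    precise = [r for r in matched if r["location_type"] in PRECISE]
    return {
        "match_rate": len(matched) / total,
        "precise_rate": len(precise) / total,
    }


sample = [
    {"location_type": "address_point"},
    {"location_type": "street_intersection"},
    None,  # unmatched address
    {"location_type": "city_centroid"},
]
print(summarize(sample))  # {'match_rate': 0.75, 'precise_rate': 0.25}
```

A high match rate with a low precise rate is the situation described above: many addresses are matched, but few are usable for decisions that depend on the exact location.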
5. Performance
Different services offer different capabilities and varying performance. Batch services let you upload a single CSV file and download the same file once geocoding is completed, while micro-batch services let you programmatically send chunks of addresses, which can significantly improve performance and cut processing times by as much as a factor of 10. In these situations, throughput, the total number of addresses geocoded per minute, is the key metric.
On the other end of the spectrum, real-time services are built to geocode based on a user’s input and offer fast response time and capabilities like autocomplete to improve the user experience and ensure a more accurate input. To measure real-time performance capabilities, we monitor the latency. This is critical when the address is used as the basis for an underwriting process.
In addition, the type of deployment, whether on-premises or in the cloud (SaaS), will greatly influence the performance of the geocoding service.
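The two performance metrics described above, throughput for micro-batch workloads and latency for real-time requests, can be measured with a small harness like the following sketch. The `geocode_batch` and `geocode_one` callables are placeholders for whatever client a given provider offers; they are assumptions, not a real API, and the chunk size of 100 is arbitrary.

```python
import statistics
import time
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def measure_throughput(addresses, geocode_batch, size=100):
    """Send addresses in micro-batches; return results and addresses per minute."""
    start = time.perf_counter()
    results = []
    for batch in chunks(addresses, size):
        results.extend(geocode_batch(batch))  # placeholder for a provider call
    elapsed = time.perf_counter() - start
    per_minute = len(results) / elapsed * 60 if elapsed > 0 else float("inf")
    return results, per_minute


def measure_latency_ms(addresses, geocode_one):
    """Return the median latency, in milliseconds, of single-address requests.

    `addresses` must not be empty.
    """
    latencies = []
    for address in addresses:
        start = time.perf_counter()
        geocode_one(address)  # placeholder for a provider call
        latencies.append((time.perf_counter() - start) * 1000)
    return statistics.median(latencies)


if __name__ == "__main__":
    # Stand-in geocoders that return a fixed coordinate, for demonstration only.
    fake_batch = lambda batch: [(0.0, 0.0) for _ in batch]
    fake_one = lambda address: (0.0, 0.0)

    sample = [f"{n} Main St" for n in range(1, 251)]
    results, per_minute = measure_throughput(sample, fake_batch, size=100)
    print(len(results), "addresses geocoded")
    print("median latency (ms):", measure_latency_ms(sample[:10], fake_one))
```

When comparing providers, it helps to run the same sample of addresses, with the same chunk size, from the same network location, since deployment and network distance affect both numbers.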
6. Terms and conditions
This is the most overlooked aspect when choosing a location platform, and the same is true when looking for a geocoding service. Some services allow you to fully automate your processes with bulk processing, whereas others prohibit it by authorizing only human-initiated queries. This has a considerable impact if you need to process millions of addresses at a time. The latitude and longitude may also be subject to restrictions, such as not being displayed to the end user or not being stored for more than 30 days.
Our approach to geocoding benchmarking: reduce your shopping time and deploy your solution faster.
When the time comes to evaluate different providers based on their geocoding characteristics, it’s easy to fall down the geocoding rabbit hole. Once you’ve pored over the lengthy documentation of one provider and geocoded your sample list of addresses, you will find that you must do it all over again if you want to consider a second solution. Not only do the parameters used to geocode differ with each vendor, but the response also differs from one service to another.
At Korem, we partner with all the best geocoding providers on the market and benchmarking geocoding software is one of our strengths. Let the experts walk you through the comparison process and help you evaluate which geocoding solution is best based on your data and your business needs.