After a massive storm, a city discovers that a nearby river is polluted with toxic chemicals because of inadequate storm drains. But where did the chemicals come from? Population growth requires new roads, schools, sewers, and other infrastructure. But which areas are environmentally protected and must be avoided?
In many cities, the infrastructure is decades or even centuries old, and public works managers hold thousands of digital documents containing important historical and current information about it. Many of these documents are “unstructured,” meaning they are not stored in a database, and an estimated 80% of all digital writing includes at least one geospatial reference. Even when relevant information is in a database, its geographic references often lack coordinates. Only when structured and unstructured information are fused into a single, cohesive view can managers get the complete picture needed to make an informed decision.
In public works, nearly every activity (wastewater management, construction, road maintenance) is tied to a location. As a result, the geographic references in documents, reports, e-mails, and blogs are important but not necessarily easy to find. The data are collected by many different people in different formats, both structured and unstructured, and few, if any, documents are explicitly marked with a latitude and longitude.
Searching these archives is difficult because traditional search engines cannot resolve ambiguous location references. Take the unlucky manager searching for a reference to Washington Street in the city of Lincoln. A traditional search tool cannot distinguish Washington or Lincoln used as place names from the same words used as the names of former presidents. Results could easily number in the thousands, most of them unrelated to the location in question. A search for Sinclair Avenue is equally difficult in a town whose former mayor was named Sinclair.
Select The Right Tool
Fortunately, the right tools exist. A geographic text search (GTS) tool distinguishes geographic terms from other text and automatically resolves them to a specific latitude and longitude. By combining this geospatial information with the surrounding non-geographic text, it lets officials quickly determine whether a document concerns their area of responsibility and interest. In addition, geographic patterns become obvious when results are visualized on a map. GTS solutions become even more powerful when the underlying map is served from a geographic information system (GIS), creating an authoritative decision support tool.
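To make the lookup step concrete, here is a minimal sketch, assuming a tiny hypothetical gazetteer (a table of place names and coordinates) and an official whose area of responsibility is a simple latitude/longitude bounding box. All place entries and coordinates below are invented for illustration, and names such as `GAZETTEER`, `geotag`, and `in_my_area` are hypothetical rather than taken from any GTS product. Note that this step alone does no disambiguation; it only shows how resolved coordinates let a document be checked against an area of interest.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float


# Hypothetical gazetteer: names and coordinates are illustrative only.
GAZETTEER = {
    "washington street": Place("Washington Street", 40.80, -96.70),
    "sinclair avenue": Place("Sinclair Avenue", 40.82, -96.68),
    "oak creek": Place("Oak Creek", 40.95, -96.50),
}


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, place: Place) -> bool:
        # Simple box test; ignores boxes that cross the antimeridian.
        return (self.south <= place.lat <= self.north
                and self.west <= place.lon <= self.east)


def geotag(text: str) -> list[Place]:
    """Return gazetteer places whose names appear in the text as whole phrases."""
    lowered = text.lower()
    return [
        place
        for key, place in GAZETTEER.items()
        if re.search(r"\b" + re.escape(key) + r"\b", lowered)
    ]


def in_my_area(text: str, area: BoundingBox) -> list[Place]:
    """Keep only the tagged places that fall inside the official's area."""
    return [p for p in geotag(text) if area.contains(p)]
```

For example, a maintenance note mentioning both Washington Street and Oak Creek would be tagged with both places, but only Washington Street falls inside a box running from 40.75 to 40.85 latitude and from -96.75 to -96.65 longitude. A production system would use a full gazetteer and polygon boundaries from the GIS rather than a box.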
GTS solutions are already in use at federal, state, and local agencies. Their power comes from using natural language processing (NLP) to mimic the process a human would use to resolve ambiguous references within a document, then geospatially tagging each reference with the correct latitude and longitude.
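As a rough illustration of how context can settle an ambiguous name, the sketch below uses a deliberately simple keyword-overlap heuristic in place of the statistical NLP models real GTS products rely on. It assumes a hypothetical table of candidate senses for the word "Washington", each with a few invented cue words, and picks the sense whose cues appear most often within a few words of each mention. The sense labels, cue lists, and function names are all hypothetical.

```python
import re

# Hypothetical senses for one ambiguous term, each with illustrative cue words.
SENSES = {
    "washington": {
        "place:Washington Street": {"street", "st", "block", "intersection", "sewer", "paving"},
        "person:George Washington": {"president", "general", "birthday", "portrait"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text, term, window=5):
    """Return the best-scoring sense for each mention of term.

    A mention is left unresolved (None) when no cue words appear nearby
    or when two senses tie.
    """
    tokens = tokenize(text)
    senses = SENSES[term]
    results = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        # Words within `window` tokens on either side of the mention.
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        scores = {sense: len(cues & context) for sense, cues in senses.items()}
        best = max(scores.values())
        winners = [s for s, v in scores.items() if v == best]
        results.append(winners[0] if best > 0 and len(winners) == 1 else None)
    return results
```

In the sentence "The sewer under Washington Street collapsed. A portrait of President Washington hangs in city hall.", the first mention is resolved as the street (cues "sewer" and "street") and the second as the president (cues "portrait" and "president"). Returning `None` rather than guessing mirrors what a careful human would do with too little context.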
By looking at a document's context, NLP can distinguish places that share a name, as well as people's names that resemble place names. It is also used to identify relative references such as “30 miles south of Cleveland,” tagging the most likely point of reference. The solution is thereby able to provide meaning resolution: discerning what the document's author actually intended.
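Once the anchor place is known, turning a relative reference into coordinates is a matter of geometry. The sketch below, offered under stated assumptions, matches phrases of the form "N miles <direction> of <Place>" with a regular expression, looks the anchor up in a hypothetical one-entry gazetteer (Cleveland's coordinates are approximate, taken as a city-center point), and applies the standard great-circle destination-point formula on a spherical Earth with a mean radius of 6,371 km. The pattern, the gazetteer, and the function names are illustrative, not part of any GTS product.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
KM_PER_UNIT = {"mile": 1.609344, "miles": 1.609344, "km": 1.0}
BEARINGS = {"north": 0, "northeast": 45, "east": 90, "southeast": 135,
            "south": 180, "southwest": 225, "west": 270, "northwest": 315}

# Hypothetical one-entry gazetteer; Cleveland's coordinates are approximate.
ANCHORS = {"Cleveland": (41.4993, -81.6944)}

# Longer direction words come first so "northeast" is preferred over "north".
PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s+(miles?|km)\s+"
    r"(northeast|northwest|southeast|southwest|north|south|east|west)"
    r"\s+of\s+([A-Z][A-Za-z]*)"
)


def destination(lat, lon, bearing_deg, distance_km):
    """Great-circle destination from a start point, bearing, and distance."""
    phi1, lam1, theta = map(math.radians, (lat, lon, bearing_deg))
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540) % 360 - 180  # normalize to [-180, 180)
    return math.degrees(phi2), lon2


def resolve_relative(text):
    """Return (lat, lon) for each relative reference whose anchor is known."""
    points = []
    for amount, unit, direction, anchor in PATTERN.findall(text):
        if anchor not in ANCHORS:
            continue  # unknown anchor: leave the reference unresolved
        lat, lon = ANCHORS[anchor]
        km = float(amount) * KM_PER_UNIT[unit]
        points.append(destination(lat, lon, BEARINGS[direction], km))
    return points
```

Applied to "The spill was reported 30 miles south of Cleveland.", this yields a point at roughly 41.07° N, 81.69° W. Choosing the anchor is the harder part: a real system must first decide which Cleveland is meant and which point (city center, city limits) best represents it, which is where the contextual NLP described above comes in.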