
With the geographic references tagged, the search returns only the documents containing references to the geographic area of interest. The user can select a country, state, or city to search, or further limit the search with artificial boundaries to select a region or neighborhood. Once the geographic area is defined, the user enters search terms. The solution returns only documents with those search terms that are tagged with latitudes and longitudes within the geographic area of interest.
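The two-step filter described above (define an area, then match search terms) can be sketched in a few lines of Python. This is a minimal illustration, not the vendor's implementation: the `BoundingBox`, `Document`, and `geo_search` names, the sample documents, and the approximate coordinates are all assumptions made for the example, and the term match is a plain substring test.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical latitude/longitude rectangle chosen by the user."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple rectangle test; boxes crossing the antimeridian are not handled.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class Document:
    """Hypothetical document already tagged with (lat, lon) pairs."""
    title: str
    text: str
    locations: List[Tuple[float, float]] = field(default_factory=list)


def geo_search(docs, box, terms):
    """Return documents that contain every term and have a tag inside the box."""
    wanted = [t.lower() for t in terms]
    hits = []
    for doc in docs:
        text = doc.text.lower()
        has_terms = all(t in text for t in wanted)  # substring match only
        in_area = any(box.contains(lat, lon) for lat, lon in doc.locations)
        if has_terms and in_area:
            hits.append(doc)
    return hits


# Made-up sample data with approximate coordinates.
DOCS = [
    Document("Work order A", "Sewer line replaced on the east side.", [(38.90, -77.26)]),
    Document("Survey B", "Historic sewer tunnels under the old town.", [(48.21, 16.37)]),
    Document("Schedule C", "Road resurfacing schedule.", [(38.90, -77.27)]),
]
AREA = BoundingBox(south=38.8, west=-77.4, north=39.0, east=-77.1)
RESULTS = geo_search(DOCS, AREA, ["sewer"])
```

Here only "Work order A" both mentions sewers and carries a tag inside the selected area. "Survey B" matches the term but lies outside the box, and "Schedule C" lies inside the box but does not match the term.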

Search results are displayed on a map, giving an intuitive picture of their geospatial pattern. Each result appears both as an icon on a digital map and as an entry in a results list. The location of each document icon coincides with a geographic location mentioned in that document. If a user wants to find all documents related to a geographic area (a country, a city, or a latitude/longitude bounding box, for example), the solution renders a map marked with an icon for every document whose text pertains to that location. Clicking a document icon opens the original document.

The benefits of fusing geospatial information and text search are clear. For instance, a repair team looking for historical information on sewers in a certain part of town can first limit the search to the specific neighborhood, then search for sewers. This eliminates any information about sewers in other neighborhoods not relevant to the current project.

Similarly, a water treatment plant manager trying to identify the source of pollution in a nearby river can first limit the search to the immediate area around the river, then use keywords to search for a specific chemical or pollutant. This allows the manager to home in on the company that may have produced the chemical, while eliminating other companies that produce the chemical but are too far away to be the pollutant source.

It is increasingly important that a manager make the best decisions based on all the available information, and pertinent information may reside in both structured and unstructured documents. A GTS that can intelligently search unstructured and structured data containing geographic terms ensures that decision-makers can effectively and accurately access all the information necessary to make timely, appropriate decisions.

Just act naturally

Natural language processing mimics human behavior to interpret data.

Human beings can easily resolve ambiguous references. For example, for a person in the Washington, D.C., metropolitan area, the statement “I went to Vienna last night” clearly refers to the Fairfax County, Va., suburb, not to the capital of Austria. In fact, there are 35 separate places in the world named Vienna, and 31 of them are in the United States. While a human can tell the difference between those references, a computer will have more difficulty.

Natural language processing (NLP), a subfield of artificial intelligence and linguistics, allows a computer to use context clues in a document to determine the geographic area being referenced. For example, if the speaker continued the comment about Vienna by saying “then I drove through Tysons Corner to get home,” the technology can conclude with a significant degree of confidence that the speaker is indeed talking about Vienna, Va., as Tysons Corner is a neighboring suburb. However, if the sentence instead continued, “and we drove from there to Slovakia,” the context points with some confidence to the Vienna in Austria.
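A toy version of this context-clue reasoning is sketched below. It is only an illustration under stated assumptions: the `GAZETTEER` table, its hand-picked clue lists, and the `resolve` function are hypothetical, and the "confidence" is simply the winning candidate's share of all matched clues, which is far cruder than a production NLP system.

```python
# Hypothetical mini-gazetteer: each candidate place lists nearby or related
# names that serve as context clues. The clue lists are hand-written for
# this example.
GAZETTEER = {
    "Vienna": [
        {"name": "Vienna, Va.",
         "clues": {"tysons corner", "fairfax county", "washington"}},
        {"name": "Vienna, Austria",
         "clues": {"slovakia", "austria", "danube"}},
    ]
}


def resolve(place, text):
    """Pick the candidate whose clues appear most often in the text.

    Returns (candidate_name, confidence), where confidence is the winning
    candidate's share of all matched clues, or (None, 0.0) if no clue matches.
    """
    lowered = text.lower()
    scored = []
    for candidate in GAZETTEER.get(place, []):
        score = sum(1 for clue in candidate["clues"] if clue in lowered)
        scored.append((candidate["name"], score))
    total = sum(score for _, score in scored)
    if total == 0:
        return None, 0.0  # no context available, so the reference stays ambiguous
    best_name, best_score = max(scored, key=lambda pair: pair[1])
    return best_name, best_score / total


VIRGINIA = resolve(
    "Vienna",
    "I went to Vienna last night, then I drove through Tysons Corner to get home.",
)
AUSTRIA = resolve(
    "Vienna",
    "I went to Vienna last night, and we drove from there to Slovakia.",
)
UNKNOWN = resolve("Vienna", "I went to Vienna last night.")
```

With no surrounding clues the sketch declines to choose, which mirrors the point that the bare sentence is ambiguous to a computer.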

NLP can also differentiate between references that may or may not be place names. The word “Denton” could refer to any one of 39 geographic locations, or it could be a person. Once again, the context of the surrounding document provides indicators. The words “city of” or “mayor of” preceding a name like Denton, or the words “community college” following it, are strong positive indicators that the candidate name is geographic. The words “mister” or “doctor,” or a common first name, preceding the name are strong negative indicators.
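The positive and negative indicators can be expressed as a simple scoring rule. The sketch below is an assumption-laden illustration: the indicator lists contain only the phrases named above plus a tiny made-up list of first names, the `place_score` function is hypothetical, and it handles single-word candidate names only while ignoring sentence boundaries.

```python
import re

# Indicator lists taken from the examples in the text. The first names are a
# tiny illustrative sample, not a real name list.
POSITIVE_BEFORE = {"city of", "mayor of"}
POSITIVE_AFTER = {"community college"}
NEGATIVE_BEFORE = {"mister", "mr", "doctor", "dr", "john", "mary", "james"}


def place_score(text, name):
    """Score how place-like a single-word name looks in the text.

    Each positive indicator adds 1 and each negative indicator subtracts 1,
    so a score above zero suggests a place and below zero suggests a person.
    """
    tokens = re.findall(r"[a-z]+", text.lower())  # drops punctuation such as "Dr."
    target = name.lower()
    score = 0
    for i, token in enumerate(tokens):
        if token != target:
            continue
        before_two = " ".join(tokens[max(0, i - 2):i])
        before_one = tokens[i - 1] if i > 0 else ""
        after_two = " ".join(tokens[i + 1:i + 3])
        if before_two in POSITIVE_BEFORE:
            score += 1
        if after_two in POSITIVE_AFTER:
            score += 1
        if before_one in NEGATIVE_BEFORE:
            score -= 1
    return score


MAYOR = place_score("The mayor of Denton spoke today.", "Denton")
DOCTOR = place_score("Dr. Denton examined the patient.", "Denton")
COLLEGE = place_score("Denton Community College opened in the city of Denton.", "Denton")
```

A real system would weight indicators rather than count them equally, but the sign of the score already separates the mayor's city from the doctor.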

— Randy Ridley is vice president and general manager of Cambridge, Mass.-based MetaCarta's public sector division; John-Henry Gross is product manager.