AI as a supporting act: building a points-of-interest pipeline that fights hallucinations

While building In the Long Run, an app that maps Strava mileage onto famous long-distance routes, the author set out to enrich its maps with notable landmarks. The pipeline pulls from GeoNames’ Creative Commons dataset and is written in Python, with Apache Parquet for storage and DuckDB as the query layer. The 13 million raw rows are filtered down to roughly 725,000 points of interest worldwide by selecting feature types such as parks, castles, and monuments, applying population and elevation thresholds, and using the Wikipedia links buried in GeoNames’ alternateNames file as a notability signal. A second stage matches candidates to each route using bounding boxes and distance-along-route calculations built on Shapely and Pyproj.
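The first-stage selection can be sketched in plain Python. This is a minimal illustration, not the author's code: it stands in for the DuckDB queries over Parquet, and the specific feature codes, the 50,000 population cut-off, the 2,500 m elevation cut-off, and the rule that every candidate must also carry a Wikipedia link are all assumptions. It relies on GeoNames storing website URLs in alternateNames under the pseudo-language code `link`; the place names and URLs in the usage data are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical selection rules: GeoNames feature codes and cut-offs chosen for illustration.
LANDMARK_CODES = {"PRK", "CSTL", "MNMT"}  # park, castle, monument
POPULATED_CODES = {"PPL", "PPLA", "PPLC"}  # populated place, admin seat, capital
MOUNTAIN_CODES = {"MT", "PK"}  # mountain, peak
MIN_POPULATION = 50_000
MIN_ELEVATION_M = 2_500


@dataclass
class Place:
    geonameid: int
    name: str
    feature_code: str
    population: int
    elevation: Optional[int]


def wikipedia_linked_ids(alternate_names: Iterable[tuple[int, str, str]]) -> set[int]:
    """geonameids with a Wikipedia URL among alternateNames rows (geonameid, isolanguage, name)."""
    return {
        gid
        for gid, lang, value in alternate_names
        if lang == "link" and "wikipedia.org" in value
    }


def passes_type_filter(place: Place) -> bool:
    """Keep landmark types outright; apply thresholds to towns and mountains."""
    if place.feature_code in LANDMARK_CODES:
        return True
    if place.feature_code in POPULATED_CODES:
        return place.population >= MIN_POPULATION
    if place.feature_code in MOUNTAIN_CODES:
        return place.elevation is not None and place.elevation >= MIN_ELEVATION_M
    return False


def select_pois(places: Iterable[Place], alternate_names) -> list[Place]:
    """Combine the type/threshold filter with the Wikipedia-link notability signal."""
    linked = wikipedia_linked_ids(alternate_names)
    return [p for p in places if passes_type_filter(p) and p.geonameid in linked]


places = [
    Place(1, "Example Park", "PRK", 0, None),
    Place(2, "Tiny Village", "PPL", 300, None),
    Place(3, "Example Castle", "CSTL", 0, None),
    Place(4, "Unlinked Park", "PRK", 0, None),
]
alternate_names = [
    (1, "link", "https://en.wikipedia.org/wiki/Example_Park"),
    (2, "link", "https://en.wikipedia.org/wiki/Tiny_Village"),
    (3, "link", "https://de.wikipedia.org/wiki/Example_Castle"),
    (4, "en", "Unlinked Park"),
]
print([p.name for p in select_pois(places, alternate_names)])
# ['Example Park', 'Example Castle']
```

The village fails the population threshold and the last park has no Wikipedia link, so only two candidates survive.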
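The second stage, matching candidates to a route, amounts to a cheap bounding-box rejection followed by projecting each surviving point onto the route polyline to get its offset and its distance along the route. With Shapely and Pyproj this would typically mean reprojecting to a metric coordinate system and calling `LineString.project()`; the sketch below instead swaps in a dependency-free approximation (a local equirectangular projection per segment) so the logic is visible. The 5 km offset limit, the helper names, and the toy route are hypothetical, and routes crossing the antimeridian are not handled.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius
LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def _local_xy(origin: LatLon, pt: LatLon) -> Tuple[float, float]:
    """Project pt to metres relative to origin (equirectangular; fine for short segments)."""
    x = math.radians(pt[1] - origin[1]) * math.cos(math.radians(origin[0])) * EARTH_RADIUS_M
    y = math.radians(pt[0] - origin[0]) * EARTH_RADIUS_M
    return x, y


def padded_bbox(route: List[LatLon], buffer_m: float) -> Tuple[float, float, float, float]:
    """Route bounding box (south, west, north, east) grown by roughly buffer_m on each side."""
    lats = [lat for lat, _ in route]
    lons = [lon for _, lon in route]
    lat_pad = math.degrees(buffer_m / EARTH_RADIUS_M)
    # Longitude degrees shrink towards the poles, so pad using the most poleward latitude.
    poleward = max(abs(min(lats)), abs(max(lats)))
    lon_pad = lat_pad / max(math.cos(math.radians(poleward)), 0.01)
    return min(lats) - lat_pad, min(lons) - lon_pad, max(lats) + lat_pad, max(lons) + lon_pad


def locate_on_route(route: List[LatLon], poi: LatLon, max_offset_m: float):
    """Return (distance_along_m, offset_m) of the closest route point, or None if too far."""
    best: Optional[Tuple[float, float]] = None
    along_start = 0.0
    for a, b in zip(route, route[1:]):
        bx, by = _local_xy(a, b)
        px, py = _local_xy(a, poi)
        seg_len_sq = bx * bx + by * by
        # Parameter t in [0, 1] of the closest point on segment a->b.
        t = 0.0 if seg_len_sq == 0 else max(0.0, min(1.0, (px * bx + py * by) / seg_len_sq))
        offset = math.hypot(px - t * bx, py - t * by)
        seg_len = math.sqrt(seg_len_sq)
        if best is None or offset < best[1]:
            best = (along_start + t * seg_len, offset)
        along_start += seg_len
    if best is None or best[1] > max_offset_m:
        return None
    return best


def match_pois(route: List[LatLon], pois, max_offset_m: float = 5_000.0):
    """Keep (name, lat, lon) POIs within max_offset_m of the route, ordered along it."""
    south, west, north, east = padded_bbox(route, max_offset_m)
    matched = []
    for name, lat, lon in pois:
        if not (south <= lat <= north and west <= lon <= east):
            continue  # cheap rejection before the per-segment search
        hit = locate_on_route(route, (lat, lon), max_offset_m)
        if hit is not None:
            matched.append((name, round(hit[0]), round(hit[1])))
    return sorted(matched, key=lambda m: m[1])


route = [(0.0, 0.0), (0.0, 1.0)]  # toy two-point route along the equator
pois = [("poi_a", 0.01, 0.5), ("poi_b", 0.0, 1.03), ("poi_c", 1.0, 0.5)]
print(match_pois(route, pois))
```

Here `poi_c` is dropped by the bounding box alone, while `poi_b`, just past the end of the route, is kept and pinned to the route's final point. On a real route spanning thousands of kilometres, a spatial index over the candidates would replace the linear scan.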

The LLM (Anthropic’s Haiku, chosen for speed, price, and batch discounts) was relegated to a supporting role, generating significance ratings rather than serving as the headline feature. Even there it proved unreliable: an ungrounded first pass confused Central Park in Decatur, Illinois with its Manhattan namesake, inflated town populations, and overstated mountain heights. Adding location and administrative metadata, plus tighter grounding in the system prompt, reduced but did not eliminate the hallucinations. Batched calls took hours and cost around $10 for the largest routes, and stray Anthropic Markup Language fragments leaked into tool-call outputs and had to be cleaned up.
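A grounded request and the matching cleanup step might look like the following. The prompt wording, the 1 to 5 rating scale, the metadata field names, and the tag-stripping regex are all hypothetical; the article does not publish its prompts or say exactly what the leaked markup looked like, so `clean_model_text` simply assumes stray XML-style tags. The sketch stops before the API call itself, and the Decatur coordinates are approximate values used only for illustration.

```python
import re
from typing import Any, Dict

# Hypothetical system prompt; the article describes tighter grounding but not its wording.
SYSTEM_PROMPT = (
    "You rate how significant a point of interest is for someone following a long-distance route. "
    "Describe only the place identified by the metadata provided. If other places share its name, "
    "ignore them. Do not state populations, elevations, or dates that the metadata does not give."
)


def build_user_message(poi: Dict[str, Any]) -> str:
    """Pack location and administrative metadata so the model can tell namesakes apart."""
    lines = [
        f"Name: {poi['name']}",
        f"Feature type: {poi['feature_type']}",
        f"Administrative area: {poi['admin1']}, {poi['country']}",
        f"Coordinates: {poi['lat']:.4f}, {poi['lon']:.4f}",
    ]
    # Include figures only when GeoNames has them; the system prompt forbids inventing the rest.
    if poi.get("population"):
        lines.append(f"Population (GeoNames): {poi['population']}")
    if poi.get("elevation"):
        lines.append(f"Elevation (GeoNames): {poi['elevation']} m")
    lines.append("Reply with a significance rating from 1 to 5 and one sentence of justification.")
    return "\n".join(lines)


# Assumed shape of leaked markup: XML-style opening or closing tags.
_TAG_FRAGMENT = re.compile(r"</?[A-Za-z_][\w:.-]*[^<>]*>")


def clean_model_text(text: str) -> str:
    """Strip stray tag fragments and collapse the whitespace they leave behind."""
    without_tags = _TAG_FRAGMENT.sub("", text)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


decatur_park = {
    "name": "Central Park",
    "feature_type": "park",
    "admin1": "Illinois",
    "country": "US",
    "lat": 39.8403,  # approximate coordinates of Decatur, for illustration
    "lon": -88.9548,
}
print(build_user_message(decatur_park))
print(clean_model_text("<result>Rating: 3</result>\n\n"))
```

The coordinates and administrative area are what let the model separate Decatur's Central Park from Manhattan's. The regex is deliberately blunt and would also delete legitimate angle-bracketed text, which is one more reason the outputs still need spot-checking.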

The deeper lesson is about bias and judgment. Relying on English Wikipedia as a relevance signal produced a clear distortion: Route 66, a fraction of the length, drew 14,181 POIs, while the 23,257 km Cape Town to Magadan route got only 10,000. That skew reflects where English speakers live and edit rather than where interesting places actually are. Cross-referencing Wikidata’s count of language editions per article gave a better notability measure. The author’s takeaway, echoed in the title, is that curating for taste and catching subtle errors still demand human spot-checking; no automated test substitutes for editorial judgment.
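Counting language editions can be sketched as follows. It assumes each item's `sitelinks` object from the Wikidata entity JSON, keyed by site IDs such as `enwiki` or `dewiki`; Wikipedia language editions end in `wiki`, while sister projects such as `enwikivoyage` do not. A handful of non-Wikipedia projects (Commons, Wikispecies, Meta and others) also end in `wiki`, so they are excluded through a small, non-exhaustive set. The `min_editions` threshold and the POI names are hypothetical; the article does not say how the count was obtained or applied.

```python
from typing import Dict, List, Tuple

# Site IDs that end in "wiki" but are not language editions of Wikipedia (not exhaustive).
NON_LANGUAGE_WIKIS = {
    "commonswiki", "specieswiki", "metawiki", "mediawikiwiki", "wikidatawiki", "sourceswiki",
}


def language_edition_count(sitelinks: Dict[str, dict]) -> int:
    """Count Wikipedia language editions in a Wikidata item's sitelinks mapping."""
    return sum(
        1 for site in sitelinks if site.endswith("wiki") and site not in NON_LANGUAGE_WIKIS
    )


def rank_by_language_reach(
    items: List[Tuple[str, Dict[str, dict]]], min_editions: int = 1
) -> List[Tuple[str, int]]:
    """Score POIs by language reach, drop those below min_editions, highest first."""
    scored = [(name, language_edition_count(links)) for name, links in items]
    kept = [pair for pair in scored if pair[1] >= min_editions]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


items = [
    ("poi_en_only", {"enwiki": {}, "commonswiki": {}}),
    ("poi_multilingual", {"ruwiki": {}, "zhwiki": {}, "frwiki": {}, "enwikivoyage": {}}),
    ("poi_unlinked", {}),
]
print(rank_by_language_reach(items))
# [('poi_multilingual', 3), ('poi_en_only', 1)]
```

Under this score a place with no English article can still outrank one covered only by English Wikipedia, which is the kind of correction the Route 66 comparison calls for.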