Do LLMs understand coordinates?

A new #benchmark called #GPSBench evaluates 14 #LLMs across 17 coordinate manipulation and reasoning tasks and finds that models handle real-world geographic reasoning better than raw geometric computations, with country-level knowledge stronger than city-level localisation.
Published May 3, 2026

… is the question researchers asked in a recent paper (submitted and, it seems, not yet peer-reviewed or published).

The team around Thinh Hung Truong created GPSBench, a benchmark to evaluate the ability of LLMs¹ to understand and work with coordinates. The benchmark contains 57,800 samples across 17 tasks. The researchers divide the tasks into two tracks: “pure GPS”² and “applied”.

The GPSBench repo contains the dataset and evaluation code to analyse LLMs’ performance on the benchmark. Supported LLM providers include OpenAI, Anthropic, Google, and providers accessible through OpenRouter.

Individual model performance across all tasks (source: Truong et al. (submitted))

Model performance in the “applied” vs. “pure GPS” track (source: Truong et al. (submitted))

Task-specific accuracy (average over all tested models). The error bars show ±1 standard deviation. The researchers classify the tasks into “solved” (>95% accuracy), “brittle” (25–95%), and “unsolved” (<25%) tiers. (source: Truong et al. (submitted))
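
The tier thresholds from the caption can be written down as a small helper. This is only a sketch, not code from the GPSBench repo: the function name `classify_tier` is made up for illustration, and it assumes that accuracies of exactly 25% and 95% count as “brittle”, which the caption suggests but does not state explicitly.

```python
def classify_tier(accuracy_pct: float) -> str:
    """Map a task's mean accuracy (in percent) to a tier name.

    Thresholds follow the figure caption: "solved" above 95%,
    "unsolved" below 25%, "brittle" in between.
    Assumption: the boundary values 25 and 95 fall into "brittle".
    """
    if not 0 <= accuracy_pct <= 100:
        raise ValueError("accuracy must be a percentage between 0 and 100")
    if accuracy_pct > 95:
        return "solved"
    if accuracy_pct < 25:
        return "unsolved"
    return "brittle"


# Hypothetical accuracies, purely for illustration (not results from the paper).
example_accuracies = {"task_a": 97.5, "task_b": 60.0, "task_c": 12.0}
tiers = {task: classify_tier(acc) for task, acc in example_accuracies.items()}
print(tiers)
```

Keeping the thresholds in one function makes it easy to see how sensitive the tier assignment is to where the cut-offs sit, for example for tasks whose average accuracy lies close to 95%.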

The summary by the research team states the following (note that the paper is not yet peer-reviewed, so take this with a grain of salt):

[W]e evaluate 14 state-of-the-art LLMs and find that GPS reasoning remains challenging, with substantial variation across tasks: models are generally more reliable at real-world geographic reasoning than at geometric computations. Geographic knowledge degrades hierarchically, with strong country-level performance but weak city-level localization, while robustness to coordinate noise suggests genuine coordinate understanding rather than memorization.

Footnotes

  1. Large language models.↩︎

  2. The track name the researchers chose is a bit misleading, as it is not limited to GPS or GNSS coordinates at all, but rather tests the understanding of coordinates in general.↩︎