AI search misidentifies what language a query is written in, and that single error quietly reshapes which sites get cited. A Search Engine Land analysis published Thursday by David Carrasco Pamies uses Catalan as a test case, and the finding generalizes well beyond it. The language-identification layer that sits at the front of every modern retrieval pipeline makes mistakes, and every system built on top of that layer inherits them.
The baseline failure is older than AI search. Google Translate misidentifies Catalan as Occitan, despite Catalan having roughly 9 million speakers against Occitan’s 200,000. A pre-AI system already cannot reliably tell the two apart. Generative retrieval did not introduce this flaw. It inherited it, then amplified it by feeding the misread query into a synthesized answer that presents the error with confidence.
Carrasco Pamies documents four patterns, and each one maps to a measurable SEO outcome. The first is vocabulary divergence. A Spanish-language query about Catalan independence surfaced BBC, the Spanish Wikipedia, and Fundación Espacio Público. The same query in Catalan added El Punt Avui, VilaWeb, the Catalan subreddit, and the Catalan Wikipedia. The query language did not just translate the request. It swapped the retrieval pool.
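One way a team might quantify how far the retrieval pool shifts between languages is to compare the sets of cited domains for the same question asked in each language. The sketch below is a minimal illustration, not the method used in the analysis: the function names are hypothetical, the URLs are made-up placeholders on the reserved `.example` domain, and it assumes the cited URLs have already been collected by hand or by some other tool.

```python
from urllib.parse import urlparse


def cited_domains(urls):
    """Reduce a list of cited URLs to a set of hostnames (leading 'www.' dropped)."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname  # already lowercased by urllib
        if host:
            domains.add(host.removeprefix("www."))
    return domains


def jaccard(a, b):
    """Jaccard overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Placeholder citations for one question asked in two languages (illustrative only).
spanish_citations = [
    "https://www.news.example/article",
    "https://es.wiki.example/page",
    "https://foundation.example/report",
]
catalan_citations = [
    "https://news.example/article",
    "https://ca.wiki.example/page",
    "https://local-paper.example/story",
    "https://forum.example/thread",
]

overlap = jaccard(cited_domains(spanish_citations), cited_domains(catalan_citations))
print(f"Citation overlap: {overlap:.2f}")
```

A low score on a query where the two languages should draw on much the same sources is a signal worth investigating. A low score on its own does not show that either result set is wrong.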
The second pattern is commercial. A query for an accountant in Catalan returned zero paid ads, while the Spanish equivalent showed multiple sponsors. The system also tried to autocorrect a Catalan search for recipes using calçots, a regional vegetable, into a search for ice cream shops. For a local business, that is lost commercial visibility caused entirely by the language the customer happened to type in.
The third pattern is authority reassignment. Answers about the Sant Jordi tradition shifted between hotel chains and the regional government depending on query language. The language was acting as a filter on the corpus, deciding which entities counted as authoritative before any ranking signal was applied. The fourth pattern is the most damaging for diagnosis: the same Catalan query returned Spanish answers unpredictably across sessions. A site owner watching this cannot reproduce the failure, which means they cannot file a useful bug or build a fix.
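Because the fourth pattern cannot be reproduced on demand, one possible workaround is to record the answer language across many sessions and report a rate instead of a single failing case. The sketch below assumes each observation has already been labeled with the language of the answer (by a reviewer or by a detector of the team's choosing). The function name, the query labels, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def language_consistency(observations):
    """Summarize answer languages per query.

    observations: iterable of (query, answer_language) pairs, one per session.
    Returns a dict mapping each query to its run count, the most frequent
    answer language, that language's share of runs, and the raw counts.
    """
    by_query = defaultdict(Counter)
    for query, answer_language in observations:
        by_query[query][answer_language] += 1

    report = {}
    for query, counts in by_query.items():
        runs = sum(counts.values())
        dominant, hits = counts.most_common(1)[0]
        report[query] = {
            "runs": runs,
            "dominant": dominant,
            "share": hits / runs,
            "counts": dict(counts),
        }
    return report


# Made-up log: the same Catalan query run in five separate sessions.
sample_log = [
    ("catalan-query-1", "ca"),
    ("catalan-query-1", "es"),
    ("catalan-query-1", "ca"),
    ("catalan-query-1", "ca"),
    ("catalan-query-1", "es"),
]

summary = language_consistency(sample_log)
print(summary["catalan-query-1"])
```

A statement such as "two of five sessions answered a Catalan query in Spanish" gives a bug report something concrete to point at, even when no single run can be replayed.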
There is a compounding risk the analysis names directly. Low-quality, AI-generated content in a minority language gets scraped back into training data, which degrades the model’s grasp of that language further. Carrasco Pamies calls it a slop loop. Each pass makes the next model slightly worse at the language, and worse models generate more slop. For smaller language markets, the trend line points the wrong way.
The reason this matters outside Catalonia is structural. The same routing logic applies to any region where a national or sub-national context is ambiguous. The analysis draws the parallels: Texas against California, Quebec against Ontario. A market that looks monolingual to an outside team still contains jurisdictional and dialect boundaries that retrieval can collapse. If routing, content, entity markup, and quality assurance do not all agree on the same country and language context, the model will pick one, and it will not tell you which.
For international SEO teams, the takeaway is that “localized” in 2026 is a systems property, not a translation task. Routing rules, hreflang signals, entity markup, content language, and testing all have to encode the same context, or the retrieval layer resolves the ambiguity on its own. Teams operating in any multilingual or multi-jurisdiction market should audit how AI answers respond to queries in each language now, before that quiet misrouting becomes a permanent loss of citations.
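As a rough illustration of what "encoding the same context" could look like in an audit, the sketch below compares the language tags a page declares through different signals and groups the signals by the tag they carry. The signal names and sample values are assumptions made for the example. How each value is extracted from routing rules, hreflang annotations, or structured data is left out, and the normalization is deliberately simplistic (case and separator only), not a full BCP 47 implementation.

```python
def normalize_tag(tag):
    """Normalize a language tag for comparison: trim, use hyphens, lowercase."""
    return tag.strip().replace("_", "-").lower()


def locale_mismatches(signals):
    """Group signal names by normalized language tag.

    signals: dict mapping a signal name to the language tag it declares.
    Returns an empty dict when every signal agrees, otherwise a dict of
    normalized tag -> list of signal names carrying that tag.
    """
    groups = {}
    for name, tag in signals.items():
        groups.setdefault(normalize_tag(tag), []).append(name)
    return {} if len(groups) <= 1 else groups


# Hypothetical signals collected for a single page.
page_signals = {
    "routing_rule": "ca-ES",
    "hreflang": "ca-es",
    "schema_inLanguage": "ca_ES",
    "detected_content_language": "es-ES",
}

conflicts = locale_mismatches(page_signals)
for tag, names in conflicts.items():
    print(tag, "<-", ", ".join(names))
```

In this made-up case three signals declare Catalan for Spain while the detected content language is Spanish, which is the kind of disagreement the retrieval layer would otherwise resolve on its own.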
ATTRIBUTION: Reported by Search Engine Land on 2026-05-21.