I'll start with the part no one tells you when they sell you a GEO service, because it's the only one that affects your bottom line: the risk of paying to appear in ChatGPT is not that it doesn't work. It's that some of the tactics sold under that label can wreck the organic traffic that was already bringing you customers. You pay for something new and, in passing, weaken what was working. That's the downside that doesn't appear in any sales proposal.
And the most uncomfortable thing of all: practically no one selling it to you has measured it in your language.
I've been doing this since 2012, and for the last two years, at Elevam Labs, I've devoted myself almost exclusively to understanding how models decide whom to recommend. I've read everything published on the subject with a minimum of rigor. And the conclusion is as simple as it is rare to hear in a sales pitch: GEO in Spanish, as of today, is being done by guesswork. With a manual written for another language.
GEO has become a market for lemons
There's a concept in economics that explains this sector better than any LinkedIn post. George Akerlof named it, and they gave him a Nobel for it: the market for lemons. The idea is this. When the buyer can't verify the quality of what they're buying before paying, competition stops being about quality and starts being about promise. And since promises cost nothing, the contract goes to whoever promises most, not whoever does the best work. Whoever promises most happens, coincidentally, to be the worst. Result: good providers leave the market because they can't compete against smoke, and the smoke stays.
GEO fits that mold with frightening precision. Why? Because the buyer can't verify anything. And it's not just me saying it: Rand Fishkin proved it with data.
In January 2026, Fishkin and his team at SparkToro published the most serious work to date on this: about 3,000 executions of the same prompts across several engines. The finding is devastating: the probability that two identical queries on ChatGPT or Google AI return the same brand list is less than 1 in 100.
Read it again. If an agency promises to put you "first in ChatGPT", they either haven't read the data, or they're counting on you not having read it. Both should worry you.
And there's the heart of the problem. If you can't verify the outcome, how do you tell who knows from who is improvising? You can't. That's why the market fills with pretty dashboards measuring things that reorder themselves every time you press enter.
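To make "reorders itself every time you press enter" concrete, here is a minimal sketch of how run-to-run consistency could be checked: the share of pairs of executions of the same prompt that return exactly the same brand list. The function name and the sample data are hypothetical, and this is not necessarily the method SparkToro used.

```python
from itertools import combinations


def identical_list_rate(runs, ordered=True):
    """Share of run pairs that returned the same brand list.

    runs: list of brand lists, one per execution of the same prompt.
    ordered=True requires the same brands in the same order;
    ordered=False only requires the same set of brands.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    normalize = tuple if ordered else frozenset
    keys = [normalize(run) for run in runs]
    pairs = list(combinations(keys, 2))
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)


# Hypothetical data: four executions of one prompt (not real results).
sample_runs = [
    ["Brand A", "Brand B", "Brand C"],
    ["Brand B", "Brand A", "Brand C"],
    ["Brand A", "Brand B", "Brand C"],
    ["Brand A", "Brand D"],
]

strict_rate = identical_list_rate(sample_runs)                # order matters
loose_rate = identical_list_rate(sample_runs, ordered=False)  # order ignored
```

A dashboard that reports a single position from a single run says nothing about either number. A provider who has actually measured this should be able to show something like it.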
The hole no one talks about: all the rigor is in English
OK, you'll say, but something is known. Yes, something is known. And it's worth knowing, because it's the little we have that's solid.
We know, from Ahrefs, that the coupling between ranking in Google and being cited in AI answers has collapsed: in March 2026 only 38% of AI Overview citations came from the organic top-10, when a year earlier it was 76%. So even "ranking well in Google" no longer guarantees you'll be in the AI answer.
We know, from Profound and its 680 million analyzed citations, that each engine draws from radically different sources: the overlap between what ChatGPT cites and what Perplexity cites is just 11%. That is, there is no "the AI" as a single place to appear. There are four or five distinct ecosystems, each with its own rules.
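As a rough illustration of what an "overlap" figure like that can mean, the sketch below computes the Jaccard overlap between the sets of domains two engines cite. This is an assumption about the metric: Profound's exact definition is not given here, and the function name and domains are placeholders.

```python
def citation_overlap(domains_a, domains_b):
    """Jaccard overlap: domains cited by both / domains cited by either."""
    a, b = set(domains_a), set(domains_b)
    if not a and not b:
        return 0.0  # nothing cited by either engine, so no overlap to report
    return len(a & b) / len(a | b)


# Placeholder domains, not real citation data.
engine_one = ["example.com", "example.org", "example.net"]
engine_two = ["example.org", "example.edu"]

overlap = citation_overlap(engine_one, engine_two)
```

With one shared domain out of four distinct ones, the overlap here is a quarter. At 11%, being well cited in one engine says very little about the other.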
All that is fine. The problem is where it comes from.
Evidence map · Published serious studies
All the available rigor on how AI engines behave is published in English. In Spanish, zero.
SparkToro (Fishkin): <1 in 100. Probability that two identical queries on ChatGPT/Google AI return the same brand list. ~3,000 executions, January 2026.
Ahrefs: 38%. Share of AI Overview citations coming from the organic top-10 in March 2026. A year earlier it was 76%. The SEO ↔ AI coupling has collapsed.
Profound: 11%. Overlap between what ChatGPT cites and what Perplexity cites. 680M citations analyzed. There is no «the AI» as a single place.
Princeton · KDD: +30-40%. Visibility lift from adding statistics and citing sources. Only peer-reviewed academic paper on GEO. Queries and content in English.
Spanish: 0. No published study with transparent methodology and verifiable data on corroboration thresholds in Spanish. This is where Elevam Labs is measuring.
Even the only serious academic paper on generative engine optimization, the one from Princeton and others presented at KDD, which showed that adding statistics and citing sources raises visibility by 30% to 40%, was done with English queries and English content.
There is not a single published study with transparent methodology that measures how this behaves in Spanish. Not one. And people are taking the Anglo playbook, translating it, and selling it here as if language were a formatting detail.
It's not. And this is the part that really matters.
Spanish is not English with another skin
There's a paper that should be on the first slide of anyone speaking seriously about this, and almost no one cites it. Christina Walker and Joan Timoneda, of Purdue, published it in Political Science Research and Methods, a Cambridge University Press journal. Peer-reviewed, which in this sector is already a rarity.
What they did is elegant: they took the same prompt, translated it into several languages, and measured how GPT's response changed by language. The result? The model's output becomes more conservative in languages of conservative societies and more liberal in languages of liberal societies. And, importantly, that difference holds from GPT-3.5 to GPT-4. Their explanation is direct: the norms and beliefs of whoever produced the data in each language end up reflected in the model's output.
Translated to what concerns us: the prompt's language is not a translation of the same answer. It's a door to a different corpus, with different dynamics. When you ask in Spanish, the model is not querying "the same thing as in English but translated." It's pulling from a completely different chunk of the internet.
And how big is that chunk? Small. The Common Crawl Foundation, which is where much of these models' training feeds from, acknowledges this in writing: their data has always been biased toward English content. The numbers confirm it.
Training corpus share by language
You're playing at a table where your language occupies a twentieth of the board.
And on top of that, there's what Gianluca Fiorelli has called the "global Spanish problem": the engines don't distinguish well between Spain's Spanish, Mexico's, and Argentina's. They mix in the same answer regulatory and commercial terminology from three continents. When you ask "in generic Spanish", you're not competing against the companies in your market. You're competing against the entire Hispanic world at once.
And here comes the honest part, which is the one that matters to me
I could now close by saying "and that's why in Spanish you need less to stand out, hire Elevam". Comfortable. And it would be a lie, or at least, a half-truth I cannot yet prove.
The reasonable hypothesis is that in a smaller market, with fewer competitors per niche and fewer canonical media, appearing in fewer places is enough for a model to recommend you stably. Fishkin has data pointing that way: in small universes, leading brands reach visibility upwards of 90%. For a criminal lawyer in a specific city, or a dealership in Mallorca, that probably plays in your favor.
But there's evidence pointing in the opposite direction, and an honest consultant puts it on the table. Models hallucinate more in languages with less data, so appearing with little corroboration may mean appearing wrong, attributed to a competitor, or disappearing the next day. Crawlers visit Spanish pages less, so each of your domains weighs less in the model than its English equivalent. And if your sector is colonized by global brands with massive English-language dominance (SEO and GEO are the perfect example: Moz, Ahrefs, Semrush rule), Spanish corroboration may get crushed by the Anglo one.
Which of the two forces wins? Depends on the niche. And the honest answer, today, is that no one has measured it. Neither have I.
So we're going to measure it
At Elevam Labs we're running the experiment right now. The same prompts in Spanish and in English, across the four engines that matter, executed dozens of times each, with clean sessions and fixed geolocation. We're measuring one thing: how many independent domains a brand actually needs for the model to recommend it stably, and whether that number changes from one language to the other.
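For readers who want to picture what "the number of domains needed for a stable recommendation" looks like as a calculation, here is a minimal sketch. It is not Elevam Labs' pipeline: the function name, the 0.8 stability cutoff, the use of a simple average, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def corroboration_threshold(observations, stable_rate=0.8):
    """Smallest domain count whose average mention rate is 'stable'.

    observations: iterable of (language, domain_count, mention_rate),
    one tuple per brand, where mention_rate is the share of runs in
    which the model recommended that brand.
    Returns {language: domain_count}, or None where no count qualifies.
    """
    by_lang = defaultdict(lambda: defaultdict(list))
    for language, domain_count, mention_rate in observations:
        by_lang[language][domain_count].append(mention_rate)

    result = {}
    for language, by_count in by_lang.items():
        result[language] = None
        for count in sorted(by_count):  # scan from fewest domains upward
            rates = by_count[count]
            if sum(rates) / len(rates) >= stable_rate:
                result[language] = count
                break
    return result


# Invented placeholder values for illustration only, NOT measurements.
toy = [
    ("lang_a", 2, 0.50),
    ("lang_a", 4, 0.85),
    ("lang_a", 4, 0.90),
    ("lang_b", 4, 0.40),
    ("lang_b", 8, 0.82),
    ("lang_c", 3, 0.10),
]
thresholds = corroboration_threshold(toy)
```

A real analysis would also need enough brands per domain count to trust each average, and a check that the rate does not drop again at higher counts. The sketch only shows the shape of the comparison between languages.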
And since I can't stand people who hide behind a "we'll see", I'll put my bet in writing.
I think in Spanish the bar is lower. That in a local, bounded niche (a criminal lawyer in one city, a dealership on one island) being well placed on three to five authoritative domains is probably enough, while in its English equivalent, fought over by a thousand brands, seven to ten would be needed. Why do I think this? Because the Hispanic market per niche is smaller and there are fewer reference media, so a handful of sources saturates the answer sooner.
The bet · Corroboration threshold by language
How many independent domains a brand needs for the model to recommend it stably.
But it's a bet, not a headline. And there's serious evidence pointing the other way: models mix all the world's Spanish into the same answer, hallucinate more when they have less data in your language, and pull from English sources when local ones are scarce. Any of those three things can catch me with my pants down. I'll know when I have the data, not before.
And that's the only thing I promise: when I have it, I'll publish it whole. Methodology up front, limits listed before conclusions, dataset on the table for anyone to try to break. If I was wrong, you'll read it here, with my name next to it. I'd rather be the one who showed you his bet and the complete proof than the one who sold you a certainty that doesn't exist. That's how you build something to be trusted. And that's how this sector should work and doesn't.
In the meantime, how to tell who knows from who improvises
Since the experiment will take a few weeks and you might have a GEO proposal on the table this very afternoon, here's the only filter you need. Three questions. If the provider doesn't pass all three, it's a lemon.
Do they have their own data or do they cite other people's? If everything they show you are translated Anglo studies, they're selling you someone else's playbook. Ask them what they've measured themselves, in Spanish, with their own methodology.
Do they promise ranking or talk about probability of appearance? Whoever guarantees you "number one in ChatGPT" either hasn't read Fishkin or is counting on you not having. Whoever talks about frequency of mention over many executions knows what this is about.
Do they start with foundations or glitter? If the first thing they propose is synthetic FAQs and massive content refreshes to "boost citations", run. Those are exactly the tactics that can sink your organic. Whoever knows starts with real authority: actually being on sites worth citing.
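On the second question, "frequency of mention over many executions" can be made concrete. The sketch below reports how often a brand appeared across repeated runs, with a Wilson score interval to show how uncertain that rate is at a given number of runs. The function name and the example counts are hypothetical.

```python
import math


def mention_rate(mentions, runs, z=1.96):
    """Observed mention frequency with a Wilson score interval.

    mentions: runs in which the brand appeared; runs: total executions.
    z=1.96 corresponds to an interval of roughly 95%.
    Returns (rate, low, high).
    """
    if runs <= 0 or not 0 <= mentions <= runs:
        raise ValueError("need runs > 0 and 0 <= mentions <= runs")
    p = mentions / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: the brand appeared in 18 of 30 executions.
rate, low, high = mention_rate(18, 30)
```

With only 30 runs the interval is wide, which is the point: a provider who quotes a single percentage without saying how many executions are behind it is still selling a ranking in disguise.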
I'm not going to tell you GEO doesn't work. It works, and a lot, and whoever doesn't take it seriously will regret it in three years. What I'm telling you is that almost everything being sold today in Spanish is built with a map of another city, and that honesty in this sector is so scarce that it has become, all on its own, a competitive advantage.
We're going to measure the map of this city. When we have it, I'll show you.
Asier López Ruiz is CEO of Elevam, a pioneering SEO and GEO firm in Spain. This article is published together with a downloadable annex containing all the sources and the methodology of the literature review on which it is based.


