GEO Glossary

Generative Engine Optimization (GEO): metrics glossary and concepts

What is GEO

What is Generative Engine Optimization (GEO)?

Generative Engine Optimization (GEO) is the practice of optimizing for generative AI engines such as ChatGPT, Gemini, Perplexity, and Google AI Mode to increase the probability of brand mention, citation, and correct URL attribution.

The term Generative Engine Optimization (GEO) is used in digital marketing to describe optimization specifically designed for generative AI engines.

GEO does not solely target traditional search rankings. Its goal is verifiable presence within responses generated by artificial intelligence models.

At Elevam, GEO is applied in practice through our GEO Agency service.

What it means

What does GEO mean in this context?

In digital marketing, GEO stands for Generative Engine Optimization.

In this context, GEO:

  • Does not refer to geopolitics.
  • Does not refer to police forces.
  • Does not refer to geolocation.

It is used exclusively to describe optimization for generative AI engines.

Relationship with HSA

Relationship between GEO and the HSA Protocol

The HSA Protocol is Elevam's methodology for applying Generative Engine Optimization in a structured and measurable way.

While GEO defines the conceptual framework, HSA establishes the practical rules for implementing and evaluating it through a quarterly baseline and benchmark system.

SEO | Optimises: Search engines | Main objective: Organic traffic
AEO | Optimises: Answer engines | Main objective: Direct answers and citable snippets
GEO | Optimises: Generative engines (LLMs) | Main objective: Mention, citation, and correct URL attribution
01

Baseline GEO headline metrics

SoR — Share of Recommendation
Average strength with which AI engines recommend the brand, computed test by test and weighted by recommendation intensity. Headline metric of the Antropus GEO visibility canon. A high SoR means clear recommendations; a low SoR means weak or ambiguous mentions.
SoM — Share of Mentions
% of valid tests where the AI mentioned the brand with the entity correctly identified. First step of the recommendation funnel. The gap mentionRate − SoM reveals how many mentions refer to the wrong entity.
CS — Citation Share (BCR)
% of valid tests where the AI cited at least one source from the brand's domain. Citation headline of canon v1.2. A large gap CR − CS means the AI cites sources in the sector, but not yours.
CR — Citation Rate
% of tests where the AI included any cited source, owned or third-party. Measures engine behaviour, not brand visibility. High CR does not mean the AI cites you. For brand citation visibility, use CS.
R — URL Rate
% of tests with a mapped canonical URL where the cited URL matches the expected URL for the prompt's intent. High CS with low R: the AI cites you, but pointing to an irrelevant page.
Top 3 — Top 3 Rate
% of tests with an ordered ranking where the brand appears in positions 1-3. Only explicitly ordered lists count: numbered, markdown tables, or lexical ordinals. Returns «—» if the engine produces no rankings for the sector.
Hallucination Rate
% of visible tests where the AI produced a hallucination about the brand. Classified into six types: entity confusion, acronym confusion, misattribution, topic drift, invented factual, and none. Lower is better.
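
The percentage metrics above can be illustrated with a minimal sketch. The `RunResult` record and its field names are hypothetical (they are not part of the canon), and the sketch assumes SoM and CS are computed over valid tests, CR over all tests, and Top 3 over tests that contain an ordered ranking, as the definitions state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    # Hypothetical per-test record; field names are illustrative only.
    valid: bool                       # passed the methodological validity checks
    brand_named: bool                 # the brand name appears in the response
    correct_entity: bool              # the named brand is the intended entity
    any_source_cited: bool            # the response cites at least one source
    owned_source_cited: bool          # at least one cited source is on the brand's domain
    has_ranking: bool = False         # the response contains an explicitly ordered list
    brand_rank: Optional[int] = None  # brand position in that list, if it appears


def pct(hits: int, total: int) -> Optional[float]:
    """Percentage rounded to one decimal; None when the denominator is empty."""
    return round(100 * hits / total, 1) if total else None


def headline_metrics(tests: list[RunResult]) -> dict[str, Optional[float]]:
    valid = [t for t in tests if t.valid]
    ranked = [t for t in tests if t.has_ranking]
    return {
        "SoM": pct(sum(t.brand_named and t.correct_entity for t in valid), len(valid)),
        "CS": pct(sum(t.owned_source_cited for t in valid), len(valid)),
        "CR": pct(sum(t.any_source_cited for t in tests), len(tests)),
        "Top3": pct(
            sum(t.brand_rank is not None and t.brand_rank <= 3 for t in ranked),
            len(ranked),
        ),
    }


sample = [
    RunResult(True, True, True, True, True, has_ranking=True, brand_rank=2),
    RunResult(True, True, False, True, False, has_ranking=True),  # wrong entity
    RunResult(True, False, False, True, False),                   # brand absent
    RunResult(False, True, True, False, False),                   # invalid test
]
metrics = headline_metrics(sample)
print(metrics)  # {'SoM': 33.3, 'CS': 33.3, 'CR': 75.0, 'Top3': 50.0}
```

A `None` result for Top 3 corresponds to the «—» case, where the engine produced no rankings at all.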
02

Control, risk, and diagnostics

Fragile Recommendation Rate
% of tests with a strong recommendation (rsw ≥ 0.75) accompanied by false information. The highest reputational risk scenario: the AI recommends the brand clearly but with invented data. Lower is better.
Validity Rate
% of applicable tests that were methodologically valid. A low value (< 70%) indicates recurring entity confusion or a poorly calibrated prompt set. Quality control metric of the baseline.
Triggering Rate
In SERP-generative mode (Google AI Mode, AI Overview), % of prompts that triggered an AI response. Below 50%, the UI shows a low-activation warning. Always null for chat-LLM engines (ChatGPT, Claude, Perplexity…).
Mention Rate
% of tests where the AI names the brand, without applying the correct entity filter. Raw diagnostic: the gap mentionRate − SoM measures entity collision rate (the brand is named but confused with another).
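
As a rough illustration of the diagnostic checks in this section, the sketch below applies the two thresholds quoted above (70% for Validity Rate, 50% for Triggering Rate) and computes the mentionRate − SoM gap. The function names and the `"serp_generative"` mode label are assumptions made for the example.

```python
from typing import Optional

LOW_VALIDITY_THRESHOLD = 70.0    # from the Validity Rate definition
LOW_TRIGGERING_THRESHOLD = 50.0  # from the Triggering Rate definition


def rate(hits: int, total: int) -> Optional[float]:
    """Plain percentage; None when there is nothing to measure."""
    return 100 * hits / total if total else None


def entity_collision_gap(mention_rate: float, som: float) -> float:
    """Share of tests where the brand is named but resolved to the wrong entity."""
    return mention_rate - som


def validity_check(valid_tests: int, applicable_tests: int) -> tuple[Optional[float], bool]:
    """Return (validity rate, low-validity flag)."""
    value = rate(valid_tests, applicable_tests)
    return value, value is not None and value < LOW_VALIDITY_THRESHOLD


def triggering_check(engine_mode: str, triggered: int, prompts: int) -> tuple[Optional[float], bool]:
    """Return (triggering rate, low-activation flag); always null for chat-LLM engines."""
    if engine_mode != "serp_generative":  # hypothetical mode label
        return None, False
    value = rate(triggered, prompts)
    return value, value is not None and value < LOW_TRIGGERING_THRESHOLD


print(entity_collision_gap(60.0, 45.0))             # 15.0
print(validity_check(13, 20))                       # (65.0, True)
print(triggering_check("chat_llm", 8, 10))          # (None, False)
print(triggering_check("serp_generative", 4, 10))   # (40.0, True)
```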
03

Recommendation funnel

Funnel Stage 1 — Valid mention
Funnel layer 1. % of visible tests where the AI mentioned the brand with the correct entity. Equivalent to SoM. Represents funnel entry: does the AI talk about you without entity ambiguity?
Funnel Stage 2 — Mention + owned source
Funnel layer 2. Of layer-1 tests, % where the AI also cited at least one source from the brand's domain. The gap Stage 1 − Stage 2 reveals how often you are mentioned without being cited.
Funnel Stage 3 — Mention + owned source + correct URL
Funnel layer 3. Of layer-2 tests, % where the cited URL is the correct one for the prompt's intent. Full delivery: valid mention, owned citation, and intent-appropriate URL.
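
The three funnel stages can be sketched as successive filters, each rate taken over the survivors of the previous stage. The dictionary keys below are hypothetical, and the input list is assumed to contain only visible tests.

```python
def funnel(tests):
    """Conditional funnel rates: each stage is a % of the previous stage."""
    stage1 = [t for t in tests if t["valid_mention"]]
    stage2 = [t for t in stage1 if t["owned_source"]]
    stage3 = [t for t in stage2 if t["correct_url"]]

    def pct(part, whole):
        return round(100 * len(part) / len(whole), 1) if whole else None

    return {
        "stage1": pct(stage1, tests),   # valid mention, of visible tests
        "stage2": pct(stage2, stage1),  # owned source, of stage-1 tests
        "stage3": pct(stage3, stage2),  # correct URL, of stage-2 tests
    }


def make(valid_mention, owned_source, correct_url):
    return {"valid_mention": valid_mention, "owned_source": owned_source, "correct_url": correct_url}


sample = (
    [make(True, True, True)] * 1      # full delivery
    + [make(True, True, False)] * 2   # cited, but wrong page
    + [make(True, False, False)] * 3  # mentioned without citation
    + [make(False, False, False)] * 4 # not mentioned
)
result = funnel(sample)
print(result)  # {'stage1': 60.0, 'stage2': 50.0, 'stage3': 33.3}
```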
04

Source support and source share

SoS — Share of Sources
Of all sources cited by the AI in the baseline, % belonging to the brand's domain. Measures volume share, not per-test presence. High CS with low SoS: cited with one URL while competitors are cited with several.
Owned Source Share
% of tests where the AI cites only owned sources, with no third parties. A high value indicates narrative control, but it can also indicate isolation from external consensus and a lack of independent validation.
Third Party Dependency
% of tests where the AI mentions the brand relying solely on third-party sources. Signals that owned content is not being recognised as an authoritative source by the engines.
Unsupported Positive Rate
% of tests with positive sentiment and no source cited. Fragile visibility: without a documentary anchor, the position relies on model memory and may disappear between measurement cycles.
Source Dispersion Index
Average of unique owned URLs cited per intent. A value of 1.0 indicates a clear canonical page per intent; higher values signal content dispersion or the absence of a definitive reference page.
Competitor Leakage Rate
% of tests where the AI cites URLs from competitors configured in the project. Measures direct attention leakage towards competitors in sector-related responses.
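
A minimal sketch of three of the source metrics above (SoS, Source Dispersion Index, Competitor Leakage Rate), computed from per-test URL lists. The record shape and the example domains are hypothetical, and the dispersion average is assumed to run only over intents that received at least one owned citation.

```python
from collections import defaultdict
from urllib.parse import urlparse


def on_domain(url: str, domain: str) -> bool:
    """True when the URL host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def source_metrics(tests, owned_domain, competitor_domains):
    all_urls = [u for t in tests for u in t["urls"]]
    owned = [u for u in all_urls if on_domain(u, owned_domain)]

    # Unique owned URLs per intent, for the dispersion index.
    owned_by_intent = defaultdict(set)
    for t in tests:
        for u in t["urls"]:
            if on_domain(u, owned_domain):
                owned_by_intent[t["intent"]].add(u)

    leaking = [
        t for t in tests
        if any(on_domain(u, c) for u in t["urls"] for c in competitor_domains)
    ]

    return {
        "SoS": round(100 * len(owned) / len(all_urls), 1) if all_urls else None,
        "dispersion": (
            sum(len(urls) for urls in owned_by_intent.values()) / len(owned_by_intent)
            if owned_by_intent else None
        ),
        "leakage": round(100 * len(leaking) / len(tests), 1) if tests else None,
    }


sample = [
    {"intent": "pricing", "urls": ["https://brand.example/pricing", "https://rival.example/plans"]},
    {"intent": "pricing", "urls": ["https://brand.example/pricing"]},
    {"intent": "comparison", "urls": ["https://brand.example/vs", "https://brand.example/blog/vs-2",
                                      "https://news.example/review"]},
    {"intent": "comparison", "urls": []},
]
result = source_metrics(sample, "brand.example", ["rival.example"])
print(result)  # {'SoS': 66.7, 'dispersion': 1.5, 'leakage': 25.0}
```

In this toy data the "pricing" intent has one canonical page (1 unique URL) while "comparison" is split across two, which is what pushes the dispersion above 1.0.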
05

GEO mental model and retrievability

Signals → retrieval → citation → synthesis → shortlist
The complete chain for how a brand comes to be recommended by an AI engine. Each link is a necessary condition for the next: without signals there is no retrieval; without retrieval there is no citation.
Shortlist
The set of 3-5 options the AI decides to include in its response. Entering the shortlist is the operational goal of GEO: more decisive than classic SERP ranking for high-intent queries.
Retrievability
The ability of a RAG system to find and retrieve your content when it is needed. A prerequisite for any citation: content that is not retrieved cannot be cited.
Citable
Content that contains data, figures, claims, or unique advantages specific enough to merit being referenced as a source in a generative response.
RAG-friendly
Pages designed to be retrieved and used as factual support in RAG pipelines: clear structure, semantic chunks, verifiable data, and entity-content consistency.
Co-mentions
The brands, entities, or concepts the AI names alongside yours in the same context. Visualising co-mentions as a graph reveals competitive clusters, category associations, and contextual influence.
HSA — Human · Search · AI Framework
Elevam's methodology for applying GEO in a structured and measurable way. Combines five components (SoM, shortlist position, sentiment, citation coverage, and competitive analysis) with defined weights and quarterly measurement cycles.
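
The co-mention graph described above can be approximated as a weighted edge list: each pair of brands named in the same response adds one to the weight of their edge. This is only a sketch; the brand names are placeholders and the function name is an assumption.

```python
from collections import Counter
from itertools import combinations


def co_mention_edges(responses):
    """Count how often each unordered pair of brands appears in the same response."""
    edges = Counter()
    for brands in responses:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(brands)), 2):
            edges[pair] += 1
    return edges


responses = [
    ["Acme", "Globex", "Initech"],
    ["Globex", "Acme"],
    ["Initech"],
]
edges = co_mention_edges(responses)
print(edges.most_common(1))  # [(('Acme', 'Globex'), 2)]
```

The resulting counter can be passed to any graph tool as edge weights; heavier edges suggest brands the engine tends to place in the same competitive cluster.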
06

LLM engineering and semantic retrieval

RAG — Retrieval-Augmented Generation
Architecture in which the AI model, before generating a response, retrieves relevant fragments from an external document base. The mechanism that allows your content to be used as factual support.
Embeddings
Numerical vector representation of a text that captures its semantic meaning. In the RAG process, the closer your content's embedding is to a prompt's embedding, the more likely your source is to be retrieved.
Vector index
Specialised database where embeddings are stored and similarity searches are performed. The infrastructure on which the RAG retrieval step operates.
Eligibility
The condition of being technically and semantically ready to be retrieved and cited: correct indexing, clear structure, and consistency between the brand entity and the content describing it.
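
To make the retrieval step concrete, here is a toy sketch of similarity search over embeddings using cosine similarity. The three-dimensional vectors and page names are invented for illustration; real embeddings have hundreds or thousands of dimensions, and production vector indexes use approximate search rather than a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, k=2):
    """Return the names of the k entries most similar to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical in-memory "vector index": page name -> embedding.
index = {
    "pricing-page": [0.9, 0.1, 0.0],
    "blog-post": [0.2, 0.8, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
prompt_embedding = [1.0, 0.2, 0.0]

top = retrieve(prompt_embedding, index, k=2)
print(top)  # ['pricing-page', 'blog-post']
```

Only the pages returned by this step are available to the model as factual support, which is why retrievability comes before citation.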
07

GEO experiment design

Fixed prompt dataset
A stable set of prompts kept unchanged between measurements to enable valid comparison over time. Without dataset consistency, there is no valid comparison.
Intent-based prompt set
Informational, comparative, and transactional prompts covering the user decision funnel. Each intent type may trigger different engine behaviours.
Exact prompt
Traceability rule: the logged prompt must be identical to the one sent to the engine. Paraphrasing or summarising invalidates the temporal comparison.
Quarterly benchmark
Full dataset re-run every quarter to detect changes in brand visibility across engines. Distinguishes genuine trends from one-off noise.
Multi-engine
Running the baseline on at least 3 different engines or platforms to avoid single-provider bias. Each engine has different weights, sources, and behaviour.
Structured logging
Saving each result as structured data (CSV, JSON) with prompt, full response, cited sources, and engine metadata. Screenshots are not data.
Update log
Dated log of observed changes in models, response tone, or preferred sources. Without documented evidence, no update is valid.
Example library
Centralised repository of prompt → response → cited sources pairs. Knowledge base for auditing the evolution of engine behaviour in the sector.
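
A minimal sketch of structured logging in line with the rules above: one JSON object per test, with the exact prompt, the full response, the cited sources, and engine metadata. The `LogRecord` fields and the engine and model labels are assumptions for the example, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LogRecord:
    # Hypothetical schema; adapt field names to your own pipeline.
    prompt: str            # exactly the text sent to the engine, never a paraphrase
    response: str          # full response, not a summary
    cited_sources: list    # URLs cited by the engine
    engine: str
    model_version: str
    run_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_jsonl(records):
    """Serialise records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


records = [
    LogRecord(
        prompt="Which agencies offer GEO services?",
        response="(full engine response goes here)",
        cited_sources=["https://brand.example/geo"],
        engine="example-engine",
        model_version="example-model-1",
    )
]
jsonl = to_jsonl(records)

# Round trip: the logged prompt must be identical to the one sent.
restored = json.loads(jsonl.splitlines()[0])
print(restored["prompt"] == records[0].prompt)  # True
```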
08

Controllability, rigour, and automation

Controllability framework
Classification of variables into three levels: you control (your content, structure, schema markup), you influence (third-party narrative, co-mentions), and you cannot control (model weights, temperature, engine citation policy).
Probabilistic model
AI engines are non-deterministic: the same prompt can produce different responses. Measuring methodically (same prompts, same engine, comparison over time) is the only way to detect real changes.
Evidence threshold
Criterion for deciding whether a metric change is real or statistical noise. Requires dataset repetition and structured logging before concluding that something has improved or degraded.
GEO pipeline
Automated system to run the prompt set, log responses, parse metrics, and report results without manual intervention. Eliminates human error and scales measurement.
GEO observability
Continuous monitoring of engine behaviour and the measurement pipeline itself, with alerts when responses or cited sources change anomalously.
Regression testing
Full dataset re-run after model updates to detect metric degradations before they affect clients. The GEO visibility equivalent of CI/CD regression tests.
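
The glossary does not specify how an evidence threshold is calculated, so the following is only one possible sketch: treat the spread of repeated baseline runs as the noise level, and call a change real when the shift in the mean exceeds a multiple of that spread. The factor `k=2.0` and the function name are assumptions.

```python
from statistics import mean, stdev


def is_real_change(baseline_runs, new_runs, k=2.0):
    """Compare the mean shift of a metric against run-to-run noise.

    baseline_runs: metric values from repeated runs of the fixed prompt dataset
    new_runs: metric values from repeated runs after the suspected change
    k: hypothetical noise multiplier; tune to your own tolerance
    """
    if len(baseline_runs) < 2 or not new_runs:
        raise ValueError("need at least two baseline runs and one new run")
    noise = stdev(baseline_runs)
    shift = mean(new_runs) - mean(baseline_runs)
    return abs(shift) > k * noise, shift


baseline = [40.0, 42.0, 38.0]  # e.g. SoM over three repeated runs

print(is_real_change(baseline, [41.0, 43.0]))  # (False, 2.0)
print(is_real_change(baseline, [50.0, 52.0]))  # (True, 11.0)
```

The same comparison can serve as a regression check after a model update: a negative shift that clears the threshold is a candidate degradation worth investigating.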
09

Competitor Intelligence — per-response analysis

BrandMention
Structure produced by the analyzer for each response and each evaluated brand. Contains: whether mentioned, ranking position, sentiment (−1 to 1), cited URLs, and up to three descriptive attributes extracted from the response. The base component for all CI study aggregation.
EmergingBrand
Brand detected by the AI in a response that was not configured in the study's competitor list. Allows discovery of competitors the AI already recognises in the sector but that the project had not been tracking. The top 12 by mention count are displayed.
ConfusionBrand
Emerging brand that matches entities marked as 'not me' in the project's entity profile. Quantifies how actively the AI is confusing the brand with another unwanted entity. A signal of an acute brand identity problem.
CompetitorResponseType
Classification of the AI response type into six buckets: provider_shortlist, comparison, educational_no_brands, generic_advice, category_mismatch, and refusal_or_no_recommendation. If most responses are generic_advice or refusal_or_no_recommendation, any brand's SoV will be structurally low, which is not a sign of poor brand visibility.
AcronymAmbiguity
Detection that the AI used an acronym with a different meaning than the one expected in the entity profile. Critical in sectors where the same acronym spans multiple markets: 'GEO' can mean Generative Engine Optimization or Geographic Information.
MethodHallucination
Incorrect use by the AI of a technical term configured in the project's entity profile. Detects vocabulary deformation: if the AI explains 'HSA Protocol' with a different definition than the project's, it is misleading the end client. Severity: critical / warning / info.
EntityResolutionResult
Granular resolution of which entity the AI recognised when discussing the brand. Six states: correct_entity, not_mentioned, entity_confusion, ambiguous_entity, wrong_domain, and insufficient_evidence. Only correct_entity counts as a valid mention in SoV — the gating that prevents inflating the metric with identity confusions.
sourceReliabilityScore
0-100 reliability score for a response based on the nature of cited sources. Logic: memory mode without sources → 0; opaque URLs → 50; mix of opaque and real → 60; auditable URLs → 75. Feeds the Reliability Weight that modulates SoV in weighted mode. Never reaches 100: no AI engine today guarantees full reliability.
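
The entity gating and the sourceReliabilityScore logic described above can be sketched as follows. The six state names and the four score values come from the definitions; how a URL is judged "opaque" or "auditable" is not specified here, so the function simply takes the two counts as inputs, which is an assumption.

```python
from enum import Enum


class EntityResolution(Enum):
    CORRECT_ENTITY = "correct_entity"
    NOT_MENTIONED = "not_mentioned"
    ENTITY_CONFUSION = "entity_confusion"
    AMBIGUOUS_ENTITY = "ambiguous_entity"
    WRONG_DOMAIN = "wrong_domain"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


def counts_as_valid_mention(state: EntityResolution) -> bool:
    """Gating rule: only correct_entity counts towards SoV."""
    return state is EntityResolution.CORRECT_ENTITY


def source_reliability_score(opaque_urls: int, auditable_urls: int) -> int:
    """Map the mix of cited sources to the 0-100 score bands."""
    if opaque_urls == 0 and auditable_urls == 0:
        return 0   # memory mode: no sources cited
    if auditable_urls == 0:
        return 50  # only opaque URLs
    if opaque_urls == 0:
        return 75  # only auditable URLs
    return 60      # mix of opaque and auditable


print(counts_as_valid_mention(EntityResolution("entity_confusion")))  # False
print(source_reliability_score(opaque_urls=2, auditable_urls=1))      # 60
```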
10

Competitor Intelligence — aggregated visibility and comparator

SoV — Share of Voice
Distribution of mentions across the brands in the CI study. Per brand: mentionRate, average ranking position, average sentiment, and total appearances. Switchable between raw mode (direct mention count) and weighted (weighted by sourceReliabilityScore). Headline metric of the Competitor Intelligence module.
Reliability Weight
0-1 weight applied to each analysis based on its sourceReliabilityScore: score ≥ 75 → 1.0; 50-74 → 0.7; < 50 → 0.4. Memory-mode responses are the exception: they receive 1.0 regardless of the score bands (no penalty, since there are no sources to evaluate). An internal multiplier: not shown to the client, but it determines the weighted SoV.
Reliability Summary
Study-level reliability aggregate: average sourceReliabilityScore, count of analyses with score ≥ 75 (auditable), and total analyses. If the average drops below 50, the UI shows a visible caveat about the study's conclusions.
ResponseType Distribution
Count of analyses per CompetitorResponseType bucket. Macro diagnostic: if the distribution is dominated by refusal_or_no_recommendation or generic_advice, the AI avoids recommending in that sector — any low SoV is structural, not a brand visibility failure.
BrandDelta
Difference in SoM per brand between two consecutive CI studies. Scope-aware: distinguishes 'dropped to zero' (measured in both studies, genuine visibility loss) from 'not measured in study B' (configuration change). Out-of-scope brands return null, not 0.
ResponseTypeDelta
Difference in CompetitorResponseType distribution between two studies. Detects shifts in how the AI responds in the sector: a rise in refusal_or_no_recommendation signals increasing engine caution; a rise in provider_shortlist signals more recommending activity.
ConfigDivergence
Detection of different configuration between two compared studies: different brand lists or different analysis counts. When applicable, the UI shows a CrossConfigWarning banner. Honesty safeguard: the client cannot read a BrandDelta without knowing whether the two studies are comparable.
Comparator Buckets
Classification of brands by the change observed between study A and B into four buckets: droppedToZeroInB (fell off the radar, real loss), roseFromZeroInB (emerged between measurements), outOfScopeInB (in A but not B; configuration change), newInScopeOfB (not in A but in B; added to tracking).
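
A minimal sketch of the Reliability Weight bands and the scope-aware comparison between two studies. Each study is modelled as a hypothetical mapping from in-scope brand to its measured rate; a brand missing from the mapping is treated as not measured, which is how the sketch separates "dropped to zero" from "out of scope".

```python
def reliability_weight(score: int, memory_mode: bool = False) -> float:
    """Weight bands from the Reliability Weight definition."""
    if memory_mode:
        return 1.0  # no sources to evaluate, so no penalty
    if score >= 75:
        return 1.0
    if score >= 50:
        return 0.7
    return 0.4


def brand_delta(study_a: dict, study_b: dict, brand: str):
    """Change between studies; None (not 0) when the brand is out of scope in either."""
    if brand not in study_a or brand not in study_b:
        return None
    return study_b[brand] - study_a[brand]


def comparator_buckets(study_a: dict, study_b: dict) -> dict:
    buckets = {"droppedToZeroInB": [], "roseFromZeroInB": [],
               "outOfScopeInB": [], "newInScopeOfB": []}
    for brand in sorted(set(study_a) | set(study_b)):
        if brand not in study_b:
            buckets["outOfScopeInB"].append(brand)     # configuration change
        elif brand not in study_a:
            buckets["newInScopeOfB"].append(brand)     # added to tracking
        elif study_a[brand] > 0 and study_b[brand] == 0:
            buckets["droppedToZeroInB"].append(brand)  # genuine loss
        elif study_a[brand] == 0 and study_b[brand] > 0:
            buckets["roseFromZeroInB"].append(brand)   # emerged between measurements
    return buckets


study_a = {"Acme": 40.0, "Globex": 10.0, "Initech": 0.0, "Umbrella": 5.0}
study_b = {"Acme": 45.0, "Globex": 0.0, "Initech": 12.0, "Hooli": 8.0}

print(reliability_weight(60))                     # 0.7
print(brand_delta(study_a, study_b, "Acme"))      # 5.0
print(brand_delta(study_a, study_b, "Umbrella"))  # None
print(comparator_buckets(study_a, study_b))
```

Brands that are measured in both studies and simply moved up or down (such as "Acme" here) fall into none of the four buckets; their change is carried by the delta alone.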