Generative Engine Optimization (GEO): metrics glossary and concepts
What is Generative Engine Optimization (GEO)?
Generative Engine Optimization (GEO) is the practice of optimizing for generative AI engines such as ChatGPT, Gemini, Perplexity, and Google AI Mode to increase the probability of brand mention, citation, and correct URL attribution.
The term Generative Engine Optimization (GEO) is used in digital marketing to describe optimization specifically designed for generative AI engines.
GEO does not solely target traditional search rankings. Its goal is verifiable presence within responses generated by artificial intelligence models.
At Elevam, GEO is applied in practice through our GEO Agency service.
What does GEO mean in this context?
In digital marketing, GEO stands for Generative Engine Optimization.
In this context, GEO:
- Does not refer to geopolitics.
- Does not refer to police forces.
- Does not refer to geolocation.
It is used exclusively to describe optimization for generative AI engines.
Relationship between GEO and the HSA Protocol
The HSA Protocol is Elevam's methodology for applying Generative Engine Optimization in a structured and measurable way.
While GEO defines the conceptual framework, HSA establishes the practical rules for implementing and evaluating it through a quarterly baseline and benchmark system.
| Focus | Optimises | Main objective |
|---|---|---|
| SEO | Search engines | Organic traffic |
| AEO | Answer engines | Direct answers and citable snippets |
| GEO | Generative engines (LLMs) | Mention, citation, and correct URL attribution |
Baseline GEO headline metrics
- SoR — Share of Recommendation
- Average strength with which AI engines recommend the brand, with each test weighted by its recommendation intensity. Headline metric of the Antropus GEO visibility canon. High SoR means clear recommendations; low SoR means weak or ambiguous mentions.
- SoM — Share of Mentions
- % of valid tests where the AI mentioned the brand with the entity correctly identified. First step of the recommendation funnel. The gap mentionRate − SoM reveals how many mentions refer to the wrong entity.
- CS — Citation Share (BCR)
- % of valid tests where the AI cited at least one source from the brand's domain. Citation headline of canon v1.2. A large gap CR − CS means the AI cites sources in the sector, but not yours.
- CR — Citation Rate
- % of tests where the AI included any cited source, owned or third-party. Measures engine behaviour, not brand visibility. High CR does not mean the AI cites you. For brand citation visibility, use CS.
- R — URL Rate
- % of tests with a mapped canonical URL where the cited URL matches the expected URL for the prompt's intent. High CS with low R means the AI cites you but points to an irrelevant page.
- Top 3 — Top 3 Rate
- % of tests with an ordered ranking where the brand appears in positions 1-3. Only explicitly ordered lists count: numbered, markdown tables, or lexical ordinals. Returns «—» if the engine produces no rankings for the sector.
- Hallucination Rate
- % of visible tests where the AI produced a hallucination about the brand. Classified into six types: entity confusion, acronym confusion, misattribution, topic drift, invented factual, and none. Lower is better.
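The percentage definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the baseline's actual implementation: the `PromptResult` record, its field names, and the `example.com` domain are hypothetical, and all four rates are computed over valid tests here so that the gaps (mentionRate − SoM, CR − CS) compare like with like.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptResult:
    """One logged test. All field names are hypothetical."""
    valid: bool            # test passed methodological validation
    mentioned: bool        # the brand name appears in the response
    entity_correct: bool   # the mention refers to the intended entity
    cited_domains: List[str] = field(default_factory=list)


def pct(numerator: int, denominator: int) -> Optional[float]:
    """Percentage, or None when there is nothing to measure."""
    return None if denominator == 0 else 100.0 * numerator / denominator


def headline_metrics(results: List[PromptResult], brand_domain: str) -> dict:
    # Assumption: every rate uses the same denominator (valid tests).
    valid = [r for r in results if r.valid]
    n = len(valid)
    return {
        "mention_rate": pct(sum(r.mentioned for r in valid), n),
        "som": pct(sum(r.mentioned and r.entity_correct for r in valid), n),
        "cr": pct(sum(bool(r.cited_domains) for r in valid), n),
        "cs": pct(sum(brand_domain in r.cited_domains for r in valid), n),
    }
```

Reading the output: `mention_rate - som` approximates how often the brand is named but resolved to the wrong entity, and `cr - cs` approximates how often the engine cites sources without citing the brand's own domain.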
Control, risk, and diagnostics
- Fragile Recommendation Rate
- % of tests with a strong recommendation (rsw ≥ 0.75) accompanied by false information. The highest reputational risk scenario: the AI recommends the brand clearly but with invented data. Lower is better.
- Validity Rate
- % of applicable tests that were methodologically valid. A low value (< 70%) indicates recurring entity confusion or a poorly calibrated prompt set. Quality control metric of the baseline.
- Triggering Rate
- In SERP-generative mode (Google AI Mode, AI Overview), % of prompts that triggered an AI response. Below 50%, the UI shows a low-activation warning. Always null for chat-LLM engines (ChatGPT, Claude, Perplexity…).
- Mention Rate
- % of tests where the AI names the brand, without applying the correct entity filter. Raw diagnostic: the gap mentionRate − SoM measures entity collision rate (the brand is named but confused with another).
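As a rough sketch of two of the control metrics above, the functions below apply the thresholds stated in the glossary (rsw ≥ 0.75 for a strong recommendation, null Triggering Rate for chat-LLM engines). The dictionary keys `rsw` and `has_false_info`, the `engine_mode` labels, and the choice of all tests as the denominator are assumptions made for illustration.

```python
from typing import Optional, Sequence

FRAGILE_RSW_THRESHOLD = 0.75  # strong-recommendation cutoff from the glossary


def fragile_recommendation_rate(tests: Sequence[dict]) -> Optional[float]:
    """% of tests with a strong recommendation that also carries false information."""
    if not tests:
        return None
    fragile = sum(
        1 for t in tests
        if t["rsw"] >= FRAGILE_RSW_THRESHOLD and t["has_false_info"]
    )
    return 100.0 * fragile / len(tests)


def triggering_rate(engine_mode: str, triggered: Sequence[bool]) -> Optional[float]:
    """% of prompts that triggered an AI response; None outside SERP-generative mode."""
    if engine_mode != "serp_generative" or not triggered:
        return None
    return 100.0 * sum(triggered) / len(triggered)
```

Returning `None` rather than 0 keeps "not applicable" distinct from "measured and zero", which matters when the value feeds a warning such as the low-activation notice below 50%.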
Recommendation funnel
- Funnel Stage 1 — Valid mention
- Funnel layer 1. % of visible tests where the AI mentioned the brand with the correct entity. Equivalent to SoM. Represents funnel entry: does the AI talk about you without entity ambiguity?
- Funnel Stage 2 — Mention + owned source
- Funnel layer 2. Of layer-1 tests, % where the AI also cited at least one source from the brand's domain. The gap Stage 1 − Stage 2 reveals how often you are mentioned without being cited.
- Funnel Stage 3 — Mention + owned source + correct URL
- Funnel layer 3. Of layer-2 tests, % where the cited URL is the correct one for the prompt's intent. Full delivery: valid mention, owned citation, and intent-appropriate URL.
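The three funnel stages nest: each one filters the tests that survived the previous stage. A minimal sketch follows, assuming each test is a dictionary with the hypothetical boolean keys `valid_mention`, `owned_source_cited`, and `correct_url`. Stage 1 is a share of all visible tests, while stages 2 and 3 are shares of the previous layer, as the definitions above describe.

```python
from typing import Dict, List, Optional


def _share(part: int, whole: int) -> Optional[float]:
    return None if whole == 0 else 100.0 * part / whole


def recommendation_funnel(tests: List[Dict[str, bool]]) -> dict:
    # Each layer is a subset of the one before it.
    stage1 = [t for t in tests if t["valid_mention"]]
    stage2 = [t for t in stage1 if t["owned_source_cited"]]
    stage3 = [t for t in stage2 if t["correct_url"]]
    return {
        "stage1_pct": _share(len(stage1), len(tests)),   # of visible tests
        "stage2_pct": _share(len(stage2), len(stage1)),  # of layer-1 tests
        "stage3_pct": _share(len(stage3), len(stage2)),  # of layer-2 tests
        "counts": (len(stage1), len(stage2), len(stage3)),
    }
```

Keeping the raw counts alongside the percentages makes it easier to see when a high stage-3 percentage rests on only a handful of tests.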
Source support and source share
- SoS — Share of Sources
- Of all sources cited by the AI in the baseline, % belonging to the brand's domain. Measures volume share, not per-test presence. High CS with low SoS: cited with one URL while competitors are cited with several.
- Owned Source Share
- % of tests where the AI cites only owned sources, with no third parties. A high value indicates narrative control, but it can also indicate isolation from external consensus and a lack of independent validation.
- Third Party Dependency
- % of tests where the AI mentions the brand relying solely on third-party sources. Signals that owned content is not being recognised as an authoritative source by the engines.
- Unsupported Positive Rate
- % of tests with positive sentiment and no source cited. Fragile visibility: without a documentary anchor, the position relies on model memory and may disappear between measurement cycles.
- Source Dispersion Index
- Average of unique owned URLs cited per intent. A value of 1.0 indicates a clear canonical page per intent; higher values signal content dispersion or the absence of a definitive reference page.
- Competitor Leakage Rate
- % of tests where the AI cites URLs from competitors configured in the project. Measures direct attention leakage towards competitors in sector-related responses.
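To make the difference between volume share and per-intent dispersion concrete, here is a small sketch of SoS and the Source Dispersion Index. The test structure (`intent`, `cited_urls`), the subdomain-matching rule in `is_owned`, and the decision to average only over intents that have at least one owned URL are all assumptions for illustration.

```python
from collections import defaultdict
from typing import List, Optional
from urllib.parse import urlparse


def is_owned(url: str, brand_domain: str) -> bool:
    """True if the URL's host is the brand domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == brand_domain or host.endswith("." + brand_domain)


def share_of_sources(tests: List[dict], brand_domain: str) -> Optional[float]:
    """% of all cited URLs in the baseline that belong to the brand's domain."""
    urls = [u for t in tests for u in t["cited_urls"]]
    if not urls:
        return None
    return 100.0 * sum(is_owned(u, brand_domain) for u in urls) / len(urls)


def source_dispersion_index(tests: List[dict], brand_domain: str) -> Optional[float]:
    """Average number of unique owned URLs cited per intent."""
    per_intent = defaultdict(set)
    for t in tests:
        for u in t["cited_urls"]:
            if is_owned(u, brand_domain):
                per_intent[t["intent"]].add(u)
    if not per_intent:
        return None
    return sum(len(urls) for urls in per_intent.values()) / len(per_intent)
```

In this sketch, two different owned URLs cited for the same intent raise the index above 1.0, which is the dispersion signal the definition describes.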
GEO mental model and retrievability
- Signals → retrieval → citation → synthesis → shortlist
- The complete chain for how a brand comes to be recommended by an AI engine. Each link is a necessary condition for the next: without signals there is no retrieval; without retrieval there is no citation.
- Shortlist
- The set of 3-5 options the AI decides to include in its response. Entering the shortlist is the operational goal of GEO: more decisive than classic SERP ranking for high-intent queries.
- Retrievability
- The ability of a RAG system to find and retrieve your content when needed. A prerequisite for any citation: if not retrieved, not cited.
- Citable
- Content that contains data, figures, claims, or unique advantages specific enough to merit being referenced as a source in a generative response.
- RAG-friendly
- Pages designed to be retrieved and used as factual support in RAG pipelines: clear structure, semantic chunks, verifiable data, and entity-content consistency.
- Co-mentions
- The brands, entities, or concepts the AI names alongside yours in the same context. Visualising co-mentions as a graph reveals competitive clusters, category associations, and contextual influence.
- HSA — Human · Search · AI Framework
- Elevam's methodology for applying GEO in a structured and measurable way. Combines five components (SoM, shortlist position, sentiment, citation coverage, and competitive analysis) with defined weights and quarterly measurement cycles.
LLM engineering and semantic retrieval
- RAG — Retrieval-Augmented Generation
- Architecture in which the AI model, before generating a response, retrieves relevant fragments from an external document base. The mechanism that allows your content to be used as factual support.
- Embeddings
- Numerical vector representation of a text that captures its semantic meaning. The distance between your content's embedding and a prompt's embedding determines whether your source is retrieved in the RAG process.
- Vector index
- Specialised database where embeddings are stored and similarity searches are performed. The infrastructure on which the RAG retrieval step operates.
- Eligibility
- The condition of being technically and semantically ready to be retrieved and cited: correct indexing, clear structure, and consistency between the brand entity and the content describing it.
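The role of embedding distance in retrieval can be illustrated with a toy example. The sketch below ranks documents by cosine similarity to a query vector and returns the top `k`. The three-dimensional vectors and document names are invented for illustration. Real engines use model-generated embeddings with hundreds or thousands of dimensions and approximate nearest-neighbour search in a vector index, and their exact retrieval logic is not public.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             index: Dict[str, Sequence[float]],
             k: int = 2) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy index: hypothetical document ids mapped to made-up embeddings.
toy_index = {
    "brand-page": [0.9, 0.1, 0.0],
    "competitor-page": [0.7, 0.7, 0.0],
    "unrelated-page": [0.0, 0.0, 1.0],
}
```

With the query `[1.0, 0.0, 0.0]`, the page whose vector points most nearly in the same direction ranks first. Content that falls outside the top `k` is never passed to the generation step, which is why retrievability precedes citation.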
GEO experiment design
- Fixed prompt dataset
- A stable set of prompts kept unchanged between measurements to enable valid comparison over time. Without dataset consistency, there is no valid comparison.
- Intent-based prompt set
- Informational, comparative, and transactional prompts covering the user decision funnel. Each intent type may trigger different engine behaviours.
- Exact prompt
- Traceability rule: the logged prompt must be identical to the one sent to the engine. Paraphrasing or summarising invalidates the temporal comparison.
- Quarterly benchmark
- Full dataset re-run every quarter to detect changes in brand visibility across engines. Distinguishes genuine trends from one-off noise.
- Multi-engine
- Running the baseline on at least 3 different engines or platforms to avoid single-provider bias. Each engine has different weights, sources, and behaviour.
- Structured logging
- Saving each result as structured data (CSV, JSON) with prompt, full response, cited sources, and engine metadata. Screenshots are not data.
- Update log
- Dated log of observed changes in models, response tone, or preferred sources. Without documented evidence, no update is valid.
- Example library
- Centralised repository of prompt → response → cited sources pairs. Knowledge base for auditing the evolution of engine behaviour in the sector.
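One possible shape for a structured log record is sketched below as JSON Lines (one JSON object per line). The field names and the `build_log_record` and `to_jsonl` helpers are hypothetical. The only requirements taken from the glossary are that the record holds the exact prompt, the full response, the cited sources, and engine metadata.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, List


def build_log_record(prompt: str, response: str,
                     cited_sources: List[str],
                     engine: str, model: str) -> dict:
    return {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,            # exact text sent to the engine, never paraphrased
        "response": response,        # full response, not a summary
        "cited_sources": list(cited_sources),
        "engine": {"name": engine, "model": model},
    }


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialise records as JSON Lines, ready to append to a log file."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

Because each line is an independent JSON object, a quarterly re-run can append to the same file and earlier results stay untouched and directly comparable.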
Controllability, rigour, and automation
- Controllability framework
- Classification of variables into three levels: you control (your content, structure, schema markup), you influence (third-party narrative, co-mentions), and you cannot control (model weights, temperature, engine citation policy).
- Probabilistic model
- AI engines are non-deterministic: the same prompt can produce different responses. Measuring methodically (same prompts, same engine, temporal comparison) is the only way to detect real changes.
- Evidence threshold
- Criterion for deciding whether a metric change is real or statistical noise. Requires dataset repetition and structured logging before concluding that something has improved or degraded.
- GEO pipeline
- Automated system to run the prompt set, log responses, parse metrics, and report results without manual intervention. Eliminates human error and scales measurement.
- GEO observability
- Continuous monitoring of engine behaviour and the measurement pipeline itself, with alerts when responses or cited sources change anomalously.
- Regression testing
- Full dataset re-run after model updates to detect metric degradations before they affect clients. The GEO visibility equivalent of CI/CD regression tests.
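The glossary does not specify how the evidence threshold is calculated, so the following is only one simple way to express the idea: treat a change as real when the shift in a metric's mean across repeated runs exceeds a multiple of the run-to-run spread. The function name, the factor `k = 2.0`, and the use of the sample standard deviation are assumptions, not the HSA Protocol's rule.

```python
from statistics import mean, stdev
from typing import Sequence


def exceeds_noise(before_runs: Sequence[float],
                  after_runs: Sequence[float],
                  k: float = 2.0) -> bool:
    """True when the mean shift is larger than k times the larger run-to-run spread.

    Each sequence holds the same metric (for example SoM in %) measured on
    repeated runs of the fixed prompt dataset; at least two runs are needed
    per period to estimate the spread.
    """
    if len(before_runs) < 2 or len(after_runs) < 2:
        raise ValueError("need at least two repeated runs per period")
    spread = max(stdev(before_runs), stdev(after_runs))
    return abs(mean(after_runs) - mean(before_runs)) > k * spread
```

For example, `exceeds_noise([40, 42, 41], [41, 40, 43])` is false because the means differ by less than the spread between runs, whereas a jump to `[55, 57, 56]` clears the threshold. A formal significance test could replace this heuristic where the number of repetitions allows it.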
Competitor Intelligence — per-response analysis
- BrandMention
- Structure produced by the analyzer for each response and each evaluated brand. Contains: whether mentioned, ranking position, sentiment (−1 to 1), cited URLs, and up to three descriptive attributes extracted from the response. The base component for all CI study aggregation.
- EmergingBrand
- Brand detected by the AI in a response that was not configured in the study's competitor list. Allows discovery of competitors the AI already recognises in the sector but that the project had not been tracking. The top 12 by mention count are displayed.
- ConfusionBrand
- Emerging brand that matches entities marked as 'not me' in the project's entity profile. Quantifies how actively the AI is confusing the brand with another unwanted entity. A signal of an acute brand identity problem.
- CompetitorResponseType
- Classification of the AI response type into six buckets: provider_shortlist, comparison, educational_no_brands, generic_advice, category_mismatch, and refusal_or_no_recommendation. If most responses are generic_advice or refusal_or_no_recommendation, any brand's SoV will be structurally low, so a low SoV in that case is not a sign of poor brand visibility.
- AcronymAmbiguity
- Detection that the AI used an acronym with a different meaning than the one expected in the entity profile. Critical in sectors where the same acronym spans multiple markets: 'GEO' can mean Generative Engine Optimization or Geographic Information.
- MethodHallucination
- Incorrect use by the AI of a technical term configured in the project's entity profile. Detects vocabulary deformation: if the AI explains 'HSA Protocol' with a different definition than the project's, it is misleading the end client. Severity: critical / warning / info.
- EntityResolutionResult
- Granular resolution of which entity the AI recognised when discussing the brand. Six states: correct_entity, not_mentioned, entity_confusion, ambiguous_entity, wrong_domain, and insufficient_evidence. Only correct_entity counts as a valid mention in SoV — the gating that prevents inflating the metric with identity confusions.
- sourceReliabilityScore
- 0-100 reliability score for a response based on the nature of cited sources. Logic: memory mode without sources → 0; opaque URLs → 50; mix of opaque and real → 60; auditable URLs → 75. Feeds the Reliability Weight that modulates SoV in weighted mode. Never reaches 100: no AI engine today guarantees full reliability.
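The scoring tiers for sourceReliabilityScore, together with the tiers given in this glossary's Reliability Weight entry, can be written as two small functions. This sketch assumes the caller has already classified each cited source as auditable or opaque; how that classification is made is not described in the source, so the `auditable_flags` input and the explicit `memory_mode` flag are hypothetical.

```python
from typing import Sequence


def source_reliability_score(auditable_flags: Sequence[bool]) -> int:
    """Score a response from per-source flags (True = auditable URL, False = opaque)."""
    if not auditable_flags:
        return 0       # memory mode: no sources cited
    if all(auditable_flags):
        return 75      # only auditable URLs
    if any(auditable_flags):
        return 60      # mix of opaque and real URLs
    return 50          # only opaque URLs


def reliability_weight(score: int, memory_mode: bool = False) -> float:
    """Map a score to the multiplier used for weighted SoV."""
    if memory_mode:
        return 1.0     # no penalty: there are no sources to evaluate
    if score >= 75:
        return 1.0
    if score >= 50:
        return 0.7
    return 0.4
```

Note that a memory-mode response scores 0 but is weighted 1.0, so the memory-mode check has to come before the score tiers.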
Competitor Intelligence — aggregated visibility and comparator
- SoV — Share of Voice
- Distribution of mentions across the brands in the CI study. Per brand: mentionRate, average ranking position, average sentiment, and total appearances. Switchable between raw mode (direct mention count) and weighted (weighted by sourceReliabilityScore). Headline metric of the Competitor Intelligence module.
- Reliability Weight
- 0-1 weight applied to each analysis based on its sourceReliabilityScore: score ≥ 75 → 1.0; 50-74 → 0.7; < 50 → 0.4. Exception: memory-mode responses receive 1.0 despite their score of 0, because there are no sources to evaluate and therefore nothing to penalise. An internal multiplier: not shown to the client, but it determines the weighted SoV.
- Reliability Summary
- Study-level reliability aggregate: average sourceReliabilityScore, count of analyses with score ≥ 75 (auditable), and total analyses. If the average drops below 50, the UI shows a visible caveat about the study's conclusions.
- ResponseType Distribution
- Count of analyses per CompetitorResponseType bucket. Macro diagnostic: if the distribution is dominated by refusal_or_no_recommendation or generic_advice, the AI avoids recommending in that sector — any low SoV is structural, not a brand visibility failure.
- BrandDelta
- Difference in SoM per brand between two consecutive CI studies. Scope-aware: distinguishes 'dropped to zero' (measured in both studies, genuine visibility loss) from 'not measured in study B' (configuration change). Out-of-scope brands return null, not 0.
- ResponseTypeDelta
- Difference in CompetitorResponseType distribution between two studies. Detects shifts in how the AI responds in the sector: a rise in refusal signals increasing engine caution; a rise in provider_shortlist signals more recommending activity.
- ConfigDivergence
- Detection of different configuration between two compared studies: different brand lists or different analysis counts. When applicable, the UI shows a CrossConfigWarning banner. Honesty safeguard: the client cannot read a BrandDelta without knowing whether the two studies are comparable.
- Comparator Buckets
- Classification of brands by the change observed between study A and B into four buckets: droppedToZeroInB (fell off the radar, real loss), roseFromZeroInB (emerged between measurements), outOfScopeInB (in A but not B; configuration change), newInScopeOfB (not in A but in B; added to tracking).
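The scope-aware logic behind BrandDelta and the comparator buckets can be sketched as follows. Each study is represented here as a dictionary mapping every in-scope brand to its SoM, which is an assumed data shape. The bucket names are taken from the glossary, and a brand that is missing from a dictionary is treated as out of scope rather than as zero.

```python
from typing import Dict, List, Optional


def brand_delta(brand: str,
                study_a: Dict[str, float],
                study_b: Dict[str, float]) -> Optional[float]:
    """SoM change from A to B; None when the brand was not measured in both."""
    if brand not in study_a or brand not in study_b:
        return None
    return study_b[brand] - study_a[brand]


def comparator_buckets(study_a: Dict[str, float],
                       study_b: Dict[str, float]) -> Dict[str, List[str]]:
    buckets: Dict[str, List[str]] = {
        "droppedToZeroInB": [],
        "roseFromZeroInB": [],
        "outOfScopeInB": [],
        "newInScopeOfB": [],
    }
    for brand in sorted(set(study_a) | set(study_b)):
        in_a, in_b = brand in study_a, brand in study_b
        if in_a and not in_b:
            buckets["outOfScopeInB"].append(brand)      # configuration change
        elif in_b and not in_a:
            buckets["newInScopeOfB"].append(brand)      # added to tracking
        elif study_a[brand] > 0 and study_b[brand] == 0:
            buckets["droppedToZeroInB"].append(brand)   # genuine loss
        elif study_a[brand] == 0 and study_b[brand] > 0:
            buckets["roseFromZeroInB"].append(brand)    # emerged between studies
    return buckets
```

Brands measured in both studies with a non-zero value on each side fall into none of the four buckets. Their movement is read from `brand_delta` alone.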