Entity Schema Isn't SEO Hygiene. It's Whether AI Knows You Exist

New data from 532 surveyed URLs shows that just 2% of websites give AI systems enough structured context to reliably identify, understand, and cite them.

The Schema Debate Has a Blind Spot

The SEO industry has been arguing about schema and AI citations for months. Does JSON-LD help you get cited in AI Overviews? The Ahrefs study from May 2026 found that, for pages already in the AI consideration set, adding schema produced no meaningful uplift. That finding is probably correct, and it is almost certainly measuring the wrong thing.

The debate has fixated on one moment in a multi-stage pipeline: the instant an LLM fetches a page to generate an answer. At that moment, no current AI system reads JSON-LD directly. Transformers parse tokens, not structured-data markup. But Google's pipeline has five stages: crawling, indexing and entity parsing, Knowledge Graph construction, retrieval, and answer generation. Schema does its work at stages two and three, not five. As SEO analyst Gianluca Fiorelli put it, conflating "the AI ignored JSON-LD when fetching this page" with "JSON-LD has no effect on how AI systems understand this entity" is a category error.

Schema is less like placing an ad and more like registering a company. You do not register a company expecting the act of registration to immediately drive revenue. You register it because, without that registration, the company does not exist as a verifiable entity in any official system. — Gianluca Fiorelli

That framing matters practically. If AI systems are going to understand, trust, and reliably represent a brand, that brand needs to exist as a resolvable entity in machine-readable systems. Entity schema is how that registration happens: specifically the Organization, Person, and Brand types (schema.org uses the US spelling), together with the sameAs property that anchors them to external authority sources and the knowsAbout property that declares their topical scope.
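As a rough illustration of what that registration looks like in practice, the Python sketch below assembles an Organization block with sameAs and knowsAbout and prints it as the JSON-LD that would sit in a page's script tag. The company name, URLs, Wikidata ID, and topics are placeholders invented for the example, and the build_entity_schema helper is hypothetical rather than part of any tool mentioned here.

```python
import json


def build_entity_schema(name, url, same_as, knows_about):
    """Return a schema.org Organization entity as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        # A stable identifier other schema blocks on the site can point at.
        "@id": f"{url.rstrip('/')}/#organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),          # external authority records
        "knowsAbout": list(knows_about),  # declared topical scope
    }


# Placeholder values for illustration only.
entity = build_entity_schema(
    name="Example Ltd",
    url="https://www.example.com/",
    same_as=[
        "https://www.wikidata.org/wiki/Q0000000",
        "https://www.linkedin.com/company/example-ltd",
    ],
    knows_about=["website monitoring", "technical SEO"],
)

json_ld = json.dumps(entity, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

Generating the block from one function keeps the name, URL, and identifier consistent wherever the entity is emitted.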

So how many websites have actually done it? We pulled the data from SiteVitals across 532 surveyed URLs to find out.

The Data

SiteVitals runs automated SEO checks against each monitored URL, including a dedicated entity_disambiguation check that evaluates the presence, completeness, and quality of entity schema. For each URL we captured the most recent check result and classified it across four dimensions: whether any entity schema was present at all, whether sameAs authority links existed, whether knowsAbout declarations were present, and whether the full combination — entity type + sameAs + knowsAbout — was in place.
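The four dimensions can be sketched as a small classifier. This is an illustrative approximation only, not the SiteVitals entity_disambiguation check itself: the set of entity types, the function names, and the rule that sameAs and knowsAbout must sit on the same entity node are all assumptions made for the example.

```python
import json

# Assumed set of types that count as "entity schema" for this sketch.
ENTITY_TYPES = {"Organization", "Person", "Brand"}


def iter_nodes(data):
    """Yield every JSON-LD node, flattening top-level lists and @graph arrays."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        if "@graph" in data:
            yield from iter_nodes(data["@graph"])
        else:
            yield data


def is_entity(node):
    """True when the node's @type includes one of the assumed entity types."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return bool(ENTITY_TYPES & set(types))


def classify(json_ld_text):
    """Classify one JSON-LD document across the four dimensions."""
    entities = [n for n in iter_nodes(json.loads(json_ld_text)) if is_entity(n)]
    return {
        "has_entity_schema": bool(entities),
        "has_same_as": any(n.get("sameAs") for n in entities),
        "has_knows_about": any(n.get("knowsAbout") for n in entities),
        "gold_standard": any(
            n.get("sameAs") and n.get("knowsAbout") for n in entities
        ),
    }


# Placeholder document: an entity with sameAs but no knowsAbout.
sample = json.dumps({
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Ltd",
    "sameAs": ["https://www.linkedin.com/company/example-ltd"],
})
print(classify(sample))
```

The sample document lands in the "has sameAs, missing knowsAbout" state. A real check would also need to extract the JSON-LD from the page and test whether each sameAs link resolves, which this sketch leaves out.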

The 532 URLs span a range of site types, sizes, and platforms including WordPress, Shopify, custom builds, and Canva sites.

Entity schema status across 532 surveyed URLs

Entity Data State	Count	% of 532	What it means
No entity schema at all	139 URLs	26.1%	No schema block of any kind
Entity schema present, missing both `sameAs` + `knowsAbout`	223 URLs	41.9%	Has schema but zero entity signals
Has `sameAs`, missing `knowsAbout`	95 URLs	17.9%	Halfway there
Has `knowsAbout`, missing `sameAs`	1 URL	0.2%	Rare edge case
Broken `sameAs` link(s)	1 URL	0.2%	Has sameAs but it's doing harm
Full gold standard (`sameAs` + `knowsAbout`)	11 URLs	2.1%	Fully resolved entity ✓

What the Numbers Actually Mean

68% of URLs have no meaningful entity identity

26% of surveyed URLs have no entity schema whatsoever. A further 42% have a schema block present (an Organization or Person type) but without sameAs authority links or knowsAbout declarations. That combination is structurally a placeholder. It tells an AI system "something is here" but provides no means of verifying what that something is or connecting it to any external record. Combined, 68% of surveyed URLs give AI systems nothing to anchor an entity understanding to.
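For readers who want to check the arithmetic, the combined figure follows directly from the first two rows of the table. The short Python sketch below recomputes the shares from the published counts; the share helper is just a convenience defined for this example.

```python
TOTAL_URLS = 532

# Counts taken from the first two rows of the table.
no_entity_schema = 139
schema_without_signals = 223


def share(count, total=TOTAL_URLS):
    """Percentage of the surveyed URLs, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(share(no_entity_schema))                           # 26.1
print(share(schema_without_signals))                     # 41.9
print(share(no_entity_schema + schema_without_signals))  # 68.0
```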

sameAs is the rarest signal — and the most important one

Only 20% of URLs have any sameAs links at all, and the majority of those are missing knowsAbout, making them partial implementations. Just 2% of the dataset have the full combination that enables reliable entity disambiguation: a typed entity schema, sameAs links to authoritative sources like Wikipedia, Wikidata, or LinkedIn, and knowsAbout declarations that define topical scope.

The Wells Fargo case, documented by Schema App, is instructive here. AI systems were generating incorrect information about branch closures. The fix was not better content. It was schema that gave AI systems an authoritative entity record to cite instead of inferring from stale third-party sources. Entity schema is, in that sense, brand protection infrastructure.

Content schema and entity schema are not the same thing

Much of the current industry conversation treats all JSON-LD as a single category. It is not. Article, FAQPage, and HowTo schema are content signals: they describe what a page is about. Organization, Person, Brand, and SoftwareApplication schema are entity identity signals: they describe who is publishing it and anchor that publisher to a verifiable record. The Ahrefs study pooled all schema types together, which is one reason it could not detect entity-level effects that operate across a fundamentally different timeline.

When we look at our data through this lens, the picture sharpens. The 42% of URLs with schema but no sameAs or knowsAbout have almost certainly concentrated on content schema. They have marked up their articles, products, or pages, and many carry a bare entity type, but they have not completed the entity layer that tells AI systems who is behind those pages and whether they should be trusted as a source.

The headline finding

68% of URLs have no meaningful entity identity. Just 2% have both sameAs and knowsAbout, the minimum needed for AI systems to reliably identify and cite the brand behind the content.

Why This Matters More in 2026

A 2025 survey of 400 senior B2B marketing executives found that 35% now cite generative engine optimisation (GEO) performance as their top success benchmark. AI-native platforms like ChatGPT and Perplexity have become the second most common source for qualified leads among B2B tech buyers. The channel shift is real.

It is also platform-fragmented in ways classical SEO was not. A Qwairy analysis of 118,000 AI responses across ChatGPT, Perplexity, Google AI Mode, and Claude found that only 11% of cited domains appeared across multiple platforms. The other 89% were platform-specific. The brands most likely to benefit across any AI platform are the ones with clear, resolvable entity records to begin with.

The practical pressure point is this: AI systems that cannot resolve a brand's entity do not cite it reliably. They either ignore it or, worse, infer about it from whatever third-party sources happen to rank. That inference is not always accurate, and it is not within the brand's control. Entity schema is one of the few mechanisms that is.

What Practitioners Should Actually Do

→ Audit for entity schema first, not last. Most schema audits start with rich results eligibility. Entity schema rarely appears on those checklists because it doesn't generate a visible SERP feature. That's exactly why 68% of surveyed URLs have gaps.
→ sameAs is non-negotiable. Wikipedia, Wikidata, LinkedIn, Companies House, and official social profiles are the authority sources that allow AI systems to cross-reference and confirm an entity's identity. Without at least two or three of these, the schema block is a declaration with no evidence.
→ Add knowsAbout to define topical scope. This is the signal that tells AI systems what queries this entity should be considered for. Only 20% of URLs in our dataset have any sameAs at all — and most of those still haven't added knowsAbout. The gap between "has some entity data" and "fully resolved entity" is almost always here.
→ Use @id for graph linking. Persistent identifiers allow AI systems to traverse relationships across pages and connect content to its author and publisher. Without them, schema blocks are isolated rather than forming a connected semantic layer.
→ Stop treating entity schema as infrastructure to defer. This is the work that determines whether AI systems treat a brand as a resolvable, trustworthy entity at all. It doesn't show up in 30-day citation reports, which is probably why 98% of surveyed URLs haven't finished it.
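To make the @id recommendation concrete, here is a minimal Python sketch of a single @graph in which an Article points at its author and publisher by identifier instead of repeating their details. Every name and URL is a placeholder, and the dangling_references helper is a hypothetical sanity check written for this example, not a feature of any particular validator.

```python
import json

SITE = "https://www.example.com"  # placeholder domain
ORG_ID = f"{SITE}/#organization"
AUTHOR_ID = f"{SITE}/about/#person"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Example Ltd",
            "url": SITE,
            "sameAs": ["https://www.linkedin.com/company/example-ltd"],
            "knowsAbout": ["website monitoring", "technical SEO"],
        },
        {
            "@type": "Person",
            "@id": AUTHOR_ID,
            "name": "Jane Doe",
            "worksFor": {"@id": ORG_ID},  # reference, not a copy
        },
        {
            "@type": "Article",
            "@id": f"{SITE}/blog/entity-schema/#article",
            "headline": "Why entity schema matters",
            "author": {"@id": AUTHOR_ID},
            "publisher": {"@id": ORG_ID},
        },
    ],
}


def dangling_references(doc):
    """Return @id references that point at no node defined in the graph."""
    defined = {node["@id"] for node in doc["@graph"] if "@id" in node}
    missing = set()
    for node in doc["@graph"]:
        for value in node.values():
            # A pure reference is a dict whose only key is "@id".
            if isinstance(value, dict) and set(value) == {"@id"}:
                if value["@id"] not in defined:
                    missing.add(value["@id"])
    return sorted(missing)


print(json.dumps(graph, indent=2))
print(dangling_references(graph))
```

Because the Article refers to the Organization and Person by @id, the entity details live in one place, and a reference that points at nothing shows up in the helper's output.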

The Actual Problem

The debate about whether schema directly causes AI citations has obscured the more important question: whether AI systems can form a coherent, accurate understanding of a brand in the first place. Citation is the downstream outcome. Entity resolution is the foundation.

Our data suggests the industry has been so focused on content schema — marking up articles, products, FAQs — that the entity layer has been systematically neglected. 98% of surveyed URLs have not implemented the full combination of entity type, sameAs, and knowsAbout that makes reliable AI entity recognition possible.

That is not a content problem. It is not a technical SEO problem in the classical sense. It is a structural gap in how most sites have approached schema — building visible-result features while skipping the foundational work that tells AI systems who they are.

Lisa Freeman is the founder of SiteVitals, a website health, uptime, and SEO monitoring platform. Data in this article is drawn from SiteVitals' entity_disambiguation check across 532 actively surveyed URLs, June 2026.