AI Readiness Entity Disambiguation

A few weeks ago I wrote about why technical SEO needs an AI layer and walked through the schema validator, AI crawler audit, and llms.txt checker we had built into SiteVitals. The response was encouraging - but the most common question I got was some variant of: "My schema passes your checks, but I'm still not showing up in AI answers. What else can I do?"

And the honest answer was that our AI readiness scoring, while ahead of most monitoring tools, was still checking for the presence of structured data rather than the quality of the signals within it. A site could have valid schema, an llms.txt file, and all AI crawlers permitted - and still be invisible to AI recommendation engines because the structured data lacked the depth those engines actually use to make citation decisions.

So we fixed that. Today's update adds six new checks to the SiteVitals SEO scanner, all focused on the specific signals that influence whether AI systems choose to cite your brand. These are not speculative. They are grounded in what changed after Google's March 2026 core update, which produced the most significant shift in structured data strategy since rich snippets were introduced.

Here is what we added and why each one matters.

1. Entity Disambiguation: sameAs Authority Link Verification

We already checked whether your schema blocks included sameAs links pointing to authority domains - Wikipedia, Wikidata, LinkedIn, Crunchbase, Companies House, and so on. What we were not doing was checking whether those links actually worked.

This turns out to matter more than you would think. A sameAs link pointing to a deleted LinkedIn company page or a non-existent Wikidata entity is worse than having no sameAs at all. It actively confuses the entity disambiguation process - the AI system tries to resolve your identity against a dead endpoint and either fails silently or, worse, matches you to the wrong entity.

The scanner now sends a request to every sameAs URL in your schema (up to 20, batched concurrently so it does not slow the scan down meaningfully). We report three states: verified, redirected, or broken. Redirected links get a gentle nudge to update to the final destination. Broken links get a clear warning, because they are actively harming your AI discoverability.
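To make the three states concrete, here is a minimal Python sketch of a sameAs verifier that uses only the standard library. It is not the SiteVitals implementation. The 20-link cap comes from the description above. The worker count, the timeout, the trailing-slash normalisation and the choice to treat any HTTP error as broken are all assumptions. Some platforms also block or rate-limit automated requests, so a production checker would need to tell bot-blocking responses apart from genuinely dead pages.

```python
import concurrent.futures
import urllib.request

MAX_LINKS = 20  # cap taken from the post; timeout, workers and rules below are assumptions


def classify(requested_url, final_url, status):
    """Map one HTTP outcome to 'verified', 'redirected' or 'broken'."""
    if status >= 400:
        return "broken"
    # Ignore a trailing slash so /about and /about/ count as the same URL.
    if final_url.rstrip("/") != requested_url.rstrip("/"):
        return "redirected"
    return "verified"


def check_one(url, timeout=10.0):
    """Fetch one sameAs URL; urlopen follows redirects and geturl() reports the final URL."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": "sameas-check-sketch/0.1"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(url, response.geturl(), response.status)
    except ValueError:
        return "broken"  # malformed URL, e.g. a missing scheme
    except OSError:
        return "broken"  # HTTPError (4xx/5xx, redirect loops), DNS failures, timeouts


def check_same_as(urls):
    """Check up to MAX_LINKS unique URLs concurrently; returns {url: state}."""
    unique = list(dict.fromkeys(urls))[:MAX_LINKS]
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(unique, pool.map(check_one, unique)))
```

Splitting `classify` out from the network call keeps the decision rules testable without making any requests.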

If you are managing sites for clients, this is the kind of thing that breaks silently. A company rebrands and their LinkedIn URL changes. Someone deletes a Wikipedia page for notability reasons. The schema still validates perfectly - the link just goes nowhere. SiteVitals will now catch it on the next scan.

2. knowsAbout: Declaring Topical Authority

This is a property that most site owners have never heard of, but it has become one of the most impactful entity signals available since the March 2026 update.

knowsAbout is a schema.org property you can add to Organization or Person types to declare the topics and subject areas your entity has expertise in. When an AI system is deciding which sources to cite for a query about, say, website security monitoring, it looks for organizations whose structured data explicitly declares that as a domain of expertise.

An Organization schema that declares "knowsAbout": ["website monitoring", "technical SEO", "website security"] is more likely to be cited for queries in those domains than an equivalent organization with no topic declarations at all. The AI does not have to infer your expertise from page content alone - you have stated it explicitly in a format it can parse instantly.
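If you generate JSON-LD from code instead of hand-editing a template, the declaration above comes down to one extra key. Here is a minimal Python sketch. The organisation name and URL are placeholders. schema.org also accepts URLs or Thing objects as knowsAbout values, but plain text is shown here for simplicity.

```python
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Ltd",              # placeholder
    "url": "https://www.example.com/",  # placeholder
    # Topics the organisation wants to be associated with.
    "knowsAbout": ["website monitoring", "technical SEO", "website security"],
}

# Serialise into the script tag that gets embedded in the page.
snippet = '<script type="application/ld+json">\n' + json.dumps(organization, indent=2) + "\n</script>"
print(snippet)
```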

SiteVitals now checks whether your Organization, Corporation, LocalBusiness, or Person schema blocks include knowsAbout, and reports the topics declared. If it is missing, you will see a recommendation explaining what it does and why it is worth adding.
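A few lines of Python are enough to sketch a check along these lines. This is an illustration, not the scanner's code. It walks every parsed JSON-LD block, including @graph members and nested objects. It matches the four types named above exactly, so LocalBusiness subtypes such as Restaurant are not picked up. It reports the declared topics for each entity, and an empty list means knowsAbout is missing.

```python
import json

ENTITY_TYPES = {"Organization", "Corporation", "LocalBusiness", "Person"}


def iter_nodes(data):
    """Yield every object in parsed JSON-LD, including @graph members and nested values."""
    if isinstance(data, dict):
        yield data
        for value in data.values():
            yield from iter_nodes(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)


def knows_about_report(json_ld_blocks):
    """Return (entity name, declared topics) per matching entity; [] means knowsAbout is missing."""
    report = []
    for raw in json_ld_blocks:
        for node in iter_nodes(json.loads(raw)):
            types = node.get("@type", [])
            types = {types} if isinstance(types, str) else set(types)
            if not types & ENTITY_TYPES:
                continue
            topics = node.get("knowsAbout", [])
            if not isinstance(topics, list):
                topics = [topics]
            # knowsAbout may hold text, URLs or Thing objects; use a Thing's name if present.
            names = [t.get("name", "") if isinstance(t, dict) else t for t in topics]
            report.append((node.get("name"), names))
    return report
```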

3. Author/Publisher E-E-A-T Chain Depth

If you publish articles, blog posts, or any content that carries a byline, this one is for you.

AI systems do not just look at whether an article has an author declared in schema. They evaluate the depth of the chain: does the Article link to an author entity (a Person with a name, a URL, and their own sameAs authority links)? Does it link to a publisher entity (an Organization with a name, a logo, and its own sameAs links)?

This is the E-E-A-T chain - Experience, Expertise, Authoritativeness, Trustworthiness - expressed as structured data. A "strong" chain means the AI can trace the article back to a verified person at a verified organization. A "weak" chain means the author is a plain string like "author": "Tom" - technically present, but it tells the AI nothing about who Tom is or why he should be trusted on this topic.

The scanner now evaluates this chain for every Article, BlogPosting, NewsArticle, HowTo, Review, and CreativeWork block, scoring it as strong, partial, or weak with specific guidance on what is missing. We also check whether the author entity itself declares knowsAbout - because an author with declared topical expertise is a stronger citation signal than one without.
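A small function can approximate the strong/partial/weak scoring. The exact rules SiteVitals applies are not spelled out here, so this sketch makes three assumptions. "Strong" means an author Person with name, url and sameAs, plus a publisher with name, logo and sameAs. A plain-string or missing author is "weak". Anything in between is "partial". The optional `graph_index` argument is a hypothetical {@id: node} mapping that lets the function follow @id references.

```python
def _resolve(ref, graph_index):
    """Follow a bare {"@id": ...} reference into the page's graph, if possible."""
    if isinstance(ref, list):
        ref = ref[0] if ref else None  # only the first author/publisher is scored here
    if isinstance(ref, dict) and set(ref) == {"@id"}:
        return graph_index.get(ref["@id"], ref)
    return ref


def chain_depth(article, graph_index=None):
    """Return (score, missing) for an Article-like node; the rules are illustrative assumptions."""
    graph_index = graph_index or {}
    author = _resolve(article.get("author"), graph_index)
    publisher = _resolve(article.get("publisher"), graph_index)

    if not isinstance(author, dict):
        return "weak", ["author is missing or a plain string"]

    missing = []
    if not (author.get("name") and author.get("url")):
        missing.append("author name/url")
    if not author.get("sameAs"):
        missing.append("author sameAs")
    if not isinstance(publisher, dict) or not (publisher.get("name") and publisher.get("logo")):
        missing.append("publisher name/logo")
    elif not publisher.get("sameAs"):
        missing.append("publisher sameAs")
    return ("strong" if not missing else "partial"), missing
```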

4. @id Graph Linking

JSON-LD schema can be written as a collection of disconnected blocks or as a connected graph where entities reference each other via @id values. The difference matters for AI interpretation.

When your schema uses @id identifiers and a @graph structure, it behaves like a small internal knowledge graph. An Article can reference its author as {"@id": "#tom"}, and a separate Person block with "@id": "#tom" provides the full detail. The AI does not have to guess whether the Tom in your article is the same Tom in your Organization - the @id makes the connection explicit.
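Here is the Tom example written out as a connected graph, built in Python so it can be serialised into the page. The names, the Organization node and the worksFor link are illustrative. Many sites use absolute @id values such as https://www.example.com/#tom instead of a bare fragment.

```python
import json

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "#org", "name": "Example Ltd",
         "url": "https://www.example.com/"},
        # Full detail about the author lives in one place...
        {"@type": "Person", "@id": "#tom", "name": "Tom",
         "worksFor": {"@id": "#org"}},
        # ...and the article points at it instead of repeating it inline.
        {"@type": "Article", "@id": "#article", "headline": "Example headline",
         "author": {"@id": "#tom"}, "publisher": {"@id": "#org"}},
    ],
}

print(json.dumps(graph, indent=2))
```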

Disconnected anonymous blocks - where the author is an inline object repeated on every article page - provide weaker signals. The AI has to do more work to resolve the entity, and it may not bother.

SiteVitals now checks whether your schema blocks use @id values, and whether cross-references between blocks (like author, publisher, provider) resolve correctly within the graph. If you have an @id reference pointing to an entity that does not exist in the graph, you will see a warning with the specific broken reference.
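Resolving cross-references comes down to collecting every defined @id and flagging references that point anywhere else. Here is a minimal sketch. It assumes the @graph array has already been parsed into a list of dicts, and it checks only the three properties named above. A fuller checker would probably cover others as well, such as worksFor.

```python
def broken_references(graph_nodes, ref_props=("author", "publisher", "provider")):
    """Return (source @id, property, missing target @id) for each unresolved reference."""
    defined = {node["@id"] for node in graph_nodes if "@id" in node}
    broken = []
    for node in graph_nodes:
        for prop in ref_props:
            value = node.get(prop)
            # A property may hold a single reference or a list of them.
            for ref in value if isinstance(value, list) else [value]:
                # Only bare {"@id": ...} objects are references; inline objects are skipped.
                if isinstance(ref, dict) and set(ref) == {"@id"} and ref["@id"] not in defined:
                    broken.append((node.get("@id", "<anonymous>"), prop, ref["@id"]))
    return broken
```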

5. Schema-Content Drift Detection

We already had a check, added in the last update, that compared your <title> tag against the name property in your WebPage schema block. If they did not match, it was flagged, because a mismatch between what users see and what structured data declares erodes trust with both search engines and AI systems.

This update extends that principle. We now check Organization name consistency and will flag cases where your structured data tells a different story to your visible page content. This kind of "schema drift" happens more often than you would expect - a business rebrands and updates the website copy, but nobody touches the JSON-LD template. The site looks correct to a human visitor, but the AI sees a contradiction and trusts the page less as a result.
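You can approximate a drift check of this kind by normalising both strings and comparing them. This sketch is an assumption about how such a comparison might work, not the scanner's logic. It treats the WebPage name as consistent if either string contains the other, so "Pricing | Example Ltd" in <title> still matches a WebPage name of "Pricing". It also checks that the Organization name appears somewhere in the visible text, which the caller is assumed to have extracted already.

```python
import re
import unicodedata


def normalise(text):
    """Case-fold, unify Unicode forms, and reduce punctuation/whitespace to single spaces."""
    text = unicodedata.normalize("NFKC", text or "").casefold()
    return " ".join(re.sub(r"[^\w\s]", " ", text).split())


def drift_warnings(page_title, visible_text, webpage_name=None, org_name=None):
    """Return human-readable warnings where structured data disagrees with visible content."""
    warnings = []
    title, name = normalise(page_title), normalise(webpage_name)
    if webpage_name and name not in title and title not in name:
        warnings.append(f"WebPage name {webpage_name!r} does not match <title> {page_title!r}")
    if org_name and normalise(org_name) not in normalise(visible_text):
        warnings.append(f"Organization name {org_name!r} not found in visible page text")
    return warnings
```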

6. Content Extractability for AI

This last check is different from the others. It is not about schema - it is about your page structure.

AI systems build answers by breaking complex questions into smaller sub-queries, searching for each one, and extracting relevant fragments from the sources they find. The easier your content is to extract from, the more likely you are to be cited.

Two structural patterns make a significant difference:

Question-form headings. H2 and H3 tags that start with "What", "How", "Why", or "When", or that end with a question mark, map directly to the sub-queries AI systems generate. A heading like "How does website monitoring work?" is far more extractable than "Website Monitoring Overview" - because the AI is literally searching for "how does website monitoring work" and your heading matches the query shape exactly.

Paragraph length. Very long paragraphs (over 300 words) are harder for AI to extract clean citations from. The system cannot isolate the relevant 2-3 sentences without pulling in a lot of surrounding context, so it tends to prefer sources with shorter, more focused paragraphs. This does not mean you should write in bullet points - it means you should break your prose at natural topic boundaries.

SiteVitals now analyses your H2 and H3 headings for question-form patterns and measures paragraph lengths within your main content area (we look for <main>, <article>, or [role="main"] containers first, falling back to <body> if no semantic container is found). The results appear as a new "Content Extractability" check on your SEO dashboard.
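Python's standard-library HTML parser is enough to measure both signals. The sketch below illustrates the idea and is not the SiteVitals implementation. It uses the question words and the 300-word threshold described above, and it treats <main>, <article> and role="main" as semantic containers. When none of those is present, it falls back to every H2, H3 and paragraph in the document as an approximation of <body>. Only <p> elements count as paragraphs.

```python
import re
from html.parser import HTMLParser

QUESTION_START = re.compile(r"(what|how|why|when)\b", re.IGNORECASE)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ExtractabilityParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []            # (tag, is_semantic_container) for open elements
        self.found_container = False
        self.capture = None        # "h2", "h3" or "p" while collecting its text
        self.buffer = []
        self.items = []            # (tag, text, inside_semantic_container)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        is_container = tag in ("main", "article") or dict(attrs).get("role") == "main"
        self.found_container |= is_container
        self.stack.append((tag, is_container))
        if tag in ("h2", "h3", "p") and self.capture is None:
            self.capture, self.buffer = tag, []

    def handle_endtag(self, tag):
        if tag == self.capture:
            inside = any(flag for _, flag in self.stack)
            self.items.append((tag, " ".join("".join(self.buffer).split()), inside))
            self.capture = None
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.capture:
            self.buffer.append(data)


def extractability(html, max_words=300):
    parser = ExtractabilityParser()
    parser.feed(html)
    parser.close()
    items = parser.items
    if parser.found_container:  # prefer the semantic container when one exists
        items = [item for item in items if item[2]]
    headings = [text for tag, text, _ in items if tag in ("h2", "h3")]
    paragraphs = [text for tag, text, _ in items if tag == "p"]
    return {
        "headings": len(headings),
        "question_headings": sum(bool(QUESTION_START.match(h)) or h.endswith("?") for h in headings),
        "paragraphs": len(paragraphs),
        "long_paragraphs": sum(len(p.split()) > max_words for p in paragraphs),
    }
```

The tag stack is needed because a role="main" container is usually a plain <div>. Without the stack, there is no reliable way to tell which closing </div> ends it.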

What This Means for Your AI Readiness Score

Your existing AI readiness score in SiteVitals was based on entity clarity (name + URL + description), sameAs presence, speakable markup, and FAQ/QA schema. All of those checks still exist and still matter.

The score now also considers:

Whether your sameAs links actually resolve (not just whether they exist)
Whether your primary entity declares knowsAbout
Whether your content entities have a strong author/publisher chain
Whether your schema uses @id for graph linking

A site can still pass all traditional SEO checks and score poorly on these new signals. That gap - between "technically valid" and "AI-ready" - is exactly what this update is designed to close.

All of This Runs Automatically

These checks are now part of the standard SEO & AI Visibility scan in SiteVitals. If you are on a plan that includes SEO monitoring, your next scheduled scan will include all six new checks with no configuration changes needed. Historical trend tracking has been updated too, so you will be able to see your entity disambiguation and content extractability signals improve over time as you make changes.

If you are not yet monitoring with SiteVitals, you can run a one-off scan to see where you stand.


What to Do Next

If you run the scan and everything is green, you are in a strong position. If you see warnings - and most sites will - here is a rough priority order:

First, fix broken sameAs links. These are actively hurting you. Update URLs that have changed and remove any that point to deleted pages.

Second, add knowsAbout to your Organization or Person schema. This is a five-minute edit to your JSON-LD template with an outsized impact on AI citation selection.

Third, strengthen your author/publisher chain. Make sure article authors are Person objects (not strings) with their own sameAs links and knowsAbout declarations. Make sure the publisher Organization has a logo.

Fourth, consider restructuring content headings into question form where it makes sense naturally. Do not force it - but where a heading like "Pricing" could just as easily be "How much does it cost?", the question form is strictly better for AI extractability.

@id graph linking is worth implementing, but it requires more structural changes to your JSON-LD, so it is a good candidate for a future sprint rather than an immediate fix.

As always, if something does not look right or you have questions about what the scanner is flagging, get in touch. We are a small team and we actually respond.

By Tom Freeman · Co-Founder & Lead Developer

Full-stack developer specialising in high-performance web applications and automated monitoring.