Summary
Markdown became the default output format for AI tools because it was token-efficient and easy to read without rendering. A viral post by Thariq Shihipar, engineering lead for Claude Code at Anthropic, has kicked off a serious debate: as AI agents grow more powerful and context windows expand to a million tokens, is Markdown actually holding us back? For human readers, HTML's richer structure (real tables, colour, diagrams, interactive elements) may simply communicate better. At the same time, Markdown still has a clear role: it's the right format for machine-readable content, developer documentation, and the emerging llms.txt standard that helps LLMs discover and understand your website. This article explains the debate and gives site owners practical steps to make their content work for both humans and AI.
How Markdown Became the Default
For the last several years, Markdown has been the quiet lingua franca of the internet: the format developers, technical writers, and increasingly AI models reached for when communicating anything of substance. It was clean, portable, and readable without a renderer. It won the LLM format war almost by accident, and the challenge to that position is now coming from the engineers who build AI tools for a living.
Markdown's rise was driven by constraint, not choice. When large language models first became widely useful, context windows were tiny. GPT-4's original 8,192-token limit made every token precious, and Markdown's efficiency over HTML (some estimates put it at 68-87% fewer tokens for equivalent content) made it the obvious call. Developers adopted it, agents adopted it by default because they were trained on it everywhere, and the habit stuck.
The result was a kind of invisible standardisation. Configuration files, documentation, agent instructions, README files, output reports: everything flowed through .md. It became, as one analyst put it, "the air you breathe in an AI project, the default format nobody actually decided on."
The Anthropic Engineer Who Challenged Everything
That consensus got a very public challenge in early May 2026. Thariq Shihipar, the engineering lead for Claude Code at Anthropic, published a post titled Using Claude Code: The Unreasonable Effectiveness of HTML, and the AI developer community paid attention.
Within 48 hours, the post had racked up over 750,000 views, 14,000 likes, 30,000 bookmarks, and 1,600 quote posts on X. It sparked discussion across Hacker News, Threads, and LinkedIn, and prompted prominent AI commentators including Simon Willison to publicly reconsider their own defaults.
The argument Shihipar makes is straightforward. Markdown has become the dominant file format agents use to communicate with us: it's simple, portable, has some rich text capability, and is easy to edit. But as agents have become more powerful, Markdown has become a restricting format. Shihipar wanted richer visualisations, colour, and diagrams he could share easily, and started preferring HTML as an output format. Increasingly, others on the Claude Code team were doing the same.
To back up the claim with evidence rather than opinion, Shihipar published a companion site containing twenty self-contained HTML files, all generated by Claude Code, each illustrating a real use case. The examples include a pull-request review page with inline margin notes, a rate-limit diagram with sliders that recalculate the visualisation when users adjust inputs, a backlog view with a JSON export button, and a single-page weekly summary formatted for an executive recipient.
What Markdown Actually Can't Do
Shihipar's post catalogues what you lose when you force AI output into Markdown format, and the examples are concrete.
Markdown tables break on anything beyond a simple grid. HTML tables support column spans, row spans, and alignment, and can be styled to highlight the important row at a glance. In Markdown, emphasis is limited to bold, italic, and code blocks; HTML adds colour, size, spacing, borders, and layout. And when it comes to diagrams, Markdown forces Claude to resort to ASCII art: pipe characters for bar charts, unicode blocks for colour swatches, dashes for arrows. In HTML, it draws real vector graphics that are scalable and actually readable.
The more damning critique is about human attention. Shihipar puts it bluntly: "I tend to not actually read more than a 100-line markdown file, and I certainly am not able to get anyone else in my organisation to read it." This isn't a niche problem. AI agents now routinely produce 200-line implementation plans, detailed code reviews, and multi-page specifications. The output is often technically correct and structurally sound, and most of it goes unread.
The neuroscience supports this. Roughly 30% of the human cerebral cortex is dedicated to visual processing. Hearing accounts for 3%, touch for 8%. Vision is, as Andrej Karpathy put it, "the 10-lane superhighway of information into the brain." Markdown barely uses it. HTML doesn't make AI agents smarter; it makes their output something people will actually look at.
The Context Window Shift That Changed the Calculation
There's a clear reason why this debate is only surfacing now. The constraint that made Markdown the sensible choice has quietly gone away.
Markdown became the default AI output format during the GPT-4 era, when context windows were tiny and every token counted. Agents have evolved since then, and context windows have grown to a million tokens. The token-efficiency argument that once decisively favoured Markdown no longer holds in the same way. The AI community tends to adopt formats in waves: someone tries something, it works, it spreads, and three years later everyone realises the obvious alternative was there all along. That's where Markdown finds itself now for agent outputs.
Simon Willison put it plainly: he had been defaulting to Markdown since the GPT-4 days, when the 8,192-token limit meant the efficiency gains over HTML were real and worth caring about. Shihipar's piece caused him to reconsider, at least for output.
What This Means for Website Owners and Content Creators
The HTML-versus-Markdown debate has practical implications beyond AI agent workflows. It touches on how websites communicate, with humans and with machines.
As LLMs become more capable, the bottleneck is less often what the model knows and more often how effectively it gets that knowledge to the person asking. The same is true in reverse: the bottleneck for AI crawlers reading your site is less often what you've written and more often whether they can actually parse it.
For website owners, this cuts in two directions. If your content is primarily consumed by other LLMs (via RAG pipelines, AI crawlers, or agent toolchains), Markdown and clean semantic HTML are both good choices. If the primary reader is a machine that doesn't render tables or care about colour, token efficiency is the thing that matters, and Markdown's constraints are strengths.
If the primary reader is a human, that logic reverses. A well-structured, visually navigable HTML page that a visitor actually reads and acts on is worth more than a wall of plain text, however well written.
Where Markdown Still Wins
Shihipar is careful about where his argument applies and where it doesn't, and it's worth being equally precise.
The practical rule is: use HTML when the document has a third-party reader who won't modify it, and Markdown when the document is collaborative, indexed, or consumed by automated pipelines. Documentation consumed by other LLMs, files destined for Git history where line-by-line blame matters, and personal notes all stay better in Markdown.
HTML wins the session. Markdown wins the archive.
The Bigger Picture for Your Website's Health
For anyone responsible for a website, the real question isn't which format wins. It's which format suits which reader, and whether you're making that call deliberately or just by default.
Static, machine-readable documentation benefits from Markdown's simplicity and token efficiency. Reports, dashboards, summaries, and any communication a human needs to act on benefit from the structure and visual clarity that HTML provides. The idea that plain text is inherently more professional is a habit from the early LLM era, not a reasoned position.
Anthropic's Artifacts feature is worth noting here. When Claude generates an HTML artifact, it renders directly in a preview pane without needing to be saved and opened in a browser. The generated code is typically self-contained, with no external dependencies or build steps. It's a workflow that makes the HTML-first approach practical for everyday use, not just for developers.
A simple test: take your most commonly generated AI output (a site health report, a technical summary, a content audit) and ask for it in HTML instead. See whether you actually read it. That's Shihipar's point in practice.
So What Should Site Owners Actually Add in Markdown?
The debate above clarifies something useful: Markdown is for machine readers, HTML is for human readers. There are three specific things worth doing in Markdown right now if you want LLMs to find, understand, and recommend your content.
1. /llms.txt: Your LLM-Friendly Site Index
The llms.txt proposal, put forward in September 2024 by Jeremy Howard of Answer.AI, is a Markdown file at the root of your domain (https://yourdomain.com/llms.txt) that links to your most important content with one-line descriptions. Organisations already using it include Anthropic, Stripe, Cursor, Cloudflare, Vercel, Mintlify, and Supabase.
Think of it less like robots.txt (which controls access) and more like a curated hand-off to an AI agent: here's what we are, here are our most important pages, here's what they answer. The quality of the file reflects the quality of thinking behind it. A list of every page on your site isn't useful. A curated selection of pages that best represent what you do, with clear descriptions, is. For every page you consider including, ask: if an AI model read only this page and nothing else, would it come away with an accurate and useful understanding of your brand or product?
A note on expectations: major LLM crawlers (OpenAI, Google, and Anthropic) don't request llms.txt in any meaningful volume as of May 2026. It's not yet an SEO play. Where it does make a measurable difference is developer tooling: if your audience includes developers using Cursor or Claude Code, llms.txt improves how those agents reason about your documentation. That's a real and growing audience, and implementation costs a few hours at most.
A basic llms.txt file follows this structure:
```markdown
# Your Site Name

> One sentence describing what you do and who you serve.

## About

2-3 sentences covering your content, audience, and what makes you useful.

## Key Pages

- [Homepage](https://yourdomain.com/) - Overview of products and services
- [Pricing](https://yourdomain.com/pricing) - Plan options and costs
- [Blog](https://yourdomain.com/blog) - Guides and industry commentary

## Optional

- [Contact](https://yourdomain.com/contact)
```
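If your key pages already live in a structured list (a CMS export, a site config), a file like this can be generated as part of your build rather than hand-maintained. A minimal sketch in Python; the function name and the example site data are hypothetical, and the curation itself (deciding which pages belong) still has to be done by a human:

```python
def build_llms_txt(site_name, tagline, about, key_pages):
    """Assemble a minimal llms.txt body from curated page data.

    key_pages: list of (title, url, one-line description) tuples --
    a hand-picked selection, not every page on the site.
    """
    lines = [f"# {site_name}", "", f"> {tagline}", "", "## About", "", about,
             "", "## Key Pages", ""]
    for title, url, desc in key_pages:
        lines.append(f"- [{title}]({url}) - {desc}")
    return "\n".join(lines) + "\n"

print(build_llms_txt(
    "Example Site",
    "Monitoring tools for small web teams.",
    "We publish guides on uptime, SEO, and AI crawler readiness.",
    [("Homepage", "https://example.com/", "Overview of products and services"),
     ("Pricing", "https://example.com/pricing", "Plan options and costs")],
))
```

Regenerating the file on deploy keeps it in sync with the pages it describes, which matters more than the generation step itself.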
2. /llms-full.txt: Your Full Content in One File
llms-full.txt is the companion format to llms.txt: a separate file containing your top 5-10 most important pages fully rendered as Markdown. For sites where AI tools frequently access documentation or key content (developer tools, educational platforms, SaaS products), this lets AI systems ingest your full content without making individual page requests, and it sidesteps JavaScript rendering issues for AI systems that can't execute JS.
AI coding assistants like Cursor and GitHub Copilot can reference your documentation directly from this file. If your product has an API or technical documentation, this is where it becomes particularly useful.
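If you already keep Markdown renderings of your key pages (exported from a docs pipeline, say), assembling llms-full.txt is just concatenation. A sketch under that assumption; the directory layout, separator choice, and function name are all hypothetical:

```python
from pathlib import Path

def build_llms_full(md_dir, out_path, separator="\n\n---\n\n"):
    """Concatenate pre-rendered Markdown pages into one llms-full.txt.

    Assumes each key page already exists as a .md file in md_dir.
    Pages are joined with a horizontal-rule separator so an AI system
    can tell where one page ends and the next begins.
    """
    pages = sorted(Path(md_dir).glob("*.md"))
    body = separator.join(p.read_text(encoding="utf-8").strip() for p in pages)
    Path(out_path).write_text(body + "\n", encoding="utf-8")
    return len(pages)  # number of pages included
```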
3. Clean, Semantic HTML on Your Actual Pages
This is the most impactful step, and it doesn't require adding a single .md file. Research has found that HTML retains semantic structure (headings, metadata, table layouts) that plain text or Markdown can strip away, and that this improves retrieval performance in RAG pipelines. Clean, semantic HTML is often more legible to LLMs than people assume.
The case for .md routes isn't that Markdown is inherently better than HTML. It's that most real-world HTML is so bloated with navigation, scripts, and boilerplate that the signal-to-noise ratio is poor. If your HTML is already clean and semantic, the gap narrows considerably. Fixing your existing HTML may do more for LLM visibility than adding Markdown files.
A few practical things that matter to AI crawlers specifically:
- AI crawlers like GPTBot and ClaudeBot do not execute JavaScript. If your important content is hidden behind JS (dropdowns, tabs, lazy-loaded sections), those crawlers won't see it. Key content should be in the server-rendered HTML.
- Use a logical heading hierarchy (H1, H2, H3) with no skipping. AI systems parse structure rather than reading linearly.
- Include FAQ sections on key pages. Structured question-and-answer pairs are well-suited to how LLMs synthesise answers.
- Keep navigation, sidebars, and footer links out of your core content area where possible; they add noise for any crawler.
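The heading-hierarchy point is easy to check automatically. A minimal sketch using Python's standard-library html.parser; it flags any jump that skips a level, such as an h1 followed directly by an h3 (class and function names are illustrative):

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 tag in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.levels.append(int(tag[1]))

def skipped_levels(html):
    """Return (previous, current) pairs wherever the hierarchy
    descends more than one level at a time, e.g. h1 straight to h3."""
    collector = HeadingCollector()
    collector.feed(html)
    return [(prev, cur)
            for prev, cur in zip(collector.levels, collector.levels[1:])
            if cur > prev + 1]

print(skipped_levels("<h1>Title</h1><h3>Details</h3>"))  # -> [(1, 3)]
```

Moving back up the hierarchy (h3 to h2) is fine; only downward skips are flagged.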
What to Avoid
Some SEOs have started creating Markdown copies of every blog article as .md files, then linking all of them from llms.txt. This creates duplicate content without clear benefit, and Google's John Mueller has compared llms.txt to the keywords meta tag: a signal of intent, not a ranking factor.
Also avoid User-Agent sniffing to serve Markdown automatically to AI crawlers. This is cloaking (serving different content based on who's visiting) and Google penalises it. The standards-compliant alternative is Accept: text/markdown content negotiation, where the client explicitly requests the format.
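The difference can be sketched as a small server-side check on the Accept header. A simplified illustration (it ignores q-values, which a production implementation would honour, and the function name is hypothetical): Markdown is only served when the client explicitly asks for it, which is negotiation rather than cloaking.

```python
def negotiate_format(accept_header):
    """Choose a response format from an HTTP Accept header.

    Markdown is served only when the client explicitly requests
    text/markdown -- explicit negotiation, never User-Agent sniffing.
    Simplified sketch: q-values are ignored.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "text/markdown":
            return "markdown"
    return "html"

print(negotiate_format("text/markdown"))                    # -> markdown
print(negotiate_format("text/html,application/xhtml+xml"))  # -> html
```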
Step Zero: Check Your robots.txt
Before any of the above, check your robots.txt. None of these techniques matter if you're blocking AI crawlers from reaching your site. Many default configurations disallow bots like GPTBot and ClaudeBot. A quick audit takes ten minutes.
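That audit can be scripted with Python's standard-library urllib.robotparser, which answers "may this user agent fetch this path?" for a given robots.txt. A minimal sketch; the example robots.txt content and the function name are hypothetical:

```python
from urllib.robotparser import RobotFileParser

def ai_crawler_access(robots_txt, agents=("GPTBot", "ClaudeBot")):
    """Given the raw text of a robots.txt file, report whether each
    named AI crawler is allowed to fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, "/") for agent in agents}

# A config that blocks GPTBot specifically but allows everyone else:
blocked_example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(ai_crawler_access(blocked_example))  # -> {'GPTBot': False, 'ClaudeBot': True}
```

Run this against your live robots.txt and you know in seconds whether the rest of this article's advice can even take effect.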
SiteVitals' SEO and AI monitoring checks your site's AI crawler readiness automatically, including whether your llms.txt is present and whether AI bots are being blocked by your robots.txt. You can also run a free one-off SEO audit to see where you currently stand.
Conclusion: Two Jobs, Two Formats
The HTML-versus-Markdown debate is really about audience. HTML works better when the reader is human and needs to act on what they're reading. Markdown works better when the reader is a machine processing content at scale.
For website owners in 2026, the practical answer is to do both properly. Keep your HTML clean, semantic, and server-rendered so AI crawlers can read it. Add an llms.txt that points AI agents to your best content; not because Google demands it, but because the developer-tool ecosystem already uses it and the cost of doing it is low. And when you're generating AI output for a human reader, stop defaulting to Markdown because that's what the model produces. Ask for HTML and see whether the output is actually more useful.
SiteVitals monitors website health, uptime, SEO, and security, including AI discoverability signals like llms.txt presence and AI crawler access.
Further Reading
External References
- The Unreasonable Effectiveness of HTML: Thariq Shihipar
- The Unreasonable Effectiveness of HTML: Thariq Shihipar's example gallery
- Simon Willison's commentary on HTML vs Markdown for Claude output
- The official llms.txt proposal
By Tom Freeman · Co-Founder & Lead Developer
Full-stack developer specialising in high-performance web applications and automated monitoring.