
llms.txt: The 2026 Working Guide.

llms.txt is a plain-text file at your domain root that tells AI engines what your site is, what pages matter, and how to interpret your content. This is the practical guide: what the file does, which engines read it, how to write one in 30 to 90 minutes, and what separates a useful file from a placeholder.

By Vladan Mijatovic · Updated May 23, 2026 · ~12 min read

The short version

llms.txt is a Markdown file placed at yourdomain.com/llms.txt. It describes what your site is, lists your most important pages with one-line summaries, and names the people behind it. Anthropic's Claude reads it when browsing your domain. Perplexity's crawler indexes it as a site-context signal. Writing one takes 30 to 90 minutes and requires no code. The seven things that matter: correct root path, factual blockquote description, curated page list (15 to 20 URLs max), consistent entity naming across all your schema, page descriptions written as content summaries not marketing copy, updates within 48 hours of major new pages, and no URLs you don't want cited.

What llms.txt is

llms.txt is a plain-text Markdown file placed at the root of a domain, at the predictable path yourdomain.com/llms.txt. A human visiting that URL in a browser sees a readable document describing what the site is, what it sells, who runs it, and which pages are most worth reading. An AI engine visiting the same URL uses that description as fast context before deciding which pages to crawl, cite, or summarize.

The file was proposed by Jeremy Howard, co-founder of Answer.AI and fast.ai, in a September 2024 blog post. Howard drew on the precedent of robots.txt (the 1994 Martijn Koster proposal that gave crawlers a machine-readable permissions file at a predictable root path) and the older humans.txt convention (a colophon for listing the people who built a site). His proposal was deliberately minimal: a Markdown-formatted text file at /llms.txt, with a first-line H1 for the site name, an optional blockquote description, and a set of labeled sections covering pages, contact, and anything else the publisher wants AI engines to know.

As of May 2026, the community index at llmstxt.cloud tracked over 4,200 domains serving an llms.txt file, including Anthropic, Vercel, Cloudflare, HuggingFace, Stripe, and thousands of developer and small-business sites. Anthropic's own documentation explicitly states that Claude reads llms.txt when using its web tool. Perplexity's developer blog confirmed in February 2026 that PerplexityBot indexes it as a site-context signal during domain-level crawls.

The file is not a standard in the formal sense. There is no RFC, no W3C recommendation, and no governance body. Adoption is voluntary, and because no engine is bound to a shared parsing rule, you cannot rely on every engine reading the file the same way. Even so, publishing one has no downside and measurable upside for engines that do read it. The file cannot hurt your rankings on Google or Bing. It is additive.

How AI engines actually use it

The mechanism differs by engine. Understanding the differences helps you write the file more usefully and set realistic expectations about where it has impact.

Claude reads llms.txt when it has web tool access and receives a query that requires understanding a specific domain. When a user asks Claude "what does beverlyhillsgrowth.com do?", Claude fetches /llms.txt first if it exists, reads the description and page list, and uses that to prioritize which URLs to visit next. The file effectively compresses what might otherwise require crawling 10 to 20 pages into a single fast context fetch. Claude's documentation, updated in April 2026, describes this as "a site's self-description for AI agents." The description blockquote and the ## Pages section are the two most-read parts of the file when Claude is deciding where to go next on your domain.

Perplexity's crawler (PerplexityBot) indexes /llms.txt as part of its domain-level crawl pass. The page descriptions under ## Pages help Perplexity's retrieval system classify your content before making a citation decision. A well-written ## Pages section that accurately describes each guide in one sentence gives Perplexity a classification shortcut it would otherwise derive by crawling and scoring each full page individually. Sites with clear llms.txt page descriptions have shown faster inclusion in Perplexity's cited-source pool for niche queries, according to informal community measurements published in the GEO Discord server in March 2026.

ChatGPT (GPT-4o with browsing) does not have documented llms.txt-specific parsing as of May 2026. Since ChatGPT routes web queries through Bing's index, the file is indexed as a page (at the URL beverlyhillsgrowth.com/llms.txt) and may surface in Bing results about your brand, but it is not parsed differently from any other content page. For ChatGPT, schema markup, passage-level clarity, and off-site brand mentions remain the primary citation levers.

Google's crawlers index /llms.txt as a page, and it appears in Google's index. Google AI Overviews does not have documented llms.txt integration as of May 2026. The file will not harm Google performance; it simply is not a documented AI Overviews signal. The clearest near-term value of llms.txt is on Claude- and Perplexity-originated traffic, where the parsing behavior is confirmed and documented.

The format, line by line

llms.txt must be valid Markdown, served at the domain root path /llms.txt (not a subdirectory), encoded in UTF-8, and kept under 200 lines for most sites. Here is an annotated example using Beverly Hills Growth's own file as the starting point:

# Beverly Hills Growth

> Local SEO and AI search agency. We help Beverly Hills, West Hollywood,
and Santa Monica small businesses rank on Google and get cited by ChatGPT,
Perplexity, and Claude. Free 48-hour SEO audit, no sales call required.

## What we do
- Local SEO for restaurants, salons, dental, legal, real estate
- Google Business Profile optimization
- AI search optimization (GEO)
- AI chatbot installation for service businesses

## Service area
Beverly Hills, West Hollywood, Santa Monica, West Los Angeles,
Brentwood, Bel Air, Century City

## Pages
- [Home](https://beverlyhillsgrowth.com/): Services, pricing, free audit form
- [GEO Guide](https://beverlyhillsgrowth.com/resources/ai-search/generative-engine-optimization): 2026 guide to getting cited by AI search engines — 7 ranking factors, schema, measurement.
- [ChatGPT Ranking Playbook](https://beverlyhillsgrowth.com/resources/ai-search/how-to-rank-in-chatgpt): 7 citation factors and a 30-day plan for getting your site cited by ChatGPT.
- [Google AI Overviews Guide](https://beverlyhillsgrowth.com/resources/ai-search/google-ai-overviews-ranking-guide): How Google selects AI Overview citations and the 30-day optimization playbook.
- [Perplexity SEO Playbook](https://beverlyhillsgrowth.com/resources/ai-search/perplexity-seo-playbook): How Perplexity's retrieval pipeline picks citations and how to get into its source set.

## Contact
- Website: https://beverlyhillsgrowth.com
- Email: audit@beverlyhillsgrowth.com

## Founder
Vladan Mijatovic, independent SEO operator and founder of Beverly Hills Growth.

The line-by-line rules that matter: The first line (# H1) should be your official business name, exactly as it appears in your Google Business Profile and your Schema.org Organization markup. Consistent entity naming is how AI engines resolve ambiguity between brands with similar names; if "Beverly Hills Growth" appears in one file and "BHG" in another, the engine treats them as potentially distinct entities and weights each signal less.

The blockquote (>) description must be one paragraph, not a bullet list and not multiple paragraphs. Three sentences is the target. Sentence one states what the business is and who it serves. Sentence two states the primary service or differentiator. Sentence three states something actionable, such as a free offer or a concrete outcome. Cut anything that reads like marketing copy: "premier solutions for forward-thinking businesses" signals nothing to a parser. "Local SEO and AI search agency serving Beverly Hills small businesses, free 48-hour audit" contains three extractable facts.

The ## Pages section is the most-read part of the file. Each line is a Markdown link with a one-sentence description after the colon. The sentence should answer "what would an AI learn from reading this page?" not "what do we want the AI to think about us?" Describe the content depth and topic, not the marketing intent. "Comprehensive guide" says nothing; "2026 guide to getting cited by ChatGPT, covering the 7 citation factors and a 30-day implementation plan" gives the engine enough to match it against relevant queries.
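To make the three parts concrete, here is a minimal Python sketch that pulls the H1 name, the blockquote description, and the ## Pages links out of a file shaped like the example above. It is an illustration only, not any engine's actual parser; the function name parse_llms_txt and the keys of the returned dictionary are assumptions made for this sketch.

```python
import re

# Matches "- [Title](https://example.com/path): description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)\s*:?\s*(?P<desc>.*)$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into name, description, sections, and pages."""
    name = None
    description_lines = []
    sections: dict[str, list[str]] = {}
    current = None          # name of the H2 section being read, if any
    in_description = False  # True while reading the blockquote paragraph

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
            in_description = False
        elif line.startswith("# ") and name is None:
            name = line[2:].strip()
        elif line.startswith(">") and current is None:
            description_lines.append(line.lstrip("> ").strip())
            in_description = True
        elif not line:
            in_description = False
        elif in_description:
            # Markdown allows blockquote continuation lines without ">"
            description_lines.append(line)
        elif current is not None:
            sections[current].append(line)

    # Keep only the lines under "## Pages" that are well-formed links
    pages = []
    for item in sections.get("Pages", []):
        match = LINK_RE.match(item)
        if match:
            pages.append(match.groupdict())

    return {
        "name": name,
        "description": " ".join(description_lines),
        "sections": sections,
        "pages": pages,
    }
```

Running your own file through a sketch like this is a quick sanity check: if the name, description, or page list comes back empty, a stricter parser would probably miss it too.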

The seven things that make a good llms.txt

1. Put it at the exact root path.
The file must be at yourdomain.com/llms.txt. Not /docs/llms.txt, not /info/llms.txt, not /public/llms.txt. Claude, Perplexity, and other AI agents look for the file at the root first, the same way robots.txt must be at the root to be respected. A file at any other path is invisible to the parsers that look for it by convention. If your CMS or static host intercepts the path and returns an HTML wrapper instead of the raw text, that is also a failure state, so verify that the file loads as plain text in a browser.
2. Write the description as a single tight paragraph.
The > blockquote section should be one paragraph: no bullet points, no line breaks between sentences, no multi-paragraph intro. Three sentences is the target. What the business does, who it serves, and why an AI engine should cite this site for a related query. Parsers that expect a single Markdown blockquote may behave inconsistently when they find lists or multiple paragraphs in that position. Test it: read the blockquote on its own, with the rest of the file covered. If it works as a stand-alone one-paragraph summary of your site, it's right.
3. List 15 to 20 pages, not all of them.
A site with 200 blog posts should not list all 200 in llms.txt. AI engines inject the file into a context window; a 400-line file consumes context budget that could go to the actual page content being discussed. Select the pages most relevant to the queries your business should be cited for. A dental practice should list its services page, locations, patient FAQ, and its two or three most substantive blog posts. Leave out paginated archives, tag pages, legal boilerplate, and low-value thin pages.
4. Write page descriptions that describe content, not marketing goals.
"Learn how we help businesses succeed" tells an AI engine nothing usable. "2026 guide to getting cited by ChatGPT, covering the 7 citation factors, a 30-day implementation plan, and measurement approach" is a description the engine can match against queries. Each page description should answer three things: what topic does this page address, what does a reader concretely learn from it, and what is its depth or scope. Factual descriptors, not adjectives. Specifics, not claims.
5. Use the exact same entity names as your Schema.org markup.
If your Schema.org Organization name is "Beverly Hills Growth" and your llms.txt H1 is "Beverly Hills Growth," an AI engine can confidently unify the two signals. If they differ ("BH Growth" in one, "Beverly Hills Growth LLC" in another, "BHG" in a third), the engine treats them as potentially different entities and weights each signal less. Consistency across robots.txt user-agent comments, Schema.org, Google Business Profile, and llms.txt is the entity disambiguation that AI engines rely on to know who you are before they cite you.
6. Update the file within 48 hours of shipping a major page.
llms.txt is a live index of your most important content. A file that does not include your best guide, or that still lists a page you deleted six months ago, actively misleads AI engines that read it and may cause them to fetch dead links or miss your most citable content. Add an llms.txt update step to your content checklist: draft page, write schema, update sitemap.xml, update llms.txt, in that order. Adding a new page to the file takes about 30 seconds; the cost of doing it at the moment of publish is near zero.
7. Do not list pages you don't want cited.
Unlike robots.txt, llms.txt has no Disallow syntax. Every URL you include in the ## Pages section is an explicit signal that you want AI engines to read it and potentially cite it to their users. Do not include draft pages, internal tool pages, login flows, or legal boilerplate that isn't useful to a prospective customer. If a page shouldn't be the first thing a stranger reads about your business, it shouldn't be in llms.txt. The file is a curated editorial selection, not an export of your CMS.
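
Several of the checks above are mechanical enough to automate. The following is a rough Python sketch of a linter covering the entity-name match, the single-paragraph description, the page count, and the overall length. The function name lint_llms_txt, the thresholds as constants, and the wording of the messages are all assumptions for illustration; the limits themselves come from this guide, not from any formal specification.

```python
import re

MAX_LINES = 200  # length ceiling suggested in this guide
MAX_PAGES = 20   # top of the guide's 15-to-20 page range
PAGE_LINK_RE = re.compile(r"^-\s*\[[^\]]+\]\(([^)\s]+)\)")


def lint_llms_txt(text: str, entity_name: str) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    lines = [line.strip() for line in text.splitlines()]

    # The H1 must equal the canonical entity name used in Schema.org markup.
    first = next((line for line in lines if line), "")
    if not first.startswith("# "):
        problems.append("first non-empty line is not an H1")
    elif first[2:].strip() != entity_name:
        problems.append(f"H1 '{first[2:].strip()}' differs from '{entity_name}'")

    # Rough check that the blockquote is one plain paragraph with no bullets.
    quote = [line[1:].strip() for line in lines if line.startswith(">")]
    if not quote:
        problems.append("no blockquote description")
    elif any(q == "" or q.startswith(("- ", "* ")) for q in quote):
        problems.append("description is not a single plain paragraph")

    # Collect the links listed under "## Pages".
    urls, in_pages = [], False
    for line in lines:
        if line.startswith("## "):
            in_pages = line[3:].strip().lower() == "pages"
        elif in_pages:
            match = PAGE_LINK_RE.match(line)
            if match:
                urls.append(match.group(1))
    if len(urls) > MAX_PAGES:
        problems.append(f"{len(urls)} pages listed; the guide suggests at most {MAX_PAGES}")
    if len(set(urls)) != len(urls):
        problems.append("duplicate URLs in the Pages section")

    if len(lines) >= MAX_LINES:
        problems.append(f"{len(lines)} lines; the guide suggests under {MAX_LINES}")
    return problems
```

A script like this cannot judge whether a description reads like marketing copy or whether a listed page deserves to be cited; those remain editorial calls.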

llms.txt vs. robots.txt vs. sitemap.xml

Three files serve overlapping but distinct purposes. Getting them confused leads to wasted effort and coverage gaps.

robots.txt controls which crawlers can access which paths. It is the right tool for blocking AI crawlers you don't want indexing your content: add User-agent: GPTBot with Disallow: / to block OpenAI's crawler, for example. robots.txt says nothing about what your site is; only who may access it.
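
As a quick illustration of that rule, the sketch below uses Python's standard-library robots.txt parser to confirm that a GPTBot block leaves other crawlers unaffected. The robots.txt content, the example.com URL, and the helper name can_fetch are placeholders for this example; substitute your own file and paths.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block OpenAI's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
        allowed = can_fetch(ROBOTS_TXT, agent, "https://example.com/llms.txt")
        print(agent, "allowed" if allowed else "blocked")
```

Note that this only tells you what the rules say. Whether a given crawler honors them is up to its operator.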

sitemap.xml tells crawlers which URLs exist on your site and when they were last modified. It is the right tool for ensuring all your pages get indexed across search engines, including Bing, which feeds ChatGPT. A sitemap does not describe what those pages contain; it is a directory, not a description.

llms.txt describes what your site is and what your most important pages cover. It is not a crawl-permission file and not a URL directory. It is closer to a library catalog entry: concise metadata a reader can use to decide whether to go deeper without crawling the full site. The three files are complementary. A production site needs all three. If you have none of them, start with robots.txt (crawl permissions), then sitemap.xml (URL discovery), then llms.txt (semantic context for AI engines).

Common mistakes

These are the errors that appear repeatedly in the wild when the community indexes and audits llms.txt files.

Putting marketing copy in the description. "We are a premier digital agency delivering exceptional results for forward-thinking businesses" contains zero facts an AI engine can extract. The blockquote section is read by machines first. Write it the way you'd write a database field value, not a tagline. Every sentence should contain at least one specific, extractable fact: a category, a geography, a service, a number, a named entity.
Publishing the file once and never updating it. An llms.txt that is 12 months out of date lists pages that have moved, misses your best new content, and describes an older version of the business. Treat it as a living document with a quarterly review minimum. The cost of maintaining it is low; the cost of an AI engine citing your business based on stale information is reputational.
Mismatching entity names. If your Schema.org markup says "Beverly Hills Growth" and your llms.txt H1 says "BHG" and your GBP listing says "Beverly Hills Growth LLC," each AI engine that reads two of those sources encounters an inconsistency and downweights both. Pick one canonical entity name and use it everywhere without variation.
Including every page on the site. A 400-line llms.txt covering the full blog archive, paginated archive pages, and all tag pages is noise. The ## Pages section is a curated index, not a sitemap dump. If every page is equally important, none of them are.
Not testing whether the file loads correctly. Verify the file loads at /llms.txt in a browser as plain text. Some CMS configurations and reverse proxies intercept the path and return an HTML wrapper with a 200 status, which looks like success to a developer but is not parseable as a Markdown llms.txt file by a crawler expecting text/plain.
Confusing llms.txt with robots.txt for opt-out purposes. Removing a URL from llms.txt does not block AI crawlers from visiting it. Only robots.txt controls crawl access. llms.txt is a positive-signal file; it has no blocking capability. A business that wants to prevent a specific AI engine from indexing its content needs to add a Disallow rule in robots.txt for that engine's user agent string, not remove it from llms.txt.
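
The stale-file mistake is the easiest of these to catch automatically. One possible approach, sketched below in Python, is to compare the URLs in llms.txt against the URLs in sitemap.xml and flag anything listed in llms.txt that the sitemap no longer contains. This assumes your sitemap is complete and up to date; the function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def llms_urls(llms_txt: str) -> set[str]:
    """Collect every Markdown link target from an llms.txt document."""
    return set(re.findall(r"\]\((https?://[^)\s]+)\)", llms_txt))


def stale_entries(llms_txt: str, sitemap_xml: str) -> set[str]:
    """URLs listed in llms.txt that do not appear in the sitemap."""
    return llms_urls(llms_txt) - sitemap_urls(sitemap_xml)
```

The comparison is an exact string match, so a trailing-slash difference between the two files will show up as a stale entry. That is usually worth fixing anyway, since it is the same kind of inconsistency the entity-name rule warns about.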

Verifying it is being read

There is no Search Console equivalent for llms.txt as of May 2026. Verification is manual and takes about 15 minutes total.

Check that the file loads as plain text. Visit yourdomain.com/llms.txt in a browser. You should see the raw Markdown source, not an HTML page wrapping the content, and the response's Content-Type header should be text/plain (visible in the Network tab of the browser's developer tools). If you get a 404, the file does not exist at the right path. If you get a 200 with HTML, a route in your CMS or proxy is intercepting the path and you need to add an explicit handler for /llms.txt that returns text/plain.
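
If you would rather script this check than eyeball it, the sketch below shows one way to do it with Python's standard library. The function names and the result strings are assumptions for this example; the three outcomes it distinguishes (missing, HTML wrapper, OK) are the ones described above.

```python
import urllib.error
import urllib.request


def classify_llms_response(status: int, content_type: str, body: str) -> str:
    """Map a response for /llms.txt onto the outcomes described in this guide."""
    media_type = content_type.split(";")[0].strip().lower()
    if status == 404:
        return "missing"
    if status != 200:
        return f"unexpected status {status}"
    looks_like_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
    if media_type == "text/html" or looks_like_html:
        return "html wrapper"
    if media_type != "text/plain":
        return f"unexpected content type {media_type}"
    if not body.lstrip().startswith("# "):
        return "no H1 on first line"
    return "ok"


def check_llms_txt(domain: str) -> str:
    """Fetch https://<domain>/llms.txt and classify the response."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            content_type = resp.headers.get("Content-Type", "")
            return classify_llms_response(resp.status, content_type, body)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; classify by status code alone
        return classify_llms_response(err.code, "", "")
```

Calling check_llms_txt("yourdomain.com") after each deploy catches the case where a routing change silently starts returning the HTML wrapper.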

Ask Claude directly. Open a Claude conversation with web tool access enabled. Ask: "What does [yourdomain.com] do? Summarize based on what you find on the site." If Claude's summary matches your llms.txt description and cites specific pages you listed, the file is being read. If Claude gives a generic or inaccurate summary, audit your description for vagueness and check that the entity names in the file match your site's other signals.

Check Bing Webmaster Tools crawl stats. The URL yourdomain.com/llms.txt should appear in the list of crawled URLs once indexed. If it is missing, submit it directly via URL Inspection in Bing Webmaster Tools. This confirms the file is indexed by Bing (and therefore indirectly discoverable by ChatGPT), though it does not confirm AI-specific parsing by engines that have their own parsing layer.

Check server access logs for AI crawler user agents. ClaudeBot (Anthropic's crawler) and PerplexityBot will appear in your access logs when they fetch the file. Seeing either user agent requesting /llms.txt in the logs is the clearest signal the file is being discovered and read, not just indexed as a page. Most hosting control panels expose access logs; on Vercel, use the Functions Log or a connected log drain to check.
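
For hosts that give you raw access logs, a few lines of Python can do the counting. The sketch below assumes the common Apache/Nginx "combined" log format, where the request line and the user agent both appear in double quotes; the function name llms_txt_hits is hypothetical, and your log format may differ.

```python
import re

AI_AGENTS = ("ClaudeBot", "PerplexityBot")

# Pulls the path out of a quoted request line such as "GET /llms.txt HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*"')


def llms_txt_hits(log_lines) -> dict[str, int]:
    """Count requests for /llms.txt per AI crawler user agent."""
    counts = {agent: 0 for agent in AI_AGENTS}
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # Ignore any query string when comparing the path
        if match.group("path").split("?")[0] != "/llms.txt":
            continue
        lowered = line.lower()
        for agent in AI_AGENTS:
            if agent.lower() in lowered:
                counts[agent] += 1
    return counts
```

To use it, pass an open log file directly, for example llms_txt_hits(open("access.log")). User agent strings can be spoofed, so treat the counts as a signal, not proof.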

llms-full.txt: the companion file

Jeremy Howard's original September 2024 proposal included an optional companion file called llms-full.txt. While llms.txt is a curated index with one-line page descriptions, llms-full.txt contains the full text of your most important pages in a single file, formatted for direct injection into an AI engine's context window. The use case: an AI agent that needs to answer detailed questions about a company's services can load llms-full.txt once and answer from that context, rather than crawling 10 to 15 pages individually. For documentation sites, software libraries, and knowledge bases with complex content structures, this is valuable.

Claude supports llms-full.txt injection as of the Claude 3.5 era. Perplexity support is partial as of May 2026. If you choose to publish one, keep it under 100,000 tokens (approximately 75,000 words), organize it with page URL headers so engines can attribute content to the correct source, and update it on the same cadence as your main llms.txt. For most small service-business sites, the main llms.txt is sufficient. llms-full.txt becomes useful when you have deep technical content, extensive FAQs, or detailed service descriptions that an AI engine would need to read in full to answer customer questions accurately.
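
As a rough sketch of how you might assemble such a file, the Python below joins page texts under URL headers and estimates the token count from the word count using the ratio quoted above. The header layout (a title line followed by a "Source:" line) and the function names are assumptions for this example, and the word-based estimate is only an approximation of any real tokenizer.

```python
WORDS_PER_TOKEN = 0.75   # this guide's ratio: 100,000 tokens is about 75,000 words
TOKEN_BUDGET = 100_000   # ceiling suggested in this guide


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the whitespace-separated word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def build_llms_full(pages: list[tuple[str, str, str]]) -> str:
    """Join (title, url, body) tuples into one document with URL headers."""
    parts = []
    for title, url, body in pages:
        parts.append(f"# {title}\nSource: {url}\n\n{body.strip()}\n")
    return "\n".join(parts)


def within_budget(document: str, budget: int = TOKEN_BUDGET) -> bool:
    """True if the estimated token count stays under the budget."""
    return estimate_tokens(document) < budget
```

If within_budget returns False, trim the page list before trimming individual pages: a complete copy of fewer pages is easier for an engine to attribute than truncated copies of many.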

Frequently asked questions

Is llms.txt an official standard?

No. llms.txt is a community convention proposed by Jeremy Howard of Answer.AI in September 2024. There is no RFC, no W3C recommendation, and no formal governance body. Adoption is voluntary. Despite this, thousands of sites published an llms.txt by mid-2026, and Anthropic's Claude documentation explicitly references reading the file during web tool calls. Its informal status is comparable to robots.txt in its first years — the original robots exclusion protocol was also a community convention before it was formalized.

Do I need llms.txt if I already have a sitemap?

They serve different purposes. sitemap.xml tells crawlers which URLs exist on your site. llms.txt tells AI engines what those URLs contain and what your site is about. A sitemap is a crawl-discovery signal; llms.txt is a semantic-context signal. You need both. A sitemap with no llms.txt is like a table of contents with no book jacket description: a crawler knows the pages exist but has no fast path to understanding what they are for.

Will llms.txt affect my Google rankings?

Not directly. Google's ranking systems do not use llms.txt as a documented signal. There may be an indirect effect, however: if your llms.txt leads Claude or Perplexity to cite your site accurately in AI-generated answers, and those citations drive branded search queries and traffic, the resulting behavioral signals may feed into Google's quality systems over time. The file does not harm rankings and cannot conflict with any documented Google signal.

How long should my llms.txt be?

Under 200 lines for most sites. AI engines inject the file into a context window when they need to understand your site; a very long file competes with actual page content for that context budget. If your site has thousands of pages, list only the 15 to 20 most important ones under the Pages section. You can publish a separate llms-full.txt with complete page content for engines that support it, keeping the main llms.txt short and fast to parse.

Can I use llms.txt to block AI engines from using my content?

No. llms.txt has no opt-out or disallow syntax. It is an opt-in description file. To block AI crawlers, use robots.txt: add Disallow rules for the specific user agents you want to exclude (GPTBot, ClaudeBot, PerplexityBot, and others). A separate initiative called ai.txt was proposed by Spawning.ai for opt-out purposes, but as of May 2026 it has not achieved wide support among AI engine operators.

How often should I update llms.txt?

Update it every time you publish a major new page or section, ideally within 48 hours of shipping. Add a step to your content publishing checklist: after shipping the page and updating sitemap.xml, add one line to the Pages section of llms.txt. At minimum, review the file quarterly. A stale llms.txt that lists deleted pages or omits your flagship content actively misleads AI engines that read it.

Which AI engines actually read llms.txt?

As of May 2026, Anthropic's Claude reads llms.txt during web tool calls and uses the page descriptions to decide which URLs to fetch first. Perplexity's crawler indexes it as a site-context signal. ChatGPT does not have documented llms.txt-specific parsing — it routes through Bing's index, so the file is indexed as a page but not parsed differently from other pages. Google's crawlers index it as a page but Google AI Overviews does not have a documented llms.txt integration. The file is most valuable for Claude- and Perplexity-origin traffic today.

What is llms-full.txt?

llms-full.txt is an optional companion file, part of Jeremy Howard's original September 2024 spec, that contains the full text of your most important pages rather than one-line summaries. The intent is to give AI engines a single file they can read to understand your site without crawling individual URLs. Claude supports injecting llms-full.txt content into its context window. For most small business sites, the main llms.txt is sufficient; llms-full.txt is more useful for documentation sites and knowledge bases where full-text retrieval matters.
