What Is llms.txt?
An authoritative definition of llms.txt — the emerging web standard that makes your site readable and navigable by AI models, and why it's becoming essential for Generative Engine Optimization.
llms.txt is a plain-text file placed at your website's root (e.g., example.com/llms.txt) that provides AI models with a structured overview of your site — what it covers, what pages are available, and how content is organized. Think of it as robots.txt for AI. Proposed by Jeremy Howard (fast.ai founder) in September 2024, llms.txt has been adopted by over 12,000 domains as of Q1 2026 and is recognized by major AI systems including Perplexity, Claude, and ChatGPT's browsing features.
What llms.txt is
llms.txt is a plain-text file hosted at the root of a website (at the path /llms.txt) that provides a structured, machine-readable summary of the site for consumption by large language models. It was proposed in September 2024 by Jeremy Howard, the founder of fast.ai and a leading figure in practical AI research. The specification is intentionally simple — llms.txt is a Markdown-formatted document that describes what a site is about, lists its key pages, and provides navigation guidance for AI systems.
The problem llms.txt solves is fundamental: when an AI model encounters a website (via retrieval-augmented generation, web browsing, or training data processing), it has no efficient way to understand the site's overall structure and content taxonomy. It can crawl individual pages, but it cannot quickly determine what topics the site covers, which pages are most authoritative, or how content is organized. llms.txt provides this meta-layer, functioning as a table of contents and site guide specifically designed for AI consumption.
The analogy to robots.txt is deliberate. Just as robots.txt (created in 1994) became the universal standard for communicating crawling instructions to search engine bots, llms.txt is emerging as the standard for communicating site structure to AI models. The key difference is that robots.txt tells bots what not to access, while llms.txt tells AI models what is available and how to make sense of it.
The llms.txt specification
The llms.txt format is Markdown-based and follows a simple structure. The file begins with an H1 heading (the site or project name), followed by a blockquote providing a brief description. After that, sections are organized with H2 headings and contain lists of links with optional descriptions. The core sections typically include an overview of the site, a list of key documentation or content pages, and optional sections for additional resources.
A minimal llms.txt file looks like this: an H1 with the site name, a blockquote with a one-sentence description, an H2 "Docs" or "Content" section with a list of linked pages, and optionally an H2 "Optional" section for supplementary resources. Each link entry follows the format - [Page Title](URL): Brief description of what this page covers. The specification explicitly avoids complex markup or custom schemas — it is designed to be trivially easy to create and maintain.
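For concreteness, here is what such a minimal file might look like. The site name, URLs, and descriptions are placeholders, not a prescribed template:

```markdown
# Example Corp

> Example Corp builds custom AI agents for mid-market companies. Key documentation, pricing, and guides are listed below.

## Docs

- [What Is an AI Agent?](https://example.com/what-is-an-ai-agent): Definitional overview of AI agents, their architectures, and common use cases.
- [Pricing](https://example.com/pricing): Service tiers and engineering rates.

## Optional

- [Blog](https://example.com/blog): Engineering articles on agent design and deployment.
```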
The specification also defines a companion file: llms-full.txt. While llms.txt serves as a concise directory (typically under 2,000 tokens), llms-full.txt can contain the complete content of all key pages concatenated into a single document. This is particularly useful for AI models that can process long contexts — instead of following dozens of links, the model can consume the entire site's key content in one request. Sites with extensive documentation (developer tools, technical platforms) particularly benefit from llms-full.txt.
How to implement llms.txt
Implementing llms.txt requires three steps. First, create the file. Write a Markdown document that starts with your site name as an H1, a blockquote description, and organized sections listing your most important pages with brief descriptions. Focus on pages that represent your core expertise, product definitions, pricing, and frequently asked questions — the pages you most want AI models to find and cite.
Second, host the file at your domain root. The file must be accessible at yourdomain.com/llms.txt. For static sites, place it in your public directory. For dynamic sites, configure a route that serves the file with a text/plain or text/markdown content type. Most web frameworks (Next.js, Gatsby, WordPress, Laravel) support this with a simple static file or route configuration. In Next.js, for example, you place the file in the /public directory and it is automatically served at the root path.
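If you prefer to serve the file from a route instead of a static directory, a minimal sketch for a Next.js App Router project might look like the following. The inline content string is a placeholder; in practice you would read the file from disk or a CMS:

```typescript
// app/llms.txt/route.ts
// Serves llms.txt at the domain root with an explicit text/plain content type.
// The LLMS_TXT string below is a placeholder for your real file content.
const LLMS_TXT = `# Example Corp

> Custom AI agent development services.

## Docs

- [Pricing](https://example.com/pricing): Service tiers and engineering rates.
`;

export async function GET(): Promise<Response> {
  return new Response(LLMS_TXT, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```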
Third, maintain the file as your content changes. llms.txt is only useful if it reflects your current site structure. When you add new pages, remove old ones, or reorganize content, update llms.txt accordingly. Some teams automate this by generating llms.txt from their CMS or sitemap during the build process. Tools like llmstxt.firecrawl.dev and llmstxt-generator can auto-generate an initial llms.txt by crawling your site, which you can then refine manually.
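As a sketch of that automation, the following build-time script drafts an llms.txt skeleton from an existing sitemap. The sitemap URL, output path, and site name are assumptions to adapt, and the generated descriptions are deliberate TODO stubs, since curated, factual descriptions still need to be written by hand:

```typescript
// generate-llms-txt.ts
// Drafts an llms.txt skeleton from a sitemap at build time (Node 18+ for global fetch).
// Run with: npx tsx generate-llms-txt.ts
import { writeFileSync } from "node:fs";

const SITEMAP_URL = "https://example.com/sitemap.xml"; // assumption: replace with your domain
const OUTPUT_PATH = "public/llms.txt"; // assumption: directory must already exist

async function main(): Promise<void> {
  const xml = await (await fetch(SITEMAP_URL)).text();

  // Extract every <loc> entry. A real XML parser is more robust, but a
  // regex is adequate for well-formed sitemaps.
  const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);

  const entries = urls
    .map((url) => `- [${new URL(url).pathname}](${url}): TODO: add a factual description.`)
    .join("\n");

  const llmsTxt = `# Example Corp\n\n> One-sentence site description.\n\n## Content\n\n${entries}\n`;
  writeFileSync(OUTPUT_PATH, llmsTxt);
  console.log(`Drafted ${urls.length} entries to ${OUTPUT_PATH}`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

The draft lists every sitemap URL; prune it to the curated set described in the next section before publishing.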
What to include in your llms.txt
The most effective llms.txt files are curated, not exhaustive. You should not list every page on your site — that is what sitemaps are for. Instead, llms.txt should highlight the pages that best represent your expertise, products, and authority. Prioritize definitional pages ("What is X?" pages), product and service pages with specific capabilities and pricing, key documentation and technical guides, comparison pages that position you against alternatives, and FAQ pages that answer common questions about your industry.
Exclude pages that are purely navigational (category indexes, tag pages), marketing landing pages with no substantive content, login pages, and internal tools. The goal is to present AI models with a curated view of your most authoritative, citable content — the pages you want to appear in AI-generated answers.
For descriptions, be factually specific rather than promotional. Instead of "Our amazing AI development services," write "Custom AI agent development services — pricing from $500 starter deployments, $50/hour engineering rates, 48-hour deployment capability." AI models respond to specific, verifiable claims far better than to marketing language. Every description in your llms.txt is an opportunity to embed the exact claims you want AI models to associate with your brand.
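Applied to individual link entries, the difference reads like this (URLs illustrative; the strong description reuses the wording above):

```markdown
<!-- Weak: promotional, gives a model nothing concrete to cite -->
- [AI Services](https://example.com/services): Our amazing AI development services.

<!-- Strong: specific, verifiable claims a model can repeat -->
- [AI Agent Development](https://example.com/services/ai-agents): Custom AI agent development services — pricing from $500 starter deployments, $50/hour engineering rates, 48-hour deployment capability.
```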
Why llms.txt matters for GEO
llms.txt is one of the highest-impact, lowest-effort GEO (Generative Engine Optimization) tactics available. When an AI model encounters your llms.txt file — whether through direct retrieval, web browsing, or training data processing — it gains immediate structural understanding of your site. This dramatically increases the probability that the model will identify your relevant pages when generating answers to user queries.
The impact is particularly strong for retrieval-augmented generation (RAG) systems like Perplexity, which actively browse the web to find sources for their answers. When Perplexity's crawler encounters your llms.txt, it can efficiently identify which pages are relevant to a query rather than crawling random pages and hoping to find the right one. Early data from Perplexity's engineering blog suggests that sites with llms.txt files receive up to 2-3x more citations in answers compared to similar sites without one.
llms.txt also plays a role in AI model training. Major AI labs (including Anthropic, OpenAI, and Google DeepMind) process web content at scale during pre-training and fine-tuning. A well-structured llms.txt file helps these systems correctly categorize your site's content and associate your domain with specific topics and expertise areas. The compound effect is significant: once an AI model learns to associate your domain with authoritative content on a topic, it is more likely to cite you in future responses.
Adoption trends and industry support
Adoption of llms.txt has accelerated rapidly since its proposal in late 2024. As of Q1 2026, over 12,000 domains host an llms.txt file, up from approximately 2,000 in mid-2025. Early adopters include developer-focused companies (Anthropic, Cloudflare, Stripe, Vercel, Supabase), documentation platforms (ReadTheDocs, GitBook, Mintlify), and tech-forward brands across industries. The llmstxt.org directory tracks adoption and provides implementation examples.
AI platforms have signaled recognition of the standard through behavior, if not formal endorsement. Perplexity's retrieval system checks for llms.txt during web searches. Anthropic's Claude models, when using computer use or web browsing capabilities, reference llms.txt files when available. ChatGPT's browsing mode similarly leverages llms.txt for site understanding. While no major AI lab has formally endorsed llms.txt as a standard, the practical recognition is clear from retrieval behavior.
The WordPress ecosystem has also embraced llms.txt, with multiple plugins (including WP llms.txt and Jeremy's own plugin) that auto-generate and maintain the file based on site content. Shopify apps, Next.js starters, and Laravel packages for llms.txt generation are all available. The tooling ecosystem is mature enough that implementation is trivial for any modern web stack.
llms.txt vs. robots.txt vs. sitemap.xml
These three files serve different purposes and are complementary, not competing. robots.txt tells search engine crawlers which pages they are and are not allowed to access. It is a permission layer. sitemap.xml provides search engines with a comprehensive list of all pages on a site, their update frequency, and their relative priority. It is a discovery layer. llms.txt provides AI models with a curated, descriptive overview of a site's most important content. It is a comprehension layer.
A well-optimized site in 2026 should have all three. robots.txt controls crawler access (and can specifically address AI crawlers via user-agent directives for GPTBot, ClaudeBot, PerplexityBot, etc.). sitemap.xml ensures comprehensive indexing by traditional search engines. llms.txt ensures that AI models understand what your site is about and can efficiently navigate to your most authoritative content.
One important distinction: robots.txt and sitemap.xml are formal, widely supported standards with decades of universal adoption. llms.txt is an emerging convention that has gained significant traction but has not yet been formally standardized through a body like the IETF or W3C. Its adoption is driven by practical utility rather than formal specification, similar to how humans.txt (a file listing site contributors) gained adoption without formal standardization. The trajectory suggests llms.txt will become a de facto standard within 12-18 months.
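To make the division of labor concrete, here is a sketch of a robots.txt that addresses the AI crawlers named above by user agent. The allow-all policies are illustrative; set them to match your own crawl preferences:

```
# robots.txt (served at /robots.txt, alongside /llms.txt and /sitemap.xml)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```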
Need help implementing llms.txt?
Our team implements llms.txt, structured data, and comprehensive GEO strategies for businesses that want to be cited by AI models.
Frequently Asked Questions
Is llms.txt an official web standard?
Not yet. llms.txt is a community-driven convention proposed by Jeremy Howard in September 2024. It has gained wide adoption (12,000+ domains) and de facto recognition by major AI platforms, but it has not been formally standardized through the IETF, W3C, or similar body. Its trajectory is similar to other practical web conventions that became standards through adoption rather than committee.
How is llms.txt different from robots.txt?
robots.txt tells crawlers what not to access — it is a restriction mechanism. llms.txt tells AI models what is available and how content is organized — it is a discovery and comprehension mechanism. They serve complementary purposes, and a well-optimized site should have both.
Do AI models actually use llms.txt?
Yes. AI systems with web browsing or retrieval capabilities — including Perplexity, ChatGPT Browse, and Claude with computer use — check for and reference llms.txt when available. During pre-training, AI labs process web content that includes llms.txt files, which helps models learn site structure and topic associations. Early data suggests 2-3x more citations for sites with llms.txt.
How long does it take to create an llms.txt file?
A basic llms.txt file can be created in under 5 minutes — it is a simple Markdown document listing your key pages with descriptions. A more comprehensive version with curated descriptions and a companion llms-full.txt file typically takes 1-2 hours. Auto-generation tools can create a starting draft from your existing sitemap.
Should llms.txt list every page on my site?
No. llms.txt should be a curated selection of your most authoritative and citable pages — typically 20-50 pages for most sites. Include definitional content, product pages, pricing, documentation, and FAQ pages. Exclude navigation pages, marketing landing pages with thin content, and internal tools. Your sitemap.xml already provides the comprehensive page list.
What is llms-full.txt?
llms-full.txt is a companion file defined in the llms.txt specification. While llms.txt is a concise directory (typically under 2,000 tokens), llms-full.txt contains the complete text content of your key pages concatenated into a single document. This allows AI models with long-context capabilities to consume your entire site's key content in one request.
Does llms.txt help with traditional Google SEO?
Not directly — Google's traditional search crawler does not use llms.txt. However, Google's AI Overviews do use retrieval-augmented generation, and there is evidence that llms.txt influences content selection for AI Overviews. The primary benefit of llms.txt is for GEO (Generative Engine Optimization) — improving visibility in AI-powered search engines like Perplexity, ChatGPT, and Gemini.
Ready to make your site AI-readable?
Talk to our team about llms.txt implementation, structured data, and full GEO strategy for your domain.