LLMs are now a meaningful audience
A growing share of the questions people ask about businesses, products, and services are now answered by AI. ChatGPT, Perplexity, Google's AI Overviews, Microsoft Copilot, and dozens of other tools are pulling information from websites, synthesising it, and presenting it to users. When someone asks an AI assistant to recommend a web design agency in Perth, or to explain the difference between two types of service, the answer is built from web content.
That means your website is being read, interpreted, and represented by software you do not control. If your site is easy for these models to understand, your business is more likely to be mentioned, cited, and recommended. If your content is buried in complex layouts, vague marketing language, or poorly structured HTML, the model may misunderstand your offer or skip you entirely.
This is already happening at scale. AI-driven traffic is projected to grow from roughly 0.25% of search in 2024 to around 10% by the end of 2025. The businesses that show up in AI-generated answers are the ones with clear, well-structured, factual content that language models can parse confidently. This is a new channel of discovery and it is growing fast.
How LLMs read your website
Language models do not experience your website the way a human does. They do not see your hero image, notice your brand colours, or feel the effect of your animations. They read text, parse HTML structure, and extract meaning from the relationships between elements on the page.
That means the things that matter most for LLM readability are the same things that have always mattered for good web development. Clean semantic HTML. Logical heading hierarchies. Descriptive link text. Well-written copy that says what the business does, who it serves, and what makes it different.
Where LLMs differ from traditional search crawlers is in how deeply they process language. A search engine matches keywords and evaluates authority signals. A language model reads your content and builds a representation of what your business is, what it offers, and how it compares to alternatives. Vague, jargon-heavy, or overly clever copy that a human might tolerate will often confuse a language model. Clear, direct, factual writing performs best.
Page structure also matters more than many teams realise. When your content is organised with clear sections, descriptive headings, and a logical flow from topic to topic, the model can extract accurate information about each part of your offering. When everything is jumbled together or hidden behind JavaScript-rendered elements that require interaction to reveal, the model gets a fragmented or incomplete picture.
The llms.txt standard
One of the most practical steps a business can take is to implement an llms.txt file. This is a plain markdown file hosted at the root of your domain that gives language models a structured, curated summary of your most important content.
The concept was proposed by Jeremy Howard from Answer.AI in September 2024. Adoption was initially slow, but accelerated rapidly through late 2024 and into 2025 as companies like Anthropic, Cloudflare, Vercel, Cursor, and Astro adopted the format. It has since become a widely recognised convention, though it is worth noting that it is not yet an official standard. The IETF has launched an AI Preferences Working Group, but no formal specification has been ratified.
Here is what an llms.txt file looks like in practice. This is a trimmed version of our own.
> Web design and branding agency for small businesses.
Based in Perth, WA, Australia. We build websites that generate leads for local service businesses.
## Services
- [Website Design](https://offpistestudio.com/services/website-design): Custom website design for small businesses with lead forms, click-to-call, local SEO, and mobile-responsive layouts. Built to convert visitors into customers. - [Local SEO](https://offpistestudio.com/services/local-seo): Google Business Profile setup, local citations, suburb pages, and Google Maps optimisation. - [Brand Identity](https://offpistestudio.com/services/brand-identity): Logo, brand guidelines, and visual identity for businesses that want to look professional and trustworthy. - [Landing Pages](https://offpistestudio.com/services/landing-pages): High-converting landing pages for trades and service businesses running Google or Meta ads.
## How We Work
1. Discovery call — understand your business, goals, and audience 2. Strategy & wireframes — plan the site structure, pages, and conversion flow 3. Design — build the visual design, brand-aligned and mobile-first 4. Development — build a fast, SEO-ready website with lead capture built in 5. Launch & support — go live with ongoing support and optimisation
## About
- Based in City Beach, Perth, Western Australia - Serves clients across Australia, NZ, and internationally - 50+ projects delivered for small and medium businesses - [About us](https://offpistestudio.com/about)
## Contact
- Email: [email protected] - [Get in touch](https://offpistestudio.com/contact)
How it differs from robots.txt and sitemap.xml
These three files serve different purposes. Your sitemap.xml is the complete catalogue of every URL on your site. Your robots.txt marks which areas crawlers should and should not access. Your llms.txt is a curated reading list of the content that matters most.
If your website were a library, the sitemap would be the full catalogue, robots.txt would mark the restricted shelves, and llms.txt would be the librarian's recommended reading list. It tells AI tools where to start and what deserves the most attention.
File format and structure
The format is deliberately simple. The only required element is an H1 title. A well-structured file typically includes a brief blockquote summary of the business, followed by H2 sections that group your most important links with short descriptions.
A typical structure might include sections for services, key pages, guides, case studies, and reference material. Each link includes a brief description so the model understands what it will find before fetching the page. This is important because AI agents using llms.txt can reason about which pages to fetch based on those descriptions, rather than relying on guessing or keyword matching.
You can also include an "Optional" section for lower-priority resources that provide additional context but are not essential for understanding the business.
The three related formats
The llms.txt proposal actually encompasses three related conventions that work together.
The first is llms.txt itself, the core summary file hosted at your domain root. This is the starting point for any AI tool trying to understand your business. It should be concise enough that a model can process it in a single pass.
The second is llms-full.txt, an extended version that compiles your most important content into a single markdown document. Where llms.txt provides the outline and links, llms-full.txt provides the full text. This gives language models a richer, more detailed understanding of your business without needing to crawl multiple pages. Data from analytics providers has shown that llms-full.txt files receive more than twice the visitor traffic of the standard llms.txt file, which suggests AI agents actively prefer the detailed version when available.
The third convention is the .md extension for individual pages. By making each page on your site accessible as clean markdown by appending .md to the URL, you give AI tools a way to fetch any specific page in the most efficient format possible. Companies that serve markdown versions of their pages report up to 10x reductions in token usage compared to parsing raw HTML. That efficiency matters because it means the model can process more of your content within its context limits and do so with fewer errors.
Why it matters commercially
The practical value of llms.txt comes down to control and accuracy. Without it, AI tools piece together their understanding of your business from whatever they can find across your site. They may focus on the wrong pages, miss key services, or misrepresent your positioning. With a well-written llms.txt file, you are curating that understanding directly.
Internal benchmarks from LangChain tested four approaches to feeding documentation to AI agents. Context stuffing, vector search retrieval, a basic llms.txt implementation, and an optimised llms.txt with descriptive context. The optimised llms.txt approach significantly outperformed the others because the agents could reason about which documentation to fetch based on the descriptions provided, rather than relying on semantic similarity matching.
That finding has direct implications for any business that wants AI tools to represent them accurately. The more structured and descriptive your llms.txt file is, the better the model's understanding of your business will be.
What llms.txt is not
It is important to understand that llms.txt is not an SEO tool in the traditional sense. It will not improve your Google search rankings. Its value shows up in AI-powered search and discovery tools like ChatGPT, Perplexity, and Gemini, not in traditional search engine results pages.
If your primary concern is Google rankings, llms.txt will not move the needle. If your concern is how AI tools understand and recommend your business, it is one of the most direct levers you have.
Structured content and machine-readable formats
Beyond llms.txt, there are broader content decisions that affect how well your site works for language models.
Markdown exports
Making your key content available in clean markdown format gives language models and AI-powered tools the easiest possible version of your content to work with. Blog posts, service descriptions, and case studies exported as .md files can be consumed directly by RAG (retrieval-augmented generation) systems, which are the pipelines that feed relevant content into AI tools when they need to answer a question.
If your content is locked inside complex HTML with deeply nested divs, inline styles, and JavaScript-dependent rendering, a language model has to work much harder to extract the useful information. A clean markdown version removes that friction entirely.
Schema markup
Structured data using schema.org vocabulary helps both search engines and language models understand the type and context of your content. Article schema, Organisation schema, Service schema, FAQ schema, and Review schema all provide explicit metadata that a model can use to categorise and represent your content accurately.
A page with strong schema markup is essentially telling language models exactly what the content is, who created it, when it was published, and what entities it relates to. That explicitness reduces the chance of misinterpretation.
Clean, accessible HTML
The accessibility improvements described in our article on website accessibility and SEO apply here too. Semantic HTML, proper heading hierarchy, descriptive alt text, and clear form labels all make your content easier for language models to parse. The overlap between accessibility, SEO, and LLM readability is significant. Investing in clean markup pays dividends across all three.
What this means for content strategy
Writing for a dual audience of humans and language models is less of a compromise than it sounds. The content that performs best for LLMs is the same content that performs best for people. Clear, specific, well-organised information that says what you do and backs it up with evidence.
Vague brand messaging that sounds impressive but says nothing concrete is a problem for both audiences. A human scanning your site will struggle to understand your offer. A language model will either misrepresent you or skip you in favour of a competitor whose content is more specific.
The practical shift is toward content that states facts clearly. Instead of "we deliver transformative digital experiences," say "we design and build websites for service businesses, with a focus on lead generation and local SEO." The first version sounds like marketing. The second version gives both the human and the model something concrete to work with.
Case studies with specific outcomes are particularly valuable. A language model can extract "this agency increased enquiries by 40% for a plumbing business in Perth" and use that as a factual reference point. A vague testimonial about "great service" gives the model nothing useful.
FAQ content also performs well because it maps directly to the question-and-answer format that most AI interactions follow. If someone asks a chatbot a question that your FAQ page answers clearly, your content is likely to be surfaced. Structured Q&A content is one of the easiest ways to increase your visibility in AI-powered discovery.
The content that works best for language models is the same content that works best for people. Clear, factual, and specific.
Getting started
Most businesses can start making their website more LLM-friendly with a few focused steps.
- Create an llms.txt file at your domain root. Start with an H1 title and a blockquote summary of your business. Organise your most important pages under H2 sections with brief descriptions for each link. Keep it concise and factual.
- Consider adding an llms-full.txt with the full text of your key pages compiled into a single markdown document. AI agents actively look for this file and use it more heavily than the standard version.
- Review your most important pages for clarity. Read the copy and ask whether a language model could accurately summarise what you do from that text alone. If the answer is no, rewrite it.
- Ensure your HTML is semantic and well-structured. Use proper heading levels, descriptive link text, and meaningful alt text on images.
- Add schema markup to your key pages. At minimum, implement Organisation, Service, and Article schema where relevant.
- Make your blog posts and key content available as markdown files. This makes your content directly consumable by RAG systems and AI tools with significantly lower token costs.
- Write an FAQ section for your most common questions. Structure it with clear questions and direct answers that a language model can extract and reference.
- Keep your content up to date. Language models value recency, and stale content is less likely to be cited or recommended.
The businesses that take these steps now will have an advantage as AI-driven discovery continues to grow. This is still early. Most websites are not optimised for language models at all, which means the opportunity for businesses that move first is significant. If you want a website built with these foundations and an SEO strategy that accounts for both traditional search and AI discovery, that is exactly what we do.
