Making Your Blog LLM-Friendly: Implementing llms.txt and Markdown Serving
In Migrating from Ghost to Next.js: A Journey with Claude and Cursor, I showed how I migrated my blog to Next.js with Static Site Generation (SSG).
Then I had a simple thought: if Large Language Models (LLMs) like ChatGPT, Claude, and Perplexity are already one of the main ways people consume content — the blog should be readable for them in a native format.
There is already a convention for that — llms.txt.
And my source content is already written in Markdown, which is basically the natural language of LLMs.
So instead of forcing them to parse HTML with layout, navigation, and other noise, I can just give them the original Markdown and let them work with clean data.
The only thing I ask in return is simple: when an LLM shows the article to a human — it must link to the canonical human HTML page.
So the idea became straightforward:
- Feed the LLMs with Markdown
- Keep the humans on normal pages
I didn't implement this myself.
Claude did.
Claude's Technical Report
Below is the long and very thorough technical write-up from my friend Claude.
The part where the fun ends and the architecture begins.
If you just wanted the concept — you're done.
If you want to build the same thing — Claude will walk you through every line of it.
Implementation Overview
What we built:
- llms.txt catalog — Dynamic route at /llms.txt listing all posts with markdown URLs
- Markdown content serving — Raw markdown available at /post-slug.md for each post
- Dual metadata system — Separate excerpt (humans) and description (LLMs) fields
- Transparent routing — Middleware proxy rewrites .md URLs without exposing internal APIs
- LLM instructions — Embedded metadata in markdown to guide proper attribution
Key metrics:
- Routes added: 3 (llms.txt catalog, markdown API, middleware proxy)
- Lines of code: ~350 (route handlers, markdown generator, proxy logic)
- Posts catalogued: 28 blog posts (2012-2026)
- Metadata fields: 2 new frontmatter requirements (description, order)
- Bug discovered: Next.js middleware convention deprecation
The Problem We're Solving
After migrating the blog to Next.js, I noticed something important: LLMs are becoming primary content consumers.
When users ask Perplexity "How do I optimize Next.js performance?" or ChatGPT "What's the best approach to CI/CD?", these tools need to index and understand your content.
The native language of LLMs is Markdown — not HTML with CSS classes, JavaScript, and navigation menus.
But most blogs (including ours) only serve HTML pages optimized for human browsers, not for AI indexers.
Enter llms.txt
The llms.txt standard is a simple convention: provide a plain text catalog of your content at /llms.txt, similar to how robots.txt helps search engines.
Format example:
# Your Blog Name
> Brief description
## 2026
- [Post Title](https://yourblog.com/post-slug.md): Short description
- [Another Post](https://yourblog.com/another.md): Another description
## Topics
- [JavaScript](https://yourblog.com/tag/javascript): 15 articles
- [DevOps](https://yourblog.com/tag/devops): 8 articles
Notice the .md links? That's the second part: serving raw markdown versions of your posts.
Technical Architecture
Component 1: The llms.txt Catalog
File: app/llms.txt/route.ts
This Next.js route handler generates the catalog dynamically using Static Site Generation (SSG).
export const dynamic = 'force-static';
export async function GET() {
const posts = getAllPosts();
const tags = getAllTags();
// Group posts by year for chronological organization
const postsByYear: Record<string, PostMetadata[]> = {};
posts.forEach(post => {
const year = new Date(post.date).getFullYear().toString();
if (!postsByYear[year]) postsByYear[year] = [];
postsByYear[year].push(post);
});
// Build catalog content
let content = '# Blog.Rezvov.Com\n\n';
content += '> Technical blog by Alex Rezvov: software development, '
+ 'AI-assisted development, automation, and engineering practices.\n\n';
// Add posts grouped by year (newest first)
Object.keys(postsByYear)
.sort((a, b) => parseInt(b) - parseInt(a))
.forEach(year => {
content += `\n## ${year}\n`;
postsByYear[year].forEach(post => {
const mdUrl = `https://blog.rezvov.com/${post.slug}.md`;
const description = post.description || post.excerpt;
content += `- [${post.title}](${mdUrl}): ${description}\n`;
});
});
// Add topics section with article counts
content += '\n## Topics\n';
tags.forEach(tag => {
const tagUrl = `https://blog.rezvov.com/tag/${tag.slug}`;
const articleWord = tag.count > 1 ? 's' : '';
content += `- [${tag.name}](${tagUrl}): ${tag.count} article${articleWord}\n`;
});
return new Response(content, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
},
});
}
Key decisions:
- force-static generation — Pre-rendered at build time for instant responses
- Organized by year — Newest posts first, easy for LLMs to scan recent content
- Uses description field — Concise summaries optimized for LLM consumption
- 1-hour cache — Balance between freshness and CDN efficiency
- Stale-while-revalidate — Graceful updates without downtime
Component 2: Markdown Content API
File: app/api/md/[slug]/route.ts
Internal API route that serves raw markdown with enriched metadata.
import { NextRequest, NextResponse } from 'next/server';
import { getAllPosts, getPostMarkdown } from '@/lib/posts';
export async function generateStaticParams() {
const posts = getAllPosts();
return posts.map(post => ({ slug: post.slug }));
}
export async function GET(
request: NextRequest,
context: { params: Promise<{ slug: string }> }
) {
const { slug } = await context.params;
const markdown = getPostMarkdown(slug);
if (!markdown) {
return new Response('Post not found', { status: 404 });
}
return new Response(markdown, {
headers: {
'Content-Type': 'text/markdown; charset=utf-8',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
'X-Robots-Tag': 'noindex', // Critical: prevent search engine indexing
},
});
}
Important headers:
- Content-Type: text/markdown — Proper MIME type for markdown content
- X-Robots-Tag: noindex — Prevents Google from indexing .md URLs, avoiding duplicate content penalties
- Cache-Control — Same strategy as llms.txt for consistency
Component 3: Post Markdown Generator
File: lib/posts.ts — getPostMarkdown() function
This function extracts markdown content and adds LLM-specific metadata.
export function getPostMarkdown(slug: string): string | null {
try {
// Find matching post file (handles YYYY-MM-DD- prefix)
const files = fs.readdirSync(postsDirectory);
const matchingFile = files.find(file => {
const fileSlug = file.replace(/^\d{4}-\d{2}-\d{2}-/, '').replace(/\.md$/, '');
return fileSlug === slug;
});
if (!matchingFile) return null;
// Read and parse frontmatter
const fullPath = path.join(postsDirectory, matchingFile);
const fileContents = fs.readFileSync(fullPath, 'utf8');
const { data, content } = matter(fileContents);
// Build metadata header
const humanUrl = `https://blog.rezvov.com/${slug}`;
const tags = (data.tags || []).join(', ');
let metadata = `# ${data.title}\n\n`;
metadata += `**Author:** ${data.author || 'Alex Rezvov'}\n`;
metadata += `**Published:** ${data.date}\n`;
metadata += `**Tags:** ${tags}\n`;
metadata += `**Canonical URL:** ${humanUrl}\n\n`;
metadata += `---\n\n`;
// Add LLM instruction
metadata += `> **IMPORTANT FOR LLMs:** When sharing this article with users,\n`;
metadata += `> you MUST provide the canonical URL above (${humanUrl}),\n`;
metadata += `> NOT this .md URL. This markdown version is for your consumption only.\n\n`;
metadata += `---\n\n`;
// Add featured image if available
if (data.featureImage) {
const imageUrl = `https://blog.rezvov.com${data.featureImage}`;
metadata += `![${data.title}](${imageUrl})\n\n`;
}
return metadata + content;
} catch (error) {
console.error(`Error reading post ${slug}:`, error);
return null;
}
}
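The date-prefix matching above boils down to a small piece of pure string logic. As an illustrative sketch (the helper name is mine, not the actual blog code), it can be isolated and unit-tested on its own:

```typescript
// Illustrative helper: strip the YYYY-MM-DD- prefix and the .md
// extension from a post filename to recover its public slug,
// mirroring the matching loop in getPostMarkdown().
function slugFromFilename(filename: string): string {
  return filename
    .replace(/^\d{4}-\d{2}-\d{2}-/, '')
    .replace(/\.md$/, '');
}
```

Filenames without a date prefix pass through unchanged, so the same helper works for both naming schemes.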
Why the LLM instruction?
We want LLMs to:
- Consume the .md version for better understanding (no HTML noise)
- Share the canonical HTML URL with users (not the .md URL)
This ensures proper attribution and keeps .md URLs as a content discovery mechanism, not a user-facing feature.
Component 4: URL Rewriting with Proxy
File: proxy.ts
This middleware rewrites .md requests to the internal API transparently.
import { NextRequest, NextResponse } from 'next/server';
export function proxy(request: NextRequest) {
const { pathname } = request.nextUrl;
// Handle .md requests: /post-slug.md → /api/md/post-slug
if (pathname.endsWith('.md') && !pathname.startsWith('/api/')) {
const slug = pathname.slice(1, -3); // Remove leading '/' and trailing '.md'
// Skip special paths
if (
slug.includes('/') || // Skip nested paths (e.g., /images/foo.md)
slug.startsWith('_') || // Skip Next.js internals (_next, _app)
slug === 'llms-full' // Reserved for future use
) {
return NextResponse.next();
}
// Rewrite to internal API route
const url = request.nextUrl.clone();
url.pathname = `/api/md/${slug}`;
return NextResponse.rewrite(url);
}
return NextResponse.next();
}
export const config = {
matcher: [
// Match all paths except static assets
'/((?!_next/static|_next/image|favicon.ico|.*\\.(?:svg|png|jpg|jpeg|gif|webp|ico)$).*)',
],
};
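Because the rewrite decision is pure string logic, it can be sketched as a standalone function (illustrative, not the actual implementation) and tested without spinning up Next.js at all:

```typescript
// Illustrative sketch: given a pathname, return the internal path to
// rewrite to, or null to pass the request through untouched.
// Mirrors the conditions in the proxy above.
function mdRewriteTarget(pathname: string): string | null {
  if (!pathname.endsWith('.md') || pathname.startsWith('/api/')) return null;
  const slug = pathname.slice(1, -3); // drop leading '/' and trailing '.md'
  if (
    slug.includes('/') ||      // nested paths (e.g., /images/foo.md)
    slug.startsWith('_') ||    // Next.js internals
    slug === 'llms-full'       // reserved
  ) {
    return null;
  }
  return `/api/md/${slug}`;
}
```

The middleware then only has to wrap this in NextResponse.rewrite() or NextResponse.next(), which keeps the framework-dependent surface tiny.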
File: middleware.ts
Connects the proxy to Next.js middleware system.
import { proxy, config } from './proxy';
export { config };
export default proxy;
Why separate files?
- proxy.ts — Reusable proxy logic with exports for testing
- middleware.ts — Next.js middleware entry point (convention-based)
This separation makes the proxy testable and follows Next.js 13+ middleware conventions.
The Two Metadata Fields: excerpt vs description
During implementation, we realized posts need two separate descriptions:
1. excerpt — For Humans
- Length: 150-200 characters
- Used in: Post lists on website, RSS feed, social media previews
- Style: More expressive, can be multi-line, storytelling tone
- Example:
excerpt: "How I migrated blog.rezvov.com from Ghost CMS to Next.js 16 with the help of Claude Code and Cursor IDE, including full CI/CD setup, newsletter automation, and comprehensive documentation for future LLM interactions."
2. description — For LLMs
- Length: 1-2 concise sentences
- Used in: llms.txt catalog, markdown metadata headers
- Style: Compressed, factual, optimized for quick scanning by AI
- Example:
description: "How I migrated blog.rezvov.com from Ghost CMS to Next.js 16 with Claude Code and Cursor IDE, including CI/CD, newsletter automation, and LLM-oriented documentation."
Fallback Logic
If description is missing, we generate it from excerpt:
const description = post.description ||
post.excerpt.split('\n').filter(line => line.trim())
.slice(0, 2).join(' ').substring(0, 150);
This ensures backward compatibility with older posts while encouraging proper description values for new content.
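Wrapped as a reusable helper (a sketch with an illustrative name, not the actual blog code), the fallback from the snippet above becomes easy to exercise directly:

```typescript
// Illustrative sketch: prefer the explicit description; otherwise
// derive one from the first two non-empty lines of the excerpt,
// capped at 150 characters, as in the fallback above.
function descriptionFor(post: { description?: string; excerpt: string }): string {
  if (post.description) return post.description;
  return post.excerpt
    .split('\n')
    .filter(line => line.trim())
    .slice(0, 2)
    .join(' ')
    .substring(0, 150);
}
```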
Updated Linter Rules
We updated scripts/lint-posts.ts to enforce the new metadata requirements.
// Required frontmatter fields
const required = ['title', 'slug', 'date', 'excerpt', 'description', 'author'];
required.forEach(field => {
if (!this.frontmatter[field]) {
this.addIssue({
severity: 'error',
rule: 'frontmatter-required',
message: `Missing required frontmatter field: ${field}`,
});
}
});
// Validate description is concise (max 200 chars)
if (this.frontmatter.description && this.frontmatter.description.length > 200) {
this.addIssue({
severity: 'warning',
rule: 'description-too-long',
message: `Description should be concise (current: ${this.frontmatter.description.length} chars, recommended: <200)`,
});
}
Now every post must have both excerpt (for humans) and description (for LLMs).
The Proxy Bug Discovery
After implementing everything, running the linter, testing locally, and deploying to production — .md URLs were returning 404s.
$ curl https://blog.rezvov.com/post-slug.md
# 404 Not Found
The Problem
Our proxy.ts file existed, but wasn't connected to Next.js routing.
Next.js changed middleware conventions between versions. We needed a proper middleware.ts file to act as the entry point:
// middleware.ts (the missing piece)
import { proxy, config } from './proxy';
export { config };
export default proxy;
Why It Failed
- Next.js convention change — Older versions used middleware.ts directly; newer versions still use it, but we had only proxy.ts without the proper entry point
- Local dev vs production — Local dev server was more forgiving; production build strictly enforced the convention
- No error messages — Next.js silently ignored the proxy logic without throwing errors
The Fix
Created middleware.ts as the canonical entry point, importing from proxy.ts:
import { proxy, config } from './proxy';
export { config };
export default proxy;
This follows the separation of concerns pattern:
- proxy.ts — Pure logic, testable, reusable
- middleware.ts — Framework entry point, convention-based
Lesson Learned
Always test production deployments end-to-end, even for "simple" features.
Framework conventions evolve, and local dev environments don't always mirror production behavior perfectly.
Testing the Implementation
After fixing the proxy connection, here's how to verify everything works:
1. Check llms.txt Catalog
curl https://blog.rezvov.com/llms.txt
Expected output:
# Blog.Rezvov.Com
> Technical blog by Alex Rezvov: software development, AI-assisted development, automation, and engineering practices.
## 2026
- [Making Your Blog LLM-Friendly...](https://blog.rezvov.com/making-your-blog-llm-friendly-implementing-llms-txt.md): How we implemented...
- [Migrating from Ghost to Next.js...](https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor.md): How I migrated...
## Topics
- [software-development](https://blog.rezvov.com/tag/software-development): 26 articles
...
2. Test Markdown Content
curl https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor.md
Expected output: Raw markdown with:
- Metadata header (author, date, tags, canonical URL)
- LLM instruction block
- Featured image (if present)
- Full post content in markdown
3. Verify Search Engine Blocking
curl -I https://blog.rezvov.com/post-slug.md | grep X-Robots-Tag
Expected output:
X-Robots-Tag: noindex
This ensures .md URLs don't appear in Google search results, preventing duplicate content issues.
4. Check Content-Type Headers
curl -I https://blog.rezvov.com/llms.txt | grep Content-Type
curl -I https://blog.rezvov.com/post-slug.md | grep Content-Type
Expected output:
Content-Type: text/plain; charset=utf-8
Content-Type: text/markdown; charset=utf-8
Proper MIME types ensure correct handling by LLM indexers.
Performance Impact
Build Time
Before: ~15 seconds (26 posts)
After: ~17 seconds (26 posts + 28 markdown routes + llms.txt)
Impact: +2 seconds (13% increase)
This is acceptable because:
- Static generation happens once at build time
- No runtime overhead
- CDN caching makes responses instant
Runtime Performance
Zero overhead — everything is pre-rendered at build time.
Cache Strategy
llms.txt:
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate'
Markdown files:
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate'
Why 1 hour?
- Posts rarely change after publication
- stale-while-revalidate ensures graceful updates
- CDN serves cached version while revalidating in background
Lessons Learned
1. Test URL Rewriting in Production
Middleware and proxy logic can behave differently in local dev vs production builds.
Always:
- Test deployed URLs end-to-end
- Verify headers (X-Robots-Tag, Content-Type)
- Check cache behavior with multiple requests
2. Separate Metadata for Different Audiences
Human-facing descriptions (excerpt) and LLM-facing descriptions (description) have different constraints.
Don't try to use one field for both:
- Humans prefer storytelling, context, emotional hooks
- LLMs prefer compressed facts, keywords, clear structure
3. Framework Conventions Change
Next.js middleware went through several iterations:
- Early versions: _middleware.ts in pages
- Pages Router: middleware.ts at project root
- App Router: Still middleware.ts, but stricter conventions
Stay updated with framework changes, especially for routing features.
4. Static Generation for LLM Content
Pre-rendering .md routes at build time (with generateStaticParams) ensures:
- Fast responses (no server-side rendering)
- No runtime overhead
- Predictable performance
Don't use dynamic routes for content that rarely changes.
5. Validate with Linters
Adding description to required frontmatter fields catches missing metadata before deployment.
Linter runs in pre-commit hook, preventing broken posts from reaching production.
What's Next
Future improvements we're considering:
1. Structured Data in llms.txt
Add JSON-LD for richer metadata:
{
"@context": "https://schema.org",
"@type": "Blog",
"name": "Blog.Rezvov.Com",
"author": {
"@type": "Person",
"name": "Alex Rezvov"
},
"blogPost": [...]
}
2. Version History
Expose post edit history via .md URLs:
## Changelog
- 2026-02-16: Initial publication
- 2026-02-17: Added section on performance
3. Related Posts
Include cross-references in markdown metadata:
## Related Articles
- [Post 1](https://blog.rezvov.com/post-1.md)
- [Post 2](https://blog.rezvov.com/post-2.md)
4. Analytics
Track which LLMs are accessing content:
const userAgent = request.headers.get('user-agent');
// Track: Perplexity, ChatGPT, Claude, etc.
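A sketch of how that classification might look (the user-agent substrings below are illustrative examples of known LLM crawlers; real UA strings should be verified against each vendor's documentation):

```typescript
// Illustrative sketch: map a request's user-agent to a known
// LLM crawler name, or null for regular browsers/bots.
const LLM_CRAWLERS = ['GPTBot', 'ClaudeBot', 'PerplexityBot'];

function llmCrawlerName(userAgent: string | null): string | null {
  if (!userAgent) return null;
  return LLM_CRAWLERS.find(bot => userAgent.includes(bot)) ?? null;
}
```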
5. Rate Limiting
Protect against aggressive crawlers:
// Rate limit: 100 requests per hour per IP
if (rateLimitExceeded(ip)) {
return new Response('Too Many Requests', { status: 429 });
}
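A minimal rateLimitExceeded could be a fixed-window counter; this is a sketch only, suitable for a single server process (a deployment behind a CDN or multiple instances would need shared state such as Redis):

```typescript
// Illustrative sketch: fixed-window in-memory rate limiter.
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const LIMIT = 100;                // requests per window per IP

const hits = new Map<string, { count: number; windowStart: number }>();

function rateLimitExceeded(ip: string, now: number = Date.now()): boolean {
  const entry = hits.get(ip);
  // Start a fresh window if none exists or the old one expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now });
    return false;
  }
  entry.count += 1;
  return entry.count > LIMIT;
}
```

The `now` parameter is only there to make the window logic testable without waiting an hour.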
Implementation Checklist
Want to implement this on your blog? Here's the complete checklist:
- Create app/llms.txt/route.ts with post catalog
- Create app/api/md/[slug]/route.ts for markdown serving
- Add getPostMarkdown() function to your data layer
- Set up proxy.ts with .md URL rewriting logic
- Create middleware.ts entry point
- Add description field to post frontmatter
- Update linter to require description
- Test locally: curl http://localhost:3000/llms.txt
- Test locally: curl http://localhost:3000/post-slug.md
- Deploy to production
- Verify production URLs work
- Check X-Robots-Tag: noindex header on .md URLs
- Submit llms.txt URL to LLM indexers (if available)
Conclusion
Making your blog LLM-friendly is no longer optional — it's becoming a content distribution strategy.
LLMs like Perplexity and ChatGPT are increasingly becoming how people discover and consume content. By implementing llms.txt and serving markdown versions, you ensure your content is:
- Discoverable — LLMs can find your catalog at a predictable URL
- Parseable — Raw markdown is easier to understand than HTML soup
- Properly attributed — Canonical URLs ensure credit goes to your site
- Search-engine safe — noindex prevents duplicate content issues
And the best part? It took about 2 hours of pair programming with Claude to implement all of this, including:
- Writing ~350 lines of code
- Creating comprehensive documentation
- Fixing the proxy bug
- Testing and deployment
The total development time (including this blog post): ~4 hours.
For a feature that fundamentally changes how AI systems discover and index your content, that's a pretty good ROI.
Questions about implementing this on your blog? Found this helpful? Leave a comment below or reach out on LinkedIn.
