Making Your Blog LLM-Friendly: Implementing llms.txt and Markdown Serving
In Migrating from Ghost to Next.js: A Journey with Claude and Cursor, I showed how I migrated my blog to Next.js with Static Site Generation (SSG).
Then I had a simple thought: if Large Language Models (LLMs) like ChatGPT, Claude, and Perplexity are already one of the main ways people consume content — the blog should be readable for them in a native format.
There is already a convention for that — llms.txt.
And my source content is already written in Markdown, which is basically the natural language of LLMs.
So instead of forcing them to parse HTML with layout, navigation, and other noise, I can just give them the original Markdown and let them work with clean data.
The only thing I ask in return is simple: when an LLM shows the article to a human — it must link to the canonical human HTML page.
So the idea became straightforward:
- Feed the LLMs with Markdown
- Keep the humans on normal pages
I didn't implement this myself.
Claude did.
Claude's Technical Report
Below is the long and very thorough technical write-up from my friend Claude.
The part where the fun ends and the architecture begins.
If you just wanted the concept — you're done.
If you want to build the same thing — Claude will walk you through every line of it.
Implementation Overview
What we built:
- llms.txt catalog — Dynamic route at /llms.txt listing all posts with markdown URLs
- Markdown content serving — Raw markdown available at /post-slug.md for each post
- Dual metadata system — Separate excerpt (humans) and description (LLMs) fields
- Transparent routing — Middleware proxy rewrites .md URLs without exposing internal APIs
- LLM instructions — Embedded metadata in markdown to guide proper attribution
Key metrics:
- Routes added: 3 (llms.txt catalog, markdown API, middleware proxy)
- Lines of code: ~350 (route handlers, markdown generator, proxy logic)
- Posts catalogued: 28 blog posts (2012-2026)
- Metadata fields: 2 new frontmatter requirements (description, order)
- Bug discovered: Next.js middleware convention deprecation
The Problem We're Solving
After migrating the blog to Next.js, I noticed something important: LLMs are becoming primary content consumers.
When users ask Perplexity "How do I optimize Next.js performance?" or ChatGPT "What's the best approach to CI/CD?", these tools need to index and understand your content.
The native language of LLMs is Markdown — not HTML with CSS classes, JavaScript, and navigation menus.
But most blogs (including ours) only serve HTML pages optimized for human browsers, not for AI indexers.
Enter llms.txt
The llms.txt standard is a simple convention: provide a plain text catalog of your content at /llms.txt, similar to how robots.txt helps search engines.
Format example:
# Your Blog Name
> Brief description
## 2026
- [Post Title](https://yourblog.com/post-slug.md): Short description
- [Another Post](https://yourblog.com/another.md): Another description
## Topics
- [JavaScript](https://yourblog.com/tag/javascript): 15 articles
- [DevOps](https://yourblog.com/tag/devops): 8 articles
Notice the .md links? That's the second part: serving raw markdown versions of your posts.
Technical Architecture
Component 1: The llms.txt Catalog
File: app/llms.txt/route.ts
This Next.js route handler generates the catalog dynamically using Static Site Generation (SSG).
export const dynamic = 'force-static';
export async function GET() {
const posts = getAllPosts();
const tags = getAllTags();
// Group posts by year for chronological organization
const postsByYear: Record<string, PostMetadata[]> = {};
posts.forEach(post => {
const year = new Date(post.date).getFullYear().toString();
if (!postsByYear[year]) postsByYear[year] = [];
postsByYear[year].push(post);
});
// Build catalog content
let content = '# Blog.Rezvov.Com\n\n';
content += '> Technical blog by Alex Rezvov: software development, '
+ 'AI-assisted development, automation, and engineering practices.\n\n';
// Add posts grouped by year (newest first)
Object.keys(postsByYear)
.sort((a, b) => parseInt(b) - parseInt(a))
.forEach(year => {
content += `\n## ${year}\n`;
postsByYear[year].forEach(post => {
const mdUrl = `https://blog.rezvov.com/${post.slug}.md`;
const description = post.description || post.excerpt;
content += `- [${post.title}](${mdUrl}): ${description}\n`;
});
});
// Add topics section with article counts
content += '\n## Topics\n';
tags.forEach(tag => {
const tagUrl = `https://blog.rezvov.com/tag/${tag.slug}`;
const articleWord = tag.count > 1 ? 's' : '';
content += `- [${tag.name}](${tagUrl}): ${tag.count} article${articleWord}\n`;
});
return new Response(content, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
},
});
}
Key decisions:
- force-static generation — Pre-rendered at build time for instant responses
- Organized by year — Newest posts first, easy for LLMs to scan recent content
- Uses description field — Concise summaries optimized for LLM consumption
- 1-hour cache — Balance between freshness and CDN efficiency
- Stale-while-revalidate — Graceful updates without downtime
Component 2: Markdown Content API
File: app/api/md/[slug]/route.ts
Internal API route that serves raw markdown with enriched metadata.
import { NextRequest, NextResponse } from 'next/server';
import { getAllPosts, getPostMarkdown } from '@/lib/posts';
export async function generateStaticParams() {
const posts = getAllPosts();
return posts.map(post => ({ slug: post.slug }));
}
export async function GET(
request: NextRequest,
context: { params: Promise<{ slug: string }> }
) {
const { slug } = await context.params;
const markdown = getPostMarkdown(slug);
if (!markdown) {
return new Response('Post not found', { status: 404 });
}
return new Response(markdown, {
headers: {
'Content-Type': 'text/markdown; charset=utf-8',
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
'X-Robots-Tag': 'noindex', // Critical: prevent search engine indexing
},
});
}
Important headers:
- Content-Type: text/markdown — Proper MIME type for markdown content
- X-Robots-Tag: noindex — Prevents Google from indexing .md URLs, avoiding duplicate content penalties
- Cache-Control — Same strategy as llms.txt for consistency
Component 3: Post Markdown Generator
File: lib/posts.ts — getPostMarkdown() function
This function extracts markdown content and adds LLM-specific metadata.
export function getPostMarkdown(slug: string): string | null {
try {
// Find matching post file (handles YYYY-MM-DD- prefix)
const files = fs.readdirSync(postsDirectory);
const matchingFile = files.find(file => {
const fileSlug = file.replace(/^\d{4}-\d{2}-\d{2}-/, '').replace(/\.md$/, '');
return fileSlug === slug;
});
if (!matchingFile) return null;
// Read and parse frontmatter
const fullPath = path.join(postsDirectory, matchingFile);
const fileContents = fs.readFileSync(fullPath, 'utf8');
const { data, content } = matter(fileContents);
// Build metadata header
const humanUrl = `https://blog.rezvov.com/${slug}`;
const tags = (data.tags || []).join(', ');
let metadata = `# ${data.title}\n\n`;
metadata += `**Author:** ${data.author || 'Alex Rezvov'}\n`;
metadata += `**Published:** ${data.date}\n`;
metadata += `**Tags:** ${tags}\n`;
metadata += `**Canonical URL:** ${humanUrl}\n\n`;
metadata += `---\n\n`;
// Add LLM instruction
metadata += `> **IMPORTANT FOR LLMs:** When sharing this article with users,\n`;
metadata += `> you MUST provide the canonical URL above (${humanUrl}),\n`;
metadata += `> NOT this .md URL. This markdown version is for your consumption only.\n\n`;
metadata += `---\n\n`;
// Add featured image if available
if (data.featureImage) {
const imageUrl = `https://blog.rezvov.com${data.featureImage}`;
metadata += `![${data.title}](${imageUrl})\n\n`;
}
return metadata + content;
} catch (error) {
console.error(`Error reading post ${slug}:`, error);
return null;
}
}
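The date-prefix matching above boils down to a small piece of pure string logic. As an illustrative sketch (the helper name is mine, not the actual blog code), it can be isolated and unit-tested on its own:

```typescript
// Illustrative helper: strip the YYYY-MM-DD- prefix and the .md
// extension from a post filename to recover its public slug,
// mirroring the matching loop in getPostMarkdown().
function slugFromFilename(filename: string): string {
  return filename
    .replace(/^\d{4}-\d{2}-\d{2}-/, '')
    .replace(/\.md$/, '');
}
```

Filenames without a date prefix pass through unchanged, so the same helper works for both naming schemes.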
Why the LLM instruction?
We want LLMs to:
- Consume the .md version for better understanding (no HTML noise)
- Share the canonical HTML URL with users (not the .md URL)
This ensures proper attribution and keeps .md URLs as a content discovery mechanism, not a user-facing feature.
Component 4: URL Rewriting with Proxy
File: proxy.ts
This middleware rewrites .md requests to the internal API transparently.
import { NextRequest, NextResponse } from 'next/server';
export function proxy(request: NextRequest) {
const { pathname } = request.nextUrl;
// Handle .md requests: /post-slug.md → /api/md/post-slug
if (pathname.endsWith('.md') && !pathname.startsWith('/api/')) {
const slug = pathname.slice(1, -3); // Remove leading '/' and trailing '.md'
// Skip special paths
if (
slug.includes('/') || // Skip nested paths (e.g., /images/foo.md)
slug.startsWith('_') || // Skip Next.js internals (_next, _app)
slug === 'llms-full' // Reserved for future use
) {
return NextResponse.next();
}
// Rewrite to internal API route
const url = request.nextUrl.clone();
url.pathname = `/api/md/${slug}`;
return NextResponse.rewrite(url);
}
return NextResponse.next();
}
export const config = {
matcher: [
// Match all paths except static assets
'/((?!_next/static|_next/image|favicon.ico|.*\\.(?:svg|png|jpg|jpeg|gif|webp|ico)$).*)',
],
};
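Because the rewrite decision is pure string logic, it can be sketched as a standalone function (illustrative, not the actual implementation) and tested without spinning up Next.js at all:

```typescript
// Illustrative sketch: given a pathname, return the internal path to
// rewrite to, or null to pass the request through untouched.
// Mirrors the conditions in the proxy above.
function mdRewriteTarget(pathname: string): string | null {
  if (!pathname.endsWith('.md') || pathname.startsWith('/api/')) return null;
  const slug = pathname.slice(1, -3); // drop leading '/' and trailing '.md'
  if (
    slug.includes('/') ||      // nested paths (e.g., /images/foo.md)
    slug.startsWith('_') ||    // Next.js internals
    slug === 'llms-full'       // reserved
  ) {
    return null;
  }
  return `/api/md/${slug}`;
}
```

The middleware then only has to wrap this in NextResponse.rewrite() or NextResponse.next(), which keeps the framework-dependent surface tiny.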
File: middleware.ts
Connects the proxy to Next.js middleware system.
import { proxy, config } from './proxy';
export { config };
export default proxy;
Why separate files?
- proxy.ts — Reusable proxy logic with exports for testing
- middleware.ts — Next.js middleware entry point (convention-based)
This separation makes the proxy testable and follows Next.js 13+ middleware conventions.
The Two Metadata Fields: excerpt vs description
During implementation, we realized posts need two separate descriptions:
1. excerpt — For Humans
- Length: 150-200 characters
- Used in: Post lists on website, RSS feed, social media previews
- Style: More expressive, can be multi-line, storytelling tone
- Example:
excerpt: "How I migrated blog.rezvov.com from Ghost CMS to Next.js 16 with the help of Claude Code and Cursor IDE, including full CI/CD setup, newsletter automation, and comprehensive documentation for future LLM interactions."
2. description — For LLMs
- Length: 1-2 concise sentences
- Used in: llms.txt catalog, markdown metadata headers
- Style: Compressed, factual, optimized for quick scanning by AI
- Example:
description: "How I migrated blog.rezvov.com from Ghost CMS to Next.js 16 with Claude Code and Cursor IDE, including CI/CD, newsletter automation, and LLM-oriented documentation."
Fallback Logic
If description is missing, we generate it from excerpt:
const description = post.description ||
post.excerpt.split('\n').filter(line => line.trim())
.slice(0, 2).join(' ').substring(0, 150);
This ensures backward compatibility with older posts while encouraging proper description values for new content.
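Wrapped as a reusable helper (a sketch with an illustrative name, not the actual blog code), the fallback from the snippet above becomes easy to exercise directly:

```typescript
// Illustrative sketch: prefer the explicit description; otherwise
// derive one from the first two non-empty lines of the excerpt,
// capped at 150 characters, as in the fallback above.
function descriptionFor(post: { description?: string; excerpt: string }): string {
  if (post.description) return post.description;
  return post.excerpt
    .split('\n')
    .filter(line => line.trim())
    .slice(0, 2)
    .join(' ')
    .substring(0, 150);
}
```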
Updated Linter Rules
We updated scripts/lint-posts.ts to enforce the new metadata requirements.
// Required frontmatter fields
const required = ['title', 'slug', 'date', 'excerpt', 'description', 'author'];
required.forEach(field => {
if (!this.frontmatter[field]) {
this.addIssue({
severity: 'error',
rule: 'frontmatter-required',
message: `Missing required frontmatter field: ${field}`,
});
}
});
// Validate description is concise (max 200 chars)
if (this.frontmatter.description && this.frontmatter.description.length > 200) {
this.addIssue({
severity: 'warning',
rule: 'description-too-long',
message: `Description should be concise (current: ${this.frontmatter.description.length} chars, recommended: <200)`,
});
}
Now every post must have both excerpt (for humans) and description (for LLMs).
The Proxy Bug Discovery
After implementing everything, running the linter, testing locally, and deploying to production — .md URLs were returning 404s.
$ curl https://blog.rezvov.com/post-slug.md
# 404 Not Found
The Problem
Our proxy.ts file existed, but wasn't connected to Next.js routing.
Next.js changed middleware conventions between versions. We needed a proper middleware.ts file to act as the entry point:
// middleware.ts (the missing piece)
import { proxy, config } from './proxy';
export { config };
export default proxy;
Why It Failed
- Next.js convention change — Older versions used middleware.ts directly; newer versions still use it, but we had only proxy.ts without the proper entry point
- Local dev vs production — Local dev server was more forgiving; production build strictly enforced the convention
- No error messages — Next.js silently ignored the proxy logic without throwing errors
The Fix
Created middleware.ts as the canonical entry point, importing from proxy.ts:
import { proxy, config } from './proxy';
export { config };
export default proxy;
This follows the separation of concerns pattern:
- proxy.ts — Pure logic, testable, reusable
- middleware.ts — Framework entry point, convention-based
Lesson Learned
Always test production deployments end-to-end, even for "simple" features.
Framework conventions evolve, and local dev environments don't always mirror production behavior perfectly.
Testing the Implementation
After fixing the proxy connection, here's how to verify everything works:
1. Check llms.txt Catalog
curl https://blog.rezvov.com/llms.txt
Expected output:
# Blog.Rezvov.Com
> Technical blog by Alex Rezvov: software development, AI-assisted development, automation, and engineering practices.
## 2026
- [Making Your Blog LLM-Friendly...](https://blog.rezvov.com/making-your-blog-llm-friendly-implementing-llms-txt.md): How we implemented...
- [Migrating from Ghost to Next.js...](https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor.md): How I migrated...
## Topics
- [software-development](https://blog.rezvov.com/tag/software-development): 26 articles
...
2. Test Markdown Content
curl https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor.md
Expected output: Raw markdown with:
- Metadata header (author, date, tags, canonical URL)
- LLM instruction block
- Featured image (if present)
- Full post content in markdown
3. Verify Search Engine Blocking
curl -I https://blog.rezvov.com/post-slug.md | grep X-Robots-Tag
Expected output:
X-Robots-Tag: noindex
This ensures .md URLs don't appear in Google search results, preventing duplicate content issues.
4. Check Content-Type Headers
curl -I https://blog.rezvov.com/llms.txt | grep Content-Type
curl -I https://blog.rezvov.com/post-slug.md | grep Content-Type
Expected output:
Content-Type: text/plain; charset=utf-8
Content-Type: text/markdown; charset=utf-8
Proper MIME types ensure correct handling by LLM indexers.
Performance Impact
Build Time
Before: ~15 seconds (26 posts)
After: ~17 seconds (26 posts + 28 markdown routes + llms.txt)
Impact: +2 seconds (13% increase)
This is acceptable because:
- Static generation happens once at build time
- No runtime overhead
- CDN caching makes responses instant
Runtime Performance
Zero overhead — everything is pre-rendered at build time.
Cache Strategy
llms.txt:
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate'
Markdown files:
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate'
Why 1 hour?
- Posts rarely change after publication
- stale-while-revalidate ensures graceful updates
- CDN serves cached version while revalidating in background
Lessons Learned
1. Test URL Rewriting in Production
Middleware and proxy logic can behave differently in local dev vs production builds.
Always:
- Test deployed URLs end-to-end
- Verify headers (X-Robots-Tag, Content-Type)
- Check cache behavior with multiple requests
2. Separate Metadata for Different Audiences
Human-facing descriptions (excerpt) and LLM-facing descriptions (description) have different constraints.
Don't try to use one field for both:
- Humans prefer storytelling, context, emotional hooks
- LLMs prefer compressed facts, keywords, clear structure
3. Framework Conventions Change
Next.js middleware went through several iterations:
- Early versions: _middleware.ts in pages
- Pages Router: middleware.ts at project root
- App Router: Still middleware.ts, but stricter conventions
Stay updated with framework changes, especially for routing features.
4. Static Generation for LLM Content
Pre-rendering .md routes at build time (with generateStaticParams) ensures:
- Fast responses (no server-side rendering)
- No runtime overhead
- Predictable performance
Don't use dynamic routes for content that rarely changes.
5. Validate with Linters
Adding description to required frontmatter fields catches missing metadata before deployment.
Linter runs in pre-commit hook, preventing broken posts from reaching production.
What's Next
Future improvements we're considering:
1. Structured Data in llms.txt
Add JSON-LD for richer metadata:
{
"@context": "https://schema.org",
"@type": "Blog",
"name": "Blog.Rezvov.Com",
"author": {
"@type": "Person",
"name": "Alex Rezvov"
},
"blogPost": [...]
}
2. Version History
Expose post edit history via .md URLs:
## Changelog
- 2026-02-16: Initial publication
- 2026-02-17: Added section on performance
3. Related Posts
Include cross-references in markdown metadata:
## Related Articles
- [Post 1](https://blog.rezvov.com/post-1.md)
- [Post 2](https://blog.rezvov.com/post-2.md)
4. Analytics
Track which LLMs are accessing content:
const userAgent = request.headers.get('user-agent');
// Track: Perplexity, ChatGPT, Claude, etc.
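A sketch of how that classification might look (the user-agent substrings below are illustrative examples of known LLM crawlers; real UA strings should be verified against each vendor's documentation):

```typescript
// Illustrative sketch: map a request's user-agent to a known
// LLM crawler name, or null for regular browsers/bots.
const LLM_CRAWLERS = ['GPTBot', 'ClaudeBot', 'PerplexityBot'];

function llmCrawlerName(userAgent: string | null): string | null {
  if (!userAgent) return null;
  return LLM_CRAWLERS.find(bot => userAgent.includes(bot)) ?? null;
}
```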
5. Rate Limiting
Protect against aggressive crawlers:
// Rate limit: 100 requests per hour per IP
if (rateLimitExceeded(ip)) {
return new Response('Too Many Requests', { status: 429 });
}
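A minimal rateLimitExceeded could be a fixed-window counter; this is a sketch only, suitable for a single server process (a deployment behind a CDN or multiple instances would need shared state such as Redis):

```typescript
// Illustrative sketch: fixed-window in-memory rate limiter.
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const LIMIT = 100;                // requests per window per IP

const hits = new Map<string, { count: number; windowStart: number }>();

function rateLimitExceeded(ip: string, now: number = Date.now()): boolean {
  const entry = hits.get(ip);
  // Start a fresh window if none exists or the old one expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now });
    return false;
  }
  entry.count += 1;
  return entry.count > LIMIT;
}
```

The `now` parameter is only there to make the window logic testable without waiting an hour.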
Implementation Checklist
Want to implement this on your blog? Here's the complete checklist:
- Create app/llms.txt/route.ts with post catalog
- Create app/api/md/[slug]/route.ts for markdown serving
- Add getPostMarkdown() function to your data layer
- Set up proxy.ts with .md URL rewriting logic
- Create middleware.ts entry point
- Add description field to post frontmatter
- Update linter to require description
- Test locally: curl http://localhost:3000/llms.txt
- Test locally: curl http://localhost:3000/post-slug.md
- Deploy to production
- Verify production URLs work
- Check X-Robots-Tag: noindex header on .md URLs
- Submit llms.txt URL to LLM indexers (if available)
Conclusion
Making your blog LLM-friendly is no longer optional — it's becoming a content distribution strategy.
LLMs like Perplexity and ChatGPT are increasingly becoming how people discover and consume content. By implementing llms.txt and serving markdown versions, you ensure your content is:
- Discoverable — LLMs can find your catalog at a predictable URL
- Parseable — Raw markdown is easier to understand than HTML soup
- Properly attributed — Canonical URLs ensure credit goes to your site
- Search-engine safe — noindex prevents duplicate content issues
And the best part? It took about 2 hours of pair programming with Claude to implement all of this, including:
- Writing ~350 lines of code
- Creating comprehensive documentation
- Fixing the proxy bug
- Testing and deployment
The total development time (including this blog post): ~4 hours.
For a feature that fundamentally changes how AI systems discover and index your content, that's a pretty good ROI.
Questions about implementing this on your blog? Found this helpful? Leave a comment below or reach out on LinkedIn.
