Making Your Blog LLM-Friendly: Implementing llms.txt and Markdown Serving


February 16, 2026 · By Alex Rezvov

In Migrating from Ghost to Next.js: A Journey with Claude and Cursor, I showed how I migrated my blog to Next.js Static Site Generation (SSG).

Then I had a simple thought: if Large Language Models (LLMs) like ChatGPT, Claude, and Perplexity are already one of the main ways people consume content — the blog should be readable for them in a native format.

There is already a convention for that — llms.txt.

And my source content is already written in Markdown, which is basically the natural language of LLMs.

So instead of forcing them to parse HTML with layout, navigation, and other noise, I can just give them the original Markdown and let them work with clean data.

The only thing I ask in return is simple: when an LLM shows the article to a human — it must link to the canonical human HTML page.

So the idea became straightforward:

  • Feed the LLMs with Markdown
  • Keep the humans on normal pages

I didn't implement this myself.

Claude did.


Claude's Technical Report

Below is the long and very thorough technical write-up from my friend Claude.

The part where the fun ends and the architecture begins.

If you just wanted the concept — you're done.

If you want to build the same thing — Claude will walk you through every line of it.


Implementation Overview

What we built:

  • llms.txt catalog — Dynamic route at /llms.txt listing all posts with markdown URLs
  • Markdown content serving — Raw markdown available at /post-slug.md for each post
  • Dual metadata system — Separate excerpt (humans) and description (LLMs) fields
  • Transparent routing — Middleware proxy rewrites .md URLs without exposing internal APIs
  • LLM instructions — Embedded metadata in markdown to guide proper attribution

Key metrics:

  • Routes added: 3 (llms.txt catalog, markdown API, middleware proxy)
  • Lines of code: ~350 (route handlers, markdown generator, proxy logic)
  • Posts catalogued: 28 blog posts (2012-2026)
  • Metadata fields: 2 new frontmatter requirements (description, order)
  • Bug discovered: Next.js middleware convention deprecation

The Problem We're Solving

After migrating the blog to Next.js, I noticed something important: LLMs are becoming primary content consumers.

When users ask Perplexity "How do I optimize Next.js performance?" or ChatGPT "What's the best approach to CI/CD?", these tools need to index and understand your content.

The native language of LLMs is Markdown — not HTML with CSS classes, JavaScript, and navigation menus.

But most blogs (including ours) only serve HTML pages optimized for human browsers, not for AI indexers.

Enter llms.txt

The llms.txt standard is a simple convention: provide a plain text catalog of your content at /llms.txt, similar to how robots.txt helps search engines.

Format example:

# Your Blog Name
> Brief description

## 2026
- [Post Title](https://yourblog.com/post-slug.md): Short description
- [Another Post](https://yourblog.com/another.md): Another description

## Topics
- [JavaScript](https://yourblog.com/tag/javascript): 15 articles
- [DevOps](https://yourblog.com/tag/devops): 8 articles

Notice the .md links? That's the second part: serving raw markdown versions of your posts.

Technical Architecture

Component 1: The llms.txt Catalog

File: app/llms.txt/route.ts

This Next.js route handler generates the catalog dynamically at build time using Static Site Generation (SSG).

export const dynamic = 'force-static';

export async function GET() {
  const posts = getAllPosts();
  const tags = getAllTags();

  // Group posts by year for chronological organization
  const postsByYear: Record<string, PostMetadata[]> = {};
  posts.forEach(post => {
    const year = new Date(post.date).getFullYear().toString();
    if (!postsByYear[year]) postsByYear[year] = [];
    postsByYear[year].push(post);
  });

  // Build catalog content
  let content = '# Blog.Rezvov.Com\n\n';
  content += '> Technical blog by Alex Rezvov: software development, '
    + 'AI-assisted development, automation, and engineering practices.\n\n';

  // Add posts grouped by year (newest first)
  Object.keys(postsByYear)
    .sort((a, b) => parseInt(b) - parseInt(a))
    .forEach(year => {
      content += `\n## ${year}\n`;
      postsByYear[year].forEach(post => {
        const mdUrl = `https://blog.rezvov.com/${post.slug}.md`;
        const description = post.description || post.excerpt;
        content += `- [${post.title}](${mdUrl}): ${description}\n`;
      });
    });

  // Add topics section with article counts
  content += '\n## Topics\n';
  tags.forEach(tag => {
    const tagUrl = `https://blog.rezvov.com/tag/${tag.slug}`;
    const articleWord = tag.count > 1 ? 's' : '';
    content += `- [${tag.name}](${tagUrl}): ${tag.count} article${articleWord}\n`;
  });

  return new Response(content, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
    },
  });
}

Key decisions:

  1. force-static generation — Pre-rendered at build time for instant responses
  2. Organized by year — Newest posts first, easy for LLMs to scan recent content
  3. Uses description field — Concise summaries optimized for LLM consumption
  4. 1-hour cache — Balance between freshness and CDN efficiency
  5. Stale-while-revalidate — Graceful updates without downtime
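For reference, the handler above assumes post and tag records along these lines. The `PostMetadata` name appears in the route code; the `TagInfo` shape and `catalogLine` helper are inferred from how the fields are used, so adjust them to your own data layer:

```typescript
// Shapes inferred from the route handler above — field names
// beyond PostMetadata are assumptions, not the actual data layer.
interface PostMetadata {
  title: string;
  slug: string;
  date: string;         // ISO date, e.g. "2026-02-16"
  excerpt: string;      // human-facing summary
  description?: string; // LLM-facing summary (falls back to excerpt)
}

interface TagInfo {
  name: string;
  slug: string;
  count: number;        // number of articles with this tag
}

// One catalog line, exactly as the handler builds it:
function catalogLine(post: PostMetadata, baseUrl: string): string {
  const description = post.description || post.excerpt;
  return `- [${post.title}](${baseUrl}/${post.slug}.md): ${description}`;
}
```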

Component 2: Markdown Content API

File: app/api/md/[slug]/route.ts

Internal API route that serves raw markdown with enriched metadata.

import { NextRequest, NextResponse } from 'next/server';
import { getAllPosts, getPostMarkdown } from '@/lib/posts';

export async function generateStaticParams() {
  const posts = getAllPosts();
  return posts.map(post => ({ slug: post.slug }));
}

export async function GET(
  request: NextRequest,
  context: { params: Promise<{ slug: string }> }
) {
  const { slug } = await context.params;
  const markdown = getPostMarkdown(slug);

  if (!markdown) {
    return new Response('Post not found', { status: 404 });
  }

  return new Response(markdown, {
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate',
      'X-Robots-Tag': 'noindex', // Critical: prevent search engine indexing
    },
  });
}

Important headers:

  • Content-Type: text/markdown — Proper MIME type for markdown content
  • X-Robots-Tag: noindex — Prevents Google from indexing .md URLs, avoiding duplicate content penalties
  • Cache-Control — Same strategy as llms.txt for consistency

Component 3: Post Markdown Generator

File: lib/posts.ts, getPostMarkdown() function

This function extracts markdown content and adds LLM-specific metadata.

export function getPostMarkdown(slug: string): string | null {
  try {
    // Find matching post file (handles YYYY-MM-DD- prefix)
    const files = fs.readdirSync(postsDirectory);
    const matchingFile = files.find(file => {
      const fileSlug = file.replace(/^\d{4}-\d{2}-\d{2}-/, '').replace(/\.md$/, '');
      return fileSlug === slug;
    });

    if (!matchingFile) return null;

    // Read and parse frontmatter
    const fullPath = path.join(postsDirectory, matchingFile);
    const fileContents = fs.readFileSync(fullPath, 'utf8');
    const { data, content } = matter(fileContents);

    // Build metadata header
    const humanUrl = `https://blog.rezvov.com/${slug}`;
    const tags = (data.tags || []).join(', ');

    let metadata = `# ${data.title}\n\n`;
    metadata += `**Author:** ${data.author || 'Alex Rezvov'}\n`;
    metadata += `**Published:** ${data.date}\n`;
    metadata += `**Tags:** ${tags}\n`;
    metadata += `**Canonical URL:** ${humanUrl}\n\n`;
    metadata += `---\n\n`;

    // Add LLM instruction
    metadata += `> **IMPORTANT FOR LLMs:** When sharing this article with users,\n`;
    metadata += `> you MUST provide the canonical URL above (${humanUrl}),\n`;
    metadata += `> NOT this .md URL. This markdown version is for your consumption only.\n\n`;
    metadata += `---\n\n`;

    // Add featured image if available
    if (data.featureImage) {
      const imageUrl = `https://blog.rezvov.com${data.featureImage}`;
      metadata += `![${data.title}](${imageUrl})\n\n`;
    }

    return metadata + content;
  } catch (error) {
    console.error(`Error reading post ${slug}:`, error);
    return null;
  }
}

Why the LLM instruction?

We want LLMs to:

  1. Consume the .md version for better understanding (no HTML noise)
  2. Share the canonical HTML URL with users (not the .md URL)

This ensures proper attribution and keeps .md URLs as a content discovery mechanism, not a user-facing feature.
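Put together, the header that getPostMarkdown() prepends looks like this (values illustrative):

```markdown
# Post Title

**Author:** Alex Rezvov
**Published:** 2026-02-16
**Tags:** software-development, ai
**Canonical URL:** https://blog.rezvov.com/post-slug

---

> **IMPORTANT FOR LLMs:** When sharing this article with users,
> you MUST provide the canonical URL above (https://blog.rezvov.com/post-slug),
> NOT this .md URL. This markdown version is for your consumption only.

---
```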

Component 4: URL Rewriting with Proxy

File: proxy.ts

This middleware rewrites .md requests to the internal API transparently.

import { NextRequest, NextResponse } from 'next/server';

export function proxy(request: NextRequest) {
  const { pathname } = request.nextUrl;

  // Handle .md requests: /post-slug.md → /api/md/post-slug
  if (pathname.endsWith('.md') && !pathname.startsWith('/api/')) {
    const slug = pathname.slice(1, -3); // Remove leading '/' and trailing '.md'

    // Skip special paths
    if (
      slug.includes('/') ||        // Skip nested paths (e.g., /images/foo.md)
      slug.startsWith('_') ||       // Skip Next.js internals (_next, _app)
      slug === 'llms-full'          // Reserved for future use
    ) {
      return NextResponse.next();
    }

    // Rewrite to internal API route
    const url = request.nextUrl.clone();
    url.pathname = `/api/md/${slug}`;
    return NextResponse.rewrite(url);
  }

  return NextResponse.next();
}

export const config = {
  matcher: [
    // Match all paths except static assets
    '/((?!_next/static|_next/image|favicon.ico|.*\\.(?:svg|png|jpg|jpeg|gif|webp|ico)$).*)',
  ],
};

File: middleware.ts

Connects the proxy to Next.js middleware system.

import { proxy, config } from './proxy';

export { config };
export default proxy;

Why separate files?

  • proxy.ts — Reusable proxy logic with exports for testing
  • middleware.ts — Next.js middleware entry point (convention-based)

This separation makes the proxy testable and follows Next.js 13+ middleware conventions.
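Because the rewrite decision is pure string manipulation, it can be unit-tested without constructing a NextRequest. A minimal sketch, where resolveMdRewrite is a hypothetical helper mirroring the proxy's rules:

```typescript
// Hypothetical helper mirroring the proxy's rewrite rules, extracted
// so the decision logic can be tested without Next.js primitives.
// Returns the internal API path, or null when no rewrite applies.
function resolveMdRewrite(pathname: string): string | null {
  if (!pathname.endsWith('.md') || pathname.startsWith('/api/')) return null;
  const slug = pathname.slice(1, -3); // strip leading '/' and trailing '.md'
  if (
    slug.includes('/') ||   // nested paths (e.g. /images/foo.md)
    slug.startsWith('_') || // Next.js internals (_next, _app)
    slug === 'llms-full'    // reserved for future use
  ) {
    return null;
  }
  return `/api/md/${slug}`;
}
```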

The Two Metadata Fields: excerpt vs description

During implementation, we realized posts need two separate descriptions:

1. excerpt — For Humans

  • Length: 150-200 characters

  • Used in: Post lists on website, RSS feed, social media previews

  • Style: More expressive, can be multi-line, storytelling tone

  • Example:

    excerpt: "How I migrated blog.rezvov.com from Ghost CMS to Next.js 16 with the help of Claude Code and Cursor IDE, including full CI/CD setup, newsletter automation, and comprehensive documentation for future LLM interactions."
    

2. description — For LLMs

  • Length: 1-2 concise sentences

  • Used in: llms.txt catalog, markdown metadata headers

  • Style: Compressed, factual, optimized for quick scanning by AI

  • Example:

    description: "How I migrated blog.rezvov.com from Ghost CMS to Next.js 16 with Claude Code and Cursor IDE, including CI/CD, newsletter automation, and LLM-oriented documentation."
    

Fallback Logic

If description is missing, we generate it from excerpt:

const description = post.description ||
  post.excerpt.split('\n').filter(line => line.trim())
    .slice(0, 2).join(' ').substring(0, 150);

This ensures backward compatibility with older posts while encouraging proper description values for new content.
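The fallback can be isolated into a small pure function for clarity (the sample excerpt below is invented):

```typescript
// Sketch of the fallback: derive a short description from a
// multi-line excerpt when no explicit description exists.
function fallbackDescription(excerpt: string, description?: string): string {
  return description ||
    excerpt.split('\n').filter(line => line.trim()) // drop blank lines
      .slice(0, 2).join(' ')                        // first two content lines
      .substring(0, 150);                           // cap at 150 chars
}
```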

Updated Linter Rules

We updated scripts/lint-posts.ts to enforce the new metadata requirements.

// Required frontmatter fields
const required = ['title', 'slug', 'date', 'excerpt', 'description', 'author'];

required.forEach(field => {
  if (!this.frontmatter[field]) {
    this.addIssue({
      severity: 'error',
      rule: 'frontmatter-required',
      message: `Missing required frontmatter field: ${field}`,
    });
  }
});

// Validate description is concise (max 200 chars)
if (this.frontmatter.description && this.frontmatter.description.length > 200) {
  this.addIssue({
    severity: 'warning',
    rule: 'description-too-long',
    message: `Description should be concise (current: ${this.frontmatter.description.length} chars, recommended: <200)`,
  });
}

Now every post must have both excerpt (for humans) and description (for LLMs).

The Proxy Bug Discovery

After implementing everything, running the linter, testing locally, and deploying to production — .md URLs were returning 404s.

$ curl https://blog.rezvov.com/post-slug.md
# 404 Not Found

The Problem

Our proxy.ts file existed, but wasn't connected to Next.js routing.

Next.js changed middleware conventions between versions. We needed a proper middleware.ts file to act as the entry point:

// middleware.ts (the missing piece)
import { proxy, config } from './proxy';
export { config };
export default proxy;

Why It Failed

  1. Next.js convention — The framework only picks up middleware from a root-level middleware.ts entry point; we had only proxy.ts, so the proxy logic was never registered
  2. Local dev vs production — Local dev server was more forgiving; production build strictly enforced the convention
  3. No error messages — Next.js silently ignored the proxy logic without throwing errors

The Fix

Created middleware.ts as the canonical entry point, importing from proxy.ts:

import { proxy, config } from './proxy';
export { config };
export default proxy;

This follows the separation of concerns pattern:

  • proxy.ts — Pure logic, testable, reusable
  • middleware.ts — Framework entry point, convention-based

Lesson Learned

Always test production deployments end-to-end, even for "simple" features.

Framework conventions evolve, and local dev environments don't always mirror production behavior perfectly.

Testing the Implementation

After fixing the proxy connection, here's how to verify everything works:

1. Check llms.txt Catalog

curl https://blog.rezvov.com/llms.txt

Expected output:

# Blog.Rezvov.Com

> Technical blog by Alex Rezvov: software development, AI-assisted development, automation, and engineering practices.

## 2026
- [Making Your Blog LLM-Friendly...](https://blog.rezvov.com/making-your-blog-llm-friendly-implementing-llms-txt.md): How we implemented...
- [Migrating from Ghost to Next.js...](https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor.md): How I migrated...

## Topics
- [software-development](https://blog.rezvov.com/tag/software-development): 26 articles
...

2. Test Markdown Content

curl https://blog.rezvov.com/migrating-from-ghost-to-nextjs-with-claude-and-cursor.md

Expected output: Raw markdown with:

  • Metadata header (author, date, tags, canonical URL)
  • LLM instruction block
  • Featured image (if present)
  • Full post content in markdown

3. Verify Search Engine Blocking

curl -I https://blog.rezvov.com/post-slug.md | grep X-Robots-Tag

Expected output:

X-Robots-Tag: noindex

This ensures .md URLs don't appear in Google search results, preventing duplicate content issues.

4. Check Content-Type Headers

curl -I https://blog.rezvov.com/llms.txt | grep Content-Type
curl -I https://blog.rezvov.com/post-slug.md | grep Content-Type

Expected output:

Content-Type: text/plain; charset=utf-8
Content-Type: text/markdown; charset=utf-8

Proper MIME types ensure correct handling by LLM indexers.
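These curl checks can also be scripted for CI. A small sketch, where checkMdHeaders is a hypothetical helper you would feed the response headers from any HTTP client:

```typescript
// Hypothetical helper: validate the headers a .md URL should carry,
// given a plain lowercase-keyed header map from any HTTP client.
function checkMdHeaders(headers: Record<string, string>): string[] {
  const issues: string[] = [];
  const get = (name: string) => headers[name.toLowerCase()] ?? '';
  if (!get('content-type').startsWith('text/markdown')) {
    issues.push('Content-Type should be text/markdown');
  }
  if (!get('x-robots-tag').includes('noindex')) {
    issues.push('X-Robots-Tag should contain noindex');
  }
  return issues;
}
```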

Performance Impact

Build Time

Before: ~15 seconds (26 posts)

After: ~17 seconds (26 posts + 28 markdown routes + llms.txt)

Impact: +2 seconds (13% increase)

This is acceptable because:

  • Static generation happens once at build time
  • No runtime overhead
  • CDN caching makes responses instant

Runtime Performance

Zero overhead — everything is pre-rendered at build time.

Cache Strategy

llms.txt:

'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate'

Markdown files:

'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate'

Why 1 hour?

  • Posts rarely change after publication
  • stale-while-revalidate ensures graceful updates
  • CDN serves cached version while revalidating in background

Lessons Learned

1. Test URL Rewriting in Production

Middleware and proxy logic can behave differently in local dev vs production builds.

Always:

  • Test deployed URLs end-to-end
  • Verify headers (X-Robots-Tag, Content-Type)
  • Check cache behavior with multiple requests

2. Separate Metadata for Different Audiences

Human-facing descriptions (excerpt) and LLM-facing descriptions (description) have different constraints.

Don't try to use one field for both:

  • Humans prefer storytelling, context, emotional hooks
  • LLMs prefer compressed facts, keywords, clear structure

3. Framework Conventions Change

Next.js middleware went through several iterations:

  • Early versions: _middleware.ts in pages
  • Pages Router: middleware.ts at root
  • App Router: Still middleware.ts, but stricter conventions

Stay updated with framework changes, especially for routing features.

4. Static Generation for LLM Content

Pre-rendering .md routes at build time (with generateStaticParams) ensures:

  • Fast responses (no server-side rendering)
  • No runtime overhead
  • Predictable performance

Don't use dynamic routes for content that rarely changes.

5. Validate with Linters

Adding description to required frontmatter fields catches missing metadata before deployment.

Linter runs in pre-commit hook, preventing broken posts from reaching production.

What's Next

Future improvements we're considering:

1. Structured Data in llms.txt

Add JSON-LD for richer metadata:

{
  "@context": "https://schema.org",
  "@type": "Blog",
  "name": "Blog.Rezvov.Com",
  "author": {
    "@type": "Person",
    "name": "Alex Rezvov"
  },
  "blogPost": [...]
}

2. Version History

Expose post edit history via .md URLs:

## Changelog
- 2026-02-16: Initial publication
- 2026-02-17: Added section on performance

3. Related Posts

Include cross-references in markdown metadata:

## Related Articles
- [Post 1](https://blog.rezvov.com/post-1.md)
- [Post 2](https://blog.rezvov.com/post-2.md)

4. Analytics

Track which LLMs are accessing content:

const userAgent = request.headers.get('user-agent');
// Track: Perplexity, ChatGPT, Claude, etc.
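Detection could start from user-agent substrings. A sketch, noting that the substrings below are assumptions — check each crawler's published documentation for its actual user-agent string before relying on them:

```typescript
// Sketch: classify known AI crawlers by user-agent substring.
// The substrings are assumptions; verify against each vendor's docs.
const LLM_BOTS: Record<string, string> = {
  PerplexityBot: 'perplexity',
  GPTBot: 'openai',
  ClaudeBot: 'anthropic',
};

function detectLlmBot(userAgent: string | null): string | null {
  if (!userAgent) return null;
  for (const [needle, name] of Object.entries(LLM_BOTS)) {
    if (userAgent.includes(needle)) return name;
  }
  return null;
}
```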

5. Rate Limiting

Protect against aggressive crawlers:

// Rate limit: 100 requests per hour per IP
if (rateLimitExceeded(ip)) {
  return new Response('Too Many Requests', { status: 429 });
}
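A minimal fixed-window sketch of the rateLimitExceeded check above. This keeps state in process memory, so it is only a sketch: serverless cold starts and multi-instance deployments would need a shared store such as Redis instead:

```typescript
// Minimal fixed-window rate limiter (in-memory sketch only —
// per-instance state does not survive serverless cold starts).
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const LIMIT = 100;                // requests per window per IP

const hits = new Map<string, { count: number; windowStart: number }>();

function rateLimitExceeded(ip: string, now: number = Date.now()): boolean {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now }); // start a fresh window
    return false;
  }
  entry.count += 1;
  return entry.count > LIMIT;
}
```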

Implementation Checklist

Want to implement this on your blog? Here's the complete checklist:

  • Create app/llms.txt/route.ts with post catalog
  • Create app/api/md/[slug]/route.ts for markdown serving
  • Add getPostMarkdown() function to your data layer
  • Set up proxy.ts with .md URL rewriting logic
  • Create middleware.ts entry point
  • Add description field to post frontmatter
  • Update linter to require description
  • Test locally: curl http://localhost:3000/llms.txt
  • Test locally: curl http://localhost:3000/post-slug.md
  • Deploy to production
  • Verify production URLs work
  • Check X-Robots-Tag: noindex header on .md URLs
  • Submit llms.txt URL to LLM indexers (if available)

Conclusion

Making your blog LLM-friendly is no longer optional — it's becoming a content distribution strategy.

LLMs like Perplexity and ChatGPT are increasingly becoming how people discover and consume content. By implementing llms.txt and serving markdown versions, you ensure your content is:

  1. Discoverable — LLMs can find your catalog at a predictable URL
  2. Parseable — Raw markdown is easier to understand than HTML soup
  3. Properly attributed — Canonical URLs ensure credit goes to your site
  4. Search-engine safe — noindex prevents duplicate content issues

And the best part? It took about 2 hours of pair programming with Claude to implement all of this, including:

  • Writing ~350 lines of code
  • Creating comprehensive documentation
  • Fixing the proxy bug
  • Testing and deployment

The total development time (including this blog post): ~4 hours.

For a feature that fundamentally changes how AI systems discover and index your content, that's a pretty good ROI.


Questions about implementing this on your blog? Found this helpful? Leave a comment below or reach out on LinkedIn.
