Most robots.txt files I've seen on Lovable apps accidentally block AI crawlers. Not intentionally. Usually it's a template someone copied years ago that has a blanket Disallow: /assets/ or a wildcard rule that was never updated to account for AI bots that didn't exist when the template was written.
The result is that GPTBot, PerplexityBot, ClaudeBot, and several other AI crawlers quietly bounce off your site and move on. No error. No warning. Your content just isn't there for them.
This article gives you the exact robots.txt to use on a Lovable app, with every major AI crawler user-agent listed. There's nothing complicated here. It's mostly a reference document you'll bookmark and come back to when you need it.
Why Your Lovable robots.txt Might Be Blocking AI Crawlers
robots.txt is an access control file. It tells crawlers which parts of your site they're allowed to visit. A compliant bot reads it before crawling anything else. If it sees a Disallow rule that covers a URL, it skips that URL entirely.
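The gate metaphor is easy to verify for yourself. Python's built-in urllib.robotparser applies the same matching logic a compliant bot does; here's a minimal sketch (example.com is a placeholder domain):

```python
from urllib import robotparser

# A robots.txt with one wildcard Disallow rule, parsed as a compliant bot would
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /assets/",
])

# Anything under /assets/ is skipped entirely; everything else stays reachable
print(rp.can_fetch("Googlebot", "https://example.com/assets/index-abc.js"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/pricing"))              # True
```

Any compliant crawler, AI or otherwise, resolves URLs against the file the same way before fetching anything.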
The most common way Lovable apps accidentally block AI crawlers is through a wildcard rule that looks like this:
User-agent: *
Disallow: /assets/
Disallow: /admin/
That's a reasonable-looking robots.txt. Block assets, block admin. Seems fine.
It isn't fine, for two reasons. First, blocking /assets/ tells Googlebot it can't access your JavaScript and CSS files. Google needs those files to render your React SPA. Without them, even Google's deferred JavaScript rendering pipeline fails. You're essentially asking Google to render a page while blocking the code that makes the page work.
Second, AI crawlers like GPTBot and ClaudeBot fall under the wildcard group whenever no group names them, so every wildcard Disallow rule applies to them too. And bots differ in how they interpret ambiguous files: some treat the absence of a rule covering a URL as permission to proceed; others are more conservative. The only safe approach is to be explicit.
Add explicit Allow: / blocks for every AI crawler you want to reach your content. This isn't optional if you care about AI citation.
The Complete List of AI Crawler User-Agents in 2026
Ten AI crawler user-agent strings matter for a Lovable app in 2026. Every major AI platform has at least one. Some have two.
| Bot Name | User-Agent String | Owner | Crawls For | Allow? |
|---|---|---|---|---|
| GPTBot | GPTBot | OpenAI | ChatGPT training + web search | Yes |
| ChatGPT-User | ChatGPT-User | OpenAI | Live ChatGPT browsing sessions | Yes |
| PerplexityBot | PerplexityBot | Perplexity | Perplexity search index | Yes |
| ClaudeBot | ClaudeBot | Anthropic | Claude training data | Yes |
| Google-Extended | Google-Extended | Google | AI Overviews + Gemini training | Yes |
| Applebot-Extended | Applebot-Extended | Apple | Apple AI features | Yes |
| FacebookBot | facebookexternalhit | Meta | Meta AI, link previews | Yes |
| CCBot | CCBot | Common Crawl | Open LLM training datasets | Yes |
| Bytespider | Bytespider | ByteDance | TikTok AI, Doubao | Yes |
| Diffbot | Diffbot | Diffbot | Knowledge graph AI | Yes |
There are more obscure AI crawlers, but these 10 cover the platforms that matter for AI citation in 2026. Allowing all of them takes 20 lines in your robots.txt.
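Since the per-bot blocks are identical in shape, they're easy to generate rather than hand-type. A throwaway sketch (the bot list mirrors the table above):

```python
# User-agent strings from the table above
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Applebot-Extended", "facebookexternalhit",
    "CCBot", "Bytespider", "Diffbot",
]

# Two meaningful lines per bot: the User-agent line and an explicit Allow
blocks = "\n\n".join(f"User-agent: {bot}\nAllow: /" for bot in AI_BOTS)
print(blocks)
```

Paste the output into your robots.txt below the wildcard block, and you've covered all ten platforms.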
What Each Crawler Does and Why You Should Allow It
GPTBot and ChatGPT-User are both OpenAI's. GPTBot is the background crawler that indexes the web for training data and enables ChatGPT's web search capability. ChatGPT-User is the agent triggered during a live browsing session when a user asks ChatGPT to search the web in real time. You need both allowed. Blocking GPTBot removes you from ChatGPT's index. Blocking ChatGPT-User means live ChatGPT queries can't retrieve your content even if you rank elsewhere.
PerplexityBot feeds Perplexity's search index, the one it uses to generate cited answers in real time. Perplexity is one of the fastest-growing AI search platforms in 2026. If you're not in its index, you're not in its answers. Perplexity is also one of the few AI platforms that actively reads llms.txt, so allowing PerplexityBot is just the first step of a two-part setup. See our llms.txt guide for the second part.
ClaudeBot is Anthropic's crawler. It indexes content that can inform Claude's responses and training. The user-agent string is literally ClaudeBot.
Google-Extended is a separate Google crawler introduced in 2023, distinct from Googlebot. It's used specifically for AI training and for generating content that appears in Google AI Overviews. Blocking it with a Disallow rule in your robots.txt means Google won't use your content in AI Overviews, even if your pages rank normally in standard search results. In 2026, AI Overviews appears on a significant portion of informational queries. Don't block it.
Applebot-Extended is Apple's AI-specific crawler, separate from the standard Applebot used for Apple Search. It feeds Siri, Apple Intelligence, and other Apple AI features.
CCBot is the Common Crawl crawler. Common Crawl is a non-profit that publishes a massive open dataset of web content, and that dataset is used to train a huge range of open-source and commercial LLMs. Allowing CCBot means your content can end up in training data for models you've never heard of, which expands your citation surface significantly.
Bytespider is ByteDance's crawler. It feeds TikTok's AI features, Doubao (ByteDance's LLM), and other AI products in their portfolio.
Diffbot builds a structured knowledge graph from web content and licenses it to AI developers. Many AI applications use Diffbot data as a structured content source rather than raw crawl data.
Free Tool
See how visible your Lovable app is to AI crawlers
The free Ranking Lens GEO analysis checks your robots.txt, crawlability, and AI citation signals in one scan.
The Perfect robots.txt for a Lovable App (Copy-Paste)
This is the complete recommended robots.txt for a Lovable app. Copy it, replace the sitemap URL with your own domain, and drop it in your /public/ folder.
# Allow all crawlers full access by default
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /dashboard/
Disallow: /api/
Disallow: /private/
# OpenAI - ChatGPT training and web search
User-agent: GPTBot
Allow: /
# OpenAI - Live ChatGPT browsing sessions
User-agent: ChatGPT-User
Allow: /
# Perplexity
User-agent: PerplexityBot
Allow: /
# Anthropic - Claude
User-agent: ClaudeBot
Allow: /
# Google AI Overviews and AI training (separate from Googlebot)
User-agent: Google-Extended
Allow: /
# Apple AI features
User-agent: Applebot-Extended
Allow: /
# Meta AI and link previews
User-agent: facebookexternalhit
Allow: /
# Common Crawl - trains many open-source LLMs
User-agent: CCBot
Allow: /
# ByteDance / TikTok AI
User-agent: Bytespider
Allow: /
# Diffbot knowledge graph
User-agent: Diffbot
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
A few things to notice about this file.
The wildcard block at the top includes Allow: / explicitly, with only specific admin and API paths disallowed. Notice that /assets/ is not in the Disallow list. Never disallow your assets directory on a Lovable app. That's where your JavaScript bundle lives, and Google needs it to render your SPA.
Each AI crawler gets its own block with an explicit Allow: /. This removes any ambiguity. Even if a bot is stricter about wildcard interpretation, the explicit block makes your intent clear. One side effect to know about: under the robots exclusion standard, a bot that matches a dedicated group follows only that group and ignores the wildcard rules, so if you also want these AI bots kept out of /admin/ and /api/, repeat those Disallow lines inside each bot's block.
The Sitemap directive at the bottom tells every crawler where your sitemap lives. Always include it.
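Python's urllib.robotparser can demonstrate the group-matching behavior directly. A stripped-down sketch (example.com is a placeholder; note how GPTBot's dedicated group supersedes the wildcard rules):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
""".splitlines())

# A bot with no dedicated group falls back to the wildcard rules
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/settings"))  # False

# GPTBot matches its own group, which says only Allow: / -- the wildcard
# Disallow does not carry over, so /admin/ is open to it
print(rp.can_fetch("GPTBot", "https://example.com/admin/settings"))        # True
```

If that second result surprises you, that's the point: repeat any Disallow lines you care about inside each bot's dedicated block.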
Common robots.txt Mistakes on Lovable Projects
The /assets/ mistake is the worst offender. Lovable bundles your JavaScript into /assets/, and if Googlebot can't fetch that directory, it can't render your SPA. At all. Not even with its deferred rendering pipeline. The rendered content it stores will be the empty <div id="root"></div>. You can check whether this is happening by using the URL Inspection tool in Google Search Console and looking at the rendered screenshot. If you see a blank page, blocked resources are probably the cause.
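You can approximate the blank-page check without Search Console by measuring how much visible text the raw HTML actually carries. A heuristic sketch (the 20-word threshold and the sample markup are made up for illustration):

```python
import re

def looks_like_empty_spa_shell(html: str, min_words: int = 20) -> bool:
    """Heuristic: true when the HTML carries almost no visible text."""
    # Drop script and style blocks, which contribute no visible content
    stripped = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    # Drop the remaining tags, leaving only text nodes
    text = re.sub(r"<[^>]+>", " ", stripped)
    return len(text.split()) < min_words

# An unrendered Lovable shell: an empty root div plus a JS bundle reference
shell = '<html><body><div id="root"></div><script src="/assets/index-abc.js"></script></body></html>'
print(looks_like_empty_spa_shell(shell))  # True
```

Run it against the HTML that `curl https://yourdomain.com/` returns; a True result means crawlers that don't execute JavaScript see essentially nothing.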
Another common mistake is using a template robots.txt that has Disallow: / for specific bots without understanding what those bots are. Some older templates blocklist scrapers, and some of those scraper user-agents overlap with AI crawlers. If you see lines like User-agent: ia_archiver followed by Disallow: /, that's blocking the Internet Archive's crawler. Not usually a problem for SEO, but it illustrates the risk: robots.txt files accumulate rules over time, and it's worth reading yours from scratch.
Missing the Sitemap directive is another oversight that slows things down. Your sitemap tells crawlers what pages exist. Without it, bots have to discover pages by following links, which is slower and less complete on a Lovable SPA where internal navigation is JavaScript-driven.
Forgetting ChatGPT-User when adding GPTBot is a subtle gap. Both user-agents are from OpenAI, but they serve different functions. If you copy a robots.txt that only mentions GPTBot, live ChatGPT browsing sessions won't have explicit permission. Add both.
Finally, there's putting robots.txt in the wrong location. On a Lovable app, it must sit in the /public/ folder so it serves at yourdomain.com/robots.txt. A file placed anywhere else won't be served at that URL, and crawlers only ever look there.
How to Test Your robots.txt Is Working
Google Search Console includes a robots.txt report. Go to Settings in your Search Console property and open the robots.txt report to confirm Google has fetched your file, see when it last crawled it, and catch any parse errors. (Google retired its standalone interactive robots.txt tester in late 2023, so per-user-agent testing now happens from the command line or with a parser library rather than inside Search Console.)
The command-line method is faster for quick checks. Run this in your terminal:
curl -A "GPTBot" https://yourdomain.com/robots.txt
This fetches your robots.txt as if you were GPTBot. You'll see the file content returned. If you get a 404 or the file isn't there, crawlers are also getting a 404, which most bots interpret as "allow all" (since no restrictions are stated). That's okay but not ideal. You always want an explicit, well-structured file rather than relying on the absence of one.
To test whether a specific path is allowed for a specific bot, use a robots.txt parser: Google has open-sourced its production robots.txt parser on GitHub, and Python ships one in the standard library (urllib.robotparser). Either will tell you exactly how any URL resolves against any user-agent string.
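A quick way to audit all ten AI bots at once is a short loop with Python's urllib.robotparser. A sketch (the inline rules stand in for your real file; for a live check, swap in the commented set_url/read lines, with yourdomain.com as a placeholder):

```python
from urllib import robotparser

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Applebot-Extended", "facebookexternalhit",
    "CCBot", "Bytespider", "Diffbot",
]

rp = robotparser.RobotFileParser()
# Live check against a deployed site:
#   rp.set_url("https://yourdomain.com/robots.txt")
#   rp.read()
rp.parse("""\
User-agent: *
Allow: /
Disallow: /admin/
""".splitlines())

for bot in AI_BOTS:
    verdict = "allowed" if rp.can_fetch(bot, "https://yourdomain.com/") else "BLOCKED"
    print(f"{bot}: {verdict}")
```

Any BLOCKED line in the output is a bot that's bouncing off your site today.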
Cloudflare's firewall rules can sometimes block crawlers before they reach your robots.txt, independently of your robots.txt content. If you're using Cloudflare and bots seem to be blocked despite a correct robots.txt, check whether any WAF rules or bot management settings are blocking non-browser user-agents. This is a separate layer from robots.txt and needs to be checked independently.
After any robots.txt change, submit your sitemap via Google Search Console to trigger fresh crawling activity. Don't wait for bots to discover the change on their own.
Free Tool
Run a free Lovable SEO and GEO audit
Ranking Lens checks robots.txt, crawlability, AI visibility, and structured data for your Lovable app in one report.
robots.txt vs llms.txt: What's the Difference
robots.txt and llms.txt are both files that live at the root of your domain and both relate to how bots interact with your site. They do very different things.
robots.txt is a gate. It controls access. A Disallow rule in robots.txt is enforced by compliant bots. If you disallow a URL, a well-behaved crawler won't fetch it. Period. It's the tool for access control.
llms.txt is a tour guide. It doesn't grant or restrict access to anything. Instead, it highlights which pages on your site are most valuable, in a format that AI systems can read and act on. Perplexity actively reads llms.txt and uses it to weight citation priority. Other platforms are moving toward adopting it.
For a Lovable app, you need both files.
robots.txt ensures no AI crawler is accidentally blocked, which is the foundation. llms.txt then directs AI attention to your best pages once access is confirmed. A site with a good llms.txt but a broken robots.txt is a locked door with a beautiful welcome sign in front of it. A site with correct robots.txt but no llms.txt is an open door with no signage inside.
The two files complement each other. Neither replaces the other.
Our llms.txt guide covers the exact format, what to include, and how to implement it in a Lovable project specifically. If you've sorted your robots.txt and want to take the next step on AI visibility, that's where to go.
One practical note: if you haven't fixed the underlying JavaScript rendering problem on your Lovable app, robots.txt and llms.txt alone won't make you visible to AI crawlers. AI bots fetch raw HTML and don't execute JavaScript. A correctly configured robots.txt with no /assets/ block is necessary, but your content still needs to exist in the initial HTML response. The Lovable SEO guide covers the pre-rendering and Cloudflare Worker approaches that actually solve that problem.
Useful Resources
- Ranking Lens Free SEO and GEO Analysis: Scan your Lovable app for robots.txt issues, crawlability gaps, and AI visibility problems in one free report.
- Ranking Lens GEO Basics Guide: AI visibility fundamentals including robots.txt configuration, llms.txt, and structured data for AI citation.
- Google Search Console: Test your robots.txt, inspect individual URLs, and monitor which crawlers are accessing your Lovable site.
- Lovable SEO Guide: Fix the underlying SPA crawling problem with pre-rendering and Cloudflare Workers, the foundation that makes robots.txt configuration meaningful.
- llms.txt Guide: Create your llms.txt file to direct AI crawler attention to your best Lovable pages after fixing access controls.