Most robots.txt files I've seen on Lovable apps accidentally block AI crawlers. Not intentionally. Usually it's a template someone copied years ago that has a blanket Disallow: /assets/ or a wildcard rule that was never updated to account for AI bots that didn't exist when the template was written.
The result is that GPTBot, PerplexityBot, ClaudeBot, and several other AI crawlers quietly bounce off your site and move on. No error. No warning. Your content just isn't there for them.
This article gives you the exact robots.txt to use on a Lovable app, with every major AI crawler user-agent listed. There's nothing complicated here. It's mostly a reference document you'll bookmark and come back to when you need it.
Why Your Lovable robots.txt Might Be Blocking AI Crawlers
robots.txt is an access control file. It tells crawlers which parts of your site they're allowed to visit. A compliant bot reads it before crawling anything else. If it sees a Disallow rule that covers a URL, it skips that URL entirely.
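The gate metaphor is easy to verify for yourself. Python's built-in urllib.robotparser applies the same matching logic a compliant bot does; here's a minimal sketch (example.com is a placeholder domain):

```python
from urllib import robotparser

# A robots.txt with one wildcard Disallow rule, parsed as a compliant bot would
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /assets/",
])

# Anything under /assets/ is skipped entirely; everything else stays reachable
print(rp.can_fetch("Googlebot", "https://example.com/assets/index-abc.js"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/pricing"))              # True
```

Any compliant crawler, AI or otherwise, resolves URLs against the file the same way before fetching anything.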
The most common way Lovable apps accidentally block AI crawlers is through a wildcard rule that looks like this:
User-agent: *
Disallow: /assets/
Disallow: /admin/
That's a reasonable-looking robots.txt. Block assets, block admin. Seems fine.
It isn't fine, for two reasons. First, blocking /assets/ tells Googlebot it can't access your JavaScript and CSS files. Google needs those files to render your React SPA. Without them, even Google's deferred JavaScript rendering pipeline fails. You're essentially asking Google to render a page while blocking the code that makes the page work.
Second, AI crawlers like GPTBot and ClaudeBot fall under the wildcard group whenever no group names them, so every wildcard Disallow rule applies to them too. And bots differ in how they interpret ambiguous files: some treat the absence of a rule covering a URL as permission to proceed; others are more conservative. The only safe approach is to be explicit.
Add explicit Allow: / blocks for every AI crawler you want to reach your content. This isn't optional if you care about AI citation.
The Complete List of AI Crawler User-Agents in 2026
Ten AI crawler user-agent strings matter for a Lovable app in 2026. Every major AI platform has at least one. Some have two.
| Bot Name | User-Agent String | Owner | Crawls For | Allow? |
|---|---|---|---|---|
| GPTBot | GPTBot | OpenAI | ChatGPT training + web search | Yes |
| ChatGPT-User | ChatGPT-User | OpenAI | Live ChatGPT browsing sessions | Yes |
| PerplexityBot | PerplexityBot | Perplexity | Perplexity search index | Yes |
| ClaudeBot | ClaudeBot | Anthropic | Claude training data | Yes |
| Google-Extended | Google-Extended | Google | AI Overviews + Gemini training | Yes |
| Applebot-Extended | Applebot-Extended | Apple | Apple AI features | Yes |
| FacebookBot | facebookexternalhit | Meta | Meta AI, link previews | Yes |
| CCBot | CCBot | Common Crawl | Open LLM training datasets | Yes |
| Bytespider | Bytespider | ByteDance | TikTok AI, Doubao | Yes |
| Diffbot | Diffbot | Diffbot | Knowledge graph AI | Yes |
There are more obscure AI crawlers, but these 10 cover the platforms that matter for AI citation in 2026. Allowing all of them takes 20 lines in your robots.txt.
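Since the per-bot blocks are identical in shape, they're easy to generate rather than hand-type. A throwaway sketch (the bot list mirrors the table above):

```python
# User-agent strings from the table above
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Applebot-Extended", "facebookexternalhit",
    "CCBot", "Bytespider", "Diffbot",
]

# Two meaningful lines per bot: the User-agent line and an explicit Allow
blocks = "\n\n".join(f"User-agent: {bot}\nAllow: /" for bot in AI_BOTS)
print(blocks)
```

Paste the output into your robots.txt below the wildcard block, and you've covered all ten platforms.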
What Each Crawler Does and Why You Should Allow It
GPTBot and ChatGPT-User are both OpenAI's. GPTBot is the background crawler that indexes the web for training data and enables ChatGPT's web search capability. ChatGPT-User is the agent triggered during a live browsing session when a user asks ChatGPT to search the web in real time. You need both allowed. Blocking GPTBot removes you from ChatGPT's index. Blocking ChatGPT-User means live ChatGPT queries can't retrieve your content even if you rank elsewhere.
PerplexityBot feeds Perplexity's search index, the one it uses to generate cited answers in real time. Perplexity is one of the fastest-growing AI search platforms in 2026. If you're not in its index, you're not in its answers. Perplexity is also one of the few AI platforms that actively reads llms.txt, so allowing PerplexityBot is just the first step of a two-part setup. See our llms.txt guide for the second part.
ClaudeBot is Anthropic's crawler. It indexes content that can inform Claude's responses and training. The user-agent string is literally ClaudeBot.
Google-Extended is a separate Google crawler introduced in 2023, distinct from Googlebot. It's used specifically for AI training and for generating content that appears in Google AI Overviews. Blocking it with a Disallow rule in your robots.txt means Google won't use your content in AI Overviews, even if your pages rank normally in standard search results. In 2026, AI Overviews appears on a significant portion of informational queries. Don't block it.
Applebot-Extended is Apple's AI-specific crawler, separate from the standard Applebot used for Apple Search. It feeds Siri, Apple Intelligence, and other Apple AI features.
CCBot is the Common Crawl crawler. Common Crawl is a non-profit that publishes a massive open dataset of web content, and that dataset is used to train a huge range of open-source and commercial LLMs. Allowing CCBot means your content can end up in training data for models you've never heard of, which expands your citation surface significantly.
Bytespider is ByteDance's crawler. It feeds TikTok's AI features, Doubao (ByteDance's LLM), and other AI products in their portfolio.
Diffbot builds a structured knowledge graph from web content and licenses it to AI developers. Many AI applications use Diffbot data as a structured content source rather than raw crawl data.
Free Tool
See how visible your Lovable app is to AI crawlers
The free Ranking Lens GEO analysis checks your robots.txt, crawlability, and AI citation signals in one scan.
The Perfect robots.txt for a Lovable App (Copy-Paste)
This is the complete recommended robots.txt for a Lovable app. Copy it, replace the sitemap URL with your own domain, and drop it in your /public/ folder.
# Allow all crawlers full access by default
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /dashboard/
Disallow: /api/
Disallow: /private/
# OpenAI - ChatGPT training and web search
User-agent: GPTBot
Allow: /
# OpenAI - Live ChatGPT browsing sessions
User-agent: ChatGPT-User
Allow: /
# Perplexity
User-agent: PerplexityBot
Allow: /
# Anthropic - Claude
User-agent: ClaudeBot
Allow: /
# Google AI Overviews and AI training (separate from Googlebot)
User-agent: Google-Extended
Allow: /
# Apple AI features
User-agent: Applebot-Extended
Allow: /
# Meta AI and link previews
User-agent: facebookexternalhit
Allow: /
# Common Crawl - trains many open-source LLMs
User-agent: CCBot
Allow: /
# ByteDance / TikTok AI
User-agent: Bytespider
Allow: /
# Diffbot knowledge graph
User-agent: Diffbot
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
A few things to notice about this file.
The wildcard block at the top includes Allow: / explicitly, with only specific admin and API paths disallowed. Notice that /assets/ is not in the Disallow list. Never disallow your assets directory on a Lovable app. That's where your JavaScript bundle lives, and Google needs it to render your SPA.
Each AI crawler gets its own block with an explicit Allow: /. This removes any ambiguity. Even if a bot is stricter about wildcard interpretation, the explicit block makes your intent clear. One side effect to know about: under the robots exclusion standard, a bot that matches a dedicated group follows only that group and ignores the wildcard rules, so if you also want these AI bots kept out of /admin/ and /api/, repeat those Disallow lines inside each bot's block.
The Sitemap directive at the bottom tells every crawler where your sitemap lives. Always include it.
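Python's urllib.robotparser can demonstrate the group-matching behavior directly. A stripped-down sketch (example.com is a placeholder; note how GPTBot's dedicated group supersedes the wildcard rules):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
""".splitlines())

# A bot with no dedicated group falls back to the wildcard rules
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/settings"))  # False

# GPTBot matches its own group, which says only Allow: / -- the wildcard
# Disallow does not carry over, so /admin/ is open to it
print(rp.can_fetch("GPTBot", "https://example.com/admin/settings"))        # True
```

If that second result surprises you, that's the point: repeat any Disallow lines you care about inside each bot's dedicated block.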
Common robots.txt Mistakes on Lovable Projects
The /assets/ mistake is the worst offender. Lovable bundles your JavaScript into /assets/, and if Googlebot can't fetch that directory, it can't render your SPA. At all. Not even with its deferred rendering pipeline. The rendered content it stores will be the empty <div id="root"></div>. You can check whether this is happening by using the URL Inspection tool in Google Search Console and looking at the rendered screenshot. If you see a blank page, blocked resources are probably the cause.
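You can approximate the blank-page check without Search Console by measuring how much visible text the raw HTML actually carries. A heuristic sketch (the 20-word threshold and the sample markup are made up for illustration):

```python
import re

def looks_like_empty_spa_shell(html: str, min_words: int = 20) -> bool:
    """Heuristic: true when the HTML carries almost no visible text."""
    # Drop script and style blocks, which contribute no visible content
    stripped = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    # Drop the remaining tags, leaving only text nodes
    text = re.sub(r"<[^>]+>", " ", stripped)
    return len(text.split()) < min_words

# An unrendered Lovable shell: an empty root div plus a JS bundle reference
shell = '<html><body><div id="root"></div><script src="/assets/index-abc.js"></script></body></html>'
print(looks_like_empty_spa_shell(shell))  # True
```

Run it against the HTML that `curl https://yourdomain.com/` returns; a True result means crawlers that don't execute JavaScript see essentially nothing.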
Another common mistake is using a template robots.txt that has Disallow: / for specific bots without understanding what those bots are. Some older templates blocklist scrapers, and some of those scraper user-agents overlap with AI crawlers. If you see lines like User-agent: ia_archiver followed by Disallow: /, that's blocking the Internet Archive's crawler. Not usually a problem for SEO, but it illustrates the risk: robots.txt files accumulate rules over time, and it's worth reading yours from scratch.
Missing the Sitemap directive is another oversight that slows things down. Your sitemap tells crawlers what pages exist. Without it, bots have to discover pages by following links, which is slower and less complete on a Lovable SPA where internal navigation is JavaScript-driven.
Forgetting ChatGPT-User when adding GPTBot is a subtle gap. Both user-agents are from OpenAI, but they serve different functions. If you copy a robots.txt that only mentions GPTBot, live ChatGPT browsing sessions won't have explicit permission. Add both.
Finally, there's putting robots.txt in the wrong location. On a Lovable app, it must sit in the /public/ folder so it serves at yourdomain.com/robots.txt. A file placed anywhere else won't be served at that URL, and crawlers only ever look there.
How to Test Your robots.txt Is Working
Google Search Console includes a robots.txt report. Go to Settings in your Search Console property and open the robots.txt report to confirm Google has fetched your file, see when it last crawled it, and catch any parse errors. (Google retired its standalone interactive robots.txt tester in late 2023, so per-user-agent testing now happens from the command line or with a parser library rather than inside Search Console.)
The command-line method is faster for quick checks. Run this in your terminal:
curl -A "GPTBot" https://yourdomain.com/robots.txt
This fetches your robots.txt as if you were GPTBot. You'll see the file content returned. If you get a 404 or the file isn't there, crawlers are also getting a 404, which most bots interpret as "allow all" (since no restrictions are stated). That's okay but not ideal. You always want an explicit, well-structured file rather than relying on the absence of one.
To test whether a specific path is allowed for a specific bot, use a robots.txt parser: Google has open-sourced its production robots.txt parser on GitHub, and Python ships one in the standard library (urllib.robotparser). Either will tell you exactly how any URL resolves against any user-agent string.
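A quick way to audit all ten AI bots at once is a short loop with Python's urllib.robotparser. A sketch (the inline rules stand in for your real file; for a live check, swap in the commented set_url/read lines, with yourdomain.com as a placeholder):

```python
from urllib import robotparser

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Applebot-Extended", "facebookexternalhit",
    "CCBot", "Bytespider", "Diffbot",
]

rp = robotparser.RobotFileParser()
# Live check against a deployed site:
#   rp.set_url("https://yourdomain.com/robots.txt")
#   rp.read()
rp.parse("""\
User-agent: *
Allow: /
Disallow: /admin/
""".splitlines())

for bot in AI_BOTS:
    verdict = "allowed" if rp.can_fetch(bot, "https://yourdomain.com/") else "BLOCKED"
    print(f"{bot}: {verdict}")
```

Any BLOCKED line in the output is a bot that's bouncing off your site today.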
Cloudflare's firewall rules can sometimes block crawlers before they reach your robots.txt, independently of your robots.txt content. If you're using Cloudflare and bots seem to be blocked despite a correct robots.txt, check whether any WAF rules or bot management settings are blocking non-browser user-agents. This is a separate layer from robots.txt and needs to be checked independently.
After any robots.txt change, submit your sitemap via Google Search Console to trigger fresh crawling activity. Don't wait for bots to discover the change on their own.
Free Tool
Run a free Lovable SEO and GEO audit
Ranking Lens checks robots.txt, crawlability, AI visibility, and structured data for your Lovable app in one report.
robots.txt vs llms.txt: What's the Difference
robots.txt and llms.txt are both files that live at the root of your domain and both relate to how bots interact with your site. They do very different things.
robots.txt is a gate. It controls access. A Disallow rule in robots.txt is enforced by compliant bots. If you disallow a URL, a well-behaved crawler won't fetch it. Period. It's the tool for access control.
llms.txt is a tour guide. It doesn't grant or restrict access to anything. Instead, it highlights which pages on your site are most valuable, in a format that AI systems can read and act on. Perplexity actively reads llms.txt and uses it to weight citation priority. Other platforms are moving toward adopting it.
For a Lovable app, you need both files.
robots.txt ensures no AI crawler is accidentally blocked, which is the foundation. llms.txt then directs AI attention to your best pages once access is confirmed. A site with a good llms.txt but a broken robots.txt is a locked door with a beautiful welcome sign in front of it. A site with correct robots.txt but no llms.txt is an open door with no signage inside.
The two files complement each other. Neither replaces the other.
Our llms.txt guide covers the exact format, what to include, and how to implement it in a Lovable project specifically. If you've sorted your robots.txt and want to take the next step on AI visibility, that's where to go.
One practical note: if you haven't fixed the underlying JavaScript rendering problem on your Lovable app, robots.txt and llms.txt alone won't make you visible to AI crawlers. AI bots fetch raw HTML and don't execute JavaScript. A correctly configured robots.txt with no /assets/ block is necessary, but your content still needs to exist in the initial HTML response. The Lovable SEO guide covers the pre-rendering and Cloudflare Worker approaches that actually solve that problem.
Useful Resources
- Ranking Lens Free SEO and GEO Analysis: Scan your Lovable app for robots.txt issues, crawlability gaps, and AI visibility problems in one free report.
- Ranking Lens GEO Basics Guide: AI visibility fundamentals including robots.txt configuration, llms.txt, and structured data for AI citation.
- Google Search Console: Test your robots.txt, inspect individual URLs, and monitor which crawlers are accessing your Lovable site.
- Lovable SEO Guide: Fix the underlying SPA crawling problem with pre-rendering and Cloudflare Workers, the foundation that makes robots.txt configuration meaningful.
- llms.txt Guide: Create your llms.txt file to direct AI crawler attention to your best Lovable pages after fixing access controls.