Your robots.txt file might be silently blocking your website from appearing in ChatGPT, Perplexity, Claude, and other AI search results. If you go to yoursite.com/robots.txt right now and find lines like "Disallow: /" next to user agents like GPTBot, PerplexityBot, or ClaudeBot, that means AI crawlers are being told to stay away from your content. No crawling means no indexing. No indexing means you are invisible in AI search. The fix takes about 15 minutes, and the vast majority of businesses that have this problem do not even know it exists. One recent study found that 18.9% of websites actively block AI crawlers in their robots.txt, and many of those sites did not make that choice deliberately. It happened because a WordPress plugin, a security tool, or a hosting provider made the decision for them.
This guide walks you through exactly how to check your robots.txt for AI crawler blocks, understand what each AI bot does, and fix any issues with the right code. If you are not sure whether your site is visible to AI search tools, you will know within the next five minutes.
What robots.txt Actually Does (And Why AI Crawlers Care)
Before we get into the step-by-step, it helps to understand what robots.txt is and why it matters so much for AI search visibility.
The robots.txt file is a plain text file that lives at the root of your website. It has been a web standard since 1994. Its job is simple: tell web crawlers which parts of your site they can and cannot access. Every major search engine, from Google to Bing, respects robots.txt directives. When Googlebot sees a "Disallow" rule pointing to a specific URL path, it will not crawl that path.
AI companies follow the same convention. OpenAI, Anthropic, Perplexity, and Google all operate web crawlers that check your robots.txt before accessing your pages. If your robots.txt tells their bots to go away, they go away. Your content does not get crawled, does not get processed, and does not appear in AI-generated responses.
Here is where things get important for businesses in 2026: AI search is no longer a niche channel. ChatGPT has over 800 million weekly active users. Perplexity processes more than 200 million queries per week. Google AI Overviews appear in roughly 60% of Google search results. Claude is widely used by professionals and researchers who rely on its web search for sourced, cited answers.
If your robots.txt is blocking AI crawlers, you are cutting yourself off from all of these channels. And unlike traditional SEO, where being blocked from Google is an obvious disaster that someone would notice quickly, being blocked from AI search often flies completely under the radar. Traffic from AI citations does not always show up cleanly in analytics. Many businesses have no idea they are being excluded.
The AI Crawlers You Need to Know About
Not all AI crawlers do the same thing. Understanding the difference between them is critical before you start editing your robots.txt, because blocking the wrong one (or the right one accidentally) has real consequences.
GPTBot (OpenAI)
GPTBot is OpenAI's crawler for collecting training data. When GPTBot crawls your site, it is gathering content that may be used to train future versions of GPT models. Allowing GPTBot means your content can influence ChatGPT's general knowledge, its "parametric memory" of the world. If your content is part of GPT's training data, ChatGPT may reference your brand, products, or expertise even in conversations where it does not perform a live web search.
Some businesses choose to block GPTBot because they do not want their content used for model training. That is a legitimate choice. But it is important to understand what you are giving up: influence over what ChatGPT "knows" about your industry.
OAI-SearchBot (OpenAI)
This is the one that directly impacts whether you show up in ChatGPT Search results. OAI-SearchBot is the crawler that retrieves content for real-time search queries. When a ChatGPT user asks a question that triggers a web search, OAI-SearchBot is what fetches the pages that ChatGPT will cite in its response.
Here is the key distinction that trips people up: you can block GPTBot while allowing OAI-SearchBot, or vice versa. They are separate user agents with separate purposes. If you want to appear in ChatGPT's real-time search results but do not want your content used for training, you would block GPTBot and allow OAI-SearchBot. If you want the opposite, you would do the reverse.
Most businesses that care about AI visibility will want to allow both, but the option to separate training from search is there if you need it.
PerplexityBot (Perplexity)
PerplexityBot is Perplexity's web crawler. Perplexity is a pure AI search engine, so its crawler serves a single purpose: retrieving content for real-time answers. When someone asks Perplexity a question, the system searches the web, reads the content PerplexityBot has access to, and generates a cited answer.
Perplexity typically shows 5 to 6 source citations per response, with the cited pages displayed prominently in a sidebar. These citations are highly clickable compared to other AI search tools, which makes Perplexity visibility particularly valuable for driving referral traffic. Blocking PerplexityBot means none of your pages will appear in Perplexity answers.
ClaudeBot (Anthropic)
ClaudeBot is Anthropic's crawler for Claude. When Claude uses its web search capability during conversations, ClaudeBot is what fetches the content. Claude tends to cite fewer sources per response but weights authority and quality heavily. Getting cited by Claude often means your content is seen as genuinely authoritative on a topic.
Claude's user base skews toward professionals, researchers, and technical audiences. For B2B companies, SaaS providers, and anyone targeting informed decision-makers, Claude citations can be disproportionately valuable relative to the overall user count.
Google-Extended (Google)
Google-Extended is the control token that determines whether content Google crawls from your site can be used to train Google's Gemini models and to ground Gemini's answers. It is not a separate crawler: Googlebot does the actual fetching, and Google-Extended governs how that fetched content may be used. Blocking Google-Extended will not affect your regular Google search rankings. Note, too, that Google treats AI Overviews as a Search feature governed by Googlebot, so blocking Google-Extended does not by itself remove you from AI Overviews.
Even so, Gemini-powered features are spreading across Google's products, and blocking Google-Extended means your content cannot inform them. For most businesses focused on AI visibility, allowing it is the safer default.
Amazonbot (Amazon)
Amazonbot crawls the web for Amazon's Alexa and other AI services. While not as prominent as the others for search visibility, Amazonbot is worth allowing if you want comprehensive AI crawler coverage.
Meta-ExternalAgent (Meta)
Meta's AI crawler supports Meta AI, which is integrated across Facebook, Instagram, and WhatsApp. As Meta's AI features expand, this crawler becomes increasingly relevant for brands with audiences on Meta's platforms.
Step-by-Step: How to Check Your robots.txt for AI Crawler Blocks
Here is the exact process to audit your robots.txt and identify any AI crawler blocks. This should take you less than five minutes.
Step 1: Open Your robots.txt File
Go to your browser and type your domain followed by /robots.txt. For example:
https://yourdomain.com/robots.txt
You should see a plain text file. If you get a 404 error, your site does not have a robots.txt file, which actually means AI crawlers can access everything by default (no robots.txt is better than a restrictive one, at least for AI visibility). If you see the file, proceed to Step 2.
Step 2: Search for AI Bot Names
Look through the file for these specific user agent names:
- GPTBot (OpenAI training)
- OAI-SearchBot (ChatGPT real-time search)
- PerplexityBot (Perplexity AI search)
- ClaudeBot (Anthropic's Claude)
- Google-Extended (Gemini model training and grounding)
- Amazonbot (Amazon AI services)
- Meta-ExternalAgent (Meta AI)
If none of these names appear in your robots.txt, move to Step 4. If any of them appear, check what directive follows them.
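If you would rather script this scan than eyeball the file, here is a short Python sketch. The helper name find_ai_bots is hypothetical, not part of any library; it just does a case-insensitive search of the file's text for the bot names above:

```python
# Hypothetical helper: scan robots.txt text for the AI user agents listed above.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Amazonbot", "Meta-ExternalAgent",
]

def find_ai_bots(robots_text):
    """Return the AI bot names that appear anywhere in a robots.txt file."""
    lowered = robots_text.lower()
    return [bot for bot in AI_BOTS if bot.lower() in lowered]

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(find_ai_bots(sample))  # -> ['GPTBot']
```

A hit only tells you the bot is mentioned; you still need Step 3 to see whether the directive that follows it is an Allow or a Disallow.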
Step 3: Identify "Disallow" Rules
If you see any AI bot name followed by a "Disallow: /" rule, that crawler is completely blocked from your site. Here is what a block looks like:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
Each of those three-line blocks is telling the named bot: do not crawl anything on this site. If you see this pattern for any AI crawler, that tool cannot access your content.
Sometimes the block is less obvious. You might see partial disallow rules like:
User-agent: GPTBot
Disallow: /blog/
Disallow: /resources/
This blocks GPTBot from your blog and resources sections specifically, while still allowing access to the rest of your site. Partial blocks may be intentional, but make sure you understand what they are excluding before leaving them in place.
Step 4: Check for Blanket Wildcard Blocks
This is the step most people miss, and it is often the actual culprit. Look for a wildcard user agent rule that looks like this:
User-agent: *
Disallow: /
That two-line directive tells every crawler, including all AI bots, to stay away from your entire site. Even if you do not have specific GPTBot or ClaudeBot rules, this blanket block catches them all. The wildcard * matches any user agent that is not specifically named elsewhere in the file.
Here is a nuance that matters: robots.txt follows a specificity rule. If your file says:
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
Then Googlebot can crawl everything (because it has a specific rule), but every other crawler, including all AI bots, is blocked. This is a pattern you will see on many sites that were configured to only allow known search engine bots. It worked fine when Google and Bing were the only crawlers that mattered. It is a disaster for AI visibility.
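You can verify this behavior with Python's standard-library robots.txt parser. A minimal sketch (is_allowed is a hypothetical wrapper around urllib.robotparser):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_text, user_agent, url):
    """Check whether `user_agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)

rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Googlebot matches its own named group, so it is allowed;
# GPTBot has no named group and falls through to the wildcard block.
print(is_allowed(rules, "Googlebot", "https://example.com/blog/"))  # True
print(is_allowed(rules, "GPTBot", "https://example.com/blog/"))     # False
```

Running the same check against your own robots.txt text is a fast way to confirm whether a wildcard block is catching the AI crawlers you care about.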
Step 5: Identify What Put Those Rules There
If you found blocks you did not intentionally set, the next question is: where did they come from? The usual suspects are:
WordPress security plugins. Wordfence, Sucuri, iThemes Security, and similar plugins sometimes add restrictive robots.txt rules as part of their default security configuration. They treat unknown bots as potential threats and block them preemptively. AI crawlers, being relatively new, often end up on the block list.
WordPress themes. Some themes, particularly those focused on security or performance, modify robots.txt automatically. The site owner may never have opened the file or even known it existed.
Hosting providers. Certain hosting platforms, especially shared hosting environments, set default robots.txt rules that restrict access from non-standard bots. If your hosting provider manages your robots.txt at the server level, you may need to contact their support to make changes.
SEO plugins with overly aggressive defaults. Some SEO plugins include AI bot blocking as a toggle, and it may be turned on by default. Check your Yoast, Rank Math, or All in One SEO settings to see if there is an AI crawler option you did not know about.
Manual additions from years ago. If a developer or SEO consultant worked on your site in the past, they may have added blanket bot blocks as a precaution. Those rules may have been fine at the time but are now silently killing your AI search presence.
The robots.txt Code to Allow All AI Crawlers
Here is the exact code you need to add to your robots.txt to ensure all major AI crawlers can access your site. This is the simplest version that allows full access:
# AI Search Crawlers - Allow Full Access
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Meta-ExternalAgent
Allow: /
Add this to your existing robots.txt file. Do not replace your entire file with this code. Keep your existing rules for Googlebot, Bingbot, and any other crawlers you have already configured. Just add the AI crawler rules alongside them.
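One way to see which stanzas you still need is to diff your current file against the list above. A rough sketch (missing_allow_stanzas is a hypothetical helper; note it only looks for bots that are not mentioned at all, so a bot that is named with a Disallow still needs manual editing):

```python
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
           "ClaudeBot", "Google-Extended", "Amazonbot", "Meta-ExternalAgent"]

def missing_allow_stanzas(robots_text, bots=AI_BOTS):
    """Build Allow stanzas for any AI bots not yet named in robots_text."""
    lowered = robots_text.lower()
    missing = [bot for bot in bots if bot.lower() not in lowered]
    return "\n".join(f"User-agent: {bot}\nAllow: /\n" for bot in missing)

existing = """\
User-agent: GPTBot
Allow: /
"""
# Prints ready-to-paste stanzas for every bot the file does not mention yet.
print(missing_allow_stanzas(existing))
```

Paste the output at the end of your robots.txt, then re-read the whole file to make sure no earlier Disallow rule contradicts what you just added.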
If you want to allow AI search crawlers but block training crawlers, here is the modified version:
# Allow AI Search Crawlers (real-time search citations)
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
# Block Training Crawlers (model training data)
User-agent: GPTBot
Disallow: /
This setup lets your content appear in real-time AI search results while preventing it from being used to train future models. Note that this is a simplified distinction. The line between "training" and "search" is not always perfectly clear, and each AI company handles it slightly differently. But this is the closest you can get to a clean separation using robots.txt.
Complete robots.txt Template for AI and Traditional Search
Here is a complete robots.txt template that covers both traditional search engines and AI crawlers. You will need to customize the sitemap URL and any paths you genuinely want to block, but this gives you a solid starting point:
# Traditional Search Engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: Slurp
Allow: /
User-agent: DuckDuckBot
Allow: /
User-agent: Yandex
Allow: /
# AI Search Crawlers
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Meta-ExternalAgent
Allow: /
# Default rule for all other bots: block sensitive
# directories, allow everything else
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /api/
Disallow: /*?s=
Disallow: /*?p=
Disallow: /wp-json/
Allow: /
# Sitemap
Sitemap: https://yourdomain.com/sitemap.xml
A few notes on this template:
A caveat on the sensitive directory blocks: a crawler obeys only the single group that best matches its user agent. Because each AI bot above has its own named group, those bots ignore the wildcard group entirely, which means the Disallow rules under "User-agent: *" apply only to crawlers that are not specifically named. There is no reason for any bot, AI or otherwise, to be crawling your admin panel, checkout flow, or internal search results, so if you want those paths off-limits to the named AI crawlers as well, repeat the Disallow lines inside each named group.
The query parameter blocks (/*?s= and /*?p=) prevent crawlers from indexing your internal search results and direct post ID URLs, which are common sources of duplicate content.
The sitemap declaration at the bottom tells all crawlers where to find your XML sitemap, which helps with discovery of your pages.
You should customize this template for your specific site. If you have directories that contain proprietary content, internal tools, or staging pages, add disallow rules for those paths.
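The group-matching nuance is easy to verify with Python's standard-library parser. In this sketch, GPTBot has its own named group, so the wildcard group's Disallow never reaches it:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot matches its own named group, so the wildcard Disallow does not apply:
print(parser.can_fetch("GPTBot", "/wp-admin/"))        # True
# An unnamed bot falls through to the wildcard group and is blocked:
print(parser.can_fetch("SomeOtherBot", "/wp-admin/"))  # False
```

If you want /wp-admin/ blocked for GPTBot too, the Disallow line has to appear inside the GPTBot group itself.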
How to Edit Your robots.txt (By Platform)
Knowing what to change is one thing. Knowing how to make the change depends on what platform your site runs on.
WordPress
There are three ways to edit robots.txt on WordPress:
Through your SEO plugin. If you use Yoast SEO, go to Yoast > Tools > File Editor. You can edit robots.txt directly from the WordPress dashboard. Rank Math has a similar feature under Rank Math > General Settings > Edit robots.txt. All in One SEO offers this under AIOSEO > Tools > Robots.txt Editor.
Through your hosting file manager. Log into your hosting control panel (cPanel, Plesk, or similar), navigate to your site's root directory, and open the robots.txt file in the text editor. Make your changes and save.
Through FTP/SFTP. Connect to your server using an FTP client like FileZilla, navigate to the root directory of your WordPress installation, download robots.txt, edit it locally, and upload the modified version.
One important caveat: if a WordPress plugin is dynamically generating your robots.txt, editing the physical file may not work because the plugin will overwrite it. Check your SEO and security plugin settings first. Many plugins have a specific section for managing robots.txt rules, and that is where your changes need to go.
Shopify
Shopify automatically generates a robots.txt file for every store, and for a long time, store owners could not edit it. That changed when Shopify introduced the robots.txt.liquid template. To customize your robots.txt on Shopify:
- Go to Online Store > Themes > Actions > Edit Code
- In the Templates folder, look for robots.txt.liquid
- If it does not exist, create it
- Add your custom directives using Liquid syntax
Shopify's documentation has specific guidance on the format, since it uses Liquid templating rather than a static text file.
Static Sites and Custom Platforms
If your site is built on a static site generator (Next.js, Gatsby, Hugo, Jekyll) or a custom platform, robots.txt is typically a static file in your public or root directory. Edit it directly in your codebase, commit the change, and deploy.
For sites hosted on platforms like Vercel, Netlify, or AWS, make sure you are editing the robots.txt that actually gets served in production. Sometimes there is a build process that generates or overrides the file.
Wix and Squarespace
Both Wix and Squarespace generate robots.txt automatically. Wix allows limited customization through the SEO settings panel. Squarespace offers even less control. If you are on one of these platforms and your robots.txt has problematic rules, you may need to contact their support or look into platform-specific workarounds.
Why 18.9% of Sites Block AI Crawlers (And Why Most Do Not Know)
That 18.9% figure is not random. Research has consistently shown that roughly one in five websites blocks at least one major AI crawler. But here is the part that does not get enough attention: most of those blocks are not intentional.
The pattern usually looks like this. A business sets up a WordPress site. They install a security plugin for protection against bots and scrapers. The plugin includes a default configuration that blocks unknown or unrecognized crawlers. When OpenAI launched GPTBot in 2023, GPTBot was not in the plugin's "known good" crawler list, so it got blocked by default. The same thing happened with PerplexityBot, ClaudeBot, and other AI crawlers.
The site owner never made a conscious decision to block AI crawlers. They may not even know what GPTBot is. But their site has been invisible to ChatGPT Search ever since.
This is compounded by the fact that most analytics tools do not surface "AI search visibility" as a metric the way they surface Google search impressions. If your Google rankings drop, you notice. If you were never visible in AI search to begin with, there is nothing to notice. The absence is invisible.
This is exactly the kind of issue that GetCited was built to catch. Running your site through a GetCited audit will tell you whether AI crawlers can actually access your content, which specific bots are blocked, and how your configuration compares to your competitors. It surfaces the problems you cannot see in traditional analytics.
Advanced: Selective Access and Partial Blocks
Not every business wants to give AI crawlers full access to everything. There are legitimate reasons to use partial blocks, and robots.txt gives you the granularity to do it.
Blocking Specific Directories
If you have content that is behind a paywall, proprietary research you do not want scraped, or customer-only resources, you can block AI crawlers from those specific paths while allowing access to everything else:
User-agent: GPTBot
Allow: /
Disallow: /members-only/
Disallow: /premium-content/
Disallow: /internal-reports/
This tells GPTBot it can crawl your entire site except for those three directories. Apply the same pattern for each AI crawler you want to partially restrict.
Allowing Only Specific Directories
The reverse approach works too. If you only want AI crawlers to access your blog and public resources:
User-agent: GPTBot
Disallow: /
Allow: /blog/
Allow: /resources/
Allow: /about/
When both Allow and Disallow rules exist for the same user agent, the most specific (longest) matching rule wins under Google's interpretation, which most major crawlers follow. Since /blog/ is more specific than /, GPTBot will be able to access your blog even though the general disallow rule covers the rest of the site. Be aware that some simpler parsers evaluate rules in file order instead, so test this pattern before relying on it.
Using Crawl-Delay
Some robots.txt files include a Crawl-delay directive that tells bots to wait a certain number of seconds between requests:
User-agent: GPTBot
Allow: /
Crawl-delay: 10
This can be useful if AI crawlers are hitting your site too frequently and affecting performance. Not all crawlers honor crawl-delay, but many do. Use it if you have a smaller server that cannot handle aggressive crawling.
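You can check what delay a given group declares with the standard-library parser, which exposes a crawl_delay lookup:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Allow: /
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.crawl_delay("GPTBot"))    # 10
print(parser.crawl_delay("ClaudeBot")) # None: no matching group declares a delay
```

A None result simply means no delay applies to that agent; whether the live crawler honors a declared delay is up to the crawler.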
Testing Your Changes
After you edit your robots.txt, you need to verify that the changes are working correctly. Here is how:
Browser check. Go to yourdomain.com/robots.txt in your browser and confirm your new rules are visible. This is the most basic check but also the most important. If you do not see your changes, the file is either cached, being overwritten by a plugin, or you edited the wrong file.
Google Search Console. Google retired its old robots.txt Tester; Search Console now offers a robots.txt report (under Settings) that shows which version of the file Google last fetched and flags parse errors. It only covers Google's own user agents, so for AI bots use a third-party robots.txt testing tool: enter a URL and a user agent name (GPTBot, PerplexityBot, ClaudeBot, Google-Extended) to see whether that combination would be allowed or blocked.
Cache clearing. If you use a CDN like Cloudflare, your robots.txt may be cached at the edge. Purge your CDN cache after making changes to ensure crawlers see the updated file immediately. The same applies to any server-level caching your hosting provider uses.
Wait and monitor. AI crawlers do not re-check your robots.txt on every visit. They typically cache it and re-fetch periodically. After making changes, it may take a few days to a few weeks before all AI crawlers pick up the new rules. There is no way to force an immediate re-crawl, but the crawlers will update.
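After your edits, a scripted spot-check can confirm each bot gets the access you intend. This sketch parses a robots.txt string with Python's standard library (access_report is a hypothetical helper; to check the live file, fetch it first, for example with urllib.request, and pass its text):

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def access_report(robots_text, bots=AI_BOTS, url="/"):
    """Map each bot name to whether it may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}

rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(access_report(rules))
# {'GPTBot': False, 'OAI-SearchBot': True, 'PerplexityBot': True,
#  'ClaudeBot': True, 'Google-Extended': True}
```

Rerun the check after purging any CDN cache so you are testing the file crawlers will actually receive.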
The Relationship Between robots.txt, llms.txt, and AI Visibility
robots.txt and llms.txt serve completely different functions, but they work together to determine your AI search presence.
robots.txt is the gatekeeper. It tells AI crawlers whether they can access your site at all. If robots.txt blocks a crawler, nothing else matters. The crawler cannot get in.
llms.txt is the tour guide. It gives AI models a structured overview of what your site is about, what content you have, and how it is organized. An llms.txt file does not override robots.txt blocks. If the crawler cannot access your site, it will never see your llms.txt.
This is why robots.txt should always be checked first. It is the foundation. You can have the most beautifully crafted llms.txt file on the internet, but if your robots.txt is blocking AI crawlers, it is like putting a welcome sign inside a locked building.
The correct order of operations for AI search visibility is:
- Fix your robots.txt to allow AI crawlers (this article)
- Add an llms.txt file to provide structured context
- Optimize your content for AI citation (clear claims, data, structured formatting)
- Monitor your AI search presence with a tool like GetCited
Common Mistakes to Avoid
Mistake 1: Editing robots.txt without checking plugin settings. If a WordPress plugin manages your robots.txt dynamically, your manual edits will get overwritten the next time the plugin regenerates the file. Always check whether a plugin is controlling the file before making direct edits.
Mistake 2: Blocking GPTBot and assuming ChatGPT Search is still accessible. GPTBot and OAI-SearchBot are separate crawlers with separate functions, but some sites only block GPTBot thinking they have fully opted out. Others block GPTBot thinking they are only blocking training, not realizing that GPTBot access also influences ChatGPT's general knowledge about your brand. Understand what each crawler does before deciding what to block.
Mistake 3: Adding Allow rules without removing conflicting Disallow rules. If your robots.txt has both a Disallow and an Allow for the same path and the same user agent, the outcome depends on the parser: Google resolves ties between equally specific rules in favor of Allow, but other crawlers may not, and either way the file becomes messy and confusing. Do not rely on tie-breaking behavior. Remove the conflicting Disallow rule entirely.
Mistake 4: Forgetting the wildcard block. You can add explicit Allow rules for every AI crawler, but if you still have a "User-agent: * / Disallow: /" block and a new AI crawler comes along next month that you have not explicitly named, it will be caught by the wildcard. Consider whether your wildcard rule should be "Allow: /" or at least not a blanket block.
Mistake 5: Not testing after changes. This one is simple but surprisingly common. Make a change, verify it works, verify it is not being cached or overwritten. Five minutes of testing saves weeks of invisible problems.
Why This 15-Minute Fix Matters More Than You Think
Here is the reality of AI search in 2026. When someone asks ChatGPT "what is the best project management tool for small teams" or asks Perplexity "how do I choose a CRM for my business," the AI does not show ten blue links. It gives an answer. And it cites sources.
If your robots.txt is blocking AI crawlers, your site cannot be one of those cited sources. Not because your content is not good enough. Not because your domain authority is too low. Simply because you told the crawlers not to come in.
This is a 15-minute fix that most companies do not know they need. You do not need to hire a consultant. You do not need to overhaul your content strategy. You need to open a text file, look for a few specific lines, and either remove or modify them. That is it.
The 18.9% of sites currently blocking AI crawlers are leaving citations, traffic, and brand visibility on the table. Every week those blocks stay in place is a week of missed AI search presence that competitors are capturing instead.
Check your robots.txt today. If you find blocks, fix them. If you want a comprehensive audit of your AI search readiness that goes beyond robots.txt, including llms.txt configuration, content structure, and citation performance, run your site through GetCited at getcited.tech. It will tell you exactly where you stand and what to fix next.
Frequently Asked Questions
Will allowing AI crawlers in robots.txt guarantee my site gets cited by ChatGPT or Perplexity?
No. Allowing AI crawlers is a necessary condition but not a sufficient one. Think of it like this: if your front door is locked, nobody can come in. Unlocking the door does not mean everybody will walk through it. Allowing GPTBot and PerplexityBot simply makes your content eligible for crawling and citation. Whether an AI model actually cites your content depends on many other factors, including content quality, authority signals, topical relevance, formatting, and whether your content directly answers the queries people are asking. But if AI crawlers are blocked, your content has a zero percent chance of being cited, regardless of how good it is.
Is there any risk to allowing AI crawlers access to my site?
The primary concern most people raise is about content being used for AI model training. If you allow GPTBot, OpenAI may use your content to train future GPT models. If that is a concern, you can block GPTBot specifically while allowing OAI-SearchBot (for ChatGPT Search citations). Similarly, you can make selective decisions for each AI company's crawlers. There is no meaningful security risk from allowing AI crawlers. They follow the same protocols as traditional search engine crawlers and will not access anything behind authentication or that requires form submission. They only read publicly accessible pages.
How often do AI crawlers check my robots.txt for changes?
There is no fixed schedule, and each AI company handles it differently. In general, well-behaved crawlers re-fetch robots.txt every few hours to every few days, depending on how frequently they crawl your site overall. After you make changes, you should expect it to take anywhere from a few days to two weeks before all AI crawlers have picked up your new rules. Google provides the most transparency here through Search Console, where you can see when Googlebot last fetched your robots.txt. Other AI companies do not offer similar visibility, which is why testing with a tool like GetCited can help confirm that your changes are being recognized.
My WordPress security plugin keeps overwriting my robots.txt changes. What do I do?
This is one of the most common issues people run into. If a plugin like Wordfence or Sucuri is managing your robots.txt dynamically, you need to make your changes within the plugin's settings rather than editing the file directly. Look for a robots.txt or bot management section in the plugin's configuration. If the plugin does not offer granular control over specific bot names, you may need to disable the plugin's robots.txt management feature entirely and manage the file manually. Some plugins also have an "allow list" where you can add specific user agents. Adding GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended to that allow list will prevent the plugin from blocking them.
What is the difference between robots.txt and the "noindex" meta tag for AI crawlers?
robots.txt and noindex serve different purposes. robots.txt controls whether a crawler can access and crawl a page. The noindex meta tag (or X-Robots-Tag header) tells a crawler that has already accessed the page not to include it in search results. With traditional search engines, blocking a page in robots.txt prevents crawling but does not technically "deindex" it (Google can still show the URL in results if it finds links to it elsewhere). For AI crawlers, the distinction is less settled since AI search tools are newer and their indexing behavior varies. The safest approach is to use robots.txt for broad access control (which pages can crawlers see?) and noindex for pages you want excluded from results but do not mind being crawled. For most sites focused on AI visibility, the goal is to allow crawling and indexing of all public content, which means removing both robots.txt blocks and noindex directives from your important pages.