With the rise of AI models like ChatGPT, Google Gemini, and Perplexity, more crawlers are accessing public websites to collect content for training datasets. If your site is hosted with Hosting Australia on cPanel, this guide explains how to control which bots can crawl your site, using tools built right into your hosting panel.
What Are AI Crawlers?
AI crawlers work like traditional search engine bots, but instead of indexing pages for search results, they scan content to train language models. Examples include:
- ChatGPT-User (OpenAI)
- Google-Extended (used by Google Gemini)
- CCBot (used by Common Crawl)
These crawlers may pull large amounts of text from your public pages. If you prefer to control how your content is used, you can restrict access directly from your cPanel account.
Step 1: Access Your robots.txt File via cPanel
- Log in to cPanel.
- Scroll to the Files section and open File Manager.
- Navigate to the root folder of your domain (usually public_html).
- Check if a file named robots.txt exists. If not, right-click anywhere, choose Create New File, and name it robots.txt.
Step 2: Block AI Crawlers in robots.txt
Edit the file and add the following:
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
This tells those specific bots not to crawl your website. Reputable bots will obey these instructions, though some third-party scrapers may not.
Once done, save the file and confirm it’s publicly accessible at:
https://yourdomain.com/robots.txt
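Before or after uploading, the rules from Step 2 can be sanity-checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name check_agents and the extra Googlebot entry are illustrative assumptions, included only to show that bots not listed in the file remain allowed.

```python
from urllib.robotparser import RobotFileParser

# The rules from Step 2, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
"""


def check_agents(robots_txt, agents, url="https://yourdomain.com/"):
    """Return {agent: True/False}, where True means the agent may fetch url.

    check_agents is a hypothetical helper name, not part of cPanel or any library.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["ChatGPT-User", "CCBot", "Google-Extended", "Googlebot"]
    for agent, allowed in check_agents(ROBOTS_TXT, agents).items():
        print(agent, "allowed" if allowed else "blocked")
```

This only confirms how a standards-following parser reads the file. It says nothing about whether a given crawler actually honours it.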
Step 3: Use Meta Tags for Page-Level Control (Optional)
For more granular control, you can add meta tags inside the <head> section of specific pages. For example:
<meta name="robots" content="noindex, nofollow">
This tells compliant crawlers not to index the page or follow its links. If you’re using WordPress, SEO plugins like Yoast or Rank Math let you control this per page or post.
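To confirm that a page really carries the tag, its HTML can be inspected with a short script. This is a rough sketch using Python's built-in html.parser. The class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for illustration, and the HTML string is a made-up sample rather than output from any real site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the list of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(robots_directives(sample))
```

In practice you would paste in the page source saved from your browser, or from File Manager, in place of the sample string.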
Step 4: Monitor Bot Activity (Optional)
If you notice strange behavior, you can check visitor logs from cPanel:
- Go to Metrics > Raw Access or Visitors.
- Look for unusual user agents like ChatGPT-User, or anything else suspicious.
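If you download a log from Raw Access, a small script can do the counting for you. The sketch below assumes the log is in the common Apache "combined" format, where the user agent is the last quoted field on each line. The function name count_watched_agents and the sample log lines are hypothetical. Google-Extended is left out of the watch list because Google documents it as a robots.txt control token rather than a separate user-agent string, so it is unlikely to show up in logs.

```python
import re
from collections import Counter

# User-agent tokens to look for (an assumption: adjust to the bots you care about).
WATCHED_AGENTS = ["ChatGPT-User", "CCBot"]

# In the combined log format, the user agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_watched_agents(log_lines, watched=WATCHED_AGENTS):
    """Count requests per watched token, matching case-insensitively."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # Skip lines that do not end with a quoted field.
        user_agent = match.group(1).lower()
        for token in watched:
            if token.lower() in user_agent:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines for illustration; real logs come from Raw Access.
    sample = [
        '203.0.113.5 - - [10/Oct/2024:13:55:36 +1000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
        '198.51.100.7 - - [10/Oct/2024:13:56:01 +1000] "GET / HTTP/1.1" 200 1024 "-" "CCBot/2.0"',
        '192.0.2.9 - - [10/Oct/2024:13:57:12 +1000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    ]
    for token, hits in count_watched_agents(sample).items():
        print(token, hits)
```

A user agent string is easy to fake, so treat the counts as a rough indicator rather than proof of who is crawling.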
Summary
AI crawlers are now a common part of web traffic. If you want to protect your content or reduce unwanted load, you can manage access through cPanel by editing your robots.txt file. This lets you decide which well-behaved bots can scan your site, though scrapers that ignore robots.txt are not bound by it.
Need Help?
If you’re unsure how to implement this or need assistance with anything in cPanel, reach out to Hosting Australia’s support team. We’re happy to walk you through the process or make the changes for you.