What is a robots.txt file?
Robots.txt is a text file that lets a website give instructions to web crawling bots.
Search engines such as Google use these web crawlers, sometimes called web robots, to archive and categorize websites. Most bots are configured to look for a robots.txt file on the server before they read any other file from the website. They do this to see whether the website’s owner has any special instructions on how to crawl and index the site. It's even possible to tell web robots to stay off your site entirely.
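As a rough illustration of that check, the sketch below uses Python's standard urllib.robotparser module to filter a list of URLs the way a well-behaved crawler might. The rules, the /private/ path, the bot name ExampleBot, and the helper crawlable are all made up for this example; real crawlers will differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: ask every bot to stay out of /private/.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def crawlable(urls, rules_text, user_agent="ExampleBot"):
    """Return only the URLs that the given rules permit user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(rules_text.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


queue = [
    "https://www.example.com/",
    "https://www.example.com/private/notes.html",
    "https://www.example.com/blog/",
]
print(crawlable(queue, EXAMPLE_RULES))
# ['https://www.example.com/', 'https://www.example.com/blog/']
```

The crawler consults the rules first and only then decides which pages to request, which is why the robots.txt file is read before anything else on the site.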
Why You May Decide to Block Robots
Typically, web crawlers are well behaved and won't cause any trouble. Problems can occur when too many of them query your site at once, which can put a high load on the server. When this happens, a robots.txt file can be very useful for blocking the bots' access.
Format and location rules
- The file must be named robots.txt.
- You can only have one robots.txt file per site.
- The robots.txt file must be located at the root of the website host to which it applies. For instance, to control crawling on all URLs below https://www.example.com, the robots.txt file must be located at https://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at https://example.com/pages/robots.txt).
- A robots.txt file can also apply to an individual subdomain only (for example, https://website.example.com/robots.txt).
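The location rule above can be sketched in a few lines of Python: given any page URL, the robots.txt file that applies to it sits at the root of the same host. The helper name robots_txt_url is made up for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/pages/about.html"))
# https://www.example.com/robots.txt
print(robots_txt_url("https://website.example.com/blog/post?id=1"))
# https://website.example.com/robots.txt
```

Note that the subdomain produces its own robots.txt URL, so a file on www.example.com does not cover website.example.com.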
For more information on folder structures in File Manager, read the following guide: Understanding Folder Structures in cPanel
Creating a robots.txt file
Within your File Manager, click + File in the upper left corner and then name the file robots.txt as shown in the screenshot below. Based on the rules covered in the previous section, confirm or change the directory field to make sure your robots.txt file will affect the intended site. If you're working with one site in cPanel, this will likely be directly within public_html.
After confirming the directory, click Create New File.
If you've created your robots.txt file elsewhere, you can upload it to your hosting account using the steps outlined here: Uploading Folders & Files
You can now use the Edit feature within File Manager to alter the content of the robots.txt file and save your changes.
Blocking All Web Crawlers
In the example below, we'll use the robots.txt file to block access for all web crawlers. To do this, add the following lines to your new robots.txt file:
User-agent: *
Disallow: /
Once this is copied over, remember to save your changes. Your robots.txt file should look like this...
Please note that bots sometimes do not adhere to the rules set in a robots.txt file. These are known as bad robots. Unfortunately, there isn't anything we can do to stop them short of blocking the IP addresses they originate from.
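If you'd like to sanity-check the block-all rules before relying on them, one option is to run them through Python's standard urllib.robotparser module, as in the sketch below. The bot name ExampleBot and the helper is_allowed are made up for this example, and the check only models a crawler that follows the rules; it says nothing about bad robots.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules_text, url, user_agent="ExampleBot"):
    """Report whether rules_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# The two directives from this guide, each on its own line.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# The same text accidentally pasted onto a single line.
ONE_LINE = "User-agent: * Disallow: /\n"

print(is_allowed(BLOCK_ALL, "https://www.example.com/"))           # False
print(is_allowed(BLOCK_ALL, "https://www.example.com/blog/post"))  # False
print(is_allowed(ONE_LINE, "https://www.example.com/"))            # True
```

The last result shows why each directive needs its own line: in this parser, at least, the single-line version is not recognized as a Disallow rule, so nothing ends up blocked.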
Congratulations, you've created a robots.txt file!
If you need further assistance or have additional questions, please reach out to Reclaim Hosting Support and we'll be happy to help.