Tips and Tricks HQ

How to Control Access of the Web Crawlers or Web Robots to Your Site

Last updated: April 12, 2013





There are several reasons to control how web robots (web crawlers) access your site. As much as you want Googlebot to visit your site, you don’t want spam bots to come and collect private information from it. Not to mention that every time a robot crawls your site it uses your website’s bandwidth too! In this post I explain how you can control the access of web robots to your site using a simple ‘robots.txt’ file.

What are web robots or web spiders?

Web robots (also known as bots, web spiders, web crawlers, or ants) are programs that traverse the World Wide Web in an automated manner. Search engines (like Google, Yahoo etc.) use web crawlers to index web pages so that their search results stay up to date.

Why use a ‘robots.txt’ file?

Googlebot may be crawling your site to provide better search results, but at the same time spam bots may be collecting personal information such as email addresses for spamming purposes. If you want to control the access of web crawlers to your site, you can do so by using the “robots.txt” file. Well-behaved robots read this file and follow its rules; a robot that ignores the file has to be blocked by other means.

How do I create a ‘robots.txt’ file?

‘robots.txt’ is a plain text file. Use any text editor to create the ‘robots.txt’ file.



‘robots.txt’ file format

Each entry (rule) in the robots.txt file is written as a ‘field’ and ‘value’ pair separated by a colon.
<field>:<value>

A simple robots.txt file uses the following three fields:

User-agent: the web robot the following rule applies to.
Disallow: the URL you want to block the robot from accessing.
Allow: the URL you want to allow the robot to access.
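
To make the ‘field: value’ format concrete, here is a minimal Python sketch that splits robots.txt lines into pairs. The function name parse_robots_line and the sample rules are made up for this illustration; a real crawler would use a full parser rather than this simplified one.

```python
def parse_robots_line(line):
    """Split one robots.txt line into a (field, value) pair, or return None."""
    # Anything after '#' is a comment in robots.txt, so drop it first.
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None  # blank line, comment-only line, or malformed entry
    field, value = line.split(":", 1)
    # Field names are case-insensitive, so normalise them to lower case.
    return field.strip().lower(), value.strip()


# Hypothetical sample rules, used only for this demonstration.
sample = """\
User-agent: *
Disallow: /private   # keep robots out
Allow: /public
"""

pairs = [p for p in map(parse_robots_line, sample.splitlines()) if p]
print(pairs)
# [('user-agent', '*'), ('disallow', '/private'), ('allow', '/public')]
```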

Examples

The following will stop all robots from crawling your site (‘*’ means all and ‘/’ is the root directory.)

User-agent: *
Disallow: /

The following will stop all robots from crawling the ‘/private’ directory. The rule applies to every URL whose path begins with ‘/private’.

User-agent: *
Disallow: /private
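
If you want to check how a rule like this behaves before uploading it, Python's standard urllib.robotparser module can evaluate it locally. This is only a sketch: the robot name ExampleBot, the helper function allowed and the page paths are invented for the demonstration, and other crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The same two-line rule set as in the example above.
rules = """\
User-agent: *
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse the text directly, no download needed


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) robot may fetch the given path."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


print(allowed("/index.html"))           # True
print(allowed("/private/report.html"))  # False
# The rule is a prefix match, so this page is blocked as well:
print(allowed("/private-notes.html"))   # False
```

Writing the rule as ‘Disallow: /private/’ (with a trailing slash) limits it to the contents of the directory.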

The following stops Google’s image crawler (Googlebot-Image) from crawling your images for Google image search. Use this to save bandwidth if you don’t want your images to be available in Google image search. Read the Reduce Bandwidth Usage post to learn more.

User-agent: Googlebot-Image
Disallow: /

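The following will block all robots from crawling your site except Googlebot. Each group of rules starts with its own ‘User-agent’ line, and the groups are separated by a blank line.

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /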
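
The effect of per-robot rules can also be tried out with Python's standard urllib.robotparser module. In this sketch the robot name SomeOtherBot and the page URL are placeholders, and the result shows how Python's parser reads the rules, which is a reasonable guide but not a guarantee of how every crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Block every robot, then make an exception for Googlebot.
rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "http://www.yoursite.com/index.html"  # placeholder URL

# Googlebot has its own group, so the catch-all '*' group is not used for it.
print(parser.can_fetch("Googlebot", page))     # True
# Any other robot falls back to the '*' group and is blocked.
print(parser.can_fetch("SomeOtherBot", page))  # False
```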

Where to put the robots.txt file?

Put the robots.txt file in the root directory of your website. For example, the file should be reachable at www.yoursite.com/robots.txt, not in a sub-directory such as www.yoursite.com/sub-directory/robots.txt. In most cases the root directory will be the “public_html” directory of your site.
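
Robots only look for the file at the root of the site, whatever page they are about to visit. The short Python sketch below shows how that location can be worked out from any page address; the function name robots_txt_url is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yoursite.com/sub-directory/page.html"))
# http://www.yoursite.com/robots.txt
```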

You can verify that a bot that is visiting your site is really the Googlebot by following the instructions on this page.
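
As a rough illustration of such a check, the Python sketch below does a reverse DNS lookup on the visitor's IP address, checks that the host name belongs to a Google crawler domain, and then confirms that the host name resolves back to the same IP address. The function names and the list of domains are assumptions made for this example, so compare them with Google's current documentation before relying on them.

```python
import socket

# Assumed list of domains used by Google's crawlers; verify before use.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def looks_like_google_host(hostname):
    """Return True if the host name ends with one of the assumed domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_DOMAINS)


def is_real_googlebot(ip_address,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse lookup, domain check, then forward lookup back to the IP."""
    try:
        hostname = reverse_lookup(ip_address)[0]
        if not looks_like_google_host(hostname):
            return False
        # The host name must resolve back to the address we started with.
        return ip_address in forward_lookup(hostname)[2]
    except OSError:
        # Failed DNS lookups are treated as "not verified".
        return False
```

The lookup functions are passed in as parameters so the logic can be tried out with fake DNS answers, without any network access.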

Related Posts

  • Reduce Your Website’s Bandwidth and Storage Usage


Reader Interactions

Comments (18 responses)

  1. admin says:
    April 12, 2013 at 8:09 pm

    If a bot doesn’t respect the directives in the robots.txt then you can’t really do anything about it through that file. Your option would be to block that bot by some other means, such as an .htaccess restriction.

  2. Harshan R says:
    April 12, 2013 at 7:27 am

    What if the ‘spam bot’ doesn’t look for robots.txt and just does what it is designed to do – crawl away?

  3. Nani says:
    July 17, 2012 at 3:18 pm

    Ever since I added a robots.txt file to my site, I’ve noticed my pages get indexed almost instantly after publishing them. Probably because I added a line to show the google bots where my sitemap is

  4. Neha says:
    June 2, 2012 at 10:48 am

    Nicely explained article.When new bloggers start their site , they are unaware of many important things like robots.Your tip helped me in doing optimization.

  5. Fergus says:
    April 29, 2012 at 2:59 pm

    Just went through about ten sites trying to learn about this. Yours was the first article that was well written and you understand and remember what someone needs to know when they are reading an article like this. Better than Wikipedia. Thanks for the link.

  6. Anna Hettick says:
    April 6, 2012 at 10:37 am

    thank you for such an easy to understand article! I have no idea about coding and such and this told just what I needed to know!!

  7. Robert Hawkins says:
    March 14, 2012 at 5:14 pm

    Great tips. The only thing I would ad is that it might be a good idea to block images in robots.txt. The traffic from images is crap anyway and it’s unnecessary traffic to you website.

  8. Adam says:
    December 28, 2011 at 12:39 pm

    Thanks for teaching me how to prevent Google from seeing my website in these moments when I’m clumsily trying to install a CMS on it… and it is mess, wouldn’t want it to be indexed like that.

  9. thanksgiving 2011 says:
    August 29, 2011 at 11:37 am

    needed this for a thanksgiving 2011 website I am creating right now. thanks a lot

  10. easter says:
    July 29, 2011 at 4:55 pm

    thanks a lot for this tip. i was looking for that for my easter website. this article had everything i needed to know about how to control the access of robots to your site. cheers

  11. Togrul says:
    January 18, 2011 at 9:27 am

    Thanks for sharing, again.

    Cheers,
    Togrul

  12. John Gamings says:
    January 3, 2011 at 11:07 am

    Nice article. Ever since I added a robots.txt file to my site, I’ve noticed my pages get indexed almost instantly after publishing them. Probably because I added a line to show the google bots where my sitemap is

  13. Jim says:
    December 14, 2010 at 10:37 pm

    This is a good start for what is an extremely important and complicated process for perfecting the effectiveness of any SEO efforts you are putting in to your site. Of course it just gets trickier from here, but having this much under your belt will give even the most inexperienced webmaster a leg-up.

  14. Rapid Prototyping says:
    August 19, 2010 at 1:27 am

    I must say that this information was really necessary for me. First of all, I just started a new website and to say the truth I have added some of my information into it and after reading this article I was so much worried about whether my privacy would get compromised. I am also a newbie and hence I really did panic. Thanks to you guys, I do have some confidence now and I have made the text file with those lyrics like values!

  15. website laten maken says:
    August 10, 2010 at 4:49 pm

    Tip: also use a robots.txt for test environments and temporary sites like domain.com/temporary/ and stuff. Spiders might also crawl that directories and you don’t want them to be indexed.

  16. [email protected] To Make Money Online says:
    July 26, 2010 at 10:05 am

    Thanks ,i agree with you that robots text helping to crawl your pages.
    But the disallow have benefit too if you have private page or you are promoting product and you want to keep your download page of this product hidden ,this disallow can help.

  17. Webdesign Roosendaal says:
    March 4, 2010 at 12:35 pm

    As a freelance webdeveloper, I’m always taking care of the little details. The same goes for using robots.txt. I always put in, even when bots are allowed to crawl everywhere.

    Why? Because a lot of bots and spiders are looking for it all the time and return a 404 message when they can’t find it. Therefore, I always include it in the root directory of the websites. It saves a lot of unnecessary traffic.

  18. Aaron Wakling says:
    November 12, 2008 at 1:15 am

    I discovered your homepage by coincidence.
    Very interesting posts and well written.
    I will put your site on my blogroll.
    🙂

Copyright © 2023 | Tips and Tricks HQ