
How To Customize The Robots.txt For Your Blogger

W3zetmedia - The robots.txt file is a very powerful file if you’re working on a site’s SEO. At the same time, it also has to be used with care. It allows you to deny search engines access to certain files and folders, but that’s very often not what you want to do.

Basic robots.txt examples

It works like this: a robot wants to visit a website URL, say http://www.yourblogger.com/yourpost.html. Before it does so, it first checks http://www.yourblogger.com/robots.txt and finds:
 User-agent: *  
 Disallow: /  

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
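You can check how a well-behaved crawler interprets these two lines with Python's standard-library robots.txt parser. This is a minimal sketch; the domain and path are the placeholders used above.

```python
# Parse the two-line "block everything" file and ask whether a crawler
# may fetch a page, using Python's stdlib robots.txt parser.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# "Disallow: /" blocks every path for every robot.
print(rp.can_fetch("Googlebot", "http://www.yourblogger.com/yourpost.html"))  # False
```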

There are two important considerations when using /robots.txt:
  1. Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it.
  2. The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information.

Allow Full Access
 User-agent: *  
 Disallow:  

Block All Access
 User-agent: *  
 Disallow: /  

Block Custom Folder
 User-agent: *  
 Disallow: /folder/  

Block Custom File
 User-agent: *  
 Disallow: /file.html  
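The four basic patterns above can be verified the same way with the stdlib parser. The helper and domain below are illustrative only:

```python
# Check each basic robots.txt pattern against a sample path.
import urllib.robotparser

def allowed(rules, path):
    """Return True if a robot obeying `rules` may fetch `path`."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("*", "http://www.yourblogger.com" + path)

print(allowed("User-agent: *\nDisallow:", "/anything.html"))         # True  (full access)
print(allowed("User-agent: *\nDisallow: /", "/anything.html"))       # False (all blocked)
print(allowed("User-agent: *\nDisallow: /folder/", "/folder/x"))     # False (folder blocked)
print(allowed("User-agent: *\nDisallow: /file.html", "/file.html"))  # False (file blocked)
```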

Default Robots.txt From Blogger

Blogger automatically creates its own robots.txt file, which may allow crawlers to index your label and demo pages too. By using the custom robots.txt feature properly, you can tell search engine spiders which URLs should be indexed and which shouldn't.
Every blog hosted on Blogger has a default robots.txt file that looks something like this:
 User-agent: Mediapartners-Google  
 Disallow:  
 User-agent: *  
 Disallow: /search  
 Allow: /  
 Sitemap: http://yourblogname.blogspot.com/feeds/posts/default?orderby=UPDATED  

Explanation:
This code is divided into three sections. Let's first study each of them; after that, we will learn how to add a custom robots.txt file to Blogspot blogs.

1# User-agent: Mediapartners-Google 
This section is for the Google AdSense robots, which helps them serve better ads on your blog. Whether or not you use Google AdSense on your blog, simply leave it as it is.

2# User-agent: * 
This applies to all robots, marked with the asterisk (*). In the default settings, our blog's label links are restricted from being indexed by search crawlers, which means the web crawlers will not index our label page links, because of the code below.

Disallow: /search

That means any link with the keyword "search" just after the domain name will be ignored. See the example below, which is the link of a label page named SEO.

http://www.yourblogname.com/search/label/SEO 

If you remove Disallow: /search from the code above, then crawlers will access the entire blog and index all of its content and pages. Here, Allow: / refers to the homepage, meaning web crawlers can crawl and index the blog's homepage.
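Blogger's default rules can be tested the same way. The sketch below (with a placeholder blog address) confirms that label pages under /search are blocked while the homepage stays crawlable:

```python
# Verify Blogger's default rules: /search pages blocked, homepage allowed.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /search
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

base = "http://www.yourblogname.com"
print(rp.can_fetch("Googlebot", base + "/search/label/SEO"))  # False (label page)
print(rp.can_fetch("Googlebot", base + "/"))                  # True  (homepage)
```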

Disallow Particular Post
Now suppose we want to exclude a particular post from indexing; then we can add the line below to the code.

Disallow: /yyyy/mm/post-url.html

Here, yyyy and mm refer to the publishing year and month of the post, respectively. For example, if we published a post in March 2013, then we have to use the format below.

Disallow: /2013/03/post-url.html

To make this task easy, you can simply copy the post URL and remove the blog name from the beginning.
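That copy-and-trim step can be sketched with the standard library's URL parser; the helper name and example URL here are illustrative, not part of Blogger:

```python
# Derive a Disallow line from a full post URL by dropping the
# scheme and domain, keeping only the /yyyy/mm/slug.html path.
from urllib.parse import urlparse

def disallow_line(post_url):
    return "Disallow: " + urlparse(post_url).path

print(disallow_line("http://www.yourblogname.com/2013/03/post-url.html"))
# Disallow: /2013/03/post-url.html
```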

Disallow Particular Page
If we need to disallow a particular page, we can use the same method as above. Simply copy the page URL and remove the blog address from it, which will look something like this:

Disallow: /p/page-url.html

3# Sitemap: http://example.blogspot.com/feeds/posts/default?orderby=UPDATED
This line refers to the sitemap of our blog. By adding the sitemap link here, we optimize our blog's crawl rate: whenever web crawlers scan our robots.txt file, they will find a path to our sitemap, where all the links of our published posts are present, making it easy for them to crawl every post. Hence, there is a better chance that web crawlers crawl all of our blog posts without ignoring a single one.

Note: This sitemap will only tell web crawlers about the 25 most recent posts. If you want to increase the number of links in your sitemap, replace the default sitemap with the one below. It will work for the 500 most recent posts.
 Sitemap: http://example.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500  


If you have more than 500 published posts in your blog then you can use two sitemaps like below:
1:  Sitemap: http://example.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500  
2:  Sitemap: http://example.blogspot.com/atom.xml?redirect=false&start-index=501&max-results=500  
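Generating these paginated sitemap lines can be sketched as below. Note the second page starts at index 501, since start-index is 1-based and each feed page holds 500 links; the helper name and blog address are placeholders:

```python
# Build one "Sitemap:" line per 500-post feed page for a Blogger blog.
def sitemap_lines(blog, total_posts, page_size=500):
    lines = []
    start = 1  # Blogger feed start-index is 1-based
    while start <= total_posts:
        lines.append(
            f"Sitemap: {blog}/atom.xml?redirect=false"
            f"&start-index={start}&max-results={page_size}"
        )
        start += page_size
    return lines

# A blog with 700 posts needs two sitemap lines (indexes 1 and 501).
for line in sitemap_lines("http://example.blogspot.com", 700):
    print(line)
```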

Adding Custom Robots.Txt to Blogger

Now for the main part of this tutorial: how to add a custom robots.txt file in Blogger. Follow the steps below.
  • Go to your blogger blog.
  • Navigate to Settings ›› Search preferences ›› Crawlers and indexing ›› Custom robots.txt ›› Edit ›› Yes
  • Now paste your robots.txt file code in the box.
  • Click on Save Changes button.
  • Finish!
