Make effective use of robots.txt

Welcome to your guide on the best practices for search engine optimization (SEO). We will be covering how you can make effective use of the robots.txt file.

Restrict crawling where it’s not needed with robots.txt

A file named “robots.txt” in the root directory of your site tells search engine crawlers which sections of your website they may access and crawl, and which sections they should stay out of.
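As a minimal sketch, a robots.txt that keeps crawlers out of a couple of directories while leaving the rest of the site crawlable might look like this (the directory names are illustrative placeholders, not paths from any particular site):

    # Apply these rules to all crawlers
    User-agent: *
    # Block crawling of these (hypothetical) directories
    Disallow: /search-results/
    Disallow: /private/

Anything not listed under Disallow remains crawlable, and the file only works from the root of the site (for example, https://www.example.com/robots.txt).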

Some pages simply aren’t meant to be found through a search engine. Google Webmaster Tools includes a robots.txt generator to help you set the file up correctly, so that only the pages and subdomains you wish to share appear in search results (note that each subdomain needs its own robots.txt file).

Google Webmaster Tools can also help you remove content that has already been crawled and is showing up in results. Other ways to keep content out of search results are to password-protect directories with .htaccess or to add a “noindex” value to a page’s robots meta tag.
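For illustration, keeping a single page out of the index with the robots meta tag looks like this (placed in the page’s <head>):

    <meta name="robots" content="noindex">

and a minimal .htaccess for password-protecting a directory with Apache basic authentication might look like the following sketch, where the realm name and the .htpasswd path are placeholders you would replace with your own:

    # Require a valid username/password for this directory
    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /full/path/to/.htpasswd
    Require valid-user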

Summary

Use the tools available to you to control which of your content is accessible for crawling!

The Robots Exclusion Standard is not a guaranteed method for keeping your confidential information out of search results. Blocked URLs can still be referenced in results, and non-compliant or rogue search engines that disregard robots.txt may reveal even more. To keep sensitive content away from users who might work their way through your directories or sub-directories by URL, protect it with a password or .htaccess instead.

One practice to avoid: letting proxy services create crawlable URLs on your site.

Next, learn how to be aware of rel=”nofollow” for links, see how you should use heading tags, or revisit the introduction to SEO Best Practices.
