Robots.txt works on the directory or site level – meta robots, on the other hand, instruct robots on how to behave on the page level.
The meta robots tag should be placed in the <head>. The format is:
<meta name=" " content=" ">
In the meta name, you can put robots (to inform all bots how they should behave), or a specific bot (like Googlebot).
While there are a variety of things you can do with meta robots, for this introductory guide we’ll focus on content="noindex" and content="nofollow", known simply as noindex and nofollow.
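As a rough sketch, here is how those values might look in practice. Each tag below is a separate alternative that would sit inside the page's <head>, and you would normally pick just one. The Googlebot-only variant is included purely to illustrate the name attribute:

```html
<!-- Keep this page out of the index, for all bots -->
<meta name="robots" content="noindex">

<!-- Don't follow any links on this page, for all bots -->
<meta name="robots" content="nofollow">

<!-- Both at once, for Googlebot only (values are comma-separated) -->
<meta name="googlebot" content="noindex, nofollow">
```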
Noindex is useful in a ton of scenarios – according to Google, any web page you don’t want indexed should be tagged with noindex. That’s because Google can index pages without crawling them if they’re pointed to by links.
Internal search results are great pages for noindex. You don’t want to block them in your robots.txt, because you want Googlebot and other search engine crawlers to follow all of the links on your search page – it’s a great way for them to get a more complete inventory of your website. On the flip side, if I’m using a search engine, the last thing I want in the results is a link to more search engine results.
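To make that concrete, here is a minimal sketch of what an internal search results page could look like. The page title, product URLs, and link text are made up for illustration and are not taken from any real site:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Search results for "blue widgets"</title>
  <!-- noindex keeps this results page out of search engines;
       links are followed by default, so no extra value is needed -->
  <meta name="robots" content="noindex">
</head>
<body>
  <!-- Crawlers can still discover these pages through the links -->
  <ul>
    <li><a href="/products/blue-widget">Blue widget</a></li>
    <li><a href="/products/navy-widget">Navy widget</a></li>
  </ul>
</body>
</html>
```

Because the page is not blocked in robots.txt, a crawler can fetch it, read the noindex tag, and still follow the result links.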
The nofollow tag tells bots not to follow any links on a given page. This tag is rarely used on the page level, because you can instead choose to nofollow specific links by following this format when linking:
<a href="https://websiteurl.com" title="Website URL" rel="nofollow">Website URL</a>
There are a ton of things you might want to nofollow, but you should know that two of the most common nofollow cases, sponsored links and user comments, have their own rel values: for sponsored links, you should use rel="sponsored", while for user-generated content, you should use rel="ugc".
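As an illustration, the link-level values might be written like this. The domains and link text are placeholders, not real sites:

```html
<!-- Paid or sponsored placement -->
<a href="https://advertiser.example/offer" rel="sponsored">Partner offer</a>

<!-- Link left by a user in a comment or forum post -->
<a href="https://commenter.example/blog" rel="ugc">my blog</a>

<!-- Values can be combined, separated by spaces -->
<a href="https://commenter.example/blog" rel="ugc nofollow">my blog</a>
```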
X-Robots-Tag, an HTTP response header, can do all of the things meta robots can do and more – you can, for example, use it to keep bots from indexing any PDF on your site. While using X-Robots-Tag for this type of work falls outside the scope of this beginner’s guide, you can check out Google’s guide to X-Robots-Tag.
One last note before we get off the topic of robots – some of you might have been using a noindex rule in your robots.txt file. Google no longer recognizes noindex in robots.txt, so you’ll have to noindex those pages individually, with meta robots or X-Robots-Tag.
You should also avoid blocking pages you don’t want indexed in your robots.txt. That might seem counterintuitive, but if a page can’t be crawled, search engines can’t see its noindex tag, so they may still index the page based on links pointing to it.