Blocking search engine indexing with robots.txt
How to control which pages and directories search engines can index.
robots.txt is a plain text file that sits in your site's root directory and tells search engine crawlers how to behave on your site. Well-behaved crawlers request it before fetching any other page.
You can use it to:
- Block indexing of specific pages or directories
- Point search engines to your canonical domain
- Set a crawl delay between page requests
- And much more
The file belongs in your site's root directory, the same place as your main index.* file. For your primary domain, that's the public_html folder. If robots.txt doesn't exist there yet, create it.
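Because the file always lives at the root, its address can be derived from any page URL on the same site. The sketch below illustrates this; the helper name robots_url and the example.com address are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/post.html?page=2"))
```

A robots.txt placed in a subdirectory (for example /blog/robots.txt) is not picked up by crawlers, which is why the path is always replaced rather than extended.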
Core directives
- User-agent: specifies which crawler the rule applies to. Use * to target all bots.
- Disallow: blocks the specified path from being indexed. An empty value means no restrictions.
- Crawl-delay: suggests a delay (in seconds) between consecutive page requests.
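To make the structure concrete, here is a minimal sketch of how a robots.txt file can be read as groups: each User-agent line opens a group, and the directives that follow belong to it. The function name parse_groups is hypothetical, and this simplified reader is an illustration only, not how any particular search engine parses the file.

```python
def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group directives under the User-agent line(s) that precede them."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # agents named by the group being read
    in_rules = False          # True once the group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:      # a User-agent after rules starts a new group
                current, in_rules = [], False
            current.append(value)
            groups.setdefault(value, [])
        elif current:
            in_rules = True
            for agent in current:
                groups[agent].append((field, value))
    return groups


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nCrawl-delay: 10\n"
    print(parse_groups(sample))
```

In this sketch an empty Disallow value is kept as an empty string, which matches the "no restrictions" meaning described above.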
Examples
Block a specific crawler:
# Block Googlebot
User-agent: Googlebot
Disallow: /
# Block Yandex
User-agent: Yandex
Disallow: /
# Block MSNBot (Bing's legacy crawler; Bing's current crawler identifies itself as bingbot)
User-agent: MSNBot
Disallow: /
# Block Yahoo
User-agent: Slurp
Disallow: /
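If the list of crawlers to block is long or changes often, the same groups can be generated rather than typed by hand. The following is a minimal sketch; the function name block_crawlers is made up for this example, and the crawler names are the ones used above.

```python
def block_crawlers(agents: list[str]) -> str:
    """Build robots.txt text that blocks the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(block_crawlers(["Googlebot", "Yandex", "MSNBot", "Slurp"]), end="")
```

The output can be saved as robots.txt in the site's root directory.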
Block all search engines:
User-agent: *
Disallow: /
Block specific directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
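Before uploading rules like these, it can help to check which URLs they would block. The sketch below uses Python's standard urllib.robotparser module for that; the helper name is_allowed, the bot name AnyBot, and the example.com URLs are assumptions made for illustration. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Report whether the given rules let this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse text directly, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/images/logo.png", "/cgi-bin/form.cgi"):
        print(path, is_allowed(RULES, "AnyBot", "https://example.com" + path))
```

Rules are matched as path prefixes, so the trailing slash matters: Disallow: /images/ covers everything inside that directory.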
Allow all search engines to index everything:
User-agent: *
Disallow:
An empty Disallow value is equivalent to having no robots.txt file at all: everything is open.
Allow only specific crawlers, with a crawl delay:
In the example below, the entire site is blocked for all bots except Yandex, Google, and Rambler. Each of those is asked to wait 4 seconds between page requests. Crawl-delay is only a suggestion, and some crawlers, including Googlebot, ignore it:
User-agent: *
Disallow: /
User-agent: Yandex
Crawl-delay: 4
Disallow:
User-agent: Googlebot
Crawl-delay: 4
Disallow:
User-agent: StackRambler
Crawl-delay: 4
Disallow:
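An allow-list like this is easy to get wrong, so here is a rough way to confirm what the file says for each crawler, again using Python's standard urllib.robotparser. The helper name summarize and the bot name OtherBot are hypothetical. The result reflects only what the file declares, not whether a given crawler actually honors the delay.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

ALLOW_LIST_RULES = """\
User-agent: *
Disallow: /

User-agent: Yandex
Crawl-delay: 4
Disallow:

User-agent: Googlebot
Crawl-delay: 4
Disallow:

User-agent: StackRambler
Crawl-delay: 4
Disallow:
"""


def summarize(rules: str, agents: list[str]) -> dict[str, tuple[bool, Optional[int]]]:
    """Map each agent to (may fetch the home page, declared crawl delay)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse text directly, no network access
    return {a: (parser.can_fetch(a, "/"), parser.crawl_delay(a)) for a in agents}


if __name__ == "__main__":
    bots = ["Yandex", "Googlebot", "StackRambler", "OtherBot"]
    for agent, (allowed, delay) in summarize(ALLOW_LIST_RULES, bots).items():
        print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
```

Any crawler not named in its own group falls back to the User-agent: * group, which is why OtherBot is refused here.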
Help
If you have any questions or need assistance, please contact us through the ticket system — we're always here to help!