Using robots.txt
How to control search engine crawling with a robots.txt file.
Robots.txt is a plain text file in your site's root directory that tells search engine crawlers which parts of your site they may crawl and which to leave alone. It's one of the first things a bot checks when it visits your site.
Creating your robots.txt
- Create a text file named robots.txt
- Fill it in following the rules below
- Validate it using Google Search Console or Yandex Webmaster → Robots.txt Analysis
- Upload the file to your site's root directory so it's accessible at http://example.com/robots.txt
If the file is missing or returns anything other than 200 OK, crawlers assume your entire site is open for indexing.
The User-agent directive
Specifies which crawler a set of rules applies to. Use a specific bot name to target one crawler, or * to address all of them.
User-agent: YandexBot # applies only to Yandex's main indexing bot
Disallow: /*id=

User-agent: Yandex # applies to all Yandex crawlers (unless overridden above)
Disallow: /*sid=

User-agent: * # applies to everyone else
Disallow: /cgi-bin
Yandex-specific bots you can target separately:
- YandexBot: the main indexing crawler
- YandexDirect: collects data for Yandex Advertising Network
- YandexDirectDyn: generates dynamic ad banners
- YandexMedia: indexes multimedia content
- YandexImages: Yandex Images indexer
- YandexBlogs: indexes blog posts and comments
- YandexNews: Yandex News crawler
- YandexPagechecker: microdata validator
- YandexMetrika: Yandex Metrica crawler
- YandexMarket: Yandex Market crawler
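To make the group selection concrete, here is a minimal Python sketch of how a Yandex crawler might pick its rule group. It assumes the preference order suggested by the example above (exact bot name, then the generic Yandex group, then *). The function name select_group and the dictionary layout are hypothetical, for illustration only.

```python
from typing import Dict, List


def select_group(groups: Dict[str, List[str]], bot_name: str) -> List[str]:
    """Pick the rule group that applies to bot_name.

    Assumed order of preference: an exact bot name, then the generic
    "Yandex" group for Yandex bots, then the catch-all "*" group.
    """
    by_agent = {agent.lower(): rules for agent, rules in groups.items()}
    name = bot_name.lower()
    if name in by_agent:
        return by_agent[name]
    if name.startswith("yandex") and "yandex" in by_agent:
        return by_agent["yandex"]
    return by_agent.get("*", [])


# The three groups from the example above.
groups = {
    "YandexBot": ["Disallow: /*id="],
    "Yandex": ["Disallow: /*sid="],
    "*": ["Disallow: /cgi-bin"],
}

print(select_group(groups, "YandexBot"))     # ['Disallow: /*id=']
print(select_group(groups, "YandexImages"))  # ['Disallow: /*sid=']
print(select_group(groups, "Googlebot"))     # ['Disallow: /cgi-bin']
```

Only one group is returned per bot, so rules from the less specific groups are not merged in.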
Disallow and Allow
Disallow — blocks a crawler from accessing a path:
User-agent: Yandex
Disallow: / # blocks the entire site
User-agent: Yandex
Disallow: /cgi-bin # blocks the /cgi-bin section only
Allow — explicitly permits access to a path, used alongside Disallow:
User-agent: Yandex
Allow: /cgi-bin
Disallow: / # blocks everything except /cgi-bin
The # character marks a comment: everything after it on that line is ignored by crawlers. Leave a blank line between different User-agent blocks to keep things readable.
How directives are evaluated:
Rules are sorted by URL path length (shortest to longest) and applied in that order, so when several rules match a URL, the one with the longest path wins. The order in which they appear in the file doesn't matter; path length is what drives precedence.
# Written as:
User-agent: Yandex
Allow: /catalog
Disallow: /
# Evaluated as:
User-agent: Yandex
Disallow: / # blocks everything...
Allow: /catalog # ...except /catalog
# Written as:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog
# Evaluated as:
User-agent: Yandex
Allow: / # allows everything...
Disallow: /catalog # ...except /catalog...
Allow: /catalog/auto # ...but /catalog/auto is allowed again
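The evaluation order above can be sketched in a few lines of Python. This is a simplified illustration, not a crawler's actual implementation: it uses plain prefix matching only (no * or $ wildcards), and it assumes that when an Allow and a Disallow rule have the same path length, Allow wins. The function name is_allowed and the tuple layout are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Allow", "/catalog")


def is_allowed(rules: List[Rule], url_path: str) -> bool:
    """Return True if url_path may be crawled under the given rules."""
    # Shortest prefix first. On equal length, Disallow sorts before Allow,
    # so Allow is applied last and wins the tie (an assumption of this sketch).
    ordered = sorted(rules, key=lambda r: (len(r[1]), r[0] == "Allow"))
    allowed = True  # no matching rule means the path is open
    for directive, prefix in ordered:
        # Rules with an empty path are skipped in this sketch.
        if prefix and url_path.startswith(prefix):
            allowed = directive == "Allow"
    return allowed


# Same rules as the second example, in their written order.
rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]

print(is_allowed(rules, "/catalog/auto/sedan"))  # True
print(is_allowed(rules, "/catalog/trucks"))      # False
print(is_allowed(rules, "/about"))               # True
```

Because every matching rule is applied in sorted order, the last (longest) match decides the outcome, which is why the written order of the rules has no effect.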
The Sitemap directive
Points crawlers to your XML sitemap. List multiple files if you have more than one:
User-agent: Yandex
Allow: /
Sitemap: https://example.com/sitemap1.xml
Sitemap: https://example.com/sitemap2.xml
This directive is section-agnostic — it applies regardless of where it appears in the file.
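If you want to check which sitemaps a parser would pick up from your file, Python's standard urllib.robotparser module can list them. The sketch below assumes Python 3.8 or later (where site_maps() is available) and uses a made-up file in which the Sitemap lines sit in different sections.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with Sitemap lines in two different sections.
ROBOTS_TXT = """\
User-agent: Yandex
Allow: /
Sitemap: https://example.com/sitemap1.xml

User-agent: *
Disallow: /cgi-bin
Sitemap: https://example.com/sitemap2.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of URLs, or None if the file has no Sitemap lines.
sitemaps = parser.site_maps()
print(sitemaps)
```

Both URLs are returned regardless of which User-agent block they appear under.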
The Host directive
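Tells Yandex which domain is your canonical (main) mirror. It's a hint rather than a guarantee, although Yandex has historically weighed it heavily when making its decision. Note that Yandex announced in 2018 that it no longer processes Host and recommends a 301 redirect to the main mirror instead, so treat the example as legacy syntax: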
User-agent: *
Disallow: /forum
Disallow: /cgi-bin
Host: https://www.example.com
The Crawl-delay directive
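Sets a minimum wait time (in seconds) between page requests, which is useful if crawling is putting strain on your server. Yandex's syntax accepts decimal values. Support varies by crawler, though: Googlebot ignores Crawl-delay, and Yandex has moved this control to the crawl rate setting in Yandex Webmaster: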
User-agent: Yandex
Crawl-delay: 2 # 2-second pause between requests

User-agent: *
Disallow: /search
Crawl-delay: 4.5 # 4.5-second pause
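As a rough illustration of how a crawler could read these values, the Python sketch below extracts the Crawl-delay for each User-agent group as a float, so decimal values such as 4.5 survive. The function name parse_crawl_delays is hypothetical, and the parsing is deliberately minimal (it does not validate the rest of the file).

```python
from typing import Dict, List


def parse_crawl_delays(robots_txt: str) -> Dict[str, float]:
    """Map each User-agent to its Crawl-delay in seconds, if one is set."""
    delays: Dict[str, float] = {}
    agents: List[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once the group has a non-User-agent line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        else:
            in_rules = True
            if field == "crawl-delay":
                for agent in agents:
                    delays[agent] = float(value)
    return delays


sample = """\
User-agent: Yandex
Crawl-delay: 2 # 2-second pause between requests

User-agent: *
Disallow: /search
Crawl-delay: 4.5 # 4.5-second pause
"""

print(parse_crawl_delays(sample))  # {'Yandex': 2.0, '*': 4.5}
```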
The Clean-param directive
If your URLs contain dynamic parameters that don't change the actual page content — session IDs, referrer tokens, ad parameters — you can declare them with Clean-param. Yandex's crawler will then treat URLs that differ only in those parameters as duplicates and avoid re-crawling them, saving both bandwidth and server load.
Full documentation is available on the Yandex support site.
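To illustrate the effect, the Python sketch below mimics what a crawler does with a rule such as Clean-param: ref /some_dir/get_book.pl (parameter names joined by &, followed by an optional path prefix; check the Yandex documentation for the exact syntax). The function name apply_clean_param and the sample URLs are hypothetical.

```python
from typing import Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_clean_param(url: str, params: Set[str], path_prefix: str = "/") -> str:
    """Drop the listed query parameters from url if its path matches path_prefix."""
    parts = urlsplit(url)
    if not parts.path.startswith(path_prefix):
        return url  # the rule does not apply to this path
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in params
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two URLs that differ only in the ref parameter.
urls = [
    "https://example.com/some_dir/get_book.pl?ref=site_1&book_id=123",
    "https://example.com/some_dir/get_book.pl?ref=site_2&book_id=123",
]

canonical = {apply_clean_param(u, {"ref"}, "/some_dir/get_book.pl") for u in urls}
print(canonical)  # {'https://example.com/some_dir/get_book.pl?book_id=123'}
```

Both URLs collapse to the same address, so the page only needs to be crawled once.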