Using robots.txt

How to control search engine crawling with a robots.txt file.

Robots.txt is a plain text file in your site's root directory that tells search engine crawlers which parts of your site they may crawl and which to leave alone. It's one of the first things a bot requests when it visits your site.

Creating your robots.txt

Create a text file named robots.txt
Fill it in following the rules below
Validate it using Google Search Console or Yandex Webmaster → Robots.txt Analysis
Upload the file to your site's root directory so it's accessible at http://example.com/robots.txt

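If the file is missing or the request returns anything other than 200 OK, Yandex treats your entire site as open for crawling. Other crawlers may react differently; Google, for example, treats server errors (5xx) on robots.txt as a temporary block on the whole site.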
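
Because the file only counts when it sits at the root of the host, it can help to derive the expected address from any page URL before checking it. The following is a minimal sketch using Python's standard library; the function name robots_url is hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/catalog/auto?id=7"))
```

The result is the address a crawler would request for that host, whatever page you started from.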

The User-agent directive

Specifies which crawler a set of rules applies to. Use a specific bot name to target one crawler, or * to address all of them.

User-agent: YandexBot   # applies only to Yandex's main indexing bot
Disallow: /*id=

User-agent: Yandex      # applies to all Yandex crawlers (unless overridden above)
Disallow: /*sid=

User-agent: *           # applies to everyone else
Disallow: /cgi-bin
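
To make the fallback order in this example concrete, here is a rough sketch of how a crawler might pick its block. It assumes that a User-agent name matches when it is a case-insensitive prefix of the bot's name, that the longest match wins, and that * is used only when nothing else matches. The function select_group is hypothetical, not part of any crawler's published code.

```python
from typing import List, Optional


def select_group(bot_name: str, group_names: List[str]) -> Optional[str]:
    """Pick the User-agent block that applies to bot_name.

    Assumption: a block matches when its name is a case-insensitive
    prefix of the bot name; the longest match wins; "*" is the fallback.
    """
    bot = bot_name.lower()
    matches = [g for g in group_names if g != "*" and bot.startswith(g.lower())]
    if matches:
        return max(matches, key=len)
    return "*" if "*" in group_names else None


groups = ["YandexBot", "Yandex", "*"]
for bot in ("YandexBot", "YandexImages", "Googlebot"):
    print(bot, "->", select_group(bot, groups))
```

Under these assumptions YandexBot uses its own block, YandexImages falls back to the Yandex block, and Googlebot falls back to the * block.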

Yandex-specific bots you can target separately:

YandexBot — the main indexing crawler
YandexDirect — collects data for Yandex Advertising Network
YandexDirectDyn — generates dynamic ad banners
YandexMedia — indexes multimedia content
YandexImages — Yandex Images indexer
YandexBlogs — indexes blog posts and comments
YandexNews — Yandex News crawler
YandexPagechecker — microdata validator
YandexMetrika — Yandex Metrica crawler
YandexMarket — Yandex Market crawler

Disallow and Allow

Disallow — blocks a crawler from accessing a path:

User-agent: Yandex
Disallow: /             # blocks the entire site

User-agent: Yandex
Disallow: /cgi-bin      # blocks the /cgi-bin section only

Allow — explicitly permits access to a path, used alongside Disallow:

User-agent: Yandex
Allow: /cgi-bin
Disallow: /             # blocks everything except /cgi-bin

The # character marks a comment — everything after it on that line is ignored by crawlers. Leave a blank line between different User-agent blocks to keep things readable.
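
As an illustration of the comment rule, the sketch below splits a single robots.txt line into a directive name and its value after discarding everything from # onward. The helper parse_line is hypothetical, and it assumes # never appears as part of a path.

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one robots.txt line into (field, value), ignoring comments."""
    content = line.split("#", 1)[0].strip()  # drop the comment, if any
    if ":" not in content:
        return None  # blank or comment-only line
    field, value = content.split(":", 1)
    return field.strip().lower(), value.strip()


print(parse_line("Disallow: /cgi-bin      # blocks the /cgi-bin section only"))
print(parse_line("# just a comment"))
```

The first call yields the pair ('disallow', '/cgi-bin'); the second yields None because nothing is left once the comment is removed.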

How directives are evaluated:

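Rules are sorted by URL path length (shortest to longest) and applied in that order, so the longest matching rule has the final say. The order in which rules appear in the file doesn't matter; path length is what drives precedence. If an Allow and a Disallow rule have paths of equal length, Yandex gives priority to Allow.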

# Written as:
User-agent: Yandex
Allow: /catalog
Disallow: /

# Evaluated as:
User-agent: Yandex
Disallow: /             # blocks everything...
Allow: /catalog         # ...except /catalog

# Written as:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Evaluated as:
User-agent: Yandex
Allow: /                # allows everything...
Disallow: /catalog      # ...except /catalog...
Allow: /catalog/auto    # ...but /catalog/auto is allowed again
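
The evaluation order shown above can be sketched in a few lines of Python. This is a simplified model, not Yandex's actual implementation: it treats each rule as a plain path prefix, ignores wildcards such as * and empty values, and assumes Allow wins when two matching rules have the same length. The names Rule and is_allowed are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/catalog")


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Apply rules shortest-path-first; the last matching rule decides."""
    # Sort by path length. On equal length, Disallow sorts first so that
    # Allow is applied later and wins the tie (an assumption of this sketch).
    ordered = sorted(rules, key=lambda r: (len(r[1]), r[0].lower() == "allow"))
    allowed = True  # no matching rule means the path is open
    for directive, prefix in ordered:
        if path.startswith(prefix):
            allowed = directive.lower() == "allow"
    return allowed


rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
for path in ("/about", "/catalog/moto", "/catalog/auto/sedan"):
    print(path, is_allowed(path, rules))
```

With the second rule set from the example, /about and /catalog/auto/sedan come out as allowed and /catalog/moto as blocked, regardless of the order in which the rules are listed.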

The Sitemap directive

Points crawlers to your XML sitemap. List multiple files if you have more than one:

User-agent: Yandex
Allow: /
Sitemap: https://example.com/sitemap1.xml
Sitemap: https://example.com/sitemap2.xml

This directive is section-agnostic — it applies regardless of where it appears in the file.
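
One way to see that position doesn't matter is to run a file through Python's built-in robots.txt parser, which collects Sitemap lines wherever they occur. This is only a sketch: it uses urllib.robotparser from the standard library (site_maps() requires Python 3.8 or later), and the sample file content is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Sample content: one Sitemap line before any User-agent block, one inside.
ROBOTS_TXT = """\
Sitemap: https://example.com/sitemap1.xml

User-agent: Yandex
Allow: /
Sitemap: https://example.com/sitemap2.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
sitemaps = parser.site_maps()  # list of sitemap URLs, or None if there are none
print(sitemaps)
```

Both sitemap URLs are returned even though one of them sits outside every User-agent block.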

The Host directive

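Tells Yandex which domain is your canonical (main) mirror. Historically it was a strong hint rather than a guarantee. Yandex announced in 2018 that it no longer takes Host into account and recommends 301 redirects to the main mirror instead, so treat the example below as legacy syntax: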

User-agent: *
Disallow: /forum
Disallow: /cgi-bin
Host: https://www.example.com

The Crawl-delay directive

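Sets a minimum wait time (in seconds) between page requests, which helps when crawling puts strain on your server. The Yandex syntax accepts decimal values. Support varies by crawler: Google ignores Crawl-delay, and Yandex announced in 2018 that it relies on the crawl rate setting in Yandex Webmaster instead: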

User-agent: Yandex
Crawl-delay: 2          # 2-second pause between requests

User-agent: *
Disallow: /search
Crawl-delay: 4.5        # 4.5-second pause
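
For a crawler that does honor Crawl-delay, the effect is simply a minimum gap between consecutive requests. The sketch below computes the earliest moment, in seconds from the first request, at which each request may be sent. The function earliest_request_times is hypothetical and ignores how long each download takes.

```python
from typing import List


def earliest_request_times(count: int, crawl_delay: float) -> List[float]:
    """Earliest send time, in seconds from the start, for each of count requests."""
    return [i * crawl_delay for i in range(count)]


# The directive value arrives as text, so decimals are parsed with float().
delay = float("4.5")
print(earliest_request_times(3, delay))
```

With a delay of 4.5 seconds, three requests are spaced at 0, 4.5 and 9 seconds.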

The Clean-param directive

If your URLs contain dynamic parameters that don't change the actual page content — session IDs, referrer tokens, ad parameters — you can declare them with Clean-param. Yandex's crawler will then treat URLs that differ only in those parameters as duplicates and avoid re-crawling them, saving both bandwidth and server load.
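
To illustrate the deduplication idea, the sketch below removes a set of ignorable parameters from URLs and shows that two addresses differing only in a session ID collapse into one. It assumes sid is the parameter you would declare, with a rule along the lines of Clean-param: sid /forum/ (check the exact syntax in the Yandex documentation). The function strip_params is hypothetical and only models the effect; it is not how Yandex implements it.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, ignored: Iterable[str]) -> str:
    """Return url with the ignored query parameters removed."""
    ignored_set = set(ignored)
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_set
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


urls = [
    "https://example.com/forum/topic?sid=abc123&id=42",
    "https://example.com/forum/topic?sid=zzz999&id=42",
]
print({strip_params(u, ["sid"]) for u in urls})
```

Both addresses reduce to https://example.com/forum/topic?id=42, which is the single page a crawler needs to fetch.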

Full documentation is available on the Yandex support site.
