When a search engine crawler first visits a site, it looks for the robots.txt file. This is a text file located in the root directory of the site (in the same place as the main index file; for the main domain/site, this is the public_html folder), and it contains special instructions for search engine crawlers.

    These instructions can prohibit indexing of certain folders or pages of the site, point the robot to the site's main mirror, ask the robot to observe a certain interval between requests while indexing the site, and much more.

    If there is no robots.txt file in your site's directory, you can create one.
    To disallow indexing of the site through the robots.txt file, two directives are used: User-agent and Disallow.

    • User-agent: specify_search_bot
    • Disallow: / # will prohibit indexing of the entire site
    • Disallow: /page/ # will not allow indexing of /page/ directory
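Before deploying such rules, you can sanity-check how crawlers will interpret them with Python's standard-library urllib.robotparser. A minimal sketch (the bot name and example.com URLs are placeholders, not part of the original examples):

```python
from urllib.robotparser import RobotFileParser

# Sample rules mirroring the directives above
rules = """\
User-agent: *
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /page/ is blocked for every crawler; the rest of the site stays open
print(parser.can_fetch("AnyBot", "https://example.com/page/index.html"))  # False
print(parser.can_fetch("AnyBot", "https://example.com/about.html"))       # True
```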

    Examples:

    Disallow indexing of your site by MSNBot

    User-agent: MSNBot  
    Disallow: /  
    

    Prevent your site from being indexed by the Yahoo bot

    User-agent: Slurp  
    Disallow: /  
    

    Prevent your site from being indexed by the Yandex bot

    User-agent: Yandex  
    Disallow: /  
    

    Prevent your site from being indexed by the Google bot

    User-agent: Googlebot  
    Disallow: /  
    

    Disallow indexing of your site by all search engines

    User-agent: *  
    Disallow: /  
    

    Bar all search engines from indexing the cgi-bin and images folders

    User-agent: *  
    Disallow: /cgi-bin/  
    Disallow: /images/  
    

    To allow all pages of the site to be indexed by all search engines (note: an empty robots.txt file is equivalent to this instruction):

    User-agent: *  
    Disallow:  
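The equivalence noted above can be verified with Python's standard-library urllib.robotparser: an empty Disallow line and an empty robots.txt both grant access to everything. A sketch (the bot name and example.com URL are placeholders):

```python
from urllib.robotparser import RobotFileParser

# An explicit empty Disallow directive
allow_all = RobotFileParser()
allow_all.parse("User-agent: *\nDisallow:".splitlines())

# An empty robots.txt file
empty_file = RobotFileParser()
empty_file.parse([])

# Both parsers grant access to any URL
for parser in (allow_all, empty_file):
    print(parser.can_fetch("AnyBot", "https://example.com/any/page.html"))  # True
```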
    

    Example:

    Allow only the Yandex, Google, and Rambler bots to index the site, with a delay of 4 seconds between page requests.

    User-agent: *  
    Disallow: /  
    
    User-agent: Yandex  
    Crawl-delay: 4  
    Disallow:  
    
    User-agent: Googlebot  
    Crawl-delay: 4  
    Disallow:  
    
    User-agent: StackRambler  
    Crawl-delay: 4  
    Disallow:
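As a quick check of the combined example above, the same rules can be fed to Python's standard-library urllib.robotparser (a sketch; example.com and the unlisted bot name are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /

User-agent: Yandex
Crawl-delay: 4
Disallow:

User-agent: Googlebot
Crawl-delay: 4
Disallow:

User-agent: StackRambler
Crawl-delay: 4
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The three named bots match their own groups and may crawl with a 4-second
# delay; every other bot falls through to "User-agent: *" and is blocked
print(parser.can_fetch("Googlebot", "https://example.com/"))     # True
print(parser.crawl_delay("Googlebot"))                           # 4
print(parser.can_fetch("SomeOtherBot", "https://example.com/"))  # False
```

Note that crawl_delay() requires Python 3.6 or newer.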