Technical Documentation

Understanding Robots.txt

What is a Robots.txt file?

A robots.txt file is a plain-text file that tells search engine crawlers which pages or files they can or can't request from your site. It is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. A minimal example:

User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml

How to use the Architect

  • Define your User-Agents: Use '*' for all crawlers or specify by name.
  • Add Directives: Use 'Allow' or 'Disallow' followed by the relative path.
  • Include Sitemaps: Add the full absolute URL to your sitemap files.
  • Add ASCII Art: Upload your logo to generate a unique header/footer.
  • Export: Copy the code or download the file, then place it in your site's root directory.
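The steps above can be sketched as a short script. This is an illustrative assembly of the same output the Architect produces; the function name and structure are assumptions, not the tool's actual code:

```python
def build_robots_txt(agents, disallow_paths, sitemaps):
    """Assemble a robots.txt body from user-agents, disallowed
    relative paths, and absolute sitemap URLs."""
    lines = []
    for agent in agents:
        lines.append(f"User-agent: {agent}")
        for path in disallow_paths:
            lines.append(f"Disallow: {path}")
        lines.append("")  # blank line separates rule groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"

# Build the example from this document and write it out so it can
# be served from the site root (e.g. example.com/robots.txt).
content = build_robots_txt(["*"], ["/admin/"], ["https://example.com/sitemap.xml"])
with open("robots.txt", "w") as f:
    f.write(content)
```

Note that directives use relative paths while the Sitemap line takes a full absolute URL, mirroring the rules in the list above.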

Best Practices

Always ensure your robots.txt file is located in the root directory of your site (e.g., example.com/robots.txt). Use relative paths for directives and absolute URLs for sitemaps. Remember that robots.txt is a public file—do not use it to hide sensitive information.
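Before deploying, it helps to confirm your rules behave as intended. Python's standard-library `urllib.robotparser` can parse a rule set and report what a crawler may fetch; the URLs below are the hypothetical examples used earlier:

```python
from urllib import robotparser

# Parse the example rules from this document.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch(useragent, url) returns whether that agent may request the URL.
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```

A check like this catches path typos (e.g. a missing trailing slash) before the file goes live, which matters because every crawler on the web will read it.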