Management of the robots.txt file has been significantly updated to protect sites from accidental misconfigurations. The update should also give most sites greater flexibility for SEO.

The robots.txt configuration panel in the settings area has been removed. Per-URL management of robots.txt is now available under the URL map.

New robots.txt management in the URL panel


Each URL now has two new checkboxes: "allow archiving" and "prevent indexing." These checkboxes allow those settings to be applied to any specific URL.

  • Allow archiving of content on search engines: The default is to disallow archiving, which prevents unintended paywall bypassing. While archiving is disallowed, the noarchive directive is added to an X-Robots-Tag HTTP header.
  • Prevent indexing of content for this and all descendant URLs: This automatically adds entries to the robots.txt file that block this URL (similar to automatic system exclusions).
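
As a rough illustration of how crawlers would read these two settings, the following Python sketch uses the standard library's robots.txt parser. The /subscribers/ path, the sample robots.txt lines, and the helper names are assumptions made for this example; the exact entries and headers the platform generates are not shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a URL with "prevent indexing" checked.
# The actual lines generated by the platform may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /subscribers/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def archive_header(allow_archiving: bool) -> dict[str, str]:
    """Hypothetical helper: the header sent for the "allow archiving" setting.

    When archiving is not allowed (the default), noarchive is sent.
    """
    return {} if allow_archiving else {"X-Robots-Tag": "noarchive"}


if __name__ == "__main__":
    # A blocked URL also blocks everything beneath it.
    print(is_crawlable(ROBOTS_TXT, "https://example.com/subscribers/archive/story.html"))
    print(is_crawlable(ROBOTS_TXT, "https://example.com/news/story.html"))
    print(archive_header(allow_archiving=False))
```

Because robots.txt rules match by path prefix, a single entry for a URL also covers its descendant URLs.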

A setting only needs to be changed at the top-level URL; sub-URLs then inherit it. To change a setting for the entire site, change it on the root URL. Choosing a different setting on a sub-URL overrides the inherited configuration for that URL and its children.
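
The inheritance rule can be sketched in Python as a lookup that walks from a URL up toward the root and takes the first explicit setting it finds. This is only a conceptual model: the function name, the dictionary of overrides, and the example paths are hypothetical and do not reflect how the software stores these settings.

```python
def effective_setting(path: str, overrides: dict[str, bool], default: bool) -> bool:
    """Return the setting for path, taken from the nearest URL that sets one.

    overrides maps URL paths to an explicitly chosen value (hypothetical model).
    """
    parts = [p for p in path.split("/") if p]
    # Check the URL itself first, then each ancestor, ending at the root "/".
    for depth in range(len(parts), -1, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in overrides:
            return overrides[candidate]
    return default


# Example: "prevent indexing" is off site-wide, on for /premium,
# and switched off again for one sub-URL and its children.
prevent_indexing = {
    "/": False,
    "/premium": True,
    "/premium/free-preview": False,
}

if __name__ == "__main__":
    for url in ("/news/local", "/premium/story", "/premium/free-preview/a"):
        print(url, effective_setting(url, prevent_indexing, default=False))
```

In this model, a value set on the root URL applies to every URL that has no closer override.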

When this BLOX Core software version is released, existing robots.txt files will be migrated to the new settings, which will be reflected in the appropriate URL panels.

By default, the new robots.txt configuration is much more open, without the blanket restrictions that were previously in place. Thus, entitlements to proactively allow access for specific search bots (such as bots related to Twitter or Facebook) are no longer needed. If an existing robots.txt file restricts a specific bot, that restriction will be applied to all bots in the URL map settings as part of the site migration.

Rules that block specific asset UUIDs are now ignored. Please contact Customer Support with any issues related to this.

Rules that block anything related to the /app directory are also ignored. If robot blocks are needed in the /app directory, please handle them directly in the application, either by sending an X-Robots-Tag HTTP header from your application or by using a robots meta tag.
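
How the header is sent depends on the technology the application is built with. As one possible sketch, assuming an application written in Python against the WSGI interface, a small wrapper could add the header to every response. The names add_x_robots_tag and demo_app are hypothetical, and "noindex" is only an example value.

```python
def add_x_robots_tag(app, value="noindex"):
    """Wrap a WSGI app so that every response carries an X-Robots-Tag header."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing X-Robots-Tag so the header is not duplicated.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def response_headers(app, path="/app/"):
    """Call a WSGI app once and return the headers it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["headers"]


if __name__ == "__main__":
    print(response_headers(add_x_robots_tag(demo_app)))
```

For HTML pages, the meta tag alternative is a tag such as <meta name="robots" content="noindex"> in the page head.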

Robot management settings will be maintained when the URL map is duplicated, imported or exported.

Read the detailed release notes here.