How to create a Robots.txt handler for a multi-site EPiServer project
With my team we had the opportunity to start working on a new multi-site project using EPiServer 11.20.0 & the DXP, and it has been a very exciting journey so far. We are planning to release four websites over the coming months, each with different templates, behaviors and routes.
While we are busy defining the different templates that each website will use, we also noticed that some components could be created once and be made available to every website without needing to "override" their default behavior. I am talking about the following components:
- Robots.txt file
- Sitemap (xml) file
While the sitemap component is worth an article of its own, today I want to explore the idea of a single entry point per website to generate a robots.txt file in EPiServer.
It might sound controversial, but I do love being a lazy developer. I do love working with such developers. They don't write that much code. They might spend time looking for plugins or examples online that have been battle-tested, but in the end it's a methodology that brings dividends over time.
So I started looking for plugins online. But we need to have rules. Without rules there's chaos.
So here's my set of rules:
- The plugin / package must not be a beta.
- The plugin / package must be up to date.
- The plugin can be open-source, but it has to come from a trusted source and must be properly maintained. There's the option to add it from a package manager or to copy only the parts that we need.
- If the plugin is not open-source, then there must be a contract involving maintenance. It's often better to pay for a license to make sure there's support instead of adding some DLLs with little transparency. We do not like ticking time bombs.
- For this scenario, the plugin / package must work in a multi-site environment.
Unfortunately, I couldn't find what I was looking for 😞, so I began to think of a way to put the pieces together.
Before starting our coding journey, let's list our requirements:
- We want a unique "/robots.txt" endpoint available for each website we are hosting, returning a different result based on the website being visited (see the example after this list).
- We want to be able to edit the content of our robots.txt file inside the CMS - and in one place only (per website).
- We want to write as little code as possible.
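To make the first requirement concrete, the same "/robots.txt" URL should serve different directives depending on the site being visited. The domains and rules below are purely illustrative:

    # Returned for www.site-one.example
    User-agent: *
    Disallow: /search/
    Sitemap: https://www.site-one.example/sitemap.xml

    # Returned for www.site-two.example (not ready for indexing yet)
    User-agent: *
    Disallow: /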
The first step was to enable MVC attribute routes in our EPiServer project:
[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class AttributeRoutingInitializationModule : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        // Register MVC attribute routes (such as [Route("robots.txt")]) on top of the EPiServer routes.
        var routes = RouteTable.Routes;
        routes.MapMvcAttributeRoutes();
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Nothing to clean up.
    }
}
This code will allow us to bind an MVC action to a specific route. In our scenario, I want the "/robots.txt" route to become an endpoint for my project.
The second step was to set up an interface that will hold the content of our robots.txt file:
// Inheriting IContentData lets us load the content directly as ISitePage through IContentLoader.
public interface ISitePage : IContentData
{
    string RobotsTxtContent { get; set; }
}

// The start page type of one of the websites (the usual [ContentType] attribute is omitted for brevity).
public class MyFreshWebsiteSitePage : PageData, ISitePage
{
    // Edited as a textarea in the CMS so multi-line robots.txt content stays readable.
    [UIHint(UIHint.Textarea)]
    public virtual string RobotsTxtContent { get; set; }
}
Each of my websites will have a site page that is the first ancestor of all of that site's content. It looks like the right place to store a property that is unique for each website. I also want to edit this property inside a textarea in the CMS.
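The other websites follow the same pattern: each site's start page type implements ISitePage. As a purely illustrative example (MyOtherWebsiteSitePage is a hypothetical name):

public class MyOtherWebsiteSitePage : PageData, ISitePage
{
    // Only the shared ISitePage interface matters to the robots.txt endpoint.
    [UIHint(UIHint.Textarea)]
    public virtual string RobotsTxtContent { get; set; }
}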
We are almost there. The final step is to set up our MVC endpoint:
public class RobotsTxtController : Controller
{
    private readonly IContentLoader _contentLoader;

    // IContentLoader is injected through the constructor using dependency injection.
    public RobotsTxtController(IContentLoader contentLoader)
    {
        _contentLoader = contentLoader;
    }

    [Route("robots.txt")]
    [HttpGet]
    public ActionResult RenderRobotsTxt()
    {
        // ContentReference.StartPage resolves to the start page of the site currently being visited.
        var site = _contentLoader.Get<ISitePage>(ContentReference.StartPage);
        if (site == null)
            return new HttpNotFoundResult();

        var robotsContent = site.RobotsTxtContent;
        if (string.IsNullOrWhiteSpace(robotsContent))
            return new HttpNotFoundResult();

        return Content(robotsContent, "text/plain", Encoding.UTF8);
    }
}
Before we wrap up and go home, let's analyse the code we have here. For the sake of my example, I made some assumptions in order to find the page containing the robots.txt content:
- All the site pages are at the root level. The current site is easy to find through ContentReference.StartPage, as we know that the start page is the site page that contains the property we are looking for (a sketch of how to relax this assumption follows below).
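If the ISitePage implementation happened to be an ancestor of the start page rather than the start page itself, the lookup could be relaxed with a small helper inside the controller. This is a minimal sketch, assuming _contentLoader is already injected as shown above and using System.Linq; FindSitePage is a hypothetical name:

private ISitePage FindSitePage()
{
    // Happy path: the start page itself implements ISitePage.
    var startPage = _contentLoader.Get<PageData>(ContentReference.StartPage);
    if (startPage is ISitePage sitePage)
        return sitePage;

    // Otherwise look through the ancestors of the start page for one that implements ISitePage.
    return _contentLoader
        .GetAncestors(ContentReference.StartPage)
        .OfType<ISitePage>()
        .FirstOrDefault();
}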
It is recommended to allow only a specific group of people to edit this property, as it is a very sensitive one: we do not want Google to decide that one of our websites should no longer be indexed. I would also recommend adding a robots.txt validator to make sure the syntax is correct.
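For the validator, one option is EPiServer's IValidate&lt;T&gt; hook. The sketch below assumes that hook and keeps the line-level check intentionally naive; RobotsTxtContentValidator is an illustrative name:

using System;
using System.Collections.Generic;
using System.Linq;
using EPiServer.Validation;

public class RobotsTxtContentValidator : IValidate<ISitePage>
{
    public IEnumerable<ValidationError> Validate(ISitePage instance)
    {
        var content = instance.RobotsTxtContent;
        if (string.IsNullOrWhiteSpace(content))
            yield break;

        // Naive check: every non-empty, non-comment line should look like a "Directive: value" pair.
        var suspiciousLines = content
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(line => !line.TrimStart().StartsWith("#") && !line.Contains(":"))
            .ToList();

        if (suspiciousLines.Any())
        {
            yield return new ValidationError
            {
                Severity = ValidationErrorSeverity.Warning,
                PropertyName = nameof(ISitePage.RobotsTxtContent),
                ErrorMessage = "These lines do not look like robots.txt directives: " + string.Join(" | ", suspiciousLines)
            };
        }
    }
}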
Update: I replaced the GetRoot and Url.GetLeftPart calls with a call to the cache, using ContentReference.StartPage as the reference. Credits to Stefan Holm Olsen.
From there we can locate the site page, find the content of our robots.txt and serve it as plain text! Hurrayyyyyy 🥳🥳
Please feel free to provide feedback & comments in the comment section & don't forget to like the article if it was useful 🤩
PS: While this solution works for our project, we are currently considering the idea of a more generic solution inside a NuGet package, where we would "resolve" the path to the Robots.txt content property inside the CMS.
It could be some configuration in the code, a [RobotsTxtContent] attribute to decorate the property, etc. If you have a clever implementation in mind we would love to hear more about it in the comment section 😊
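To illustrate the attribute idea, a marker attribute plus a reflection-based lookup could be enough. This is only a sketch of the direction, not the package itself; RobotsTxtContentAttribute and RobotsTxtContentResolver are hypothetical names:

using System;
using System.Linq;
using EPiServer.Core;

// Marker attribute that a site page type would put on its robots.txt property.
[AttributeUsage(AttributeTargets.Property)]
public class RobotsTxtContentAttribute : Attribute
{
}

public static class RobotsTxtContentResolver
{
    // Returns the value of the first property decorated with [RobotsTxtContent], or null if none is found.
    public static string Resolve(PageData startPage)
    {
        var property = startPage.GetType()
            .GetProperties()
            .FirstOrDefault(p => p.IsDefined(typeof(RobotsTxtContentAttribute), inherit: true));

        return property?.GetValue(startPage) as string;
    }
}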