Johan Björnfot
Nov 15, 2016

Internationalized Resource Identifiers (IRIs)

An Internationalized Resource Identifier (IRI) is a network address that contains non-ASCII characters, as shown below:

[Image: IRI.png]

Prior to version 10, EPiServer CMS only allowed characters in URL segments according to RFC 1738, which basically allows ALPHA / DIGIT / '-' / '_' / '~' / '.' / '$'.

From CMS.Core version 10.1.0, however, it is possible to define a custom character set that is used for URL segments and simple addresses. This is done by registering an instance of UrlSegmentOptions with a custom regular expression in the IoC container. When the expression allows characters outside RFC 1738, it is recommended to set UrlSegmentOptions.Encode to true so that URLs are properly encoded. Below is an example of how to register a character set that allows Unicode characters in the letter category.

using EPiServer.ServiceLocation;
using EPiServer.Framework.Initialization;
using EPiServer.Framework;
using EPiServer.Web;

namespace EPiServerSite6
{
    [ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
    public class IRIConfigurationModule : IConfigurableModule
    {
        public void ConfigureContainer(ServiceConfigurationContext context)
        {
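            // Replace the default registration with options that allow any Unicode letter (\p{L}).
            context.Services.RemoveAll<UrlSegmentOptions>();
            context.Services.AddSingleton<UrlSegmentOptions>(s => new UrlSegmentOptions
            {
                // Encode URLs, since the character set below goes beyond RFC 1738.
                Encode = true,
                ValidUrlCharacters = @"\p{L}0-9\-_~\.\$"
            });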
        }

        public void Initialize(InitializationEngine context)
        {}

        public void Uninitialize(InitializationEngine context)
        {}
    }
}
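
To get a feel for what this character set accepts and what the encoded result looks like, here is a minimal stand-alone sketch that uses only the .NET base class library. It wraps the same character set in an anchored character class and percent-encodes a segment with Uri.EscapeDataString. The class UrlSegmentSketch and its methods are hypothetical names for illustration; how EPiServer applies ValidUrlCharacters and Encode internally is an assumption, not something this sketch reproduces.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of EPiServer.
public static class UrlSegmentSketch
{
    // Same character set as in the registration above, wrapped in an
    // anchored character class (assumption: the set is used this way).
    private static readonly Regex ValidSegment =
        new Regex(@"^[\p{L}0-9\-_~\.\$]+$");

    // True when every character of the segment is in the allowed set.
    public static bool IsValidSegment(string segment)
    {
        return ValidSegment.IsMatch(segment);
    }

    // Percent-encodes non-ASCII characters as UTF-8 bytes.
    public static string Encode(string segment)
    {
        return Uri.EscapeDataString(segment);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidSegment("blåbär"));  // True
        Console.WriteLine(IsValidSegment("blå bär")); // False (space is not allowed)
        Console.WriteLine(Encode("blåbär"));          // bl%C3%A5b%C3%A4r
    }
}
```

The encoded form is what travels over the wire, while browsers typically display the readable form in the address bar.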

UrlSegmentOptions also exposes a CharacterMap property that can be used to define replacements for unsupported characters, for example 'ö' => 'o'.
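
The idea behind such a mapping can be sketched without EPiServer at all. The example below uses a plain dictionary from character to replacement string and applies it to a segment; the class CharacterMapSketch, the mapping entries, and the Apply method are hypothetical, and the exact type of the CharacterMap property is assumed rather than taken from the product documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of a character mapping, not EPiServer code.
public static class CharacterMapSketch
{
    // Example mappings for characters that should not appear in a segment.
    private static readonly Dictionary<char, string> Map =
        new Dictionary<char, string>
        {
            { 'ö', "o" },
            { 'ä', "a" },
            { 'å', "a" }
        };

    // Replaces each mapped character and keeps all others unchanged.
    public static string Apply(string segment)
    {
        var result = new StringBuilder(segment.Length);
        foreach (char c in segment)
        {
            string replacement;
            result.Append(Map.TryGetValue(c, out replacement) ? replacement : c.ToString());
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Apply("smörgås")); // smorgas
    }
}
```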

Internationalized Domain Names (IDN)

As explained in "IDN and IRI", internationalized domain names are registered in their Punycode format (a way of representing Unicode characters using only ASCII characters).

Internationalized domain names should be registered in admin mode under Manage Websites in their punycode format. 
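
If you need the Punycode form of a host name to enter there, .NET can produce it with System.Globalization.IdnMapping. The sketch below is a minimal example; the host name bücher.example and the class PunycodeSketch are made up for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that converts host names to and from Punycode.
public static class PunycodeSketch
{
    private static readonly IdnMapping Idn = new IdnMapping();

    // Unicode host name -> ASCII (Punycode) form.
    public static string ToPunycode(string host)
    {
        return Idn.GetAscii(host);
    }

    // ASCII (Punycode) form -> Unicode host name.
    public static string ToUnicode(string host)
    {
        return Idn.GetUnicode(host);
    }

    public static void Main()
    {
        Console.WriteLine(ToPunycode("bücher.example"));       // xn--bcher-kva.example
        Console.WriteLine(ToUnicode("xn--bcher-kva.example")); // bücher.example
    }
}
```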
