A critical vulnerability was discovered in React Server Components (Next.js). Our systems remain protected but we advise to update packages to newest version. Learn More.

AI OnAI Off

Home / 2016 / 8 /

Viktor Sahlström - Search & Navigation

Aug 22, 2016

3254

(0 votes)

Episerver Find stemming

One of the things that makes search challenging (and really interesting) is language handling. Often, spoken languages differ from what a programmer is familiar with, since the rules are more comparable to a never-ending set of exceptions than actual rules. To properly analyze a string of text, the system must understand all of these exceptions.

A great feature of Episerver Find is stemming. Stemming reduces an inflected word to its root form (a.k.a. stem), for example, "fishing", "fished", and "fisher" have a root word of "fish." If the root word is determined, it can be used to return the full set of related items, thus improving retrievability and relevancy of search results.

Episerver Find uses snowball stemmers shipped as the default stemmer with Elastic search. This stemmer handles the general rules quite well but does not handle all special cases. In many languages, this works very well (English, for example). But depending on the complexity of the language and the maturity of the stemmer, this is not always enough. Swedish is a case where the default stemmer is not always perfect in execution. Often, the default stemmer creates a conflict that, in turn, causes unexpected search hits.

As an example, consider the Swedish words “bananen” (the banana) and “banans” (the race tracks). Using normal Swedish stemming rules, they would both be stemmed down to “banan.” In this case, any search also stemmed down to “banan” would give both results even though half of them are not relevant.

To fix this, a list of exceptions has been added to the Find stemming. We have started with Swedish and will look at additional languages going forward. The new algorithm recognizes that “bananen” and “banans” are different words even though their stem is the same. Hence, it creates unique tokens from them so the search engine can distinguish them at query time. This is a great improvement to search relevancy in many cases. One thing that remains to be solved is the case of “banan” (banana) and “banan” (the race track). In this form, the words are spelled exactly the same and cannot be distinguished without looking at the context. For these cases, search results are returned for both words.

To keep the list updated we would love users and partners to let us know if they find searches that results in weird results.

Aug 22, 2016

Comments

Please login to comment.

A day in the life of an Optimizely OMVP: Learning Optimizely Just Got Easier: Introducing the Optimizely Learning Centre

On the back of my last post about the Opti Graph Learning Centre, I am now happy to announce a revamped interactive learning platform that makes...

Graham Carr | Jan 31, 2026

Scheduled job for deleting content types and all related content

In my previous blog post which was about getting an overview of your sites content https://world.optimizely.com/blogs/Per-Nergard/Dates/2026/1/sche...

Per Nergård (MVP) | Jan 30, 2026

Working With Applications in Optimizely CMS 13

💡 Note: The following content has been written based on Optimizely CMS 13 Preview 2 and may not accurately reflect the final release version. As...

Mark Stott | Jan 30, 2026

Experimentation at Speed Using Optimizely Opal and Web Experimentation

If you are working in experimentation, you will know that speed matters. The quicker you can go from idea to implementation, the faster you can...

Minesh Shah (Netcel) | Jan 30, 2026

How to run Optimizely CMS on VS Code Dev Containers

VS Code Dev Containers is an extension that allows you to use a Docker container as a full-featured development environment. Instead of installing...

Daniel Halse | Jan 30, 2026

A day in the life of an Optimizely OMVP: Introducing Optimizely Graph Learning Centre Beta: Master GraphQL for Content Delivery

GraphQL is transforming how developers query and deliver content from Optimizely CMS. But let's be honest—there's a learning curve. Between...

Graham Carr | Jan 30, 2026

See all blogs