Take the community feedback survey now.

Ben Nitti
May 1, 2021
  72
(0 votes)

Optimizing your Optimizely Search & Navigation service for large files

Awhile ago I had a client with an excess of large files. I had increased their upload size limit to 2 GB and many of their documents were between 50 MB; a dozen or so files were between 1 and 2 GB.  Episerver recommends not exceeding the  by default 50 MB maximum request size.

Not surprisingly the indexing job started timing out and required immediate attention. 

I found there were several ways to tweak the the performance by filtering these files from the indexing job. 

I created an initialization module and changed the default batch sizes for the Find service. ContentBatchSize is used for the find index job, MediaBatchSize is for the event-driven indexing on media types. 

    [InitializableModule]
    [ModuleDependency(typeof(IndexingModule))]
    public class FileIndexingConventions : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            ContentIndexer.Instance.MediaBatchSize = 3;     // Default is 5
            ContentIndexer.Instance.ContentBatchSize = 50;  // Default is 100
        }

        public void Uninitialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }
    }

I had several ways to filter out these large files. I could filter out IContentMedia from the index entirely or do the same with a custom type for pdfs and zip extensions.

ContentIndexer.Instance.Conventions.ForInstancesOf<MyPdfMediaType>().ShouldIndex(x => false);

Alternatively, I could stop the binary data from being indexed by decorating the propery with the [JsonIgnore] attribute:

    public class MyPdfMediaType : MediaData
    {
        [JsonIgnore]
        public override Blob BinaryData { get; set; }
    }

But since the client wanted to have the file content searchable, I decided only to filter the property when the filesize reached the find service limit. 

ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().IndexAttachment(x => !IsFileSizeLimitReached(x));

...and for this I used an extention method to check against filesize binary data:

        private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
        {
            // Note: 37 MB max. size refers to the base64 encoded file size .
            const int limitKb = 37000;

            try
            {
                var blobByte = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                               (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();

                if (blobByte == null)
                    return false;

                double fileSize = blobByte.Length;

                var isLimitReached = (int)(fileSize / 1024) >= limitKb;

                return isLimitReached;
            }
            catch
            {
                return false;
            }
        }

Once in place I was able to run the job with no exceptions, no timeouts and a happy client!

May 01, 2021

Comments

Please login to comment.
Latest blogs
A day in the life of an Optimizely OMVP - Opticon London 2025

This installment of a day in the life of an Optimizely OMVP gives an in-depth coverage of my trip down to London to attend Opticon London 2025 held...

Graham Carr | Oct 2, 2025

Optimizely Web Experimentation Using Real-Time Segments: A Step-by-Step Guide

  Introduction Personalization has become de facto standard for any digital channel to improve the user's engagement KPI’s.  Personalization uses...

Ratish | Oct 1, 2025 |

Trigger DXP Warmup Locally to Catch Bugs & Performance Issues Early

Here’s our documentation on warmup in DXP : 🔗 https://docs.developers.optimizely.com/digital-experience-platform/docs/warming-up-sites What I didn...

dada | Sep 29, 2025

Creating Opal Tools for Stott Robots Handler

This summer, the Netcel Development team and I took part in Optimizely’s Opal Hackathon. The challenge from Optimizely was to extend Opal’s abiliti...

Mark Stott | Sep 28, 2025

Integrating Commerce Search v3 (Vertex AI) with Optimizely Configured Commerce

Introduction This blog provides a technical guide for integrating Commerce Search v3, which leverages Google Cloud's Vertex AI Search, into an...

Vaibhav | Sep 27, 2025

A day in the life of an Optimizely MVP - Opti Graph Extensions add-on v1.0.0 released

I am pleased to announce that the official v1.0.0 of the Opti Graph Extensions add-on has now been released and is generally available. Refer to my...

Graham Carr | Sep 25, 2025