Some high-level musings on performance in a CMS
I’ve been working for some time on performance optimization of a reasonably large EPiServer site, and it has been a very instructive experience. This is not a post about specific fixes to specific issues; it’s about high-level design principles.
The lessons learned are applicable in many scenarios, but I’d like to share some of my high-level insights in the context of a CMS in general, and EPiServer in particular. While it’s not reasonable to expect EPiServer to adopt these principles any time soon, you may find some guidelines here on which features to use when expecting heavy growth in the number of pages on your site.
In general, I have found that the common denominator for these problems is site enumeration. Site enumeration should simply never be allowed – this precludes the current FindPagesWithCriteria architecture, for example, and most of the other ways to produce deep listings. It also disallows the current mirroring, subscription and archival algorithms, where the site is enumerated to find updates. All such operations have to be done by other mechanisms, perhaps using variations on the subscriber pattern to inform listeners of updates, or placing these operations on a queue that can be processed in time linear in the amount of actual work to do – not the total number of pages (see the sketch below).
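To make the queue idea concrete, here is a minimal sketch, assuming a hypothetical change-notification hook in the save pipeline. PageChangedEvent and WorkQueue are my names for illustration, not EPiServer APIs; the point is that the background job drains the queue, so its cost is proportional to the number of changes rather than the size of the site.

```csharp
using System;
using System.Collections.Concurrent;

// Raised from the save pipeline whenever a page changes: O(1) per change.
public record PageChangedEvent(int PageId, DateTime ChangedAt);

public class WorkQueue
{
    private readonly ConcurrentQueue<PageChangedEvent> _queue = new();

    public void Enqueue(PageChangedEvent e) => _queue.Enqueue(e);

    // Called by the background job (mirroring, archival, subscriptions):
    // O(number of changes), never O(total pages).
    public void Drain(Action<PageChangedEvent> process)
    {
        while (_queue.TryDequeue(out var e))
            process(e);
    }
}
```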
A different way to express the above: every operation that takes a measurable amount of time per page must have sub-linear time complexity in relation to the total number of pages. A truly scalable application will not implement any feature with linear or worse time complexity in relation to the total number of pages, if the time per page is measurable. An operation can still be linear in the size of its own result set.
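To illustrate the distinction, here is a hypothetical example (PageIndex and its methods are made up for this post): both methods return the same children, but the first touches every page in the site while the second only touches the pages it actually returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PageIndex
{
    // Maintained incrementally as pages are created, moved and deleted.
    private readonly Dictionary<int, List<int>> _childrenByParent = new();

    // O(total pages): the kind of operation that breaks down as the site grows.
    public static IEnumerable<int> ChildrenByScan(
        IEnumerable<(int Id, int ParentId)> allPages, int parentId)
        => allPages.Where(p => p.ParentId == parentId).Select(p => p.Id);

    // O(size of the result set): scales with the work actually requested.
    public IReadOnlyList<int> ChildrenByIndex(int parentId)
        => _childrenByParent.TryGetValue(parentId, out var children)
            ? children
            : (IReadOnlyList<int>)Array.Empty<int>();
}
```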
A good example of a nice way to implement a feature is the blog post describing the upcoming improvements to dynamic properties in CMS 6 – that algorithm has linear time complexity in relation to the number of ancestors a page has, which is quite a change from the current implementation, where cache revalidation has worse than linear time complexity in relation to the total number of pages!
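Under my assumptions about what that algorithm looks like, the shape is roughly this sketch. GetParentId and GetLocalProperty are hypothetical stand-ins for the real data access; the cost is O(depth of the page in the tree), independent of the total number of pages.

```csharp
public abstract class PropertyResolver
{
    // Hypothetical data access: GetParentId returns null at the root,
    // GetLocalProperty returns null when the value is not set on the page itself.
    protected abstract int? GetParentId(int pageId);
    protected abstract string GetLocalProperty(int pageId, string name);

    // Cost: O(depth of the page in the tree), regardless of site size.
    public string ResolveInheritedProperty(int pageId, string name)
    {
        for (int? current = pageId; current != null; current = GetParentId(current.Value))
        {
            var value = GetLocalProperty(current.Value, name);
            if (value != null)
                return value; // the nearest value up the ancestor chain wins
        }
        return null; // not set anywhere on the chain
    }
}
```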
In EPiServer, loading a page takes significant time. Therefore, any operation with linear time complexity in relation to the total number of pages breaks down quite quickly once the site is large. With 100 000 pages, 100 000 × measurable time == too long; at, say, 10 ms per page load, a single enumeration takes over 16 minutes. That’s the simple fact.
A corollary that I have found is that while caching is great and required for performance, the first-time hit must still be within acceptable limits. The site must still run without the cache, just not great. Otherwise we end up with situations where the site cannot start at all under load. EPiServer suffers from this scenario when certain lists and other operations require almost the whole site to be enumerated, which can make site startup very difficult at times.
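The pattern I aim for is that a cache miss costs one page load, never a site enumeration. A minimal sketch, using the standard System.Runtime.Caching MemoryCache and a hypothetical loadPage delegate for the actual data access:

```csharp
using System;
using System.Runtime.Caching;

public class PageCache
{
    private readonly MemoryCache _cache = MemoryCache.Default;
    private readonly Func<int, object> _loadPage; // one database round-trip per page

    public PageCache(Func<int, object> loadPage) => _loadPage = loadPage;

    public object Get(int pageId)
    {
        var key = "page:" + pageId;
        var cached = _cache.Get(key);
        if (cached != null)
            return cached; // warm path: no I/O at all

        // Cold path: acceptable, because a miss costs one load, not N.
        var page = _loadPage(pageId);
        _cache.Set(key, page, DateTimeOffset.Now.AddMinutes(10));
        return page;
    }
}
```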
The caching corollary also leads to the conclusion that while you can set design limits – for example, requiring all pages to be resident in memory for optimal performance – the site must still perform well enough to actually start under maximum load, before everything has been loaded into memory.
The rules apply to background jobs as well. On an EPiServer site that runs at the limit, a job such as mirroring or archival may well act as a denial-of-service attack, invalidating caches to the extent that the site stops working, or growing memory use until a recycle is forced. So not even those jobs can use site enumeration. And when the time required to run a job stretches to several hours or more, that becomes extremely troublesome in itself.
While the above conclusions may seem self-evident at first glance, please recall that EPiServer implements many features that break these rules, and many EPiServer sites face serious performance issues when the number of pages grows.
I propose a good rule to apply when designing a framework such as a programmable CMS: if a feature exists, it will be used. It’s not sufficient to document limitations with texts like “Don’t use for large amounts of data”. Either a feature can be used within the design constraints of the framework, or it cannot. Don’t allow different design goals for different parts of the framework. If you can’t make it fast enough, don’t do it.
In EPiServer, there are many features that tend to break down and become unusable as the total number of pages grows – I hope EPiServer will work over time to change the architecture and remove those obstacles to scalability. It’s still amazing what can be done with the current architecture, but in my current case I’ve had to work around quite a few issues caused by various EPiServer features having linear or worse time complexity in relation to the total number of pages in the site.