
Deane Barker
Sep 1, 2009

EPiServerSearchMeta

Over the last few years, I’ve done four implementations of the Google Mini search appliance.  This is a piece of hardware (a 1U rack mount) that acts as a search crawler and engine.

It crawls your Web site (or whatever else you point it at) 24 hours a day.  You can throw queries at it via a REST interface and get results back as XML.  You can also transform the XML on the device itself and use it to present results directly to the end user, but this is awkward and requires you to duplicate your interface on another machine, which is never fun.
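
As a rough illustration of the parsing half of that round trip, the sketch below reads a results document shaped like the appliance's XML output. It is Python rather than anything from the original project. The sample XML is hand-written, not captured from a device, and the element names (`R` per result, `U` for the URL, `T` for the title) are assumptions based on the search protocol reference that should be checked against your own device's output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real device output. Element names are assumptions.
SAMPLE_RESPONSE = """<GSP VER="3.2">
  <Q>world</Q>
  <RES SN="1" EN="2">
    <R N="1">
      <U>http://www.example.com/news/deane-saves-the-world/</U>
      <T>Deane Saves the World</T>
    </R>
    <R N="2">
      <U>http://www.example.com/news/</U>
      <T>News Archive</T>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Return a list of (url, title) tuples, one per <R> result element."""
    root = ET.fromstring(xml_text)
    results = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default="")
        results.append((url, title))
    return results


if __name__ == "__main__":
    for url, title in parse_results(SAMPLE_RESPONSE):
        print(title, "->", url)
```

Parsing the XML yourself like this keeps the presentation layer in your own site, which avoids duplicating the interface on the device.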

The device is quite good for text-heavy search, and retails for $2,995, making it a cheap solution for a lot of situations.

The Mini can do fairly granular searching of META (see the search protocol reference). Over the years, we’ve figured out that you should stack as many META tags as possible in your pages, because you never know what you’re going to want to search on.  If, for instance, your client wants to isolate a search to just news articles, then it’s helpful to have a META tag in there with the type of content (alternatively, you could create a distinct collection in the device, but maintaining these can be tedious).
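
To make the news-article case concrete, here is a minimal Python sketch of building a query URL that restricts results by a META tag. The `requiredfields`, `getfields`, `site`, `client`, and `output` parameters are taken from the search protocol as best understood here, so verify them against the protocol reference. The host name, collection, and front-end names are placeholders. Double URL-encoding the tag name and value is likewise an assumption, and because the protocol also uses `.` to join multiple required fields, a dotted tag name may need extra care.

```python
from urllib.parse import quote, urlencode


def build_search_url(host, query, meta_name, meta_value,
                     collection="default_collection",
                     frontend="default_frontend"):
    """Build a search URL limited to pages carrying a given META name/value.

    host, collection, and frontend are placeholders for your own setup.
    """
    # Encode once here; urlencode() below encodes again, which yields the
    # double encoding this sketch assumes the protocol expects.
    required = "{}:{}".format(quote(meta_name, safe=""),
                              quote(meta_value, safe=""))
    params = {
        "q": query,
        "site": collection,
        "client": frontend,
        "output": "xml_no_dtd",      # raw XML rather than rendered HTML
        "requiredfields": required,  # only pages with this META name/value
        "getfields": "*",            # ask for META values in the results
    }
    return "http://{}/search?{}".format(host, urlencode(params))


if __name__ == "__main__":
    print(build_search_url("mini.example.com", "budget",
                           "MySite.EPiServer.PageTypeName", "NewsArticle"))
```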

For another CMS, we developed a control that dumped all sorts of META to the HEAD tag of the page.  We refined this over the years to only run for the Mini, since it got to the point where it was computationally expensive to find and return all this information, and we only needed it for the Mini (we didn’t need it for public search engines, for instance).

For our first EPiServer/Mini integration, we adapted the control a bit, but the functionality is roughly the same – it dumps all sorts of information to META tags, including any properties you might specify.

Register it like this:

<%@ Register TagPrefix="Blend" Namespace="Blend.EPiServer.Controls" Assembly="[insert your assembly name here]" %>

Then put the control in the HEAD tag like this:

<Blend:EPiServerSearchMeta TagNameFormat="MySite.EPiServer.{0}" UserAgentString="gsa" QuerystringCode="OpenSesame" Properties="Title,Summary" runat="server" />

It will only run when the currently executing page is of type TemplatePage (so, only for EPiServer templates that have a content object attached).

The control outputs the following information:

  • The page ID
  • The page type ID
  • The page type name
  • The page name
  • The parent page ID
  • The parent page type ID
  • The parent page type name
  • Every page ID from the current page’s parent back to the start page (in multiple META tags)
  • The depth of the page (the start page is 0, top level pages are 1, etc.)

It looks like this:

<meta name="MySite.EPiServer.PageID" content="9" />
<meta name="MySite.EPiServer.PageTypeID" content="7" />
<meta name="MySite.EPiServer.PageTypeName" content="NewsArticle" />
<meta name="MySite.EPiServer.PageName" content="Deane Saves the World" />
<meta name="MySite.EPiServer.ParentPageID" content="8" />
<meta name="MySite.EPiServer.ParentTypeID" content="5" />
<meta name="MySite.EPiServer.ParentTypeName" content="NewsArchive" />
<meta name="MySite.EPiServer.AncestorID" content="8" />
<meta name="MySite.EPiServer.AncestorID" content="7" />
<meta name="MySite.EPiServer.AncestorID" content="3" />
<meta name="MySite.EPiServer.PageDepth" content="3" />
<meta name="MySite.EPiServer.Category" content="7" />
<meta name="MySite.EPiServer.Category" content="9" />
<meta name="MySite.EPiServer.Category" content="13" />
<meta name="MySite.EPiServer.Category" content="15" />
<meta name="MySite.EPiServer.Category" content="16" />
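
The ancestor and depth values in that output can be derived by walking the parent chain. The sketch below is illustrative Python, not the control's actual C#. The `PARENTS` lookup table is a hypothetical stand-in for the CMS page tree, filled in to match the sample output.

```python
# Hypothetical page tree: page ID -> parent page ID (None marks the start page).
PARENTS = {3: None, 7: 3, 8: 7, 9: 8}


def ancestor_ids(page_id, parents):
    """Walk from the page's parent back to the start page, nearest first."""
    ancestors = []
    current = parents[page_id]
    while current is not None:
        ancestors.append(current)
        current = parents[current]
    return ancestors


def page_depth(page_id, parents):
    """The start page is 0, top-level pages are 1, and so on."""
    return len(ancestor_ids(page_id, parents))


if __name__ == "__main__":
    for ancestor in ancestor_ids(9, PARENTS):
        print('<meta name="MySite.EPiServer.AncestorID" content="%d" />' % ancestor)
    print('<meta name="MySite.EPiServer.PageDepth" content="%d" />' % page_depth(9, PARENTS))
```

Emitting one tag per ancestor is what lets a single META filter match everything beneath a given section of the site.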

There are a few control attributes…

TagNameFormat is the format of the “name” attribute of the resulting META tag.  So, in the above example, the Page Type ID of the content will output as:

<meta name="MySite.EPiServer.PageTypeID" content="7" />
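
The name substitution can be sketched in a few lines of Python, whose `str.format` happens to accept the same `{0}` placeholder. The `meta_tag` helper is hypothetical, and the HTML-escaping of the values is an assumption about sensible behavior rather than something the control is known to do.

```python
from html import escape


def meta_tag(name_format, key, value):
    """Render one META tag, substituting key into the {0} placeholder."""
    name = name_format.format(key)
    return '<meta name="{}" content="{}" />'.format(
        escape(name, quote=True),        # escaping is an assumption
        escape(str(value), quote=True),
    )


if __name__ == "__main__":
    print(meta_tag("MySite.EPiServer.{0}", "PageTypeID", 7))
    print(meta_tag("MySite.EPiServer.{0}", "PageName", "Deane Saves the World"))
```

Prefixing every name this way keeps the generated tags from colliding with any META tags the template already emits.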

Properties is a comma-delimited list of properties you want to dump to META.  Be careful here, obviously – the entire text of the content object is unnecessary and potentially problematic.  The control will simply call ToWebString() on all of them, so make sure this outputs what you want.  Also, if the property is a Category selection, the control will split the IDs up under separate tags.
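
A minimal Python sketch of that expansion follows. The `page` dictionary is a hypothetical stand-in for the content object, a list value stands in for a Category selection, and silently skipping blank or unknown property names is an assumption, not documented behavior of the control.

```python
def property_meta(properties_attr, page, name_format):
    """Expand a comma-delimited property list into (name, content) pairs.

    A list or tuple value (standing in for a Category selection) yields
    one pair per ID, so each ID ends up in its own META tag.
    """
    pairs = []
    for prop in (p.strip() for p in properties_attr.split(",")):
        if not prop or prop not in page:
            continue  # assumption: skip blanks and properties the page lacks
        value = page[prop]
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            pairs.append((name_format.format(prop), str(item)))
    return pairs


if __name__ == "__main__":
    page = {"Title": "Deane Saves the World", "Category": [7, 9, 13]}
    for name, content in property_meta("Title,Category", page, "MySite.EPiServer.{0}"):
        print(name, "=", content)
```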

UserAgentString identifies the crawler. Set it to a substring that appears only in your crawler’s user agent string – “gsa” works well for the Mini.  If the control finds this string in the request’s user agent it will execute; otherwise it will exit without doing anything.

QuerystringCode is a secret code you can use to debug the control.  If this value is found in a querystring argument called “show_meta”, the control will always execute (regardless of the user agent string), so you can see the META it outputs.
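
Taken together, UserAgentString and QuerystringCode amount to a small gate in front of the expensive work. Here is one way that decision could look, sketched in Python. The `should_render` function is hypothetical, and both the case-insensitive substring match and the exact match on the code are assumptions about how the comparison is done.

```python
def should_render(user_agent, query_params, user_agent_string, querystring_code):
    """Decide whether to emit the META tags for this request.

    user_agent        -- the request's User-Agent header (may be None)
    query_params      -- dict of querystring arguments
    user_agent_string -- the control's UserAgentString attribute
    querystring_code  -- the control's QuerystringCode attribute
    """
    # Debug override: the secret code in "show_meta" always wins.
    if querystring_code and query_params.get("show_meta") == querystring_code:
        return True
    # Otherwise run only when the crawler's marker appears in the user agent.
    if user_agent and user_agent_string:
        return user_agent_string.lower() in user_agent.lower()
    return False


if __name__ == "__main__":
    # Example user agent values are illustrative.
    print(should_render("gsa-crawler", {}, "gsa", "OpenSesame"))
    print(should_render("Mozilla/5.0", {}, "gsa", "OpenSesame"))
    print(should_render("Mozilla/5.0", {"show_meta": "OpenSesame"}, "gsa", "OpenSesame"))
```

With the attribute values from the earlier example, appending `?show_meta=OpenSesame` to a page URL in a browser would show the tags without spoofing the user agent.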

Get the Code (.zip file, containing a single .cs file)
