Best way to index multiple attachments?

Indexing a single attachment together with a page works well when you add a property of the Attachment type to the page model, as described here:
http://world.episerver.com/documentation/Items/Developers-Guide/EPiServer-Find/9/DotNET-Client-API/File-attachments/

But how do you handle multiple attachments when their number is dynamic?

(The reason for asking is that I want a page to get a search hit from the contents of the files in its LinkItemCollection as well.)

#143331
Jan 20, 2016 13:54
All right.
I know that this question is very old (relatively speaking),
but I just had this problem and could not find any solution to it while searching. Creating an IEnumerable<Attachment> is out of the picture, so something else had to be done.

The solution that I came up with is not the most straightforward one, but it fits my immediate needs.

What I finally settled on was to create an in-memory zip archive and add all my documents to it. Once all the documents were added to this "virtual zip file", I attached it to the page, pretty much like this:

page.Attachments = new Attachment(() => ZipFileStream)

The reason for creating a zip file rather than simply streaming all the files into a single MemoryStream is that the latter only works for plain-text files, so I had to go down the route of an in-memory zip archive.

The following is a rough implementation of the solution I propose, and it is basically what we ended up using:

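// Requires: using System.Collections.Generic; using System.IO;
//           using System.IO.Compression; using System.Linq; using EPiServer.Find;
private void ContentEvents_PublishedContent(object sender, EPiServer.ContentEventArgs e)
        {
            if (e.Content is PageWithMultipleDocuments page)
            {
                // Example paths only.
                var paths = new List<string>() {@"F:\randomplaintext.txt", @"F:\randomplaintext2.txt", @"F:\randompdf2.pdf" };
                using (var zipFile = GetZippedAttachments(paths))
                {
                    // zipFile is null when none of the files exist; index the page without an attachment then.
                    if (zipFile != null)
                    {
                        page.Attachments = new Attachment(() => zipFile);
                    }
                    SearchClient.Instance.Index(page);
                }
            }
        }

        private Stream GetZippedAttachments(List<string> paths)
        {
            var existingPaths = paths.Where(File.Exists).ToList();
            if (existingPaths.Count == 0)
            {
                return null;
            }

            var outFile = new MemoryStream();

            // leaveOpen: true keeps outFile usable after the archive is disposed.
            // Disposing the archive is what writes the zip central directory,
            // so it has to happen before the stream is handed to the indexer.
            using (var zipArchive = new ZipArchive(outFile, ZipArchiveMode.Create, true))
            {
                foreach (var path in existingPaths)
                {
                    // Use the file name rather than the full path as the entry name.
                    var entry = zipArchive.CreateEntry(Path.GetFileName(path), CompressionLevel.Fastest);
                    using (var entryStream = entry.Open())
                    {
                        var fileBytes = File.ReadAllBytes(path);
                        entryStream.Write(fileBytes, 0, fileBytes.Length);
                    }
                }
            }

            outFile.Position = 0;
            return outFile;
        }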

Note that the zip stream is null when none of the listed files exist, so the calling code has to handle that case.

Edit:
Instead of using a published-event listener as in the example above, this is a better way to handle it:

SearchClient.Instance.Conventions.ForInstancesOf<PageWithMultipleDocuments>()
                .IncludeField(x => GetAttachments(x));

// GetAttachments basically returns new Attachment(() => zipFile)
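
For anyone wondering what GetAttachments could look like, here is a minimal sketch of the zip-building part on its own, using only System.IO.Compression. The names AttachmentBundle and BuildZip are made up for this example, and it takes file contents as byte arrays rather than reading from disk so it can be tried without any Episerver references. The GetAttachments wiring in the trailing comment is an assumption based on the snippet above, and LoadFiles is a hypothetical helper that you would have to write yourself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class AttachmentBundle
{
    // Hypothetical helper: packs named byte arrays into a single in-memory zip.
    public static MemoryStream BuildZip(IEnumerable<KeyValuePair<string, byte[]>> files)
    {
        var output = new MemoryStream();

        // leaveOpen: true keeps 'output' usable after the archive is disposed.
        // Disposing the archive writes the zip central directory, so the
        // stream is only a valid zip once the using block has ended.
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var file in files)
            {
                var entry = archive.CreateEntry(file.Key, CompressionLevel.Fastest);
                using (var entryStream = entry.Open())
                {
                    entryStream.Write(file.Value, 0, file.Value.Length);
                }
            }
        }

        // Rewind so the consumer reads the zip from the beginning.
        output.Position = 0;
        return output;
    }
}

// Assumed wiring (needs EPiServer.Find; LoadFiles is hypothetical):
//
// private static Attachment GetAttachments(PageWithMultipleDocuments page)
// {
//     return new Attachment(() => AttachmentBundle.BuildZip(LoadFiles(page)));
// }
```

Building the zip inside the lambda, rather than capturing a stream that was created earlier, should mean that the stream is created when the client asks for it, so there is no already-disposed stream to worry about.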



#188673
Edited, Feb 28, 2018 22:14
Interesting solution, thanks for sharing!

#188688
Mar 01, 2018 8:45