Channel: Ryan Bailey Development

Sitecore Lucene searching by phrase and each of its words

A simple Sitecore search implementation typically finds items in the index where the content is like, or contains, the exact search term. This works well when a user searches for a single keyword. If they enter a phrase, however, we also want to return results that match individual words in that phrase (albeit with less of a boost).

This can be achieved using the predicate builder in Sitecore, along with the standard search code and indexes in Lucene. In this example I am searching on a simple Sitecore page template, and in particular 3 of its fields: title, page description and main text. These fields have been included in the custom Lucene index XML and have the storage type set to yes (this means I can access the data stored in the fields from the search index without having to get the Sitecore item).

The example defines two objects: the SearchResult object, which contains the fields from the index we want to pass through to the front end, and the SearchModel object, which maps the data from the Lucene index.

The GetSearchPredicate method builds the predicate that each indexed item is evaluated against. If an indexed item meets the logic inside this predicate, it is returned as a search result. In this case there are 3 main parts to the predicate:
  1. Searching the indexed fields for content that is like the search term. This has a boost of 60 because like is not an exact match.
  2. Searching the indexed fields for content that contains the search term. This would be considered an exact match so has a boost of 100.
  3. Searching the indexed fields for content that contains each of the words that make up the search phrase. This has a boost of 20, because the content might not be exactly what the user is searching for but may be relevant.
These pieces are all connected together with or statements which means a result only needs to match one of these to be returned. For more complex searching you could use the and statement.

Then in the main DoSearch method, we get the search index and create a context for it. We query this index for items which match the search predicate built up earlier (items that match any of the contains/like statements), then parse the search results into a custom object for use on the front end. Please note that fetching every result can be expensive, so it is recommended that you get the results by paging (with Google, how often do you go past the first page or two?).
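The predicate and search method described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the original listing: the index name my_custom_index, the index field names, and the shape of the two model classes are all hypothetical, and the code needs the Sitecore.ContentSearch assemblies to compile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;

// Maps stored index fields; the field names are assumptions.
public class SearchModel
{
    [IndexField("title")]
    public string Title { get; set; }

    [IndexField("page_description")]
    public string Description { get; set; }

    [IndexField("main_text")]
    public string MainText { get; set; }
}

// What gets passed through to the front end.
public class SearchResult
{
    public string Title { get; set; }
    public string Description { get; set; }
}

public static class SiteSearch
{
    public static Expression<Func<SearchModel, bool>> GetSearchPredicate(string searchTerm)
    {
        // Start from False because every clause is joined with Or.
        var predicate = PredicateBuilder.False<SearchModel>();

        // 1. Like (fuzzy) match on the whole term: boost 60.
        predicate = predicate.Or(x => x.Title.Like(searchTerm).Boost(60f));
        predicate = predicate.Or(x => x.Description.Like(searchTerm).Boost(60f));
        predicate = predicate.Or(x => x.MainText.Like(searchTerm).Boost(60f));

        // 2. Contains the whole term: boost 100.
        predicate = predicate.Or(x => x.Title.Contains(searchTerm).Boost(100f));
        predicate = predicate.Or(x => x.Description.Contains(searchTerm).Boost(100f));
        predicate = predicate.Or(x => x.MainText.Contains(searchTerm).Boost(100f));

        // 3. Contains any single word of the phrase: boost 20.
        var words = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            var w = word; // local copy captured by the lambdas below
            predicate = predicate.Or(x => x.Title.Contains(w).Boost(20f));
            predicate = predicate.Or(x => x.Description.Contains(w).Boost(20f));
            predicate = predicate.Or(x => x.MainText.Contains(w).Boost(20f));
        }

        return predicate;
    }

    // page is zero-based; pageSize limits how many results are materialised.
    public static List<SearchResult> DoSearch(string searchTerm, int page, int pageSize)
    {
        var index = ContentSearchManager.GetIndex("my_custom_index"); // hypothetical index name
        using (var context = index.CreateSearchContext())
        {
            var results = context.GetQueryable<SearchModel>()
                .Where(GetSearchPredicate(searchTerm))
                .Page(page, pageSize)
                .GetResults();

            return results.Hits
                .Select(hit => new SearchResult
                {
                    Title = hit.Document.Title,
                    Description = hit.Document.Description
                })
                .ToList();
        }
    }
}
```

Calling SiteSearch.DoSearch("new zealand", 0, 10) would request only the first ten hits, which keeps the parsing step cheap.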

This example builds on Sitecore 8 Lucene search index, with some help from Stack Overflow.

Sitecore Lucene indexing datasource child content against an item

Sitecore Lucene search indexes and code can be simple to implement when a given item contains all of the fields which will need to be indexed against itself. This can get more complicated when it comes to sublayouts/renderings which contain content from a datasource (another item).

One solution is to create a custom index field which concatenates the content from child items/datasources into the Lucene index against the main item. This way all of the content that makes up a page is indexable against that page.

In this example, each item of page type is a Sitecore branch that automatically creates a child item called datasources. This can contain templates (with rich text fields) which are used as datasources for displaying additional content on the page. In this case we would want these datasources indexed against the parent item, because they only appear there (they don't have a layout assigned to them directly).

The code sample contains the custom index field IndexChildContent. This looks for a child item of the datasources template type, then gets the items inside that datasources folder which are of the template types we wish to index. It then concatenates the content of a rich text field called text (on these indexable templates) into a string, which is the value stored in the index as the custom field.

The snippet from the Lucene index XML shows how the field is added as a computed index field stored locally with the name childcontent.
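A computed field along these lines might look like the sketch below. It is an illustration only: the template names Datasources and Rich Text Block are hypothetical stand-ins for your own templates, and the class needs the Sitecore.Kernel and Sitecore.ContentSearch assemblies to compile.

```csharp
using System.Linq;
using System.Text;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

public class IndexChildContent : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    // Hypothetical template names; replace with the ones in your solution.
    private const string DatasourcesTemplate = "Datasources";
    private static readonly string[] IndexableTemplates = { "Rich Text Block" };

    public object ComputeFieldValue(IIndexable indexable)
    {
        Item item = indexable as SitecoreIndexableItem;
        if (item == null)
        {
            return null;
        }

        // Find the datasources folder directly under the page item.
        Item folder = item.Children.FirstOrDefault(c => c.TemplateName == DatasourcesTemplate);
        if (folder == null)
        {
            return null; // e.g. media items have no datasources folder
        }

        // Concatenate the "text" field of every indexable datasource item.
        var content = new StringBuilder();
        foreach (Item child in folder.Children)
        {
            if (IndexableTemplates.Contains(child.TemplateName))
            {
                content.Append(child["text"]).Append(' ');
            }
        }

        return content.Length > 0 ? content.ToString().Trim() : null;
    }
}
```

Because the source field is rich text, the stored value will still contain HTML markup unless it is stripped before being returned.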

This is only a simple example. You would likely want to add logging and check that the item being indexed is of the correct type - for example, media items won't have the datasources folder.

Sitecore document manager in rich text editor coming up empty

When using the document manager in the rich text editor of Sitecore 8, the UI came up empty while still reporting that it was showing x items of x.


A fix for this one is to make the following change to the EditorPage.aspx file located at Website\sitecore\shell\Controls\Rich Text Editor.

At the top of the file you will see the following:
                MediaManager-UploadPaths="/media library"
                MediaManager-DeletePaths="/media library"
                MediaManager-ViewPaths="/media library"
                TemplateManager-UploadPaths="/media library"
                TemplateManager-DeletePaths="/media library"
                TemplateManager-ViewPaths="/media library"

Insert the following two lines:
                DocumentManager-ViewPaths="/"
                DocumentManager-DeletePaths="/"

Then have the users clear their browser cache and the document manager should work as expected.

Source: Sitecore community.

Sitecore allowing content authors to exclude items from search

Third party search engines like Google find pages to index via a sitemap or via links from other pages on the internet. This makes it easier to control which pages are indexable and can be found by users. With Sitecore, any item whose template is included in the search index (or not excluded) will be indexed - as long as it's published to the web database. For system pages or campaign pages, for example, you might not want users to be able to find them in your custom site search (on a case by case basis). There is a simple solution to this, which empowers content authors to exclude a given page/item as required.

Template changes

The first change to make is to add the following field to any templates which will be indexed by the Lucene search index. This can be done with a base template or by adding it to the relevant template(s) manually.

It is an exclude from search checkbox field, that when checked will be used to indicate a result that should not appear on the search front end. Technically the item is still inside the Lucene index, but we will use this field in the custom search code.

Lucene search index (XML)

Now we need to index the exclude from search field and set the storage type to yes so that the field value is accessible without having to get each individual Sitecore item. For a full sample of a Lucene index see Sitecore 8 search implementation.


Search code

In a previous post I introduced the concept of using a predicate builder to query the items in the Lucene index. This concept works well because we can have all of our normal search logic using or statements (where field A is like or field B is like) and then on top of that have an and statement where exclude from search is false.

Now when we query the results, any items with the exclude from search checkbox checked will not be returned.
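As a rough sketch of that combination, assuming the checkbox is indexed under the hypothetical field name exclude_from_search and mapped to a bool property, the exclusion can be joined onto an existing content predicate with And. How unchecked or never-set checkboxes are stored in the index can vary, so it is worth verifying against your own index.

```csharp
using System;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq.Utilities;

public class SearchModel
{
    [IndexField("_content")]
    public string Content { get; set; }

    // Hypothetical index field name for the exclude from search checkbox.
    [IndexField("exclude_from_search")]
    public bool ExcludeFromSearch { get; set; }
}

public static class SearchExclusion
{
    // contentPredicate holds the usual Or-joined search logic.
    public static Expression<Func<SearchModel, bool>> ExcludeHidden(
        Expression<Func<SearchModel, bool>> contentPredicate)
    {
        // And: a result must match the content logic and not be excluded.
        return contentPredicate.And(x => x.ExcludeFromSearch == false);
    }
}
```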

Sitecore 8 Lucene indexing PDF content

By default, Sitecore will not index the content inside document types such as PDF or DOC. It requires the use of an iFilter and a custom Sitecore index field. Simply put, the custom index field will read the content of the document (using the iFilter) and then the content will be inside the Lucene index and available for searching.

A number of iFilters are available for use with PDFs. In this example I will be using the PDFBox.NET (version 1.8.9) iFilter, because it has no cost and installs more cleanly (not in program files etc.).

Installing PDFBox.NET

Download the latest version and then open the package to find a folder of DLLs. Your Visual Studio solution will require the following references:
  • IKVM.OpenJDK.Core.dll
  • IKVM.OpenJDK.SwingAWT.dll
  • pdfbox-1.8.9.dll
and the bin folder of your Sitecore web site will require the following references:
  • commons-logging.dll
  • fontbox-1.8.9.dll
  • IKVM.OpenJDK.Text.dll
  • IKVM.OpenJDK.Util.dll
  • IKVM.Runtime.dll

Creating the custom search index field

The computed index field needs to be defined in the index definition XML, and its logic implemented in code.

This computed field uses the iFilter to read in the PDF content and then appends it to the main _content field in Sitecore. The storage type is set to no by default; however, I have tested it with stored content to allow the context of the search term to be shown in the results.

Querying the PDF content

As with any indexed field in Lucene, we simply map the indexed field (_content or custom) to our model and can then use predicate or LINQ logic to query it.

Sitecore Lucene facet phrases or sentences with spaces

I came across an interesting error with an implementation of Sitecore search using Lucene. I had a facet based on a phrase (page category) that contained spaces. When the facets came back in the search code they were split up into individual words - "New" and "Zealand" instead of "New Zealand".

I spent a bit of time searching for a resolution for this one; however, most results appeared to apply to Solr (where an untokenized field simply fixed the issue). Then I came across this post on the Sitecore blog.

The fix is to define your computed index field as untokenized (which I had already done), but then to also add that same field under the standard fieldMap > fieldNames element - with the LowerCaseKeywordAnalyzer.
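As an illustration, the extra entry might look like the config fragment below. The field name categoryfacet is a hypothetical example, and the attribute values follow the pattern of Sitecore's default Lucene configuration, so check them against the config shipped with your version.

```xml
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
  <fieldNames hint="raw:AddFieldByFieldName">
    <!-- Same name as the computed index field (hypothetical here) -->
    <field fieldName="categoryfacet" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
    </field>
  </fieldNames>
</fieldMap>
```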

That fix worked and my facets were coming back as single phrases. Of course if you are faceting and searching on the same field, you should probably have a separate version that does not use this fix (so that searching on the individual words works as expected).

Facets with Sitecore lucene search

With search engines, faceting is a concept which allows users to filter the result set so that it is more relevant to what they are searching for. A common example would be an online clothing store: when searching for clothing, users would have facets on type (men's, women's or children's) and even sizing (small, medium, large, etc.). These facets are great from a user's perspective, because they allow them to filter out results that are not relevant to them (to use that clothing store example again, I would only be interested in men's clothing in my size).

With Sitecore search using Lucene, facets are simple to implement and make for a much better search experience for users.

In this example, both web page and PDF content is being indexed by Lucene, therefore the facet will be based on content type: either web page or document. Please note that if your facet value stored in the index has spaces, then you will need to read this post on facet phrases or sentences with spaces.

Computed field definition

A facet will require an indexed field in the Lucene search index configuration XML to actually perform the facet on. In this case we will use a computed index field to store whether a given item in the index is a PDF document or a web page.

This is a fairly simple computed field that checks whether the item is a media item or not. Notice that the index type is untokenized because we want the result stored as a single value.
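A minimal sketch of such a field is shown below. The returned values pdf and page are hypothetical labels (single words, which also sidesteps the spaces issue), and the sketch assumes every indexed media item is a PDF, which you would want to check properly in real code.

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

public class ContentTypeField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        Item item = indexable as SitecoreIndexableItem;
        if (item == null)
        {
            return null;
        }

        // Media library items are treated as documents, everything else as pages.
        return item.Paths.IsMediaItem ? "pdf" : "page";
    }
}
```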

Search code

Implementing the facet in your search code is relatively simple, especially if you are building your search query using predicate logic.

As you can see, the predicate-building logic has the standard or statements to search the actual content. We then combine this with another predicate which selects items of a specific type. If no type is selected, all of the results that meet the search phrase appear as normal.

If you had multiple facet selections (for example type web page, PDF and word document selected), the type predicate builder would loop through each selected type and build an or predicate. Thus the results would return where there is a search term match and any of the selected type facets are met.
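The multi-selection case could be sketched like this. It assumes a model with a Type property mapped to a hypothetical contenttype index field, and it simply returns the content predicate unchanged when nothing is selected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq.Utilities;

public class SearchModel
{
    [IndexField("_content")]
    public string Content { get; set; }

    // Hypothetical computed field holding the content type.
    [IndexField("contenttype")]
    public string Type { get; set; }
}

public static class TypeFacetFilter
{
    public static Expression<Func<SearchModel, bool>> Apply(
        Expression<Func<SearchModel, bool>> contentPredicate,
        IEnumerable<string> selectedTypes)
    {
        var types = (selectedTypes ?? Enumerable.Empty<string>()).ToList();
        if (types.Count == 0)
        {
            return contentPredicate; // no facet selected: results appear as normal
        }

        // Or together every selected type...
        var typePredicate = PredicateBuilder.False<SearchModel>();
        foreach (var type in types)
        {
            var t = type; // local copy captured by the lambda
            typePredicate = typePredicate.Or(x => x.Type == t);
        }

        // ...then And it with the content search logic.
        return contentPredicate.And(typePredicate);
    }
}
```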

Getting facets with count for a given search

In the example above, there were only two possible outcomes for the facet (web page or PDF), so the front end logic used a switch to display the facets. However in cases where there could be any number of facets, it is useful to display them on the page with a count - much like the date facets on this blog (each year and month shows a count of articles).

Now we have a list of facets and their counts, and we can display them on the front-end with checkboxes or a simple list.
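Getting facet values with their counts might look like the sketch below. The index name is passed in by the caller, the contenttype field name is a hypothetical example, and the code needs the Sitecore.ContentSearch assemblies to compile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;

public class SearchModel
{
    // Hypothetical computed field used for faceting.
    [IndexField("contenttype")]
    public string Type { get; set; }
}

public static class FacetReader
{
    public static Dictionary<string, int> GetTypeFacets(
        string indexName,
        Expression<Func<SearchModel, bool>> searchPredicate)
    {
        using (var context = ContentSearchManager.GetIndex(indexName).CreateSearchContext())
        {
            var facets = context.GetQueryable<SearchModel>()
                .Where(searchPredicate)
                .FacetOn(x => x.Type)
                .GetFacets();

            // The facet category is named after the index field.
            var category = facets.Categories.FirstOrDefault(c => c.Name == "contenttype");
            if (category == null)
            {
                return new Dictionary<string, int>();
            }

            // Keep only values that actually occur in this result set.
            return category.Values
                .Where(v => v.AggregateCount > 0)
                .ToDictionary(v => v.Name, v => v.AggregateCount);
        }
    }
}
```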

Sitecore Lucene search all documents have the same score of 1

I spent a bit of time banging my head against the wall when my Sitecore Lucene search was returning all documents with the score of 1. Even with boosting on key fields, less relevant documents were appearing first (because they all had the same score).

In this particular case the issue was that I was using Filter instead of Where when building the search query. Filter restricts the result set without contributing to relevance scoring, so boosts have no effect. This is correct:
var searchResults = searchContext.GetQueryable<SearchModel>().Where(searchPredicate);
rather than:
var searchResults = searchContext.GetQueryable<SearchModel>().Filter(searchPredicate);

Sitecore Lucene search index when an item was last updated

With your Sitecore search it can be useful to the end user to provide sorting based on the last updated date of the Sitecore item. A good use case of this is actually when using Google to search for Sitecore help - if you set the results to show from the past year, you are more likely to get results relating to the current version of Sitecore.

To enable sorting by updated date in Sitecore, you first need to add the __smallupdateddate field to your search model. Please note that this is the small updated date, a default field that is included in the Lucene index.
[IndexField("__smallupdateddate")]
public DateTime LastUpdated { get; set; }
Once this has been added, you can then sort by this with your search query:
searchResults = searchContext.GetQueryable<SearchModel>().Filter(searchPredicate).OrderBy(x => x.LastUpdated);
For more complex logic such as Google's last year filtering, you could use the where clause in your search logic:
.Where(x => x.LastUpdated >= DateTime.Now.AddYears(-1))

Sitecore accessing a search index whilst it is rebuilding

Depending on the indexing strategy of your Sitecore search index, there will be times when the index needs to rebuild. This may be at the end of a publish (onPublishEndAsync) or even after a full publish (rebuildAfterFullPublish). What you might not realize is that the index (and therefore the search itself) will be unavailable during this rebuild. This is not best practice, as it can lead to large amounts of downtime depending on the size of the index and how often it's rebuilt.
Luckily there is a solution where Sitecore keeps two versions of the search index, so that whilst one is being rebuilt, the other is available for querying. This can be enabled on the main XML definition of your Lucene search index.

A standard Lucene search index might be defined by:
<index id="MyIndex" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
However, to enable the duplication of the index, you will need to set the type to SwitchOnRebuildLuceneIndex.
<index id="MyIndex" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
Please be aware that this will require double the storage space (as there are two versions of the index), and it's always worth investigating your indexing strategy to ensure it meets the given Sitecore implementation's needs.

Sitecore accessing items in a computed index field

When the search index is indexing a computed index field for a given item, there is no Sitecore context. This means that if you have code which requires the context of Sitecore - to get an item for example, then there will be a null reference exception.

I found this when I was using a helper class to generate the custom title of my pages - to then be indexed. This class used the context to get a configuration item and therefore was throwing an exception for every page.

The simple solution here is that instead of referencing the context to get an item as follows:
var item = Context.Database.GetItem("MyPath");
you reference the given database directly:
var webDb = Sitecore.Configuration.Factory.GetDatabase("web");
var item = webDb.GetItem("MyPath");
This will now get the computed index field working as expected.

Sitecore error when highlighting search keywords with Lucene

When referencing the Lucene.Net DLL that comes out of the box with Sitecore (version 7 or 8 with my tests), you will encounter the following error when attempting to highlight the search keywords.
Method not found: 'System.Collections.Generic.ISet`1<!!0> Lucene.Net.Support.Compatibility.SetFactory.CreateHashSet()'.
This is because the version used with Sitecore is not compatible with the Lucene contrib libraries. The contrib library, which is maintained by contributors with special rights, includes the highlight functionality (among other features such as the spellchecker).
The solution to this problem is to head over to the Lucene.Net NuGet package, grab the latest package from there (version 3.0.3 at the time of writing this post), and deploy the Lucene.Net DLL over the version that is included with Sitecore.

Sitecore Lucene boosting content of a specific template

When testing out my new Sitecore Lucene search engine, I noticed that for some key search terms, news articles were outranking the content pages (which should have been the highest matches). This was due to a high scoring based on keyword density in the news articles.

My solution for this was to boost the score for templates of the page type. This is achieved by including another predicate statement in my search logic that boosts templates of the page type (by 4.0f) along with an or statement with no boost for items not of those templates. This or statement is required otherwise items that weren't of the page type would not be included in the results - because we use and statements to join the outer predicates together.

As the above demo shows we join the main search predicate logic GetSearchPredicate with the template boosting predicate GetBoosterPredicate to create a single search predicate which will return the matching results.
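A booster predicate of this kind could be sketched as follows. It is an illustration under assumptions: the template ID is a placeholder GUID, the model inherits Sitecore's SearchResultItem to get its TemplateId property, and the main search predicate is passed in rather than reproduced.

```csharp
using System;
using System.Linq.Expressions;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Data;

// SearchResultItem already maps the built-in _template field to TemplateId.
public class SearchModel : SearchResultItem
{
}

public static class TemplateBoost
{
    public static Expression<Func<SearchModel, bool>> GetBoosterPredicate()
    {
        // Placeholder GUID; use the ID of your own page template.
        var pageTemplateId = new ID("{11111111-1111-1111-1111-111111111111}");

        var booster = PredicateBuilder.False<SearchModel>();

        // Items of the page template get a 4.0f boost...
        booster = booster.Or(x => (x.TemplateId == pageTemplateId).Boost(4.0f));

        // ...while all other items still match, just without a boost.
        booster = booster.Or(x => x.TemplateId != pageTemplateId);

        return booster;
    }

    // Joins the main search logic and the booster with And.
    public static Expression<Func<SearchModel, bool>> Combine(
        Expression<Func<SearchModel, bool>> searchPredicate)
    {
        return searchPredicate.And(GetBoosterPredicate());
    }
}
```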

As with any boosting/search logic it's important to thoroughly test to ensure the expected results are being returned across a range of search terms.

Sitecore Lucene getBestFragment returning null with search results highlighting

I have been using the getBestFragment method of the highlighter class, available through the Lucene contrib library, to highlight the search term(s) on the search results page. It worked as expected in most cases; however, I noticed that some results were returning null. After looking through the documentation, I found out that:
Returns: highlighted text fragment or null if no terms found
So it is important to have handling in your code and not just expect that there will be an output from this method.

Sitecore Lucene highlighting the search term(s) in the search results

A popular feature of any good search engine is the ability to highlight the search terms in the content shown for each search result. The main benefit is that the user is able to see the context in which the search term appears for each result returned, which allows them to choose the result which is most relevant to them.

This feature is not provided out of the box with Sitecore, but you can implement it if you update the Lucene DLL in your Sitecore web site.
A good example of this feature in action is available at Sitecore.Context.Item and it gives a good explanation of the process. However, I came across an issue with this code: if the content in which you attempt to highlight the keywords does not contain the search term(s), null will be returned. This happens when other fields you are searching have the match, so when you send through the content to highlight, the getBestFragment method of Lucene returns null.

So here is my updated code which handles this:

It builds the Lucene query (with fuzzy logic) against a piece of content, then uses the GetBestFragment method to place strong tags around the search terms if they are found in the content. It's also able to pick up plural terms (for example dogs when the search term is dog).

You can also modify the HTML tags to use for the highlighting if you have some custom CSS, and if relevant can update the Lucene search logic used to match that of your actual search code.
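A minimal sketch of null-safe highlighting is shown below. It is not the listing described above: it assumes Lucene.Net 3.0.3 with the contrib highlighter assembly referenced, uses a plain parsed query rather than fuzzy logic, and the field name content is an arbitrary label used only for analysis.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search.Highlight;
using LuceneVersion = Lucene.Net.Util.Version;

public static class SearchHighlighter
{
    // Returns the content with <strong> tags around matched terms,
    // or the original content when no terms are found.
    public static string Highlight(string searchTerm, string content)
    {
        if (string.IsNullOrWhiteSpace(searchTerm) || string.IsNullOrEmpty(content))
        {
            return content;
        }

        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        var parser = new QueryParser(LuceneVersion.LUCENE_30, "content", analyzer);

        // Escape special query characters typed by the user.
        var query = parser.Parse(QueryParser.Escape(searchTerm));

        var formatter = new SimpleHTMLFormatter("<strong>", "</strong>");
        var highlighter = new Highlighter(formatter, new QueryScorer(query));

        // GetBestFragment returns null when the content has no matching terms.
        string fragment = highlighter.GetBestFragment(analyzer, "content", content);

        return fragment ?? content;
    }
}
```

Falling back to the original content means a result whose match came from another field still shows its description, just without any highlighting.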

Sitecore Lucene highlight search term for best matching field

In a previous post I outlined an example of some code to highlight the search term in the search results using Lucene in Sitecore. This works well if you have one field you want to display on the search results page (the page description for example). But if you have multiple fields which could contain the search terms, it's a good idea to check them all to see which has the best match.
Luckily the GetBestTextFragments method in Lucene will provide a score for each piece of content you search against, which allows you to show the most relevant content.

When passing through the list of content to search against, you should send the most relevant fields first. If there are no matches, the first piece of content is returned, and if two pieces of content have the same score, the one higher in the list is sent back.

Sitecore WFFM email not sending on submit

A colleague came across an interesting error with a Web Forms for Marketers form that was not completing the send email save action. The mail settings were set up for the save action (/sitecore/system/Modules/Web Forms for Marketers/Settings/Actions/Save Actions/Send Email Message), the success message was showing, and there were no errors in the logs. However, no email was generated.

After some digging into the form and its fields, it was discovered that one of the Radio List form fields had some inconsistencies in its data section on the corresponding Sitecore item.


The data source for the radio list field was for some reason stored under the parameters field when it should have been under the localized parameters field (as it was for the other radio lists on the form). We were unsure whether content editors or a recent Sitecore version upgrade caused this, but moving it to the localized parameters field fixed the issue.

Sitecore faceting search results based on top level site architecture

Many web sites will have a structure/site architecture which contains a number of top level categories of which each contains the relevant pages/content. When a user is searching that same web site, it can be a useful feature to have a facet which allows the user to narrow down the results by category (aka the top level architecture).

An example of this would be the following diagram:


It's a simple example, yet it illustrates a site with three categories at the top level (about, products and services). The goal here would then be to have a feature on the search results page that allows faceting on these categories (dynamically). So if a given search does not have results available under about, that category won't show, and you also see a results count for the available categories.


This is actually quite simple to implement in Sitecore using Lucene and improves the user experience.

Indexing the category

To allow us to facet on this category, we will need to create a computed index field for Lucene. This field will store the top level category as text (rather than an item GUID).

In the computed index field code, we first check whether the current item is a child of the Sitecore home node. If so, we check whether its template is a landing page (this is the template for top level items). If it is a landing page we return its display name, as this is the category it falls under. If not, we return null, as the page is a system page and won't appear in search.

For all other items, we use a recursive function that navigates to the parent item until we hit a landing page and then return this. There is also a check here to ensure we don't go past the top level Sitecore item in the tree (if so a null is returned).

The end result is that each item in the index (non media items of course) will have a field called categoryfacet which contains the display name for the top level item the given page falls under.
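The walk up the tree could be sketched as below. This is a simplified illustration: the template name Landing Page is hypothetical, and instead of treating children of the home node as a separate case it relies on the recursion returning null once it runs out of ancestors.

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

public class CategoryFacetField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    // Hypothetical name of the template used for top level items.
    private const string LandingPageTemplate = "Landing Page";

    public object ComputeFieldValue(IIndexable indexable)
    {
        Item item = indexable as SitecoreIndexableItem;
        if (item == null || item.Paths.IsMediaItem)
        {
            return null; // media items have no category
        }

        return FindCategory(item);
    }

    // Climbs towards the root until a landing page is found.
    private static string FindCategory(Item item)
    {
        if (item == null)
        {
            return null; // went past the top of the tree
        }

        if (item.TemplateName == LandingPageTemplate)
        {
            return item.DisplayName;
        }

        return FindCategory(item.Parent);
    }
}
```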

You may notice that the computed index field is actually defined twice in the search index XML. This is because we need to use the LowerCaseKeywordAnalyzer to ensure the facet value is not split into separate tokens (two words for New Zealand, for example).

Getting the facets for a given search

Now that we have indexed the content for this facet, we need to get the available facets (with counts) for a given search.

In the code example we are using predicate logic to query our search index. We use the Page function on the search results to return the first X results but it's the following line which gets us the facets for all search results:
var searchFacets = searchContext.GetQueryable<SearchModel>().Where(searchPredicate).FacetOn(x => x.Type).FacetOn(x => x.Category).GetFacets();
We then get the facet named categoryfacet (as this is our index field's name) and loop through the results (which are not null) to get each facet and its count.

Faceting the search

Once we display the category facets on the front end to the user, they can then narrow the search results by selecting which category/categories their results should appear under. The search code can then include additional logic, to only show results which fall under the selected categories:

Now the users have an additional facet to further narrow their results, which empowers them to find the information they require faster and with a better experience.

Sitecore no lists are appearing in the list manager

If you have contact lists under the All Lists item in your content tree (/sitecore/system/List Manager/All Lists) but the lists are not showing in the list manager, this is likely an issue with a system index. To fix this issue:
  1. Log into the Sitecore desktop
  2. Open the control panel
  3. Select indexing manager
  4. Check the index named sitecore_list_index
  5. Click the rebuild button
  6. Load the list manager and your lists should now appear


Sitecore Lucene search indexing sublayout or rendering data sources

Most Sitecore sites will use renderings or sublayouts in Sitecore as a method to display data, and more often than not the data is a custom template assigned via data source. It's this sort of logic which could then be extended to personalize or A/B test which content works best. The only trouble is ensuring that this content is able to be added to the Lucene search index against the correct item (that which the rendering/sublayout is assigned to).

The following computed index field can be used to index the templates used as datasources against renderings or sublayouts. You may also be interested in indexing child content against an item.

The code will need modification to work in your implementation, but how it works is as follows:
  1. Get all renderings/sublayouts for the item
  2. Filter down to renderings/sublayouts of a desired type (and which have a datasource)
  3. For each datasource, concatenate the desired field to a string and return this string for indexing
The result here is a field in the index which contains the datasource content ready for indexing. It can also be modified to index datasources of multiple types along with multiple fields.
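The three steps above could be sketched as follows. Treat it as an assumption-laden illustration: the rendering name Rich Text Block and the field name text are hypothetical, only the device named Default is considered, and the class needs the Sitecore.Kernel and Sitecore.ContentSearch assemblies to compile.

```csharp
using System;
using System.Linq;
using System.Text;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

public class IndexRenderingDatasources : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    // Hypothetical rendering and field names; adjust to your implementation.
    private const string RenderingName = "Rich Text Block";
    private const string FieldToIndex = "text";

    public object ComputeFieldValue(IIndexable indexable)
    {
        Item item = indexable as SitecoreIndexableItem;
        if (item == null)
        {
            return null;
        }

        // There is no Sitecore context while indexing, so look the device up directly.
        DeviceItem device = item.Database.Resources.Devices.GetAll()
            .FirstOrDefault(d => d.Name.Equals("Default", StringComparison.OrdinalIgnoreCase));
        if (device == null)
        {
            return null;
        }

        var content = new StringBuilder();

        // 1. Get all renderings/sublayouts for the item.
        foreach (var reference in item.Visualization.GetRenderings(device, false))
        {
            // 2. Keep only the desired rendering type with a datasource set.
            if (reference.RenderingItem == null || reference.RenderingItem.Name != RenderingName)
            {
                continue;
            }

            string dataSource = reference.Settings.DataSource;
            if (string.IsNullOrEmpty(dataSource))
            {
                continue;
            }

            // 3. Concatenate the desired field of each datasource item.
            Item source = item.Database.GetItem(dataSource);
            if (source != null)
            {
                content.Append(source[FieldToIndex]).Append(' ');
            }
        }

        return content.Length > 0 ? content.ToString().Trim() : null;
    }
}
```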

Please note the code would need to be extended to handle rendering/sublayout changes based on device, along with personalization and A/B or multivariate testing.