Watson Explorer performance decrease with null values
Working with Watson Explorer (WEX) we saw that the search performance decrease a lot when you have null values for some field/facet. (Our WEX release at this moment is 11, we run at Linux machines and our application was written in Java, using BigIndex to index and search. (Also have pure REST version of our application in test and the problem still happen)).
For example: lets suppose that you have a facet called VENDOR in an entity called Product. Suppose that you have 5 millions Products indexed and for some of then you have nulls, in my case 2 Millions have NULL values for VENDOR field.
In this case (and similar ones), we notice a performance decrease in searches. We start to see problems when the relation of null are greater than 20%.
In order to solve the problem we have 2 options:
1- One technique we’ve used for certain dimensions is to always ensure non-null values in the index — so at index time, we either coalesce in our SQL pulls from DB2 or do transformation after ingestion to replace nulls with some predefined value. In our case we use the literal string “(no value available)”. It: a) ensures non-null values, b) is fairly meaningful to users, and c) gives users a way to actually filter on those records if needed.
2- For some FIELDS, we can not add another values, must leave null (Business reasons) and must not show null option to user select in the facet. In this case, in the moment of search we append boolean($FIELD) to the query. For example: