Finding bottlenecks in Watson Explorer queries

If you are having problems with a Watson Explorer query, an excellent way to find the bottlenecks is to run the query with the Debug and Profile options enabled. This shows you exactly where the problems are.

Usually, when you run a query in WEX, you call a URL like the following (in my case, the port is 7205; MY_COLLECTION can also be a shard, for example MY_COLLECTION_1_1):

<SERVER>:<PORT>/search?collection=MY_COLLECTION&query-xml=<%3fxml version%3d"1.0" encoding%3d"UTF-8"%3f><operator logic%3d"and"%2f>&num=1&max=1&binning-mode=normal&start=0&show-duplicates=1&doc-axl=<%3fxml version%3d"1.0" encoding%3d"UTF-8"%3f><document key-hash%3d"{vse%3adoc-hash()}"%2f>&binning-config=<%3fxml version%3d"1.0" encoding%3d"UTF-8"%3f><binning-sets><binning-set bs-id%3d"VENDOR" logic%3d"or" max-bins%3d"8" select%3d"%24VENDOR"%2f><binning-set bs-id%3d"REVENUE_USD_FACET" logic%3d"or" max-bins%3d"11" select%3d"%24REVENUE_USD_FACET"%2f>……………field%3d"SERVICE_AREA"><field-to name%3d"SERVICE_AREA"%2f><%2ffield-map><field-map field%3d"MAX_IGS_REV_OM_BRAND_CD"><field-to name%3d"MAX_IGS_REV_OM_BRAND_CD"%2f><%2ffield-map><field-map field%3d"EMAIL_SENT"><field-to name%3d"EMAIL_SENT"%2f><%2ffield-map><field-map field%3d"REVENUE_USD_FACET"><field-to name%3d"REVENUE_USD_FACET"%2f><%2ffield-map><field-map field%3d"REVENUE"><field-to name%3d"REVENUE"%2f><%2ffield-map><field-map field%3d"CLIENT_NAME"><field-to name%3d"CLIENT_NAME"%2f><%2ffield-map><%2ffield-mapping>&sort-keys=1&score=1&shingles=0&summarize=0&gen-key=0&cache-data=0&force-binning=1&output-acls=1
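The XML-valued parameters above (query-xml, doc-axl, binning-config) are sent URL-encoded, which explains the %3d, %3f, and %2f sequences. Here is a minimal sketch of building such a URL in Java with java.net.URLEncoder (Java 10+). It assumes a hypothetical host name, wex-host, and assumes that WEX decodes standard form-encoded parameters, as HTTP servers normally do. Only a few of the parameters are included; you would add the others the same way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class WexSearchUrl {

    // Percent-encodes one key or value (UTF-8 form encoding: spaces become '+')
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Joins the parameters, in insertion order, onto <baseUrl>/search
    static String buildSearchUrl(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
        return baseUrl + "/search?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("collection", "MY_COLLECTION");
        params.put("query-xml",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><operator logic=\"and\"/>");
        params.put("num", "1");
        params.put("start", "0");
        // Hypothetical host; 7205 is the port mentioned in the text
        System.out.println(buildSearchUrl("http://wex-host:7205", params));
    }
}
```

Because a LinkedHashMap keeps insertion order, the parameters come out in the same order you add them, which makes the result easy to compare with a URL captured from your application.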

If you don't know which query your application is sending, you can enable query logging for your collection. In the WEX console, go to Configuration -> Searching -> Debugging and enable Query Logging.


Once you save this setting, WEX starts writing a log file called queries.log in your collection folder, in a location like:

/opt/IBM/dataexplorer/WEX-11/Engine/data/search-collections/YYY/MY_COLLECTION/crawl1/

You can find the exact folder in the WEX console: open your collection's configuration, go to the META tab, and check the Filebase field.
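Log files are normally appended to, so the most recent queries should be at the end of queries.log. The minimal sketch below prints the last few lines so you can copy the latest query. It makes no assumptions about the log's line format but does assume the file is UTF-8. The path in main is the example path above; replace it with your own Filebase.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Stream;

public class QueriesLogTail {

    // Returns the last n lines of the file, holding at most n lines in memory
    static List<String> lastLines(Path file, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>(n);
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            lines.forEach(line -> {
                if (window.size() == n) {
                    window.removeFirst(); // drop the oldest line to keep the window at n
                }
                window.addLast(line);
            });
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        // Example path from the text; use the Filebase of your own collection
        Path log = Paths.get("/opt/IBM/dataexplorer/WEX-11/Engine/data/search-collections/YYY/MY_COLLECTION/crawl1/queries.log");
        lastLines(log, 5).forEach(System.out::println);
    }
}
```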

Now, if you call this URL from your browser with "&debug=1&profile=1" appended to it, you will get an XML file. Download it and let's analyze it. In our case, we found this:
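If you would rather capture the profiled response from code than from the browser, here is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11+). It assumes the URL you pass is already fully percent-encoded, because URI.create rejects raw spaces and angle brackets that a browser would encode for you. The default URL and the output file name, profile.xml, are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class WexProfileFetch {

    // Appends the debug and profile flags to an existing search URL
    static String withDebugAndProfile(String searchUrl) {
        String separator = searchUrl.contains("?") ? "&" : "?";
        return searchUrl + separator + "debug=1&profile=1";
    }

    // Sends the profiled request and writes the XML response body to target
    static HttpResponse<Path> downloadProfile(String searchUrl, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(withDebugAndProfile(searchUrl)))
                .GET()
                .build();
        return HttpClient.newHttpClient().send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default; pass your encoded query URL as the first argument
        String url = args.length > 0 ? args[0]
                : "http://wex-host:7205/search?collection=MY_COLLECTION";
        HttpResponse<Path> response = downloadProfile(url, Paths.get("profile.xml"));
        System.out.println("HTTP " + response.statusCode() + ", saved to "
                + response.body().toAbsolutePath());
    }
}
```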

<xpath-performances>
<xpath-performance xpath="($FIELD_X) = 'GBS – No' or ($FIELD_X) = 'GBS – Yes'" slow-ms="10295" n-slow="192000" n-fast="0" n-direct="0" n-hashes="1" />
</xpath-performances>
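On a real query, the profile output can list many xpath-performance entries, so sorting them helps. Here is a minimal sketch using the JDK's DOM parser. It assumes the elements appear unprefixed, as in the excerpt above, and uses a record, so it needs Java 16+. It reads only the attributes shown there and treats a missing attribute as 0.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XpathProfileReport {

    // One <xpath-performance> entry: the expression and some of its counters
    record XpathPerf(String xpath, long slowMs, long nSlow, long nFast) {}

    // getAttribute returns "" when the attribute is absent
    static long parseOrZero(String value) {
        return value.isEmpty() ? 0L : Long.parseLong(value);
    }

    // Parses the profile XML and returns every xpath-performance entry, largest slow-ms first
    static List<XpathPerf> slowestFirst(String profileXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(profileXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("xpath-performance");
        List<XpathPerf> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            result.add(new XpathPerf(
                    e.getAttribute("xpath"),
                    parseOrZero(e.getAttribute("slow-ms")),
                    parseOrZero(e.getAttribute("n-slow")),
                    parseOrZero(e.getAttribute("n-fast"))));
        }
        result.sort(Comparator.comparingLong(XpathPerf::slowMs).reversed());
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample modeled on the excerpt above (ASCII hyphens used for simplicity)
        String xml = "<xpath-performances>"
                + "<xpath-performance xpath=\"($FIELD_X) = 'GBS - No' or ($FIELD_X) = 'GBS - Yes'\""
                + " slow-ms=\"10295\" n-slow=\"192000\" n-fast=\"0\" n-direct=\"0\" n-hashes=\"1\"/>"
                + "</xpath-performances>";
        for (XpathPerf p : slowestFirst(xml)) {
            System.out.println(p.slowMs() + " slow-ms, n-slow=" + p.nSlow()
                    + ", n-fast=" + p.nFast() + ": " + p.xpath());
        }
    }
}
```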

This tells me that just evaluating the field FIELD_X is slow: slow-ms is 10295 and n-slow is 192000, while n-fast and n-direct are both 0. (In my case, this is because the field is an array.)

So the problem is probably this field, and there are many possible causes, for example:
1- Null values (see my other posts)
2- It is an array field
3- It is a long text field
4- Many query conditions use it (OR, AND, WHERE, etc.)

With this information, you can move on to the next step: finding a way to change the field so it performs better.

Important: I tested this with Watson Explorer 9, 10, and 11, running on Linux machines.

Enjoy!

Watson Explorer performance decreases with null values

While working with Watson Explorer (WEX), we saw that search performance decreases a lot when a field/facet has null values. (Our WEX release at the moment is 11, we run it on Linux machines, and our application is written in Java, using BigIndex to index and search. We also have a pure REST version of our application in testing, and the problem happens there too.)

For example, suppose that you have a facet called VENDOR in an entity called Product, and that you have 5 million products indexed, some of them with nulls. In my case, 2 million have NULL values for the VENDOR field.

In this case (and similar ones), we noticed a performance decrease in searches. We started to see problems when more than 20% of the values were null.
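A quick way to check whether a field is near the point where we saw problems is to compute its null ratio. Here is a minimal sketch. The 20% threshold is our own observation, not a documented WEX limit. In practice, the counts would come from your source data, for example a COUNT query against DB2.

```java
import java.util.Arrays;
import java.util.List;

public class NullRatioCheck {

    // Threshold from the text: we started seeing problems above 20% nulls
    static final double NULL_RATIO_THRESHOLD = 0.20;

    // Fraction of values that are null (0.0 for an empty list)
    static double nullRatio(List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        long nulls = values.stream().filter(v -> v == null).count();
        return (double) nulls / values.size();
    }

    // True when nullCount / totalCount is strictly above the threshold
    static boolean exceedsThreshold(long nullCount, long totalCount) {
        return totalCount > 0 && (double) nullCount / totalCount > NULL_RATIO_THRESHOLD;
    }

    public static void main(String[] args) {
        // The VENDOR example: 2 million nulls out of 5 million products (40%)
        System.out.println(exceedsThreshold(2_000_000L, 5_000_000L)); // prints true
        List<String> sample = Arrays.asList("IBM", null, "PEPSICO", null, "IBM");
        System.out.println(nullRatio(sample)); // prints 0.4
    }
}
```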

To solve the problem, we have two options:

1- One technique we've used for certain dimensions is to always ensure non-null values in the index. At index time, we either use COALESCE in our SQL pulls from DB2 or transform the data after ingestion to replace nulls with a predefined value. In our case, we use the literal string "(no value available)". This a) ensures non-null values, b) is fairly meaningful to users, and c) gives users a way to actually filter on those records if needed.
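When the replacement happens after ingestion instead of in the SQL (where something like COALESCE(VENDOR, '(no value available)') would do it), the transformation is just a null check per field. Here is a minimal sketch. It represents each record as a Map from field name to value, which is a hypothetical representation; your ingestion pipeline will have its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class NullPlaceholder {

    // Placeholder string used in the text
    static final String NO_VALUE = "(no value available)";

    // Returns a copy of the record where each listed field that is null
    // (or missing entirely) is set to NO_VALUE; the input is left unchanged
    static Map<String, String> coalesce(Map<String, String> record, Set<String> fields) {
        Map<String, String> copy = new HashMap<>(record);
        for (String field : fields) {
            if (copy.get(field) == null) {
                copy.put(field, NO_VALUE);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> product = new HashMap<>();
        product.put("NAME", "Widget");
        product.put("VENDOR", null);
        System.out.println(coalesce(product, Set.of("VENDOR")).get("VENDOR")); // prints (no value available)
    }
}
```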

2- For some fields, we cannot add other values. They must stay null (for business reasons), and the null option must not be offered to the user in the facet. In this case, at search time we add boolean($FIELD) to the query. For example:

General example of a slow facet query:
$VENDOR='IBM' or $VENDOR='PEPSICO'
Rewritten query that solves the slowness:
boolean($VENDOR) and ($VENDOR='IBM' or $VENDOR='PEPSICO')
This way, the null values are ignored during the search, and the search is very fast.
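If the facet values come from the user's selection, you can generate the rewritten expression instead of writing it by hand. Here is a minimal sketch. How you pass the resulting string to WEX depends on your application (BigIndex, REST, etc.), and values that contain an apostrophe are not escaped here.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NonNullFacetQuery {

    // Builds ($FIELD='v1' or $FIELD='v2' ...) from the selected facet values
    static String orClause(String field, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("at least one facet value is required");
        }
        return values.stream()
                .map(v -> "$" + field + "='" + v + "'")
                .collect(Collectors.joining(" or ", "(", ")"));
    }

    // Prepends boolean($FIELD) so that records where the field is null are left out
    static String withNonNullGuard(String field, List<String> values) {
        return "boolean($" + field + ") and " + orClause(field, values);
    }

    public static void main(String[] args) {
        System.out.println(withNonNullGuard("VENDOR", List.of("IBM", "PEPSICO")));
        // prints boolean($VENDOR) and ($VENDOR='IBM' or $VENDOR='PEPSICO')
    }
}
```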
This may not be the definitive solution, and newer WEX versions may handle null values better, but it was the solution for us.
Enjoy!

Slides – Talk: Demystifying Technologies

As promised, here are the slides from my talk: Demystifying Technologies.

Shell Script: Checking the result of Grep

We often need to run command X or Y depending on whether some value appears in a file or in the output of a command on Linux, Unix, etc. People use grep extensively for this kind of test, so here is a very simple and useful script:

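#!/bin/bash
# grep -q prints nothing; exit status 0 (match found) runs the && branch, otherwise the || branch runs
grep -q 'uol\.com\.br' /etc/hosts && echo "Found uol in hosts" || echo "ERROR: uol not found in hosts"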

In this example, I search for uol.com.br in /etc/hosts. If it is found, I print a "found" message; otherwise, I show an error.
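The same check can also be done from Java when it needs to run inside an application instead of a script. Here is a minimal sketch. Note that grep treats the pattern as a regular expression, while String.contains does a plain substring match, so this behaves more like grep -F.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FileContains {

    // Like grep -q -F: true as soon as one line contains the text
    static boolean containsText(Path file, String text) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.anyMatch(line -> line.contains(text));
        }
    }

    public static void main(String[] args) throws IOException {
        if (containsText(Paths.get("/etc/hosts"), "uol.com.br")) {
            System.out.println("Found uol in hosts");
        } else {
            System.out.println("ERROR: uol not found in hosts");
        }
    }
}
```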

Enjoy!

Creating a RESTful Web Service with Jersey

I created a simple, to-the-point set of slides with step-by-step instructions for creating a RESTful Web Service using the Jersey API. If you want to get to know this world, I think it will help a lot.

Keep in mind that implementing a Web Service is relatively simple. However, always pay attention to security and volume (load, stress, number of users, etc.), because those are what tend to bring down many servers out there!

Enjoy!

Introduction to Apache JMeter

Many people need to test their application under load, stress, and similar conditions. Apache JMeter is a great tool for that purpose. The presentation below introduces it and shows how to create a simple test. It is meant as a first step. Enjoy:

Viewing all values of a Request with Java

A recurring question... writing it down so I don't forget it again.

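import java.util.Arrays;
import java.util.Enumeration;

// Inside a servlet method; request is the current HttpServletRequest
Enumeration<String> parameterNames = request.getParameterNames();
while (parameterNames.hasMoreElements()) {
    String parameterName = parameterNames.nextElement();
    // getParameterValues returns every value sent for a repeated parameter
    String[] values = request.getParameterValues(parameterName);
    System.out.println(parameterName + ":" + Arrays.toString(values));
}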
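As an alternative, request.getParameterMap() returns every parameter with all of its values as a Map<String, String[]> (Servlet 3.0+). This also makes the listing easy to test without a servlet container. Here is a minimal sketch; the map in main is a hand-built stand-in for a real request.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class RequestParamsDump {

    // Formats each parameter as name:[v1, v2], one per line; pass request.getParameterMap()
    static String dump(Map<String, String[]> parameterMap) {
        return parameterMap.entrySet().stream()
                .map(e -> e.getKey() + ":" + Arrays.toString(e.getValue()))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Stand-in for request.getParameterMap(); in a servlet, use the real request
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("collection", new String[] {"MY_COLLECTION"});
        params.put("vendor", new String[] {"IBM", "PEPSICO"});
        System.out.println(dump(params));
    }
}
```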
