Archive

Archive by Author

Implementing Natural Language Query with IBM Watson Explorer

If you have a Watson Explorer (WEX) collection and want it to handle natural language queries, note that since WEX release 11.0.1 there is a native component for this: the query-modifier service.

Basically, this service parses the queries and applies a set of strategies, transforming the query into keywords that WEX can understand and search for. Let's suppose the user searches for:

“I’m looking for a Java Developer that know Struts and Spring and work from Brazil.”

The service will extract the keywords, based on its configuration, and will search for:

Java Developer + Struts + spring + Brazil

Keep in mind that NLQ is different from cognitive computing. This service will not understand questions; it will just extract terms. If you are looking for cognitive capabilities, you are looking for Watson (https://www.ibm.com/watson/developercloud/). With Watson we can understand the text and apply filters using location, range, etc. This can also be done using machine learning models created in Watson Knowledge Studio, but I'll talk about that soon.

Back to Query Modifier: if you look at the nlq folder, inside the Engine folder of your WEX installation, you will find the configuration files. Query Modifier works this way:

You make a request to WEX saying that you want to use QM. The request passes through QM, which applies the strategies and then forwards the request to the WEX Engine, which responds to you.

Here is a simple REST call that uses query-modifier:

http://MY_SERVER:9080/vivisimo/cgi-bin/velocity?v.app=api-rest&v.username=MY_USER&v.password=MY_PASSWORD&v.indent=true&v.function=query-search&fetch-timeout=30000&output-display-mode=limited&arena=MY_ARENA&output-contents-mode=list&syntax-operators=and+or+%28%29+CONTAINING+CONTENT+%25field%25%3A+%2B+NEAR+-+NOT+NOTCONTAINING+NOTWITHIN+OR0+quotes+regex+stem+THRU+BEFORE+FOLLOWEDBY+weight+wildcard+wildchar+WITHIN+WORDS+site+less-than+less-than-or-equal+greater-than+greater-than-or-equal+equal+range&sources=MY_COLLECTION+&output-contents=FIELD1+FIELD2&output-bold-contents=FIELD1&query=java+developer&query-condition-xpath=%24CONDITION_EXAMPLE=%27true%27&query-object=&num-per-source=20&start=0&num=20&query-modification-macros=enhance-query-with-querymodifier

Note that the following parameter is what makes WEX use Query Modifier:

&query-modification-macros=enhance-query-with-querymodifier
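To make the role of that parameter concrete, here is a minimal Java sketch that assembles a shorter version of the call above. The class and method names are hypothetical, the server, user, password, and collection are the same placeholders used in the post, and the reduced parameter set is an assumption: your installation may need the other parameters from the full URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryModifierUrlSketch {

    // Builds a query-search URL that asks WEX to route the query through Query Modifier.
    // Credentials are placeholders; the parameter subset is an assumption of this sketch.
    static String buildUrl(String baseUrl, String collection, String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("v.app", "api-rest");
        params.put("v.username", "MY_USER");
        params.put("v.password", "MY_PASSWORD");
        params.put("v.function", "query-search");
        params.put("sources", collection);
        params.put("query", query);
        params.put("num", "20");
        // This is the parameter that makes WEX use Query Modifier.
        params.put("query-modification-macros", "enhance-query-with-querymodifier");

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode each value (a space becomes "+").
            joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + joined;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(
                "http://MY_SERVER:9080/vivisimo/cgi-bin/velocity",
                "MY_COLLECTION",
                "java developer"));
    }
}
```

Dropping the last `params.put` line gives you the same search without Query Modifier, which is handy for comparing results side by side.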

To configure it, go to <your WEX install folder>/Engine/nlq (in my case, /opt/IBM/dataexplorer/WEX-11_0_1/Engine/nlq).

Run “chmod +x querymodifier-install.sh”

Then “./querymodifier-install.sh” (as root)

You will see this kind of output:

Copying /opt/IBM/dataexplorer/WEX-11_0_1/Engine/examples/nlq/querymodifier/querymodifier-production.yml.defaults to /opt/IBM/dataexplorer/WEX-11_0_1/Engine/nlq/querymodifier-production.yml…

Configuring port to 9080…

Configuring path to vivisimo/cgi-bin/velocity…

Configuring PEARs path to /opt/IBM/dataexplorer/WEX-11_0_1/Engine/data/pears…

Copying querymodifier-2.1.9.jar to /opt/IBM/dataexplorer/WEX-11_0_1/Engine/nlq/querymodifier.jar…

Giving executable permissions to /opt/IBM/dataexplorer/WEX-11_0_1/Engine/nlq/querymodifier.jar…

Removing any existing /etc/init.d/querymodifier…

Linking /etc/init.d/querymodifier to …

Done.

It's important to change the owner of the created files to the WEX instance owner (in my case, dataexp). As root, run: chown -R dataexp: <your WEX install folder>/Engine/nlq/

The configuration file is called querymodifier-production.yml

In the first part of the file, you will see the WEX server settings, such as IP, port, and user.

After that you can set up the strategies. In my case I have this setup:

# The strategies to apply, by default, to each query. Can also be customized on a per-request basis ("workplan" GET parameter):
strategies:
  default: PhraseWhitelistStrategy POSBasedNoiseWordRemoverStrategy DictionaryBasedNoiseWordRemoverStrategy DisjunctifyStrategy

The first strategy I'll cover is Disjunctify. It converts AND operators into OR operators if the operator has more terms than a threshold. For example, if you set minimumRequiredTerms = 4 and the user searches for fewer than 4 terms, the query will be (A AND B AND C); if the user searches for more than 4 terms, the query will be (A OR B OR C OR D OR X OR ...).
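The threshold rule can be illustrated with a small Java sketch. This is not the WEX implementation, only a toy version of the idea; the class and method names are hypothetical, and treating a query with exactly minimumRequiredTerms terms as AND is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class DisjunctifySketch {

    // Joins terms with AND while the term count is within the threshold,
    // and with OR once it exceeds it. The behavior at exactly the
    // threshold (AND) is an assumption of this sketch.
    static String combine(List<String> terms, int minimumRequiredTerms) {
        String operator = terms.size() > minimumRequiredTerms ? " OR " : " AND ";
        return "(" + String.join(operator, terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList("A", "B", "C"), 4));           // (A AND B AND C)
        System.out.println(combine(Arrays.asList("A", "B", "C", "D", "X"), 4)); // (A OR B OR C OR D OR X)
    }
}
```

The practical effect is that long queries stop requiring every term to match, which keeps them from returning zero results.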

The Dictionary-Based Noise Word Removal strategy simply removes words from the query. For example, if you add BANANA to the list, then when a user searches for BANANA, it will be ignored. Usually we add the common stop words to this list. You can find several lists, and I recommend using the Google one. Another good list is here.
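As a rough picture of what dictionary-based removal does to a query, here is a minimal Java sketch. It is not the service's code; the names are hypothetical, and both the whitespace tokenization and the case-insensitive matching are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NoiseWordSketch {

    // Drops every token that appears in the noise word dictionary.
    // Matching ignores case, which is an assumption of this sketch.
    static List<String> removeNoiseWords(String query, Set<String> noiseWords) {
        List<String> kept = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (!token.isEmpty() && !noiseWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> noiseWords = new HashSet<>(Arrays.asList("banana", "the", "for"));
        // Prints [looking, developer]
        System.out.println(removeNoiseWords("looking for the BANANA developer", noiseWords));
    }
}
```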

The Phrase Whitelist Strategy is an interesting one: you can keep external config files with key phrases. For example, suppose you want “Project Manager” to be searched as the phrase “Project Manager”, and not as “Project” and “Manager”. In that case, you need to add this phrase to the config file.

There is a catch here: you need to separate the words with <TAB> instead of a space, otherwise it will not work.

After configuring your strategies, you just need to start the service (usually /etc/init.d/querymodifier start) and perform REST calls to test. You can follow the log at /var/log/querymodifier.log.

Every time you change these settings, you need to restart Query Modifier.

Your best friend for development and testing is the API Runner interface of the WEX Engine. You can access it at:

http://YOUR_SERVER:9080/vivisimo/cgi-bin/velocity?v.app=api-run&v.function=query-parse-querymodifier

Check the parameters there and ENJOY!

For more references: http://www.ibm.com/support/knowledgecenter/SS8NLW_11.0.1/com.ibm.watson.wex.fc.nlq.doc/c_wex_adding_nlq.html

A small example of Threads in Java

Sharing a small solution I use in POCs (proofs of concept) when I need to demonstrate something with threads. Here is a short snippet that may be useful to someone, and certainly to myself (whoever writes and shares never forgets... or something like that).

I created a class to be my thread manager:

package br.com.ibm.threads;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ThreadExecutorMaganer {
    private final ExecutorService executor;
    private final long timeout;
    private final List<Callable<String>> callables;

    public ThreadExecutorMaganer(int maxThreads, long timeoutInSeconds) {
        this.executor = Executors.newFixedThreadPool(maxThreads);
        this.callables = new ArrayList<Callable<String>>();
        this.timeout = timeoutInSeconds;
    }

    // Queues a task; nothing runs until start() is called.
    public void add(Callable<String> callable) {
        this.callables.add(callable);
    }

    // Runs all queued tasks and waits up to the timeout; tasks still
    // running when it expires are cancelled.
    public List<Future<String>> start() {
        // Start from an empty list so callers never receive null.
        List<Future<String>> futures = Collections.emptyList();

        try {
            futures = executor.invokeAll(callables, timeout, TimeUnit.SECONDS);
            executor.shutdown();
        } catch (InterruptedException e) {
            e.printStackTrace();
            executor.shutdownNow();
            // Restore the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }

        return futures;
    }
}

This is my thread itself (note there is an IF in there with a sleep, just to provoke an error and test it). It is of type Callable:

package br.com.ibm.threads;

import java.util.concurrent.Callable;

public class CallableTask implements Callable<String> {

    private final String tarefa;

    public CallableTask(String tarefa) {
        this.tarefa = tarefa;
    }

    @Override
    public String call() throws Exception {
        System.out.println("Inside call-->" + tarefa);
        // Task "C" sleeps longer than the 5-second timeout used in the main class, to force a failure.
        if (tarefa.equals("C")) {
            System.out.println("Sleeping 6 seconds");
            Thread.sleep(6000);
        }
        return tarefa;
    }
}

And this is my main class, the one that kicks off the circus:

package br.com.ibm.threads;

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class CallThreadExecutorMaganer {

    public static void main(String[] args) {
        String[] restfulUrls = "A,B,C,D,E".split(",");
        // Up to 100 threads, with a 5-second timeout for the whole batch.
        ThreadExecutorMaganer tem = new ThreadExecutorMaganer(100, 5);

        for (String url : restfulUrls) {
            tem.add(new CallableTask(url));
        }
        List<Future<String>> futureResponses = tem.start();

        for (Future<String> futureResponse : futureResponses) {
            try {
                String resp = futureResponse.get();
                System.out.println(resp);
            } catch (ExecutionException ex) {
                // The task itself threw an exception.
                System.out.println("ExecutionException while getting WEX response=" + ex.getCause().getMessage());
            } catch (Exception e) {
                // Includes the CancellationException raised for tasks cut off by the timeout (task "C").
                System.out.println("Fail to query WEX server:" + e.getMessage());
            }
        }
    }
}
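What happens to a task that exceeds the timeout may be easier to see in isolation. The sketch below is self-contained and separate from the classes above; the class name, task contents, and 1-second limit are hypothetical. It relies on the documented behavior of ExecutorService.invokeAll with a timeout: tasks that have not completed when the time expires are cancelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TimeoutSketch {

    // Runs a fast and a slow task with a 1-second limit and reports each outcome.
    static List<String> runWithTimeout() throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "fast");
        tasks.add(() -> { Thread.sleep(5000); return "slow"; });

        List<String> outcomes = new ArrayList<>();
        try {
            // invokeAll cancels whatever has not finished when the timeout expires.
            List<Future<String>> futures = executor.invokeAll(tasks, 1, TimeUnit.SECONDS);
            for (Future<String> future : futures) {
                // Checking isCancelled() first avoids the CancellationException from get().
                outcomes.add(future.isCancelled() ? "timed out" : future.get());
            }
        } finally {
            executor.shutdownNow();
        }
        return outcomes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout()); // [fast, timed out]
    }
}
```

Testing isCancelled() before calling get() lets you tell a timeout apart from a real failure inside the task, instead of funnelling both into a generic catch.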

Before the patrol criticizes: this is a SIMPLE example. It should not be used professionally without analysis and adaptation to your case, such as proper typing, error handling, etc.

Enjoy!

Categories: JAVA

Reading XML with Java – Quick and simple example

I always need some code to read XML with Java. This is a placeholder for me, but maybe it can be useful to other people.

Here is my XML example:

<operator logic="and">
  <operator logic="or">
    <term field="query" input-type="user" processing="strict" str="は" />
    <term field="query" input-type="user" phrase="phrase" processing="strict" str="銀行業務" weight="1" />
    <term field="query" input-type="user" processing="strict" str="持つ" />
    <term field="query" input-type="user" phrase="phrase" processing="strict" str="java開発者" weight="1.69" />
    <term field="query" input-type="user" processing="strict" str="探して" />
  </operator>
</operator>

Here is my Java code:

import org.w3c.dom.*;
import org.xml.sax.InputSource;

import javax.xml.parsers.*;
import java.io.*;

public class ParseXML {

	public static void main(String[] args) {
		String xml = "<operator logic=\"or\"><term field=\"query\" input-type=\"user\" processing=\"strict\" str=\"は\" /><term field=\"query\" input-type=\"user\" phrase=\"phrase\" processing=\"strict\" str=\"銀行業務\" weight=\"1\" /><term field=\"query\" input-type=\"user\" processing=\"strict\" str=\"持つ\" /><term field=\"query\" input-type=\"user\" phrase=\"phrase\" processing=\"strict\" str=\"java開発者\" weight=\"1.69\" /><term field=\"query\" input-type=\"user\" processing=\"strict\" str=\"探して\" /></operator>";
		try {	
			Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
			doc.getDocumentElement().normalize();
			
			System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
			NodeList nList = doc.getElementsByTagName("term");
			System.out.println("----------------------------");
			for (int temp = 0; temp < nList.getLength(); temp++) {
				Node nNode = nList.item(temp);
				System.out.println("\nCurrent Element :" + nNode.getNodeName());
				if (nNode.getNodeType() == Node.ELEMENT_NODE) {
					Element eElement = (Element) nNode;
					System.out.println("processing : " + eElement.getAttribute("processing"));
					System.out.println("str : " + eElement.getAttribute("str"));
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Categories: JAVA

Monitoring top 10 Linux CPU consuming processes

I always need to check which processes are consuming the most CPU on my machine, and with ps it's easy. You can put the following command in a script and then send an email, take action, etc. (head -n 11 keeps the header line plus the top 10 processes.)

ps aux --sort=-pcpu | head -n 11

You can play with TOP also, but I prefer PS for this case.

top -b -c -n 1 | head -n 17 | tail -n 10

Enjoy!

Categories: AIX, Linux

Making better use of the TOP command on Linux/Unix/Solaris

I use the TOP command a lot (among others) to measure the “health” of our servers. Two options I really like are “1” and “I” (uppercase).

Pressing 1, TOP shows all the cores of your processor, which helps you see its utilization as a whole.

[Screenshot: TOP showing per-core CPU usage]

Pressing I (uppercase letter i) disables “Irix mode”; pressing it again enables it. Basically, with Irix mode disabled, TOP shows CPU utilization as a percentage of the machine's real total capacity. For example, in Irix mode, which is the default, you may notice that some processes consume more than 100%. This happens because in this mode the scale is the total number of cores you have * 100%. With it disabled, TOP divides the process's utilization by the total number of CPUs you have, giving a more realistic number that will not exceed 100%. The images below show TOP first with the default option (Irix mode) and then with Irix mode disabled. Note that the highlighted processes had their CPU utilization “reduced”; that is not really the case, TOP is simply showing utilization relative to the CPU as a whole.

[Screenshots: TOP with Irix mode enabled (default), then with Irix mode disabled]
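The relationship between the two modes is a single division. A tiny Java sketch with made-up numbers (the 8-core machine and the 200% reading are hypothetical, as are the names):

```java
public class IrixModeSketch {

    // Irix mode on: a process can show up to cores * 100%.
    // Irix mode off: the same figure divided by the number of cores.
    static double irixOff(double irixOnPercent, int cores) {
        return irixOnPercent / cores;
    }

    public static void main(String[] args) {
        // A process keeping 2 cores fully busy on an 8-core machine.
        System.out.println(irixOff(200.0, 8)); // 25.0
    }
}
```

So a process that reads 200% with Irix mode on reads 25% with it off on that machine: the same load, measured against the whole box instead of a single core.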

Enjoy!

Discovering Red Hat version using command line

If you need to check your Red Hat version from the command line, here are 2 simple ways:

[root@dstvm601g10 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)

or

[root@dstvm601g10 ~]# lsb_release -rd
Description:    Red Hat Enterprise Linux Server release 6.7 (Santiago)
Release:    6.7

I always need this, and always forget!

Enjoy!

Categories: Linux

Solution to the EQUI test on Codility

Many people are taking online tests on Codility as part of job interviews, especially for positions abroad. I recommend doing the tutorials before attempting any official test there, because only then will you see what the test is really like, how to run and test your code, and so feel comfortable with it.

I took the EQUI test, which has a solution in C here. I solved it in Java. I recommend trying it yourself before looking at the solution. Here it goes:


package br.com.ibm.tests;

public class Solution {

    public static void main(String[] args) throws Exception {
        Solution s = new Solution();
        // int[] A = {2, 2, 2, 2, 2, 2, 2, 2, 2};
        int[] A = { -7, 1, 5, 2, -4, 3, 0 };
        System.out.print(s.solution(A));
    }

    // Returns the first equilibrium index of A, or -1 if there is none.
    public int solution(int[] A) {
        int sub = 0;   // running sum of A[0..i]
        int sum = 0;   // total of all elements (an int, so it can overflow on large inputs)
        int left = 0;
        int right = 0;

        for (int i = 0; i < A.length; i++) {
            sum = sum + A[i];
        }

        for (int i = 0; i < A.length; i++) {
            sub = sub + A[i];
            left = sub - A[i];   // sum of the elements before i
            right = sum - sub;   // sum of the elements after i

            if (left == right) {
                return i;
            }
        }
        return -1;
    }
}

A solution that is less prone to overflow (at some cost in speed), using BigInteger, is:

package br.com.ibm.tests;

import java.math.BigInteger;

public class Solution {

    public static void main(String[] args) throws Exception {
        Solution s = new Solution();
        // int[] A = {2, 2, 2, 2, 2, 2, 2, 2, 2};
        int[] A = { -7, 1, 5, 2, -4, 3, 0 };
        System.out.print(s.solution(A));
    }

    // Same algorithm as before, with BigInteger sums so they cannot overflow.
    public int solution(int[] A) {
        BigInteger sub = BigInteger.ZERO;   // running sum of A[0..i]
        BigInteger sum = BigInteger.ZERO;   // total of all elements
        BigInteger left = BigInteger.ZERO;
        BigInteger right = BigInteger.ZERO;

        for (int i = 0; i < A.length; i++) {
            sum = sum.add(BigInteger.valueOf(A[i]));
        }

        for (int i = 0; i < A.length; i++) {
            sub = sub.add(BigInteger.valueOf(A[i]));
            left = sub.subtract(BigInteger.valueOf(A[i]));   // sum of the elements before i
            right = sum.subtract(sub);                       // sum of the elements after i

            if (left.equals(right)) {
                return i;
            }
        }
        return -1;
    }
}
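As an alternative sketch (not part of the original solutions), the same prefix-sum idea also works with long accumulators. A Java int array has fewer than 2^31 elements, each with magnitude at most 2^31, so the sums stay below 2^62 in magnitude and fit in a long. The class name is hypothetical.

```java
public class EquiLongSketch {

    // Returns the first index whose left-side sum equals its right-side sum, or -1.
    static int solution(int[] A) {
        long total = 0;
        for (int value : A) {
            total += value;
        }
        long left = 0; // sum of the elements before index i
        for (int i = 0; i < A.length; i++) {
            long right = total - left - A[i]; // sum of the elements after index i
            if (left == right) {
                return i;
            }
            left += A[i];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(solution(new int[] { -7, 1, 5, 2, -4, 3, 0 })); // 3
    }
}
```

This keeps the overflow safety of the BigInteger version for this problem while using plain primitive arithmetic.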

Good luck.

Categories: JAVA