Início > Cognitive, Watson, watson explorer, wex > Improving your queries at Watson Explorer using Ontolections

Improving your queries at Watson Explorer using Ontolections

A good approach to enrich your queries at Watson Explorer its use Ontolections. A ontolection provides a set of related terms that are specific to the domain of an application or enterprise, and identifies the relationships between them. Basically, Wex Engine query the ontolection with the query terms, then, add this terms to the final query, and then, query your original collection.

For example: lets suppose that you have a synonym configured as: ALM → Application Lyfecycle Management. If user search for ALM, WEX engine will search also for Application Lyfecycle Management.

A ontolection can have also more than synonyms, we can have related terms, rewrite, spelling, etc. I recommend start with synonyms, then, improve your ontolection.

The first step to start playing with ontolections is create a Thesaurus file. This file will be used to create the ontolection. You can generate a thesaurus from several ways. The most common is create your own XML file manually, but, you can use something called Ontolection Trainer (Ill show how to use in the next posts).

For my example, I have created the following ontolection, it is called practitioner2.xml:

<?xml version="1.0" encoding="utf-8" ?>
<thesaurus name="practitioner1" language="english" domain="general">
<word name=".NET">
<synonym>.Net development</synonym>
<synonym>.Net<span style="font-family: Droid Sans Fallback;"><span style="font-size: small;"><span lang="zh-CN">開発</span></span></span></synonym>
<synonym>.NET<span style="font-family: Droid Sans Fallback;"><span style="font-size: small;"><span lang="zh-CN">開発</span></span></span></synonym>
</word>
<word name="Virtual Private Network">
<synonym>VPN</synonym>
<synonym><span style="font-family: Droid Sans Fallback;"><span style="font-size: small;"><span lang="zh-CN">バーチャル プライベート ネットワーク</span></span></span></synonym>
</word>
<word name="DNS">
<synonym>Domain Name Service</synonym>
<synonym><span style="font-family: Droid Sans Fallback;"><span style="font-size: small;"><span lang="zh-CN">ドメインネーム・サービス</span></span></span></synonym>
<synonym><span style="font-family: Droid Sans Fallback;"><span style="font-size: small;"><span lang="zh-CN">ドメインネームサービス</span></span></span></synonym>
</word>
</thesaurus>

Using this as an example, if user search for DNS, Ill also search for Domain Name Services.

After create your thesaurus file, you need to create a new collection at your WEX Server. Select generic-ontolection at Copy defaults from:

Then, add a new seed, pointing to the thesaurus file, in my case, I select FILES and add /opt/IBM/dataexplorer/WEX-11_0_1/Engine/nlq/practitioner2.xml

Go to collection overview → Configuration → Converting, click edit and set the values as:

Finally, go to Overview and Click Start at Live Status (you can also test before start). You will see Craw and Index running, and Documents being added.

Thats it, your ontolection is ready to use. You can test at your application and at WEX query utility. Here is a simple REST call using my ontolection, see that I’m searching for DNS and automatically WEX will search also for Domain Name Service | ドメインネーム・サービス | ドメインネームサービス.

http://MY_SERVER:9080/vivisimo/cgi-bin/velocity?v.app=api-rest&v.username=MY_USER&v.password=MY_PASSWORD&v.indent=true&v.function=query-search&fetch-timeout=30000&output-display-mode=limited&arena=MY_ARENA&output-contents-mode=list&syntax-operators=and or () CONTAINING CONTENT %field%: + NEAR – NOT NOTCONTAINING NOTWITHIN OR0 quotes regex stem THRU BEFORE FOLLOWEDBY weight wildcard wildchar WITHIN WORDS site less-than less-than-or-equal greater-than greater-than-or-equal equal range&sources=MY_COLLECTION &output-contents=FIELD1 FIELD2&output-bold-contents=FIELD1 FIELD2&query=dns&query-condition-xpath=$FIELD3=’XXXXX’&query-object=&num-per-source=20&start=0&num=20&query-modification-macros=query-modification-expansion&extra-xml=<declare name=”query-expansion.enabled” /><set-var name=”query-expansion.enabled”>true</set-var><declare name=”query-expansion.user-profile” /><set-var name=”query-expansion.user-profile”>on</set-var><declare name=”query-expansion.ontolections” /><set-var name=”query-expansion.ontolections”>onto_practitioner</set-var><declare name=”query-expansion.max-terms-per-type” /><set-var name=”query-expansion.max-terms-per-type”>3</set-var><declare name=”query-expansion.automatic” /><set-var name=”query-expansion.automatic”>synonym:0.8,alternative:0.8,spelling:0.8,narrower:0.5,translation:0.5,broader:0.5,related:0.5</set-var><declare name=”query-expansion.suggestion” /><set-var name=”query-expansion.suggestion”></set-var><declare name=”query-expansion.query-match-type” /><set-var name=”query-expansion.query-match-type”>terms</set-var><declare name=”query-expansion.conceptual-search-similarity-threshold” /><set-var name=”query-expansion.conceptual-search-similarity-threshold”>0.1</set-var><declare name=”query-expansion.conceptual-search-metric” /><set-var name=”query-expansion.conceptual-search-metric”>euclidean-dot-product</set-var><declare name=”query-expansion.conceptual-search-candidates-max” /><set-var name=”query-expansion.conceptual-search-candidates-max”>euclidean-dot-product</set-var><declare name=”query-expansion.conceptual-search-sources” /><set-var name=”query-expansion.conceptual-search-sources”>MY_COLLECTION </set-var><declare name=”query-expansion.stem-expansions” /><set-var name=”query-expansion.stem-expansions”>false</set-var><declare name=”query-expansion.stemming-dictionary” /><set-var name=”query-expansion.stemming-dictionary”>english/wildcard.dict</set-var><declare name=”reporting.track-spelling” /><set-var name=”reporting.track-spelling”>false</set-var><declare name=”meta.stem-expand-stemmer” /><set-var name=”meta.stem-expand-stemmer”>delanguage+english+depluralize</set-var><declare name=”query-expansion.stemming-weight” /><set-var name=”query-expansion.stemming-weight”>0.8</set-var>

See that this parameter turn on the ontolection:

&query-modification-macros=query-modification-expansion

And at &extra-xml I have some specific settings.

Special attention to where I use onto_practitioner, use your ontolection name.

Also, pay attention that if you have more than one server or shards, settings can change.

Calling this REST API, analysing results you will see some output like:

<op-exp logic=”or” middle-string=”OR” name=”OR” precedence=”2″><term field=”query” input-type=”user” processing=”strict” str=”dns”/><term field=”query” relation=”synonym” str=”Domain Name Service”/><term field=”query” relation=”synonym” str=”ドメインネーム・サービス”/><term field=”query” relation=”synonym” str=”ドメインネームサービス”/></op-exp>

If you would like to test at WEX query utility, you should edit the project query-meta and add the following flags:

Enable query stopword removal → true

Query expansion match type Terms

Enable semantic expansion true

And set the configurations like the following:


Thats it. Enjoy!

For more information about ontolection: http://www.ibm.com/support/knowledgecenter/en/SS8NLW_11.0.1/com.ibm.swg.im.infosphere.dataexpl.engine.tut.cs.doc/c_csearch-ontolection-tut.html

Anúncios
  1. Nenhum comentário ainda.
  1. abril 11, 2017 às 9:09 am

Deixe um comentário

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s

%d blogueiros gostam disto: