Arquivo

Posts Tagged ‘train’

Training your ontolections at IBM Watson Explorer

Ontolection Trainer its a nice stuff that people who are using ontolections to Improve the Queries at Watson Explorer need to know. This utility help us to analyze text body and create Thesaurus files, that can be used to create ontolections. Also, you can extract key-phrases or Acronyms that you can use with query-modifier and at some ontolection.

If you don’t know NLQ capabilities at Watson Explorer (WEX) or don’t know what is a Ontolection, I recommend that you read my 2 posts:

https://jmmwrite.wordpress.com/2017/03/29/implementing-natural-language-query-with-ibm-watson-explorer/

https://jmmwrite.wordpress.com/2017/04/04/improving-your-queries-at-watson-explorer-using-ontolections/

Backing to Ontolection Trainer, at NLQ folder (/opt/IBM/dataexplorer/WEX-11_0_2/Engine/nlq in my case) from your WEX installation (since rel 11.0.1), you can find the jar file ontolectiontrainer.jar. Obviously you will need Java to run it. Make sure that the JAVA from WEX installation are configured at your path.

The utility have several arguments, but, the basics are:

  • the type of extraction
  • the corpus that you will use: The corpus are your text file. In my case, I have a file with 1000 Resumes that Ill use to train WEX (RESUME_TEXT_1000.TXT ).
  • the pear file: Pear file consist in the dictionary that the trainer will user to extract terms.
  • the output path: Where it will create the file.

I have used a file called blacklist containing the words that I want to be ignored.

You can have problems with CPU and Memory utilization, for this cases, there are parameters to setup the number of iterations that trainer will do.

To be very objective, here is my commands:

  • To extract the ontolection:

java -jar ontolectiontrainer.jar –trainOntolection –corpus RESUME_TEXT_1000.TXT –pear /opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear –blacklist blacklist –outputPath generatedOntolection_1000

  • To extract Acronyms:

java -jar ontolectiontrainer.jar –extractAcronyms –corpus RESUME_TEXT_1000.TXT –pear /opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear –blacklist blacklist –outputPath generatedOntolectionAcronyms_1000

  • To extract Phrases:

java -jar ontolectiontrainer.jar –learnPhrases –corpus RESUME_TEXT_1000.TXT –pear /opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear –blacklist blacklist –outputPath generatedOntolectionPhrases_1000

For more reference:

https://www.ibm.com/support/knowledgecenter/SS8NLW_11.0.2/com.ibm.watson.wex.fc.nlq.doc/c_wex_nlq_ot.html

Enjoy.