Posts Tagged ‘how-to’

Watson Explorer Content Analytics installation – Steps

I have installed Watson Explorer Content Analytics several times over the last few months, so I wrote a simple step-by-step guide that may be useful to anyone starting to work with it. I'm sharing it here; please feel free to comment and help me improve this how-to. I hope it is useful to you if you are here :D.

Pre-Installation Checklist / Pre-requisites

The following steps prepare the server for the software installation. This step-by-step guide covers installation on a Red Hat 7 machine. The following 32-bit libraries are required on Linux x86-64 systems:

  • libstdc++33 (compat-libstdc++-33.i686 / libstdc++33-32bit)
  • libstdc++ (libstdc++.i686 / libstdc++6-32bit)
  • zlib (zlib.i686 / libz1-32bit / zlib-32bit)
  • libXext (libXext.i686 / libXext6-32bit / xorg-x11-libXext-32bit)
  • libXft (libXft.i686 / libXft2-32bit / xorg-x11-libs-32bit)
  • libXi (libXi.i686 / libXi6-32bit / xorg-x11-libs-32bit)
  • libXp (libXp.i686 / libXp6-32bit / xorg-x11-libXp-32bit)
  • libXtst (libXtst.i686 / libXtst6-32bit / xorg-x11-libs-32bit)
  • libXm (if thumbnails are needed)
  • libXt (if thumbnails are needed)
  • java-1.8.0-ibm-1.8.0.2.10-1jpp.7.el6.x86_64
  • httpd

The commands to install the libraries are:

  1. yum install -y libstdc++.i686 libXext.i686 libXft.i686 libXi.i686 libXp.i686 libXtst.i686 libXt.i686
  2. yum install -y zlib.i686 httpd
  3. yum -y install apr apr-util boost-filesystem boost-iostreams boost-program-options boost-regex boost-serialization
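
Before running yum, it can help to see which of the packages are still missing. The following is only a minimal sketch, not part of the original procedure: the function name missing_packages and the PKG_QUERY override are hypothetical helpers I introduce for illustration, and it assumes an RPM-based system where "rpm -q" exits with a non-zero status for a package that is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: list which of the given packages are not installed yet.
# PKG_QUERY is a hypothetical override used for dry runs; by default the
# function asks rpm, which exits non-zero when a package is missing.
missing_packages() {
    local query="${PKG_QUERY:-rpm -q}"
    local pkg
    for pkg in "$@"; do
        # $query is intentionally unquoted so "rpm -q" splits into two words
        if ! $query "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# Example (run on the Red Hat server):
#   missing_packages libstdc++.i686 zlib.i686 libXext.i686 httpd
```

Anything the function prints still needs to be installed; an empty result suggests the listed packages are already in place.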

If the default temporary directory does not have enough free space for the installer, point the installer's temporary directory (IATEMPDIR) at a directory that has extra space with the commands:

  1. mkdir -p /opt/IBM/dataexplorer/tmp
  2. export IATEMPDIR=/opt/IBM/dataexplorer/tmp

Other pre-requisites:

  • Install a web server to run the Foundational Components. On Linux, install the latest version of Apache (not required for AC).

  • Ensure that the Linux subscription repositories are available to the yum command.

  • Have the DB2 Connect client configured.

  • Install Java: java-1.8.0-ibm-1.8.0.2.10-1jpp.7.el6.x86_64 (you can also get a Java package from Oracle; make sure to add Java to the server PATH).
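
For the temporary-directory step above, a quick way to decide whether you need IATEMPDIR at all is to check the free space first. This is a minimal sketch under my own assumptions: the helper names free_kb and has_space are hypothetical, and the 2 GB threshold in the example is a guess, not an IBM requirement.

```shell
#!/usr/bin/env bash
# Sketch only: print the free space, in kilobytes, of the filesystem
# that holds the given directory (POSIX "df -Pk" output, 4th column).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR MIN_KB succeeds when DIR has at least MIN_KB kilobytes free.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example: fall back to the alternate directory when /tmp is too small.
# The 2 GB threshold is an assumption, not an IBM requirement.
#   has_space /tmp 2097152 || export IATEMPDIR=/opt/IBM/dataexplorer/tmp
```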

User and folder creation for WEX AE (must be performed with root-level access)

  1. Add the installation account (esadmin) for the server install.
    1. useradd esadmin
    2. passwd esadmin (create a password for the esadmin user, usually esadmin12)
  2. Create the Watson directories.
    1. mkdir -p /watson/archives
  3. Copy the installation tarball onto the server (the tar command below expects it in /watson/archives).
  4. Untar the tarball that contains the installation files.
    1. mkdir -p /opt/IBM/dataexplorer/WATSON_EXPR_ADV_EAC_V11.0.2_LNX_M
    2. cd /opt/IBM/dataexplorer/WATSON_EXPR_ADV_EAC_V11.0.2_LNX_M
    3. tar -xvf /watson/archives/WATSON_EXPR_ADV_EAC_V11.0.2_LNX_M.tar
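
The untar step can also be wrapped in a small function so that it does not depend on the current working directory. This is a sketch only; extract_archive is a hypothetical helper name, and it simply combines mkdir -p with tar's -C option.

```shell
#!/usr/bin/env bash
# Sketch only: unpack an archive into a target directory, creating it first.
# Using "tar -C" avoids relying on the current working directory.
extract_archive() {
    local archive="$1" dest="$2"
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Example with the paths used in this guide:
#   extract_archive /watson/archives/WATSON_EXPR_ADV_EAC_V11.0.2_LNX_M.tar \
#       /opt/IBM/dataexplorer/WATSON_EXPR_ADV_EAC_V11.0.2_LNX_M
```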

WEX AE Installation Details:

  1. Install the software (must be performed with root-level access). Note: the responses below are for a Master node when selecting the server type.
    1. cd /opt/IBM/dataexplorer/WATSON_EXPR_ADV_EAC_V11.0.2_LNX_M/
    2. ./install.bin -i console
  • Installation responses: (Also see installation screenshots below)
    • Choose Locale: English
    • PRESS <ENTER> TO CONTINUE:
    • Press Enter to continue viewing the license agreement, or 1 to accept: 1 (accept)
    • Enter fully qualified hostname of this server: (the hostname will be detected; hit Enter. See example below)
    • Administrator user name (default: esadmin): (hit Enter to take the default)
    • Create user account, enter the number for your choice: 2 (the administrator user account ‘esadmin’ already exists)
    • Enter Administrator Account password: the password that you created for the esadmin user earlier (esadmin12)
    • Select a Server Type: (the response depends on the type of server being installed)
      • (1 – Master, all on one server): if the installation consists of a single server.
      • (2 – Master Distributed Server): if the installation consists of multiple servers and the current server is the Master server.
      • (3 – Additional Server): if the installation is for a server (e.g. a Search Server) that will connect to a Master.
      • Select: 1

    • Do you want to install advanced options: 1 (YES)
    • Enter absolute path of the data directory ES_NODE_ROOT (default: /opt/IBM/dataexplorer/esdata): /opt/IBM/dataexplorer/esdata
    • Enter the absolute path to install: /opt/IBM/dataexplorer/es
    • Common Communication Layer Port (default: 6002): hit Enter for the default
    • Search Server ESSearchServer Port (default: 8394): hit Enter for the default
    • Web Application Server (default: embedded): 1 – embedded
    • Enterprise Search Application and Content Analytics Miner Port (default: 8393): hit Enter for the default
    • The installation will begin and run for about 5-10 minutes.

    • This completes the WAC (Watson Content Analytics) installation.
    • You can start the system with esadmin to test:
      • /opt/IBM/dataexplorer/es/bin/esadmin system startall
    • Go to the admin console and check:
      • http://<hostname>:8390/ESAdmin/ (replace <hostname> with the fully qualified hostname of this server)

Install SIRE (Statistical Information and Relation Extraction) module

  1. Stop esadmin: /opt/IBM/dataexplorer/es/bin/esadmin system stopall

  2. Install the pre-req libraries:

yum -y install apr apr-util boost-filesystem boost-iostreams boost-program-options boost-regex boost-serialization

  3. Go to /opt/IBM/dataexplorer/es/bin/sire and run:

rpm -ivh sire-20161109-1.x86_64.rpm

  4. Stop esadmin: /opt/IBM/dataexplorer/es/bin/esadmin system stopall
  5. Start esadmin: /opt/IBM/dataexplorer/es/bin/esadmin system startall
  6. Installation is complete.

Uninstall Notes

To uninstall the WAC application, switch to a root-level account (root access is required) and run the command:

/opt/IBM/dataexplorer/es/uninstall_11.0.1.0./uninstall_11.0.1.0 -i console

WEX AE System Start-up / Shutdown

To stop the services:

  1. /opt/IBM/dataexplorer/es/bin/esadmin system stopall

To start the services:

  1. /opt/IBM/dataexplorer/es/bin/startccl.sh -bg
  2. /opt/IBM/dataexplorer/es/bin/esadmin system start
  3. /opt/IBM/dataexplorer/es/bin/esadmin system startall
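
If you start and stop the system often, the sequence above can be kept in a small script. This is a sketch under stated assumptions: ES_BIN, DRY_RUN, run_step, wex_start and wex_stop are names I made up for illustration, and the default for ES_BIN assumes the install path entered during installation in this guide (/opt/IBM/dataexplorer/es). Set ES_BIN to wherever esadmin actually lives on your server.

```shell
#!/usr/bin/env bash
# Sketch only: print or run the start-up sequence from this post in order.
# ES_BIN and DRY_RUN are hypothetical variables introduced for illustration.
ES_BIN="${ES_BIN:-/opt/IBM/dataexplorer/es/bin}"

run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"          # dry run: show the command instead of running it
    else
        "$@"
    fi
}

# Start the common communication layer first, then the system, then all services.
wex_start() {
    run_step "$ES_BIN/startccl.sh" -bg &&
    run_step "$ES_BIN/esadmin" system start &&
    run_step "$ES_BIN/esadmin" system startall
}

wex_stop() {
    run_step "$ES_BIN/esadmin" system stopall
}

# Example: preview the commands without running anything.
#   DRY_RUN=1 wex_start
```

The && chaining stops the sequence at the first step that fails, so a later step is never run against a half-started system.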

 

Creating a collection in IBM Watson Explorer by crawling a database

Crawling a database to feed a collection is very common, and with IBM Watson Explorer it is easy to do. In my example, I'll create a collection and run a simple query against an IBM DB2 database, but the steps are very similar for other databases; just keep in mind that you will need the correct driver.

1- Put the driver in place:

Get the JDBC driver for your database and put it in the correct folder; usually it is something like /opt/IBM/dataexplorer/WEX-11_0_2/Engine/lib/java/database/.

2- Create the collection, copying defaults from "default":

Selection_188.jpg

3- Add a new seed; this is where your collection will get its data:

Selection_189.jpg

4- Choose Database:

Selection_190.jpg

5- Enter your database settings and the query that will be performed:

Selection_191.jpg

6- It's done; now you can test:

Selection_192.jpg

7- This can take a while depending on your query and connection, but when it finishes, it will show some of the rows that the query returned, in the following format. To see the row data, click Crawler XML:

Selection_193.jpg

8- Here is your data:

Selection_194

9- Now that we can see it's working, you can start your crawl. This step feeds your collection and can take a long time depending on the amount of data:

Selection_195

10- You should see crawl activity:

Selection_196

11- You can now query your collection to test it: just enter your term and click Search in the options on the left:

Selection_197

12- You will see something like this:

Selection_198

That's it: you have created a collection that gets its data from a database!
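
Step 1 (putting the driver in place) is the one people tend to get wrong, so here is a minimal sketch of it as a function. install_jdbc_driver is a hypothetical helper name, the source path in the example is a placeholder, and db2jcc4.jar is simply the usual file name of the DB2 JDBC driver; check which jar your database needs.

```shell
#!/usr/bin/env bash
# Sketch only: copy a JDBC driver jar into the Engine's database driver folder.
# Both arguments are supplied by the caller; nothing here is WEX-specific.
install_jdbc_driver() {
    local jar="$1" dest="$2"
    [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
    [ -d "$dest" ] || { echo "driver folder not found: $dest" >&2; return 1; }
    # Copy the jar, then list it so you can confirm it landed in the folder
    cp "$jar" "$dest/" && ls -l "$dest/$(basename "$jar")"
}

# Example (the source path is a placeholder):
#   install_jdbc_driver /path/to/db2jcc4.jar \
#       /opt/IBM/dataexplorer/WEX-11_0_2/Engine/lib/java/database
```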

Training your ontolections in IBM Watson Explorer

Ontolection Trainer is a handy utility that anyone using ontolections to improve queries in Watson Explorer should know. It helps us analyze a body of text and create thesaurus files, which can then be used to create ontolections. You can also extract key phrases or acronyms to use with the query-modifier and in an ontolection.

If you don't know the NLQ capabilities of Watson Explorer (WEX), or don't know what an ontolection is, I recommend that you read my two posts:

https://jmmwrite.wordpress.com/2017/03/29/implementing-natural-language-query-with-ibm-watson-explorer/

https://jmmwrite.wordpress.com/2017/04/04/improving-your-queries-at-watson-explorer-using-ontolections/

Back to Ontolection Trainer: in the NLQ folder of your WEX installation (/opt/IBM/dataexplorer/WEX-11_0_2/Engine/nlq in my case), since release 11.0.1, you can find the jar file ontolectiontrainer.jar. You will need Java to run it, so make sure that the Java from the WEX installation is configured in your PATH.

The utility has several arguments, but the basics are:

  • the type of extraction
  • the corpus that you will use: the corpus is your text file. In my case, I have a file with 1000 resumes that I'll use to train WEX (RESUME_TEXT_1000.TXT).
  • the pear file: the pear file contains the dictionary that the trainer will use to extract terms.
  • the output path: where it will create the output file.

I have also used a file called blacklist containing the words that I want to be ignored.

You may run into problems with CPU and memory utilization; for these cases, there are parameters to set the number of iterations that the trainer will do.

To be very objective, here are my commands:

  • To extract the ontolection:

java -jar ontolectiontrainer.jar --trainOntolection --corpus RESUME_TEXT_1000.TXT --pear /opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear --blacklist blacklist --outputPath generatedOntolection_1000

  • To extract Acronyms:

java -jar ontolectiontrainer.jar --extractAcronyms --corpus RESUME_TEXT_1000.TXT --pear /opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear --blacklist blacklist --outputPath generatedOntolectionAcronyms_1000

  • To extract Phrases:

java -jar ontolectiontrainer.jar --learnPhrases --corpus RESUME_TEXT_1000.TXT --pear /opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear --blacklist blacklist --outputPath generatedOntolectionPhrases_1000
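
Since the three commands differ only in the extraction type and the output path, they can be generated from one template. This is a sketch only: the function and variable names (trainer_cmd, print_all_trainer_cmds, PEAR, CORPUS, BLACKLIST) are hypothetical, and it assumes the options are written with a double hyphen (--), which blog software often displays as a single dash character.

```shell
#!/usr/bin/env bash
# Sketch only: build the trainer command line for one extraction mode.
# PEAR, CORPUS and BLACKLIST mirror the values used in the post; the
# function and variable names are hypothetical.
PEAR="${PEAR:-/opt/IBM/dataexplorer/WEX-11_0_2/Engine/data/pears/en.pear}"
CORPUS="${CORPUS:-RESUME_TEXT_1000.TXT}"
BLACKLIST="${BLACKLIST:-blacklist}"

trainer_cmd() {
    local mode="$1" output="$2"
    printf '%s\n' "java -jar ontolectiontrainer.jar --$mode --corpus $CORPUS --pear $PEAR --blacklist $BLACKLIST --outputPath $output"
}

# Print the three commands; pipe the output to "sh" to actually run them.
print_all_trainer_cmds() {
    trainer_cmd trainOntolection generatedOntolection_1000
    trainer_cmd extractAcronyms  generatedOntolectionAcronyms_1000
    trainer_cmd learnPhrases     generatedOntolectionPhrases_1000
}
```

Printing the commands first, instead of running them directly, makes it easy to review them before starting a long training run.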

For further reference:

https://www.ibm.com/support/knowledgecenter/SS8NLW_11.0.2/com.ibm.watson.wex.fc.nlq.doc/c_wex_nlq_ot.html

Enjoy.

Introduction to Big Data and Apache Solr

For anyone interested in Big Data who also wants something practical using Apache Solr, I'm making available a set of slides that can be used by students, teachers, and professionals. Use and share them freely!

Enabling wildcards in a collection in Watson Explorer

Sometimes we need to enable searching with wildcards such as * for a collection in Watson Explorer. This can make our queries consume more CPU and memory: compare a query that performs a “select … where field = 'XXX'” against one that performs a “select … where field like '*XXX'” (pseudocode). Which will be faster? So think carefully before enabling this!

To enable it, go to your collection configuration -> Indexing -> Term expansion support (4), and check Generate Dictionaries.

Selection_356

For more information, check here and here.

Enjoy!

 

Getting to know Watson Analytics

Watson Analytics is a tool that lets us analyze large volumes of data (big data). In general terms: you define the data sources, the tool scans them and performs a contextual analysis, and it prepares your data to be studied. It is worth noting that you can have any number of data sources, of the most varied kinds (spreadsheets, databases, URLs, etc.).

Anyone can play with the tool, which is available at http://watsonanalytics.com/

I made a very simple video, in Portuguese, showing how to upload a spreadsheet and run a simple study. It can be seen just below. Watson has extensive documentation and countless videos on the Internet. They are worth a look.

Enjoy!

Recompiling the Ubuntu Kernel

It had been a long time since I last recompiled a kernel and, to tell the truth, I had even forgotten some of the steps (on Ubuntu)…

I did it today and refreshed my memory by reading a great post by Alexandro Silva, which can be found here: http://penguim.wordpress.com/2006/11/14/compilando-o-kernel-no-ubuntu-linux/

Enjoy!