Using Sudachi with Elasticsearch

Install Maven for the build

$ brew install maven

With the latest version from brew (6.2.4) I couldn't get Sudachi to install, so I'm trying 6.2.0.

$ curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.0.tar.gz
$ tar -xvf elasticsearch-6.2.0.tar.gz
$ cd elasticsearch-6.2.0
$ bin/elasticsearch -V
Version: 6.2.0, Build: 37cdac1/2018-02-01T17:31:12.527918Z, JVM: 1.8.0_74

Install Sudachi

$ git clone git@github.com:WorksApplications/elasticsearch-sudachi.git
$ cd elasticsearch-sudachi
$ mvn package
$ cd /path/to/elasticsearch-6.2.0
$ ./bin/elasticsearch-plugin install file:///path/to/elasticsearch-sudachi/target/releases/analysis-sudachi-elasticsearch6.2-1.1.0-SNAPSHOT.zip
$ bin/elasticsearch-plugin list
analysis-sudachi
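
I never confirmed why 6.2.4 failed, but Elasticsearch generally refuses plugins built for a different version, so a mismatch is a likely suspect. Below is a rough sketch that compares the version banner with the plugin zip name. The helper names `es_version_from_banner` and `plugin_matches_es` are made up for this post, and the `elasticsearch<major.minor>` naming is only assumed from the one zip name above. It is a coarse check, since the real comparison is against the plugin descriptor inside the zip.

```shell
# Hypothetical helper: pull the version (e.g. "6.2.0") out of the
# `bin/elasticsearch -V` banner passed as the first argument.
es_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/^Version: \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical helper: succeed if the plugin zip name carries the same
# major.minor as the Elasticsearch version.
# Assumption: the zip is named "...-elasticsearch<major.minor>-...".
plugin_matches_es() {
  zip_name="$1"
  es_version="$2"
  major_minor=$(printf '%s\n' "$es_version" | cut -d. -f1,2)
  case "$zip_name" in
    *"elasticsearch${major_minor}-"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Something like `plugin_matches_es "$(basename "$zip")" "$(es_version_from_banner "$(bin/elasticsearch -V)")"` could then be run before `elasticsearch-plugin install`.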

Dictionary

$ wget https://oss.sonatype.org/content/repositories/snapshots/com/worksap/nlp/sudachi/0.1.1-SNAPSHOT/sudachi-0.1.1-20180419.085027-26-dictionary-core.tar.bz2
$ tar xvf sudachi-0.1.1-20180419.085027-26-dictionary-core.tar.bz2
$ mkdir config/sudachi_tokenizer
$ mv system_core.dic ./config/sudachi_tokenizer/system_core.dic
$ cat sudachi.json
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "sudachi_tokenizer": {
            "type": "sudachi_tokenizer",
            "mode": "search",
            "discard_punctuation": true
          }
        },
        "analyzer": {
          "sudachi_analyzer": {
            "filter": [
            ],
            "tokenizer": "sudachi_tokenizer",
            "type": "custom"
          }
        }
      }
    }
  }
}
$ ls config/sudachi_tokenizer
system_core.dic
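
Before creating the index, it may be worth checking that both pieces are in place. This is a minimal sketch, not part of the plugin: `check_sudachi_setup` is a hypothetical helper, and the default paths simply mirror the ones used above, assuming it runs from the Elasticsearch directory.

```shell
# Hypothetical pre-flight check; run from the Elasticsearch directory.
# $1: dictionary path (defaults to the location used in this post)
# $2: index settings file (defaults to sudachi.json)
check_sudachi_setup() {
  dict="${1:-config/sudachi_tokenizer/system_core.dic}"
  settings="${2:-sudachi.json}"
  # The dictionary must exist and be non-empty.
  if [ ! -s "$dict" ]; then
    echo "dictionary missing or empty: $dict" >&2
    return 1
  fi
  # The settings file must define a tokenizer of type sudachi_tokenizer.
  if ! grep -q '"type": *"sudachi_tokenizer"' "$settings"; then
    echo "no sudachi_tokenizer definition in: $settings" >&2
    return 1
  fi
  echo "ok"
}
```

The grep only looks for the tokenizer type string, so it does not prove the JSON is well-formed.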

Create the index

$ curl -X PUT -H "Content-Type: application/json" http://localhost:9200/sudachi_test/ -d @sudachi.json
{"acknowledged":true,"shards_acknowledged":true,"index":"sudachi_test"}

Check in Kibana

[Image: Kibana screenshot]
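
The same check should also work without Kibana, through the `_analyze` API of the index. A minimal sketch follows: `build_analyze_body` and `analyze_text` are hypothetical helpers, it assumes Elasticsearch is listening on localhost:9200 as above, and it does no JSON escaping, so the text must not contain double quotes or backslashes.

```shell
# Hypothetical helper: build the JSON body for the _analyze API.
# $1: analyzer name, $2: text to analyze (no JSON escaping is done).
build_analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Hypothetical helper: run the text through sudachi_analyzer
# on the sudachi_test index created above.
analyze_text() {
  curl -s -X POST -H "Content-Type: application/json" \
    "http://localhost:9200/sudachi_test/_analyze?pretty" \
    -d "$(build_analyze_body sudachi_analyzer "$1")"
}
```

With Elasticsearch running, `analyze_text '東京都'` should print the token list as JSON.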

That's as far as I got: it runs. Next I'll try out various settings.
