spaCy

[python]

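spaCy

Installation

If it cannot be added through Environments in Anaconda Navigator

How it works

Processing
Steps
Attributes
Noun chunks

Functions

Visualizing dependencies

Demo
References

Japanese processing

GiNZA: a Japanese natural language processing library

Explanation
Installation
Making it available from spaCy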

https://spacy.io/

Installation

https://spacy.io/usage#quickstart

In a Windows console:

pip install spacy

Run the Windows shell as administrator (right-click), then:

python -m spacy download en_core_web_sm

On a Mac:

pip3 install spacy
python3 -m spacy download en_core_web_sm

In the Python shell:

import spacy

Alternatively, put this at the top of your script:

import spacy
nlp = spacy.load('en_core_web_sm')
# the rest of the script follows
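
To confirm that both installation steps worked before running a longer script, a small check like the following may help. It is only a sketch: the helper name model_is_installed is made up for this page, and it relies on spacy.util.is_package, which reports whether a package of that name is installed.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"  # the pipeline package downloaded above


def model_is_installed(name: str) -> bool:
    """Hypothetical helper: True if `name` is installed as a Python package."""
    return is_package(name)


print("spaCy version:", spacy.__version__)
if model_is_installed(MODEL_NAME):
    print(MODEL_NAME, "is installed")
else:
    print(MODEL_NAME, "is missing; run: python -m spacy download", MODEL_NAME)
```

If the package is reported as missing, spacy.load with that name would fail, so it is worth rerunning the download command first.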

If it cannot be added through Environments in Anaconda Navigator

https://github.com/conda/conda/issues/9423

In PowerShell:

conda install -c conda-forge spacy
conda install -c conda-forge spacy-model-en_core_web_sm

https://anaconda.org/conda-forge/spacy

How it works

https://spacy.io/usage/spacy-101

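A "database" containing what language processing needs has been prepared in advance (a pipeline package such as en_core_web_sm).
Language processing (natural language analysis) is carried out on the basis of that "database".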

Processing

tokenizer: splits the text into words (tokens)
tagger: assigns part-of-speech tags
parser: performs syntactic (dependency) parsing
ner: labels named entities
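
To see what each stage contributes, it can help to look at a pipeline that has only the first one. The sketch below assumes a blank English pipeline created with spacy.blank("en") (not the en_core_web_sm pipeline used elsewhere on this page), so it runs without any download; the variable name nlp_blank is chosen here for illustration.

```python
import spacy

# Assumption: a blank English pipeline, which contains only the tokenizer.
nlp_blank = spacy.blank("en")
doc = nlp_blank("Parents drive their children everywhere.")

# The tokenizer always runs, so the text is already split into tokens.
tokens = [token.text for token in doc]
print(tokens)

# No tagger, parser, or ner has been added, so nothing else is filled in.
print(nlp_blank.pipe_names)        # components that run after the tokenizer
print(doc.has_annotation("TAG"))   # are part-of-speech tags present?
print(doc.has_annotation("DEP"))   # is a dependency parse present?
```

With a trained pipeline loaded through spacy.load, nlp.pipe_names lists the trained components instead; the exact names depend on the spaCy and model versions.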

Steps

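Load the "database" ("en_core_web_sm"):

spacy.load("en_core_web_sm")

Store what was loaded under a name, for example nlp, so that it can be used for processing.

nlp = spacy.load("en_core_web_sm")

Put the data to be processed in a variable:

sample = "Parents drive their children everywhere."

Save the result of processing it with nlp in a variable (sample_doc):

sample_doc = nlp(sample)

After that, apply spaCy's functions to this sample_doc to do the actual "processing".
- On the surface, displaying sample_doc just shows the sentence as it is.
  - Behind the scenes, the results of the analysis are stored as well.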
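
The point that sample_doc prints like plain text but is a richer object can be checked directly. This is a minimal sketch that again assumes a blank English pipeline (tokenizer only, named nlp_blank here) so that it runs without the model; with en_core_web_sm the same calls work and the tokens also carry tags and parse information.

```python
import spacy

nlp_blank = spacy.blank("en")  # assumption: tokenizer only, no trained model
sample = "Parents drive their children everywhere."
sample_doc = nlp_blank(sample)

print(sample_doc)             # looks just like the original sentence
print(type(sample_doc))       # but it is a spacy.tokens.Doc, not a str
print(len(sample_doc))        # number of tokens, punctuation included
print(sample_doc[1].text)     # individual tokens can be indexed
print(sample_doc[2:4].text)   # a slice gives a Span covering several tokens
```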

Attributes

Linguistic information is attached to each token as attributes.
- For programming convenience, the attribute values themselves are stored as numbers (hash values).

for token in sample_doc:
    print(token.text, token.pos, token.dep)

Parents 92 429
drive 100 8206900633647566924
their 95 440
children 92 416
everywhere 86 400
. 97 445

To get a human-readable form, append an underscore to the attribute name:

for token in sample_doc:
    print(token.text, token.pos_, token.dep_)

Parents NOUN nsubj
drive VERB ROOT
their PRON poss
children NOUN dobj
everywhere ADV advmod
. PUNCT punct
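
The numbers and the readable labels are two views of the same entry in the vocabulary's string store, and the conversion can be done by hand. The sketch below assumes a blank English pipeline (nlp_blank) only to obtain a vocabulary; the actual integers are printed rather than listed here, since the label strings are what matter.

```python
import spacy

# Assumption: a blank English pipeline; only its vocabulary is used here.
nlp_blank = spacy.blank("en")
strings = nlp_blank.vocab.strings

# String -> integer ID, then integer ID -> string again.
noun_id = strings["NOUN"]
print(noun_id, strings[noun_id])

# add() stores a string in the store and returns its integer ID,
# after which the ID can be looked up again.
root_id = strings.add("ROOT")
print(root_id, strings[root_id])
```

This is what the underscore attributes do for each token: token.pos_ is the string stored under the integer in token.pos.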

Noun chunks

The results of the nlp analysis also include a unit called a chunk (here, noun chunks, i.e. noun phrases):

for chunk in sample_doc.noun_chunks:
    print(chunk.text)

Parents
their children

For each chunk, its role (as a constituent of the sentence) and its dependency relation can also be examined:

for chunk in sample_doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)

Parents Parents nsubj drive
their children children dobj drive

chunk.text	chunk.root.text	chunk.root.dep_	chunk.root.head.text
Parents	Parents	nsubj	drive
their children	children	dobj	drive
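
Noun chunks are derived from the part-of-speech tags and the dependency parse, which can be seen by building a Doc whose annotations are supplied by hand. The following is only a sketch under that assumption: the pos, deps, and heads lists are copied from the outputs above rather than predicted by en_core_web_sm, and the heads (the index of each token's head) are inferred from that output.

```python
import spacy
from spacy.tokens import Doc

# Assumption: annotations are typed in by hand instead of coming from a model.
vocab = spacy.blank("en").vocab
words = ["Parents", "drive", "their", "children", "everywhere", "."]
pos = ["NOUN", "VERB", "PRON", "NOUN", "ADV", "PUNCT"]
deps = ["nsubj", "ROOT", "poss", "dobj", "advmod", "punct"]
heads = [1, 1, 3, 1, 1, 1]  # index of each token's head; the root points to itself
doc = Doc(vocab, words=words, pos=pos, heads=heads, deps=deps)

# One row per noun chunk: text, root word, root's relation, root's head.
rows = [
    (chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)
    for chunk in doc.noun_chunks
]
for row in rows:
    print("\t".join(row))
```

Collecting the rows in a list first makes it easy to write them to a tab-separated file or a table later.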

Functions

Visualizing dependencies

https://spacy.io/usage/visualizers

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
displacy.serve(doc, style="dep")

When running in Jupyter Notebook, use displacy.render instead of the final displacy.serve call.

sample = "Parents drive their children everywhere."
sample_doc = nlp(sample)
displacy.render(sample_doc, style="dep")
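
Outside Jupyter, and without starting a server, displacy.render can also hand back the markup as a string when called with jupyter=False. The sketch below assumes a hand-annotated Doc (labels copied from the attribute output earlier on this page, heads inferred from it) so that it runs without en_core_web_sm; with the model loaded, sample_doc can be passed in the same way.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

# Assumption: a hand-annotated Doc stands in for the output of nlp(sample).
vocab = spacy.blank("en").vocab
words = ["Parents", "drive", "their", "children", "everywhere", "."]
pos = ["NOUN", "VERB", "PRON", "NOUN", "ADV", "PUNCT"]
deps = ["nsubj", "ROOT", "poss", "dobj", "advmod", "punct"]
heads = [1, 1, 3, 1, 1, 1]  # index of each token's head; the root points to itself
doc = Doc(vocab, words=words, pos=pos, heads=heads, deps=deps)

# jupyter=False makes render return the SVG markup instead of displaying it.
svg = displacy.render(doc, style="dep", jupyter=False)
print(svg[:80])
```

The returned string can then be written to a file with an .svg extension and opened in a browser.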

Demo

https://explosion.ai/demos/

References

https://githubja.com/explosion/spacy
https://ishitonton.hatenablog.com/entry/2018/11/24/004748
https://spacy.io/usage/spacy-101

Japanese processing

Reference: https://qiita.com/wf-yamaday/items/3ffdcc15a5878b279d61

Run Anaconda Prompt as administrator:

pip install ja-ginza

Making it available from spaCy

import spacy
nlp: spacy.Language = spacy.load("ja_ginza")
