https://github.com/trinker/tagger
* Part-of-speech tagging with openNLP
{{pre
tagger wraps the NLP and openNLP packages for easier part of speech tagging. tagger uses the openNLP annotator to compute "Penn Treebank parse annotations using the Apache OpenNLP chunking parser for English."
}}

!!Installing the required packages
{{pre
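# openNLP depends on rJava (the R-to-Java bridge), so install it first
install.packages("rJava")
install.packages("pacman")
# Install (if missing) and load the GitHub packages
pacman::p_load_gh(c(
  "trinker/termco",
  "trinker/coreNLPsetup",
  "trinker/tagger"
))
library(rJava)
library(dplyr)
library(tagger)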
}}

!!Tag list
!penn_tags()
{{pre
   Tag                                 Description
1    $                                      dollar
2   ``                      opening quotation mark
3   ''                      closing quotation mark
4    (                       opening parenthesis
5    )                       closing parenthesis
6    ,                                       comma
7    -                                        dash
8    .                          sentence terminator
9    :                            colon or ellipsis
10  CC                   conjunction, coordinating
11  CD                          numeral, cardinal
12  DT                                  determiner
13  EX                         existential there
14  FW                              foreign word
15  IN     preposition or conjunction, subordinating
16  JJ            adjective or numeral, ordinal
17 JJR                      adjective, comparative
18 JJS                      adjective, superlative
19  LS                          list item marker
20  MD                           modal auxiliary
21  NN           noun, common, singular or mass
22 NNP                      noun, proper, singular
23 NNPS                       noun, proper, plural
24 NNS                        noun, common, plural
25 PDT                            pre-determiner
26 POS                            genitive marker
27 PRP                        pronoun, personal
28 PRP$                      pronoun, possessive
29  RB                                      adverb
30 RBR                        adverb, comparative
31 RBS                        adverb, superlative
32  RP                                    particle
33 SYM                                      symbol
34  TO       "to" as preposition or infinitive marker
35  UH                               interjection
36  VB                          verb, base form
37 VBD                         verb, past tense
38 VBG       verb, present participle or gerund
39 VBN                      verb, past participle
40 VBP   verb, present tense, not 3rd person singular
41 VBZ   verb, present tense, 3rd person singular
42 WDT                            WH-determiner
43  WP                              WH-pronoun
44 WP$                    WH-pronoun, possessive
45 WRB                                  Wh-adverb
}}

!Instead of Penn Treebank tags, the tags can also be collapsed into universal part-of-speech symbols
as_universal()

!The part of speech can also be spelled out in plain words
as_basic()
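
The idea of collapsing fine-grained Penn Treebank tags into a few coarse classes can be sketched in base R without loading tagger. This is only an illustration: `collapse_tags()` is a hypothetical helper written for this page, its four-class mapping is an assumption based on the tag prefixes in the table above, and it is not the mapping that `as_basic()` applies. The input is one sentence copied from the tagged output of the NS502 example.

```r
# Hypothetical helper (not part of tagger): map Penn Treebank tags to
# coarse word classes by tag prefix. Anything unmatched becomes "other".
collapse_tags <- function(tags) {
  out <- rep("other", length(tags))
  out[grepl("^NN", tags)] <- "noun"       # NN, NNS, NNP, NNPS
  out[grepl("^VB", tags)] <- "verb"       # VB, VBD, VBG, VBN, VBP, VBZ
  out[grepl("^JJ", tags)] <- "adjective"  # JJ, JJR, JJS
  out[grepl("^RB", tags)] <- "adverb"     # RB, RBR, RBS
  out
}

# One sentence in word/TAG format, copied from the NS502 example
sentence <- paste(
  "It/PRP may/MD seem/VB entirely/RB silly/JJ ,/, even/RB ,/,",
  "to/TO consider/VB that/IN anything/NN needs/VBZ to/TO be/VB changed/VBN ./."
)

tokens <- strsplit(sentence, " ")[[1]]  # split into word/TAG tokens
tags   <- sub("^.*/", "", tokens)       # keep only the part after the last "/"
basic  <- collapse_tags(tags)

print(table(basic))
```

Matching on prefixes keeps the helper short, but it deliberately leaves tags such as WRB or MD in the "other" class.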

!!Commands
* The example uses only the body text of the NS502 data from NICER
{{pre
> str(ns502)
 chr [1:26] "An Assumed Role" ...
> head(ns502)
[1] "An Assumed Role"
[3] "Of course, many aware of the dynamic contrast between student and teacher are perfectly willing to perpetuate and even strengthen this relationship."
[4] "It may seem entirely silly, even, to consider that anything needs to be changed."
[5] "However, with growing competition in workplaces and with newer jobs being developed on a regular basis, it may be necessary to reexamine this two-dimensional hierarchy in order to better prepare students for the changing world."
[6] "For years, American schools have maintained a strict adherence to an invisible, ghostly code, one that encourages respect and honor in a manner reminiscent of both the power structure found in factories and the military."
}}

!Tagging: tag_pos()
{{pre
> tag_pos(ns502)  # POS tagging
1.  An/DT Assumed/NNP Role/NNP
2.  Considering/VBG the/DT heightened/VBN role/NN maintained/VBN by/IN many/JJ in/IN ...
3.  Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN between/IN ...
4.  It/PRP may/MD seem/VB entirely/RB silly/JJ ,/, even/RB ,/, to/TO consider/VB ...
5.  However/RB ,/, with/IN growing/VBG competition/NN in/IN workplaces/NNS and/CC ...
.
.
.
22. Allow/NN rules/NNS to/TO be/VB challenged/VBN ,/, and/CC even/RB changed/VBD on/IN ...
23. Let/VB the/DT wealth/NN of/IN perspectives/NNS a/DT teacher/NN has/VBZ access/NN ...
24. Education/NNP should/MD be/VB defined/VBN by/IN the/DT fluidity/NN of/IN the/DT ...
25. Supreme/NNP authority/NN without/IN equal/JJ exchange/NN is/VBZ as/RB unnatural/JJ ...
26. Instead/RB ,/, we/PRP should/MD yearn/VB to/TO construct/VB a/DT bridge/NN ...
}}

!Tag frequencies: count_tags()
{{pre
> ns502.pos <- tag_pos(ns502)
> count_tags(ns502.pos)
  n.tokens '' ,        .       `` CC      CD      DT       IN       JJ       JJR     MD      NN       NNP
1 3        0  0        0       0  0       0       1(33.3%) 0        0        0       0       0        2(66.7%)
2 24       0  1(4.2%)  1(4.2%) 0  0       0       2(8.3%)  2(8.3%)  4(16.7%) 0       0       4(16.7%) 0
3 24       0  1(4.2%)  1(4.2%) 0  2(8.3%) 0       2(8.3%)  3(12.5%) 4(16.7%) 0       0       5(20.8%) 0
4 17       0  2(11.8%) 1(5.9%) 0  0       0       0        1(5.9%)  1(5.9%)  0       1(5.9%) 1(5.9%)  0
5 38       0  2(5.3%)  1(2.6%) 0  1(2.6%) 0       3(7.9%)  6(15.8%) 3(7.9%)  1(2.6%) 1(2.6%) 5(13.2%) 0
6 39       0  3(7.7%)  1(2.6%) 0  2(5.1%) 1(2.6%) 6(15.4%) 4(10.3%) 6(15.4%) 0       0       7(17.9%) 0
}}

!Plotting tag frequencies: plot()
{{pre
> plot(ns502.pos)
}}
{{ref_image ns502.pos.png}}

!Outputting tagged text: as_word_tag()
{{pre
> ns502.pos.tagged <- as_word_tag(ns502.pos)
>
> head(ns502.pos.tagged)
[1] "An/DT Assumed/NNP Role/NNP"
[2] "Considering/VBG the/DT heightened/VBN role/NN maintained/VBN by/IN many/JJ in/IN education/NN ,/, it/PRP 's/VBZ strangely/RB rare/JJ to/TO fin/VBG many/JJ willing/JJ to/TO question/VB this/DT status/NN quo/NN ./."
[3] "Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN between/IN student/NN and/CC teacher/NN are/VBP perfectly/RB willing/JJ to/TO perpetuate/VB and/CC even/RB strengthen/VB this/DT relationship/NN ./."
[4] "It/PRP may/MD seem/VB entirely/RB silly/JJ ,/, even/RB ,/, to/TO consider/VB that/IN anything/NN needs/VBZ to/TO be/VB changed/VBN ./."
[5] "However/RB ,/, with/IN growing/VBG competition/NN in/IN workplaces/NNS and/CC with/IN newer/JJR jobs/NNS being/VBG developed/VBN on/IN a/DT regular/JJ basis/NN ,/, it/PRP may/MD be/VB necessary/JJ to/TO reexamine/VB this/DT two-dimensional/JJ hierarchy/NN in/IN order/NN to/TO better/RB prepare/VB students/NNS for/IN the/DT changing/VBG world/NN ./."
[6] "For/IN years/NNS ,/, American/JJ schools/NNS have/VBP maintained/VBN a/DT strict/JJ adherence/NN to/TO an/DT invisible/JJ ,/, ghostly/JJ code/NN ,/, one/CD that/WDT encourages/VBZ respect/NN and/CC honor/NN in/IN a/DT manner/NN reminiscent/JJ of/IN both/DT the/DT power/NN structure/NN found/VBD in/IN factories/NNS and/CC the/DT military/JJ ./."
}}

!Searching the tagged data by part of speech
{{pre
> grep("\\w+/JJ \\w+/NN", ns502.pos.tagged, value=TRUE)
[1] "Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN between/IN student/NN and/CC teacher/NN are/VBP perfectly/RB willing/JJ to/TO perpetuate/VB and/CC even/RB strengthen/VB this/DT relationship/NN ./."
[2] "However/RB ,/, with/IN growing/VBG competition/NN in/IN workplaces/NNS and/CC with/IN newer/JJR jobs/NNS being/VBG developed/VBN on/IN a/DT regular/JJ basis/NN ,/, it/PRP may/MD be/VB necessary/JJ to/TO reexamine/VB this/DT two-dimensional/JJ hierarchy/NN in/IN order/NN to/TO better/RB prepare/VB students/NNS for/IN the/DT changing/VBG world/NN ./."
[3] "For/IN years/NNS ,/, American/JJ schools/NNS have/VBP maintained/VBN a/DT strict/JJ adherence/NN to/TO an/DT invisible/JJ ,/, ghostly/JJ code/NN ,/, one/CD that/WDT encourages/VBZ respect/NN and/CC honor/NN in/IN a/DT manner/NN reminiscent/JJ of/IN both/DT the/DT power/NN structure/NN found/VBD in/IN factories/NNS and/CC the/DT military/JJ ./."
}}
* With grep, the entire sentence containing the matching string is displayed
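
When only the matching adjective + noun pairs are wanted rather than whole sentences, base R's `gregexpr()` and `regmatches()` can pull out just the matched substrings. The following is a minimal sketch under a few assumptions: the vector `tagged` is a two-element stand-in copied from the tagged output above (it is not the full `ns502.pos.tagged`), and the pattern is a variant of the one used with grep, written with `[^ ]+` so that hyphenated words are kept whole and with `[A-Z]*` so that a noun tag such as NNS is not cut off after "NN".

```r
# Stand-in for ns502.pos.tagged: two elements copied from the example output
tagged <- c(
  "An/DT Assumed/NNP Role/NNP",
  paste(
    "Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN",
    "between/IN student/NN and/CC teacher/NN are/VBP perfectly/RB willing/JJ",
    "to/TO perpetuate/VB and/CC even/RB strengthen/VB this/DT relationship/NN ./."
  )
)

# An adjective token (tag exactly JJ) followed by a noun token (NN, NNS, NNP, NNPS)
pattern <- "[^ ]+/JJ [^ ]+/NN[A-Z]*"

# gregexpr() finds every match per element; regmatches() extracts the matched text.
# The result is a list with one character vector per input element.
pairs <- regmatches(tagged, gregexpr(pattern, tagged))

# Flatten to a single character vector of matched pairs
pairs_flat <- unlist(pairs)
print(pairs_flat)
```

Elements without a match yield an empty character vector, so `pairs` keeps the same length as the input and can still be lined up with the sentence numbers.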