{{category R}}
!!! tagger
{{outline}}
----
https://github.com/trinker/tagger
* Part-of-speech tagging with openNLP
{{pre
tagger wraps the NLP and openNLP packages for easier part of speech tagging. tagger uses the openNLP annotator to compute "Penn Treebank parse annotations using the Apache OpenNLP chunking parser for English."
}}
!!Installing the required packages
{{pre
install.packages("pacman")
pacman::p_load_gh(c(
    "trinker/termco",
    "trinker/coreNLPsetup",
    "trinker/tagger"
))
library(dplyr)
library(tagger)
install.packages('rJava')
library(rJava)
}}
!!Tag list
!penn_tags()
{{pre
   Tag  Description
1  $    dollar
2  ``   opening quotation mark
3  ''   closing quotation mark
4  (    opening parenthesis
5  )    closing parenthesis
6  ,    comma
7  -    dash
8  .    sentence terminator
9  :    colon or ellipsis
10 CC   conjunction, coordinating
11 CD   numeral, cardinal
12 DT   determiner
13 EX   existential there
14 FW   foreign word
15 IN   preposition or conjunction, subordinating
16 JJ   adjective or numeral, ordinal
17 JJR  adjective, comparative
18 JJS  adjective, superlative
19 LS   list item marker
20 MD   modal auxiliary
21 NN   noun, common, singular or mass
22 NNP  noun, proper, singular
23 NNPS noun, proper, plural
24 NNS  noun, common, plural
25 PDT  pre-determiner
26 POS  genitive marker
27 PRP  pronoun, personal
28 PRP$ pronoun, possessive
29 RB   adverb
30 RBR  adverb, comparative
31 RBS  adverb, superlative
32 RP   particle
33 SYM  symbol
34 TO   "to" as preposition or infinitive marker
35 UH   interjection
36 VB   verb, base form
37 VBD  verb, past tense
38 VBG  verb, present participle or gerund
39 VBN  verb, past participle
40 VBP  verb, present tense, not 3rd person singular
41 VBZ  verb, present tense, 3rd person singular
42 WDT  WH-determiner
43 WP   WH-pronoun
44 WP$  WH-pronoun, possessive
45 WRB  Wh-adverb
}}
!Penn Treebank tags can also be collapsed into general part-of-speech categories: as_universal()
* ns502.pos is the tagged object created with tag_pos() in the Commands section below
{{pre
> plot(as_universal(ns502.pos))
}}
{{ref_image ns502.universal.png}}
!The parts of speech can also be shown by name: as_basic()
{{pre
> plot(as_basic(ns502.pos))
}}
{{ref_image ns502.basic.png}}
!!Commands
*The examples use only the body text of the NS502 data from NICER
{{pre
> str(ns502)
 chr [1:26] "An Assumed Role" ...
> head(ns502)
[1] "An Assumed Role"
[2] "Considering the heightened role maintained by many in education, it's strangely rare to fin many willing to question this status quo."
[3] "Of course, many aware of the dynamic contrast between student and teacher are perfectly willing to perpetuate and even strengthen this relationship."
[4] "It may seem entirely silly, even, to consider that anything needs to be changed."
[5] "However, with growing competition in workplaces and with newer jobs being developed on a regular basis, it may be necessary to reexamine this two-dimensional hierarchy in order to better prepare students for the changing world."
[6] "For years, American schools have maintained a strict adherence to an invisible, ghostly code, one that encourages respect and honor in a manner reminiscent of both the power structure found in factories and the military."
}}
!Tagging: tag_pos()
{{pre
> tag_pos(ns502) # POS tagging
1.  An/DT Assumed/NNP Role/NNP
2.  Considering/VBG the/DT heightened/VBN role/NN maintained/VBN by/IN many/JJ in/IN ...
3.  Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN between/IN ...
4.  It/PRP may/MD seem/VB entirely/RB silly/JJ ,/, even/RB ,/, to/TO consider/VB ...
5.  However/RB ,/, with/IN growing/VBG competition/NN in/IN workplaces/NNS and/CC ...
.
.
.
22. Allow/NN rules/NNS to/TO be/VB challenged/VBN ,/, and/CC even/RB changed/VBD on/IN ...
23. Let/VB the/DT wealth/NN of/IN perspectives/NNS a/DT teacher/NN has/VBZ access/NN ...
24. Education/NNP should/MD be/VB defined/VBN by/IN the/DT fluidity/NN of/IN the/DT ...
25. Supreme/NNP authority/NN without/IN equal/JJ exchange/NN is/VBZ as/RB unnatural/JJ ...
26. Instead/RB ,/, we/PRP should/MD yearn/VB to/TO construct/VB a/DT bridge/NN ...
}}
!Tag frequencies: count_tags()
{{pre
> ns502.pos <- tag_pos(ns502)
> count_tags(ns502.pos)
  n.tokens '' ,        .       `` CC      CD      DT       IN       JJ       JJR     MD      NN       NNP
1 3        0  0        0       0  0       0       1(33.3%) 0        0        0       0       0        2(66.7%)
2 24       0  1(4.2%)  1(4.2%) 0  0       0       2(8.3%)  2(8.3%)  4(16.7%) 0       0       4(16.7%) 0
3 24       0  1(4.2%)  1(4.2%) 0  2(8.3%) 0       2(8.3%)  3(12.5%) 4(16.7%) 0       0       5(20.8%) 0
4 17       0  2(11.8%) 1(5.9%) 0  0       0       0        1(5.9%)  1(5.9%)  0       1(5.9%) 1(5.9%)  0
5 38       0  2(5.3%)  1(2.6%) 0  1(2.6%) 0       3(7.9%)  6(15.8%) 3(7.9%)  1(2.6%) 1(2.6%) 5(13.2%) 0
6 39       0  3(7.7%)  1(2.6%) 0  2(5.1%) 1(2.6%) 6(15.4%) 4(10.3%) 6(15.4%) 0       0       7(17.9%) 0
}}
!Plotting tag frequencies: plot()
{{pre
> plot(ns502.pos)
}}
{{ref_image ns502.pos.png}}
!Outputting the tagged text: as_word_tag()
{{pre
> ns502.pos.tagged <- as_word_tag(ns502.pos)
>
> head(ns502.pos.tagged)
[1] "An/DT Assumed/NNP Role/NNP"
[2] "Considering/VBG the/DT heightened/VBN role/NN maintained/VBN by/IN many/JJ in/IN education/NN ,/, it/PRP 's/VBZ strangely/RB rare/JJ to/TO fin/VBG many/JJ willing/JJ to/TO question/VB this/DT status/NN quo/NN ./."
[3] "Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN between/IN student/NN and/CC teacher/NN are/VBP perfectly/RB willing/JJ to/TO perpetuate/VB and/CC even/RB strengthen/VB this/DT relationship/NN ./."
[4] "It/PRP may/MD seem/VB entirely/RB silly/JJ ,/, even/RB ,/, to/TO consider/VB that/IN anything/NN needs/VBZ to/TO be/VB changed/VBN ./."
[5] "However/RB ,/, with/IN growing/VBG competition/NN in/IN workplaces/NNS and/CC with/IN newer/JJR jobs/NNS being/VBG developed/VBN on/IN a/DT regular/JJ basis/NN ,/, it/PRP may/MD be/VB necessary/JJ to/TO reexamine/VB this/DT two-dimensional/JJ hierarchy/NN in/IN order/NN to/TO better/RB prepare/VB students/NNS for/IN the/DT changing/VBG world/NN ./."
[6] "For/IN years/NNS ,/, American/JJ schools/NNS have/VBP maintained/VBN a/DT strict/JJ adherence/NN to/TO an/DT invisible/JJ ,/, ghostly/JJ code/NN ,/, one/CD that/WDT encourages/VBZ respect/NN and/CC honor/NN in/IN a/DT manner/NN reminiscent/JJ of/IN both/DT the/DT power/NN structure/NN found/VBD in/IN factories/NNS and/CC the/DT military/JJ ./."
}}
!Searching the tagged data by part of speech
*Example: adjective + noun
{{pre
> grep("\\w+/JJ \\w+/NN", ns502.pos.tagged, value=TRUE)
[1] "Of/IN course/NN ,/, many/JJ aware/JJ of/IN the/DT dynamic/JJ contrast/NN between/IN student/NN and/CC teacher/NN are/VBP perfectly/RB willing/JJ to/TO perpetuate/VB and/CC even/RB strengthen/VB this/DT relationship/NN ./."
[2] "However/RB ,/, with/IN growing/VBG competition/NN in/IN workplaces/NNS and/CC with/IN newer/JJR jobs/NNS being/VBG developed/VBN on/IN a/DT regular/JJ basis/NN ,/, it/PRP may/MD be/VB necessary/JJ to/TO reexamine/VB this/DT two-dimensional/JJ hierarchy/NN in/IN order/NN to/TO better/RB prepare/VB students/NNS for/IN the/DT changing/VBG world/NN ./."
[3] "For/IN years/NNS ,/, American/JJ schools/NNS have/VBP maintained/VBN a/DT strict/JJ adherence/NN to/TO an/DT invisible/JJ ,/, ghostly/JJ code/NN ,/, one/CD that/WDT encourages/VBZ respect/NN and/CC honor/NN in/IN a/DT manner/NN reminiscent/JJ of/IN both/DT the/DT power/NN structure/NN found/VBD in/IN factories/NNS and/CC the/DT military/JJ ./."
}}
* With grep, the whole sentence containing the matching string is displayed
!!Using the stringr package
!Installing stringr
{{pre
install.packages("stringr", dependencies=TRUE)
library(stringr)
}}
!Extracting only the matching string with stringr
https://sugiura-ken.org/wiki/wiki.cgi/exp?page=grepExtract
* The function below is a modified version of the one linked above
{{pre
grepExtract2 <- function(a,b){ # uses the stringr package
  #copyleft 2020-12-20 sugiura@nagoya-u.jp
  # grepExtract2("search pattern", data)
  hit.all <- c()
  hit <- str_extract(b, a)   # first match in each element of b, NA if none
  hit.all <- c(hit.all, hit)
  hit.all
}
}}
{{pre
> grepExtract2("\\w+/JJ \\w+/NN", ns502.pos.tagged)
 [1] NA                            NA                             "dynamic/JJ contrast/NN"
 [4] NA                            "regular/JJ basis/NN"          "American/JJ schools/NN"
 [7] "American/JJ schools/NN"      "last/JJ name/NN"              "student/JJ roles/NN"
[10] "Many/JJ professors/NN"       "romantic/JJ relationships/NN" "many/JJ foreigners/NN"
[13] "student/JJ relationships/NN" "intellectual/JJ equals/NN"    NA
[16] "loose/JJ roles/NN"           "active/JJ response/NN"        "first/JJ step/NN"
[19] NA                            "proper/JJ parenting/NN"       "distinct/JJ purpose/NN"
[22] NA                            "right/JJ course/NN"           "human/JJ experience/NN"
[25] "equal/JJ exchange/NN"        "equal/JJ power/NN"
}}
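In the grepExtract2() output, str_extract() returns only the first match in each sentence (NA where there is none), and because the pattern stops at /NN, a plural such as schools/NNS is cut to schools/NN. The following is a minimal sketch of one way to collect every match with its full tag. It uses base R (gregexpr() and regmatches()) instead of stringr, so it runs without extra packages. The function name grepExtractAll, the widened pattern, and the sample vector tagged (sentence [1] and the start of sentence [6] from the as_word_tag() output) are assumptions for illustration, not part of tagger or of the original grepExtract.

```r
# Hypothetical helper (an illustration, not part of tagger or stringr):
# collect every match in every sentence instead of only the first one.
grepExtractAll <- function(pattern, x) {
  # gregexpr() locates all matches per element; regmatches() extracts them
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # flatten the list; elements without a match contribute nothing (no NA)
  unlist(hits)
}

# Excerpt of the tagged text: sentence [1] and the start of sentence [6]
tagged <- c(
  "An/DT Assumed/NNP Role/NNP",
  "For/IN years/NNS ,/, American/JJ schools/NNS have/VBP maintained/VBN a/DT strict/JJ adherence/NN to/TO an/DT invisible/JJ ,/, ghostly/JJ code/NN"
)

# \\S+ keeps hyphenated words whole; "/JJ " (with the space) excludes JJR/JJS;
# \\S* after NN keeps NNS, NNP and NNPS tags intact
jj.nn <- grepExtractAll("\\S+/JJ \\S+/NN\\S*", tagged)
print(jj.nn)
```

With stringr loaded, str_extract_all() gives the same kind of result as a list with one element per sentence. Unlike grepExtract2(), this sketch drops sentences without a match, so the position of each hit in the original data is not preserved.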