R R.package quanteda

!!!Handling multi-word expressions with quanteda

{{outline}}

----
!Searching for multi-word expressions

*Collect the specific multi-word expressions in a character vector

 multiword <- c("in addition", "on the other hand", "as a result")

*Wrap the vector in phrase() so that each element is treated as one multi-word phrase rather than as separate words

 phrase(multiword)

*Example: KWIC search

{{pre
> multiword <- c("in addition", "on the other hand", "as a result")
> kwic(nicestJPN.corpus, pattern = phrase(multiword))

 [JAN0001_P2B.txt, 12:15]        are active and free, | on the other hand | olders are less active and
 [JAN0001_P5B.txt, 117:118]    is the biggest reason. |    In addition    | above reason, people will
 [JAN0001_P7B.txt, 196:199] answer ten questions. But | on the other hand | , if you understanding ideas
}}

!Listing collocations by frequency and strength of association

{{pre
> textstat_collocations(nicestJPN.corpus)
    collocation count count_nested length   lambda        z
1        do not    10            0      2 4.151357 8.325134
2  young people     8            0      2 4.283437 8.180578
3        if you    10            0      2 5.196013 7.994369
4  young person     6            0      2 4.946850 7.666668
5       can not     9            0      2 3.478602 7.612492
6       i think     7            0      2 3.939177 7.470671
7   enough time     5            0      2 6.849914 7.351494
8       want to     9            0      2 4.083226 6.925760
9    person can     5            0      2 4.331071 6.718328
10   enjoy life     3            0      2 5.297317 6.307093
11  have enough     4            0      2 4.753478 6.300479
12       of all     5            0      2 4.435229 6.243870
}}

*Option for the collocation length (number of words): size = number
*Option for the minimum frequency: min_count = count

----
* reference
https://quanteda.io/articles/pkgdown/examples/phrase.html
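
The two steps above (a KWIC search with phrase(), then a collocation list with the size and min_count options) can be put together in one short script. This is only a sketch under some assumptions. The three toy texts are hypothetical stand-ins for nicestJPN.corpus, which is not reproduced here. It also assumes quanteda version 3 or later, where textstat_collocations() is supplied by the separate quanteda.textstats package. As far as I know, recent versions also expect kwic() to be given a tokens object rather than a corpus, so the texts are tokenized first.

```r
library(quanteda)
library(quanteda.textstats)  # assumed: provides textstat_collocations() in quanteda >= 3

# Hypothetical toy texts standing in for nicestJPN.corpus
txt <- c(
  d1 = "Young people are active. On the other hand, old people do not have enough time.",
  d2 = "In addition, young people do not have enough time. As a result, they do not enjoy life.",
  d3 = "Young people do not have enough time, but on the other hand they enjoy life."
)

# Tokenize once and reuse the tokens object for both steps
toks <- tokens(corpus(txt), remove_punct = TRUE)

# Multi-word expressions to look up, each treated as one phrase
multiword <- c("in addition", "on the other hand", "as a result")

# KWIC search; matching is case-insensitive by default
hits <- kwic(toks, pattern = phrase(multiword))
print(hits)

# Collocations of two or three words that occur at least twice
colls <- textstat_collocations(toks, size = 2:3, min_count = 2)
print(head(colls))
```

Giving size a range such as 2:3 returns two-word and three-word collocations in one table, and the length column shows which is which. Raising min_count is a simple way to drop rare pairs when the corpus is small.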