R
R.package
quanteda
!!!Handling multi-word expressions with quanteda
{{outline}}

----
!Searching for multi-word expressions
*Collect the specific multi-word expressions in a character vector
 multiword <- c("in addition", "on the other hand", "as a result")

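*Wrap the vector in phrase() so that each entry is treated as a sequence of words (a "phrase") rather than as a single token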
 phrase(multiword)

*Example: kwic search
{{pre

> multiword <- c("in addition", "on the other hand", "as a result")

> kwic(nicestJPN.corpus, pattern = phrase(multiword))
                                                                                                        
   [JAN0001_P2B.txt, 12:15]      are active and free, | on the other hand | olders are less active and  
 [JAN0001_P5B.txt, 117:118]    is the biggest reason. |    In addition    | above reason, people will   
 [JAN0001_P7B.txt, 196:199] answer ten questions. But | on the other hand | , if you understanding ideas
}}
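
The search above can be reproduced on any small corpus. The following is a minimal sketch, not the original session: the two toy sentences stand in for nicestJPN.corpus (which is not distributed with this page), and it assumes a recent quanteda release, where kwic() expects a tokens object rather than a corpus. It also shows what happens when phrase() is left out, and how the same pattern can be passed to tokens_compound() to join each expression into one token.

```r
library(quanteda)

# Hypothetical toy texts standing in for nicestJPN.corpus
txt <- c(
  d1 = "Young people are active. On the other hand, older people are less active.",
  d2 = "In addition, people will have enough time as a result."
)
toks <- tokens(corpus(txt))

multiword <- c("in addition", "on the other hand", "as a result")

# With phrase(), each entry is matched as a sequence of tokens
# (matching is case-insensitive by default)
res <- kwic(toks, pattern = phrase(multiword))
print(res)

# Without phrase(), each entry is compared against single tokens,
# so a pattern containing spaces finds nothing
res_plain <- kwic(toks, pattern = multiword)
print(nrow(res_plain))

# The same pattern can be used to join each expression into a single token
comp <- tokens_compound(toks, pattern = phrase(multiword))
print(comp)
```

After compounding, the joined tokens (for example On_the_other_hand) are counted as single units in any later document-feature matrix.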

!Listing collocations with their frequency and association strength <<textstat_collocations(corpus data)>>
{{pre
> textstat_collocations(nicestJPN.corpus)
            collocation count count_nested length    lambda        z
1                do not    10            0      2  4.151357 8.325134
2          young people     8            0      2  4.283437 8.180578
3                if you    10            0      2  5.196013 7.994369
4          young person     6            0      2  4.946850 7.666668
5               can not     9            0      2  3.478602 7.612492
6               i think     7            0      2  3.939177 7.470671
7           enough time     5            0      2  6.849914 7.351494
8               want to     9            0      2  4.083226 6.925760
9            person can     5            0      2  4.331071 6.718328
10           enjoy life     3            0      2  5.297317 6.307093
11          have enough     4            0      2  4.753478 6.300479
12               of all     5            0      2  4.435229 6.243870
}}
*Option for the collocation length (number of words): size = number
*Option for the minimum frequency: min_count = count
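
The two options can be tried out as follows. This is a sketch under stated assumptions: the three toy sentences are invented for illustration, and the code assumes quanteda 3.0 or later, where textstat_collocations() is provided by the separate quanteda.textstats package (in the older version used for the output above it was part of quanteda itself).

```r
library(quanteda)
library(quanteda.textstats)  # assumption: quanteda >= 3.0

# Hypothetical toy texts, invented for illustration
txt <- c(
  "young people do not have enough time",
  "young people want to enjoy life",
  "i think young people do not have enough time"
)
toks <- tokens(txt)

# Two-word collocations that occur at least twice
col2 <- textstat_collocations(toks, size = 2, min_count = 2)
print(col2)

# Three-word collocations that occur at least twice
col3 <- textstat_collocations(toks, size = 3, min_count = 2)
print(col3)
```

Raising min_count removes rare sequences, whose lambda and z scores are unreliable in a small corpus.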


----
* Reference: https://quanteda.io/articles/pkgdown/examples/phrase.html