R
R.package
!!!corpus
{{outline}}
*https://cran.r-project.org/web/packages/corpus/index.html

!Format the input as raw data beforehand

!Store the data as a "corpus data frame object" with corpus_frame()
,title,text
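A minimal sketch of this step, assuming the corpus package is installed. The titles and texts below are made-up sample data, and the variable name data is only chosen to match the calls further down this page.

```r
library(corpus)

# Hypothetical sample data: one row per text, with title and text columns
data <- corpus_frame(
  title = c("doc1", "doc2"),
  text  = c("The quick brown fox jumps.", "A lazy dog sleeps. It snores.")
)

# Check that the result is a corpus data frame with a corpus text column
print(is_corpus_frame(data))
print(is_corpus_text(data$text))
```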

!Tokenize with text_tokens()
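A short sketch of tokenization with the default settings. The input sentence is a made-up example.

```r
library(corpus)

# Hypothetical one-sentence sample text
tokens <- text_tokens("The quick brown fox jumps.")

# text_tokens() returns a list with one character vector per input text
print(tokens[[1]])
```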

!text_filter()
*Options control various kinds of token normalization and filtering
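As one possible illustration, the sketch below builds a filter that drops punctuation and English stop words, then passes it to text_tokens(). The sample sentence is made up, and the options shown are only two of those the package offers.

```r
library(corpus)

# Hypothetical sample text
x <- "The quick brown fox jumps."

# Filter that drops punctuation and English stop words
f <- text_filter(drop_punct = TRUE, drop = stopwords_en)

# Tokenize with the filter applied
tokens <- text_tokens(x, f)
print(tokens[[1]])
```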

!text_ntoken()
*Number of tokens
!text_ntype()
*Number of types
!text_nsentence()
*Number of sentences
!text_stats()
*Computes all three of the above at once
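The four counting functions can be tried together as in the sketch below. The corpus data frame is built from made-up sample texts, and the exact counts depend on the text filter in effect.

```r
library(corpus)

# Hypothetical sample data
data <- corpus_frame(
  title = c("doc1", "doc2"),
  text  = c("The quick brown fox jumps.", "A lazy dog sleeps. It snores.")
)

# One value per text for each statistic
ntok  <- text_ntoken(data)     # number of tokens
ntype <- text_ntype(data)      # number of types (distinct tokens)
nsent <- text_nsentence(data)  # number of sentences

# All three at once, as a data frame with one row per text
stats <- text_stats(data)
print(stats)
```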

!term_stats()
*For each term, reports how many of the sub-corpora (texts) in the corpus data contain it
 term_stats(data)
*The ngrams option does the same for n-grams
 term_stats(data, ngrams = 5)
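A self-contained sketch of the two calls above, using made-up two-word texts so the result is easy to check by eye. In the returned data frame, count is the total number of occurrences of a term and support is the number of texts containing it.

```r
library(corpus)

# Hypothetical sample data: "apple" occurs once in each of the two texts
data <- corpus_frame(
  title = c("doc1", "doc2"),
  text  = c("apple banana", "apple cherry")
)

# One row per term, with count and support columns
stats <- term_stats(data)
print(stats)

# Same statistics for 2-grams instead of single terms
bigrams <- term_stats(data, ngrams = 2)
print(bigrams)
```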