{{counter}}
!!!R.package Useful packages for R

!!tm
https://www.rdocumentation.org/packages/tm/versions/0.7-3

*Boost_tokenizer(x)
*MC_tokenizer(x)
*removePunctuation(tmp)
*removeNumbers(x)
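
A minimal sketch of how the functions listed above can be combined on a plain character vector. The sample sentence and the variable names (txt, cleaned, tokens) are made up for illustration and are not from the original notes; the sketch assumes the tm package is installed.

```r
library(tm)

# Hypothetical sample text (illustration only)
txt <- "The computer said: terminal 42 is ready."

# Strip punctuation first, then digits
cleaned <- removePunctuation(txt)
cleaned <- removeNumbers(cleaned)

# Two of the tokenizers shipped with tm; both take a character vector
tokens    <- Boost_tokenizer(cleaned)
tokens.mc <- MC_tokenizer(cleaned)

print(tokens)
print(tokens.mc)
```

Removing punctuation and numbers before tokenizing keeps the tokenizers from returning fragments such as "42" or "said:".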
----
{{pre
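library(tm)
# tmp: character vector holding the text(s) to analyze
tmp.v <- VectorSource(tmp)
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
findFreqTerms(tmpc.td)
# most frequent terms per document (output below)
findMostFreqTerms(tmpc.td)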
$`1`
     the     said      and computer      its terminal 
      15        7        6        6        5        5 
}}
----

!!koRpus
 > library(koRpus)
 
 > ns002 <- tokenize(choose.files(), lang="en")
This reads in, for example, a file containing only the text of NS002 from NICER; then:

 > MTLD(ns002)
 Language: "en"
 
 Total number of tokens: 463
 Total number of types:  218
 
 Measure of Textual Lexical Diversity
                 MTLD: 87.62
    Number of factors: 5.28
          Factor size: 0.72
     SD tokens/factor: 36.8 (all factors)
                       30.05 (complete factors only)
 
 Note: Analysis was conducted case insensitive.
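
To help read the output above: MTLD is roughly the number of tokens divided by the number of factors (463 / 5.28 is about 87.7, close to the reported 87.62; the small gap presumably comes from averaging a forward and a backward pass). A factor is a stretch of text over which the type-token ratio (TTR) falls to the factor size, 0.72. The following is a rough base-R sketch of that idea, not koRpus's actual implementation. The function names mtld_pass and mtld_sketch are hypothetical, and boundary handling (here a factor closes when TTR drops strictly below the threshold) may differ from the package.

```r
# One-directional pass: count how many "factors" the token sequence contains.
# Assumes a non-empty character vector of tokens.
mtld_pass <- function(tokens, factor_size = 0.72) {
  factors <- 0
  start <- 1
  ttr <- 1
  for (i in seq_along(tokens)) {
    segment <- tokens[start:i]
    ttr <- length(unique(segment)) / length(segment)
    if (ttr < factor_size) {
      factors <- factors + 1   # TTR fell below the threshold: close a full factor
      start <- i + 1           # start a new segment at the next token
      ttr <- 1
    }
  }
  # Leftover segment counts as a partial factor, scaled by how far its TTR has fallen
  if (start <= length(tokens)) {
    factors <- factors + (1 - ttr) / (1 - factor_size)
  }
  # Note: returns Inf if no repetition occurs at all (factors == 0)
  length(tokens) / factors
}

# Average of a forward and a backward pass, case-insensitive
mtld_sketch <- function(tokens, factor_size = 0.72) {
  tokens <- tolower(tokens)
  mean(c(mtld_pass(tokens, factor_size),
         mtld_pass(rev(tokens), factor_size)))
}

# Toy input (made up): every second token closes a factor
print(mtld_sketch(rep("a", 10)))
```

Higher values mean the text goes longer before its vocabulary starts repeating, i.e. greater lexical diversity.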