
R.package


R.package: useful packages for R


 tm

https://www.rdocumentation.org/packages/tm/versions/0.7-3

  • Boost_tokenizer(x)
  • MC_tokenizer(x)
  • removePunctuation(tmp)
  • removeNumbers(x)
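
As a minimal sketch of what the functions listed above do, the snippet below applies them to a single made-up string. The sample text and variable names are assumptions for illustration only. The tokenizer results are printed rather than stated here, because the exact token boundaries can differ between tm versions.

```r
library(tm)

# Hypothetical sample text (not from the original notes)
x <- "Hello, world! It is 2017."

# Strip punctuation characters, keeping letters, digits and spaces
no_punct <- removePunctuation(x)

# Strip digits, keeping everything else (including punctuation)
no_nums <- removeNumbers(x)

# Split the string into tokens with each tokenizer and inspect the result
print(Boost_tokenizer(x))
print(MC_tokenizer(x))

print(no_punct)
print(no_nums)
```

Both cleaning functions take and return character vectors, so they can be chained, e.g. removeNumbers(removePunctuation(x)).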

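library(tm)
tmp.v <- VectorSource(tmp)            # tmp: character vector of texts
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
findFreqTerms(tmpc.td)                # with the default bounds, lists every term
findMostFreqTerms(tmpc.td)            # most frequent terms per document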
$`1`
     the     said      and computer      its terminal 
      15        7        6        6        5        5 
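
The cleaning functions and the term-document workflow above can be combined. The following is a rough sketch under some assumptions: the two sample sentences are invented, VCorpus is used instead of Corpus so that tm_map behaves the same across tm versions, and lowfreq = 2 is an arbitrary threshold.

```r
library(tm)

# Hypothetical two-document sample (not the text used in the notes)
docs <- c("The computer said the terminal was ready.",
          "The terminal said nothing; the computer waited.")

corp <- VCorpus(VectorSource(docs))

# Lowercase and strip punctuation before counting terms
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)

tdm <- TermDocumentMatrix(corp)

# Terms that occur at least twice across the whole corpus
frequent <- findFreqTerms(tdm, lowfreq = 2)
print(frequent)
```

Note that TermDocumentMatrix drops words shorter than three characters by default, so very short words will not appear in the result.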


 koRpus

https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html

  • MTLD
> library(koRpus)

> ns002 <- tokenize(choose.files(), lang="en")

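This loads, for example, a file containing only the text of NS002 (choose.files() is Windows-only; file.choose() works on other platforms). Then: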

> MTLD(ns002)
Language: "en"

Total number of tokens: 463
Total number of types:  218

Measure of Textual Lexical Diversity
                MTLD: 87.62
   Number of factors: 5.28
         Factor size: 0.72
    SD tokens/factor: 36.8 (all factors)
                      30.05 (complete factors only)

Note: Analysis was conducted case insensitive.
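
To make the "tokens" and "types" figures in the output above concrete, here is a base-R sketch that counts them for a short invented sentence. It is only an approximation of the idea: koRpus uses its own tokenizer, so its counts for a real file may differ, and the regular expression below is an assumption that suits plain English text only.

```r
# Hypothetical sample text (not NS002)
txt <- "The cat sat on the mat. The dog sat too."

# Case-insensitive: lowercase first, then split on anything that is not a-z
tokens <- strsplit(tolower(txt), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]   # drop empty strings, if any

n_tokens <- length(tokens)         # running words
n_types <- length(unique(tokens))  # distinct words
ttr <- n_types / n_tokens          # simple type-token ratio

print(c(tokens = n_tokens, types = n_types, TTR = ttr))
```

By the same arithmetic, the 218 types and 463 tokens reported above give a plain type-token ratio of about 0.47. MTLD is a different measure that is designed to be less sensitive to text length.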

 rpart


> library(rpart)
> library(rpart.plot)
> model1 = rpart(LMH ~ DD + SL + MDD, data = C3L2)

> rpart.plot(model1)
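
The C3L2 data frame is not included here, so as a runnable stand-in the sketch below fits the same kind of tree on the kyphosis data set that ships with rpart. The choice of data set and the variable names fit and pred are assumptions for illustration, not part of the original analysis.

```r
library(rpart)

# Classification tree on rpart's bundled kyphosis data (81 rows)
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Text summary of the splits
print(fit)

# Predicted class for each training row
pred <- predict(fit, type = "class")
print(table(predicted = pred, actual = kyphosis$Kyphosis))

# Draw the tree only if the optional rpart.plot package is installed
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::rpart.plot(fit)
}
```

With a factor response such as LMH, rpart chooses method = "class" on its own; it is spelled out here only to make the intent explicit.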