R.package

R.package: Useful packages for R

 Installing packages

  • For example, to use gplots:
> install.packages("gplots", dependencies = TRUE)
> library(gplots)

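To list the packages that are installed and can be loaded: library()

To list the packages that are already attached and ready to use: search()


To see an overview of a package: library(help="package name")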
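The same checks can be done programmatically. The following is a minimal sketch in base R; the helper names is_installed and is_attached are made up for this illustration and are not part of any package.

```r
# Names of every installed package (roughly the set that library() lists)
installed <- rownames(installed.packages())

# Hypothetical helper: is the package installed, i.e. can it be loaded?
is_installed <- function(pkg) pkg %in% installed

# Hypothetical helper: is the package attached, i.e. on the search() path?
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

is_installed("base")
is_attached("base")
```

A package can be installed without being attached; calling library() on it is what puts it on the search path.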



Frequently used packages


library(openxlsx)

library(tidyverse)

  • tidyverse includes the following packages:
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

A-G


 corpus


 dagitty


 easystats

 eyetrackingR


 ggplot2


 ggstatsplot

H-P

 ngram


Q-U

 quanteda

 tidyverse

 tm


V-Z





List of Packages



 retimes

https://cran.r-project.org/web/packages/retimes/index.html

 stringi

 stringr


 psych

A package for psychology research.
Basic descriptive statistics are available from summary(x), but
for a more detailed view, install this package and use describe(x).

> describe(x)
   vars   n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 100 0.06 1.06   0.04    0.07 0.87 -2.9 2.51  5.41 -0.04    -0.32 0.11
  • The output also includes the standard deviation, skewness, kurtosis, and standard error.
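To see where some of these columns come from, here is a rough base-R sketch. It is not the psych implementation, the data vector is hypothetical, and the trim value of 0.1 is an assumption. In the describe(x) output above, sd / sqrt(n) = 1.06 / 10, which is consistent with the reported se of 0.11.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical sample data

n       <- length(x)
se      <- sd(x) / sqrt(n)       # standard error of the mean
rng     <- max(x) - min(x)       # range as a single number
trimmed <- mean(x, trim = 0.1)   # trimmed mean; trim = 0.1 is assumed here

# Collect the values in one named vector, loosely mirroring the describe() columns
c(n = n, mean = mean(x), sd = sd(x), median = median(x),
  trimmed = trimmed, mad = mad(x), range = rng, se = se)
```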

 sjmisc

find_var: search for and select matching data columns

find_var(data, pattern="pattern", out="output format")
find_var(data, pattern="score", out="df")
  • Selects the columns whose names contain score and returns them as a data frame
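For comparison, a similar name-based selection can be sketched in base R. This only matches column names and is not the sjmisc implementation; the data frame below is hypothetical.

```r
# Hypothetical data frame for illustration
data <- data.frame(id = 1:3,
                   score_a = c(10, 20, 30),
                   score_b = c(5, 6, 7))

# Keep the columns whose names contain "score";
# drop = FALSE keeps the result a data frame even if only one column matches
hits <- data[, grepl("score", names(data)), drop = FALSE]
names(hits)
```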



 tm

https://www.rdocumentation.org/packages/tm/versions/0.7-3

  • Boost_tokenizer(x)
  • MC_tokenizer(x)
  • removePunctuation(tmp)
  • removeNumbers(x)

tmp.v <- VectorSource(tmp)            # wrap the character vector as a source
tmp.c <- Corpus(tmp.v)                # build a corpus from the source
tmpc.td <- TermDocumentMatrix(tmp.c)  # term-document matrix
findFreqTerms(tmpc.td)
findMostFreqTerms(tmpc.td)
$`1`
     the     said      and computer      its terminal 
      15        7        6        6        5        5 
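The idea behind a most-frequent-terms table can be illustrated without tm. The sketch below is a simplification in base R: the sample text is hypothetical, and the tokenization (lowercase, split on anything that is not a letter) is an assumption that does not reproduce tm's own processing.

```r
# Hypothetical sample text; any character string would do
text <- "The king had a garden. The garden had a tree, and the tree bore apples."

# Lowercase, split on runs of non-letters, and drop empty strings
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Count each term and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)
head(freq, 3)
```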


 koRpus

https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html

tokenize()


install.packages("koRpus.lang.en")
library(koRpus.lang.en)
temp <- tokenize(choose.files(), lang="en")

This reads in, for example, the text file of the Grimm fairy tale The Golden Bird from Project Gutenberg:

A certain king had a beautiful garden, and in the garden stood a tree
which bore golden apples. These apples were always counted, and about
the time when they began to grow ripe it was found that every night one
of them was gone. The king became very angry at this, and ordered the
gardener to keep watch all night under the tree. The gardener set his
eldest son to watch; but about twelve o’clock he fell asleep, and in

> temp
     doc_id     token      tag lemma lttr   wclass desc stop stem  idx sntc
1      <NA>         A word.kRp          1     word <NA> <NA> <NA>    1    1
2      <NA>   certain word.kRp          7     word <NA> <NA> <NA>    2    1
3      <NA>      king word.kRp          4     word <NA> <NA> <NA>    3    1
4      <NA>       had word.kRp          3     word <NA> <NA> <NA>    4    1
5      <NA>         a word.kRp          1     word <NA> <NA> <NA>    5    1
6      <NA> beautiful word.kRp          9     word <NA> <NA> <NA>    6    1
                                             [...]                         
2948   <NA>         a word.kRp          1     word <NA> <NA> <NA> 2948  140
2949   <NA>     great word.kRp          5     word <NA> <NA> <NA> 2949  140
2950   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2950  140
2951   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2951  140
2952   <NA>     years word.kRp          5     word <NA> <NA> <NA> 2952  140
2953   <NA>         .     .kRp          1 fullstop <NA> <NA> <NA> 2953  140

lex.div()

  • Computes a range of lexical diversity indices
lex.div(temp)

MTLD()

> library(koRpus)

> ns002 <- tokenize(choose.files(), lang="en")

This reads in, for example, a file containing only the text of NS002. Then:

> MTLD(ns002)
Language: "en"

Total number of tokens: 463
Total number of types:  218

Measure of Textual Lexical Diversity
                MTLD: 87.62
   Number of factors: 5.28
         Factor size: 0.72
    SD tokens/factor: 36.8 (all factors)
                      30.05 (complete factors only)

Note: Analysis was conducted case insensitive.

MATTR()

> MATTR(temp)
Language: "en"

Total number of tokens: 606
Total number of types:  261

Moving-Average Type-Token Ratio
               MATTR: 0.69
          SD of TTRs: 0.05
         Window size: 100 
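As a rough illustration of what a moving-average type-token ratio measures, the sketch below computes the TTR in every window of consecutive tokens and averages the results. It is written in base R under stated assumptions and is not the koRpus implementation: the function name mattr_sketch is made up, tokens are assumed to be words only, they are lowercased, and texts no longer than the window fall back to the plain TTR.

```r
mattr_sketch <- function(tokens, window = 100) {
  tokens <- tolower(tokens)                  # treat "The" and "the" as one type
  n <- length(tokens)
  if (n <= window) {
    # Fallback chosen for this sketch: plain TTR over the whole text
    return(length(unique(tokens)) / n)
  }
  starts <- seq_len(n - window + 1)          # first index of each window
  ttrs <- vapply(starts, function(i) {
    length(unique(tokens[i:(i + window - 1)])) / window
  }, numeric(1))
  mean(ttrs)                                 # average TTR over all windows
}

# Hypothetical toy input with a window of 3 tokens
tokens <- c("a", "b", "a", "c", "b", "a")
result <- mattr_sketch(tokens, window = 3)
result
```

With this toy input the four windows have TTRs of 2/3, 1, 1, and 1, so the average is 11/12.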

 rpart


> library(rpart)
> library(rpart.plot)   # rpart.plot() comes from this separate package
> model1 <- rpart(LMH ~ DD + SL + MDD, data = C3L2)
> rpart.plot(model1)

 gplots

install.packages("gplots", dependencies = TRUE)
library(gplots)

> head(meanMHD)
  Group      MHD
1    C2 1.500000
2    C2 1.000000
3    C2 2.000000
4    C2 1.250000
5    C2 1.333333
6    C2 1.333333

# Pass the data frame directly instead of attach()/detach()
plotmeans(MHD ~ Group, data = meanMHD)

 orddom: computes effect sizes

  • Requires the psych package to be installed
orddom(x, y)
  • This single call reports a whole range of effect sizes, so you can pick whichever you need.
> orddom(dmu02shd$MDD, dmu03shd$MDD)
              ordinal                metric                
var1_X        "group 1 (x)"          "group 1 (x)"         
var2_Y        "group 2 (y)"          "group 2 (y)"         
type_title    "indep"                "indep"               
n in X        "592"                  "592"                 
n in Y        "547"                  "547"                 
N #Y>X        "176563"               "176563"              
N #Y=X        "13390"                "13390"               
N #Y<X        "133871"               "133871"              
PS X>Y        "0.413406665349078"    "0.432442005161643"   
PS Y>X        "0.545243712634023"    "0.567557994838357"   
A X>Y         "0.434081476357528"    "0.434081476357528"   
A Y>X         "0.565918523642472"    "0.565918523642472"   
delta         "0.131837047284944"    "0.128573222406733"   
1-alpha       "95"                   "95"                  
CI low        "0.0647643614128363"   "0.0665435144203836"  
CI high       "0.197723891470872"    "0.190602930393082"   
s delta       "0.0339578629779545"   "0.533045085206667"   
var delta     "0.00115313645802954"  "0.284137062862983"   
se delta      NA                     "0.0316134060048762"  
z/t score     "3.88237173141703"     "4.0670474540738"     
H1 tails p/CI "2"                    "2"                   
p             "0.000109396219747593" "5.52499619534335e-05"
Cohen's d     "0.177125083672158"    "0.241205155014015"   
d CI low      "0.0839111293425067"   "0.124543945177201"   
d CI high     "0.275869162430133"    "0.357866364850828"   
var d.i       "0.291361840460969"    "0.251891629550177"   
var dj.       "0.362041171144457"    "0.319040086833437"   
var dij       "0.941272277686609"    "0.569924729477023"   
df            "1137"                 "1094.88514066995"    
NNT           "7.58512133420789"     "5.70864949792907"    

For nonparametric data, look at Cliff's delta and interpret it using the following thresholds:
https://www.rdocumentation.org/packages/effsize/versions/0.7.4/topics/cliff.delta

The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. 
|d|<0.147 "negligible", 
|d|<0.33 "small", 
|d|<0.474 "medium", 
otherwise "large" 
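The thresholds quoted above can be applied by hand. The sketch below is a base-R illustration, not the orddom or effsize implementation; the function names cliff_delta_sketch and magnitude are made up, and the small vectors are hypothetical. The last part plugs the pair counts from the orddom table above (N #Y>X, N #Y<X, n in X, n in Y) into the same formula.

```r
# Cliff's delta: (number of pairs with x > y minus pairs with x < y) / all pairs
cliff_delta_sketch <- function(x, y) {
  gt <- sum(outer(x, y, ">"))
  lt <- sum(outer(x, y, "<"))
  (gt - lt) / (length(x) * length(y))
}

# Magnitude label using the thresholds quoted above
magnitude <- function(d) {
  a <- abs(d)
  if (a < 0.147) "negligible"
  else if (a < 0.33) "small"
  else if (a < 0.474) "medium"
  else "large"
}

# Hypothetical toy data
x <- c(1, 2, 3)
y <- c(2, 3, 4)
d <- cliff_delta_sketch(x, y)
magnitude(d)

# Same formula from the pair counts in the orddom table (Y relative to X)
d_table <- (176563 - 133871) / (592 * 547)
magnitude(d_table)
```

The value from the table counts comes out at about 0.1318, matching the delta row, and falls under the 0.147 threshold.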

 effsize: computes effect sizes

  • Example: checking the nonparametric Cliff's delta
install.packages("effsize")
library(effsize)
> cliff.delta(dmu02shd$MDD, dmu03shd$MDD)

Cliff's Delta

delta estimate: -0.131837 (negligible)
95 percent confidence interval:
      lower       upper 
-0.19765420 -0.06483658 
  • It also attaches a magnitude label to the estimate, here "(negligible)".

 lawstat

Includes the Brunner-Munzel test.

> brunner.munzel.test(dmu02shd$MDD, dmu03shd$MDD)

        Brunner-Munzel Test

data:  dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y) 
       0.5659185 

 brunnermunzel

install.packages("brunnermunzel")
library(brunnermunzel)

> brunnermunzel.test(dmu02shd$MDD, dmu03shd$MDD)

        Brunner-Munzel Test

data:  dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y) 
       0.5659185 
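The sample estimate reported here, P(X<Y)+.5*P(X=Y), can be reproduced by counting pairs. The sketch below is a base-R illustration of that quantity only, not of the test itself; the function name p_hat_sketch is made up and the small vectors are hypothetical. The last line uses the pair counts from the orddom table earlier on this page.

```r
# Share of (x, y) pairs with x < y, plus half the share of tied pairs
p_hat_sketch <- function(x, y) {
  mean(outer(x, y, "<")) + 0.5 * mean(outer(x, y, "=="))
}

# Hypothetical toy data
x <- c(1, 2, 3)
y <- c(2, 3, 4)
p_toy <- p_hat_sketch(x, y)
p_toy

# Same quantity from the orddom pair counts: (N #Y>X + 0.5 * N #Y=X) / (n in X * n in Y)
p_table <- (176563 + 0.5 * 13390) / (592 * 547)
p_table
```

The value from the table counts is about 0.5659, in line with the sample estimate printed by the test.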

 dplyr


 ggplot2


 quanteda

https://github.com/koheiw/workshop-IJTA

 lme4


 effects


 car

 tabplot


 GGally

Correlation plots


 References