R.package: Useful packages for R
Installing packages
- For example, to use gplots:
> install.packages("gplots", dependencies = TRUE)
> library(gplots)
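To list the packages that are installed and can be loaded: library()
To list the packages that are already attached and ready to use: search()
To get an overview of a package: library(help = "package name")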
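The checks above can be combined into a small guard that attaches a package only when it is installed. This is a minimal sketch in base R; the helper name `is_available` is hypothetical and not part of these notes.

```r
# Hypothetical helper: TRUE if a package is installed (it is not attached).
is_available <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so this is TRUE.
is_available("stats")

# Attach the package if it is present; otherwise show how to install it.
pkg <- "gplots"
if (is_available(pkg)) {
  library(pkg, character.only = TRUE)
} else {
  message("Not installed; run install.packages(\"", pkg, "\", dependencies = TRUE)")
}

# search() lists attached packages as "package:<name>".
"package:stats" %in% search()
```

Using `requireNamespace()` for the check avoids attaching a package just to find out whether it exists.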
Frequently used packages
library(openxlsx)
library(tidyverse)
- tidyverse includes the following:
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
A-G
bookdown
corpus
dagitty
easystats
eyetrackingR
ggplot2
ggstatsplot
H-P
ngram
Q-U
quanteda
tidyverse
tm
V-Z
List of Packages
retimes
https://cran.r-project.org/web/packages/retimes/index.html
stringi
stringr
psych
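A package for psychology-related analyses.
summary(x) gives the basic descriptive statistics;
for a more detailed view, install this package and use describe(x).
> describe(x)
   vars   n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 100 0.06 1.06   0.04    0.07 0.87 -2.9 2.51  5.41 -0.04    -0.32 0.11
- It also reports the standard deviation, skewness, kurtosis, standard error, and so on.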
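To make the columns of describe(x) less of a black box, most of them can be approximated by hand. This is a rough base-R sketch on made-up data; the skewness and kurtosis here are plain moment-based versions, and psych may apply a slightly different small-sample definition, so the last digits can differ.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- rnorm(100)             # hypothetical sample data

n  <- length(x)
m  <- mean(x)
s  <- sd(x)                 # standard deviation (n - 1 denominator)
se <- s / sqrt(n)           # standard error of the mean

# Central moments, used for moment-based skewness and excess kurtosis.
m2 <- mean((x - m)^2)
m3 <- mean((x - m)^3)
m4 <- mean((x - m)^4)
skew     <- m3 / m2^1.5
kurtosis <- m4 / m2^2 - 3

round(c(n = n, mean = m, sd = s, median = median(x), mad = mad(x),
        min = min(x), max = max(x), range = max(x) - min(x),
        skew = skew, kurtosis = kurtosis, se = se), 2)
```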
sjmisc
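find_var: search for and select the matching data columns
find_var(data, pattern = "pattern", out = output format)
find_var(data, pattern = "score", out = "df")
- Selects the columns whose names contain "score" and returns them as a data frame.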
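For comparison, the name-matching part of this can be sketched in base R. The data frame below is made up for illustration, and the sketch only looks at column names; find_var itself can search more than that (for example variable labels), which is ignored here.

```r
# Hypothetical example data.
data <- data.frame(
  id         = 1:3,
  score_pre  = c(10, 12, 9),
  score_post = c(14, 15, 11),
  group      = c("a", "b", "a")
)

# Logical index of the columns whose name contains "score".
hits <- grepl("score", names(data))
names(data)[hits]

# drop = FALSE keeps the result a data frame even if only one column matches.
score_df <- data[, hits, drop = FALSE]
score_df
```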
tm
https://www.rdocumentation.org/packages/tm/versions/0.7-3
- Boost_tokenizer(x)
- MC_tokenizer(x)
- removePunctuation(tmp)
- removeNumbers(x)
library(tm)
tmp.v <- VectorSource(tmp)
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
findFreqTerms(tmpc.td)
findMostFreqTerms(tmpc.td)
$`1`
     the     said      and computer      its terminal
      15        7        6        6        5        5
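Conceptually, the frequency list above is a count of cleaned tokens. This is a minimal base-R sketch of that idea on two made-up sentences; it is not how tm is implemented, and tm applies its own defaults (for instance a minimum word length), so its counts can differ.

```r
# Hypothetical input text.
tmp <- c("The computer said the terminal was ready.",
         "The terminal said nothing, and the computer waited.")

# Lower-case, then strip punctuation and digits
# (roughly what removePunctuation() and removeNumbers() are for).
clean <- tolower(paste(tmp, collapse = " "))
clean <- gsub("[[:punct:]]", "", clean)
clean <- gsub("[[:digit:]]", "", clean)

# Split on whitespace and drop empty strings.
tokens <- strsplit(clean, "[[:space:]]+")[[1]]
tokens <- tokens[tokens != ""]

# Frequency table, most frequent first.
freq <- sort(table(tokens), decreasing = TRUE)
head(freq, 6)
```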
koRpus
https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html
tokenize()
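install.packages("koRpus.lang.en")
library(koRpus.lang.en)
# choose.files() is Windows-only; file.choose() works on all platforms
temp <- tokenize(choose.files(), lang = "en")
With this you can read in, for example, the text file of the Grimm fairy tale "The Golden Bird" from Project Gutenberg: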
A certain king had a beautiful garden, and in the garden stood a tree which bore golden apples. These apples were always counted, and about the time when they began to grow ripe it was found that every night one of them was gone. The king became very angry at this, and ordered the gardener to keep watch all night under the tree. The gardener set his eldest son to watch; but about twelve o’clock he fell asleep, and in
> temp
     doc_id     token      tag lemma lttr   wclass desc stop stem  idx sntc
1      <NA>         A word.kRp          1     word <NA> <NA> <NA>    1    1
2      <NA>   certain word.kRp          7     word <NA> <NA> <NA>    2    1
3      <NA>      king word.kRp          4     word <NA> <NA> <NA>    3    1
4      <NA>       had word.kRp          3     word <NA> <NA> <NA>    4    1
5      <NA>         a word.kRp          1     word <NA> <NA> <NA>    5    1
6      <NA> beautiful word.kRp          9     word <NA> <NA> <NA>    6    1
[...]
2948   <NA>         a word.kRp          1     word <NA> <NA> <NA> 2948  140
2949   <NA>     great word.kRp          5     word <NA> <NA> <NA> 2949  140
2950   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2950  140
2951   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2951  140
2952   <NA>     years word.kRp          5     word <NA> <NA> <NA> 2952  140
2953   <NA>         .     .kRp          1 fullstop <NA> <NA> <NA> 2953  140
lex.div()
- Computes a range of lexical diversity measures
lex.div(temp)
MTLD()
> library(koRpus)
> ns002 <- tokenize(choose.files(), lang = "en")
This reads in, for example, a file containing only the text of NS002; then:
> MTLD(ns002)
Language: "en"
Total number of tokens: 463
Total number of types:  218
Measure of Textual Lexical Diversity
              MTLD: 87.62
 Number of factors: 5.28
       Factor size: 0.72
  SD tokens/factor: 36.8 (all factors)
                    30.05 (complete factors only)
Note: Analysis was conducted case insensitive.
MATTR()
> MATTR(temp)
Language: "en"
Total number of tokens: 606
Total number of types:  261
Moving-Average Type-Token Ratio
       MATTR: 0.69
  SD of TTRs: 0.05
 Window size: 100
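The idea behind the moving-average type-token ratio is to compute the TTR (types divided by tokens) inside every window of fixed size and then average those values. This is a minimal base-R sketch of that idea; the function name `mattr` is hypothetical, the default window of 100 follows the output above, and koRpus's own tokenization and details may differ.

```r
# Hypothetical helper: moving-average type-token ratio over a token vector.
mattr <- function(tokens, window = 100) {
  tokens <- tolower(tokens)            # case-insensitive comparison
  n <- length(tokens)
  if (n < window) stop("text is shorter than the window")
  starts <- seq_len(n - window + 1)    # start index of every window
  ttrs <- vapply(starts, function(i) {
    length(unique(tokens[i:(i + window - 1)])) / window
  }, numeric(1))
  list(MATTR = mean(ttrs), SD = sd(ttrs))
}

# Tiny made-up example with a window of 3 tokens.
toks <- c("a", "b", "a", "c", "b", "a")
mattr(toks, window = 3)

# Plain TTR of the same tokens, for comparison.
length(unique(toks)) / length(toks)
```

Because every window has the same length, the measure is less sensitive to text length than the plain TTR.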
rpart
> library(rpart)
> library(rpart.plot)
> model1 <- rpart(LMH ~ DD + SL + MDD, data = C3L2)
> rpart.plot(model1)
gplots
install.packages("gplots", dependencies = TRUE)
library(gplots)
> head(meanMHD)
  Group      MHD
1    C2 1.500000
2    C2 1.000000
3    C2 2.000000
4    C2 1.250000
5    C2 1.333333
6    C2 1.333333
attach(meanMHD)
plotmeans(MHD ~ Group)
detach(meanMHD)
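plotmeans draws each group's mean together with a confidence interval. The numbers behind such a plot can be sketched in base R as follows, assuming a 95% t-based interval (an assumption about the defaults, not something stated in these notes). The second group "C3" and all values are made up for illustration.

```r
# Hypothetical data in the same shape as meanMHD.
meanMHD <- data.frame(
  Group = c("C2", "C2", "C2", "C3", "C3", "C3"),
  MHD   = c(1.50, 1.00, 2.00, 1.25, 1.75, 1.50)
)

# Per-group n, mean and standard deviation.
groups   <- split(meanMHD$MHD, meanMHD$Group)
grp_n    <- vapply(groups, length, integer(1))
grp_mean <- vapply(groups, mean, numeric(1))
grp_sd   <- vapply(groups, sd, numeric(1))

# Half-width of a 95% t-based confidence interval for each group mean.
half <- qt(0.975, df = grp_n - 1) * grp_sd / sqrt(grp_n)

ci <- data.frame(n = grp_n, mean = grp_mean,
                 lower = grp_mean - half, upper = grp_mean + half)
ci
```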
orddom: computes effect sizes
- Requires the psych package to be installed.
orddom(x, y)
- This single call reports a whole range of effect sizes; pick whichever one suits you.
> orddom(dmu02shd$MDD, dmu03shd$MDD)
              ordinal                 metric
var1_X        "group 1 (x)"           "group 1 (x)"
var2_Y        "group 2 (y)"           "group 2 (y)"
type_title    "indep"                 "indep"
n in X        "592"                   "592"
n in Y        "547"                   "547"
N #Y>X        "176563"                "176563"
N #Y=X        "13390"                 "13390"
N #Y<X        "133871"                "133871"
PS X>Y        "0.413406665349078"     "0.432442005161643"
PS Y>X        "0.545243712634023"     "0.567557994838357"
A X>Y         "0.434081476357528"     "0.434081476357528"
A Y>X         "0.565918523642472"     "0.565918523642472"
delta         "0.131837047284944"     "0.128573222406733"
1-alpha       "95"                    "95"
CI low        "0.0647643614128363"    "0.0665435144203836"
CI high       "0.197723891470872"     "0.190602930393082"
s delta       "0.0339578629779545"    "0.533045085206667"
var delta     "0.00115313645802954"   "0.284137062862983"
se delta      NA                      "0.0316134060048762"
z/t score     "3.88237173141703"      "4.0670474540738"
H1 tails p/CI "2"                     "2"
p             "0.000109396219747593"  "5.52499619534335e-05"
Cohen's d     "0.177125083672158"     "0.241205155014015"
d CI low      "0.0839111293425067"    "0.124543945177201"
d CI high     "0.275869162430133"     "0.357866364850828"
var d.i       "0.291361840460969"     "0.251891629550177"
var dj.       "0.362041171144457"     "0.319040086833437"
var dij       "0.941272277686609"     "0.569924729477023"
df            "1137"                  "1094.88514066995"
NNT           "7.58512133420789"      "5.70864949792907"
For the nonparametric case, look at Cliff's delta and judge it against the following thresholds:
https://www.rdocumentation.org/packages/effsize/versions/0.7.4/topics/cliff.delta
The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. |d|<0.147 "negligible", |d|<0.33 "small", |d|<0.474 "medium", otherwise "large"
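As a worked check of these thresholds, Cliff's delta can be computed directly from pairwise comparisons. The sketch below uses base R only; the function names `cliff_delta` and `magnitude` are hypothetical, and the pair counts are copied from the orddom output above.

```r
# Hypothetical helper: Cliff's delta as (#x>y - #x<y) / (n_x * n_y).
cliff_delta <- function(x, y) {
  mean(sign(outer(x, y, "-")))
}

# Hypothetical helper: verbal label using the thresholds quoted above.
magnitude <- function(d) {
  a <- abs(d)
  if (a < 0.147) "negligible"
  else if (a < 0.33) "small"
  else if (a < 0.474) "medium"
  else "large"
}

# Pair counts from the orddom output: Y > X, Y = X, Y < X.
n_gt <- 176563
n_eq <- 13390
n_lt <- 133871

# (#Y>X - #Y<X) / all pairs reproduces the ordinal delta in that output.
delta <- (n_gt - n_lt) / (n_gt + n_eq + n_lt)
round(delta, 6)    # 0.131837
magnitude(delta)   # "negligible"
```

The sign depends only on which group is treated as the first one; the magnitude label uses the absolute value, so it is the same either way.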
effsize: computes effect sizes
- An example of looking at the nonparametric Cliff's delta
install.packages("effsize")
library(effsize)
> cliff.delta(dmu02shd$MDD, dmu03shd$MDD)
Cliff's Delta
delta estimate: -0.131837 (negligible)
95 percent confidence interval:
      lower       upper
-0.19765420 -0.06483658
- It also attaches a verbal rating such as "(negligible)" to the estimate.
lawstat
Includes the Brunner-Munzel test.
> library(lawstat)
> brunner.munzel.test(dmu02shd$MDD, dmu03shd$MDD)
        Brunner-Munzel Test
data:  dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y)
       0.5659185
brunnermunzel
install.packages("brunnermunzel")
library(brunnermunzel)
> brunnermunzel.test(dmu02shd$MDD, dmu03shd$MDD)
        Brunner-Munzel Test
data:  dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y)
       0.5659185
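The sample estimate P(X<Y)+.5*P(X=Y) in this output is simply a proportion of pairs, so it can be checked by hand. The sketch below is base R only and covers the point estimate, not the test statistic or its confidence interval; the function name `p_hat` is hypothetical, and the pair counts are the ones reported by orddom earlier on this page for the same two samples.

```r
# Hypothetical helper: P(X < Y) + 0.5 * P(X = Y) over all (x, y) pairs.
p_hat <- function(x, y) {
  d <- outer(x, y, "-")
  mean(d < 0) + 0.5 * mean(d == 0)
}

# Pair counts from the orddom output for the same data.
n_lt <- 176563   # pairs with X < Y
n_eq <- 13390    # pairs with X = Y
n_gt <- 133871   # pairs with X > Y

est <- (n_lt + 0.5 * n_eq) / (n_lt + n_eq + n_gt)
round(est, 7)    # 0.5659185

# The estimate and Cliff's delta carry the same information.
2 * est - 1      # about 0.131837
```

An estimate of 0.5 would mean neither group tends to have larger values; here it is slightly above 0.5.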
dplyr
ggplot2
quanteda
https://github.com/koheiw/workshop-IJTA
https://sugiura-ken.org/wiki/