R.package: Useful R packages
Installing packages
- For example, to use gplots:
> install.packages("gplots", dependencies = TRUE)
> library(gplots)
To see which installed packages can be loaded: library()
To see which packages are already attached and ready to use: search()
To get an overview of a package: library(help="package name")
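The install-then-load steps above can be wrapped so that a package is downloaded only when it is missing. This is a minimal sketch, not part of the original notes: the helper names is_installed and use_package are hypothetical, and the install call is left commented out because it needs network access.

```r
# Hypothetical helper: TRUE if the package can be loaded (it is not attached).
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical helper: install the package only when it is missing, then attach it.
use_package <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg, dependencies = TRUE)
  }
  library(pkg, character.only = TRUE)
}

is_installed("stats")    # the stats package ships with R
# use_package("gplots")  # would download gplots from CRAN if it is missing
```

After use_package("gplots") has run, search() should list package:gplots among the attached packages.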
Frequently used packages
library(openxlsx)
library(tidyverse)
- tidyverse includes the following:
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
A-G
bookdown
corpus
dagitty
easystats
eyetrackingR
ggplot2
ggstatsplot
H-P
ngram
O-U
quanteda
tidyverse
tm
V-Z
List of Packages
retimes
https://cran.r-project.org/web/packages/retimes/index.html
stringi
stringr
psych
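A package for psychological research.
summary(x) gives the basic descriptive statistics;
for a more detailed view, install this package and use describe(x).
> describe(x)
   vars   n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 100 0.06 1.06   0.04    0.07 0.87 -2.9 2.51  5.41 -0.04    -0.32 0.11
- It also reports the standard deviation, skewness, kurtosis, standard error, and more.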
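To see what the extra columns mean, the following base-R sketch computes a few of them by hand. The data are simulated here (not the x shown above), and the skewness and kurtosis formulas are one common sample definition. psych offers several variants, so small differences from describe() are possible.

```r
set.seed(1)       # arbitrary seed so the sketch is reproducible
x <- rnorm(100)   # simulated data, not the data from the notes

n  <- length(x)
m  <- mean(x)
s  <- sd(x)                              # uses the n - 1 denominator
se <- s / sqrt(n)                        # standard error of the mean
skew <- sum((x - m)^3) / (n * s^3)       # sample skewness
kurt <- sum((x - m)^4) / (n * s^4) - 3   # excess kurtosis (0 for a normal)

round(c(n = n, mean = m, sd = s, se = se, skew = skew, kurtosis = kurt), 2)
```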
sjmisc
find_var: search for and select the matching data columns
find_var(data, pattern="pattern", out=output format)
find_var(data, pattern="score", out="df")
- Selects the columns whose names contain "score" and returns them as a data frame.
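For comparison, the same kind of selection can be sketched in base R without sjmisc. The data frame and its column names below are made up for illustration, and this only matches column names (it does not look at variable labels).

```r
# Hypothetical data frame for illustration.
data <- data.frame(
  id         = 1:3,
  score_pre  = c(10, 12, 9),
  score_post = c(14, 15, 11),
  group      = c("a", "b", "a")
)

# TRUE for every column whose name contains the literal string "score".
hits <- grepl("score", names(data), fixed = TRUE)

# drop = FALSE keeps the result a data frame even if only one column matches.
score_df <- data[, hits, drop = FALSE]
names(score_df)
```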
tm
https://www.rdocumentation.org/packages/tm/versions/0.7-3
- Boost_tokenizer(x)
- MC_tokenizer(x)
- removePunctuation(tmp)
- removeNumbers(x)
tmp.v <- VectorSource(tmp)
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
findFreqTerms(tmpc.td)
findMostFreqTerms(tmpc.td)
$`1`
     the     said      and computer      its terminal
      15        7        6        6        5        5
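What the frequency listing above boils down to, for a single document, is a cleaned-up word count. The sketch below does this in base R only. The sample sentence is invented, and tm's own tokenizers and defaults (for example minimum word length) differ, so the numbers need not match tm's output on the same text.

```r
# Invented sample text, not the text used in the notes.
tmp <- "The terminal said: the computer and its terminal, 2 of them."

clean <- tolower(tmp)
clean <- gsub("[[:punct:]]", "", clean)   # roughly what removePunctuation() does
clean <- gsub("[[:digit:]]", "", clean)   # roughly what removeNumbers() does

# Split on whitespace and drop empty strings left over from the cleaning.
tokens <- strsplit(clean, "[[:space:]]+")[[1]]
tokens <- tokens[tokens != ""]

# Term frequencies, most frequent first.
freq <- sort(table(tokens), decreasing = TRUE)
head(freq, 3)
```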
koRpus
https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html
tokenize()
install.packages("koRpus.lang.en")
library(koRpus.lang.en)
temp <- tokenize(choose.files(), lang="en")
With this you can read in, for example, a text file of the Grimm fairy tale Golden Bird from Project Gutenberg:
A certain king had a beautiful garden, and in the garden stood a tree which bore golden apples. These apples were always counted, and about the time when they began to grow ripe it was found that every night one of them was gone. The king became very angry at this, and ordered the gardener to keep watch all night under the tree. The gardener set his eldest son to watch; but about twelve o’clock he fell asleep, and in
> temp
     doc_id     token      tag lemma lttr   wclass desc stop stem  idx sntc
1      <NA>         A word.kRp          1     word <NA> <NA> <NA>    1    1
2      <NA>   certain word.kRp          7     word <NA> <NA> <NA>    2    1
3      <NA>      king word.kRp          4     word <NA> <NA> <NA>    3    1
4      <NA>       had word.kRp          3     word <NA> <NA> <NA>    4    1
5      <NA>         a word.kRp          1     word <NA> <NA> <NA>    5    1
6      <NA> beautiful word.kRp          9     word <NA> <NA> <NA>    6    1
[...]
2948   <NA>         a word.kRp          1     word <NA> <NA> <NA> 2948  140
2949   <NA>     great word.kRp          5     word <NA> <NA> <NA> 2949  140
2950   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2950  140
2951   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2951  140
2952   <NA>     years word.kRp          5     word <NA> <NA> <NA> 2952  140
2953   <NA>         .     .kRp          1 fullstop <NA> <NA> <NA> 2953  140
lex.div()
- Computes a range of lexical diversity indices
lex.div(temp)
MTLD()
> library(koRpus)
> ns002 <- tokenize(choose.files(), lang="en")
This reads in, for example, a file containing only the text of NS002. Then:
> MTLD(ns002)
Language: "en"
Total number of tokens: 463
Total number of types: 218
Measure of Textual Lexical Diversity
MTLD: 87.62
Number of factors: 5.28
Factor size: 0.72
SD tokens/factor: 36.8 (all factors)
30.05 (complete factors only)
Note: Analysis was conducted case insensitive.
MATTR()
> MATTR(temp)
Language: "en"
Total number of tokens: 606
Total number of types: 261
Moving-Average Type-Token Ratio
MATTR: 0.69
SD of TTRs: 0.05
Window size: 100
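The idea behind MATTR can be sketched in a few lines of base R: compute the type-token ratio in every window of a fixed size and average the results. This is an illustration of the measure, not koRpus's implementation. The function names are hypothetical, and details such as how punctuation tokens are treated are assumptions. Tokens are lowercased to mirror the case-insensitive analysis reported above.

```r
# Type-token ratio: number of distinct tokens divided by number of tokens.
ttr <- function(tokens) {
  length(unique(tokens)) / length(tokens)
}

# Moving-average TTR: mean of the TTRs of all windows of the given size.
mattr <- function(tokens, window = 100) {
  tokens <- tolower(tokens)
  n <- length(tokens)
  if (n < window) stop("text is shorter than the window")
  starts <- seq_len(n - window + 1)
  ttrs <- vapply(
    starts,
    function(i) ttr(tokens[i:(i + window - 1)]),
    numeric(1)
  )
  mean(ttrs)
}

# Toy example with a window of 3 tokens.
toy <- c("a", "b", "a", "c", "b", "d")
mattr(toy, window = 3)
```

The toy vector has four windows with TTRs 2/3, 1, 1, and 1, so the result is 11/12. Because every window has the same length, the measure is less sensitive to text length than the plain TTR.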
rpart
> library(rpart)
> library(rpart.plot)
> model1 <- rpart(LMH ~ DD + SL + MDD, data = C3L2)
> rpart.plot(model1)
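Since the C3L2 data are not available here, the same workflow can be tried on the built-in iris data. This is a minimal sketch under that substitution. It prints the tree as text and cross-tabulates the fitted classes instead of plotting.

```r
library(rpart)

# Classification tree on the built-in iris data (a stand-in for C3L2).
fit <- rpart(
  Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = iris,
  method = "class"
)

print(fit)   # text listing of the splits

# Predicted class for each training row, compared with the observed class.
pred <- predict(fit, type = "class")
table(observed = iris$Species, predicted = pred)
```

With the rpart.plot package attached, rpart.plot(fit) draws the same tree graphically.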
gplots
install.packages("gplots", dependencies = TRUE)
library(gplots)
> head(meanMHD)
Group MHD
1 C2 1.500000
2 C2 1.000000
3 C2 2.000000
4 C2 1.250000
5 C2 1.333333
6 C2 1.333333
attach(meanMHD)
plotmeans(MHD ~ Group)
detach(meanMHD)
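The numbers behind such a means plot can be computed by hand. The sketch below uses a small made-up data frame (the C3 group and all of its values are hypothetical) and assumes the error bars are 95% t-based confidence intervals, which is my understanding of the plotmeans default.

```r
# Made-up data in the same shape as meanMHD; the C3 values are hypothetical.
meanMHD_demo <- data.frame(
  Group = rep(c("C2", "C3"), each = 4),
  MHD   = c(1.5, 1.0, 2.0, 1.25, 2.0, 2.5, 1.75, 2.25)
)

grp_n    <- tapply(meanMHD_demo$MHD, meanMHD_demo$Group, length)
grp_mean <- tapply(meanMHD_demo$MHD, meanMHD_demo$Group, mean)
grp_sd   <- tapply(meanMHD_demo$MHD, meanMHD_demo$Group, sd)

# Half-width of a 95% t-based confidence interval for each group mean.
half_width <- qt(0.975, df = grp_n - 1) * grp_sd / sqrt(grp_n)

data.frame(
  n     = as.vector(grp_n),
  mean  = as.vector(grp_mean),
  lower = as.vector(grp_mean - half_width),
  upper = as.vector(grp_mean + half_width),
  row.names = names(grp_mean)
)
```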
orddom: computes effect sizes
- Requires the psych package to be installed.
orddom(x, y)
- This single call returns a whole range of effect sizes, so you can pick whichever one you need.
> orddom(dmu02shd$MDD, dmu03shd$MDD)
ordinal metric
var1_X "group 1 (x)" "group 1 (x)"
var2_Y "group 2 (y)" "group 2 (y)"
type_title "indep" "indep"
n in X "592" "592"
n in Y "547" "547"
N #Y>X "176563" "176563"
N #Y=X "13390" "13390"
N #Y<X "133871" "133871"
PS X>Y "0.413406665349078" "0.432442005161643"
PS Y>X "0.545243712634023" "0.567557994838357"
A X>Y "0.434081476357528" "0.434081476357528"
A Y>X "0.565918523642472" "0.565918523642472"
delta "0.131837047284944" "0.128573222406733"
1-alpha "95" "95"
CI low "0.0647643614128363" "0.0665435144203836"
CI high "0.197723891470872" "0.190602930393082"
s delta "0.0339578629779545" "0.533045085206667"
var delta "0.00115313645802954" "0.284137062862983"
se delta NA "0.0316134060048762"
z/t score "3.88237173141703" "4.0670474540738"
H1 tails p/CI "2" "2"
p "0.000109396219747593" "5.52499619534335e-05"
Cohen's d "0.177125083672158" "0.241205155014015"
d CI low "0.0839111293425067" "0.124543945177201"
d CI high "0.275869162430133" "0.357866364850828"
var d.i "0.291361840460969" "0.251891629550177"
var dj. "0.362041171144457" "0.319040086833437"
var dij "0.941272277686609" "0.569924729477023"
df "1137" "1094.88514066995"
NNT "7.58512133420789" "5.70864949792907"
For nonparametric data, look at Cliff's delta and judge its magnitude against the following thresholds:
https://www.rdocumentation.org/packages/effsize/versions/0.7.4/topics/cliff.delta
The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. |d|<0.147 "negligible", |d|<0.33 "small", |d|<0.474 "medium", otherwise "large"
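The ordinal delta in the orddom output can be reproduced from the pair counts it reports, and the thresholds quoted above turn it into a label. The sketch below copies the counts from the output above. The function name cliff_magnitude is hypothetical.

```r
# Counts copied from the orddom output above.
n_x      <- 592      # n in X
n_y      <- 547      # n in Y
n_y_gt_x <- 176563   # pairs with Y > X
n_tie    <- 13390    # pairs with Y = X
n_y_lt_x <- 133871   # pairs with Y < X

n_pairs <- n_x * n_y   # every X value is compared with every Y value

# Cliff's delta: share of pairs with Y > X minus share of pairs with Y < X.
delta <- (n_y_gt_x - n_y_lt_x) / n_pairs

# "A Y>X": probability that Y exceeds X, counting ties as one half.
a_y_gt_x <- (n_y_gt_x + 0.5 * n_tie) / n_pairs

# Hypothetical helper applying the thresholds quoted above (Romano 2006).
cliff_magnitude <- function(d) {
  ad <- abs(d)
  if (ad < 0.147) {
    "negligible"
  } else if (ad < 0.33) {
    "small"
  } else if (ad < 0.474) {
    "medium"
  } else {
    "large"
  }
}

delta
a_y_gt_x
cliff_magnitude(delta)
```

The three counts add up to 592 * 547 = 323824 pairs, delta comes out at about 0.1318 as in the "delta" row, and since 0.1318 < 0.147 the label is "negligible".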
effsize: computes effect sizes
- An example using the nonparametric Cliff's delta:
install.packages("effsize")
library(effsize)
> cliff.delta(dmu02shd$MDD, dmu03shd$MDD)
Cliff's Delta
delta estimate: -0.131837 (negligible)
95 percent confidence interval:
lower upper
-0.19765420 -0.06483658
- It also attaches a verbal rating of the magnitude, here "(negligible)".
lawstat
Includes the Brunner-Munzel test.
> brunner.munzel.test(dmu02shd$MDD, dmu03shd$MDD)
Brunner-Munzel Test
data: dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y)
0.5659185
brunnermunzel
install.packages("brunnermunzel")
library(brunnermunzel)
> brunnermunzel.test(dmu02shd$MDD, dmu03shd$MDD)
Brunner-Munzel Test
data: dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y)
0.5659185
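The sample estimate P(X<Y)+.5*P(X=Y) reported by the Brunner-Munzel test can be computed directly from two vectors by comparing every pair. This is a base-R sketch with made-up vectors (not the dmu02shd and dmu03shd data). It covers only the point estimate, not the test statistic or the confidence interval.

```r
# Made-up samples for illustration.
x <- c(1, 2, 3)
y <- c(2, 3, 4)

# outer() builds the full length(x) by length(y) grid of pairwise comparisons.
p_less <- mean(outer(x, y, "<"))    # share of pairs with X < Y
p_tie  <- mean(outer(x, y, "=="))   # share of pairs with X = Y

# Estimate of P(X < Y) + 0.5 * P(X = Y).
p_hat <- p_less + 0.5 * p_tie

# The same quantity rescaled to the range -1..1 is Cliff's delta for Y over X.
delta_yx <- 2 * p_hat - 1

c(p_hat = p_hat, delta_yx = delta_yx)
```

Of the 9 pairs here, 6 have X < Y and 2 are ties, so p_hat is 7/9 and delta_yx is 5/9. The same rescaling links the 0.5659185 above to the delta of about 0.1318 in the orddom output.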
dplyr
ggplot2
quanteda
https://github.com/koheiw/workshop-IJTA
https://sugiura-ken.org/wiki/