{{counter}}
R

!!!R.package
Useful packages for R

{{outline}}

!!Installing packages
*For example, to use gplots:
{{pre
> install.packages("gplots", dependencies = TRUE)
> library(gplots)
}}

!!stringr
!str_detect()
*Checks whether a string contains a match for a pattern
 str_detect(data, "regular expression")
*Handy in combination with subset()
**Check whether a particular column of a data frame contains "a certain kind of string", and select only the rows that contain it.
***Example of "a certain kind of string": entries containing more than one "word" written as a run of lowercase letters
{{pre
library(stringr)
fragJBnozeroMW <- subset(fragJBnozero, str_detect(fragJBnozero[,3], "[:lower:]+ +[^a-z]*[:lower:]+"), select=c("total","MW"))
> fragJBnozeroMW
    total               MW
251   256   a [NN] of [NP]
278   253      do n't [VP]
285   252 the [NN] of [NP]
291   219      do not [VP]
306   235     want to [VP]
325   240            a lot
330   213      for example
341   210    a lot of [NP]
}}
----
Reference site
https://heavywatal.github.io/rstats/stringr.html

!!psych
A package for psychology

Basic descriptive statistics can be obtained with <>, but for a closer look, install this package and use describe().
{{pre
> describe(x)
   vars   n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 100 0.06 1.06   0.04    0.07 0.87 -2.9 2.51  5.41 -0.04    -0.32 0.11
}}
*It also reports the standard deviation, skewness, kurtosis, standard error, and more.

!!tm
https://www.rdocumentation.org/packages/tm/versions/0.7-3
*Boost_tokenizer(x)
*MC_tokenizer(x)
*removePunctuation(tmp)
*removeNumbers(x)
----
{{pre
tmp.v <- VectorSource(tmp)
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
findFreqTerms(tmpc.td)
findMostFreqTerms(tmpc.td)
$`1`
     the     said      and computer      its terminal
      15        7        6        6        5        5
}}
----

!!koRpus
https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html
*MTLD
 > library(koRpus)
 > ns002 <- tokenize(choose.files(), lang="en")
This reads in, for example, a file that contains only the text of NS002. Then:
 > MTLD(ns002)
 Language: "en"
 Total number of tokens: 463
 Total number of types: 218
 Measure of Textual Lexical Diversity
               MTLD: 87.62
  Number of factors: 5.28
        Factor size: 0.72
   SD tokens/factor: 36.8 (all factors)
                     30.05 (complete factors only)
 Note: Analysis was conducted case insensitive.
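As a rough guide to reading the MTLD figures above, here is a minimal base-R sketch of the general idea: a "factor" is completed each time the running type-token ratio falls to the factor size (0.72 in the output above), leftover tokens count as a partial factor, and the forward and backward passes are averaged. The function names mtld_one_way() and mtld_sketch() and the sample text are made up for illustration. This is a simplification under those assumptions, not koRpus's own implementation, so its results need not match MTLD() exactly.

```r
# Simplified MTLD sketch (illustrative only; not koRpus's implementation).
# mtld_one_way() and mtld_sketch() are hypothetical helper names.

# Count "factors" in one direction: a factor is completed each time the
# running type-token ratio (TTR) falls to the threshold or below.
mtld_one_way <- function(tokens, threshold = 0.72) {
  factors <- 0
  types <- character(0)  # distinct tokens seen in the current segment
  n <- 0                 # number of tokens in the current segment
  for (tok in tokens) {
    n <- n + 1
    types <- union(types, tok)
    if (length(types) / n <= threshold) {
      factors <- factors + 1
      types <- character(0)
      n <- 0
    }
  }
  # Leftover tokens count as a partial factor, in proportion to how far
  # their TTR has dropped from 1 towards the threshold.
  if (n > 0) {
    factors <- factors + (1 - length(types) / n) / (1 - threshold)
  }
  if (factors == 0) return(NA_real_)  # TTR never dropped: undefined here
  length(tokens) / factors
}

# Average the forward and backward passes.
mtld_sketch <- function(tokens, threshold = 0.72) {
  mean(c(mtld_one_way(tokens, threshold),
         mtld_one_way(rev(tokens), threshold)))
}

# Made-up sample text; lower-cased so that counting is case insensitive.
text <- "The cat sat on the mat. The dog sat on the cat."
tokens <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
tokens <- tokens[tokens != ""]

length(tokens)          # total number of tokens
length(unique(tokens))  # total number of types
mtld_sketch(tokens)
```

For real texts, MTLD() in koRpus remains the thing to use; the sketch is only meant to show where "Number of factors" and "Factor size" come from.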
!!rpart
 > library(rpart)
 > library(rpart.plot)
 > model1 <- rpart(LMH ~ DD + SL + MDD, data = C3L2)
 > rpart.plot(model1)

!!gplots
{{pre
install.packages("gplots", dependencies = TRUE)
library(gplots)
> head(meanMHD)
  Group      MHD
1    C2 1.500000
2    C2 1.000000
3    C2 2.000000
4    C2 1.250000
5    C2 1.333333
6    C2 1.333333
attach(meanMHD)
plotmeans(MHD ~ Group)
detach(meanMHD)
}}
{{ref_image meanComparisonMHD.png}}

!!orddom
*Requires the psych package to be installed
 orddom(x, y)
*This call alone outputs a whole range of effect sizes, so you can take your pick.
{{pre
> orddom(dmu02shd$MDD, dmu03shd$MDD)
              ordinal                 metric
var1_X        "group 1 (x)"           "group 1 (x)"
var2_Y        "group 2 (y)"           "group 2 (y)"
type_title    "indep"                 "indep"
n in X        "592"                   "592"
n in Y        "547"                   "547"
N #Y>X        "176563"                "176563"
N #Y=X        "13390"                 "13390"
N #YY         "0.413406665349078"     "0.432442005161643"
PS Y>X        "0.545243712634023"     "0.567557994838357"
A X>Y         "0.434081476357528"     "0.434081476357528"
A Y>X         "0.565918523642472"     "0.565918523642472"
delta         "0.131837047284944"     "0.128573222406733"
1-alpha       "95"                    "95"
CI low        "0.0647643614128363"    "0.0665435144203836"
CI high       "0.197723891470872"     "0.190602930393082"
s delta       "0.0339578629779545"    "0.533045085206667"
var delta     "0.00115313645802954"   "0.284137062862983"
se delta      NA                      "0.0316134060048762"
z/t score     "3.88237173141703"      "4.0670474540738"
H1 tails p/CI "2"                     "2"
p             "0.000109396219747593"  "5.52499619534335e-05"
Cohen's d     "0.177125083672158"     "0.241205155014015"
d CI low      "0.0839111293425067"    "0.124543945177201"
d CI high     "0.275869162430133"     "0.357866364850828"
var d.i       "0.291361840460969"     "0.251891629550177"
var dj.       "0.362041171144457"     "0.319040086833437"
var dij       "0.941272277686609"     "0.569924729477023"
df            "1137"                  "1094.88514066995"
NNT           "7.58512133420789"      "5.70864949792907"
}}
For nonparametric data, look at Cliff's delta and interpret it against the following criteria:
https://www.rdocumentation.org/packages/effsize/versions/0.7.4/topics/cliff.delta

The magnitude is assessed using the thresholds provided in (Romano 2006), i.e.
|d|<0.147 "negligible", |d|<0.33 "small", |d|<0.474 "medium", otherwise "large"

!!effsize
{{pre
install.packages("effsize")
library(effsize)
> cliff.delta(dmu02shd$MDD, dmu03shd$MDD)

Cliff's Delta

delta estimate: -0.131837 (negligible)
95 percent confidence interval:
      lower       upper
-0.19765420 -0.06483658
}}
*It also attaches its assessment of the magnitude, here "(negligible)".

!!lawstat
Includes the Brunner-Munzel test.
{{pre
> brunner.munzel.test(dmu02shd$MDD, dmu03shd$MDD)

Brunner-Munzel Test

data: dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X
brunnermunzel.test(dmu02shd$MDD, dmu03shd$MDD)

Brunner-Munzel Test

data: dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X