R
!!!R.package Useful R packages
{{outline}}
!!Installing packages
*For example, to use gplots:
{{pre
> install.packages("gplots", dependencies = TRUE)
> library(gplots)
}}
!Which libraries can be loaded
<>
!Which libraries are already available for use
<>
!Getting an overview of a package
<>
!!!A-G
!!corpus
!!dagitty
!!eyetrackingR
!!!H-P
!!ngram
!!!Q-U
!!quanteda
!!tidyverse
!!tm
!!!V-Z
----
!!!List of Packages
!!retimes
https://cran.r-project.org/web/packages/retimes/index.html
!!stringi
!!stringr
!str_which()
*Find the row numbers (positions) at which a string occurs
 str_which(column, "string")
{{pre
> head(df1)
    A B  C
1 AAA 2 20
2 BBB 3 30
3 AAA   40
4 BBB 4 30
5 CCC 5 60
> str_which(df1$A, "AAA")
[1] 1 3
}}
!str_detect()
*Check whether a matching string is present
 str_detect(data, "regular expression")
*Handy in combination with subset()
**Check whether a particular column of a data frame contains "a certain kind of string", and select only the rows that contain it.
***Example of "a certain kind of string": an entry containing more than one "word" written as a run of lowercase letters
{{pre
fragJBnozeroMW <- subset(fragJBnozero, str_detect(fragJBnozero[,3], "[:lower:]+ +[^a-z]*[:lower:]+"), select=c("total","MW"))
> fragJBnozeroMW
    total MW
251   256 a [NN] of [NP]
278   253 do n't [VP]
285   252 the [NN] of [NP]
291   219 do not [VP]
306   235 want to [VP]
325   240 a lot
330   213 for example
341   210 a lot of [NP]
}}
!str_extract()
*Extract the string matched by the given pattern
*When the regular expression can match several different strings, the string actually matched is returned for each element
----
Reference site
https://heavywatal.github.io/rstats/stringr.html
!!psych
A package for psychology. Basic descriptive statistics are available with <>, but for a more detailed view, install this package and use <>.
{{pre
> describe(x)
   vars   n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 100 0.06 1.06   0.04    0.07 0.87 -2.9 2.51  5.41 -0.04    -0.32 0.11
}}
*It also reports the standard deviation, skewness, kurtosis, standard error, and more.
!!sjmisc
!find_var
Search for and select matching data columns
 find_var(data, pattern="pattern", out=output format)
 find_var(data, pattern="score", out="df")
*Selects the columns whose names contain "score" and returns them as a data frame
!!tm
https://www.rdocumentation.org/packages/tm/versions/0.7-3
*Boost_tokenizer(x)
*MC_tokenizer(x)
*removePunctuation(tmp)
*removeNumbers(x)
----
{{pre
# Build a corpus from the character vector tmp, then a term-document matrix
tmp.v <- VectorSource(tmp)
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
# List frequent terms, then the most frequent terms per document
findFreqTerms(tmpc.td)
findMostFreqTerms(tmpc.td)
$`1`
     the     said      and computer      its terminal
      15        7        6        6        5        5
}}
----
!!koRpus
https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html
!tokenize()
 install.packages("koRpus.lang.en")
 library(koRpus.lang.en)
 temp <- tokenize(choose.files(), lang="en")
This reads in, for example, a text file of the Grimm fairy tale "The Golden Bird" from Project Gutenberg:
{{pre
A certain king had a beautiful garden, and in the garden stood a tree
which bore golden apples. These apples were always counted, and about
the time when they began to grow ripe it was found that every night one
of them was gone. The king became very angry at this, and ordered the
gardener to keep watch all night under the tree. The gardener set his
eldest son to watch; but about twelve o’clock he fell asleep, and in
}}
{{pre
> temp
     doc_id     token      tag lemma lttr   wclass desc stop stem  idx sntc
1                   A word.kRp          1     word                   1    1
2             certain word.kRp          7     word                   2    1
3                king word.kRp          4     word                   3    1
4                 had word.kRp          3     word                   4    1
5                   a word.kRp          1     word                   5    1
6           beautiful word.kRp          9     word                   6    1
                                       [...]
2948                a word.kRp          1     word                2948  140
2949            great word.kRp          5     word                2949  140
2950             many word.kRp          4     word                2950  140
2951             many word.kRp          4     word                2951  140
2952            years word.kRp          5     word                2952  140
2953                .     .kRp          1 fullstop                2953  140
}}
!lex.div()
*Computes a range of lexical diversity measures
 lex.div(temp)
!MTLD()
 > library(koRpus)
 > ns002 <- tokenize(choose.files(), lang="en")
This reads in, for example, a file containing only the text of NS002. Then:
 > MTLD(ns002)
 Language: "en"
 Total number of tokens: 463
 Total number of types:  218
 Measure of Textual Lexical Diversity
               MTLD: 87.62
  Number of factors: 5.28
        Factor size: 0.72
   SD tokens/factor: 36.8 (all factors)
                     30.05 (complete factors only)
 Note: Analysis was conducted case insensitive.
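To make the "Number of factors" and "Factor size" lines in the MTLD output easier to read, here is a minimal base-R sketch of how an MTLD-style score can be computed by hand. It is not the koRpus implementation, and its results need not match koRpus exactly. The function names mtld_one_pass and mtld_sketch are hypothetical. The 0.72 threshold is taken from the "Factor size" line, and counting a factor as complete when the running type-token ratio drops strictly below that threshold is an assumption.

```r
# Hand-rolled MTLD-style sketch in base R (NOT the koRpus implementation).
# mtld_one_pass() and mtld_sketch() are hypothetical helper names.

mtld_one_pass <- function(tokens, threshold = 0.72) {
  factors <- 0           # number of completed factors
  seen <- character(0)   # types seen in the current segment
  n <- 0                 # tokens in the current segment
  ttr <- 1               # running type-token ratio of the current segment
  for (tok in tokens) {
    n <- n + 1
    seen <- union(seen, tok)
    ttr <- length(seen) / n
    if (ttr < threshold) {   # assumption: strict "<" closes a factor
      factors <- factors + 1
      seen <- character(0)
      n <- 0
      ttr <- 1
    }
  }
  # Leftover tokens contribute a partial factor, proportional to how far
  # their TTR has fallen from 1 towards the threshold.
  if (n > 0) factors <- factors + (1 - ttr) / (1 - threshold)
  if (factors == 0) return(NA_real_)   # TTR never moved: score undefined
  length(tokens) / factors
}

mtld_sketch <- function(tokens, threshold = 0.72) {
  tokens <- tolower(tokens)   # case-insensitive, as in the note in the output
  # Average of a forward pass and a backward pass over the text
  mean(c(mtld_one_pass(tokens, threshold),
         mtld_one_pass(rev(tokens), threshold)))
}

mtld_sketch(rep("a", 10))          # 2: a factor closes every 2 tokens
mtld_sketch(c("a", "b", "a", "a")) # 4: one factor over 4 tokens
```

A higher score means more tokens are needed before the type-token ratio falls to the threshold, that is, a more varied vocabulary. For real analyses, the MTLD() function of koRpus shown above is the one to use.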
!MATTR()
{{pre
> MATTR(temp)
Language: "en"
Total number of tokens: 606
Total number of types:  261
Moving-Average Type-Token Ratio
        MATTR: 0.69
   SD of TTRs: 0.05
  Window size: 100
}}
!!rpart
 > library(rpart)
 > library(rpart.plot)
 > model1 <- rpart(LMH ~ DD + SL + MDD, data = C3L2)
 > rpart.plot(model1)
!!gplots
{{pre
install.packages("gplots", dependencies = TRUE)
library(gplots)
> head(meanMHD)
  Group      MHD
1    C2 1.500000
2    C2 1.000000
3    C2 2.000000
4    C2 1.250000
5    C2 1.333333
6    C2 1.333333
attach(meanMHD)
plotmeans(MHD ~ Group)
detach(meanMHD)
}}
{{ref_image meanComparisonMHD.png}}
!!orddom Reports effect sizes
*Requires the psych package to be installed
 orddom(x, y)
*This call alone produces a whole range of effect sizes, so you can take your pick.
{{pre
> orddom(dmu02shd$MDD, dmu03shd$MDD)
              ordinal                 metric
var1_X        "group 1 (x)"           "group 1 (x)"
var2_Y        "group 2 (y)"           "group 2 (y)"
type_title    "indep"                 "indep"
n in X        "592"                   "592"
n in Y        "547"                   "547"
N #Y>X        "176563"                "176563"
N #Y=X        "13390"                 "13390"
N #Y<X        [...]
PS X>Y        "0.413406665349078"     "0.432442005161643"
PS Y>X        "0.545243712634023"     "0.567557994838357"
A X>Y         "0.434081476357528"     "0.434081476357528"
A Y>X         "0.565918523642472"     "0.565918523642472"
delta         "0.131837047284944"     "0.128573222406733"
1-alpha       "95"                    "95"
CI low        "0.0647643614128363"    "0.0665435144203836"
CI high       "0.197723891470872"     "0.190602930393082"
s delta       "0.0339578629779545"    "0.533045085206667"
var delta     "0.00115313645802954"   "0.284137062862983"
se delta      NA                      "0.0316134060048762"
z/t score     "3.88237173141703"      "4.0670474540738"
H1 tails p/CI "2"                     "2"
p             "0.000109396219747593"  "5.52499619534335e-05"
Cohen's d     "0.177125083672158"     "0.241205155014015"
d CI low      "0.0839111293425067"    "0.124543945177201"
d CI high     "0.275869162430133"     "0.357866364850828"
var d.i       "0.291361840460969"     "0.251891629550177"
var dj.       "0.362041171144457"     "0.319040086833437"
var dij       "0.941272277686609"     "0.569924729477023"
df            "1137"                  "1094.88514066995"
NNT           "7.58512133420789"      "5.70864949792907"
}}
For nonparametric data, look at Cliff's delta and judge its magnitude against the following thresholds:
https://www.rdocumentation.org/packages/effsize/versions/0.7.4/topics/cliff.delta
The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. |d|<0.147 "negligible", |d|<0.33 "small", |d|<0.474 "medium", otherwise "large"
!!effsize Reports effect sizes
*An example that looks at the nonparametric Cliff's delta
{{pre
install.packages("effsize")
library(effsize)
> cliff.delta(dmu02shd$MDD, dmu03shd$MDD)
Cliff's Delta
delta estimate: -0.131837 (negligible)
95 percent confidence interval:
      lower       upper
-0.19765420 -0.06483658
}}
*It also attaches an assessment label, here "(negligible)".
!!lawstat
Includes the Brunner-Munzel test.
{{pre
> brunner.munzel.test(dmu02shd$MDD, dmu03shd$MDD)
Brunner-Munzel Test
data: dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
0.5325908 0.5992463
sample estimates:
P(X
> brunnermunzel.test(dmu02shd$MDD, dmu03shd$MDD)
Brunner-Munzel Test
data: dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
0.5325908 0.5992463
sample estimates:
P(X