
R.package


R.package: Useful packages for R

 Installing packages

  • For example, to use gplots:
> install.packages("gplots", dependencies = TRUE)
> library(gplots)

To list the packages that are installed and can be loaded: library()

To list the packages that are already attached and ready to use: search()
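
A common way to combine the calls above is to install a package only when it is missing. The following is a minimal sketch under that assumption, not something this page prescribes; the helper name ensure_package is hypothetical, while requireNamespace(), install.packages() and search() are standard R functions.

```r
# Hypothetical helper: install a package only when it is not yet available.
ensure_package <- function(pkg) {
  # requireNamespace() returns TRUE/FALSE instead of stopping with an error
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger an installation
ok <- ensure_package("stats")

# search() lists what is attached; packages appear as "package:<name>"
attached <- search()
"package:stats" %in% attached
```

Replacing "stats" with "gplots" would install gplots on first use and do nothing on later runs.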

 tidyverse


 stringr

str_which()

  • Finds the positions (row numbers) of the elements that contain a string
str_which(column, "string")
> head(df1)
    A B  C
1 AAA 2 20
2 BBB 3 30
3 AAA   40
4 BBB 4 30
5 CCC 5 60
> str_which(df1$A, "AAA")
[1] 1 3
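
The positions returned by str_which() can be used directly as row indices. The sketch below rebuilds df1 from the printed output; treating the blank cell in column B as a missing value is an assumption, since the page does not show how df1 was created.

```r
library(stringr)

# Hypothetical reconstruction of df1 from the printed output above;
# the blank cell in column B is assumed to be a missing value.
df1 <- data.frame(
  A = c("AAA", "BBB", "AAA", "BBB", "CCC"),
  B = c(2, 3, NA, 4, 5),
  C = c(20, 30, 40, 30, 60)
)

idx <- str_which(df1$A, "AAA")  # positions of the matching elements
matched_rows <- df1[idx, ]      # use the positions to pull out the rows
print(idx)
print(matched_rows)
```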


str_detect()

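  • Tests whether each element contains a match for the pattern
str_detect(data, "regular expression")
  • Handy in combination with subset()
    • Check whether a particular column of a data frame contains a certain kind of string, and keep only the rows that do.
      • Example of "a certain kind of string": entries containing two or more "words" written as runs of lowercase letters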
fragJBnozeroMW <- subset(fragJBnozero, str_detect(fragJBnozero[,3], "[:lower:]+ +[^a-z]*[:lower:]+"), select=c("total","MW"))

> fragJBnozeroMW
     total                                MW
251    256                    a [NN] of [NP]
278    253                       do n't [VP]
285    252                  the [NN] of [NP]
291    219                       do not [VP]
306    235                      want to [VP]
325    240                             a lot
330    213                       for example
341    210                     a lot of [NP]
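
The same filtering can be tried on a small table. This is a sketch on made-up data: only the column names total and MW come from the output above, the rows are hypothetical, and the lowercase class is written in its bracketed form [[:lower:]].

```r
library(stringr)

# Hypothetical miniature of the frequency table; only the column names
# "total" and "MW" are taken from the output above.
frag <- data.frame(
  total = c(256, 235, 120, 90, 75),
  MW    = c("a [NN] of [NP]", "want to [VP]", "[NP] [VP]", "the [NN]", "example"),
  stringsAsFactors = FALSE
)

# Two lowercase words separated by spaces, optionally with non-lowercase
# material such as "[NN] " in between.
pat <- "[[:lower:]]+ +[^a-z]*[[:lower:]]+"

# Keep only the rows whose MW entry matches, and only two columns
multi <- subset(frag, str_detect(MW, pat), select = c("total", "MW"))
print(multi)
```

Entries with a single lowercase word ("the [NN]", "example") or none at all ("[NP] [VP]") are dropped.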

Reference site
https://heavywatal.github.io/rstats/stringr.html


 psych

A package aimed at psychological research.
Basic descriptive statistics are available from summary(x), but
for a more detailed view, install this package and use describe(x).

> describe(x)
   vars   n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
X1    1 100 0.06 1.06   0.04    0.07 0.87 -2.9 2.51  5.41 -0.04    -0.32 0.11
  • Also reports the standard deviation, skewness, kurtosis, standard error, and more.
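
To compare the two summaries side by side, here is a minimal sketch on simulated data. The vector x and the seed are assumptions for illustration, so the numbers will not match the output shown above.

```r
library(psych)

set.seed(1)            # arbitrary seed, only for reproducibility
x <- rnorm(100)        # hypothetical data: 100 standard-normal draws

summary(x)             # base R: min, quartiles, mean, max
d <- describe(x)       # psych: adds sd, skew, kurtosis, se, ...
print(d)

# The result behaves like a data frame, so single statistics can be extracted
d$skew
d$kurtosis
```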

 sjmisc

find_var: search for and select matching data columns

find_var(data, pattern = "pattern", out = output format)
find_var(data, pattern = "score", out = "df")
  • Selects the columns whose names contain score and returns them as a data frame
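
For comparison, the same name-based selection can be sketched in base R. This is not how find_var() is implemented, and find_var() can also search variable labels, which this sketch ignores; the data frame and its column names are hypothetical.

```r
# Hypothetical data frame; the column names are made up for illustration.
dat <- data.frame(
  id         = 1:3,
  score_pre  = c(10, 12, 9),
  score_post = c(14, 15, 11),
  age        = c(20, 22, 21)
)

# Keep the columns whose names contain "score" (plain base R, no sjmisc)
hits <- grepl("score", names(dat), fixed = TRUE)
score_cols <- dat[, hits, drop = FALSE]   # drop = FALSE keeps a data frame
print(score_cols)
```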



 tm

https://www.rdocumentation.org/packages/tm/versions/0.7-3

  • Boost_tokenizer(x)
  • MC_tokenizer(x)
  • removePunctuation(tmp)
  • removeNumbers(x)

tmp.v <- VectorSource(tmp)
tmp.c <- Corpus(tmp.v)
tmpc.td <- TermDocumentMatrix(tmp.c)
findFreqTerms(tmpc.td)
findMostFreqTerms(tmpc.td)
$`1`
     the     said      and computer      its terminal 
      15        7        6        6        5        5 
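
The steps above can be run end to end on a small input. The two sentences below are hypothetical, and the argument values lowfreq = 2 and n = 3 are arbitrary choices for the sketch.

```r
library(tm)

# Hypothetical two-document input
tmp <- c("The computer said the terminal was down. The computer was restarted.",
         "Terminal 2 said nothing.")

tmp <- removePunctuation(tmp)   # strip punctuation from the raw strings
tmp <- removeNumbers(tmp)       # strip digits

tmp.c   <- Corpus(VectorSource(tmp))    # one corpus document per element
tmpc.td <- TermDocumentMatrix(tmp.c)    # rows = terms, columns = documents

# Terms that occur at least twice across the whole corpus
freq2 <- findFreqTerms(tmpc.td, lowfreq = 2)
print(freq2)

# The three most frequent terms of each document
print(findMostFreqTerms(tmpc.td, n = 3))
```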


 koRpus

https://reaktanz.de/R/pckg/koRpus/koRpus_vignette.html

tokenize()


install.packages("koRpus.lang.en")
library(koRpus.lang.en)
temp <- tokenize(choose.files(), lang="en")

With this you can load, for example, the text file of the Grimm fairy tale "The Golden Bird" from Project Gutenberg:

A certain king had a beautiful garden, and in the garden stood a tree
which bore golden apples. These apples were always counted, and about
the time when they began to grow ripe it was found that every night one
of them was gone. The king became very angry at this, and ordered the
gardener to keep watch all night under the tree. The gardener set his
eldest son to watch; but about twelve o’clock he fell asleep, and in

> temp
     doc_id     token      tag lemma lttr   wclass desc stop stem  idx sntc
1      <NA>         A word.kRp          1     word <NA> <NA> <NA>    1    1
2      <NA>   certain word.kRp          7     word <NA> <NA> <NA>    2    1
3      <NA>      king word.kRp          4     word <NA> <NA> <NA>    3    1
4      <NA>       had word.kRp          3     word <NA> <NA> <NA>    4    1
5      <NA>         a word.kRp          1     word <NA> <NA> <NA>    5    1
6      <NA> beautiful word.kRp          9     word <NA> <NA> <NA>    6    1
                                             [...]                         
2948   <NA>         a word.kRp          1     word <NA> <NA> <NA> 2948  140
2949   <NA>     great word.kRp          5     word <NA> <NA> <NA> 2949  140
2950   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2950  140
2951   <NA>      many word.kRp          4     word <NA> <NA> <NA> 2951  140
2952   <NA>     years word.kRp          5     word <NA> <NA> <NA> 2952  140
2953   <NA>         .     .kRp          1 fullstop <NA> <NA> <NA> 2953  140
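
choose.files() exists only on Windows builds of R; file.choose() is the portable interactive alternative. To experiment without a file at all, tokenize() can also take text held in memory. The sketch below assumes koRpus and koRpus.lang.en are installed and uses the first sentence of the excerpt above as input.

```r
library(koRpus.lang.en)   # attaches koRpus as well

# First sentence of the excerpt above, held in memory instead of a file
txt <- "A certain king had a beautiful garden, and in the garden stood a tree which bore golden apples."

# format = "obj" tells tokenize() that txt is a character vector, not a path
tok <- tokenize(txt, format = "obj", lang = "en")

# taggedText() returns the token table as a data frame
tokens <- taggedText(tok)
head(tokens[, c("token", "tag", "wclass")])
```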

lex.div()

  • Computes a range of lexical diversity measures
lex.div(temp)

MTLD()

> library(koRpus)

> ns002 <- tokenize(choose.files(), lang="en")

This loads, for example, a file that contains only the text of NS002. Then:

> MTLD(ns002)
Language: "en"

Total number of tokens: 463
Total number of types:  218

Measure of Textual Lexical Diversity
                MTLD: 87.62
   Number of factors: 5.28
         Factor size: 0.72
    SD tokens/factor: 36.8 (all factors)
                      30.05 (complete factors only)

Note: Analysis was conducted case insensitive.
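
The token and type counts in this output are the raw material for lexical diversity measures. As a rough illustration of what they mean, the base R sketch below counts tokens and types case-insensitively and computes the simple type-token ratio for the NS002 figures. The helper count_tokens_types is hypothetical and splits text much more crudely than koRpus does, and the type-token ratio is a different measure from MTLD.

```r
# Crude illustration only: split on anything that is not a letter or apostrophe.
# koRpus uses its own tokenizer, so counts on real texts can differ.
count_tokens_types <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]          # drop empty strings from the split
  c(tokens = length(words), types = length(unique(words)))
}

counts <- count_tokens_types("The king had a garden, and in the garden stood a tree.")
print(counts)

# Type-token ratio for the NS002 figures reported above (463 tokens, 218 types)
ttr_ns002 <- 218 / 463
round(ttr_ns002, 3)
```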

 rpart


> library(rpart)
> library(rpart.plot)
> model1 <- rpart(LMH ~ DD + SL + MDD, data = C3L2)

> rpart.plot(model1)
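
Since the C3L2 data is not shown here, the following sketch fits a tree to the built-in iris data instead. It uses only rpart itself, including its base-graphics plot method, so rpart.plot is not required; the choice of iris and of method = "class" are assumptions for the example.

```r
library(rpart)   # ships with standard R distributions as a recommended package

# iris stands in for the C3L2 data, which is not shown on this page
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)

# Base-graphics plot provided by rpart itself (no rpart.plot needed)
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)

# Predicted classes on the training data and the resulting accuracy
pred <- predict(fit, iris, type = "class")
acc  <- mean(pred == iris$Species)
acc
```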

 gplots

install.packages("gplots", dependencies = TRUE)
library(gplots)

> head(meanMHD)
  Group      MHD
1    C2 1.500000
2    C2 1.000000
3    C2 2.000000
4    C2 1.250000
5    C2 1.333333
6    C2 1.333333

attach(meanMHD)
plotmeans(MHD ~ Group)
detach(meanMHD)
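
plotmeans() also accepts a data argument, which avoids attach() and detach() altogether. A minimal sketch, assuming a data frame in the same shape as meanMHD; the groups C3 and C4 and all values are made up.

```r
library(gplots)

# Hypothetical data in the same shape as meanMHD (values are made up)
meanMHD <- data.frame(
  Group = factor(rep(c("C2", "C3", "C4"), each = 4)),
  MHD   = c(1.50, 1.00, 2.00, 1.25,
            1.60, 1.80, 2.10, 1.90,
            2.20, 2.40, 2.00, 2.60)
)

# Group means, i.e. the points that plotmeans() draws
grp_means <- tapply(meanMHD$MHD, meanMHD$Group, mean)
print(grp_means)

# Passing data = avoids attach()/detach()
plotmeans(MHD ~ Group, data = meanMHD)
```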

 orddom: computes effect sizes

  • Requires the psych package to be installed
orddom(x, y)
  • This single call produces a whole range of effect sizes; pick whichever you need.
> orddom(dmu02shd$MDD, dmu03shd$MDD)
              ordinal                metric                
var1_X        "group 1 (x)"          "group 1 (x)"         
var2_Y        "group 2 (y)"          "group 2 (y)"         
type_title    "indep"                "indep"               
n in X        "592"                  "592"                 
n in Y        "547"                  "547"                 
N #Y>X        "176563"               "176563"              
N #Y=X        "13390"                "13390"               
N #Y<X        "133871"               "133871"              
PS X>Y        "0.413406665349078"    "0.432442005161643"   
PS Y>X        "0.545243712634023"    "0.567557994838357"   
A X>Y         "0.434081476357528"    "0.434081476357528"   
A Y>X         "0.565918523642472"    "0.565918523642472"   
delta         "0.131837047284944"    "0.128573222406733"   
1-alpha       "95"                   "95"                  
CI low        "0.0647643614128363"   "0.0665435144203836"  
CI high       "0.197723891470872"    "0.190602930393082"   
s delta       "0.0339578629779545"   "0.533045085206667"   
var delta     "0.00115313645802954"  "0.284137062862983"   
se delta      NA                     "0.0316134060048762"  
z/t score     "3.88237173141703"     "4.0670474540738"     
H1 tails p/CI "2"                    "2"                   
p             "0.000109396219747593" "5.52499619534335e-05"
Cohen's d     "0.177125083672158"    "0.241205155014015"   
d CI low      "0.0839111293425067"   "0.124543945177201"   
d CI high     "0.275869162430133"    "0.357866364850828"   
var d.i       "0.291361840460969"    "0.251891629550177"   
var dj.       "0.362041171144457"    "0.319040086833437"   
var dij       "0.941272277686609"    "0.569924729477023"   
df            "1137"                 "1094.88514066995"    
NNT           "7.58512133420789"     "5.70864949792907"    

For the nonparametric case, look at Cliff's delta and interpret it against the following thresholds:
https://www.rdocumentation.org/packages/effsize/versions/0.7.4/topics/cliff.delta

The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. 
|d|<0.147 "negligible", 
|d|<0.33 "small", 
|d|<0.474 "medium", 
otherwise "large" 
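
As a worked check, Cliff's delta can be recomputed from the pair counts in the orddom output above: the number of pairs with Y>X minus the number with Y<X, divided by the total number of pairs. The helpers cliff_delta and magnitude below are hypothetical base R functions written for illustration, not part of orddom or effsize; the thresholds are the ones quoted above.

```r
# Hypothetical helper: Cliff's delta of y over x, from raw data
cliff_delta <- function(x, y) {
  diffs <- outer(y, x, "-")        # every pairwise difference y_j - x_i
  mean(sign(diffs))                # (#y>x - #y<x) / (n_x * n_y)
}

# Hypothetical helper: label |d| with the thresholds quoted above
magnitude <- function(d) {
  a <- abs(d)
  if (a < 0.147) {
    "negligible"
  } else if (a < 0.33) {
    "small"
  } else if (a < 0.474) {
    "medium"
  } else {
    "large"
  }
}

# Pair counts copied from the orddom output: N #Y>X, N #Y<X, n in X, n in Y
n_gt <- 176563; n_lt <- 133871
n_x  <- 592;    n_y  <- 547

delta <- (n_gt - n_lt) / (n_x * n_y)
delta                # agrees with the "delta" row (ordinal column)
magnitude(delta)

# Small made-up vectors to exercise the raw-data helper
d_small <- cliff_delta(c(1, 2, 3), c(2, 3, 4))
magnitude(d_small)
```

The sign depends on the direction of comparison: computing the delta of x over y instead flips it, which is why a tool that compares in the other direction reports the same magnitude with a negative sign.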

 effsize: computes effect sizes

  • Example: looking at the nonparametric Cliff's delta
install.packages("effsize")
library(effsize)
> cliff.delta(dmu02shd$MDD, dmu03shd$MDD)

Cliff's Delta

delta estimate: -0.131837 (negligible)
95 percent confidence interval:
      lower       upper 
-0.19765420 -0.06483658 
  • The output also annotates the estimate with a magnitude label, here "(negligible)".

 lawstat

Includes the Brunner-Munzel test.

> brunner.munzel.test(dmu02shd$MDD, dmu03shd$MDD)

        Brunner-Munzel Test

data:  dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y) 
       0.5659185 

 brunnermunzel

install.packages("brunnermunzel")
library(brunnermunzel)

> brunnermunzel.test(dmu02shd$MDD, dmu03shd$MDD)

        Brunner-Munzel Test

data:  dmu02shd$MDD and dmu03shd$MDD
Brunner-Munzel Test Statistic = 3.8809, df = 1098.7, p-value = 0.0001103
95 percent confidence interval:
 0.5325908 0.5992463
sample estimates:
P(X<Y)+.5*P(X=Y) 
       0.5659185 
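
The sample estimate P(X<Y)+.5*P(X=Y) can be recomputed by hand from the pair counts in the orddom output earlier on this page. The sketch below does that in base R and also shows how the estimate relates to Cliff's delta; the helper stochastic_superiority is hypothetical and covers only the point estimate, not the test statistic or the confidence interval.

```r
# Pair counts copied from the orddom output earlier on this page
n_lt_xy <- 176563   # pairs with X < Y  (row "N #Y>X")
n_eq    <- 13390    # pairs with X = Y  (row "N #Y=X")
n_pairs <- 592 * 547

p_hat <- (n_lt_xy + 0.5 * n_eq) / n_pairs
p_hat

# The same quantity from raw data (hypothetical helper, base R only)
stochastic_superiority <- function(x, y) {
  cmp <- outer(x, y, "-")                 # x_i - y_j for every pair
  mean(cmp < 0) + 0.5 * mean(cmp == 0)    # P(X<Y) + .5*P(X=Y)
}
p_small <- stochastic_superiority(c(1, 2, 3), c(2, 3, 4))
p_small

# Relation to Cliff's delta (Y over X): delta = 2 * p_hat - 1
delta_from_p <- 2 * p_hat - 1
delta_from_p
```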

 dplyr


 ggplot2


 quanteda


 lme4


 effects

 tabplot


 GGally

Correlation plots
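
One way to draw such a plot is ggpairs(), which produces a scatterplot matrix with correlations. This is a minimal sketch, assuming GGally is installed and using the built-in iris data as a stand-in; the page does not say which GGally function or data set was intended.

```r
library(GGally)

num <- iris[, 1:4]          # iris is a stand-in data set, not from this page

r <- cor(num)               # plain correlation matrix for reference
round(r, 2)

# Scatterplot matrix: pairwise scatterplots, densities on the diagonal,
# correlation coefficients in the upper triangle
p <- ggpairs(num)
print(p)
```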