R
{{category R}}
!!!Tests for frequency data
{{outline}}
----
!Reference
Shimada & Abe (2017) Rで学ぶ統計学入門 (Introduction to Statistics with R)
https://m.media-amazon.com/images/I/61SwFdVnAGL._AC_UY218_.jpg

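*If the total sample size is below 40, or any cell's expected frequency is below 10, apply Yates' continuity correction (for 2x2 tables, chisq.test() applies it by default via correct = TRUE).
*If any cell's expected frequency is below 5, neither the chi-squared test nor the G-test should be used.
*<<Use Fisher's exact test instead.>>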
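The rules of thumb above can be written out as a small R function. choose_test() is a hypothetical helper (not part of base R or any package); it only encodes the thresholds listed above and reads the expected frequencies from the expected component returned by chisq.test(). The second table is made up for illustration.

{{pre
# choose_test() is a hypothetical helper that encodes the rules of thumb above
choose_test <- function(tab) {
  # chisq.test() warns when expected counts are small; we only want $expected
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) {
    "Fisher's exact test"
  } else if (sum(tab) < 40 || any(expected < 10)) {
    "chi-squared test with Yates' correction"
  } else {
    "chi-squared test or G-test"
  }
}

therefore.data <- matrix(c(15, 38, 96, 53), nrow = 2)
choose_test(therefore.data)
# [1] "chi-squared test or G-test"

# a made-up small table, for illustration only
choose_test(matrix(c(3, 7, 9, 2), nrow = 2))
# [1] "Fisher's exact test"
}}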
!!Chi-squared test (χ² test)
!Test of independence

Does the position in which therefore occurs differ between British English and American English?

,Position,Sentence-initial,Sentence-medial
,British English,15,96
,American English,38,53

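{{pre
# 2x2 contingency table; matrix() fills column by column
# rows: British, American; columns: sentence-initial, sentence-medial
therefore.data <- matrix(c(15, 38, 96, 53), nrow = 2, ncol = 2)
therefore.data
#      [,1] [,2]
# [1,]   15   96
# [2,]   38   53

chisq.test(therefore.data)
#
#   Pearson's Chi-squared test with Yates' continuity correction
#
# data:  therefore.data
# X-squared = 19.179, df = 1, p-value = 1.19e-05
}}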
*Significant: the position of therefore is not independent of the variety (British vs. American).
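To see where X-squared = 19.179 comes from, the Yates-corrected statistic can be computed by hand in base R. This is a sketch for a 2x2 table only; the variable names are illustrative.

{{pre
therefore.data <- matrix(c(15, 38, 96, 53), nrow = 2)
n <- sum(therefore.data)

# expected count of each cell = row total * column total / grand total
expected <- outer(rowSums(therefore.data), colSums(therefore.data)) / n

# Yates' correction: shrink each |O - E| by 0.5 before squaring
# (chisq.test() actually uses min(0.5, |O - E|), which is 0.5 here)
x2 <- sum((abs(therefore.data - expected) - 0.5)^2 / expected)
p <- pchisq(x2, df = 1, lower.tail = FALSE)

round(x2, 3)
# [1] 19.179
}}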
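!Goodness-of-fit test
*Tests whether the observed frequencies fit the frequencies expected under a theoretical model (the "expected frequencies").
*Use case: comparing a word's frequency across corpora with different total word counts
**e.g., do 36 occurrences in a 1,000,000-word corpus and 20 occurrences in a 500,000-word corpus differ in frequency?
*Set the expected probabilities from the ratio of the corpus sizes.
**The sizes are 1,000,000 vs. 500,000, so the ratio is 2:1.
***You can also pass the sizes as they are, p=c(1000000,500000), as long as rescale.p=TRUE is set.
**rescale.p=TRUE rescales p so that it sums to 1 (writing rescale=T also works, because R partially matches argument names).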

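{{pre
# observed counts: 36 (1M-word corpus) and 20 (500k-word corpus)
sample.data <- c(36, 20)

# expected proportions 2:1; rescale.p = TRUE makes them sum to 1
chisq.test(sample.data, p = c(2, 1), rescale.p = TRUE)
#
#   Chi-squared test for given probabilities
#
# data:  sample.data
# X-squared = 0.14286, df = 1, p-value = 0.7055
}}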
*Not significant (p = 0.7055): once corpus size is taken into account, the two frequencies do not differ.
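A sketch checking two points from the notes above: passing the raw corpus sizes with rescale.p = TRUE gives the same test, and the statistic follows directly from the expected counts (the 56 tokens split 2:1). The name corpus.size is illustrative.

{{pre
sample.data <- c(36, 20)
corpus.size <- c(1000000, 500000)

# raw corpus sizes work too: rescale.p = TRUE divides p by its sum (2/3, 1/3)
res <- chisq.test(sample.data, p = corpus.size, rescale.p = TRUE)

# the same statistic by hand
expected <- sum(sample.data) * corpus.size / sum(corpus.size)
expected
# [1] 37.33333 18.66667
x2 <- sum((sample.data - expected)^2 / expected)
x2
# [1] 0.1428571
}}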
!!G-test: log-likelihood ratio test

{{pre
# Deducer is an add-on package (not base R) that provides likelihood.test()
install.packages("Deducer")
library(Deducer)

likelihood.test(therefore.data)
#
#   Log likelihood ratio (G-test) test of independence without correction
#
# data:  therefore.data
# Log likelihood ratio statistic (G) = 20.925, X-squared df = 1, p-value = 4.776e-06
}}
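If Deducer is not available, the G statistic for a test of independence (without correction) can be sketched in base R from its definition, G = 2 Σ O ln(O/E), compared against a chi-squared distribution. This is an illustrative reimplementation, not Deducer's code, and it assumes no observed cell is zero.

{{pre
therefore.data <- matrix(c(15, 38, 96, 53), nrow = 2)

# expected counts under independence
expected <- outer(rowSums(therefore.data), colSums(therefore.data)) /
  sum(therefore.data)

# G = 2 * sum(O * log(O / E)); assumes every observed cell is > 0
g <- 2 * sum(therefore.data * log(therefore.data / expected))
df <- (nrow(therefore.data) - 1) * (ncol(therefore.data) - 1)
p <- pchisq(g, df = df, lower.tail = FALSE)

round(g, 3)
# [1] 20.925
}}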
!!Fisher's exact test (exact probability test)
*When it had to be computed by hand, a 2x2 table was the practical limit; today fisher.test() handles larger tables as well.

{{pre
therefore.data
#      [,1] [,2]
# [1,]   15   96
# [2,]   38   53

fisher.test(therefore.data)
#
#   Fisher's Exact Test for Count Data
#
# data:  therefore.data
# p-value = 9.958e-06
# alternative hypothesis: true odds ratio is not equal to 1
# 95 percent confidence interval:
#  0.1021974 0.4526589
# sample estimates:
# odds ratio
#  0.2196899
}}
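A sketch of where the "exact" p-value comes from for a 2x2 table: with all row and column totals fixed, the count in cell [1,1] follows a hypergeometric distribution, and the two-sided p-value adds up the probabilities of every possible table that is no more likely than the observed one. Variable names are illustrative; the result is compared against fisher.test() rather than assumed.

{{pre
therefore.data <- matrix(c(15, 38, 96, 53), nrow = 2)

# with all margins fixed, cell [1,1] follows a hypergeometric distribution
a <- therefore.data[1, 1]       # observed count in cell [1,1]
m <- sum(therefore.data[, 1])   # column 1 total (sentence-initial)
n <- sum(therefore.data[, 2])   # column 2 total (sentence-medial)
k <- sum(therefore.data[1, ])   # row 1 total (British)

support <- max(0, k - n):min(k, m)   # all possible values of cell [1,1]
probs <- dhyper(support, m, n, k)
p.obs <- dhyper(a, m, n, k)

# two-sided p: sum every table no more probable than the observed one
# (the small tolerance guards against floating-point ties)
p.value <- sum(probs[probs <= p.obs * (1 + 1e-7)])
p.value
}}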

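!Odds ratio
*The ratio of the odds of occurrence, p / (1 - p), in two groups
*i.e., how many times greater the odds of occurrence are in one group than in the other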
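As a sketch, the sample odds ratio for the therefore table can be computed directly from the four cells. Note that fisher.test() reports a conditional maximum-likelihood estimate (0.2196899 above), so it differs slightly from this simple cross-product ratio.

{{pre
therefore.data <- matrix(c(15, 38, 96, 53), nrow = 2)

# odds of the sentence-initial position in each variety
odds.british  <- therefore.data[1, 1] / therefore.data[1, 2]   # 15 / 96
odds.american <- therefore.data[2, 1] / therefore.data[2, 2]   # 38 / 53

# odds ratio: British odds relative to American odds
odds.ratio <- odds.british / odds.american
round(odds.ratio, 4)
# [1] 0.2179
}}

Read the other way round, 1 / 0.2179 ≈ 4.59: in this sample the odds of a sentence-initial therefore are roughly 4.6 times higher in American English.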