GLMM

R
[GLMM]

TOP ↑ ↓

GLMM

目的変数の分布とリンク関数
分布の選択

おきまりの分析でよく使うライブラリーと分析手順

データの読み込み部分を以下のようにしてエクセルファイルを読み込む
GLMM Template (RMarkdown ファイル)

分布の選択ツール： fitdistrplus

lognormal(対数正規分布)

概要

ANOVA
Multiple Regression Analysis 重回帰分析
LME
GLM : Generalized Linear Model　一般化線形モデル
GLMM: Generalized Linear Mixed Model 一般化線形混合モデル

カテゴリー変数

カテゴリー変数とする要因内のレベル（水準）の違いが、目的変数に影響を与えるかを調べる、つまり、レベル間の差の検定をするのと同じこと。
contrast coding

モデル構築二つの方針

理論的に考えられる要因で決め打ち
考えられる要因すべてを入れたものから、重要なものだけを選択していく

３通りのモデル選択

anova()で、想定するモデルを作っておいて、モデルの比較
step()で、飽和モデルから順に変数を減らしてAICが低いものを選ぶ
MuMIn パッケージでモデル選択、総当たりで組み合わせを試してAICが低いものを選ぶ

交互作用と単純主効果

交互作用
単純主効果

報告

採用するモデルがきまったらsummary()でその結果を報告する
easystats の report() と report_table()
sjPlotのtab_model()
modelsummary
(glmer1_param <- model_parameters(モデル, exponentiate = TRUE))
TeXで報告する場合 verbatim(summary(モデル))

glmmTMB
グラフはいろいろ描けるが、見せる意義のあるグラフを見せること

plot(allEffects(モデル))
ggeffects を使うとrawdataも合わせてグラフにできる

ランダム効果の注意点

ランダム効果は、カテゴリカルな変数について
ランダム切片
ランダムスロープ（傾き）
ランダム切片とランダム傾きの相関
表記法

固定効果の係数一覧

ランダム効果の分散と相関一覧

ランダム効果が有意かどうか調べる
ランダム効果で ?isSingular と警告が出る
convergence warningが出た場合（Brown, 2021の方法）
妥当性評価

残差が正規分布しているか確認する
easystats
LMEのモデルが妥当か検証

References
入門用参考図書

目的変数の分布とリンク関数

TOP ↑ ↓

分布	リンク関数例
正規（gaussian）	identity	log
ガンマ（Gamma）	log	inverse	identity
二項（binomial）	logit
ポアソン（poisson）	log
ベータ

分布の選択

TOP ↑ ↓

目的変数の分布を見て、どれを選ぶかを決める
- 正誤などの二値だったら binomial
- 正の整数だったら poisson
- 正の連続値だったら Gamma
- 正規分布だったら gaussian

分布の可視化 fitdistrplus

整数の場合、データ数が多くなれば、事実上、正規分布と同じと見なされる。

分布をRでシミュレートして、当てはまるかどうか見てみる。

おきまりの分析でよく使うライブラリーと 分析手順

TOP ↑ ↓

サンプルデータは、組み込みデータの「ChickWeight」
動作確認のためだけなので、分析として正しいわけではない点に注意
読み込む元のデータを入れ替えて使う。

データの読み込み部分を以下のようにしてエクセルファイルを読み込む

TOP ↑ ↓

```{r}
sample.dat <- read.xlsx("エクセルファイル名.xlsx")
head(sample.dat)
```

GLMM Template (RMarkdown ファイル)

TOP ↑ ↓

これをダウンロードして、RStudioで開いて、
必要なデータをExcelで表にして、読み込んだら一応分析できます。

sample.dat <- read.xlsx("ファイル名.xlsx", sheet="シート名")

GLMMtemplate1.Rmd(31)

---
title: "GLMM Template 1"
author: "sugiura"
date: "2024-06-04"
output: html_document
---

# Data

```{r}
sample.dat <- ChickWeight
head(sample.dat, 40)
```

|指標|説明|
|:----|:---|
| weight| 重さ(g)|
| Time| 生まれてからの日数|
| Chick| ヒヨコのID|
| Diet| 食事のタイプ|

```{r}
summary(sample.dat)
```

# ライブラリー

```{r, warning=F, message=F}
library(tidyverse)
library(openxlsx)
library(lme4)
library(lmerTest)
library(MuMIn)
library(effects)
library(ggplot2)
library(easystats)
library(jtools)
library(fitdistrplus)
library(sjPlot)
library(emmeans)
```

# 分布の確認

## 目的変数のヒストグラム

```{r}
hist(sample.dat$weight)
```

```{r}
library(fitdistrplus)

descdist(sample.dat$weight, boot=500)
```


```{r}
sample.fit <- fitdist(sample.dat$weight, "norm")
plot(sample.fit)
```

## データの分布を観察

```{r}
g <- ggplot(sample.dat)
g <- g + aes(x=Time, y=weight, group=Diet, color=Diet)
g <- g + geom_point()
g <- g + geom_smooth()
#g <- g + geom_line()
#g <- g + geom_density(alpha=.7)
#g <- g + facet_wrap(~name, scales="free")
plot(g)
```


## longformat

### longformat変換の確認

```{r}
sample.dat %>% pivot_longer(cols=!c(Chick, Diet)) %>% head()
```

### longformat変換

```{r}
sample.dat.long <- sample.dat %>% pivot_longer(cols=!c(Chick, Diet)) 
```

### longformatで描くグラフ

```{r}
g <- ggplot(sample.dat.long)
g <- g + aes(x=name, y=value, group=Diet, fill=Diet)
g <- g + geom_boxplot()
#g <- g + geom_smooth()
#g <- g + geom_line()
#g <- g + geom_density(alpha=.7)
g <- g + facet_wrap(~name, scales="free")
plot(g)
```


# GLMM

```{r}
sample.model <- glmer(weight ~ Time * Diet + (1|Chick), family=gaussian, data=sample.dat)

summary(sample.model)
```

```{r}
plot(allEffects(sample.model), multiline=T, confint = list(style = "auto"))
```

# 報告

```{r}
summary(report(sample.model))
```


```{r}
report_table(sample.model)
```

```{r}
tab_model(sample.model)
```



```{r}
plot_model(sample.model)
```


## 交互作用の対応


```{r}
plot_model(sample.model, type="int")
```

## Time をカテゴリー変数で： 事前、事後、遅延　のようなケースの場合


```{r}
sample.dat %>% mutate(Time.ca = as.factor(Time)) %>% head()
```


```{r}
sample.dat.ca <- sample.dat %>% mutate(Time.ca = as.factor(Time))
```

# 固定要因にカテゴリー変数を使う場合

```{r}
sample.model.ca <- glmer(weight ~ Time.ca * Diet + (1|Chick), family=gaussian, data=sample.dat.ca)

summary(sample.model.ca)
```


```{r}
plot(allEffects(sample.model.ca), multiline=T, confint = list(style = "auto"))
```


# 多重比較

```{r}
emm.result.ca <- emmeans(sample.model.ca, ~ Time.ca * Diet)

emm.result.ca
```


```{r}
pairs(emm.result.ca, simple="Time.ca")
```


```{r}
emmip(emm.result.ca, Diet ~ Time.ca)
```


```{r}
pwpp(emm.result.ca)
```

## 妥当性検証

```{r}
performance(sample.model)
```

```{r}
plot(sample.model)
```

## 多重共線性チェック

```{r}
check_collinearity(sample.model)
```

```{r}
plot(check_collinearity(sample.model))
```




## 残差の正規性チェック

```{r}
check_normality(sample.model)
```

```{r}
library(see)
nom <- check_normality(sample.model)
plot(nom)
```

分布の選択ツール： fitdistrplus

TOP ↑ ↓

lognormal(対数正規分布)

TOP ↑ ↓

目的変数をlog変換して、分布は正規分布を指定する
- link関数にlogを指定するのとは違う話

概要

TOP ↑ ↓

ANOVA

TOP ↑ ↓

yが数値データ（テストスコア等）
xがカテゴリーデータ（Low, Mid, High）

model.1 <- lm(y ~ x, データ）
model.0 <- lm(y ~ 1, データ）
anova(model.1, model0)

Anova(model.1)

library(MuMIn) 
options(na.action = "na.fail") 
dredge(model.1, rank="AIC")

summary(model.1)

confint(model.1)

library(multcomp)
summary(glht(model.1, linfct = mcp(group = "Tukey")))

一人の被験者からは一回だけ（ランダム効果なし）

glm(応答変数 ~ 説明変数, family=分布モデル, data)

分布モデルは、正規分布以外にも対応
ランダム効果を入れない点が、glmerとは違う
交互作用も * で対応
- 説明変数が多い場合、すべてに * を入れると収束しない。
  - 理論的に必要なところだけにする
  - 不要な説明変数を外す
  - maxitオプションを試してみる？
モデル選択：飽和モデルを作っておいてAICの小さいものを選ぶ
- step()
- MuMIn

Reference

https://www1.doshisha.ac.jp/~mjin/R/Chap_16/16.html

GLMM: Generalized Linear Mixed Model 一般化線形混合モデル

TOP ↑ ↓

一人の被験者について複数回（ランダム効果を考慮）
一つの項目について複数回（ランダム効果を考慮）
モデル式

左辺：目的変数
右辺：説明変数
　　　＋記号で、複数の変数を並べる
　　　：記号で、前後の交互作用を想定する
　　　＊記号で、変数の効果と組み合わせた交互作用の効果（自動で組み合わせてくれる）
　　　（　）に入れるのがランダム効果（被験者のばらつきとか、項目のばらつきとか）

Reference

https://youtu.be/lY_dzLMF4Jo

カテゴリー変数

TOP ↑ ↓

カテゴリー変数とする要因内のレベル（水準）の違いが、目的変数に影響を与えるかを調べる、つまり、レベル間の差の検定をするのと同じこと。

TOP ↑ ↓

contrast coding

TOP ↑ ↓

モデル構築二つの方針

TOP ↑ ↓

理論的に考えられる要因で決め打ち

TOP ↑ ↓

重要なものと重要でないものを明示的に示し、要因について議論できる

考えられる要因すべてを入れたものから、重要なものだけを選択していく

TOP ↑ ↓

重要なものだけのモデルができる
- 可能性として考えられたが実は重要でなかったものがモデルに含まれないので、その点を議論しにくい

３通りの モデル選択

TOP ↑ ↓

anova()で、想定するモデルを作っておいて、モデルの比較

TOP ↑ ↓

有意差がなければ、誤差の範囲
有意差があれば、誤差とは言えない影響力あり

step()で、飽和モデルから順に変数を減らしてAICが低いものを選ぶ

TOP ↑ ↓

ランダム変数を含むモデル（glmer, lmer）では使えない。
- ランダム変数を含む場合は、複数のモデルを作っておいてAICを比較する。

MuMIn パッケージで モデル選択、総当たりで組み合わせを試してAICが低いものを選ぶ

TOP ↑ ↓

交互作用と単純主効果

TOP ↑ ↓

交互作用

TOP ↑ ↓

単純主効果

TOP ↑ ↓

reghelperのsimple_slopes()

報告

TOP ↑ ↓

採用するモデルがきまったらsummary()でその結果を報告する

TOP ↑ ↓

easystats の report() と report_table()

TOP ↑ ↓

エクセルファイルへ保存：そのまま出力をwrite.xlsxに渡せばよい。

report_table(model2) %>% write.xlsx("model2.xlsx")

sjPlotのtab_model()

TOP ↑ ↓

https://strengejacke.github.io/sjPlot/articles/tab_model_estimates.html

出力はhtmlだが、RStudionの右側のViewer部分をドラッグしてコピーしてWordにドロップすればそのまま挿入できる

https://strengejacke.github.io/sjPlot/articles/table_css.html

デフォルトだと、見出しの用語がイタリックになっていたりするので、cssでスタイルを調整したほうがよい。

modelsummary

TOP ↑ ↓

(glmer1_param <- model_parameters(モデル, exponentiate = TRUE))

TOP ↑ ↓

TeXで報告する場合 verbatim(summary(モデル))

TOP ↑ ↓

https://github.com/glmmTMB/glmmTMB/issues/462 参照

verbatim <- function(x)
{
  txt <- trimws(capture.output(x))
  txt <- c("\\begin{verbatim}", txt, "\\end{verbatim}")
  cat(txt, sep="\n")
}

verbatim(summary(モデル))

glmmTMB

TOP ↑ ↓

beta分布が扱える
lognormal(対数正規分布が扱える)

```{r, eval=F}
install.packages("glmmTMB")
install.packages("bbmle")
```

```{r}
library(glmmTMB)
library(bbmle)
```

グラフはいろいろ描けるが、見せる意義のあるグラフを見せること

TOP ↑ ↓

たいてい、有意ではない結果のグラフを見せても意味がないので、意味がない。
- 場合によっては、有意でない結果を見せたいときには、有意でないものを見せるわけです。
  - つまり、見せる意義のあるものだけを見せて、ってことです。

plot(allEffects(モデル))

TOP ↑ ↓

格子付き

plot(allEffects(モデル), grid = TRUE)

複数の線を一つのグラフで

plot(allEffects(モデル), multiline=T)

一つずつグラフに

plot(allEffects(モデル), selection =1)

二つ目は selection = 2

y軸　０から１までに

 plot(allEffects(モデル), type = "response", ylim = c(0, 1), grid = TRUE)

信頼区間オプション

, confint = list(style = "auto")

ggeffects を使うとrawdataも合わせてグラフにできる

TOP ↑ ↓

GLMM でできたモデルに基づき、予測をして、その効果を視覚化

ランダム効果の注意点

TOP ↑ ↓

ランダム効果は、カテゴリカルな変数について

TOP ↑ ↓

ランダム切片

TOP ↑ ↓

被験者や項目によってばらつくことがある
- 同じ被験者群でも、そもそも早い人とか遅い人がいる。
- 実験項目をそろえたといっても、そもそも難しさに違いがあったりする。

ランダムスロープ（傾き）

TOP ↑ ↓

被験者内・項目内の実験の場合
固定効果の影響の出方に強弱ある。

ランダム切片とランダム傾きの相関

TOP ↑ ↓

相関がある場合とない場合（強弱）とがある
- そもそも早くできて、かつ、学習効率が高いことがある。

表記法

TOP ↑ ↓

| 変数というのは、変数ごとに、という意味
- 参加者ごとに、切片（出発点での値）や傾き（固定要因の効果）がランダム（バラバラの誤差がある）

（１｜変数）とすると、その変数の「切片」がランダムに異なっていると想定
- 　　　（そもそも最初の実力にバラつきがあるとか）
- 　　　（そもそも最初から項目にバラつきがあるとか）

（0+固定効果｜変数）とすると、
- 　　　変数の「切片」は固定で、
- 　　　固定効果の「傾き」がランダムに異なっていると想定
- 切片を固定してランダム効果だけ想定することはあまりない

（1+固定効果｜変数）とすると、
- 　　変数の「切片」と固定効果の「傾き」がランダムに異なっていると想定
- 　　（ただし、切片と傾きに相関が想定される場合）
- 1を省略して（固定効果｜変数）と書いても同じ
  - 1は書かなくてもあることが前提という考え（ランダム切片は普通ある、と想定）

（1+固定効果 || 変数）とすると、
- 　　変数の「切片」と固定効果の「傾き」がランダムに異なっていると想定
- 　　（ただし、切片と傾きに相関が想定されない場合）
- 1を省略して（固定効果||変数）と書いても同じ

固定効果の係数一覧

TOP ↑ ↓

fixef(モデル)

ランダム効果の分散と相関一覧

TOP ↑ ↓

 VarCorr(モデル)

 Groups Name        Std.Dev.  Corr       
 ID     (Intercept) 0.0985877            
        timing2     0.0254089 0.807      
        timing3     0.0905422 0.847 0.370
 itemid (Intercept) 0.0237337            
        formPD      0.0028372 0.972

ランダム効果が有意かどうか調べる

TOP ↑ ↓

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#testing-significance-of-random-effects

ランダム効果を入れたモデルと、入れないモデルを作って、
anova()で調べる。（入れたモデルを先に）

ランダム効果で ?isSingular と警告が出る

TOP ↑ ↓

https://rstudy.info/lmer-boundary-singular-fit-see-is-singular%E3%81%A8%E3%82%A8%E3%83%A9%E3%83%BC%E3%81%8C%E3%81%A7%E3%82%8B%E7%90%86%E7%94%B1/
https://stackoverflow.com/questions/60028673/lme4-error-boundary-singular-fit-see-issingular
https://stats.stackexchange.com/questions/378939/dealing-with-singular-fit-in-mixed-models
https://rdrr.io/cran/lme4/man/isSingular.html
https://qiita.com/fujino-fpu/items/0a2cd10b400f9b604606

singularかどうか、確認して、

isSingular(モデル)

rePCA(モデル) で、オーバーフィッティングかどうか確認できる。

そうだとなったら、それは、このモデルのランダム効果の設定では、ランダム効果が測れない（データの数に比べモデルが複雑過ぎ）ので、
- 分散が（ほぼ）ゼロで、測れない。パラメタが計算できない。

複雑なもの（ランダムスロープ）から外していく。

 the random effects structure is too complex to be supported by the data, which naturally leads to the advice to remove the most complex part of the random effects structure (usually random slopes).

Test Fitted Model for (Near) Singularity
Description
Evaluates whether a fitted mixed model is (almost / near) singular, i.e., the parameters are on the boundary of the feasible parameter space: variances of one or more linear combinations of effects are (close to) zero.

Usage
isSingular(x, tol = 1e-4)

convergence warningが出た場合（Brown, 2021の方法）

TOP ↑ ↓

警告:  Model failed to converge with max|grad| = 0.0274818 (tol = 0.002, component 1)

理論的に不要なランダム効果を除く
データが少なすぎないか？
パラメタの調整

control = lmerControl(optimizer = "bobyqa")

glmerだったら、control = glmerControl(optimizer = "bobyqa")

afex パッケージの all_fit(モデル)でどれが使えるか判断する
ランダム傾きとランダム切片に相関がないかもしれない場合は、切片を0にしてみる
試行回数を増やす control = lmerControl(optCtr = list(maxfun = 1e9))

ヘルプを見る ?convergence

妥当性評価

TOP ↑ ↓

残差が正規分布しているか確認する

TOP ↑ ↓

resid(モデル) で残差の値一覧が得られる
qqnorm(残差) で、Q-Q Plot
qqline(残差) で、理論的な回帰直線をプロット
1. オプション lwd=線の太さ
2. オプション col="色"
理論的な直線上に観察値の残差が当てはまるかを確認する

easystats

TOP ↑ ↓

LMEのモデルが妥当か検証

TOP ↑ ↓

https://bookdown.org/animestina/phd_july_19/testing-the-assumptions.html

References

TOP ↑ ↓

https://kyoritsu-pub.sakura.ne.jp/app/file/goods_contents/HowtouseR/GLM-1.html#5.
https://qiita.com/ykawakubo/items/16fd0c08ee4f6e21e653
Best practice guidance for linear mixed-effects models in psychological science
R's lmer cheat sheet
https://qiita.com/ykawakubo/items/16fd0c08ee4f6e21e653
https://www.jkarreth.net/files/RPOS517_Day13_GLMs.html
http://www.akira-murakami.com/?p=320
https://tjo.hatenablog.com/entry/2013/09/23/232814
https://strengejacke.github.io/sjPlot/
https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_interactions.html
https://ademos.people.uic.edu/
https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
https://mspeekenbrink.github.io/sdam-r-companion/linear-mixed-effects-models.html
https://kuboweb.github.io/-kubo/ce/FaqGlm.html#toc5
https://bookdown.org/animestina/phd_july_19/testing-the-assumptions.html
https://stats.stackexchange.com/questions/77891/checking-assumptions-lmer-lme-mixed-models-in-r
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
https://vasishth.github.io/Freq_CogSci/
http://joshuawiley.com/MonashHonoursStatistics/LMM_Intro.html
http://data-science.tokyo/ed/edj1-2-1-4.html
Log-linked Gamma GLM vs log-linked Gaussian GLM vs log-transformed LM https://stats.stackexchange.com/questions/77579/log-linked-gamma-glm-vs-log-linked-gaussian-glm-vs-log-transformed-lm
Bates et al. (2015) Fitting Linear Mixed-Effects Models Using lme4, J of Statistical Software. 67(1). doi: 10.18637/jss.v067.i01
https://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet/13173#13173
Barr, Dale J, R. Levy, C. Scheepers und H. J. Tily (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68:255– 278.
村山航, 刺激の効果を侮るなかれ―ランダム刺激効果を含んだ線形混合モデルの重要性と落とし穴―, 基礎心理学研究, 2017, 36 巻, 2 号, p. 236-242, 公開日 2018/06/16, Online ISSN 2188-7977, Print ISSN 0287-7651, https://doi.org/10.14947/psychono.36.40, https://www.jstage.jst.go.jp/article/psychono/36/2/36_36.40/_article/-char/ja
Murakami 2015 Modeling systematicity and individuality in nonlinear second language development: The case of English grammatical morphemes. https://osf.io/dbuh4/

GLMM FAQ

https://kuboweb.github.io/-kubo/ce/FaqGlm.html

一般化線形モデルを実行したいならRがおすすめ https://www.kyougokumakoto.com/2019/07/r-glm.html

R言語　一般化線形モデル・glmの使いかた【初心者向け】https://multivariate-statistics.com/2021/03/06/r-programming-generalized-linear-model/

生態学のデータ解析 - FAQ 一般化線形モデル https://kuboweb.github.io/-kubo/ce/FaqGlm.html

生態学のデータ解析 - FAQ モデル選択 https://kuboweb.github.io/-kubo/ce/FaqModelSelection.html

https://socinuit.hatenablog.com/entry/2019/12/21/042023

入門用参考図書

TOP ↑ ↓

嶋田・阿部(2017)『Rで学ぶ統計学入門』東京化学同人
馬場(2015)　　　『平均・分散から始める一般化線形モデル入門』プレアデス出版
堀(2017)　　　　『ゼロからはじめる統計モデリング』ナカニシヤ出版
粕谷(2012)　　　『一般化線形モデル』共立出版
藤井(2010)　　　『カテゴリカルデータ解析』共立出版

GLMM Template (RMarkdown ファイル)

Multiple Regression Analysis 重回帰分析

GLMM: Generalized Linear Mixed Model 一般化線形混合モデル

MuMIn パッケージで モデル選択 、総当たりで組み合わせを試してAICが低いものを選ぶ

交互作用と単純主効果

easystats の report() と report_table()

sjPlotのtab_model()

TeXで報告する場合 verbatim(summary(モデル))

plot(allEffects(モデル))

ggeffects を使うとrawdataも合わせてグラフにできる

keyword

category

更新履歴

MuMIn パッケージでモデル選択、総当たりで組み合わせを試してAICが低いものを選ぶ