table()

table(): クロス集計

TOP ↑ ↓

table(): クロス集計

Reference

サンプル・データ
データの取り込み
クロス集計：table(データフレームのカラム, データフレームのカラム)
クロス集計：table(dataframe[,c(カラム番号を並べる)])
NAも集計に入れる： useNA = "ifany"
縦横の集計も加える： addmargins()
比率で表示： prop.table(table())
比率で表示（四捨五入）： round(prop.table(table()),2)
全体の比率ではなく、横で集計した比率：margin = 1
全体の比率ではなく、縦で集計した比率：margin = 2

モザイク・グラフ
結果をデータフレームに変換： as.data.frame()

変換前
変換後

https://www.openintro.org/data/index.php?data=acs12
- Results from the US Census American Community Survey, 2012
- 2012年のアメリカ人口統計からの2,000のサンプル

データの取り込み

TOP ↑ ↓

acs.dat <- read.csv("https://www.openintro.org/data/csv/acs12.csv")
head(acs.dat)

  income         employment hrs_work  race age gender citizen time_to_work    lang married         edu disability   birth_qrtr
1  60000 not in labor force       40 white  68 female     yes           NA english      no     college         no jul thru sep
2      0 not in labor force       NA white  88   male     yes           NA english      no hs or lower        yes jan thru mar
3     NA               <NA>       NA white  12 female     yes           NA english      no hs or lower         no oct thru dec
4      0 not in labor force       NA white  17   male     yes           NA   other      no hs or lower         no oct thru dec
5      0 not in labor force       NA white  77 female     yes           NA   other      no hs or lower        yes jul thru sep
6   1700           employed       40 other  35 female     yes           15   other     yes hs or lower        yes jul thru sep

str(acs.dat)

'data.frame':	2000 obs. of  13 variables:
 $ income      : int  60000 0 NA 0 0 1700 NA NA NA 45000 ...
 $ employment  : chr  "not in labor force" "not in labor force" NA "not in labor force" ...
 $ hrs_work    : int  40 NA NA NA NA 40 NA NA NA 84 ...
 $ race        : chr  "white" "white" "white" "white" ...
 $ age         : int  68 88 12 17 77 35 11 7 6 27 ...
 $ gender      : chr  "female" "male" "female" "male" ...
 $ citizen     : chr  "yes" "yes" "yes" "yes" ...
 $ time_to_work: int  NA NA NA NA NA 15 NA NA NA 40 ...
 $ lang        : chr  "english" "english" "english" "other" ...
 $ married     : chr  "no" "no" "no" "no" ...
 $ edu         : chr  "college" "hs or lower" "hs or lower" "hs or lower" ...
 $ disability  : chr  "no" "yes" "no" "no" ...
 $ birth_qrtr  : chr  "jul thru sep" "jan thru mar" "oct thru dec" "oct thru dec" ...

クロス集計：table(データフレームのカラム, データフレームのカラム)

TOP ↑ ↓

人種と教育

table(acs.dat$race, acs.dat$edu)

        college grad hs or lower
  asian      22   11          53
  black      26    9         162
  other      12    9         125
  white     299  115        1099

クロス集計：table(dataframe[,c(カラム番号を並べる)])

TOP ↑ ↓

NAも集計に入れる： useNA = "ifany"

TOP ↑ ↓

table(acs.dat$race, acs.dat$edu, useNA="ifany")

        college grad hs or lower <NA>
  asian      22   11          53    1
  black      26    9         162    9
  other      12    9         125    6
  white     299  115        1099   42

useNA = "always"とすると、縦横ですべて表示

table(acs.dat$race, acs.dat$edu, useNA="always")

        college grad hs or lower <NA>
  asian      22   11          53    1
  black      26    9         162    9
  other      12    9         125    6
  white     299  115        1099   42
  <NA>        0    0           0    0

縦横の集計も加える： addmargins()

TOP ↑ ↓

addmargins(table(acs.dat$race, acs.dat$edu))

        college grad hs or lower  Sum
  asian      22   11          53   86
  black      26    9         162  197
  other      12    9         125  146
  white     299  115        1099 1513
  Sum       359  144        1439 1942


addmargins(table(acs.dat$race, acs.dat$edu, useNA="ifany"))

        college grad hs or lower <NA>  Sum
  asian      22   11          53    1   87
  black      26    9         162    9  206
  other      12    9         125    6  152
  white     299  115        1099   42 1555
  Sum       359  144        1439   58 2000

NAの数を確認： is.na() でTRUEがNAの数

table(is.na(acs.dat$race))
table(is.na(acs.dat$edu))

FALSE 
 2000 

FALSE  TRUE 
 1942    58

比率で表示： prop.table(table())

TOP ↑ ↓

全体の内訳の比率

prop.table(table(acs.dat$race, acs.dat$edu))

            college        grad hs or lower
  asian 0.011328527 0.005664264 0.027291452
  black 0.013388260 0.004634398 0.083419156
  other 0.006179197 0.004634398 0.064366632
  white 0.153964985 0.059217302 0.565911432

比率で表示（四捨五入）： round(prop.table(table()),2)

TOP ↑ ↓

小数点以下2桁

round(prop.table(table(acs.dat$race, acs.dat$edu)),2)

        college grad hs or lower
  asian    0.01 0.01        0.03
  black    0.01 0.00        0.08
  other    0.01 0.00        0.06
  white    0.15 0.06        0.57

全体の比率ではなく、横で集計した比率：margin = 1

TOP ↑ ↓

この場合は、人種ごとに学歴の比率を出す

round(prop.table(table(acs.dat$race, acs.dat$edu), margin=1),2)

        college grad hs or lower
  asian    0.26 0.13        0.62
  black    0.13 0.05        0.82
  other    0.08 0.06        0.86
  white    0.20 0.08        0.73

白人の73%は高卒
大卒の比率はアジア系が一番高い
大学院修了者の比率もアジア系が一番高い
- アジア系の13%の人は大学院修了

全体の比率ではなく、縦で集計した比率：margin = 2

TOP ↑ ↓

学歴ごとに、それを構成する人種の割合

round(prop.table(table(acs.dat$race, acs.dat$edu), margin=2),2)

        college grad hs or lower
  asian    0.06 0.08        0.04
  black    0.07 0.06        0.11
  other    0.03 0.06        0.09
  white    0.83 0.80        0.76

大卒の83%は白人

モザイク・グラフ

TOP ↑ ↓

vcd: Visualizing Categorical Data

install.packages("vcd")

https://cran.r-project.org/web/packages/vcd/index.html
https://stats.stackexchange.com/questions/147863/how-to-interpret-the-residual-colors-on-a-mosaic-plot

library(vcd)
mosaic(xtabs(~acs.dat$race+acs.dat$edu))

オプションで shade=T つけると、色がつく。
- 残差分析
  - 青：期待値より観察値が多い
  - 赤：期待値より観察値が少ない

結果をデータフレームに変換： as.data.frame()

TOP ↑ ↓

変換前

TOP ↑ ↓

table(nns.bigram.df) %>% head()

ngram.all
      0 years       05 euro        1 2019        1 euro 1 grandfather         1 hour 
         327           209            23           418           295           159

変換後

TOP ↑ ↓

table(nns.bigram.df) %>% as.data.frame() %>% head()

       ngram.all       Freq
1	0 years         327		
2	05 euro	        209		
3	1 2019	         23		
4	1 euro	        418		
5	1 grandfather	295		
6	1 hour	        159

table(): クロス集計

Reference

例

サンプル・データ

データの取り込み

クロス集計：table(データフレームのカラム, データフレームのカラム)

クロス集計：table(dataframe[,c(カラム番号を並べる)])

NAも集計に入れる： useNA = "ifany"

縦横の集計も加える： addmargins()

比率で表示： prop.table(table())

比率で表示（四捨五入）： round(prop.table(table()),2)

全体の比率ではなく、横で集計した比率：margin = 1

全体の比率ではなく、縦で集計した比率：margin = 2

モザイク・グラフ

結果をデータフレームに変換： as.data.frame()

変換前

変換後

https://sugiura-ken.org/wiki/

Menu

keyword

category

更新履歴