R.data

Calling data() with no arguments lists the data sets in the attached packages (excerpt):

Data sets in package ‘datasets’:

AirPassengers                      Monthly Airline Passenger Numbers 1949-1960
BJsales                            Sales Data with Leading Indicator
BJsales.lead (BJsales)             Sales Data with Leading Indicator
BOD                                Biochemical Oxygen Demand
CO2                                Carbon Dioxide Uptake in Grass Plants
ChickWeight                        Weight versus age of chicks on different diets
DNase                              Elisa assay of DNase
EuStockMarkets                     Daily Closing Prices of Major European Stock Indices, 1991-1998
Formaldehyde                       Determination of Formaldehyde
HairEyeColor                       Hair and Eye Color of Statistics Students
Harman23.cor                       Harman Example 2.3

install.packages("languageR")
library(languageR)
data(package="languageR")

alice                       Alice's Adventures in Wonderland
moby                        Moby Dick

auxiliaries                 Auxiliaries for regular and irregular verbs in Dutch

dative                      Dative Alternation
dativeSimplified            Dative Alternation - simplified data set
verbs                       Dative Alternation - simplified data set

english                     English visual lexical decision and naming latencies
lexdec                      Lexical decision latencies for 79 English nouns
ratings                     Ratings for 81 English nouns

Example: verbs that take part in the dative alternation

data(verbs)
head(verbs)

  RealizationOfRec  Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme
1               NP  feed      animate      inanimate      2.639057
2               NP  give      animate      inanimate      1.098612
3               NP  give      animate      inanimate      2.564949
4               NP  give      animate      inanimate      1.609438
5               NP offer      animate      inanimate      1.098612
6               NP  give      animate      inanimate      1.386294

Typing ?verbs shows the description in Help:

verbs {languageR}	R Documentation
Dative Alternation - simplified data set
Description
A simplified version of the dative data set, used for expository purposes only.

Usage
data(verbs)
Format
A data frame with 903 observations on the following 5 variables.

RealizationOfRec
	a factor with levels NP and PP.

Verb
	a factor with the verbs as levels.

AnimacyOfRec
	a factor with levels animate and inanimate.

AnimacyOfTheme
	a factor with levels animate and inanimate.

LengthOfTheme
	a numeric vector coding the length in words of the theme.

References
Bresnan, J., Cueni, A., Nikitina, T. and Baayen, R. H. (2007) Predicting the dative alternation, in Bouma, G. and Kraemer, I. and Zwarts, J. (eds.), Cognitive Foundations of Interpretation, Royal Netherlands Academy of Sciences, 33 pages, in press.

Sample data with height, weight, and gender

https://helloacm.com/the-machine-learning-case-study-how-to-predict-weight-over-heightgender-using-linear-regression/

Getting an overview of the data
- str() displays a summary of the data's structure

> str(kimatsu)
'data.frame':   18 obs. of  2 variables:
 $ kokugo : num  83 45 73 50 22 67 77 89 66 90 ...
 $ suugaku: num  90 55 90 43 33 55 48 98 56 75 ...
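A minimal sketch of the same overview on a small data frame. The name kimatsu5 and its five rows are hypothetical (they reuse the five scores from the vector example further down); summary() is shown alongside str() as another quick overview.

```r
# Hypothetical five-row version of the exam data
kimatsu5 <- data.frame(kokugo  = c(83, 45, 73, 50, 22),
                       suugaku = c(90, 55, 90, 43, 33))

str(kimatsu5)      # rows, columns, and the type and first values of each column
summary(kimatsu5)  # min, quartiles, mean, and max of each numeric column
```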

Look at the data as a graph
- plot()

Random sampling: sample(range of numbers, sample size)

sample(1:100, 50)
 [1] 68 39  1 34 87 43 14 82 59 51 85 21 54 74  7 73 79 37 83 97 44 84 33 35 70 96 42 38 20 28 72 80 40 69 25 99 91 75
[39]  6 24 32 94  2 45 18 22 92 90 98 64

Generate as many random numbers as you need and use them as row indices;
this picks rows of a data frame at random.

> random50 <- sample(1:nrow(iris), 50)
> iris[random50,]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
70           5.6         2.5          3.9         1.1 versicolor
87           6.7         3.1          4.7         1.5 versicolor
100          5.7         2.8          4.1         1.3 versicolor
75           6.4         2.9          4.3         1.3 versicolor
81           5.5         2.4          3.8         1.1 versicolor
13           4.8         3.0          1.4         0.1     setosa
40           5.1         3.4          1.5         0.2     setosa
89           5.6         3.0          4.1         1.3 versicolor

To get the rows that were not picked, put a minus sign in front of the index vector:

> iris[-random50,]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
11           5.4         3.7          1.5         0.2     setosa
14           4.3         3.0          1.1         0.1     setosa
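A minimal sketch of the whole procedure, assuming you want a reproducible split: set.seed() (not in the original) fixes the random numbers, and nrow(iris) is used so that any of the 150 rows can be picked.

```r
set.seed(123)                          # makes the random choice repeatable
random50 <- sample(1:nrow(iris), 50)   # 50 different row numbers out of 150

picked <- iris[random50, ]    # the 50 chosen rows
rest   <- iris[-random50, ]   # the other 100 rows
c(nrow(picked), nrow(rest))   # 50 100
```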

rnorm(n, mean, sd)

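Generates dummy data (handy when you want to try something out).
If you give only the count and omit the mean and standard deviation, the values are drawn at random from a normal distribution with mean 0 and standard deviation 1.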

> rnorm(10)
 [1]  1.4364014 -0.5077639  0.3058840  0.1259501 -1.1296745  0.3292148
 [7]  0.9260699 -0.6915141 -1.8044824 -1.1223909
> rnorm(10)
 [1]  2.49496685  0.59830372  0.32473479  0.74853727 -1.52421661  0.98082241
 [7]  0.91520576  1.19348810  0.27119494 -0.08119991
> rnorm(10)
 [1]  0.28923196  0.59616551 -0.20413144 -0.03034469 -0.19886301  0.33546843
 [7] -0.31893659 -0.28439913 -0.58158149 -0.68247953

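40 values with mean 50 and standard deviation 10
- If you try this and draw a histogram, you will see that with only about 40 random values the shape can look surprisingly unlike a normal distribution. With 400 it is still rough, with 4,000 it looks reasonable, and with 40,000 it is a fairly stable normal curve.
- With the same data, a boxplot looks nicely balanced even with 40 values.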

> rnorm(40, 50, 10)
 [1] 54.92133 52.74849 53.63346 54.34087 63.92496 48.14784 53.24380 51.89087 46.62319
[10] 33.08571 37.38375 69.22399 57.46107 59.48568 44.48870 52.82696 35.83338 62.21806
[19] 52.94664 39.54044 49.31500 45.00038 60.97926 51.26191 41.98686 29.47495 41.15922
[28] 55.83927 70.17838 63.72457 43.71194 52.10943 45.59285 59.61884 41.72390 47.07231
[37] 50.06059 54.59948 48.93411 48.99457
> hist(rnorm(40, 50, 10))
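A quick numeric check of the claim above about sample size: the sample mean and standard deviation drift further from 50 and 10 when n is small. The seed value is arbitrary, and printing numbers is only a stand-in for looking at the histograms.

```r
set.seed(1)   # arbitrary seed so the run is repeatable
for (n in c(40L, 400L, 4000L, 40000L)) {
  x <- rnorm(n, mean = 50, sd = 10)
  cat(sprintf("n = %5d   mean = %6.2f   sd = %5.2f\n", n, mean(x), sd(x)))
}
```

Calling hist(x) or boxplot(x) after generating each sample shows the visual difference described above.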

Kinds of data in R

This page explains them clearly with diagrams:

https://cell-innovation.nig.ac.jp/surfers/vector_difference.html

Vectors

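The R counterpart of an "array" in programming
Give it a variable name
It holds elements (all of the same type)
variable_name <- c(elements separated by commas)
- Example: Japanese and math scores
- kokugo <- c(83, 45, 73, 50, 22)
- suugaku <- c(90, 55, 90, 43, 33)
Each element has a fixed position, counted from the front starting at 1
Typing the variable name prints its contents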

> kokugo
[1] 83 45 73 50 22

# Scores for 7 subjects (across the 5 core subject areas)
> test <- c(79, 85, 87, 78, 89, 68, 59)
> test
[1] 79 85 87 78 89 68 59

# Give each score the "name" of its subject
> names(test) <- c("koku", "suu", "ei", "sekaisi", "rinri", "buturi", "kagaku")
> test
   koku     suu      ei sekaisi   rinri  buturi  kagaku 
     79      85      87      78      89      68      59 

# Elements can be retrieved by position or by name
> test[3]
ei 
87 
> test["ei"]
ei 
87
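A small sketch extending the example: a vector can also be indexed with several positions, several names, or a logical condition. It reuses the test vector above.

```r
test <- c(79, 85, 87, 78, 89, 68, 59)
names(test) <- c("koku", "suu", "ei", "sekaisi", "rinri", "buturi", "kagaku")

test[c(1, 3)]           # several positions: koku 79, ei 87
test[c("ei", "koku")]   # several names, in the order given
test[test >= 80]        # a condition: suu 85, ei 87, rinri 89
length(test)            # 7
```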

Matrices: matrix
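This section has no example yet, so here is a minimal sketch of creating and indexing a matrix. Unlike a data frame, every element of a matrix has the same type.

```r
# matrix() fills values column by column unless byrow = TRUE
m <- matrix(1:6, nrow = 2, ncol = 3)
m
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
m_byrow

m[2, 3]   # element in row 2, column 3: 6
m[1, ]    # the whole first row: 1 3 5
dim(m)    # 2 3
```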

Data frames

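Roughly a two-dimensional table (like a 2-D array in programming, but each column can hold a different type)
Each row is one case (item); each column is one measured variable
You can also build a data frame by combining several vectors
variable_name <- data.frame(vector names separated by commas)
- Example: final-exam scores (Japanese and math)
- kimatsu <- data.frame(kokugo, suugaku)
Typing the variable name prints its contents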

> kimatsu

  kokugo suugaku
1     83      90
2     45      55
3     73      90
4     50      43
5     22      33

Use $ to refer to an individual column (vector)
- Japanese scores: kimatsu$kokugo
- Math scores: kimatsu$suugaku

Conversely, you can take specific columns of a data frame and make them into a new data frame:

icnaleMLS.MLT <- data.frame(icnale$MLS, icnale$MLT)
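A sketch that rebuilds kimatsu from the kokugo and suugaku vectors above and compares two ways of taking columns. Note that data.frame() builds the new column names from the expressions, so kimatsu$kokugo becomes kimatsu.kokugo; indexing by name keeps the original names.

```r
kokugo  <- c(83, 45, 73, 50, 22)
suugaku <- c(90, 55, 90, 43, 33)
kimatsu <- data.frame(kokugo, suugaku)   # column names come from the vector names

kimatsu$kokugo          # one column, as a plain vector
mean(kimatsu$suugaku)   # 62.2

sub1 <- data.frame(kimatsu$kokugo, kimatsu$suugaku)
names(sub1)             # "kimatsu.kokugo" "kimatsu.suugaku"

sub2 <- kimatsu[, c("kokugo", "suugaku")]
names(sub2)             # "kokugo" "suugaku"
```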

Check the number of rows and columns: dim()

> dim(fragE11)
[1] 19282    10

This shows there are 19,282 rows and 10 columns.

Check the column names: names()

> names(ToothGrowth)
[1] "len"  "supp" "dose"

Check whether an object is a data frame: is.data.frame()
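These checks can be tried on ToothGrowth, a data set that ships with R in the datasets package:

```r
dim(ToothGrowth)                 # 60 3
nrow(ToothGrowth)                # rows only
ncol(ToothGrowth)                # columns only
names(ToothGrowth)               # "len" "supp" "dose"
is.data.frame(ToothGrowth)       # TRUE
is.data.frame(ToothGrowth$len)   # FALSE: a single column is a plain vector
```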

Sorting a data frame by several columns

data[order(data$pid, data$week),]

Sorts by the pid column, then by week within each pid.

Before:

  pid week score
1   1    8     3
2   1    6     3
3   1    3     2
4   1    2     1
5   1    7     3
6   1    4     3

After:

    pid week score
8     1    1     1
4     1    2     1
3     1    3     2
6     1    4     3
7     1    5     3
2     1    6     3
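A small sketch with a hypothetical data frame d. Putting a minus sign in front of a numeric column sorts that column in descending order while the other keys stay ascending.

```r
d <- data.frame(pid   = c(2, 1, 1, 2, 1),
                week  = c(1, 3, 1, 2, 2),
                score = c(5, 2, 1, 4, 3))

d[order(d$pid, d$week), ]    # pid ascending, then week ascending within pid
d[order(d$pid, -d$week), ]   # pid ascending, week descending within pid
```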

Data types

factor

Categorical variables
They come in unordered and ordered kinds
- Unordered: factor()
- Ordered: ordered()

Reference

https://sites.google.com/site/leihcrev/r/ordered-and-unordered-factors
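A minimal sketch of the difference, with hypothetical blood-type and size data: only an ordered factor supports comparisons such as < and max(). On an unordered factor, < returns NA with a warning.

```r
blood <- factor(c("A", "O", "B", "A"))
levels(blood)       # "A" "B" "O"
is.ordered(blood)   # FALSE

size <- ordered(c("M", "S", "L", "M"), levels = c("S", "M", "L"))
size                # printed with "Levels: S < M < L"
size < "L"          # TRUE TRUE FALSE TRUE
max(size)           # L
```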

Specifying the order of factor levels

factor(data, levels = c("first", "second", "last"))

f.data$students <- factor(f.data$students, levels=c("year2", "year3", "natives"))
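A sketch with hypothetical values for the students column: without levels= the order is alphabetical; with it, table(), plots, and model output follow the given order. Values not listed in levels become NA.

```r
students <- c("natives", "year2", "year3", "year2")

f1 <- factor(students)
levels(f1)   # "natives" "year2" "year3" (alphabetical)

f2 <- factor(students, levels = c("year2", "year3", "natives"))
levels(f2)   # "year2" "year3" "natives"
table(f2)    # counts in the same order: 2 1 1

factor("year4", levels = c("year2", "year3", "natives"))   # <NA>: unlisted value
```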

Converting data types

as.vector()
as.numeric()
as.matrix()
as.data.frame()
unlist(): list to vector
as.factor()

> as_tibble(sample2)   # as.tibble() is deprecated; as_tibble() (tibble package) is the current name
# A tibble: 100 x 8
      ID unnatural personal demonst connective cohesion coherence reader
   <int>     <int>    <int>   <int>      <int>    <int>     <int>  <int>
 1  1001         2        1       1          1        1         1      1
 2  1002         3        2       2          1        2         3      1
 3  1003         1        1       1          1        1         2      1
 4  1004         2        2       2          1        2         1      1
 5  1005         3        2       2          2        2         1      1
 6  1006         3        4       2          3        3         2      2

> sample2$unnatural <- as.factor(sample2$unnatural)

> as_tibble(sample2)
# A tibble: 100 x 8
      ID unnatural personal demonst connective cohesion coherence reader
   <int> <fct>        <int>   <int>      <int>    <int>     <int>  <int>
 1  1001 2                1       1          1        1         1      1
 2  1002 3                2       2          1        2         3      1
 3  1003 1                1       1          1        1         2      1
 4  1004 2                2       2          1        2         1      1
 5  1005 3                2       2          2        2         1      1
 6  1006 3                4       2          3        3         2      2

Convert all columns

data.new <- data.org
data.new[] <- lapply(data.org, as.factor)   # [] keeps a data frame; lapply() alone returns a list
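A sketch with a hypothetical two-column data.org: assigning into data.new[] keeps the data frame structure, whereas lapply() on its own returns a plain list.

```r
data.org <- data.frame(unnatural = c(2, 3, 1), personal = c(1, 2, 1))

data.new <- data.org
data.new[] <- lapply(data.org, as.factor)   # [] keeps the data frame structure

class(lapply(data.org, as.factor))   # "list": lapply() alone drops it
sapply(data.new, class)              # both columns are now "factor"
```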

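Be careful when converting factors

When you convert numbers to a factor, each value is assigned a level code (an integer starting at 1).
If you convert that factor straight back to numbers, you get the level codes, not the original numbers.
Convert to character first, then back to numbers: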

as.integer(as.character(y.df$SL))
as.numeric(as.character(y.df$SL))
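The pitfall in a few lines, using a hypothetical vector x. The last line is the idiom recommended in R's ?factor help page; it converts each level once instead of every element.

```r
x <- c(10, 20, 10, 30)
f <- factor(x)

levels(f)                     # the levels are stored as text: "10" "20" "30"
as.numeric(f)                 # 1 2 1 3: the internal level codes, NOT the data
as.numeric(as.character(f))   # 10 20 10 30: the original values
as.numeric(levels(f))[f]      # 10 20 10 30: same result, converting each level once
```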

Reference https://a-habakiri.hateblo.jp/entry/2016/12/18/213416

Checking data types

dim()
class()

> class(sample)
[1] "data.frame"
> class(sample$unnatural)
[1] "integer"
> class(sample2$unnatural)
[1] "factor"

mode()
typeof()

> dim(cn2S4)
[1] 103   2

> class(cn2S4)
[1] "data.frame"

> mode(cn2S4)
[1] "list"

> typeof(cn2S4)
[1] "list"

> typeof(cn2S4$MDD)
[1] "integer"

> class(cn2S4$MDD)
[1] "factor"

> mode(cn2S4$MDD)
[1] "numeric"

summary()
str()
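A compact comparison on a hypothetical data frame dat, showing how class(), typeof(), and mode() differ for the data frame itself and for a factor column (matching the cn2S4 output above):

```r
dat <- data.frame(score = c(3L, 1L, 2L), grp = factor(c("a", "b", "a")))

class(dat)        # "data.frame"
typeof(dat)       # "list": a data frame is a list of columns
mode(dat)         # "list"
class(dat$grp)    # "factor"
typeof(dat$grp)   # "integer": factors are stored as integer codes
mode(dat$grp)     # "numeric"
str(dat)          # everything at once
```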

Summary of common pitfalls

https://cell-innovation.nig.ac.jp/surfers/R_point.html

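Preparing data in advance

Save the data from a spreadsheet program as a text file (tab-delimited or CSV).
One record per line
The first row holds the variable names
Enter missing values as NA
Reading the file
- data_frame_name <- read.table(choose.files(), header=TRUE)
- header=TRUE is needed because read.table() does not treat the first row as variable names by default
- A window opens where you choose the file to read (choose.files() is Windows-only; file.choose() works on all platforms)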
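A self-contained sketch that writes a small tab-delimited file to a temporary path (so no file dialog is needed) and reads it back. The file contents are hypothetical; sep = "\t" is added so that fields containing spaces would not be split.

```r
path <- tempfile(fileext = ".txt")   # temporary file standing in for your data
writeLines(c("id\tscore\tgroup",
             "1\t80\tA",
             "2\tNA\tB",
             "3\t65\tA"), path)

# header = TRUE: the first row holds the variable names
dat <- read.table(path, header = TRUE, sep = "\t")
dat
is.na(dat$score)   # FALSE TRUE FALSE: "NA" in the file is a missing value

# Interactive version (opens a file dialog on any platform):
# dat <- read.table(file.choose(), header = TRUE, sep = "\t")
```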

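You can also copy and paste from Excel

1. Lay out the data in Excel with headings in the top row. R uses these headings as column names, so keep them short and simple.
2. Drag over the range you want to import and copy it (e.g., Ctrl+C).
3. Switch to the R window.
4. Type one of the following commands in the R console and run it (press Enter).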

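data_frame_name <- read.delim("clipboard")
or
data_frame_name <- read.table("clipboard", header=TRUE)

data_frame_name is the name the data will have inside R.
Note that the header default differs: read.delim() assumes a header row (header=TRUE), read.table() does not (header=FALSE).
"clipboard" works on Windows; on macOS, pipe("pbpaste") is commonly used instead.
To change one of the headings, refer to it by position and overwrite it: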

> names(KSL)[3] <- "SL"

Example: rename the third heading (column name) of the data KSL to SL.

Reading text files: readLines(), scan()

Reads the file one line (one sentence per line) at a time; the result is a character vector with one element per line.

readLines()

x <- readLines(choose.files())

If you get an "incomplete final line" warning (the last line of the file does not end with a newline), add warn=FALSE.

x <- readLines("filename", warn=FALSE)

scan()

x <- scan(choose.files(), what="char", sep="\n")
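A sketch that writes two hypothetical lines to a temporary file and reads them with both functions; quiet = TRUE only suppresses scan()'s "Read 2 items" message.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("This is the first sentence.",
             "This is the second sentence."), path)

x1 <- readLines(path)                                         # one element per line
x2 <- scan(path, what = "character", sep = "\n", quiet = TRUE)
length(x1)          # 2
identical(x1, x2)   # TRUE
```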

Comparing elements: %in%

x %in% y

Returns a logical vector the same length as x: TRUE where an element of x is found in y.
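A short example with hypothetical verb lists; because %in% returns one TRUE/FALSE per element of x, it is convenient for filtering.

```r
x <- c("give", "offer", "feed", "send")
y <- c("give", "feed")

x %in% y         # TRUE FALSE TRUE FALSE
x[x %in% y]      # keep the elements of x that appear in y
x[!(x %in% y)]   # keep the ones that do not: "offer" "send"
```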

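Option to exclude missing values from a calculation

na.rm=TRUE (na.rm=T also works, but T is an ordinary variable that can be overwritten, so TRUE is safer)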
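A minimal illustration with a hypothetical score vector:

```r
scores <- c(80, NA, 65, 90)
mean(scores)                 # NA: one missing value makes the result missing
mean(scores, na.rm = TRUE)   # 78.33333: the mean of 80, 65, 90
sum(scores, na.rm = TRUE)    # 235
```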

Reading data from files

CSV files

read.csv()
readr::read_csv() (loaded by library(tidyverse))
- tends to cause fewer problems
- assumes a header row by default

Tab-delimited files

read.delim("filename")

header is TRUE by default.

Specifying each column's type when reading: colClasses=c()

Adam100.dat <- read.delim("Adam100.txt", colClasses=c("character", "integer", "integer", "integer","integer"))
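A sketch of why colClasses matters, using a hypothetical file with zero-padded IDs: without it, read.delim() guesses that the ID column is numeric and the leading zeros are lost.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("ID\tscore", "001\t80", "002\t65"), path)

guessed <- read.delim(path)   # ID guessed as integer
typed   <- read.delim(path, colClasses = c("character", "integer"))

guessed$ID   # 1 2: leading zeros lost
typed$ID     # "001" "002"
```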

Extracting a subset of the data: keep only the rows that meet a condition with subset()

subset(data_frame, condition)

OR: |
AND: &
String match: == "string"
Not equal: !=

Examples

Sentence length 4 or more: SL >= 4
Sentence length from 4 to 14: SL >= 4 & SL <= 14

Output only specific columns of the matching rows

subset(data_frame, condition, output_columns)

Example of output columns

select = c(MHD, MDD)

KSL.sub <- subset(KSL, SL >=4 & SL <= 14, select = c(SL, MHD, MDD))

From the rows where sentence length is between 4 and 14, this keeps only the SL, MHD, and MDD columns.

You can also simply specify column positions:

> jp2gram2 <- jp2gram[,3:4]
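The same operators tried on the built-in ToothGrowth data (60 rows; supp is VC or OJ, dose is 0.5, 1, or 2):

```r
# AND: VC rows with dose 1 or more, keeping only len and dose
vc <- subset(ToothGrowth, supp == "VC" & dose >= 1, select = c(len, dose))
nrow(vc)    # 20
names(vc)   # "len" "dose"

# OR, and "not equal"
nrow(subset(ToothGrowth, dose == 0.5 | dose == 2))   # 40
nrow(subset(ToothGrowth, supp != "VC"))              # 30 (the OJ rows)
```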

To remove certain rows, keep only the rows that do not match, using !=

NP.dat.jp2b <- NP.dat.jp %>% subset(Group != "Am")

However, this leaves "Am" as a level of Group even though no rows use it any more.
To drop unused levels, use droplevels()

NP.dat.jp2c <- droplevels(NP.dat.jp2b)
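A sketch with a hypothetical Group factor showing the leftover level before and after droplevels(). Plain subset() is used here instead of %>% so the example needs no extra package.

```r
g <- data.frame(Group = factor(c("Am", "Jp", "Jp", "Am", "Cn")), x = 1:5)

g2 <- subset(g, Group != "Am")
levels(g2$Group)   # "Am" "Cn" "Jp": Am is still a level
table(g2$Group)    # ...and is counted as 0

g3 <- droplevels(g2)
levels(g3$Group)   # "Cn" "Jp"
```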

Selecting data

Reference

https://kazutan.github.io/JSSP2018_spring/data_handling.html

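Exclude rows with no TOEIC score (filter() here is dplyr::filter(), so load dplyr first; base R's stats::filter() does something different)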

> head(bginfo)
     ID TOEIC TOEFL C.test Group
1 JP001   720    NA  69.38     A
2 JP002   930    NA  67.50     A
3 JP003   525   500  56.25     A
4 JP004    NA   477  60.00     A
5 JP005   440   501  56.25     A
6 JP006   800   490  65.00     A


> filter(bginfo, !is.na(TOEIC))
      ID TOEIC TOEFL C.test Group
1  JP001   720    NA  69.38     A
2  JP002   930    NA  67.50     A
3  JP003   525   500  56.25     A
4  JP005   440   501  56.25     A
5  JP006   800   490  65.00     A
6  JP012   740    NA  30.00     A

Combining several conditions

> filter(bginfo, !is.na(TOEIC), !is.na(TOEFL))
      ID TOEIC TOEFL C.test Group
1  JP003   525   500  56.25     A
2  JP005   440   501  56.25     A
3  JP006   800   490  65.00     A
4  JP015   750   510  58.13     A
5  JP016   685   487  61.88     B
6  JP017   605   500  52.50     B
7  JP018   570   507  43.13     B
8  JP021   875   530  54.38     B
9  JP022   880   460  68.13     B
10 JP024   620   540  58.75     B
11 JP025   875   520  70.00     B

Now only rows that have both a TOEIC and a TOEFL score remain (conditions separated by commas in filter() are combined with AND).

Extracting specific columns

data.frame()

When the data has many columns and you want to use only some of them:
- If they are contiguous, data_frame[,2:4] extracts columns 2 to 4
- If they are not, use data.frame() and name each column you want

data.frame(x$ID, x$age, x$score)

Reordering the levels of a categorical variable in a data frame

By default, the levels are in alphabetical order.

> str(onomato.data)
'data.frame':	120 obs. of  4 variables:
 $ subj  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ cond  : Factor w/ 2 levels "cont","exp": 2 2 2 2 2 2 2 2 2 2 ...
 $ timing: Factor w/ 3 levels "delay","post",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ score : int  65 75 35 70 85 55 85 20 50 55 ...
> levels(onomato.data$timing)
[1] "delay" "post"  "pre"

To put them in a specific order:

> onomato.data2 <- transform(onomato.data, timing=factor(timing, levels=c("pre", "post", "delay")))
> levels(onomato.data2$timing)
[1] "pre"   "post"  "delay"

Excluding part of the data: [-row, -column]

Put a minus sign in front of the row or column numbers you want to drop.

Drop column 1: data_frame[ , -1]
Drop column 2: data_frame[ , -2]
Drop row 1: data_frame[-1 , ]
Drop row 2: data_frame[-2 , ]

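When the first column holds file names and you want the correlations among the remaining columns (chart.Correlation() is from the PerformanceAnalytics package):

chart.Correlation(data_frame[ , -1])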

To drop several rows or columns, list them with c()

Drop columns 2 to 4: data_frame[ , -c(2:4)]
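A sketch with a hypothetical data frame whose first column holds file names, checking which columns and rows remain after each negative index:

```r
m <- data.frame(file = c("a.txt", "b.txt", "c.txt"),
                v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12)

names(m[, -1])        # drop column 1: "v1" "v2" "v3" "v4"
names(m[, -c(2:4)])   # drop columns 2 to 4: "file" "v4"
nrow(m[-1, ])         # drop row 1: 2 rows left
nrow(m[-c(1, 3), ])   # drop rows 1 and 3: 1 row left
cor(m[, -1])          # base-R correlations without the file-name column
```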

Filling in missing values

data[is.na(data)] <- value

t2b.dat[is.na(t2b.dat)] <- "jp"

"jp" is entered wherever there was an NA.
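A sketch with hypothetical data. stringsAsFactors = FALSE is stated explicitly so the character column accepts "jp" even on R versions before 4.0; on a factor column, a value that is not an existing level would become NA instead.

```r
t2b <- data.frame(id = 1:4, lang = c("en", NA, "zh", NA),
                  stringsAsFactors = FALSE)
t2b[is.na(t2b)] <- "jp"   # every NA in the data frame becomes "jp"
t2b$lang                  # "en" "jp" "zh" "jp"

nums <- c(3, NA, 5)
nums[is.na(nums)] <- 0    # the same idiom on a plain numeric vector
nums                      # 3 0 5
```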

Adding ID numbers to rows

cbind(ID=1:nrow(data), data)

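Adding row numbers to the data (e.g., to store rank positions as numbers)

Add the row numbers as a column named rank. row(top10000.df) also works here, but only because this data frame has a single column; seq_len(nrow()) works for any number of columns.

top10000.df$rank <- seq_len(nrow(top10000.df))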
head(top10000.df)

    top10000 rank
the    68659    1
of     38144    2
to     28852    3
and    28657    4
in     22834    5
a      19779    6

Adding the row names as a separate column: rownames()

> top10000.df$item <- rownames(top10000.df)
> head(top10000.df)
    top10000 rank item
the    68659    1  the
of     38144    2   of
to     28852    3   to
and    28657    4  and
in     22834    5   in
a      19779    6    a

Now you can search for an item with grep:

> grep("therefore", top10000.df$item)
[1] 175

"therefore" is ranked 175th.
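A sketch using the first six rows of the word list above. It combines the three steps: rank from the row number, an item column from the row names, and an ID column added with cbind(). which() with == finds an exact match, while grep() matches a regular expression and can return partial hits.

```r
words  <- c("the", "of", "to", "and", "in", "a")
top.df <- data.frame(top10000 = c(68659, 38144, 28852, 28657, 22834, 19779),
                     row.names = words)

top.df$rank <- seq_len(nrow(top.df))            # row number used as the rank
top.df$item <- rownames(top.df)                 # row names copied into a column
top.df <- cbind(ID = 1:nrow(top.df), top.df)    # ID column at the left end

which(top.df$item == "in")   # exact match: 5
grep("^a", top.df$item)      # regular expression: items starting with "a" -> 4 6
```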

Writing to files

write(variable, "filename", ncolumns=number_of_columns)

Write data frames with write.table().

write.table(data_frame, "filename")

Fields are separated by a single space.

write.csv(data_frame, "filename.csv")

Fields are separated by commas.
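A round-trip sketch with a temporary file and hypothetical integer scores. write.csv() writes the row names as an extra first column unless row.names = FALSE is given.

```r
kimatsu <- data.frame(kokugo = c(83L, 45L), suugaku = c(90L, 55L))
path <- tempfile(fileext = ".csv")

write.csv(kimatsu, path, row.names = FALSE)   # no extra column of row numbers
readLines(path)                               # the raw text of the CSV file
back <- read.csv(path)                        # read it back in
all.equal(back, kimatsu)                      # TRUE
```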

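When the CSV file will be opened in Excel

install.packages("tidyverse")
library(readr)
write_excel_csv(data_name, "output.csv")   # adds a UTF-8 byte order mark so Excel detects the encoding (e.g., Japanese text)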

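Manipulating data

Stacking data frames vertically: dplyr::bind_rows

Works with two or more data frames; columns missing from an input are filled with NA.

Joining data frames side by side: dplyr::bind_cols

Works with two or more data frames.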

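Joining variables vertically: rbind()

rbind(variable1, variable2)

new_data_frame <- rbind(data_frame1, data_frame2)

The column names must match, or you get the error "names do not match previous names".
- Check the column names with colnames(data_frame)
  - To compare the column names of two data frames, use setdiff(). It lists only the names in the first that are missing from the second, so check both directions.

setdiff(colnames(data1), colnames(data2))

Check a column name

colnames(data)[4]

Rename a column

colnames(data)[4] <- "new_name"

rbind() can join more than two at once.
- For three, rbind(data_frame1, data_frame2, data_frame3) works in a single step.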
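A sketch with three hypothetical data frames, one of which has a mismatched column name. setdiff() is run in both directions because it is not symmetric.

```r
d1 <- data.frame(id = 1:2, score = c(80, 70))
d2 <- data.frame(id = 3:4, score = c(60, 90))
d3 <- data.frame(id = 5L, Score = 75)   # note the capital S

# setdiff() lists names in the first argument that are missing from the second
setdiff(colnames(d3), colnames(d1))   # "Score"
setdiff(colnames(d1), colnames(d3))   # "score"

colnames(d3)[2] <- "score"            # fix the mismatch
all3 <- rbind(d1, d2, d3)             # three data frames in one call
nrow(all3)                            # 5
```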

Joining variables horizontally

cbind()

Adding a column

Add a column filled with empty (NA) values.

data_frame$new_column <- NA
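A sketch with hypothetical score columns; cbind() expects the same number of rows in each piece, and a single NA is recycled to every row:

```r
a <- data.frame(id = 1:3, koku = c(70, 80, 90))
b <- data.frame(suu = c(60, 75, 85))

ab <- cbind(a, b)   # side by side: both parts have 3 rows
ab$eigo <- NA       # a new column of NA, recycled to all 3 rows
ab
```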

Applying the same operation to every row or column: apply()

Row by row (second argument 1): e.g., each student's total or average score

apply(x, 1, function)

> head(tesuto)
  koku  suu eigo
1 46.2 40.4 50.3
2 36.0 54.7 64.7
3 46.6 61.7 48.7
4 43.2 52.6 51.8
5 49.6 42.0 43.2
6 43.1 53.5 60.4
> tesuto$total <- apply(tesuto[1:3], 1, sum)
> head(tesuto)
  koku  suu eigo total
1 46.2 40.4 50.3 136.9
2 36.0 54.7 64.7 155.4
3 46.6 61.7 48.7 157.0
4 43.2 52.6 51.8 147.6
5 49.6 42.0 43.2 134.8
6 43.1 53.5 60.4 157.0

Column by column (second argument 2): e.g., each subject's average score

apply(x, 2, function)
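A sketch using the first three rows of tesuto from above. MARGIN = 1 gives one result per row and MARGIN = 2 one per column; rowSums() and colMeans() are faster built-ins for these two particular cases.

```r
tesuto <- data.frame(koku = c(46.2, 36.0, 46.6),
                     suu  = c(40.4, 54.7, 61.7),
                     eigo = c(50.3, 64.7, 48.7))

apply(tesuto, 1, sum)    # per row (student): 136.9 155.4 157.0
apply(tesuto, 2, mean)   # per column (subject)
apply(tesuto, 1, max)    # any function works, e.g. each student's best score
rowSums(tesuto)          # built-in shortcut for apply(tesuto, 1, sum)
```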

Summing only some columns and appending a total column: rowSums()

Add columns 2 to 9 and append the result as a total column at the right end.

> head(fragJA)
          fragment p1c1a p5c2a p2c3a p7c4a p6c5a p3c6a p4c7a p8c8a
1      (NP (PRP ))  1070   656   840  1114  1070   758  1043   970
2            (. .)   815   863   833   814   771   795   841   777
3      (ROOT (S ))   812   858   835   810   771   794   834   770
4 (PP (IN ) (NP ))   950   950   666   703   896   799   761   588
5            (, ,)   640   679   651   642   680   697   673   644
6  (S (NP ) (VP ))   574   597   728   688   620   513   730   615

fragJA$total <- rowSums(fragJA[,2:9])

> head(fragJA)
          fragment p1c1a p5c2a p2c3a p7c4a p6c5a p3c6a p4c7a p8c8a total
1      (NP (PRP ))  1070   656   840  1114  1070   758  1043   970  7521
2            (. .)   815   863   833   814   771   795   841   777  6509
3      (ROOT (S ))   812   858   835   810   771   794   834   770  6484
4 (PP (IN ) (NP ))   950   950   666   703   896   799   761   588  6313
5            (, ,)   640   679   651   642   680   697   673   644  5306
6  (S (NP ) (VP ))   574   597   728   688   620   513   730   615  5065

Summing only the numeric columns (ignoring NA), with dplyr

library(dplyr)
data %>% mutate(Total = rowSums(across(where(is.numeric)), na.rm = TRUE))
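A base-R equivalent of the dplyr line above, in case dplyr is not loaded. The numbers are adapted from fragJA above, with one hypothetical NA; sapply(..., is.numeric) picks out the numeric columns.

```r
frag <- data.frame(fragment = c("(. .)", "(, ,)"),
                   p1c1a = c(815, 640), p5c2a = c(863, NA), p2c3a = c(833, 651))

num_cols <- sapply(frag, is.numeric)   # FALSE for fragment, TRUE for the counts
frag$total <- rowSums(frag[, num_cols], na.rm = TRUE)
frag$total                             # 2511 1291
```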

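Converting data from wide format to long format

Convert data tabulated across rows and columns into long format, with one observation per row.

gather()

pivot_longer()

gather() is superseded in tidyr; pivot_longer() is the recommended replacement.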
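Neither function has an example above. This is a base-R sketch with stats::reshape() on a hypothetical wide table, swapped in for tidyr so that the example needs no extra package. With tidyr, the corresponding call would be pivot_longer(wide, cols = c(koku, suu), names_to = "subject", values_to = "score").

```r
# Hypothetical wide table: one row per student, one column per subject
wide <- data.frame(id = 1:2, koku = c(80, 60), suu = c(70, 90))

long <- reshape(wide, direction = "long",
                varying = c("koku", "suu"),   # columns to stack into one
                v.names = "score",            # name of the new value column
                timevar = "subject",          # name of the new key column
                times   = c("koku", "suu"),   # key value for each stacked column
                idvar   = "id")
rownames(long) <- NULL   # drop the "1.koku"-style row names
long                     # one row per student and subject: 4 rows
```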

Saving individual objects to a file

Save

save(object, file="filename.Rdata")

You can save several objects at once

save(object1, object2, object3, file="filename.Rdata")

Load

load("filename.Rdata")

Loading restores each object under its original name (silently overwriting any existing object with the same name).
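A sketch saving two hypothetical vectors to a temporary .Rdata file, removing them, and restoring them. load() invisibly returns the names it restored.

```r
kokugo  <- c(83, 45, 73)
suugaku <- c(90, 55, 90)
path <- file.path(tempdir(), "scores.Rdata")   # hypothetical file name

save(kokugo, suugaku, file = path)   # two objects in one file
rm(kokugo, suugaku)                  # remove them from the workspace
restored <- load(path)               # brings both back under their old names
restored                             # "kokugo" "suugaku"
kokugo                               # 83 45 73
```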

Loading under a different name

Save

saveRDS(object, file="filename.rds")

Restore

readRDS(file="filename.rds")

new_name <- readRDS(file="filename.rds")
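A sketch of the rename-on-load pattern with a temporary .rds file; saveRDS() stores exactly one object, and its name is chosen when reading it back.

```r
kimatsu <- data.frame(kokugo = c(83, 45), suugaku = c(90, 55))
path <- file.path(tempdir(), "kimatsu.rds")   # hypothetical file name

saveRDS(kimatsu, file = path)    # one object per .rds file
finals <- readRDS(file = path)   # the name is chosen here, when reading
identical(finals, kimatsu)       # TRUE
```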

Converting part of a data.frame to a matrix

> head(fragJ11)
          fragment p1c1a p5c2a p2c3a p7c4a p6c5a p3c6a p4c7a p8c8a p8c1b p4c2b p3c3b p6c4b p7c5b p2c6b p5c7b p1c8b JPtotal
1      (NP (PRP ))  1070   656   840  1114  1070   758  1043   970  1063  1006   837  1218   986   814   628   999   15072
2            (. .)   815   863   833   814   771   795   841   777   879   852   839   815   790   831   821   764   13100
3      (ROOT (S ))   812   858   835   810   771   794   834   770   877   852   831   812   782   832   818   761   13049
4 (PP (IN ) (NP ))   950   950   666   703   896   799   761   588   537   671   831   806   742   717  1001   915   12533
5            (, ,)   640   679   651   642   680   697   673   644   542   582   591   608   639   627   676   662   10233
6  (S (NP ) (VP ))   574   597   728   688   620   513   730   615   625   789   551   677   667   665   551   564   10154
> x <- fragJ11[2,2:17]
> x
  p1c1a p5c2a p2c3a p7c4a p6c5a p3c6a p4c7a p8c8a p8c1b p4c2b p3c3b p6c4b p7c5b p2c6b p5c7b p1c8b
2   815   863   833   814   771   795   841   777   879   852   839   815   790   831   821   764
> class(x)
[1] "data.frame"
> y <- as.matrix(x)
> class(y)
[1] "matrix"
> y
  p1c1a p5c2a p2c3a p7c4a p6c5a p3c6a p4c7a p8c8a p8c1b p4c2b p3c3b p6c4b p7c5b p2c6b p5c7b p1c8b
2   815   863   833   814   771   795   841   777   879   852   839   815   790   831   821   764
> dimnames(y) <- NULL
> y
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
[1,]  815  863  833  814  771  795  841  777  879   852   839   815   790   831   821   764
> class(y)
[1] "matrix"

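Take only columns 2 to 17 of the second row

x <- fragJ11[2,2:17]

Convert to a matrix

y <- as.matrix(x)

Remove the row and column names by setting dimnames to NULL

dimnames(y) <- NULL

There is also data.matrix(), which converts a data frame to a numeric matrix (character and factor columns become integer codes).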
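The same steps on a hypothetical two-row excerpt of fragJ11 (only three of its 16 count columns). In R 4.0.0 and later, class() of a matrix returns c("matrix", "array") rather than "matrix" as in the output above, so is.matrix() is the safer check.

```r
frag <- data.frame(fragment = c("(NP (PRP ))", "(. .)"),
                   p1c1a = c(1070, 815), p5c2a = c(656, 863), p2c3a = c(840, 833))

x <- frag[2, 2:4]     # a one-row data.frame
y <- as.matrix(x)     # a 1 x 3 numeric matrix, still with names
dimnames(y) <- NULL   # drop the row and column names
y
is.matrix(y)          # TRUE
as.vector(y)          # 815 863 833 as a plain vector
```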

Transposing a matrix (swapping rows and columns): t()

> x <- matrix(1:9, nrow=3, ncol=3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> t(x)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9