!!!Example: Discriminant Analysis

{{outline}}

!Computing linguistic features from text files

*Processing the learner data
**Set the working directory to the directory that contains the data.
**Confirm with list.files() that the files are there.

{{pre
> list.files()
 [1] "JAN0001_P1B.txt" "JAN0001_P2B.txt" "JAN0001_P3B.txt" "JAN0001_P4B.txt" "JAN0001_P5B.txt"
 [6] "JAN0001_P6B.txt" "JAN0001_P7B.txt" "JAN0001_P8B.txt" "JAN0002_P1A.txt" "JAN0002_P2A.txt"
}}

*Run the program that computes the basic linguistic features (myTextIndex.R) by calling myTextIndex().
*Create the file that stores the results in the directory one level up.
*Read the saved results back into R. (choose.files() is available only on Windows; file.choose() works on all platforms.)

{{pre
> myTextIndex()
Read 12 items
Read 17 items
Read 11 items
Read 8 items
Read 16 items
Read 13 items
Read 18 items
Read 8 items
Read 15 items
Read 19 items
> JPindex <- read.table(choose.files())
}}

*Assign column names.

{{pre
> JPindex
                V1  V2  V3 V4        V5       V6        V7       V8        V9
1  JAN0001_P1B.txt 192 108 12 0.5625000 7.794229 0.6929687 4.562500 16.000000
2  JAN0001_P2B.txt 237 127 17 0.5358650 8.249536 0.6880591 4.329114 13.941180
3  JAN0001_P3B.txt 148  92 11 0.6216216 7.562353 0.7097297 4.391892 13.454550
4  JAN0001_P4B.txt  84  62  8 0.7380952 6.764755 0.6200000 4.547619 10.500000
5  JAN0001_P5B.txt 229 113 16 0.4934498 7.467250 0.6471179 4.161572 14.312500
6  JAN0001_P6B.txt 200 105 13 0.5250000 7.424621 0.6608500 4.170000 15.384620
7  JAN0001_P7B.txt 232 110 18 0.4741379 7.221854 0.6408190 4.353448 12.888890
8  JAN0001_P8B.txt  91  67  8 0.7362637 7.023508 0.6700000 4.318681 11.375000
9  JAN0002_P1A.txt 149  92 15 0.6174497 7.536934 0.6869799 4.630872  9.933333
10 JAN0002_P2A.txt 192 109 19 0.5677083 7.866397 0.6815625 4.578125 10.105260
> names(JPindex) <- c("filename", "Token", "Type", "NoS", "TTR", "GI", "MATTR", "AWL", "ASL")
> JPindex
          filename Token Type NoS       TTR       GI     MATTR      AWL       ASL
1  JAN0001_P1B.txt   192  108  12 0.5625000 7.794229 0.6929687 4.562500 16.000000
2  JAN0001_P2B.txt   237  127  17 0.5358650 8.249536 0.6880591 4.329114 13.941180
3  JAN0001_P3B.txt   148   92  11 0.6216216 7.562353 0.7097297 4.391892 13.454550
4  JAN0001_P4B.txt    84   62   8 0.7380952 6.764755 0.6200000 4.547619 10.500000
5  JAN0001_P5B.txt   229  113  16 0.4934498 7.467250 0.6471179 4.161572 14.312500
6  JAN0001_P6B.txt   200  105  13 0.5250000 7.424621 0.6608500 4.170000 15.384620
7  JAN0001_P7B.txt   232  110  18 0.4741379 7.221854 0.6408190 4.353448 12.888890
8  JAN0001_P8B.txt    91   67   8 0.7362637 7.023508 0.6700000 4.318681 11.375000
9  JAN0002_P1A.txt   149   92  15 0.6174497 7.536934 0.6869799 4.630872  9.933333
10 JAN0002_P2A.txt   192  109  19 0.5677083 7.866397 0.6815625 4.578125 10.105260
}}

*Process the native-speaker data in the same way (first set the working directory to the folder containing the native-speaker files).

{{pre
> list.files()
> myTextIndex()
> NSindex <- read.table(choose.files())
> names(NSindex) <- c("filename", "Token", "Type", "NoS", "TTR", "GI", "MATTR", "AWL", "ASL")
> NSindex
           filename Token Type NoS       TTR        GI     MATTR      AWL      ASL
1  ENG0002_1P1A.txt   608  262  30 0.4309211 10.625500 0.6851645 4.817434 20.26667
2  ENG0002_2P5A.txt   796  337  28 0.4233668 11.944650 0.6943090 5.026382 28.42857
3  ENG0002_3P2A.txt   857  359  34 0.4189032 12.263210 0.7148191 4.868145 25.20588
4  ENG0002_4P7A.txt   924  406  38 0.4393939 13.356420 0.6975649 4.961039 24.31579
5  ENG0002_5P6A.txt   847  392  36 0.4628099 13.469280 0.7370248 4.472255 23.52778
6  ENG0002_6P3A.txt   610  239  23 0.3918033  9.676827 0.6592951 4.645902 26.52174
7  ENG0002_7P4A.txt   727  276  55 0.3796424 10.236270 0.6694360 4.093535 13.21818
8  ENG0002_8P8A.txt   538  241  30 0.4479554 10.390250 0.6657435 4.486989 17.93333
9  ENG0003_1P8B.txt   412  169  24 0.4101942  8.326032 0.6609223 4.296117 17.16667
10 ENG0003_2P4B.txt   482  207  22 0.4294606  9.428592 0.6858921 4.315353 21.90909
}}

!Replace the file names with a group label (working on a copy so the original data frame is kept)

{{pre
> JPindex2 <- JPindex
> JPindex2$filename <- "JP"
> JPindex2
   filename Token Type NoS       TTR       GI     MATTR      AWL       ASL
1        JP   192  108  12 0.5625000 7.794229 0.6929687 4.562500 16.000000
2        JP   237  127  17 0.5358650 8.249536 0.6880591 4.329114 13.941180
3        JP   148   92  11 0.6216216 7.562353 0.7097297 4.391892 13.454550
4        JP    84   62   8 0.7380952 6.764755 0.6200000 4.547619 10.500000
5        JP   229  113  16 0.4934498 7.467250 0.6471179 4.161572 14.312500
6        JP   200  105  13 0.5250000 7.424621 0.6608500 4.170000 15.384620
7        JP   232  110  18 0.4741379 7.221854 0.6408190 4.353448 12.888890
8        JP    91   67   8 0.7362637 7.023508 0.6700000 4.318681 11.375000
9        JP   149   92  15 0.6174497 7.536934 0.6869799 4.630872  9.933333
10       JP   192  109  19 0.5677083 7.866397 0.6815625 4.578125 10.105260
> NSindex2 <- NSindex
> NSindex2$filename <- "NS"
> NSindex2
   filename Token Type NoS       TTR        GI     MATTR      AWL      ASL
1        NS   608  262  30 0.4309211 10.625500 0.6851645 4.817434 20.26667
2        NS   796  337  28 0.4233668 11.944650 0.6943090 5.026382 28.42857
3        NS   857  359  34 0.4189032 12.263210 0.7148191 4.868145 25.20588
4        NS   924  406  38 0.4393939 13.356420 0.6975649 4.961039 24.31579
5        NS   847  392  36 0.4628099 13.469280 0.7370248 4.472255 23.52778
6        NS   610  239  23 0.3918033  9.676827 0.6592951 4.645902 26.52174
7        NS   727  276  55 0.3796424 10.236270 0.6694360 4.093535 13.21818
8        NS   538  241  30 0.4479554 10.390250 0.6657435 4.486989 17.93333
9        NS   412  169  24 0.4101942  8.326032 0.6609223 4.296117 17.16667
10       NS   482  207  22 0.4294606  9.428592 0.6858921 4.315353 21.90909
}}

!Combine the two data frames

{{pre
> JPNSindex <- rbind(JPindex2, NSindex2)
> JPNSindex
   filename Token Type NoS       TTR        GI     MATTR      AWL       ASL
1        JP   192  108  12 0.5625000  7.794229 0.6929687 4.562500 16.000000
2        JP   237  127  17 0.5358650  8.249536 0.6880591 4.329114 13.941180
3        JP   148   92  11 0.6216216  7.562353 0.7097297 4.391892 13.454550
4        JP    84   62   8 0.7380952  6.764755 0.6200000 4.547619 10.500000
5        JP   229  113  16 0.4934498  7.467250 0.6471179 4.161572 14.312500
6        JP   200  105  13 0.5250000  7.424621 0.6608500 4.170000 15.384620
7        JP   232  110  18 0.4741379  7.221854 0.6408190 4.353448 12.888890
8        JP    91   67   8 0.7362637  7.023508 0.6700000 4.318681 11.375000
9        JP   149   92  15 0.6174497  7.536934 0.6869799 4.630872  9.933333
10       JP   192  109  19 0.5677083  7.866397 0.6815625 4.578125 10.105260
11       NS   608  262  30 0.4309211 10.625500 0.6851645 4.817434 20.266670
12       NS   796  337  28 0.4233668 11.944650 0.6943090 5.026382 28.428570
13       NS   857  359  34 0.4189032 12.263210 0.7148191 4.868145 25.205880
14       NS   924  406  38 0.4393939 13.356420 0.6975649 4.961039 24.315790
15       NS   847  392  36 0.4628099 13.469280 0.7370248 4.472255 23.527780
16       NS   610  239  23 0.3918033  9.676827 0.6592951 4.645902 26.521740
17       NS   727  276  55 0.3796424 10.236270 0.6694360 4.093535 13.218180
18       NS   538  241  30 0.4479554 10.390250 0.6657435 4.486989 17.933330
19       NS   412  169  24 0.4101942  8.326032 0.6609223 4.296117 17.166670
20       NS   482  207  22 0.4294606  9.428592 0.6858921 4.315353 21.909090
}}

!!Discriminant analysis

*Run linear discriminant analysis with leave-one-out cross-validation. The general form is lda(category ~ ., data = dataset, CV = TRUE); lda() comes from the MASS package, so load it first with library(MASS). Here the group label is stored in the filename column of JPNSindex:
model <- lda(filename ~ ., data = JPNSindex, CV = TRUE)
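The following is a minimal, self-contained sketch of this last step and of reading its result, assuming the MASS package is installed (it ships with standard R distributions as a recommended package). So that it runs without myTextIndex() or the text files, it rebuilds JPNSindex from the values printed above. The names conf_tab and accuracy are illustrative and not part of the original procedure. With CV = TRUE, lda() returns the leave-one-out predictions ($class) and posterior probabilities ($posterior) rather than a model object for predict(), so the predictions can be cross-tabulated directly against the true labels.

```r
# Sketch: LDA with leave-one-out cross-validation (MASS::lda, CV = TRUE).
# Assumption: JPNSindex below stands in for the data frame built with
# myTextIndex() + rbind(); the values are copied from the printed output.
library(MASS)  # provides lda()

JPNSindex <- data.frame(
  filename = rep(c("JP", "NS"), each = 10),  # group label: learners vs native speakers
  Token = c(192, 237, 148, 84, 229, 200, 232, 91, 149, 192,
            608, 796, 857, 924, 847, 610, 727, 538, 412, 482),
  Type  = c(108, 127, 92, 62, 113, 105, 110, 67, 92, 109,
            262, 337, 359, 406, 392, 239, 276, 241, 169, 207),
  NoS   = c(12, 17, 11, 8, 16, 13, 18, 8, 15, 19,
            30, 28, 34, 38, 36, 23, 55, 30, 24, 22),
  TTR   = c(0.5625000, 0.5358650, 0.6216216, 0.7380952, 0.4934498,
            0.5250000, 0.4741379, 0.7362637, 0.6174497, 0.5677083,
            0.4309211, 0.4233668, 0.4189032, 0.4393939, 0.4628099,
            0.3918033, 0.3796424, 0.4479554, 0.4101942, 0.4294606),
  GI    = c(7.794229, 8.249536, 7.562353, 6.764755, 7.467250,
            7.424621, 7.221854, 7.023508, 7.536934, 7.866397,
            10.625500, 11.944650, 12.263210, 13.356420, 13.469280,
            9.676827, 10.236270, 10.390250, 8.326032, 9.428592),
  MATTR = c(0.6929687, 0.6880591, 0.7097297, 0.6200000, 0.6471179,
            0.6608500, 0.6408190, 0.6700000, 0.6869799, 0.6815625,
            0.6851645, 0.6943090, 0.7148191, 0.6975649, 0.7370248,
            0.6592951, 0.6694360, 0.6657435, 0.6609223, 0.6858921),
  AWL   = c(4.562500, 4.329114, 4.391892, 4.547619, 4.161572,
            4.170000, 4.353448, 4.318681, 4.630872, 4.578125,
            4.817434, 5.026382, 4.868145, 4.961039, 4.472255,
            4.645902, 4.093535, 4.486989, 4.296117, 4.315353),
  ASL   = c(16.000000, 13.941180, 13.454550, 10.500000, 14.312500,
            15.384620, 12.888890, 11.375000, 9.933333, 10.105260,
            20.266670, 28.428570, 25.205880, 24.315790, 23.527780,
            26.521740, 13.218180, 17.933330, 17.166670, 21.909090)
)

# With CV = TRUE each text is classified by the rule estimated without it
model <- lda(filename ~ ., data = JPNSindex, CV = TRUE)

# Confusion table: rows = actual group, columns = leave-one-out prediction
conf_tab <- table(actual = JPNSindex$filename, predicted = model$class)
print(conf_tab)

# Proportion of texts classified correctly under leave-one-out CV
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)
print(accuracy)

# Posterior probability of each group for the first few texts
print(round(head(model$posterior), 3))
```

Writing CV = TRUE instead of CV = T is a small robustness choice: T is an ordinary variable that can be reassigned in a session, whereas TRUE is a reserved constant.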