!!!Example of discriminant analysis
{{outline}}
!Computing linguistic features from text files
*Processing the learner data
**Set the working directory to the directory that contains the data
**Check with list.files() that the files are there
{{pre
> list.files()
 [1] "JAN0001_P1B.txt" "JAN0001_P2B.txt" "JAN0001_P3B.txt" "JAN0001_P4B.txt" "JAN0001_P5B.txt"
 [6] "JAN0001_P6B.txt" "JAN0001_P7B.txt" "JAN0001_P8B.txt" "JAN0002_P1A.txt" "JAN0002_P2A.txt"
}}
*Run myTextIndex(), the program in myTextIndex.R that computes basic linguistic features
*The results file is written to the parent directory (one level above the working directory)
*Read the saved results back in (choose.files() opens a file-selection dialog and is available on Windows only; file.choose() works on all platforms)
{{pre
> myTextIndex()
Read 12 items
Read 17 items
Read 11 items
Read 8 items
Read 16 items
Read 13 items
Read 18 items
Read 8 items
Read 15 items
Read 19 items
> JPindex <- read.table(choose.files())
}}
*Name the columns (the results file has no header row, so read.table() assigns the default names V1 to V9)
**Token: number of tokens, Type: number of types, NoS: number of sentences, TTR: type-token ratio (Type/Token), GI: Guiraud index (Type/sqrt(Token)), MATTR: moving-average type-token ratio, AWL: average word length, ASL: average sentence length (Token/NoS)
{{pre
> JPindex
                V1  V2  V3 V4        V5       V6        V7       V8        V9
1  JAN0001_P1B.txt 192 108 12 0.5625000 7.794229 0.6929687 4.562500 16.000000
2  JAN0001_P2B.txt 237 127 17 0.5358650 8.249536 0.6880591 4.329114 13.941180
3  JAN0001_P3B.txt 148  92 11 0.6216216 7.562353 0.7097297 4.391892 13.454550
4  JAN0001_P4B.txt  84  62  8 0.7380952 6.764755 0.6200000 4.547619 10.500000
5  JAN0001_P5B.txt 229 113 16 0.4934498 7.467250 0.6471179 4.161572 14.312500
6  JAN0001_P6B.txt 200 105 13 0.5250000 7.424621 0.6608500 4.170000 15.384620
7  JAN0001_P7B.txt 232 110 18 0.4741379 7.221854 0.6408190 4.353448 12.888890
8  JAN0001_P8B.txt  91  67  8 0.7362637 7.023508 0.6700000 4.318681 11.375000
9  JAN0002_P1A.txt 149  92 15 0.6174497 7.536934 0.6869799 4.630872  9.933333
10 JAN0002_P2A.txt 192 109 19 0.5677083 7.866397 0.6815625 4.578125 10.105260
> names(JPindex) <- c("filename", "Token", "Type", "NoS", "TTR", "GI", "MATTR", "AWL", "ASL")
> JPindex
          filename Token Type NoS       TTR       GI     MATTR      AWL       ASL
1  JAN0001_P1B.txt   192  108  12 0.5625000 7.794229 0.6929687 4.562500 16.000000
2  JAN0001_P2B.txt   237  127  17 0.5358650 8.249536 0.6880591 4.329114 13.941180
3  JAN0001_P3B.txt   148   92  11 0.6216216 7.562353 0.7097297 4.391892 13.454550
4  JAN0001_P4B.txt    84   62   8 0.7380952 6.764755 0.6200000 4.547619 10.500000
5  JAN0001_P5B.txt   229  113  16 0.4934498 7.467250 0.6471179 4.161572 14.312500
6  JAN0001_P6B.txt   200  105  13 0.5250000 7.424621 0.6608500 4.170000 15.384620
7  JAN0001_P7B.txt   232  110  18 0.4741379 7.221854 0.6408190 4.353448 12.888890
8  JAN0001_P8B.txt    91   67   8 0.7362637 7.023508 0.6700000 4.318681 11.375000
9  JAN0002_P1A.txt   149   92  15 0.6174497 7.536934 0.6869799 4.630872  9.933333
10 JAN0002_P2A.txt   192  109  19 0.5677083 7.866397 0.6815625 4.578125 10.105260
}}
*Process the native-speaker data in the same way
{{pre
> list.files()
> myTextIndex()
> NSindex <- read.table(choose.files())
> names(NSindex) <- c("filename", "Token", "Type", "NoS", "TTR", "GI", "MATTR", "AWL", "ASL")
> NSindex
           filename Token Type NoS       TTR        GI     MATTR      AWL      ASL
1  ENG0002_1P1A.txt   608  262  30 0.4309211 10.625500 0.6851645 4.817434 20.26667
2  ENG0002_2P5A.txt   796  337  28 0.4233668 11.944650 0.6943090 5.026382 28.42857
3  ENG0002_3P2A.txt   857  359  34 0.4189032 12.263210 0.7148191 4.868145 25.20588
4  ENG0002_4P7A.txt   924  406  38 0.4393939 13.356420 0.6975649 4.961039 24.31579
5  ENG0002_5P6A.txt   847  392  36 0.4628099 13.469280 0.7370248 4.472255 23.52778
6  ENG0002_6P3A.txt   610  239  23 0.3918033  9.676827 0.6592951 4.645902 26.52174
7  ENG0002_7P4A.txt   727  276  55 0.3796424 10.236270 0.6694360 4.093535 13.21818
8  ENG0002_8P8A.txt   538  241  30 0.4479554 10.390250 0.6657435 4.486989 17.93333
9  ENG0003_1P8B.txt   412  169  24 0.4101942  8.326032 0.6609223 4.296117 17.16667
10 ENG0003_2P4B.txt   482  207  22 0.4294606  9.428592 0.6858921 4.315353 21.90909
}}
!Replace the file names with a group label (working on a copy)
{{pre
> JPindex2 <- JPindex
> JPindex2$filename <- "JP"
> JPindex2
   filename Token Type NoS       TTR       GI     MATTR      AWL       ASL
1        JP   192  108  12 0.5625000 7.794229 0.6929687 4.562500 16.000000
2        JP   237  127  17 0.5358650 8.249536 0.6880591 4.329114 13.941180
3        JP   148   92  11 0.6216216 7.562353 0.7097297 4.391892 13.454550
4        JP    84   62   8 0.7380952 6.764755 0.6200000 4.547619 10.500000
5        JP   229  113  16 0.4934498 7.467250 0.6471179 4.161572 14.312500
6        JP   200  105  13 0.5250000 7.424621 0.6608500 4.170000 15.384620
7        JP   232  110  18 0.4741379 7.221854 0.6408190 4.353448 12.888890
8        JP    91   67   8 0.7362637 7.023508 0.6700000 4.318681 11.375000
9        JP   149   92  15 0.6174497 7.536934 0.6869799 4.630872  9.933333
10       JP   192  109  19 0.5677083 7.866397 0.6815625 4.578125 10.105260
> NSindex2 <- NSindex
> NSindex2$filename <- "NS"
> NSindex2
   filename Token Type NoS       TTR        GI     MATTR      AWL      ASL
1        NS   608  262  30 0.4309211 10.625500 0.6851645 4.817434 20.26667
2        NS   796  337  28 0.4233668 11.944650 0.6943090 5.026382 28.42857
3        NS   857  359  34 0.4189032 12.263210 0.7148191 4.868145 25.20588
4        NS   924  406  38 0.4393939 13.356420 0.6975649 4.961039 24.31579
5        NS   847  392  36 0.4628099 13.469280 0.7370248 4.472255 23.52778
6        NS   610  239  23 0.3918033  9.676827 0.6592951 4.645902 26.52174
7        NS   727  276  55 0.3796424 10.236270 0.6694360 4.093535 13.21818
8        NS   538  241  30 0.4479554 10.390250 0.6657435 4.486989 17.93333
9        NS   412  169  24 0.4101942  8.326032 0.6609223 4.296117 17.16667
10       NS   482  207  22 0.4294606  9.428592 0.6858921 4.315353 21.90909
}}
!Combining the two datasets
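One way to combine the learner and native-speaker tables is shown below as a minimal sketch. It assumes that JPindex2 and NSindex2 have identical column names, as in the output above. The two data frames are stacked with rbind(), and the label column is then turned into a factor so that it can serve as the grouping variable in a discriminant analysis. The names jp, ns and allIndex are hypothetical. To keep the example self-contained, two rows from each table are re-entered by hand.

```r
# Hypothetical stand-ins for JPindex2 and NSindex2 (first two rows of each table)
jp <- data.frame(
  filename = c("JP", "JP"),
  Token = c(192, 237), Type = c(108, 127), NoS = c(12, 17),
  TTR = c(0.5625000, 0.5358650), GI = c(7.794229, 8.249536),
  MATTR = c(0.6929687, 0.6880591), AWL = c(4.562500, 4.329114),
  ASL = c(16.000000, 13.941180)
)
ns <- data.frame(
  filename = c("NS", "NS"),
  Token = c(608, 796), Type = c(262, 337), NoS = c(30, 28),
  TTR = c(0.4309211, 0.4233668), GI = c(10.625500, 11.944650),
  MATTR = c(0.6851645, 0.6943090), AWL = c(4.817434, 5.026382),
  ASL = c(20.26667, 28.42857)
)

# Stack the rows; rbind() on data frames matches columns by name
allIndex <- rbind(jp, ns)

# Make the JP/NS label a factor so it can be used as the class variable
allIndex$filename <- factor(allIndex$filename)

print(table(allIndex$filename))
```

With the real data, the same call would be rbind(JPindex2, NSindex2). Because rbind() matches data-frame columns by name, both tables must carry exactly the same column names. That is why the same names() assignment was applied to both JPindex and NSindex.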