getwd()
## [1] "C:/(omitted)/NICER1_3_2/2020-11-24NICER1_3_2/NICER_NNS"
list.files()
* Read in about 10 files each of learner data and native-speaker data, then search for an expression and compare the two.
jpn511 <- readLines("JPN511.txt")
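## Warning in readLines("JPN511.txt"): incomplete final line found on
## 'JPN511.txt'
This warning means that the last line of the file does not end with a newline; the text is still read in. To suppress the warning, add the warn=F option:
jpn511 <- readLines("JPN511.txt", warn=F)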
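The warning can be reproduced on a small scale. The following is a minimal sketch, assuming a made-up two-line file written to a temporary location (`demo_file` is a hypothetical name, not part of the corpus). It uses `FALSE`, for which `F` is the usual shorthand.

```r
# Hypothetical demo file in a temporary location (not part of the corpus)
demo_file <- tempfile(fileext = ".txt")

# cat() writes exactly what it is given, so the last line has no trailing newline
cat("first line\nsecond line", file = demo_file)

# Capture the warning message instead of printing it
msg <- tryCatch(readLines(demo_file), warning = function(w) conditionMessage(w))

# warn = FALSE reads the same content without the warning
quiet <- readLines(demo_file, warn = FALSE)
quiet
```

Both lines are read in either case; the option only controls whether the warning is shown.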
jpn501 <- readLines("JPN501.txt")
jpn502 <- readLines("JPN502.txt")
jpn503 <- readLines("JPN503.txt")
jpn504 <- readLines("JPN504.txt")
jpn505 <- readLines("JPN505.txt")
jpn506 <- readLines("JPN506.txt")
jpn507 <- readLines("JPN507.txt")
jpn508 <- readLines("JPN508.txt")
jpn509 <- readLines("JPN509.txt")
jpn510 <- readLines("JPN510.txt")
setwd("../NICER_NS")
ns501 <- readLines("NS501.txt")
ns502 <- readLines("NS502.txt")
ns503 <- readLines("NS503.txt")
ns504 <- readLines("NS504.txt")
ns505 <- readLines("NS505.txt")
ns506 <- readLines("NS506.txt")
ns507 <- readLines("NS507.txt")
ns508 <- readLines("NS508.txt")
ns509 <- readLines("NS509.txt")
ns510 <- readLines("NS510.txt")
ls()
## [1] "jpn501" "jpn502" "jpn503" "jpn504" "jpn505" "jpn506" "jpn507" "jpn508"
## [9] "jpn509" "jpn510" "jpn511" "ns501" "ns502" "ns503" "ns504" "ns505"
## [17] "ns506" "ns507" "ns508" "ns509" "ns510"
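Note that in R, the backslash used to escape a regular-expression metacharacter such as * must be doubled (\\*).
Square brackets [ ] mean any one of the characters inside them: [Hh]owever matches either However or however.
To display the matching elements themselves rather than their index numbers, use the option value=T.
To ignore the difference between upper and lower case, use the option ignore.case=T.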
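The effect of these two options can be seen on a tiny example. This is a sketch only, assuming three made-up sentences (`sample_lines` is a hypothetical vector, not corpus data).

```r
# Hypothetical three-line sample, not taken from the corpus
sample_lines <- c("However, it rained.", "It rained, however.", "HOWEVER IT RAINED")

# Default: index numbers of the matching elements
idx <- grep("[Hh]owever", sample_lines)

# value = TRUE: the matching elements themselves
val <- grep("[Hh]owever", sample_lines, value = TRUE)

# ignore.case = TRUE: the all-capitals line matches as well
any_case <- grep("however", sample_lines, ignore.case = TRUE, value = TRUE)

idx
val
any_case
```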
grep("[hH]owever", ns501, value=T)
## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."
## [3] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."
## [4] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."
jpn.10 <- c(jpn501, jpn502, jpn503, jpn504, jpn505, jpn506, jpn507, jpn508, jpn509, jpn510)
length(jpn.10)
## [1] 993
ns.10 <- c(ns501, ns502, ns503, ns504, ns505, ns506, ns507, ns508, ns509, ns510)
length(ns.10)
## [1] 945
grep("[hH]owever", ns.10, value=T)
## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."
## [3] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."
## [4] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."
## [5] "*NS502:\tHowever, with growing competition in workplaces and with newer jobs being developed on a regular basis, it may be necessary to reexamine this two-dimensional hierarchy in order to better prepare students for the changing world."
## [6] "*NS503:\tHowever, I worry that in today's increasingly global society, in which scientific developments are often explicitly prioritized over humanities-based education and research around the world, our global society is perhaps sacrificing crucial analysis of the potential consequences of such scientific research."
## [7] "*NS503:\tHumanities-based education and analysis, however, has the potential to challenge such ideology and, thereby, transform contemporary global society for the better."
## [8] "*NS504:\tHowever, both systems are not completely different, as they both take into account the importance of academic achievement and also the base of the curriculum, albeit having its differences, remains based on a language, social science, natural science and mathematics core."
## [9] "*NS505:\tHowever Australians have sought to distinguish themselves from the Brits by assuming the role of the scrapper, the underdog."
## [10] "*NS505:\tHowever, there are also some negative aspects to Australia's sporting identity."
## [11] "*NS505:\tAustralia presents itself to the world as a sporting nation, however I challenge the validity of this representation."
## [12] "*NS507:\tHowever, the situation in the United States is much different with most children beginning their first foreign language classes only in high school, if at all."
## [13] "*NS507:\tHowever, a similar attitude is displayed when an American finds themselves abroad: Why can't they just speak English?"
## [14] "*NS508:\tHowever, as English began to increase in popularity worldwide its influence also took hold of Scotland."
## [15] "*NS508:\tHowever, even thought the education method is quite successful the lack of interest and importance on Gaelic means the number of students attending these schools are limited and Gaelic is not being used in the world outside the classroom."
## [16] "*NS508:\tHowever, that is most certainly easier said than done."
## [17] "*NS509:\tHowever, today paper money is made of the same materials and the only thing that distinguishes one bill from another is the digit printed on each one."
## [18] "*NS509:\tHowever, I think that people need to reassess what is important to them at what is valuable."
## [19] "*NS509:\tHowever, when it comes down to it, it is all just paper."
## [20] "*NS509:\tHowever, the value of material objects is completely up to us as individuals."
## [21] "*NS510:\tHowever, there are those that claim that any opposition towards these actions by the Australian Federal Government are in fact based on an underlying racial issue rather than an issue of economical practicality or fairness"
grep("[hH]owever", jpn.10, value=T)
## [1] "%NTV:\tHowever, many budo have no teammate; instead, you must play against yourself."
## [2] "%COM:\tAvoid starting sentences with coordinating conjunctions. You can change \"but\" to \"however\"."
## [3] "%NTV:\tHowever, because of rei, budo provides many additional good points, such as mental strength."
## [4] "*JPN502:\tHowever, we cannot study in advance because of the less time."
## [5] "%NTV:\tHowever, we could not study enough because we had less time. "
## [6] "%NTV:\tHowever, I think we should not view it as an entirely useless and incorrect policy."
## [7] "*JPN506:\tHowever, I have heard one family story ever before."
## [8] "*JPN507:\tHowever, L make think to educational systems."
## [9] "*JPN507:\tHowever, we want more high quality working in one area, it need longer time to enhance that skill."
## [10] "%NTV:\tHowever, there are sports where the player's genetics is non-relative."
## [11] "%COM:\tDon't start sentences with coordinating conjunctions. \"But\" can often be replaced with \"however\"."
## [12] "%NTV:\tHowever, when they enter university or get a job, the situation changes."
"[hH]owever"
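Symbol | Description | Example |
---|---|---|
^ | Start of the line | ^\\*JPN |
. | Any single character | . |
* | Zero or more repetitions of the preceding item | .* |
+ | One or more repetitions of the preceding item | .+ |
\b | Word boundary | \\bhow\\b |
\\ | Escape: treats the following metacharacter literally | \\* |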
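The symbols in the table can be tried out on a few lines laid out like the corpus (speaker label, tab, sentence). This is a minimal sketch; the vector `toy` and its contents are made up for illustration.

```r
# Hypothetical lines imitating the corpus layout
toy <- c("*JPN001:\tHowever, I like sports.",
         "%NTV:\tHowever, I like sports.",
         "*JPN001:\tI do not know how to play.",
         "*NS001:\tShow me, however, the rules.")

# ^ anchors the match at the start of the line; \\* is a literal asterisk
starts_jpn <- grep("^\\*JPN", toy, value = TRUE)

# .+ allows one or more arbitrary characters between the label and the word
jpn_however <- grep("^\\*JPN.+[hH]owever", toy, value = TRUE)

# \\b keeps "how" from matching inside "however" or "Show"
whole_how <- grep("\\bhow\\b", toy, value = TRUE)

starts_jpn
jpn_however
whole_how
```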
https://sugiura-ken.org/wiki/wiki.cgi/exp?page=R%2Ereg
grep("^\\*JPN.+[hH]owever", jpn.10, value=T)
## [1] "*JPN502:\tHowever, we cannot study in advance because of the less time."
## [2] "*JPN506:\tHowever, I have heard one family story ever before."
## [3] "*JPN507:\tHowever, L make think to educational systems."
## [4] "*JPN507:\tHowever, we want more high quality working in one area, it need longer time to enhance that skill."
grep("^\\*NS.+[hH]owever", ns.10, value=T)
## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."
## [3] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."
## [4] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."
## [5] "*NS502:\tHowever, with growing competition in workplaces and with newer jobs being developed on a regular basis, it may be necessary to reexamine this two-dimensional hierarchy in order to better prepare students for the changing world."
## [6] "*NS503:\tHowever, I worry that in today's increasingly global society, in which scientific developments are often explicitly prioritized over humanities-based education and research around the world, our global society is perhaps sacrificing crucial analysis of the potential consequences of such scientific research."
## [7] "*NS503:\tHumanities-based education and analysis, however, has the potential to challenge such ideology and, thereby, transform contemporary global society for the better."
## [8] "*NS504:\tHowever, both systems are not completely different, as they both take into account the importance of academic achievement and also the base of the curriculum, albeit having its differences, remains based on a language, social science, natural science and mathematics core."
## [9] "*NS505:\tHowever Australians have sought to distinguish themselves from the Brits by assuming the role of the scrapper, the underdog."
## [10] "*NS505:\tHowever, there are also some negative aspects to Australia's sporting identity."
## [11] "*NS505:\tAustralia presents itself to the world as a sporting nation, however I challenge the validity of this representation."
## [12] "*NS507:\tHowever, the situation in the United States is much different with most children beginning their first foreign language classes only in high school, if at all."
## [13] "*NS507:\tHowever, a similar attitude is displayed when an American finds themselves abroad: Why can't they just speak English?"
## [14] "*NS508:\tHowever, as English began to increase in popularity worldwide its influence also took hold of Scotland."
## [15] "*NS508:\tHowever, even thought the education method is quite successful the lack of interest and importance on Gaelic means the number of students attending these schools are limited and Gaelic is not being used in the world outside the classroom."
## [16] "*NS508:\tHowever, that is most certainly easier said than done."
## [17] "*NS509:\tHowever, today paper money is made of the same materials and the only thing that distinguishes one bill from another is the digit printed on each one."
## [18] "*NS509:\tHowever, I think that people need to reassess what is important to them at what is valuable."
## [19] "*NS509:\tHowever, when it comes down to it, it is all just paper."
## [20] "*NS509:\tHowever, the value of material objects is completely up to us as individuals."
## [21] "*NS510:\tHowever, there are those that claim that any opposition towards these actions by the Australian Federal Government are in fact based on an underlying racial issue rather than an issue of economical practicality or fairness"
grep("^\\*NS.+however", ns.10, value=T)
## [1] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."
## [2] "*NS503:\tHumanities-based education and analysis, however, has the potential to challenge such ideology and, thereby, transform contemporary global society for the better."
## [3] "*NS505:\tAustralia presents itself to the world as a sporting nation, however I challenge the validity of this representation."
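The same kind of pattern can be used to count where however appears. The sketch below assumes four made-up native-speaker lines (`toy` is hypothetical) and separates However directly after the speaker label from lowercase however later in the line. It relies on capitalization, so it is only a rough first pass, not a full analysis of position.

```r
# Hypothetical lines imitating the corpus layout
toy <- c("*NS001:\tHowever, this is hard.",
         "*NS001:\tThis is hard, however, for beginners.",
         "*NS002:\tHowever we try, it fails.",
         "*NS002:\tThis is easy.")

# Sentence-initial: However directly after the speaker label and the tab
initial <- grep("^\\*NS[0-9]+:\tHowever", toy, value = TRUE)

# Elsewhere: lowercase however preceded by at least one character
medial <- grep("^\\*NS.+however", toy, value = TRUE)

counts <- c(initial = length(initial), medial = length(medial))
counts
```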
for (variable in values){   # the loop runs once for each value
  action to carry out
  action to carry out
}
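A minimal runnable version of this skeleton might look as follows. The file names are hypothetical and are used only as values to loop over; no files are read.

```r
# Hypothetical file names, used only as values to loop over
file_names <- c("JPN501.txt", "JPN502.txt", "JPN503.txt")

n_chars <- c()                       # empty container for the results
for (f in file_names){               # f takes each file name in turn
  n_chars <- c(n_chars, nchar(f))    # append one result per pass through the loop
}
n_chars
```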
getwd()
list.files()
file.zenbu <- list.files()           # make a list of all files in the directory
ruiseki <- ""                        # prepare a container for the results (a character string)
for (i in file.zenbu){               # take the file names one at a time and put each in i
  yomikomi <- readLines(i, warn=F)   # put what is read from file i into yomikomi
  ruiseki <- c(ruiseki, yomikomi)    # append what was read to ruiseki using c()
}
ruiseki
length(ruiseki)
## [1] 37949
head(ruiseki)
## [1] "" "@Begin" "@Participants:\tJPN501"
## [4] "@PID:\tPIDJP501" "@Age:\t21" "@Sex:\tF"
head(ruiseki, 20)
## [1] ""
## [2] "@Begin"
## [3] "@Participants:\tJPN501"
## [4] "@PID:\tPIDJP501"
## [5] "@Age:\t21"
## [6] "@Sex:\tF"
## [7] "@YearInSchool:\tU2"
## [8] "@Major:\tagriculture"
## [9] "@StudyHistory:\t8"
## [10] "@OtherLanguage:\tChinese=1.0;none="
## [11] "@Qualification:\tTOEIC=590(2013);none=;none="
## [12] "@Abroad:\tnone=;none="
## [13] "@Reading:\t3"
## [14] "@Writing:\t2"
## [15] "@Listening:\t2"
## [16] "@Speaking:\t1"
## [17] "@JapaneseEssay:\t4"
## [18] "@EnglishEssayEx:\t3"
## [19] "@EnglishEssay:\t2"
## [20] "@Difficulty:\t"
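Because the container starts as `""`, the first element of the result is an empty string, as the output above shows. One possible variation, sketched here under the assumption of a made-up two-file corpus in a temporary directory (`demo_dir`, the file names and their contents are all hypothetical), starts from an empty character vector and passes full paths so that the working directory does not matter.

```r
# Hypothetical miniature corpus in a temporary directory
demo_dir <- file.path(tempdir(), "demo_corpus")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("@Begin", "*JPN001:\tHello.", "@End"), file.path(demo_dir, "JPN001.txt"))
writeLines(c("@Begin", "*JPN002:\tHowever, no.", "@End"), file.path(demo_dir, "JPN002.txt"))

# full.names = TRUE returns paths; pattern keeps only names ending in .txt
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

all_lines <- character(0)            # empty character vector: no leading "" element
for (f in files){
  all_lines <- c(all_lines, readLines(f, warn = FALSE))
}
length(all_lines)
head(all_lines, 2)
```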
tail(ruiseki)
## [1] "%NTV:\tAs a result, classes will take longer and students' abilities will decrease."
## [2] "%COM:\t"
## [3] "*JPN881:\tI think it should be abolished Internet added in the elementary school's classes."
## [4] "%NTV:\tI think the idea of adding the Internet to elementary school classes should be abolished."
## [5] "%COM:\t"
## [6] "@End"
ruiseki[201:220]
## [1] "%COM:\t"
## [2] "*JPN502:\tSo, our world ranking of study level was fell down ."
## [3] "%NTV:\tTherefore, our world ranking in terms of study level slipped down. "
## [4] "%COM:\tAvoid starting sentences with coordinating conjunctions. You can change \"so\" to \"therefore\". "
## [5] "%par:"
## [6] "*JPN502:\tSecond,I'll tell what happen to us now."
## [7] "%NTV:\tSecond, I'll explain what happens now."
## [8] "%COM:\t"
## [9] "*JPN502:\tMany people who are older than us sometimes call us \"YUTORI SEDAI\" that include that negative image."
## [10] "%NTV:\tMany people who are older than us sometimes call us \"YUTORI sedai\", which has a negative connotation."
## [11] "%COM:\t"
## [12] "*JPN502:\tThere image is student who grew in \"YUTORI\" is slow to decide and have less effort and so on."
## [13] "%NTV:\tTheir image of a student who grew up under the YUTORI system is of someone who is slow to decide, doesn't make much effort, and so on."
## [14] "%COM:\t"
## [15] "*JPN502:\tEven if I was played tennis in lesson of university, the teacher said that \"You are YUTORI, so you should make your self more better.\""
## [16] "%NTV:\tEven if I had a tennis lesson in university, the teacher said, \"You are YUTORI, so you should make yourself better.\""
## [17] "%COM:\tUse quote marks to indicate quoted speech."
## [18] "*JPN502:\tI was shocked."
## [19] "%NTV:\tI was shocked."
## [20] "%COM:\t"
file.zenbu <- list.files()           # make a list of all files in the directory
ruiseki <- ""                        # prepare a container for the results (a character string)
for (i in file.zenbu){               # take the file names one at a time and put each in i
  yomikomi <- readLines(i, warn=F)   # put what is read from file i into yomikomi
  ruiseki <- c(ruiseki, yomikomi)    # append what was read to ruiseki using c()
}
jpn.however <- grep("^\\*JPN.+[hH]owever", ruiseki, value=T)
head(jpn.however, 20)
## [1] "*JPN502:\tHowever, we cannot study in advance because of the less time."
## [2] "*JPN506:\tHowever, I have heard one family story ever before."
## [3] "*JPN507:\tHowever, L make think to educational systems."
## [4] "*JPN507:\tHowever, we want more high quality working in one area, it need longer time to enhance that skill."
## [5] "*JPN511:\tHowever, there are limited teacher who have teaching license."
## [6] "*JPN512:\tHowever, most healthy people don't recognize what kind of disadvantages are there and how to deal with the cases which they meet disadvantaged people."
## [7] "*JPN512:\tThe answer is, however, not about education."
## [8] "*JPN516:\tHowever, since bubble economy was collapsed, Japan had to adapt its crisis."
## [9] "*JPN516:\tHowever, Japanese students' educational levels were declined under this system."
## [10] "*JPN516:\tHowever, I think the way the Japanese government chosen was not necessarily true."
## [11] "*JPN517:\tHowever, I am too busy to work these days because I attend many lecture and I have to study for the test."
## [12] "*JPN517:\tHowever, it is very hard."
## [13] "*JPN520:\tHowever, we have to admit that not all of the teachers can teach \"speaking English\" well enough."
## [14] "*JPN526:\tHowever, the other day my father said to me, \"It's the time you should pay some money for family per month.\""
## [15] "*JPN528:\tHowever, Finnish education system also have some bad points."
## [16] "*JPN529:\tHowever, my life has changed by my belonging a sports club in this university."
## [17] "*JPN529:\tHowever, I continue to play lacrosse thank for my teammates."
## [18] "*JPN532:\tHowever, high school students don't have to select them."
## [19] "*JPN536:\tHowever, there are various and stimulus sports in the world."
## [20] "*JPN536:\tHowever, it is also important to do sports."
To save the result one directory up rather than in the data directory, prefix the file name with ../
write.table(jpn.however, "../jpn.however.txt")
The result is saved as a text file.
Open the text file in Excel.
[ " ]
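For research purposes, look through the data in the Excel table and enter analysis codes suited to the research question.
Then tabulate the results of the analysis.
Reference (positions where however occurs): https://nuss.nagoya-u.ac.jp/s/42p2ZX6eRdpxq7q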
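By default write.table() adds row numbers, a header line and " marks around each string. If a plainer file is preferred, the options below are one way to get it. This is a sketch with made-up search results (`hits` stands in for jpn.however) written to a temporary file rather than to ../.

```r
# Hypothetical search results standing in for jpn.however
hits <- c("*JPN001:\tHowever, it is hard.", "*JPN002:\tHowever, I agree.")

out_file <- tempfile(fileext = ".txt")   # temporary path, used only for this demo

# quote = FALSE drops the surrounding " marks;
# row.names = FALSE and col.names = FALSE drop the numbering and the header line
write.table(hits, out_file, quote = FALSE, row.names = FALSE, col.names = FALSE)

written <- readLines(out_file)
written
```

The file then holds one search result per line, with the tab between speaker label and sentence left in place.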
jpn501
Unstructured text: plain running text with no format
Structured text: laid out according to a format (e.g., the CHAT format of CHILDES) https://www.sugiura-ken.org/wiki/wiki.cgi/exp?page=CHAT
@Begin
@Participants: CHI
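*CHI: The utterance data go here, one utterance per line. The line begins with *.
%COD: Analysis codes for the utterance and the like. The line begins with %.
(This utterance block is repeated once for each utterance.)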
@End
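Since the first character of each line shows what kind of line it is, the lines can be sorted into types mechanically. The sketch below assumes a short made-up CHAT-style file held in the vector `chat`; the type labels "header", "utterance" and "dependent" are names chosen here for illustration.

```r
# Hypothetical CHAT-style lines following the layout described above
chat <- c("@Begin",
          "@Participants:\tCHI",
          "*CHI:\tmore cookie .",
          "%COD:\trequest",
          "*CHI:\tall gone .",
          "%COD:\tcomment",
          "@End")

# The first character of each line identifies its type
first_char <- substr(chat, 1, 1)
line_type <- c("@" = "header", "*" = "utterance", "%" = "dependent")[first_char]

table(line_type)

# Utterance lines only
utterances <- chat[first_char == "*"]
utterances
```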
tmp1 <- grep("\\*JPN", jpn501, value=T)
head(tmp1)
## [1] "*JPN501:\tWhat kind of sports do you like?"
## [2] "*JPN501:\tDo you like soccer, base ball or swimming?"
## [3] "*JPN501:\tThere are many and variety sports around the world."
## [4] "*JPN501:\tA country has some traditional sports."
## [5] "*JPN501:\tOf course, there are some traditional sports in Japan."
## [6] "*JPN501:\tThey are called \"BUDO\"."
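The speaker label at the start of each line, "*JPN501:", is not needed.
In text processing, "deleting" means "replacing" with nothing (an empty string).
The command for replacing is gsub().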
tmp2 <- gsub("\\*JPN501:\t", "", tmp1)
head(tmp2)
## [1] "What kind of sports do you like?"
## [2] "Do you like soccer, base ball or swimming?"
## [3] "There are many and variety sports around the world."
## [4] "A country has some traditional sports."
## [5] "Of course, there are some traditional sports in Japan."
## [6] "They are called \"BUDO\"."
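The pattern above names one speaker explicitly. When lines from several files are combined, a more general pattern may be convenient. The following is a sketch, assuming labels made of capital letters followed by digits as in the lines shown so far; `utt` is a made-up vector.

```r
# Hypothetical utterance lines from two different speakers
utt <- c("*JPN501:\tThey are called \"BUDO\".",
         "*NS501:\tHowever, that is hard.")

# ^ anchors at the start, [A-Z]+ matches the letters, [0-9]+ the digits;
# sub() replaces only the first match in each element, which is enough here
text_only <- sub("^\\*[A-Z]+[0-9]+:\t", "", utt)
text_only
```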
tmp3 <- strsplit(tmp2, " ")
head(tmp3)
## [[1]]
## [1] "What" "kind" "of" "sports" "do" "you" "like?"
##
## [[2]]
## [1] "Do" "you" "like" "soccer," "base" "ball"
## [7] "or" "swimming?"
##
## [[3]]
## [1] "There" "are" "many" "and" "variety" "sports" "around"
## [8] "the" "world."
##
## [[4]]
## [1] "A" "country" "has" "some" "traditional"
## [6] "sports."
##
## [[5]]
## [1] "Of" "course," "there" "are" "some"
## [6] "traditional" "sports" "in" "Japan."
##
## [[6]]
## [1] "They" "are" "called" "\"BUDO\"."
tmp4 <- unlist(tmp3)
head(tmp4, 20)
## [1] "What" "kind" "of" "sports" "do" "you"
## [7] "like?" "Do" "you" "like" "soccer," "base"
## [13] "ball" "or" "swimming?" "There" "are" "many"
## [19] "and" "variety"
tmp5 <- sort(tmp4)
head(tmp5, 20)
## [1] "" "" "" "\"BUDO\"." "\"BUDO\"." "\"REI\""
## [7] "\"REI\"" "\"REI\"" "\"REI\"." "\"REI\"." "(the" "a"
## [13] "A" "about" "about" "also" "an" "an"
## [19] "and" "and"
length(tmp5)
## [1] 322
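The sorted list above contains empty strings, words with punctuation attached, and capitalized and lowercase forms of the same word. One possible clean-up is sketched below on two made-up sentences (`sent` is hypothetical). Note that removing all punctuation also removes apostrophes and hyphens inside words, which may or may not be wanted.

```r
# Hypothetical sentences; the double space in the second one produces an empty token
sent <- c("Do you like soccer, base ball or swimming?",
          "They are  called \"BUDO\".")

tokens <- unlist(strsplit(sent, " "))
tokens <- tolower(tokens)                    # treat "Do" and "do" as the same word
tokens <- gsub("[[:punct:]]", "", tokens)    # strip punctuation attached to words
tokens <- tokens[tokens != ""]               # drop empty strings
tokens
```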
tmp6 <- table(tmp5)
tmp6
## tmp5
## "BUDO". "REI" "REI". (the a
## 3 2 3 2 1 1
## A about also an and And
## 1 2 1 2 7 1
## are around awful, ball base be
## 5 3 1 1 1 1
## beat because body bow BUDO BUDO,
## 2 1 1 3 7 2
## BUDOJYO but But called can cannot
## 3 1 2 1 3 1
## clean could country course, deeply. do
## 1 1 1 1 1 1
## Do each efforts. ended, enemy enemy.
## 1 3 1 1 1 1
## enter example, expression feel feeling fight
## 1 3 1 2 3 1
## Finally, First, for For from game
## 1 1 4 2 1 3
## game. give give. good grow. has
## 4 1 1 2 1 1
## have him, how If important in
## 2 1 1 2 1 4
## In involved is It Japan. Japanese
## 1 1 7 1 1 1
## JYUDO, KENDO, kind KYUDO leave like
## 1 1 1 1 1 1
## like? loves make many mate, mates.
## 1 1 1 6 1 1
## mental more much must no not
## 1 1 1 7 1 1
## of Of on on. only or
## 2 1 1 1 1 1
## other other, other. pain people place
## 3 1 1 1 3 1
## place) play players Players playing plays
## 1 4 5 1 3 1
## point points, proud remember. sad, same
## 1 1 1 1 1 1
## Secondly, should so So So, soccer,
## 1 1 2 1 1 2
## some sports sports. start, strong, strong.
## 2 5 1 1 1 1
## swimming? taught teach team thank that
## 1 1 1 2 6 5
## the Then, there There there. they
## 19 1 2 1 2 2
## They thing think this This tired,
## 1 1 1 1 2 1
## today, too. traditional variety We weak
## 1 1 3 1 1 1
## What when where which who will
## 1 4 1 1 2 1
## with without world world. would you
## 1 1 1 2 1 18
## You your yourself yourself.
## 1 2 3 3
length(tmp6)
## [1] 166
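The two lengths correspond to the number of tokens (322) and the number of distinct word forms, or types (166). A frequency table is often easier to read when sorted by count. This is a small sketch with a made-up word vector (`words` stands in for tmp4).

```r
# Hypothetical token vector standing in for tmp4
words <- c("the", "you", "the", "budo", "you", "the", "must")

freq <- table(words)                           # count each distinct word
freq_sorted <- sort(freq, decreasing = TRUE)   # most frequent first
head(freq_sorted, 3)

n_tokens <- length(words)   # number of tokens
n_types <- length(freq)     # number of types (distinct words)
c(tokens = n_tokens, types = n_types)
```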
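write.table(table_object, "file_name.txt")
With a bare file name, the file is saved in the current working directory, which here is inside NICER_NNS. Adding ../ saves it one directory up: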
write.table(tmp6, "../tmp6.txt")
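For opening in a spreadsheet, a tab-separated file with one column for the word and one for the count can be handy. The sketch below assumes a made-up word vector and a temporary output file (`words` and `out_file` are hypothetical), and reads the file back in to check it.

```r
# Hypothetical token vector standing in for tmp4
words <- c("the", "you", "the", "budo")

# Convert the frequency table to a data frame with columns words and Freq
freq <- as.data.frame(table(words))

out_file <- tempfile(fileext = ".txt")   # temporary path, used only for this demo
write.table(freq, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-separated file back in
back <- read.delim(out_file)
back
```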