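1 Introduction

1.1 The learner corpus NICER 1.3

1.1.1 Overview of NICER

English essays by university and graduate students whose first language is Japanese (381 essays)
English essays by native speakers of English, included for reference (71 essays)
Time limit of one hour (with a proctor present)
No dictionaries or other reference materials
Writers choose one of three topics (education, money, sports)
Written with word-processing software (spell checker and grammar checker switched off)
After finishing, the writer runs the spell checker only
Data formatted in the CHAT format
Corrected versions written by native speakers of English are attached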

1.1.2 Download

http://sgr.gsid.nagoya-u.ac.jp/wordpress/?page_id=1301

NICER1_3.zip
Right-click the file and choose "Open" (extract all)

NICER1.3

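1.1.3 Contents

File / folder	Description
NICER1_3readme_2020-01-16.txt	Overview of the corpus
Learner_Instructions.pdf	Instructions given to learners
Learner_Profile_List.xls	List of learner profiles
Learner_Questionnaire.pdf	Questionnaire for learners
Native_Instructions.pdf	Instructions given to native speakers
Native_Profile_List.xls	List of native-speaker profiles
Native_Questionnaire.pdf	Questionnaire for native speakers
NICER_NNS/	Learner corpus data
NICER_NS/	Native-speaker corpus data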

1.2 Downloading and installing R

https://cran.r-project.org/

Download R for Windows

1.3 Starting and quitting R

Start: choose R from the Start menu
Quit: choose "Exit" from the "File" menu, or run the command q()

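1.4 Simple calculations in R

Run commands in the console (the window titled "R Console")
Type a command to the right of the prompt

(Note) # starts a comment
+   # add
-   # subtract
*   # multiply
/   # divide
^   # raise to a power
sqrt(x) # square root (power of 1/2)
log(x)  # natural logarithm of x (base e)
log2(x) # logarithm of x to base 2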

1.4.1 Examples

1 + 1

## [1] 2

8 - 5

## [1] 3

2 * 2

## [1] 4

15 / 3

## [1] 5

2^3

## [1] 8

sqrt(9)

## [1] 3

log(2.7)

## [1] 0.9932518

log2(8)

## [1] 3
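
As a small supplement to the examples above, the following sketch checks how some of these operators and functions relate to one another. `exp()` and `all.equal()` do not appear in the text above; they are standard base R functions brought in here only for illustration.

```r
# sqrt(x) gives the same value as raising x to the power 1/2
root_a <- sqrt(16)
root_b <- 16^(1/2)
all.equal(root_a, root_b)

# log() is the natural logarithm, so it undoes exp() (e raised to a power)
back <- log(exp(2))
back

# log2(x) answers the question "2 to what power gives x?"
2^log2(8)

# ^ is evaluated before * and /, which are evaluated before + and -
prec <- 1 + 2 * 3^2
prec    # 19
```

Parentheses change the order of evaluation, so `(1 + 2) * 3^2` gives a different result from the last line.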

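1.5 Creating variables and assigning values

A variable is a container to which a value is assigned.
Variable names may contain letters, digits, _ and . (a name must start with a letter, or with a dot that is not followed by a digit). Avoid names that clash with R commands.

1.6 Variables

Numbers and character strings are distinguished
Character strings go inside double quotes
Assign a value with <-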

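abc <- 3    # create a variable named abc and assign 3 to it
abc         # display the contents of abc

## [1] 3

abc <- 300  # the value of abc is overwritten with 300
abc         # display the contents of abc

## [1] 300

efj <- 6

jke <- 2987

ls()        # list the variables created so far

## [1] "abc" "efj" "jke"

rm(jke)     # delete the variable jke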

ls()

## [1] "abc" "efj"

1.6.1 Distinguishing character strings from numbers

namae <- "sugiura"  # namae now holds the character string sugiura

class(abc)          # class(variable name) shows the "class" of a variable

## [1] "numeric"

class(namae)

## [1] "character"
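
The distinction matters in practice: a digit inside double quotes is a character string, not a number. The sketch below illustrates this with two made-up variables, `num` and `chr`. `as.numeric()` and `tryCatch()` are base R functions not covered in the text above; `tryCatch()` is used only so that the deliberate error does not stop the script.

```r
num <- 3        # a number
chr <- "3"      # a character string that merely looks like a number

class(num)      # "numeric"
class(chr)      # "character"

num + 1         # arithmetic works on numbers

# Arithmetic on a character string is an error. tryCatch() catches the
# error and returns a short message instead of stopping.
result <- tryCatch(chr + 1, error = function(e) "error: non-numeric argument")
result

# as.numeric() converts the string to a number first
converted <- as.numeric(chr) + 1
converted       # 4
```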

1.7 Arrays (called "vectors" in R)

1.7.1 An array is a variable made of several containers in a row

Fill it with <- c( , , , )

kazu <- c(2,4,6,8)  # create a vector named kazu and assign 2, 4, 6, 8 to it

kazu

## [1] 2 4 6 8

names <- c("I","you","he") # a vector can also hold character strings

names

## [1] "I"   "you" "he"

1.7.2 Counting how many elements a vector holds: length()

length(kazu)

## [1] 4

length(names)

## [1] 3
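
Individual containers in a vector can be reached by their position, written in square brackets. This is a minimal sketch using the `kazu` vector from above; the same bracket notation is used with grep() in section 2.

```r
kazu <- c(2, 4, 6, 8)

first <- kazu[1]             # first element (R counts from 1, not 0)
first                        # 2

picked <- kazu[c(2, 4)]      # second and fourth elements
picked                       # 4 8

last <- kazu[length(kazu)]   # last element, whatever the length is
last                         # 8

kazu[2] <- 40                # overwrite the second element only
kazu                         # 2 40 6 8
```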

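1.8 Saving and loading the workspace and history

Saving the workspace keeps a record of your work.
When you interrupt a session, saving both the workspace and the history is convenient because the variables you created and the commands you ran are preserved.

1.8.1 Saving the workspace (the Japanese menu calls it 「作業スペース」; 「ワークスペース」 is also used)

From the R menu bar, choose "File" > "Save Workspace"
Choose where to save the file and give it a name. Use the extension .RData

1.8.2 Saving the history

From the R menu bar, choose "File" > "Save History"
Choose where to save the file and give it a name. Use the extension .Rhistory

1.8.3 Loading the workspace

From the R menu bar, choose "File" > "Load Workspace"
Select the file to load and click "Open"
Alternatively, open the folder where the file was saved and double-click the file.

1.8.4 Loading the history

From the R menu bar, choose "File" > "Load History"
Select the file to load and click "Open"

1.8.5 Saving a record of the console window

From the R menu bar, choose "File" > "Save to File"
Choose where to save the file and save it. The default file name is lastsave.txt; adding the date to the name is convenient.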
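
The menu operations in this section also have command equivalents, which may be handy inside a script. The sketch below is an illustration only, not part of the original procedure: it saves two objects to a temporary file (the name `ws_file` is made up) and loads them back with the base R functions `save()` and `load()`. At the top level of a session, `save.image(file = ...)` is a shortcut that saves the whole workspace in the same way.

```r
abc <- 300
namae <- "sugiura"

# A temporary file is used so that the sketch does not touch real files;
# in practice, give a path such as "2021-02-20.RData".
ws_file <- tempfile(fileext = ".RData")

save(list = c("abc", "namae"), file = ws_file)  # write the objects to the file
rm(abc, namae)                                  # remove them from the session ...
load(ws_file)                                   # ... and restore them from the file
abc                                             # 300

# The command history has similar functions, but they only work in
# interactive consoles such as the Windows R GUI, so they are left as comments.
# savehistory(file = "2021-02-20.Rhistory")
# loadhistory(file = "2021-02-20.Rhistory")
```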

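1.9 Reading data stored in a file: scan()

scan(file="location and name of the file", what="char")
    # which folder the file is in and what it is called (working directories are covered later)
    # the data are text, so specify what="char"

1.9.1 Opening a window to choose the file in a GUI: scan(choose.files(), what="char")

choose.files() exists only on Windows; on macOS, use file.choose(), which works on every platform

1.9.2 Example: reading a native-speaker file (NS501.txt)

scan(choose.files(), what="char")

(Sample output)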
Read 853 items
  [1] "@Begin"                   "@Participants:"          
  [3] "NS501"                    "@PID:"                   
  [5] "PIDNS501"                 "@Age:"                   
  [7] "27"                       "@Sex:"                   
  [9] "M"                        "@L1:"                    
 [11] "AmE"                      "@FatherL1:"              
 [13] "none"                     "@MotherL1:"              
(remainder omitted)

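1.10 Separators

The separator decides where one data item ends and the next begins
The default is whitespace (spaces, tabs, and line breaks)

1.10.1 To read line by line, specify the separator: sep="\n"

Change the separator to the newline character (\n). (The default is whitespace.)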

scan(choose.files(), what="char", sep="\n")

(Sample output)
Read 104 items
  [1] "@Begin"                                                                                                     
  [2] "@Participants:\tNS501"      
  [3] "@PID:\tPIDNS501"               
  [4] "@Age:\t27"                   
  [5] "@Sex:\tM"     
  [6] "@L1:\tAmE"           
  [7] "@FatherL1:\tnone"  
  [8] "@MotherL1:\tnone"   
  [9] "@AcademicBackground:\tM1" 
(remainder omitted)

1.10.2 Storing the file contents as data inside R

Read the native-speaker file NS501.txt and store it inside R as a variable (a vector).
The vector will be named ns501.

ns501 <- scan(choose.files(), what="char", sep ="\n")
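
The effect of the separator can be tried without the corpus files. The sketch below is a self-contained illustration: it writes a tiny CHAT-style file with made-up content (speaker ID NS999) to a temporary location and reads it twice, once with the default separator and once with sep="\n". The names `tmp`, `by_word` and `by_line` are chosen here for the example.

```r
# Write a tiny CHAT-style file to a temporary location (made-up content)
tmp <- tempfile(fileext = ".txt")
writeLines(c("@Begin",
             "@Participants:\tNS999",
             "@Age:\t27",
             "*NS999:\tHowever this is only a test.",
             "@End"),
           tmp)

# Default separator: split at any whitespace (spaces, tabs, line breaks)
by_word <- scan(tmp, what = "char")
length(by_word)    # 13

# sep = "\n": split at line breaks only, so each line becomes one element
by_line <- scan(tmp, what = "char", sep = "\n")
length(by_line)    # 5
by_line[2]         # the tab inside the line is kept and shown as \t
```

Reading line by line keeps each header field and each sentence together, which is why the later sections use sep="\n".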

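1.11 The "working directory" and file listings

1.11.1 Changing the working directory

"File" > "Change dir"
(「ファイル」＞「ディレクトリの変更」 in the Japanese menu)

1.11.2 Checking where you are in the PC's file system (which folder, or directory, you are in)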

getwd()

## [1] "C:/Users/sugiura/Dropbox/ed/2020/2020後期/金曜２　第二特論/Rstudio-text/LCR"

1.11.3 Listing files and folders

list.files()

(Sample output)
  [1] "JPN501.txt" "JPN502.txt" "JPN503.txt" "JPN504.txt"
  [5] "JPN505.txt" "JPN506.txt" "JPN507.txt" "JPN508.txt"
  [9] "JPN509.txt" "JPN510.txt" "JPN511.txt" "JPN512.txt"
 [13] "JPN513.txt" "JPN514.txt" "JPN515.txt" "JPN516.txt"

1.11.4 Moving into a folder (= changing the working directory)

setwd("NICER1_3")       # note that the name goes inside quotation marks

setwd("NICER_NNS")

# setwd("..")           # move up one level

setwd("../NICER_NS")    # move up one level, then down into NICER_NS

getwd()

## [1] "C:/Users/sugiura/Dropbox/ed/2020/2020後期/金曜２　第二特論/Rstudio-text/LCR/NICER1_3/NICER_NS"
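
The same moves can be rehearsed safely in a throwaway folder. The sketch below is an illustration under stated assumptions: it builds an empty folder tree that imitates the corpus layout inside R's temporary directory, so the real NICER1_3 folder is not needed. `tempdir()`, `file.path()`, `dir.create()` and `basename()` are base R functions not used in the text above, and the variable names are made up.

```r
start_dir <- getwd()                     # remember where we started

# Build a throwaway folder tree that imitates the corpus layout
base_dir <- file.path(tempdir(), "NICER1_3")
dir.create(file.path(base_dir, "NICER_NNS"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(base_dir, "NICER_NS"), showWarnings = FALSE)

setwd(base_dir)
list.files()                             # "NICER_NNS" "NICER_NS"

setwd("NICER_NNS")                       # go down one level
setwd("../NICER_NS")                     # up one level, then down into NICER_NS
here <- basename(getwd())                # last part of the current path
here                                     # "NICER_NS"

setwd(start_dir)                         # return to the starting directory
```

Saving the starting directory first and returning to it at the end avoids the situation where a later relative path such as "NICER1_3/NICER_NS" no longer resolves.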

1.11.5 Reading a file

★ Instead of choose.files(), write the file name directly

setwd("NICER1_3/NICER_NNS")

scan("JPN501.txt", what="char")

scan("JPN501.txt", what="char", sep="\n")

1.11.6 Reading a file into R as an "object"

setwd("NICER1_3/NICER_NS")

ns501 <- scan("NS501.txt", what="char", sep="\n")

1.11.7 The editor

Choose "File" > "New script" to open the R Editor, then paste commands there and edit them

1.12 ■Task■

Copy and paste the commands, change the file names, and read ten native-speaker files and ten learner files into R, storing each one as a vector.
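
Once the copy-and-paste version works, the same job could also be done in one step. The following is only a sketch of one possible approach, not the method the task asks for: it uses `lapply()` to apply scan() to every file name returned by list.files(). To stay self-contained it creates three made-up files in a temporary folder; with the real corpus, `corpus_dir` would point at NICER_NS or NICER_NNS. All variable names here are hypothetical.

```r
# Stand-in for a corpus folder: three tiny made-up files in a temporary directory
corpus_dir <- file.path(tempdir(), "demo_NS")
dir.create(corpus_dir, showWarnings = FALSE)
for (id in c("NS501", "NS502", "NS503")) {
  writeLines(c("@Begin", paste0("*", id, ":\tA sample sentence."), "@End"),
             file.path(corpus_dir, paste0(id, ".txt")))
}

# List the .txt files with their full paths and keep at most the first ten
files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)
files <- head(files, 10)

# Read each file line by line; the result is a list with one vector per file
ns_data <- lapply(files, scan, what = "char", sep = "\n", quiet = TRUE)

# Label each list element with its file name minus the .txt extension
names(ns_data) <- sub("\\.txt$", "", basename(files))

names(ns_data)
ns_data[["NS501"]]
```

Unlike the task's approach, this keeps the data in a single list rather than in separate variables such as ns501, so an individual file is reached with `ns_data[["NS501"]]`.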

1.13 The NICER data format

The NICER data are currently in the CHAT format.

http://childes.psy.cmu.edu/

「CHAT Transcription Manual」
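『今日から使える発話データベースCHILDES入門』, edited by Susanne Miyata, supervised by Brian MacWhinney, Hituzi Syobo, November 2004
In the CHAT format, a single file contains both "header information" and "body information".
Header lines begin with @.
Body lines begin with *.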

@Begin
@Languages: en
@Participants: CHI Ross Child, FAT Brian Father
@ID: en|macwhinney|CHI|2;10.10||||Target_Child||
@ID: en|macwhinney|FAT|35;2.||||Target_Child||
*ROS: why isn't Mommy coming?
%com: Mother usually picks Ross up around 4 PM.
*FAT: don't worry.
*FAT: she'll be here soon.
*CHI: good.
@End

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd Edition. Mahwah, NJ: Lawrence Erlbaum Associates
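
Because header and body lines are marked by their first character, they can be separated mechanically. The sketch below is a minimal illustration with made-up data (speaker ID NS999); it relies on grep(), which is introduced in section 2, and on `sub()`, a base R function not covered in this text. The variable names are chosen here for the example.

```r
chat <- c("@Begin",
          "@Participants:\tNS999",
          "@Age:\t27",
          "*NS999:\tThis is the first sentence.",
          "*NS999:\tThis is the second sentence.",
          "@End")

# "^" anchors the pattern to the start of the line
header_lines <- grep("^@", chat, value = TRUE)

# "*" is a regex metacharacter, so it is escaped with a doubled backslash
body_lines <- grep("^\\*", chat, value = TRUE)

length(header_lines)   # 4
length(body_lines)     # 2

# Remove the speaker label and the tab, keeping only the sentence itself
text_only <- sub("^\\*[^\t]*\t", "", body_lines)
text_only
```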

2 Searching for strings: grep("search string", variable name)

grep("However", ns501)

## [1] 45 54 84

The element numbers (= line numbers) are displayed.

2.1 Making grep return the contents themselves instead of element numbers: value=T

grep("However", ns501, value=T)

## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."                                                                                                                        
## [3] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."

grep("however", ns501, value=T)   # lowercase h

## [1] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."

Another way: use the element numbers to extract the matching lines

ns501[grep("However", ns501)]

## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."                                                                                                                        
## [3] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."
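
That the two methods give the same result can be checked directly. The sketch below uses a small made-up vector `x` in place of ns501; `identical()` is a base R function that tests whether two objects are exactly the same.

```r
x <- c("However, this is one line.",
       "This line has no match.",
       "However this is another.")

idx <- grep("However", x)                     # element numbers of the matches
idx                                           # 1 3

by_value <- grep("However", x, value = TRUE)  # the matching elements themselves
by_index <- x[idx]                            # same elements, picked by number
identical(by_value, by_index)                 # TRUE

n_hits <- length(idx)                         # number of matching lines
n_hits                                        # 2
```

The element numbers are useful on their own as well: their count gives the number of lines in which the expression occurs.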

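2.2 Using regular expressions with grep

(Note that in R the backslash used to "escape" a regular-expression metacharacter such as "*" must be doubled: "\\*")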

http://stat.biopapyrus.net/r/regex.html

grep("[hH]owever", ns501, value=T)

## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."                                    
## [3] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."                                                                                                                        
## [4] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."

Alternatively, add grep's option for ignoring the difference between upper and lower case: ignore.case=T

grep("however", ns501, value=T, ignore.case=T)

## [1] "*NS501:\tHowever in the French educational system instead of a head or a body there is a thesis and an anti-thesis or point and counter point in which the writer must oppose his or her original statements."
## [2] "*NS501:\tThis makes the facts easy to access, however, it does not force the writer to challenge his or her own logic in the process, leaving the ideas themselves rigid."                                    
## [3] "*NS501:\tHowever what the French lose in logical flow they gain in critical thinking."                                                                                                                        
## [4] "*NS501:\tHowever, sadly with the continuous failings of the American educational system, these lofty dreams yet remain dreams for a generation of potential Newtons and Einsteins."
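
The doubled backslash mentioned at the start of this section can be seen in a short example. This is a sketch with made-up lines, not corpus data. It also shows `fixed = TRUE`, a grep() option not discussed in the text above, which turns regular expressions off so that no escaping is needed.

```r
x <- c("*NS999:\tHowever, it works.",
       "@Comment:\tstar-free line",
       "*NS999:\tIt costs $5 (approx.).")

# "\\*" in R code is the two-character regex \* : a literal asterisk
star_regex <- grep("\\*", x)
star_regex                           # 1 3

# fixed = TRUE treats the pattern as plain text, so "*" needs no escape
star_fixed <- grep("*", x, fixed = TRUE)
star_fixed                           # 1 3

# "." matches any character, whereas "\\." matches a literal period
period_hit <- grep("approx\\.", x, value = TRUE)
period_hit
```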

2.3 ■Assignment■

Search for expressions used by learners and by native speakers, and compare them.
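
One possible way to organize such a comparison is to count, for each expression, the lines that contain it in each group. The sketch below is an illustration only: the two vectors `ns` and `jpn` hold made-up sentences standing in for a native-speaker file and a learner file, and `count_lines()` is a helper function defined here, not part of R. `sapply()` and `rbind()` are base R functions not covered in this text.

```r
# Made-up stand-ins for one native-speaker file and one learner file
ns  <- c("*NS999:\tHowever, the plan failed.",
         "*NS999:\tIt was, however, a useful lesson.",
         "*NS999:\tMoney is not everything.")
jpn <- c("*JPN999:\tBut I think money is important.",
         "*JPN999:\tHowever, sports are fun.",
         "*JPN999:\tBut education is important too.")

# Count the lines that contain the pattern, ignoring case
count_lines <- function(pattern, x) {
  length(grep(pattern, x, ignore.case = TRUE))
}

expressions <- c("however", "but")
ns_counts  <- sapply(expressions, count_lines, x = ns)
jpn_counts <- sapply(expressions, count_lines, x = jpn)

# One row per group, one column per expression
rbind(NS = ns_counts, JPN = jpn_counts)
```

Note that this counts lines, not occurrences, and that a plain pattern such as "but" also matches inside longer words; with real data the counts would also need to be related to the size of each file before the groups are compared.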

R for Learner Corpus Research 2020

sugiura

2021/02/20
