WordFreqList.R
- Takes a text file and outputs a word-frequency list.
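WordFreqList <- function(i, decreasing = FALSE) {
  # 2019-12-28 sugiura@nagoya-u.jp
  # i: path to a text file; decreasing: TRUE to sort by frequency, highest first

  lines.tmp  <- scan(i, what = "char")                  # read whitespace-separated items
  body.lower <- tolower(lines.tmp)                      # fold to lower case
  body.token <- unlist(strsplit(body.lower, "\\W"))     # split on non-word characters
  word.list  <- sort(body.token)
  word.list  <- word.list[word.list != ""]              # drop empty strings left by the split
  word.freq  <- table(word.list)                        # count each distinct word

  if (decreasing) {
    word.df <- as.data.frame(word.freq)
    word.df[order(word.df$Freq, decreasing = TRUE), ]   # highest frequency first
  } else {
    as.data.frame(word.freq)                            # alphabetical order
  }
}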
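To see what each step of the function does, the sketch below runs the same pipeline on an in-memory string instead of a file. The sample sentence and the variable sample.text are invented for illustration, and strsplit() on whitespace stands in for scan(), so this is an approximation of the function's behavior rather than the script itself.

```r
# Invented sample text; the original reads its input from a file with scan().
sample.text <- "You can do it. You can, if you try!"

# scan(what = "char") splits on whitespace; strsplit() imitates that here.
lines.tmp <- unlist(strsplit(sample.text, "[[:space:]]+"))

body.lower <- tolower(lines.tmp)                    # fold to lower case
body.token <- unlist(strsplit(body.lower, "\\W"))   # cut at non-word characters
word.list  <- body.token[body.token != ""]          # drop any empty strings left by the split

# table() counts each distinct token; the data frame gets columns word.list and Freq.
word.df <- as.data.frame(table(word.list), stringsAsFactors = FALSE)
print(word.df)
```

Because "\\W" treats every non-word character as a separator, an apostrophe also splits a token: "don't" is counted as "don" and "t".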
Pass the text file as the argument: WordFreqList("text file name")
> WordFreqList("JAN0001_P1B.txt")
Read 192 items
   word.list Freq
1          a    3
2      about    1
3   academic    1
4        all    1
5     always    1
6        and    1
7   anything    1
8        are    2
9         be    1
10   because    2
11    become    2
12    better    2
13       big    1
14     bload    3
15        by    1
16       can    3
17    common    1
With the option "decreasing = T", the list is output in descending order of frequency.
> WordFreqList("JAN0001_P1B.txt", decreasing = T)
Read 192 items
      word.list Freq
92           to   11
107         you   11
57          not    6
40           is    5
44    knowledge    5
78  specialized    5
22           do    4
41           it    4
1             a    3
14        bload    3
16          can    3
31          get    3
60           of    3
86         that    3
8           are    2
10      because    2
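The row numbers in the sorted output (92, 107, 57, ...) are the positions each word had in the alphabetical table, not frequency ranks. The sketch below shows why, using a small invented data frame in place of real output; the renumbering step at the end is an optional addition and not part of the original function.

```r
# Invented frequency table standing in for the function's alphabetical output.
word.df <- data.frame(
  word.list = c("a", "can", "not", "to", "you"),
  Freq      = c(3, 3, 6, 11, 11)
)

# order() returns row indices; decreasing = TRUE puts the largest counts first.
# The reordered rows keep their original row names.
sorted <- word.df[order(word.df$Freq, decreasing = TRUE), ]
print(sorted)

# Optional: reset the row names so they run 1..n and can be read as ranks.
ranked <- sorted
rownames(ranked) <- NULL
print(ranked)
```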
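Passing "choose.files()" instead of a file name opens a dialog window for selecting the file. (choose.files() is available only on Windows; file.choose() is the portable equivalent.)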
> WordFreqList(choose.files(), decreasing = T)
Read 237 items
    word.list Freq
17        can    8
105      they    8
127     young    8
75        not    7
85     person    7
46       have    6
51          i    5
77         of    5
102     their    5
9         and    4
10        are    4
78        old    4
100      that    4
108      this    4
110      time    4
112        to    4
121      with    4
16        but    3
27     enough    3
36       free    3
39     future    3
48        his    3
54         is    3
55         it    3
84     people    3
94         so    3
4      active    2
11         at    2
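Since choose.files() exists only in the Windows build of R, a small wrapper can pick the dialog that suits the platform. The helper name pick.file is hypothetical and not part of the original script; this is a minimal sketch and has to be run in an interactive session.

```r
# Hypothetical helper (not in the original script): open a file dialog
# appropriate to the current operating system and return the chosen path.
pick.file <- function() {
  if (.Platform$OS.type == "windows") {
    utils::choose.files(multi = FALSE)   # Windows-only dialog, limited to one file
  } else {
    file.choose()                        # base R dialog, available on all platforms
  }
}

# Interactive use, assuming WordFreqList() has been defined:
# WordFreqList(pick.file(), decreasing = TRUE)
```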
https://sugiura-ken.org/wiki/