WordFreqList.R
- Takes a text file and outputs a word frequency list.
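WordFreqList <- function(i, decreasing = FALSE) {
  # 2019-12-28 sugiura@nagoya-u.jp
  # Read the file as whitespace-separated character items
  lines.tmp <- scan(i, what = "char")
  # Lowercase so that "The" and "the" are counted together
  body.lower <- tolower(lines.tmp)
  # Split on non-word characters (punctuation) to get tokens
  body.token <- unlist(strsplit(body.lower, "\\W"))
  word.list <- sort(body.token)
  # Drop the empty strings that the split leaves behind
  word.list <- word.list[word.list != ""]
  # Count the occurrences of each word
  word.freq <- table(word.list)
  word.df <- as.data.frame(word.freq)
  if (decreasing) {
    # Highest frequency first
    word.df[order(word.df$Freq, decreasing = TRUE), ]
  } else {
    # Alphabetical order, as produced by table()
    word.df
  }
}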
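To see what each step of the function produces, the same pipeline can be run on a short string instead of a file. This is a minimal sketch: the sample sentence and the variable names `sample.text`, `items`, `tokens`, and `freq` are made up for illustration and are not part of the original script. It relies on the `text` argument of `scan()` to read from a string.

```r
# Hypothetical sample text (not from the original page)
sample.text <- "The cat saw the dog. The dog ran!"

# scan() reads whitespace-separated items; 'text' supplies a string
# in place of a file, and 'quiet' suppresses the "Read N items" message
items <- scan(text = sample.text, what = "char", quiet = TRUE)

# Lowercase, then split on non-word characters to strip punctuation
tokens <- unlist(strsplit(tolower(items), "\\W"))

# Remove any empty strings produced by the split
tokens <- tokens[tokens != ""]

# table() counts each distinct token; as.data.frame() gives
# one row per word with its count in the Freq column
freq <- as.data.frame(table(tokens))
print(freq)
```

Because the split pattern is `\\W`, any non-word character acts as a separator, so a contraction written with an apostrophe would be counted as two separate tokens.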
Pass a text file as the argument: WordFreqList("text file name")
> WordFreqList("JAN0001_P1B.txt")
Read 192 items
word.list Freq
1 a 3
2 about 1
3 academic 1
4 all 1
5 always 1
6 and 1
7 anything 1
8 are 2
9 be 1
10 because 2
11 become 2
12 better 2
13 big 1
14 bload 3
15 by 1
16 can 3
17 common 1
With the option "decreasing = T", the list is output in descending order of frequency.
> WordFreqList("JAN0001_P1B.txt", decreasing = T)
Read 192 items
word.list Freq
92 to 11
107 you 11
57 not 6
40 is 5
44 knowledge 5
78 specialized 5
22 do 4
41 it 4
1 a 3
14 bload 3
16 can 3
31 get 3
60 of 3
86 that 3
8 are 2
10 because 2
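The descending output comes from reordering the rows with `order()`. The sketch below rebuilds a small data frame by hand from five of the rows shown above (the name `sorted` is an assumption for illustration) and applies the same reordering. Since `order()` uses a stable sort by default, words with equal frequency keep their alphabetical order, which matches the ties in the transcript.

```r
# A few rows copied from the sample output above
word.df <- data.frame(
  word.list = c("a", "can", "not", "to", "you"),
  Freq      = c(3L, 3L, 6L, 11L, 11L)
)

# order() returns the row indices that sort Freq from high to low;
# indexing the data frame with them reorders the rows
sorted <- word.df[order(word.df$Freq, decreasing = TRUE), ]

# head() keeps only the first n rows, here the three most frequent words
print(head(sorted, 3))
```

The original row numbers are kept as row names after sorting, which is why the transcript shows numbers such as 92 and 107 in the first column.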
Passing "choose.files()" in place of the file name opens a dialog window for selecting the file.
> WordFreqList(choose.files(), decreasing = T)
Read 237 items
word.list Freq
17 can 8
105 they 8
127 young 8
75 not 7
85 person 7
46 have 6
51 i 5
77 of 5
102 their 5
9 and 4
10 are 4
78 old 4
100 that 4
108 this 4
110 time 4
112 to 4
121 with 4
16 but 3
27 enough 3
36 free 3
39 future 3
48 his 3
54 is 3
55 it 3
84 people 3
94 so 3
4 active 2
11 at 2
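One point to be aware of is that `choose.files()` is only available in R for Windows, while `file.choose()` in base R opens a file dialog on other platforms as well. A possible wrapper is sketched below; `PickFile` is a hypothetical helper name, not part of the original script.

```r
# Hypothetical helper: open a file dialog on any platform
PickFile <- function() {
  if (.Platform$OS.type == "windows") {
    # multi = FALSE limits the dialog to a single file,
    # since scan() expects one file name
    utils::choose.files(multi = FALSE)
  } else {
    # file.choose() is available on all platforms
    file.choose()
  }
}

# Usage (interactive session only):
# WordFreqList(PickFile(), decreasing = TRUE)
```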
https://sugiura-ken.org/wiki/