R
!!!relativeFrequency
About relative frequency
{{outline}}
!!Sample data
One-million-word sample corpora
http://www.thegrammarlab.com/?nor-portfolio=1000000-word-sample-corpora

http://micusp.elicorpora.info/

MICUSP Sample
http://www.thegrammarlab.com/?nor-portfolio=1000000-word-sample-corpora#

{{pre
# Load the sample data
install.packages("readtext", dependencies = TRUE)
library(readtext)
# choose.files() is Windows-only; file.choose() is the portable alternative
sample100 <- readtext(choose.files())
sample100
str(sample100)

install.packages("quanteda")
library(quanteda)
sample100.corpus <- corpus(sample100)
summary(sample100.corpus)

# Tokenize first, then build the document-feature matrix from the tokens
sample100.tokens <- tokens(sample100.corpus)
sample100.dfm <- dfm(sample100.tokens)
summary(sample100.dfm)

# Sampling: split the tokens into chunks of 100 tokens each
tokens_chunk(sample100.tokens, 100)
}}
!!High-frequency words overall
{{pre
> topfeatures(sample100.dfm)
  the    of    to   and    in     a    is  that   for    as
68659 38144 28852 28657 22834 19779 15292 14552  9630  9208
> topfeatures(sample100.dfm, 100)
      the        of        to       and        in         a        is      that       for        as      this        be       are      with
    68659     38144     28852     28657     22834     19779     15292     14552      9630      9208      7612      7154      6860      6814
        s        it        on       not        by       was      from        an     their      have        or     which        at         i
     6396      6227      5998      5401      5374      4712      4497      4194      3932      3820      3804      3415      3314      3231
     they       his      were        we       can      more       one     these      will        he       has       but      also     would
     3171      3123      2953      2898      2866      2837      2833      2817      2802      2645      2600      2541      2425      2424
    other       all     there       her   between      when      than        if       two      only   because     about      such       may
     2067      2056      1960      1867      1863      1832      1795      1775      1695      1675      1666      1634      1584      1534
  however      time      been      what       its      into       who        so  students      each      both     first    social      most
     1515      1505      1478      1475      1467      1465      1407      1384      1379      1367      1355      1345      1340      1328
      how       had     while      some        do    people       she   through      used        no     could      does      many      them
     1308      1300      1258      1258      1248      1225      1220      1206      1202      1198      1185      1179      1170      1167
different      then       our        my     women       use       out     state     being      well      data     study    system    should
     1145      1083      1077      1049      1023      1016      1015      1000       985       973       971       961       943       938
     even     where
      930       906
}}
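
The counts returned by topfeatures() are raw frequencies. A relative frequency divides each count by the total number of tokens, often scaled to a rate such as "per million words" so that corpora of different sizes can be compared. The following is a minimal base-R sketch of that calculation. The helper rel_freq() and the toy sentence are hypothetical and are not part of quanteda or of the sample data.

{{pre
# Hypothetical helper (not part of quanteda): convert raw counts to
# relative frequencies, scaled to a rate per `per` tokens.
rel_freq <- function(counts, per = 1e6) {
  counts / sum(counts) * per
}

# Toy text standing in for the corpus (an assumption for illustration only)
txt <- "the cat sat on the mat and the dog sat on the rug"
words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
words <- words[nzchar(words)]

# Raw frequency of each word type as a named numeric vector
tab <- table(words)
counts <- setNames(as.numeric(tab), names(tab))

# Relative frequency per 100 tokens, highest first
rf <- sort(rel_freq(counts, per = 100), decreasing = TRUE)
round(head(rf, 3), 1)
}}

With the objects built on this page, rel_freq(colSums(sample100.dfm)) should give per-million rates for the whole corpus, assuming colSums() on the dfm returns the total count of each feature. quanteda's dfm_weight(sample100.dfm, scheme = "prop") is a related option, but it computes proportions within each document rather than over the whole corpus.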