
NICER1.1SampleData

  • Extract linguistic features with myTextIndexTopic.R
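This page does not show the contents of myTextIndexTopic.R. As a rough illustration only, the sketch below computes indices with the same column names from a single essay string. Several details are assumptions and are not taken from the script: the function name `text_indices()`, the tokenization (lower-cased runs of letters and apostrophes), the sentence splitting (at `.`, `!` or `?` followed by whitespace), and the MATTR window size. Its numbers will therefore not necessarily match the script's output. The three ratio formulas do agree with every row of the `head()` output below: TTR = Type/Token, GI (Guiraud index) = Type/√Token, and ASL = Token/NoS. For JPN501.txt, for example, 260/994 ≈ 0.2616, 260/√994 ≈ 8.2467 and 994/123 ≈ 8.0813.

```r
# Hypothetical helper: text_indices() is NOT the code in myTextIndexTopic.R.
# Tokenization, sentence splitting and the MATTR window are assumptions.
text_indices <- function(text, window = 50) {
  # Sentences: split after ".", "!" or "?" followed by whitespace
  sentences <- unlist(strsplit(trimws(text), "(?<=[.!?])\\s+", perl = TRUE))
  sentences <- sentences[nzchar(sentences)]

  # Tokens: lower-cased runs of letters and apostrophes
  lower  <- tolower(text)
  tokens <- regmatches(lower, gregexpr("[a-z']+", lower))[[1]]

  n_token <- length(tokens)
  n_type  <- length(unique(tokens))
  n_sent  <- length(sentences)

  # MATTR: mean TTR over every run of `window` consecutive tokens;
  # falls back to plain TTR when the text is shorter than the window
  mattr <- if (n_token >= window) {
    mean(vapply(seq_len(n_token - window + 1), function(i) {
      length(unique(tokens[i:(i + window - 1)])) / window
    }, numeric(1)))
  } else {
    n_type / n_token
  }

  data.frame(
    Token = n_token,
    Type  = n_type,
    NoS   = n_sent,
    TTR   = n_type / n_token,        # type/token ratio
    GI    = n_type / sqrt(n_token),  # Guiraud index
    MATTR = mattr,                   # moving-average TTR
    AWL   = mean(nchar(tokens)),     # average word length in characters
    ASL   = n_token / n_sent         # average sentence length in tokens
  )
}

# Usage on a short made-up text (small window because the text is short)
essay <- "I like sports. Sports are good for health! Do you like sports?"
res <- text_indices(essay, window = 5)
print(res)

# Check the ratio formulas against the JPN501.txt row of head(JPNindexTopic)
jpn501 <- c(TTR = 260 / 994, GI = 260 / sqrt(994), ASL = 994 / 123)
print(round(jpn501, 6))
```

MATTR depends on the window size, so the window is exposed as a parameter. A different tokenizer would change Token, Type and AWL.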

myTextIndexTopic.R

> head(JPNindexTopic)
        file     Topic Score Token Type NoS       TTR       GI     MATTR      AWL      ASL
1 JPN501.txt    sports     4   994  260 123 0.2615694 8.246699 0.4919115 3.888330 8.081301
2 JPN502.txt education     4   997  283 120 0.2838516 8.962700 0.5160281 3.925777 8.308333
3 JPN503.txt education     3   561  207  70 0.3689840 8.739547 0.5566488 4.247772 8.014286
4 JPN504.txt    sports     4   817  245 114 0.2998776 8.571465 0.4998409 4.057528 7.166667
5 JPN505.txt    sports     4  1024  274 106 0.2675781 8.562500 0.5182812 3.676758 9.660377
6 JPN506.txt     money     3   638  197  93 0.3087774 7.799305 0.5082132 3.578370 6.860215
> anyNA(JPNindexTopic)
[1] TRUE
> str(JPNindexTopic)
'data.frame':	349 obs. of  11 variables:
 $ file : Factor w/ 349 levels "JPN501.txt","JPN502.txt",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Topic: Factor w/ 3 levels "education","money",..: 3 1 1 3 3 2 1 3 3 1 ...
 $ Score: int  4 4 3 4 4 3 4 3 4 3 ...
 $ Token: int  994 997 561 817 1024 638 1033 635 734 575 ...
 $ Type : int  260 283 207 245 274 197 274 185 203 201 ...
 $ NoS  : int  123 120 70 114 106 93 111 93 91 72 ...
 $ TTR  : num  0.262 0.284 0.369 0.3 0.268 ...
 $ GI   : num  8.25 8.96 8.74 8.57 8.56 ...
 $ MATTR: num  0.492 0.516 0.557 0.5 0.518 ...
 $ AWL  : num  3.89 3.93 4.25 4.06 3.68 ...
 $ ASL  : num  8.08 8.31 8.01 7.17 9.66 ...
> JPNindexTopic.b <- na.omit(JPNindexTopic)
> str(JPNindexTopic.b)
'data.frame':	347 obs. of  11 variables:
 $ file : Factor w/ 349 levels "JPN501.txt","JPN502.txt",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Topic: Factor w/ 3 levels "education","money",..: 3 1 1 3 3 2 1 3 3 1 ...
 $ Score: int  4 4 3 4 4 3 4 3 4 3 ...
 $ Token: int  994 997 561 817 1024 638 1033 635 734 575 ...
 $ Type : int  260 283 207 245 274 197 274 185 203 201 ...
 $ NoS  : int  123 120 70 114 106 93 111 93 91 72 ...
 $ TTR  : num  0.262 0.284 0.369 0.3 0.268 ...
 $ GI   : num  8.25 8.96 8.74 8.57 8.56 ...
 $ MATTR: num  0.492 0.516 0.557 0.5 0.518 ...
 $ AWL  : num  3.89 3.93 4.25 4.06 3.68 ...
 $ ASL  : num  8.08 8.31 8.01 7.17 9.66 ...
 - attr(*, "na.action")= 'omit' Named int  83 159
  ..- attr(*, "names")= chr  "83" "159"
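The `na.action` attribute shows that `na.omit()` removed rows 83 and 159. The output does not show which columns held the missing values, however. It also shows that `file` still has 349 levels, because subsetting a data frame keeps unused factor levels. A minimal sketch of both checks follows. To keep it self-contained, it runs on a small invented data frame (`toy`, with hypothetical file names) rather than the real data. Applied to `JPNindexTopic`, the same calls would reveal the NA columns. Since each of the 349 file names appears once, `droplevels(na.omit(JPNindexTopic))` should also reduce `file` to 347 levels.

```r
# Toy stand-in for JPNindexTopic (invented values, hypothetical file names)
toy <- data.frame(
  file  = factor(c("doc1.txt", "doc2.txt", "doc3.txt", "doc4.txt")),
  Topic = factor(c("sports", "education", "education", "money")),
  Score = c(4L, NA, 3L, 4L),
  TTR   = c(0.26, 0.28, NA, 0.30)
)

# Count missing values per column to see where the NAs are
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row numbers with at least one NA (the rows na.omit() will drop)
print(which(!complete.cases(toy)))

# Drop incomplete rows, then discard factor levels that no longer occur
toy.b <- droplevels(na.omit(toy))
str(toy.b)
```

Without `droplevels()`, the empty levels would still appear in `table()` counts and in model contrasts built from these factors.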