
xmlconvert

References




If you want to convert XML data to R data frames or vice versa, 
it is recommended to install the 'xmlconvert' package and 
use its xml_to_df() and df_to_xml() functions. 
Type 'install.packages("xmlconvert", dependencies = TRUE)' 
into the R console to install 'xmlconvert'.

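Problems with XML

  • In XML, content is sometimes written as attributes of a tag and sometimes as the tag's element content.
  • xmlconvert can only be told to read one of these two patterns in a single xml_to_df() call.
  • Therefore, for an XML document that mixes both, the attributes have to be converted tag by tag.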

Example: EFCAMDAT

  <writing id="47436" level="7" unit="4">
    <learner id="169213" nationality="us"/>
    <topic id="52">Writing about a memorable experience</topic>
    <date>2011-11-23 19:15:17.060</date>
    <grade>75</grade>
    <text>
      When very young, I was very decisive, as to my decision to split was ...
    </text>
  </writing>

Convert attributes and elements separately, then combine the results with cbind()

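library(xmlconvert)

# Attributes of <writing>, <learner>, and <topic>: one call per tag
writing.df <- xml_to_df("EFsample1.xml", 
                        records.tag = "writing", fields = "attributes")
learner.df <- xml_to_df("EFsample1.xml", 
                        records.tag = "learner", fields = "attributes")
topic.df <- xml_to_df("EFsample1.xml", 
                      records.tag = "topic", fields = "attributes")

# Element content of the child tags of <writing>
tags.df <- xml_to_df("EFsample1.xml", 
                     records.tag = "writing", fields = "tags")

# Combine column-wise; this assumes each <writing> has exactly one <learner> and one <topic>
EFsample1b.df <- cbind(writing.df, learner.df, topic.df, tags.df)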

str(EFsample1b.df)

'data.frame':	10 obs. of  11 variables:
 $ id         : num  7635 25056 25057 33028 34198 ...
 $ level      : num  7 7 7 7 7 7 7 7 7 7
 $ unit       : num  1 1 2 1 1 1 1 2 3 4
 $ id         : num  169797 103857 103857 119717 76698 ...
 $ nationality: chr  "us" "us" "us" "us" ...
 $ id         : num  49 49 50 49 49 49 49 50 51 52
 $ learner    : num  NA NA NA NA NA NA NA NA NA NA
 $ topic      : chr  "Giving instructions to play a game" "Giving instructions to play a game" "Planning for the future" "Giving instructions to play a game" ...
 $ date       : chr  "2012-04-28 04:19:03.927" "2012-02-29 08:29:14.813" "2012-04-16 10:17:12.440" "2011-09-05 18:35:08.997" ...
 $ grade      : num  65 90 82 83 86 86 89 88 92 75
 $ text       : chr  "\n      instructions for frisbee bowling. mark the bowling alley an ...
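
The str() output above contains three columns that are all named id, plus a learner column that is entirely NA (the <learner> tag carries only attributes, so it has no element content). The following is a minimal base-R sketch of one possible way to tidy this before combining, not part of xmlconvert itself. The helper name prefix_cols is hypothetical, and the small data frames are stand-ins built from the first two rows of the output above, so the sketch runs without the XML file.

```r
# Hypothetical helper: prefix each column name with the tag it came from.
prefix_cols <- function(df, prefix) {
  names(df) <- paste(prefix, names(df), sep = ".")
  df
}

# Stand-ins for the xml_to_df() results (first two rows of the output above).
writing.df <- data.frame(id = c(7635, 25056), level = c(7, 7), unit = c(1, 1))
learner.df <- data.frame(id = c(169797, 103857), nationality = c("us", "us"))
topic.df   <- data.frame(id = c(49, 49))
tags.df    <- data.frame(learner = c(NA, NA),
                         topic   = rep("Giving instructions to play a game", 2),
                         grade   = c(65, 90))

# Drop columns that contain only NA (here: learner).
tags.df <- tags.df[, colSums(!is.na(tags.df)) > 0, drop = FALSE]

# Combine; the prefixes keep the three id columns distinguishable.
combined <- cbind(prefix_cols(writing.df, "writing"),
                  prefix_cols(learner.df, "learner"),
                  prefix_cols(topic.df, "topic"),
                  tags.df)
names(combined)
```

With unique column names, columns can be selected unambiguously, for example combined$learner.id, whereas EFsample1b.df$id would return only the first of the three id columns.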