[R]
xmlconvert
References
- https://cran.r-project.org/web/packages/xmlconvert/readme/README.html
- https://rdrr.io/cran/xmlconvert/man/xml_to_df.html
- https://githubmemory.com/repo/jsugarelli/xmlconvert
- https://github.com/jsugarelli/xmlconvert#readme
- https://github.com/jsugarelli/xmlconvert/
If you want to convert XML data to R data frames or vice versa,
it is recommended to install the 'xmlconvert' package and
use its xml_to_df() and df_to_xml() functions.
Type 'install.packages("xmlconvert", dependencies = TRUE)'
into the R console to install 'xmlconvert'.
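Problems with XML
- In XML, content may be written either as an attribute of a tag or as the content of an element.
- xmlconvert lets you specify only one of these two patterns per call.
- Therefore, for an XML document that mixes both, the attributes have to be converted separately for each tag.
Example: EFCAMDAT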
<writing id="47436" level="7" unit="4">
<learner id="169213" nationality="us"/>
<topic id="52">Writing about a memorable experience</topic>
<date>2011-11-23 19:15:17.060</date>
<grade>75</grade>
<text>
When very young, I was very decisive, as to my decision to split was ...
</text>
</writing>
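To make the attribute/element split in this record concrete, here is a minimal sketch that reads a shortened copy of the sample with the xml2 package. Using xml2 is an assumption made for illustration (it is not part of the xmlconvert workflow on this page), and the variable names are hypothetical.

```r
library(xml2)  # assumption: the xml2 package is installed

# A shortened copy of the EFCAMDAT record above (the <text> element is left out).
doc <- read_xml('<writing id="47436" level="7" unit="4">
  <learner id="169213" nationality="us"/>
  <topic id="52">Writing about a memorable experience</topic>
  <date>2011-11-23 19:15:17.060</date>
  <grade>75</grade>
</writing>')

# Content stored as attributes: a named character vector per tag.
writing_attrs <- xml_attrs(doc)                              # id, level, unit
learner_attrs <- xml_attrs(xml_find_first(doc, ".//learner")) # id, nationality
topic_id      <- xml_attr(xml_find_first(doc, ".//topic"), "id")

# Content stored as element text.
topic_text <- xml_text(xml_find_first(doc, ".//topic"))
grade_text <- xml_text(xml_find_first(doc, ".//grade"))
```

Note that <learner> carries only attributes, while <topic> carries both an attribute (id) and element content, which is why a single attributes-only or tags-only pass cannot capture the whole record.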
Convert the attributes and the elements separately, then combine the results with cbind()
library(xmlconvert)
# Attributes, one call per tag
writing.df <- xml_to_df("EFsample1.xml",
                        records.tags = "writing", fields = "attributes")
learner.df <- xml_to_df("EFsample1.xml",
                        records.tags = "learner", fields = "attributes")
topic.df <- xml_to_df("EFsample1.xml",
                      records.tags = "topic", fields = "attributes")
# Element content of each <writing> record
tags.df <- xml_to_df("EFsample1.xml",
                     records.tags = "writing", fields = "tags")
# Bind the columns side by side (assumes every result has one row per <writing>)
EFsample1b.df <- cbind(writing.df, learner.df, topic.df, tags.df)
str(EFsample1b.df)
'data.frame': 10 obs. of 11 variables:
 $ id         : num 7635 25056 25057 33028 34198 ...
 $ level      : num 7 7 7 7 7 7 7 7 7 7
 $ unit       : num 1 1 2 1 1 1 1 2 3 4
 $ id         : num 169797 103857 103857 119717 76698 ...
 $ nationality: chr "us" "us" "us" "us" ...
 $ id         : num 49 49 50 49 49 49 49 50 51 52
 $ learner    : num NA NA NA NA NA NA NA NA NA NA
 $ topic      : chr "Giving instructions to play a game" "Giving instructions to play a game" "Planning for the future" "Giving instructions to play a game" ...
 $ date       : chr "2012-04-28 04:19:03.927" "2012-02-29 08:29:14.813" "2012-04-16 10:17:12.440" "2011-09-05 18:35:08.997" ...
 $ grade      : num 65 90 82 83 86 86 89 88 92 75
 $ text       : chr "\n instructions for frisbee bowling. mark the bowling alley an ...
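The str() output above contains three columns named id and a learner column that is NA in every row. One possible cleanup, sketched here in base R, is to prefix the attribute columns with their tag name and drop all-NA columns before calling cbind(). The small data frames below are stand-ins that reuse the first two rows of the output above; the prefixes and the name combined.df are hypothetical choices, not something xmlconvert produces.

```r
# Stand-in data frames with the same column names as the results above
# (values taken from the first two rows of the str() output).
writing.df <- data.frame(id = c(7635, 25056), level = c(7, 7), unit = c(1, 1))
learner.df <- data.frame(id = c(169797, 103857), nationality = c("us", "us"))
topic.df   <- data.frame(id = c(49, 49))
tags.df    <- data.frame(learner = c(NA, NA), grade = c(65, 90))

# Prefix attribute columns with the tag name so the three `id` columns stay distinct.
names(writing.df) <- paste0("writing.", names(writing.df))
names(learner.df) <- paste0("learner.", names(learner.df))
names(topic.df)   <- paste0("topic.", names(topic.df))

# Drop columns that are NA in every row (here: the empty <learner/> element).
tags.df <- tags.df[, colSums(!is.na(tags.df)) > 0, drop = FALSE]

combined.df <- cbind(writing.df, learner.df, topic.df, tags.df)
names(combined.df)
```

With unique column names, each column can be selected unambiguously, for example combined.df$learner.id.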
https://sugiura-ken.org/wiki/