myGrep4.R

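Outputs a table showing how many times each target string occurs in every file in a directory.
- This example handles two search targets.
  - Extending the a, b parts of the script (the arguments and the matching hit and cat lines) lets it handle any number of targets.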

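Usage

Change the working directory to a directory that contains the data files (and nothing else).
Call the function with a and b specified.
You are prompted for an output file; choose a file name in a location outside the working directory and click "Save".
The results are saved as a text file.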

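myGrep4 <- function(a, b){

	# copyleft 2019-01-25 sugiura@nagoya-u.jp
	output.file <- choose.files()
	# Choose the file in which to save the results (choose.files() is Windows-only).
	# Take care where it is saved: keep it outside the working directory.
	# (When asked for a file name, give it any name you like, e.g. jpn.txt.)

	files <- list.files()
	for (i in files) {

		lines.tmp <- scan(i, what="char", sep="\n", quote="")
		# Read the file, one element per line (quote="" keeps quotation marks literal).

		data.tmp <- grep("^\\*(JPN|NS)...:\t", lines.tmp, value=TRUE)
		# Keep only the lines that start with *,
		# followed by JPN or NS,
		# followed by three characters,
		# followed by a colon and a tab.
		body.tmp <- gsub("^\\*(JPN|NS)...:\t", "", data.tmp)
		# Remove the line-initial label and the tab.
		body.tmp <- body.tmp[body.tmp != ""]
		# Idiom for dropping empty elements (keep only the non-empty ones).
		body.lower <- tolower(body.tmp)
		# Convert to lower case.
		body.nopunc <- gsub("\\W", " ", body.lower)
		# Replace non-word characters with spaces.
		body.single <- gsub(" +", " ", body.nopunc)
		# Collapse repeated spaces into one.
		body.clear <- gsub(" $", "", body.single)
		# Remove the trailing space.
		body.token <- unlist(strsplit(body.clear, " "))
		# Split into word tokens.
		body.token <- body.token[body.token != ""]
		# Drop empty elements again (same idiom).

		hit.a <- grep(a, body.token)
		hit.b <- grep(b, body.token)
		# a and b are regular expressions, so "and" also matches tokens such as "band".

		cat(i, length(hit.a), length(hit.b), "\n", file=output.file, append=TRUE)
		# One line per file: file name, count for a, count for b (appended to the file).

	}
}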
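
The note at the top says that more search targets can be handled by adding more arguments like a and b. As a minimal sketch of one alternative, the counting step could take a character vector of patterns instead, so the function does not need editing for each new target. The names tokenize and count.patterns, and the sample lines, are hypothetical and not part of the original script; the cleaning steps are condensed from the ones in myGrep4.

```r
# Hypothetical sample lines in the "*JPN501:<tab>text" format the script expects
lines.tmp <- c("@Begin",
               "*JPN501:\tI like apples, and I like oranges.",
               "%com:\tthis line is ignored",
               "*NS501:\tBut I prefer bananas and grapes.")

# Hypothetical helper: condensed version of the cleaning steps in myGrep4
tokenize <- function(lines) {
	# Keep only the speaker lines and strip the label and tab
	body <- grep("^\\*(JPN|NS)...:\t", lines, value = TRUE)
	body <- sub("^\\*(JPN|NS)...:\t", "", body)
	# Lower-case, replace non-word characters with spaces, split on runs of spaces
	body <- gsub("\\W", " ", tolower(body))
	tokens <- unlist(strsplit(body, " +"))
	tokens[tokens != ""]
}

# Hypothetical helper: number of tokens matching each regular expression
count.patterns <- function(tokens, patterns) {
	# grep() returns the indices of matching tokens; length() turns that into a count
	vapply(patterns, function(p) length(grep(p, tokens)), integer(1))
}

tokens <- tokenize(lines.tmp)
counts <- count.patterns(tokens, c("and", "but"))
counts
```

As in myGrep4, each pattern is a regular expression matched anywhere inside a token; wrapping a pattern as "^and$" would restrict the count to exact matches.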

Example

> myGrep4("and", "but")
> and.but <- read.table(choose.files())
> head(and.but)
          V1 V2 V3
1 JPN501.txt  8  3
2 JPN502.txt  9  0
3 JPN503.txt  6  2
4 JPN504.txt  7  1
5 JPN505.txt 15  1
6 JPN506.txt  5  0
> class(and.but)
[1] "data.frame"
> colnames(and.but) <- c("file", "and", "but")
> head(and.but)
        file and but
1 JPN501.txt   8   3
2 JPN502.txt   9   0
3 JPN503.txt   6   2
4 JPN504.txt   7   1
5 JPN505.txt  15   1
6 JPN506.txt   5   0
> attach(and.but)
> plot(and, but)
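
The column names can also be supplied when the file is read, and with() gives access to the columns without attach(). A minimal sketch, assuming the saved file looks like the first rows shown above; read.table(text = ...) stands in here for choosing the saved file.

```r
# Assumed sample: the first three rows of the output shown above
result.text <- "JPN501.txt 8 3
JPN502.txt 9 0
JPN503.txt 6 2"

# Name the columns at read time instead of calling colnames() afterwards
and.but <- read.table(text = result.text, col.names = c("file", "and", "but"))

# with() evaluates the expression inside the data frame, so attach() is not needed
totals <- with(and.but, c(and = sum(and), but = sum(but)))
totals
```

The plot can be drawn the same way with with(and.but, plot(and, but)).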

https://sugiura-ken.org/wiki/
