I'm new to R and I'm trying to load 100 or so txt files with three columns (Name, Frequency and Gender) into a single data frame. The files are all named "yob1990.txt", etc.
You could also use fread and rbindlist from data.table. If the files are in the working directory:
f1 <- list.files(pattern="^yob.*\\.txt")
f1 #created 3 files
#[1] "yob1990.txt" "yob1991.txt" "yob1992.txt"
library(data.table)
library(stringr)
year <- as.numeric(str_extract(f1, "[0-9]+(?=\\.txt)")) # perl() was dropped from stringr; plain patterns support lookahead
res <- rbindlist(Map(`cbind`, lapply(f1, fread), year=year))
head(res)
#    Name Frequency Gender year
#1:   Sam        24   Male 1990
#2:  Gobi        22   Male 1990
#3:  Rose        44 Female 1990
#4: Anita        35 Female 1990
#5:  John        44   Male 1991
#6: Sofia        52 Female 1991
Or you could use unnest from tidyr:
devtools::install_github("hadley/tidyr")
library(tidyr)
res1 <- unnest(setNames(lapply(f1, fread), year), year)
head(res1)
#  year  Name Frequency Gender
#1 1990   Sam        24   Male
#2 1990  Gobi        22   Male
#3 1990  Rose        44 Female
#4 1990 Anita        35 Female
#5 1991  John        44   Male
#6 1991 Sofia        52 Female
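The unnest call above depends on the development version of tidyr installed in that snippet. As a possible alternative that stays within data.table, rbindlist can add the identifier column itself through its idcol argument when given a named list. The following is only a minimal sketch: the temporary directory, the two sample files and the name res2 are made up for illustration and are not part of the original question.

```r
library(data.table)

# Hypothetical sample files, written to a temporary directory so the sketch runs on its own
tmp <- file.path(tempdir(), "yob_demo")
dir.create(tmp, showWarnings = FALSE)
fwrite(data.table(Name = c("Sam", "Rose"), Frequency = c(24L, 44L),
                  Gender = c("Male", "Female")),
       file.path(tmp, "yob1990.txt"))
fwrite(data.table(Name = c("John", "Sofia"), Frequency = c(44L, 52L),
                  Gender = c("Male", "Female")),
       file.path(tmp, "yob1991.txt"))

f1 <- list.files(tmp, pattern = "^yob.*\\.txt$", full.names = TRUE)

# Name each list element by the digits in its filename
year <- gsub("\\D", "", basename(f1))

# idcol turns the list names into a new first column called "year"
res2 <- rbindlist(setNames(lapply(f1, fread), year), idcol = "year")

# The list names are character, so convert the new column to numeric
res2[, year := as.numeric(year)]
```

With idcol the year column comes first, followed by Name, Frequency and Gender, and no cbind step is needed.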
I would use a workflow something like this, which assumes (1) that the only .txt files in the specified path are the ones you want read in, and (2) that the only numerals in the filenames are the digits of the years.
f <- list.files('path/to/files', pattern='\\.txt$', full.names=TRUE)
# replace path above as required
d <- do.call(rbind, lapply(f, function(x) {
  d <- read.table(x, header=TRUE) # add sep argument as required
  d$Year <- as.numeric(gsub('\\D', '', basename(x)))
  d
}))
f will be a vector of full paths to the files you need to read in.
lapply considers each filename in turn (each element of f), temporarily refers to that filename as x, and performs everything in between the curly braces.
gsub('\\D', '', basename(x)) performs a "find and replace"-type operation on basename(x) (which is the filename of the currently considered file, without the directory path leading to it). We look for all non-digit characters ('\\D'), and replace them with nothing (''). We add the result of this gsub operation (which is the year, assuming no other digits lurk in the filename) to a new Year column of the data.frame.
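To see what that gsub step does on its own, here is a small sketch. The filenames are hypothetical and chosen only to show the normal case and the case where assumption (2) is violated; the stricter sub pattern at the end is just one possible workaround, not part of the workflow above.

```r
# Hypothetical paths, used only to illustrate the digit extraction
files <- c("path/to/files/yob1990.txt", "path/to/files/yob1991.txt")

# basename() drops the directory part; gsub() then deletes every non-digit
years <- as.numeric(gsub("\\D", "", basename(files)))
years
# [1] 1990 1991

# A stray digit elsewhere in the filename is kept as well
mixed <- gsub("\\D", "", "v2_yob1990.txt")
mixed
# [1] "21990"

# One possible stricter variant: keep only the four digits right before ".txt"
strict <- sub("^.*(\\d{4})\\.txt$", "\\1", "v2_yob1990.txt")
strict
# [1] "1990"
```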
Finally, we return d, and once lapply has performed this procedure on all files in f, we row bind them all together with do.call(rbind, ...).