I have a large data file consisting of a single line of text. The format resembles
Cat 14 Dog 15 Horse 16
I'd eventually like to get a data frame with one row per animal and its number.
Method 1: (extracting from a long vector with seq())
> inp <- scan(textConnection("Cat 14 Dog 15 Horse 16"), what="character")
Read 6 items
> data.frame(animal = inp[seq(1,length(inp), by=2)],
numbers =as.numeric(inp[seq(2,length(inp), by=2)]))
animal numbers
1 Cat 14
2 Dog 15
3 Horse 16
Method 2: (using the "what" argument to scan to greater effect)
> inp <- data.frame(scan(textConnection("Cat 14 Dog 15 Horse 16"),
what=list(character(), numeric())))
Read 3 records
> names(inp) <- c("animals", "numbers")
> inp
animals numbers
1 Cat 14
2 Dog 15
3 Horse 16
This is a refinement of Method 2. I was worried about the possibility of very long column names in the result from scan(), so I read the help page again and added names to the what argument values:
> inp <- data.frame(scan(textConnection("Cat 14 Dog 15 Horse 16"),
what=list( animals=character(),
numbers=numeric())))
Read 3 records
> inp
animals numbers
1 Cat 14
2 Dog 15
3 Horse 16
This solution takes full advantage of scan()'s what argument, and seems simpler (to me) than any of the others:
x <- scan(file = textConnection("Cat 14 Dog 15 Horse 16"),
what = list(Animal=character(), Number=numeric()))
# Convert x (at this point a list) into a data.frame
as.data.frame(x)
# Animal Number
# 1 Cat 14
# 2 Dog 15
# 3 Horse 16
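For an actual file on disk rather than a textConnection, the same call might look like the sketch below. The path is a temporary file created only for illustration (the real file name is not given in the question), and quiet = TRUE is added just to suppress the "Read 3 records" message.

```r
# Write the sample line to a temporary file (a stand-in for the real data file)
path <- tempfile(fileext = ".txt")
writeLines("Cat 14 Dog 15 Horse 16", path)

# Read alternating fields: first a character value, then a numeric one
x <- scan(file = path,
          what = list(Animal = character(), Number = numeric()),
          quiet = TRUE)

# Convert the resulting list into a data.frame, keeping Animal as character
dat <- as.data.frame(x, stringsAsFactors = FALSE)

unlink(path)  # remove the temporary file
```

Because scan() reads field by field, this should not depend on the whole file being a single line.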
Assuming that white space is the delimiter, you can use scan to read the file, matrix to reshape the values into two columns, and then convert the result to a data.frame.
The code:
x <- scan(file=textConnection("
Cat 14 Dog 15 Horse 16
"), what="character")
xx <- as.data.frame(matrix(x, ncol=2, byrow=TRUE))
names(xx) <- c("Animal", "Number")
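# go through as.character() so that a factor column yields 14, 15, 16 rather than its level codes
xx$Number <- as.numeric(as.character(xx$Number))
The results:
xx
Animal Number
1 Cat 14
2 Dog 15
3 Horse 16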
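A caveat on the conversion step: in versions of R before 4.0, as.data.frame() turned character columns into factors by default, and as.numeric() applied to a factor returns the internal level codes rather than the printed values. A small sketch of the difference, using a hypothetical factor f that is not part of the original answer:

```r
# A factor whose labels look like numbers
f <- factor(c("14", "15", "16"))

# Direct conversion gives the level codes
codes <- as.numeric(f)                 # 1 2 3

# Converting via character gives the labels as numbers
values <- as.numeric(as.character(f))  # 14 15 16
```

Going through as.character() is harmless when the column is already character, so it is a reasonable default when the R version is unknown.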
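Here is another approach:
# x is assumed to hold the single line of text
x <- "Cat 14 Dog 15 Horse 16"
string <- readLines(textConnection(x))
# insert a newline after every run of digits, giving one animal/number pair per line
string <- gsub("(\\d+)", "\\1\n", string, perl = TRUE)
dat <- read.table(text = string, sep = "")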
One way:
# read the line; each space-separated field becomes its own column
r <- read.csv("exa.Rda", sep = " ", header = FALSE)
# every odd-numbered column is an animal
animals <- r[, seq(1, ncol(r), by = 2)]
# every even-numbered column is a number
numbers <- r[, seq(2, ncol(r), by = 2)]
# flipping the animal row into a column
animals <- t(animals)
# flipping the number row into a column
numbers <- t(numbers)
# putting the data together
mydata <- data.frame(animals, numbers)
Here's one solution using a variety of tools/hacks, specifically:
- strsplit to split on space characters (\\s)
- unlist to coerce the list returned by strsplit into a vector
- matrix to turn the vector into the appropriate shape
- data.frame to allow for columns of different mode
- as.character and as.numeric to convert the Count column from a factor
Here's everything put together:
txt <- "Cat 14 Dog 15 Horse 16"
out <- data.frame(matrix(unlist(strsplit(txt, "\\s")), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Animal", "Count"))))
out$Count <- as.numeric(as.character(out$Count))
str(out)
'data.frame': 3 obs. of 2 variables:
$ Animal: Factor w/ 3 levels "Cat","Dog","Horse": 1 2 3
$ Count : num 14 15 16
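If this needs to be reused, the same steps could be wrapped in a small function. The sketch below is one possible arrangement; the name pairs_to_df is hypothetical, and it assumes the input is a single string of strictly alternating name/value tokens separated by white space.

```r
# Hypothetical helper: turn "name value name value ..." into a two-column data frame
pairs_to_df <- function(txt) {
  # split on runs of white space, ignoring leading/trailing blanks
  tokens <- strsplit(trimws(txt), "\\s+")[[1]]
  # expect complete name/value pairs
  stopifnot(length(tokens) %% 2 == 0)
  # one pair per row: column 1 = name, column 2 = value
  m <- matrix(tokens, ncol = 2, byrow = TRUE)
  data.frame(Animal = m[, 1],
             Count  = as.numeric(m[, 2]),
             stringsAsFactors = FALSE)
}

out2 <- pairs_to_df("Cat 14 Dog 15 Horse 16")
```

Setting stringsAsFactors = FALSE and converting the second column before building the data frame sidesteps the factor conversion entirely.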