Importing one long line of data into R

醉酒成梦 2021-02-05 18:58

I have a large data file consisting of a single line of text. The format resembles

Cat    14  Dog    15  Horse  16

I'd eventually like to get

6 Answers
  • 2021-02-05 19:33

    Method 1: (extracting from a long vector with seq())

    > inp <- scan(textConnection("Cat 14 Dog 15 Horse 16"), what="character")
    Read 6 items
    > data.frame(animal = inp[seq(1,length(inp), by=2)], 
                 numbers =as.numeric(inp[seq(2,length(inp), by=2)]))
      animal numbers
    1    Cat      14
    2    Dog      15
    3  Horse      16
    

    Method 2: (using the "what" argument to scan to greater effect)

    > inp <- data.frame(scan(textConnection("Cat 14 Dog 15 Horse 16"), 
                             what=list(character(), numeric())))
    Read 3 records
    > names(inp) <- c("animals", "numbers")
    > inp
      animals numbers
    1     Cat      14
    2     Dog      15
    3   Horse      16
    

    A refinement of Method 2: I was worried about very long column names in the result from scan(), so I read the help page again and added names to the components of the what argument:

    > inp <- data.frame(scan(textConnection("Cat 14 Dog 15 Horse 16"), 
                             what=list(animals=character(), 
                                       numbers=numeric())))
    Read 3 records
    > inp
      animals numbers
    1     Cat      14
    2     Dog      15
    3   Horse      16
    
  • 2021-02-05 19:43

    This solution takes full advantage of scan()'s what argument, and seems simpler (to me) than any of the others:

    x <- scan(file = textConnection("Cat 14 Dog 15 Horse 16"), 
              what = list(Animal=character(), Number=numeric()))
    
    # Convert x (at this point a list) into a data.frame
    as.data.frame(x)
    #   Animal Number
    # 1    Cat     14
    # 2    Dog     15
    # 3  Horse     16
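
    The same call can read straight from a file instead of a textConnection. The following is a minimal sketch, assuming the data sits in a plain text file; the temporary file path is a hypothetical stand-in for the real data file, and the repeated spaces mimic the format shown in the question.

```r
# Write the sample line to a temporary file (stand-in for the real data file).
path <- tempfile(fileext = ".txt")
writeLines("Cat    14  Dog    15  Horse  16", path)

# With the default sep = "", scan() treats any run of whitespace as one delimiter,
# so the uneven spacing does not matter.
x <- scan(file = path,
          what = list(Animal = character(), Number = numeric()),
          quiet = TRUE)

# Convert the list returned by scan() into a data.frame.
df <- as.data.frame(x, stringsAsFactors = FALSE)
str(df)
```

    Because the types are declared in what, the Number column is numeric from the start and needs no later conversion.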
    
  • 2021-02-05 19:46

    Assuming that the white space is a delimiter, you can use the following mechanism:

    • Use scan to read the file
    • Convert the results to a matrix, then to a data.frame

    The code:

    x <- scan(file=textConnection("
    Cat 14 Dog 15 Horse 16
    "), what="character")
    
    xx <- as.data.frame(matrix(x, ncol=2, byrow=TRUE))
    names(xx) <- c("Animal", "Number")
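    # Go through as.character() first: if Number is a factor (the default before R 4.0),
    # as.numeric() alone returns the level codes 1, 2, 3 instead of 14, 15, 16.
    xx$Number <- as.numeric(as.character(xx$Number))
    

    The results:

    xx
    
      Animal Number
    1    Cat     14
    2    Dog     15
    3  Horse     16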
    
  • 2021-02-05 19:48

    Here is another approach

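    x      <- "Cat 14 Dog 15 Horse 16"   # the input line
    string <- readLines(textConnection(x))
    # insert a newline after every number so each animal/number pair gets its own line
    string <- gsub("(\\d+)", "\\1\n", string, perl = TRUE)
    dat    <- read.table(text = string, sep = "")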
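
    To see what the substitution does, it can help to print the intermediate string. This is a sketch only: the column names Animal and Number are my own choice, and it assumes that animal names never contain digits (otherwise the regular expression would split in the wrong place).

```r
x <- "Cat 14 Dog 15 Horse 16"

# Insert a newline after every run of digits, turning one line into one record per line.
broken <- gsub("(\\d+)", "\\1\n", x, perl = TRUE)
cat(broken)

# sep = "" splits on any whitespace, so the leading space on later lines is ignored.
dat <- read.table(text = broken, sep = "",
                  col.names = c("Animal", "Number"),
                  stringsAsFactors = FALSE)
str(dat)
```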
    
  • 2021-02-05 19:50

    One way:

    # read the line
    # (assumes the fields are separated by exactly one space)
    r <- read.csv("exa.Rda", sep = " ", header = FALSE)
    # every odd-numbered column is an animal
    animals <- r[, seq(1, ncol(r), by = 2)]
    # every even-numbered column is a number
    numbers <- r[, seq(2, ncol(r), by = 2)]
    # flipping the animal row into a column
    animals <- t(animals)
    # flipping the number row into a column
    numbers <- t(numbers)
    # putting the data together, with explicit column names
    mydata <- data.frame(animals = animals[, 1], numbers = numbers[, 1])
    
  • 2021-02-05 19:54

    Here's one solution using a variety of tools/hacks, specifically:

    • strsplit to split on space characters (\\s)
    • unlist to coerce the list returned by strsplit into a vector
    • matrix to turn the vector into the appropriate shape
    • data.frame to allow for columns of different mode
    • as.character and as.numeric to convert the Count column from a factor

    Here's everything put together:

    txt <- "Cat 14 Dog 15 Horse 16"
    
    out <- data.frame(matrix(unlist(strsplit(txt, "\\s")), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Animal", "Count"))))
    out$Count <- as.numeric(as.character(out$Count))
    str(out)
    
    'data.frame':   3 obs. of  2 variables:
     $ Animal: Factor w/ 3 levels "Cat","Dog","Horse": 1 2 3
     $ Count : num  14 15 16
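
    One caveat with the pattern "\\s": on input with runs of spaces, as in the question's sample, strsplit returns empty strings between consecutive spaces. A hedged variant, assuming the tokens strictly alternate between animal and count, splits on "\\s+" and picks alternating elements with a recycled logical index instead of building a matrix:

```r
txt <- "Cat    14  Dog    15  Horse  16"

# "\\s+" treats each run of whitespace as a single separator.
tokens <- strsplit(txt, "\\s+")[[1]]

# c(TRUE, FALSE) is recycled along the vector: odd positions are animals,
# even positions are counts.
out <- data.frame(Animal = tokens[c(TRUE, FALSE)],
                  Count  = as.numeric(tokens[c(FALSE, TRUE)]),
                  stringsAsFactors = FALSE)
str(out)
```

    Converting the counts before building the data.frame avoids the factor round trip through as.character.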
    