Convert console output of list to a real R list

一个人的身影 2021-02-08 03:44

Someone just posted some console output as an example. (This happens a lot, and I have strategies for converting the output of print for vectors and data frames.) I'm wondering if a

2 Answers
  • 2021-02-08 04:17

    Here's my shot at a solution. It works well on both your test cases, and on the few others with which I've tested it.

    deprint <- function(ll) {
        ## Pattern to match strings beginning with _at least_ one $x or [[x]]
        branchPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])"
        ## Pattern to match strings with _just_ one $x or one [[x]]
        trunkPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])\\s*$"
        ##
        isBranch <- function(X) {
            grepl(branchPat, X[1])
        }
        ## Parse character vectors of lines like "[1] 1 3 4" or
        ## "[1] TRUE FALSE" or c("[1] a b c d", "[5] e f") 
        readTip <- function(X) {
            X <- paste(sub("^\\s*\\[.*\\]", "", X), collapse=" ")
            ## scan(text = ) opens and closes its own connection
            tokens <- scan(text = X, what=character(), quiet=TRUE)
            read.table(text = tokens, stringsAsFactors=FALSE)[[1]]
        }
    
        ## (0) Split into vector of lines (if needed) and
        ##     strip out empty lines
        con <- textConnection(ll)
        ll <- readLines(con)
        close(con)  # avoid leaving an unused connection open
        ll <- ll[ll!=""]
    
        ## (1) Split into branches ...
        trunks <- grep(trunkPat, ll)
        grp <- cumsum(seq_along(ll) %in% trunks)
        XX <- split(ll, grp)
        ## ... preserving element names, where present
        nms <- sapply(XX, function(X) gsub("\\[.*|\\$", "", X[[1]]))
        XX <-  lapply(XX, function(X) X[-1])
        names(XX) <- nms
    
        ## (2) Strip away top-level list identifiers.
        ## pat2 <- "^\\$[^$\\[]*"
        XX <- lapply(XX, function(X) sub(branchPat, "", X))
    
        ## (3) Step through list elements:
        ## - Branches will need further recursive processing.
        ## - Tips are ready to parse into base type vectors.
        lapply(XX, function(X) {
            if(isBranch(X)) deprint(X) else readTip(X)
        })
    }
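
    To make step (1) concrete, here is a minimal sketch of the cumsum()/split() grouping on a tiny input. The input vector is a hypothetical example (the printed form of list(a = 1:2, b = "c")), not one of the test cases from the question:

    ```r
    ## Hypothetical input: printed lines of list(a = 1:2, b = "c")
    ll <- c("$a", "[1] 1 2", "$b", "[1] \"c\"")

    ## Same pattern as trunkPat in deprint(): a line holding just one $x or [[x]]
    trunkPat <- "^(\\$[^$[]*|\\[\\[[[:digit:]]*\\]\\])\\s*$"

    ## Header lines start a new group; cumsum() carries the group id forward
    trunks <- grep(trunkPat, ll)
    grp <- cumsum(seq_along(ll) %in% trunks)

    ## Each piece holds one header line followed by its content lines
    XX <- split(ll, grp)
    str(XX)
    ```

    Each piece can then be named after its header line and processed on its own, which is what lets deprint() recurse into nested lists.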
    

    With L, your more complicated example list, here's what it gives:

    ## Because deprint() interprets numbers without a decimal part as integers,
    ## I've modified L slightly, changing "list(w=2,4)" to "list(w=2L,4L)" 
    ## to allow a meaningful test using identical(). 
    L <-
    structure(list(a = structure(list(d = 1:2, j = 5:6, o = structure(list(
        w = 2L, 4L), .Names = c("w", ""))), .Names = c("d", "j", "o"
    )), b = "c", c = 3:4), .Names = c("a", "b", "c"))
    
    ## Capture the print representation of L, and then feed it to deprint()
    test2 <- capture.output(L)
    LL <- deprint(test2)
    identical(L, LL)
    ## [1] TRUE
    LL
    ## $a
    ## $a$d
    ## [1] 1 2
    ## 
    ## $a$j
    ## [1] 5 6
    ## 
    ## $a$o
    ## $a$o$w
    ## [1] 2
    ## 
    ## $a$o[[2]]
    ## [1] 4
    ## 
    ## $b
    ## [1] "c"
    ## 
    ## $c
    ## [1] 3 4
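
    The integer-versus-double behaviour mentioned in the comment above comes from the type guessing done in readTip(). As a rough illustration, here is a stand-alone copy of that helper (written with scan(text = ...) instead of a text connection) applied to a few hypothetical printed vectors:

    ```r
    ## Stand-alone copy of the readTip() helper from deprint()
    readTip <- function(X) {
        ## Drop the leading "[n]" index from each line, then join the lines
        X <- paste(sub("^\\s*\\[.*\\]", "", X), collapse = " ")
        tokens <- scan(text = X, what = character(), quiet = TRUE)
        ## read.table() guesses the column type from the tokens
        read.table(text = tokens, stringsAsFactors = FALSE)[[1]]
    }

    class(readTip("[1] 2 4"))         # "integer": no decimal part
    class(readTip("[1] 2.5 4"))       # "numeric"
    class(readTip("[1] TRUE FALSE"))  # "logical"

    ## A vector whose printed form wraps over two lines
    readTip(c("[1] a b c d", "[5] e f"))
    ```

    This is why a double such as 2 comes back as 2L: the printed form "[1] 2" no longer carries the original storage type.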
    

    And here's how it handles the print representation of test, your more regular list:

    deprint(test)
    ## [[1]]
    ## [1] 1.0000 1.9643 4.5957
    ## 
    ## [[2]]
    ## [1] 1.0000 2.2753 3.8589
    ## 
    ## [[3]]
    ## [1] 1.0000 2.9781 4.5651
    ## 
    ## [[4]]
    ## [1] 1.0000 2.9320 3.5519
    ## 
    ## [[5]]
    ## [1] 1.0000 3.5772 2.8560
    ## 
    ## [[6]]
    ## [1] 1.0000 4.0150 3.1937
    ## 
    ## [[7]]
    ## [1] 1.0000 3.3814 3.4291
    

    One more example:

    head(as.data.frame(deprint(capture.output(as.list(mtcars)))))
    #    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    # 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    # 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    # 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    # 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    # 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    # 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    
  • 2021-02-08 04:21

    I wouldn't call it "elegant", but for unnamed lists you could adapt something along these lines, with some additional checking:

    s <- strsplit(gsub("\\[+\\d+\\]+", "", test), "\n+")[[1]][-1]
    lapply(s, function(x) scan(text = x, what = double(), quiet = TRUE))
    
    [[1]]
    [1] 1.0000 1.9643 4.5957
    
    [[2]]
    [1] 1.0000 2.2753 3.8589
    
    [[3]]
    [1] 1.0000 2.9781 4.5651
    
    [[4]]
    [1] 1.0000 2.9320 3.5519
    
    [[5]]
    [1] 1.0000 3.5772 2.8560
    
    [[6]]
    [1] 1.0000 4.0150 3.1937
    
    [[7]]
    [1] 1.0000 3.3814 3.4291
    

    Of course, this only handles unnamed lists, and this particular example hard-codes what = double(), so other element types would require additional checking. One idea for detecting character elements in the list would be to set the what argument conditionally:

    what = if(length(grep("\"", x))) character() else double()
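
    Putting the two pieces together might look like the sketch below. The input string txt is a made-up example (the original test string is not shown here), and the result name res is likewise only for illustration:

    ```r
    ## Hypothetical console output, pasted as a single string
    txt <- "[[1]]\n[1] 1.0000 1.9643 4.5957\n\n[[2]]\n[1] \"a\" \"b\"\n"

    ## Strip the [[n]] and [n] markers, then split into one string per element
    s <- strsplit(gsub("\\[+\\d+\\]+", "", txt), "\n+")[[1]][-1]

    ## Pick the scan() type per element: character if it contains a quote
    res <- lapply(s, function(x) {
        scan(text = x,
             what = if (length(grep("\"", x))) character() else double(),
             quiet = TRUE)
    })
    str(res)
    ```

    This still assumes a flat, unnamed list whose elements are either all numeric or all quoted strings; logical vectors or nested lists would need further checks.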
    