Quickly reading very large tables as dataframes

清歌不尽 2020-11-21 04:46

I have very large tables (30 million rows) that I would like to load as data frames in R. read.table() has a lot of convenient features, but it seems like the

11 Answers
  •  再見小時候
    2020-11-21 05:10

    It is often good practice to keep larger datasets inside a database (e.g. Postgres). I don't work with anything much larger than ncell = nrow * ncol = 10M, which is pretty small, but I often want R to create and hold memory-intensive graphs only while I query from multiple databases. Once 32 GB laptops are common, some of these memory problems will disappear. Even so, keeping the data in a database and using R's memory only for the query results and graphs may still be useful. Some advantages are:

    (1) The data stays loaded in your database. You simply reconnect in pgAdmin to the databases you want when you turn your laptop back on.

    (2) It is true that R can do many more nifty statistical and graphing operations than SQL, but I think SQL is better designed than R for querying large amounts of data.

    # Looking at Voter/Registrant Age by Decade

    library(RPostgreSQL)
    library(lattice)

    con <- dbConnect(PostgreSQL(), user = "postgres", password = "password",
                     port = "2345", host = "localhost", dbname = "WC2014_08_01_2014")

    # extract(DECADE ...) is the year divided by 10, so "> 198" keeps the 1990s onward.
    Decade_BD_1980_42 <- dbGetQuery(con, "
      Select PrecinctID, Count(PrecinctID), extract(DECADE from Birthdate)
      from voterdb
      where extract(DECADE from Birthdate)::numeric > 198
        and PrecinctID in (Select * from LD42)
      Group By PrecinctID, date_part
      Order by Count DESC;")

    Decade_RD_1980_42 <- dbGetQuery(con, "
      Select PrecinctID, Count(PrecinctID), extract(DECADE from RegistrationDate)
      from voterdb
      where extract(DECADE from RegistrationDate)::numeric > 198
        and PrecinctID in (Select * from LD42)
      Group By PrecinctID, date_part
      Order by Count DESC;")

    # One panel per precinct; only the aggregated counts ever reach R.
    with(Decade_BD_1980_42, barchart(~count | as.factor(precinctid)))
    mtext("42LD Birthdays later than 1980 by Precinct", side = 1, line = 0)

    with(Decade_RD_1980_42, barchart(~count | as.factor(precinctid)))
    mtext("42LD Registration Dates later than 1980 by Precinct", side = 1, line = 0)
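
    The same pattern can be tried without a Postgres server. The following is only a rough sketch, assuming the DBI and RSQLite packages are installed: an in-memory SQLite table with made-up columns (precinctid, birthyear) stands in for voterdb, and none of the data is real. It shows the database doing the aggregation, plus chunked fetching for results that are themselves too big to pull in one go.

    ```r
    # Sketch only: an in-memory SQLite table stands in for the Postgres voterdb table.
    library(DBI)

    con <- dbConnect(RSQLite::SQLite(), ":memory:")

    # Hypothetical stand-in data: 1,000 voters spread over 4 precincts.
    voters <- data.frame(
      precinctid = rep(1:4, times = 250),
      birthyear  = rep(1950:1999, times = 20)
    )
    dbWriteTable(con, "voterdb", voters)

    # Let the database aggregate; R only receives one row per precinct.
    counts <- dbGetQuery(con, "
      SELECT precinctid, COUNT(*) AS n
      FROM voterdb
      WHERE birthyear >= 1990
      GROUP BY precinctid
      ORDER BY n DESC, precinctid")

    # When a result is itself too large to hold at once, fetch it in chunks.
    res <- dbSendQuery(con, "SELECT precinctid, birthyear FROM voterdb")
    rows_seen <- 0
    while (!dbHasCompleted(res)) {
      chunk <- dbFetch(res, n = 300)   # at most 300 rows per fetch
      rows_seen <- rows_seen + nrow(chunk)
    }
    dbClearResult(res)
    dbDisconnect(con)
    ```

    In real use the chunk would be summarised or written out inside the loop, so that only one chunk is held in R's memory at a time.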
    
