问题
I have a very large dBase file (1.64Gb). It takes a very long time to load the whole file in R using the standard foreign::read.dbf()
function. I would like to load only a few variables in the dataset. Does anyone have a solution ?
回答1:
I think the read.dbf(...)
function in package foreign
was intended for reading the *.dbf
part of a shapefile, in which case reading in part of the file really doesn't make sense. You seem to want to do something different.
Using RODBC
might work, depending on how your system is configured. If you're running Windows, and if you have the dBASE ODBC drivers installed, this will probably work for you (note: When you install MSOffice, it sets up a user dsn called "dBase Files", which should be accessible from RODBC
. So if you have MSOffice installed, this should work...).
Important Note: This will only work if you are running a 32-bit version of R. This is because there are no 64-bit dBASE ODBC drivers. Generally, when you download 64-bit R you get both 32- and 64-bit versions, so it's just a matter of switching between them.
library(RODBC)
# setwd("< directory with your files >")
conn <- odbcConnect(dsn="dBASE Files")
df <- sqlFetch(conn,"myTable",max=10) # grab first ten rows
head(df)
# LENGTH COASTLN010
# 1 0.02482170 1
# 2 0.01832134 2
# 3 0.03117752 3
# 4 0.04269755 4
# 5 0.02696307 5
# 6 0.05047828 6
sqlQuery(conn,"select * from myTable where LENGTH<0.008")
# LENGTH COASTLN010
# 1 0.00625200 186
# 2 0.00634897 379
# 3 0.00733319 1583
# 4 0.00369786 1617
# 5 0.00722233 1618
# 6 0.00524176 1636
The example above is just meant to give you an idea of how to use RODBC
. In this example, I have a file, myTable.dbf
in the "directory with all your files", and this dbf has two columns, LENGTH
and COASTLN010
(this file actually is part of a coastline shapefile, but that is irrelevant...).
If this doesn't work try:
conn <- odbcConnectDbase("myTable.dbf")
来源:https://stackoverflow.com/questions/26177062/how-can-i-read-a-part-of-a-dbase-file