Question
I have a number of large data files (.csv) on my local drive that I need to read in R, filter rows/columns, and then combine. Each file has about 33,000 rows and 575 columns.
I read the post "Quickly reading very large tables as dataframes" and decided to use "sqldf".
This is the short version of my code:
Housing <- file("file location on my disk")
Housing_filtered <- sqldf('SELECT Var1 FROM Housing', file.format = list(eol="/n"))  # I am using Windows
I see that the "Housing_filtered" data.frame is created with Var1, but with zero observations. This is my very first experience with sqldf, and I am not sure why zero observations are returned.
I also used "read.csv.sql" and still I see zero observations.
Housing_filtered <- read.csv.sql(file = "file location on my disk",
sql = "select Var01 from file",
eol = "/n",
header = TRUE, sep = ",")
Answer 1:
You never actually imported the file as a data.frame, even though you think you did.
You've opened a connection to a file. You mentioned that it is a CSV. Your code should look something like this if it is a normal CSV file:
Housing <- read.csv("my_file.csv")
Housing_filtered <- sqldf('SELECT Var1 FROM Housing')
If there's something non-standard about this CSV file please mention what it is and how it was created.
Also, on another point that was made in the comments: if you do for some reason need to specify the line breaks manually, use \n where you were using /n. Any remaining error is not caused by that change; rather, you are getting past one problem and on to another, probably due to improperly handled missing data, spaces, commas inside text fields, etc.
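To make the difference between the two strings concrete, here is a minimal base R sketch. The variable names and the sample text are made up for illustration; the point is only that "\n" is a single newline character, while "/n" is a slash followed by the letter n and will not match the line endings of an ordinary CSV file.

```r
# "\n" is an escape sequence for one newline character.
newline <- "\n"
# "/n" is just a forward slash followed by the letter n.
not_newline <- "/n"

# Compare the two strings directly.
n_chars_newline <- nchar(newline)
n_chars_not_newline <- nchar(not_newline)
same_string <- identical(newline, not_newline)

# A made-up two-line CSV fragment splits into lines on "\n" but not on "/n".
text <- "Var1,Var2\n1,2"
lines_with_newline <- strsplit(text, newline, fixed = TRUE)[[1]]
lines_with_slash_n <- strsplit(text, not_newline, fixed = TRUE)[[1]]

print(n_chars_newline)
print(n_chars_not_newline)
print(same_string)
print(length(lines_with_newline))
print(length(lines_with_slash_n))
```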
If there are still data errors, can you please use R code to create a small file that reflects the relevant characteristics of your data and produces the same error when you import it? This may help.
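As a rough sketch of what such a reproducible example could look like, the code below writes a tiny CSV to a temporary file and then imports and queries it. Everything about the file is an assumption made for illustration: the column names, the values, and the choice of features (a missing value and a quoted text field containing a comma). Swap in whatever you suspect is unusual about your real data. It also assumes the sqldf package is installed.

```r
library(sqldf)  # assumes the sqldf package is installed

# Write a tiny, made-up CSV with features that often cause import trouble:
# a quoted text field containing a comma, and missing values.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Var1,Var2,Var3",
  "1,\"Smith, John\",10.5",
  "2,,20.1",
  "3,\"Doe, Jane\","
), path)

# Import the file into a data.frame first, then query it with sqldf.
Housing <- read.csv(path, stringsAsFactors = FALSE)
Housing_filtered <- sqldf("SELECT Var1 FROM Housing")

# Inspect what was actually read before trusting the query result.
str(Housing)
print(nrow(Housing_filtered))
```

If this small file imports cleanly but the real one does not, keep adding characteristics of the real file to the example until the problem reappears.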
Source: https://stackoverflow.com/questions/50891345/sqldf-returns-zero-observations