Question
I am trying to read a Stata dataset in R with the foreign package, but when I try to read the file using:
library(foreign)
data <- read.dta("data.dta")
I get the following error:
Error in read.dta("data.dta") : a binary read error occurred
The file works fine in Stata. This site suggests saving the file in Stata without labels and then reading it into R. With this workaround I am able to load the file into R, but then I lose the labels. Why am I getting this error, and how can I read the file into R with the labels intact?
Another person found that they get this error when they have variables with no values. My data do have at least one or two such variables, but I have no easy way to identify them in Stata: it is a very large file with thousands of variables.
Answer 1:
You should call library(foreign) before reading the Stata data.
library(foreign)
data <- read.dta("data.dta")
Updates: As mentioned here,
"The error message implies that the file was found, and that it started with the right sequence of bytes to be a Stata .dta file, but that something (probably the end of the file) prevented R from reading what it was expecting to read. "
But we are only guessing without further information.
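One way to narrow the guessing is to look at the header bytes yourself. The sketch below is only an illustration: dta_release is a hypothetical helper, not a function from foreign. It assumes that older .dta files store the format release number in their first byte, while newer ones open with a text header of the form <stata_dta><header><release>NNN</release>. The two temporary files are stand-ins for real datasets.

```r
# Hypothetical helper (not part of foreign): report the .dta format release
# recorded at the start of a file.
dta_release <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = 31L)
  if (length(bytes) == 0L) stop("file is empty: ", path)
  if (bytes[1] == as.raw(0x3c)) {
    # Assumed newer layout: "<stata_dta><header><release>NNN</release>".
    txt <- rawToChar(bytes[bytes != as.raw(0)])
    as.integer(sub("^.*<release>([0-9]+).*$", "\\1", txt))
  } else {
    # Assumed older layout: the release number is the first byte.
    as.integer(bytes[1])
  }
}

# Demonstration on two tiny stand-in headers written to temporary files.
old_file <- tempfile(fileext = ".dta")
writeBin(as.raw(c(115, 2, 1, 0)), old_file)
new_file <- tempfile(fileext = ".dta")
writeBin(charToRaw("<stata_dta><header><release>117</release>"), new_file)

old_release <- dta_release(old_file)
new_release <- dta_release(new_file)
```

If the release looks like one that read.dta is documented to support and the error still appears, the cause is more likely to be in the body of the file than in its header.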
Update to OP's question and answer:
I tested whether that is the case using the auto data from Stata, and it is not, so there must be other reasons.
Claims 1 and 2: if a variable contains missing values, or the dataset has labels, R's read.dta will generate the error.
sysuse auto // this dataset has labels
replace mpg=. // sets mpg to missing for every observation
br in 1/10
make           price  mpg  rep78  headroom  trunk  weight  length  turn  displacement  gear_ratio  foreign
AMC Concord     4099    .      3       2.5     11    2930     186    40           121        3.58  Domestic
AMC Pacer       4749    .      3       3.0     11    3350     173    40           258        2.53  Domestic
AMC Spirit      3799    .      .       3.0     12    2640     168    35           121        3.08  Domestic
Buick Century   4816    .      3       4.5     16    3250     196    40           196        2.93  Domestic
Buick Electra   7827    .      4       4.0     20    4080     222    43           350        2.41  Domestic
Buick LeSabre   5788    .      3       4.0     21    3670     218    43           231        2.73  Domestic
Buick Opel      4453    .      .       3.0     10    2230     170    34           304        2.87  Domestic
Buick Regal     5189    .      3       2.0     16    3280     200    42           196        2.93  Domestic
Buick Riviera  10372    .      3       3.5     17    3880     207    43           231        2.93  Domestic
Buick Skylark   4082    .      3       3.5     13    3400     200    42           231        3.08  Domestic
save "~myauto"
de(myauto)
Contains data from ~\myauto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          25 Aug 2013 11:32
 size:         3,478 (99.9% of memory free)   (_dta has notes)
--------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
--------------------------------------------------------------------------------
Sorted by:  foreign
library(foreign)
myauto <- read.dta("myauto.dta") # works without error
str(myauto)
'data.frame': 74 obs. of 12 variables:
$ make : chr "AMC Concord" "AMC Pacer" "AMC Spirit" "Buick Century" ...
$ price : int 4099 4749 3799 4816 7827 5788 4453 5189 10372 4082 ...
$ mpg : int NA NA NA NA NA NA NA NA NA NA ...
$ rep78 : int 3 3 NA 3 4 3 NA 3 3 3 ...
$ headroom : num 2.5 3 3 4.5 4 4 3 2 3.5 3.5 ...
$ trunk : int 11 11 12 16 20 21 10 16 17 13 ...
$ weight : int 2930 3350 2640 3250 4080 3670 2230 3280 3880 3400 ...
$ length : int 186 173 168 196 222 218 170 200 207 200 ...
$ turn : int 40 40 35 40 43 43 34 42 43 42 ...
$ displacement: int 121 258 121 196 350 231 304 196 231 231 ...
$ gear_ratio : num 3.58 2.53 3.08 2.93 2.41 ...
$ foreign : Factor w/ 2 levels "Domestic","Foreign": 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "datalabel")= chr "1978 Automobile Data"
- attr(*, "time.stamp")= chr "25 Aug 2013 11:23"
- attr(*, "formats")= chr "%-18s" "%8.0gc" "%8.0g" "%8.0g" ...
- attr(*, "types")= int 18 252 252 252 254 252 252 252 252 252 ...
- attr(*, "val.labels")= chr "" "" "" "" ...
- attr(*, "var.labels")= chr "Make and Model" "Price" "Mileage (mpg)" "Repair Record 1978" ...
- attr(*, "expansion.fields")=List of 2
..$ : chr "_dta" "note1" "from Consumer Reports with permission"
..$ : chr "_dta" "note0" "1"
- attr(*, "version")= int 12
- attr(*, "label.table")=List of 1
..$ origin: Named int 0 1
.. ..- attr(*, "names")= chr "Domestic" "Foreign"
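Since the question is about keeping the labels, it may help to see how they can be pulled out of the attributes listed in the str() output above. This is a minimal sketch on a hand-built stand-in data frame with only two columns; the attribute names var.labels and label.table are taken from that output, and everything else is illustrative.

```r
# Stand-in for the data frame returned by read.dta(); only two columns are
# reproduced, with attributes shaped like the str() output above.
myauto <- data.frame(
  price   = c(4099L, 4749L),
  foreign = factor(c("Domestic", "Domestic"), levels = c("Domestic", "Foreign"))
)
attr(myauto, "var.labels") <- c("Price", "Car type")
attr(myauto, "label.table") <- list(origin = c(Domestic = 0L, Foreign = 1L))

# Variable labels are stored positionally, so pair them with the column names.
var_labels <- setNames(attr(myauto, "var.labels"), names(myauto))
price_label <- var_labels[["price"]]

# Value labels stay available both as factor levels and as the code table.
origin_codes <- attr(myauto, "label.table")$origin
```

With a real import, the same two attr() calls should work on the object returned by read.dta, as long as the file was read with its labels.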
Answer 2:
Here is a list of things to try. My guess is that the first item has about a 75% chance of solving your issue.
- In Stata, resave a fresh copy of your dta file with saveold, and try again.
- If that fails, provide a sample to show what kind of values kill the read.dta function.
- If missing values are to blame, run the loop from the other answer.
A more thorough description of the dataset would be required to work past that point. The issue seems fixable: I have never had much trouble using foreign with tons of Stata files.
You might also try the Stata.file function in the memisc package to see if that fails too.
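The idea of trying a second reader when the first one fails can be sketched as a small wrapper. read_with_fallback is a hypothetical helper written for this illustration, and the memisc call chain (Stata.file followed by as.data.set) is an assumption about typical usage that returns a memisc data set rather than a plain data frame. The call on data.dta is left commented out because that file belongs to the asker.

```r
# Hypothetical helper: try each reader in turn and return the first success.
read_with_fallback <- function(path, readers) {
  for (name in names(readers)) {
    result <- tryCatch(readers[[name]](path), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(reader = name, data = result))
    }
    message(name, " failed: ", conditionMessage(result))
  }
  stop("no reader could load ", path)
}

# Readers are wrapped in functions so that a package is only needed when its
# reader is actually tried.
readers <- list(
  foreign = function(p) foreign::read.dta(p),
  memisc  = function(p) memisc::as.data.set(memisc::Stata.file(p))
)
# loaded <- read_with_fallback("data.dta", readers)
```

The messages printed for each failed reader are worth keeping, since they show whether both packages stumble on the same file.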
Answer 3:
I do not know why this occurs, and would be interested if anyone could explain, but read.dta indeed cannot handle variables that are all NA. A solution is to delete such variables in Stata with the following code:
foreach varname of varlist * {
    // count handles string variables too; summarize reports N = 0 for them
    quietly count if !missing(`varname')
    if r(N) == 0 {
        drop `varname'
        display "dropped `varname': all values missing"
    }
}
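If you would rather find the empty variables from the R side, for example after loading the label-free copy of the file, the check in the Stata loop above can be mirrored in a few lines. This is a sketch only: all_missing_columns is a hypothetical helper, the small data frame stands in for the real import, and treating empty strings as missing is an assumption meant to match Stata's handling of string variables.

```r
# Hypothetical helper: names of columns in which every value is missing.
all_missing_columns <- function(df) {
  is_empty <- vapply(df, function(col) {
    if (is.character(col)) all(is.na(col) | col == "") else all(is.na(col))
  }, logical(1))
  names(df)[is_empty]
}

# Small stand-in for the label-free import of the real file.
df <- data.frame(
  price = c(4099L, 4749L, 3799L),
  mpg   = c(NA_integer_, NA_integer_, NA_integer_),
  rep78 = c(3L, 3L, NA),
  note  = c("", "", ""),
  stringsAsFactors = FALSE
)
empty_vars <- all_missing_columns(df)
```

The resulting names could then be passed to Stata's drop command before saving the labelled file again, which avoids scanning thousands of variables by hand.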
Answer 4:
It has been a long time, but I solved this same problem by exporting the .dta data to .csv. The problem was related to the labels of the factor variables, especially because the labels were in Spanish and their non-ASCII characters made the encoding a mess. I hope this works for someone with the same problem who has access to Stata.
In Stata:
export delimited using "/Users/data.csv", nolabel replace
In R:
df <- read.csv("/Users/data.csv")
Source: https://stackoverflow.com/questions/18423366/error-reading-stata-data-in-r