Question
I've recently updated to R 4.0.0 from R 3.5.1. The behaviour of read.csv seems to have changed - when I load .csv files in R 4.0.0, factors are not automatically detected, instead being recognised as characters. I'm also still running 3.5.1 on my machine, and when loading the same files in 3.5.1 using the same code, factors are recognised as factors. This is somewhat suboptimal.
Any suggestions?
I'm running Windows 10 Pro and create .csv files in Excel 2013.
Answer 1:
As Ronak Shah said in a comment on your question, R 4.0.0 changed the default behavior of how read.table() (and so its wrappers, including read.csv()) treats character vectors. There has been a long debate over that issue, but basically stringsAsFactors = TRUE had been the default since the inception of R because it helped to save memory due to the way factor variables are implemented in R (essentially they are an integer vector with factor level information added on top). There is less of a reason to do that nowadays, since memory is much more abundant and this option often produced unintended side effects.
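A minimal sketch of both points, using a small inline CSV whose column names and values are made up for illustration: passing stringsAsFactors = TRUE explicitly restores the old behavior for a single call, and the resulting column shows the integer-plus-levels structure described above.

```r
# Hypothetical inline data standing in for a .csv file on disk.
csv_text <- "id,colour\n1,red\n2,blue\n3,red"

# No argument: in R >= 4.0.0 the strings stay character.
df_chr <- read.csv(text = csv_text)

# Explicit argument: strings become factors regardless of R version.
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(df_fct$colour)    # "factor"
typeof(df_fct$colour)   # "integer": the underlying storage
levels(df_fct$colour)   # the distinct strings, stored once
unclass(df_fct$colour)  # integer codes with a "levels" attribute
```

Setting the argument per call keeps the choice visible in the script instead of relying on whichever default the installed R version happens to have.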
You can read more about your particular issue, and about other peculiarities of vectors in R, in Chapter 3 of Advanced R by Hadley Wickham. In there he points to two articles that go into great detail on why the default behavior was the way it was. Here is one and here is another. I would also suggest that you check out Hadley's book if you already have some experience with R; it helped me very much to learn some of the less obvious features of the language.
Answer 2:
As everyone here said, the default behaviour has changed in R 4.0.0 and strings aren't automatically converted to factors anymore. This affects various functions, including read.csv() and data.frame(). However, some functions that are explicitly made to work with factors are not affected. These include expand.grid() and as.data.frame.table().
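The difference can be checked directly. This is only a sketch with throwaway vectors; the comments note where the result depends on the R version.

```r
# data.frame() follows the new default; expand.grid() still builds factors.
df   <- data.frame(x = c("a", "b"))
grid <- expand.grid(x = c("a", "b"))

class(df$x)    # "character" in R >= 4.0.0, "factor" in earlier versions
class(grid$x)  # "factor"

# as.data.frame() on a table dispatches to as.data.frame.table(),
# which also yields a factor column for the table's categories.
tab_df <- as.data.frame(table(x = c("a", "b", "a")))
class(tab_df$x)  # "factor"
```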
One way you can bypass this change is by setting a global option:
options(stringsAsFactors = TRUE)
But this will also be deprecated and eventually you will have to convert strings to factors manually.
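Manual conversion does not take much code. The sketch below assumes a small made-up data set (the column names are hypothetical) and shows converting a single column as well as every character column at once.

```r
# Hypothetical inline data; stringsAsFactors = FALSE is stated explicitly
# so the sketch behaves the same on R versions before 4.0.0.
df <- read.csv(text = "id,colour,size\n1,red,S\n2,blue,M\n3,red,S",
               stringsAsFactors = FALSE)

# Convert one column.
df$colour <- factor(df$colour)

# Or convert every remaining character column, leaving other types alone.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

str(df)
```

Unlike the global option, this keeps the conversion local to the data frame it is meant for.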
The main reason for this decision seems to be reproducibility. Automatic string-to-factor conversion produces factor levels, and the order of those levels can depend on the locale used by the system. Hence, if you are in Russia and share a script that relies on automatically converted factors with a friend in Japan, they might end up with a different order of factor levels.
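One way to sidestep the locale issue, sketched here with made-up values, is to state the levels explicitly so that their order comes from the script and not from sorting.

```r
x <- c("medium", "low", "high", "low")

# Implicit levels: factor() sorts the unique values using the session's
# collation rules, so the resulting order can depend on the locale.
f_auto <- factor(x)

# Explicit levels: the order is fixed by the script itself.
f_fixed <- factor(x, levels = c("low", "medium", "high"))

levels(f_fixed)      # "low" "medium" "high"
as.integer(f_fixed)  # 2 1 3 1
```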
You can read more about this in the stringsAsFactors post by Kurt Hornik on "The R Blog".
Source: https://stackoverflow.com/questions/61950876/read-csv-doesnt-seem-to-detect-factors-in-r-4-0-0