Question
I am just curious whether it is more efficient to store data in long or wide format, setting aside which one is easier to interpret. I have used object.size() to determine the size in memory, but the two do not differ significantly (long being slightly more efficient in terms of size), and the value is only an estimate.
On top of the raw size, I am also wondering which format is more efficient to manipulate when used in modelling.
Answer 1:
The memory usage of the two different matrices should be identical:
> object.size(long <- matrix(seq(10000), nrow = 1000))
40200 bytes
> object.size(square <- matrix(seq(10000), nrow = 100))
40200 bytes
Any differences in efficiency will be dwarfed by the inefficiency of using R itself, so they hardly need to be considered, if they are even measurable.
The situation is very different for a data.frame, since it is implemented as a list of vectors:
> object.size(as.data.frame(long))
41704 bytes
> object.size(as.data.frame(square))
50968 bytes
The time efficiency of this will depend on what exactly you want to do.
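To make the list-of-vectors point concrete, here is a minimal sketch that rebuilds the two matrices from above and inspects the resulting data frames. The names df_long and df_square are only illustrative, and exact byte counts vary between R versions, so only the comparison is checked.

```r
long   <- matrix(seq(10000), nrow = 1000)  # 1000 rows x 10 columns
square <- matrix(seq(10000), nrow = 100)   # 100 rows x 100 columns

df_long   <- as.data.frame(long)
df_square <- as.data.frame(square)

# A data.frame is a list with one element per column.
is.list(df_long)    # TRUE
length(df_long)     # 10
length(df_square)   # 100

# Every column is a separate vector with its own header and name,
# so the frame with more columns needs more memory for the same data.
as.numeric(object.size(df_square)) > as.numeric(object.size(df_long))  # TRUE
```

The same 10000 integers are stored in both frames; the difference comes from per-column overhead, which is why the matrix versions do not show it.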
Answer 2:
For a matrix there will be absolutely no difference. The same is true for a data.frame of that matrix. Reshaping a matrix is merely a matter of assigning dimension attributes, for the most part.
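The point about dimension attributes can be checked directly. This is a small sketch, with illustrative variable names, that reshapes a matrix by assigning to dim() and compares the reported size before and after.

```r
m <- matrix(seq(10000), nrow = 1000)  # 1000 x 10
size_before <- object.size(m)

# Reshape by replacing the dim attribute; the underlying values are unchanged.
dim(m) <- c(100, 100)                 # now 100 x 100
size_after <- object.size(m)

identical(size_before, size_after)    # TRUE
m[1, 2]                               # 101, because storage is column-major
```

Only the interpretation of the same underlying vector changes, which is why long and wide matrices occupy the same space.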
If you are going to categorize the data in some way and add additional information, then wide will usually be more efficient storage-wise, but long will generally be handled more efficiently. Being less space-efficient is not a necessary property of long format. Generally, though, wide format carries a compound variable description in its column names, and in long format that description is split out into a new column, or several columns. Long therefore takes up more space because of those redundancies. On the handling side, it is easier to aggregate long data, or to select specific cases for deletion, than in a wide format that has multivariate column designations.
Long is also the best way (of these two) if the data are not perfectly rectangular (or cubic, etc.).
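As a rough illustration of the trade-off described above, the following sketch builds a tiny made-up data set in both shapes using base R only. The column names (id, score_t1, score_t2, time, score) and the values are hypothetical and chosen purely for the example.

```r
# Wide: the time point is encoded in the column names (score_t1, score_t2).
wide <- data.frame(
  id       = 1:3,
  score_t1 = c(10, 20, 30),
  score_t2 = c(11, NA, 33)   # subject 2 has no second measurement
)

# Long: the compound column name is split out into its own 'time' column,
# so 'id' and 'time' are repeated for every measurement.
long <- data.frame(
  id    = rep(wide$id, times = 2),
  time  = rep(c("t1", "t2"), each = nrow(wide)),
  score = c(wide$score_t1, wide$score_t2)
)

prod(dim(wide))   # 9 stored cells
prod(dim(long))   # 18 stored cells: the redundancy costs space

# Handling is simpler in long form: aggregate by the 'time' column ...
means <- aggregate(score ~ time, data = long, FUN = mean)

# ... and drop the missing case as a row, leaving non-rectangular data.
complete <- long[!is.na(long$score), ]
nrow(complete)    # 5
```

In the wide frame the missing measurement has to stay as an NA cell, while in the long frame it simply becomes an absent row.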
Source: https://stackoverflow.com/questions/8181069/is-the-wide-or-long-format-data-more-efficient