Is the wide or long format data more efficient?

南楼画角 submitted on 2019-11-30 17:06:05

Question


I am just curious whether it is more efficient to store data in long or wide format, regardless of interpretation. I have used object.size() to determine the size in memory, but the two do not differ significantly (the long format being slightly more efficient in terms of size), and the value is only an estimate.

On top of the raw size, I am also wondering which format is more efficient to manipulate when used in modelling.


Answer 1:


The memory usage of the two different matrices should be identical:

> object.size(long <- matrix(seq(10000), nrow = 1000))
40200 bytes
> object.size(square <- matrix(seq(10000), nrow = 100))
40200 bytes

Any differences in efficiency will be dwarfed by the inefficiency of using R itself, so they hardly need to be considered, if they are even measurable.

The situation is very different for a data.frame, since it is implemented as a list of vectors, one per column:

> object.size(as.data.frame(long))
41704 bytes
> object.size(as.data.frame(square))
50968 bytes

The time efficiency of this will depend on what exactly you want to do.
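A minimal sketch of where the data.frame overhead comes from, rebuilding the same two matrices as above: converting each to a data.frame yields one vector per column, so the 100-column version carries 90 more vector headers and column names than the 10-column one. Exact byte counts vary between R versions and platforms, so only the comparison between the two sizes is meaningful here.

```r
# Same 10,000 integers laid out as 1000 x 10 and as 100 x 100
long <- matrix(seq(10000), nrow = 1000)
square <- matrix(seq(10000), nrow = 100)

# A data.frame is a list of column vectors
df_long <- as.data.frame(long)
df_square <- as.data.frame(square)
length(df_long)    # number of column vectors in the 1000 x 10 version
length(df_square)  # number of column vectors in the 100 x 100 version

# Each column vector (and its name) adds its own fixed overhead
object.size(df_long)
object.size(df_square)
```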




Answer 2:


For a matrix there will be absolutely no difference. The same is true for a data.frame of that matrix. Reshaping a matrix is merely a matter of assigning its dimension attribute... for the most part.
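To illustrate the point about dimension attributes, a small base R sketch (the variable names are illustrative only): assigning a new `dim()` changes only the shape metadata, so both matrices hold the same underlying vector and report the same size.

```r
x <- seq(10000)
long <- matrix(x, nrow = 1000)   # 1000 rows x 10 columns

# Reshape by assigning a new dim attribute; the data vector is unchanged
square <- long
dim(square) <- c(100, 100)

identical(square, matrix(x, nrow = 100))        # same as building it directly
identical(as.vector(long), as.vector(square))   # same underlying values
as.numeric(object.size(long)) == as.numeric(object.size(square))
```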

If you are going to categorize the data in some way and add additional information, then wide format will usually be more storage-efficient, but long format will generally be handled more efficiently. Being less space-efficient isn't a necessary property of long format, but in wide format you would generally have a compound variable description in the column names, which in long format would be split apart into a new column, or multiple columns. Those repeated values make long format take up more space. On the handling side, it's easier to aggregate long data, or to select specific cases for deletion, than it is in a wide format whose column names encode several variables at once.
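As a sketch of the handling side, the following base R snippet assumes a hypothetical wide data frame whose column names `score.2010` and `score.2011` combine a variable name and a year. `stats::reshape()` splits that compound name into a `score` column and a `year` column; aggregating by year, or dropping one specific observation, is then a plain row operation. The data and column names are made up for illustration.

```r
# Hypothetical wide data: the year is encoded in the column names
wide <- data.frame(
  id = 1:3,
  score.2010 = c(5, 7, 6),
  score.2011 = c(8, 6, 9)
)

# Split the compound column names into a value column and a year column
long <- reshape(wide,
                direction = "long",
                varying = list(c("score.2010", "score.2011")),
                v.names = "score",
                timevar = "year",
                times = c(2010, 2011),
                idvar = "id")

# Aggregation: mean score per year
by_year <- aggregate(score ~ year, data = long, FUN = mean)
print(by_year)

# Deleting one specific case (id 2 in 2011) is a simple row filter
trimmed <- long[!(long$id == 2 & long$year == 2011), ]
```

In the wide layout, the same per-year mean requires picking the right columns by name, and deleting a single observation means blanking one cell with `NA` rather than removing a row.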

Long format is also the better choice (of these two) if the data are not perfectly rectangular (or cubic, etc.).
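A minimal sketch of the non-rectangular case, using hypothetical data in which two subjects have different numbers of visits: the long layout stores only the four observed rows, while converting to wide format with `reshape()` has to pad the missing visits with `NA`.

```r
# Hypothetical ragged data: subject 1 has three visits, subject 2 only one
long <- data.frame(
  id    = c(1, 1, 1, 2),
  visit = c(1, 2, 3, 1),
  value = c(10, 12, 11, 9)
)

# Wide format needs one column per visit, so subject 2 gets NA padding
wide <- reshape(long, direction = "wide", idvar = "id", timevar = "visit")

print(wide)
sum(is.na(wide))  # number of padded cells
```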



Source: https://stackoverflow.com/questions/8181069/is-the-wide-or-long-format-data-more-efficient
