R - readRDS() & load() fail to give identical data.tables as the original


野趣味

Background

I tried to replace some CSV output files with rds files to improve efficiency. These are intermediate files that wi

4 Answers

陌清茗

2021-02-13 14:21

Probably, this has to do with pointers:

```
> attributes(aDT)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.table" "data.frame"

$.internal.selfref
<pointer: 0x0000000000390788>

> attributes(bDT)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.table" "data.frame"

$.internal.selfref
<pointer: (nil)>

> attributes(bDF)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.frame"

> attributes(aDF)
$names
[1] "a" "b"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "data.frame"
```

You can take a closer look at what is going on with .Internal(inspect(.)):

```
.Internal(inspect(aDT))
.Internal(inspect(bDT))
```
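To confirm that the self-reference pointer is the only attribute that differs, one can compare the attribute lists with that single entry removed. This is a minimal sketch: the helper name attrs_without_selfref and the temporary file path are illustrative assumptions, not part of the original question.

```r
library(data.table)

aDT <- data.table(a = 1:10, b = LETTERS[1:10])
path <- tempfile(fileext = ".rds")   # temporary file, assumed for illustration
saveRDS(aDT, path)
bDT <- readRDS(path)

# Hypothetical helper: all attributes except the self-reference pointer
attrs_without_selfref <- function(dt) {
  at <- attributes(dt)
  at[setdiff(names(at), ".internal.selfref")]
}

# Show the pointer of each table, then compare everything else
print(attr(aDT, ".internal.selfref"))
print(attr(bDT, ".internal.selfref"))
rest_same <- identical(attrs_without_selfref(aDT), attrs_without_selfref(bDT))
print(rest_same)
```

If rest_same is TRUE, the difference reported by identical() comes from the pointer alone rather than from the column data.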


我在风中等你

2021-02-13 14:24
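The solution is to call setDT() after load() or readRDS(). Note that readRDS() only reads files written by saveRDS(); a file written by save() has to be read with load().
```
bDT <- readRDS("aDT.rds")   # or: load("aDT2.RData")
setDT(bDT)                  # or: setDT(aDT2)
```
Source: Adding new columns to a data.table by-reference within a function not always working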
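For intermediate files that are read in many places, the setDT() step can be wrapped in a small reader function. The following is a sketch under assumptions: the function name read_rds_dt and the temporary file are hypothetical and only serve the illustration.

```r
library(data.table)

# Hypothetical helper: read an .rds file and restore the data.table self-reference
read_rds_dt <- function(path) {
  dt <- readRDS(path)
  setDT(dt)   # resets .internal.selfref by reference
  dt
}

aDT <- data.table(a = 1:10, b = LETTERS[1:10])
path <- tempfile(fileext = ".rds")   # temporary file, assumed for illustration
saveRDS(aDT, path)

bDT <- read_rds_dt(path)
same <- identical(aDT, bDT)
print(same)
```

Wrapping the call keeps the fix in one place, so callers do not have to remember to run setDT() after every read.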
灰色年华

2021-02-13 14:30
The newly loaded data.table doesn't know the pointer value of the already loaded one. You could tell it with
```
attributes(bDT)$.internal.selfref <- attributes(aDT)$.internal.selfref
identical( aDT, bDT, ignore.environment = TRUE )
# [1] TRUE
```
Plain data.frames don't carry this attribute, probably because they are not modified in place.
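For comparison, a plain data.frame goes through the same save and read cycle unchanged, since it has no self-reference pointer to lose. A minimal sketch, assuming a temporary file for the round trip:

```r
aDF <- data.frame(a = 1:10, b = LETTERS[1:10])
path <- tempfile(fileext = ".rds")   # temporary file, assumed for illustration
saveRDS(aDF, path)
bDF <- readRDS(path)

# The data.frame has no .internal.selfref attribute, so nothing can go stale
has_selfref <- ".internal.selfref" %in% names(attributes(bDF))
df_same <- identical(aDF, bDF)
print(has_selfref)
print(df_same)
```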

悲&欢浪女

2021-02-13 14:37

I happened to find a way to resolve the issue (disclaimer: it is rather inelegant, but it works): adding and then deleting a dummy column in the loaded data.table makes identical() return TRUE. I have also successfully replaced the CSV intermediate files with rds files in my own code.

To be honest, I don't understand the inner workings of R or data.table well enough to know why it works, so any explanations or more elegant solutions would be welcome.

```
library( data.table )

# Round trip through saveRDS() / readRDS()
aDT <- data.table( a=1:10, b=LETTERS[1:10] )
saveRDS( aDT, file = "aDT.rds")
bDT <- readRDS( file = "aDT.rds" )
identical( aDT, bDT, ignore.environment = TRUE )  # Gives FALSE

# Add and then delete a dummy column by reference
bDT[ , aaa := NA ]; bDT[ , aaa := NULL ]
identical( aDT, bDT, ignore.environment = TRUE )  # Now gives TRUE


# The add-and-delete-column trick works with save() / load() too
aDT2 <- data.table( a=1:10, b=LETTERS[1:10] )
save( aDT2, file = "aDT2.RData")
bDT2 <- aDT2; rm( aDT2 )
load( file = "aDT2.RData" )
identical( aDT2, bDT2, ignore.environment = TRUE )  # Gives FALSE

aDT2[ , aaa := NA ]; aDT2[ , aaa := NULL ]
identical( aDT2, bDT2, ignore.environment = TRUE )  # Now gives TRUE
```
