Adding a hash to each row using dplyr and digest in R

Posted by 大兔子 on 2021-02-08 19:45:09

Question


I need to add a fingerprint to each row of a dataset so that I can compare it with a later version of the same dataset and look for differences.

I know how to add a hash for each row in R, like below:

library(digest)
data.frame(iris, hash = apply(iris, 1, digest))

I am learning to use dplyr because the dataset is getting huge and I need to store it in SQL Server. I tried something like the code below, but the hashing does not work: every row gets the same hash.

iris %>%
  rowwise() %>%
  mutate(hash=digest(.))

Any clue for row-wise hashing using dplyr? Thanks!


Answer 1:


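We could use do(). In the mutate() call, `.` refers to the whole data frame passed through the pipe, so every row hashes the same object. Inside do() after rowwise(), `.` is the current row: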

library(dplyr)
library(digest)

res <- iris %>%
  rowwise() %>%
  do(data.frame(., hash = digest(.)))
head(res, 3)
# A tibble: 3 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species                             hash
#         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>                            <chr>
#1          5.1         3.5          1.4         0.2  setosa e261621c90a9887a85d70aa460127c78
#2          4.9         3.0          1.4         0.2  setosa 7bf67322858048d82e19adb6399ef7a4
#3          4.7         3.2          1.3         0.2  setosa c20f3ee03573aed5929940a29e07a8bb

Note that in the apply approach, all columns are converted to a single class, because apply converts the data frame to a matrix and a matrix can hold only one class. There will be a warning about converting the factor to character class.
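To make the note above concrete, here is a minimal sketch that first shows the coercion and then makes it explicit in a vectorized helper. The helper `row_hash`, its `sep` argument, and the `later` data frame are hypothetical names introduced for illustration; they are not part of the original answer. It assumes the digest and dplyr packages are installed.

```r
library(dplyr)
library(digest)

# apply() first calls as.matrix(), so every value in a row is already a
# character string by the time digest() sees it.
class(as.matrix(iris)[1, ])

# Hypothetical helper (an assumption, not from the original answer):
# convert every column to character explicitly, paste each row into one
# string, then hash the raw string (serialize = FALSE).
row_hash <- function(df, sep = "|") {
  keys <- do.call(paste, c(unname(lapply(df, as.character)), sep = sep))
  vapply(keys, digest, character(1),
         algo = "md5", serialize = FALSE, USE.NAMES = FALSE)
}

# Here `.` is the whole data frame, which is what the vectorized helper expects.
hashed <- iris %>% mutate(hash = row_hash(.))

# Hypothetical "later version" with one edited value, to show the comparison.
later <- iris
later$Sepal.Length[1] <- 5.2
changed <- which(row_hash(later) != hashed$hash)
changed
```

These hashes will not match the ones produced by the apply() or do() variants, because the input to digest() is different in each case. The string-based key also has limits worth checking against real data: a missing value and the literal text "NA" produce the same key, a separator that occurs inside a value can make two different rows collide, and as.character() on doubles keeps only about 15 significant digits.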



Source: https://stackoverflow.com/questions/46335585/adding-hash-to-each-row-using-dplyr-and-digest-in-r
