dvc

By approximately how much can I reduce disk usage by using DVC?

Submitted by 梦想的初衷 on 2020-06-25 10:28:17
Question: I want to classify ~1M+ documents and have a version control system for the input and output of the corresponding model. The data changes over time:

- the sample size increases
- new features might appear
- the anonymization procedure might change

So basically "everything" might change: the number of observations, the features, and the values. We are interested in making the ML model building reproducible without using 10/100+ GB of disk space, which is what saving every updated version of the input data costs.
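A rough way to put a number on this before committing to a tool: as far as I know, DVC's cache is content-addressed per file, so a file that is byte-identical across versions is stored once, while a file that changed even slightly is stored again in full. Under that assumption, the sketch below hashes every file in a set of version snapshots and compares total bytes with unique-content bytes. The function names and the directory layout are hypothetical, and MD5 is used here only as a content fingerprint.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def estimate_dedup_savings(version_dirs):
    """Compare naive storage (every snapshot kept in full) with
    file-level deduplicated storage (each distinct content kept once)."""
    total_bytes = 0
    unique_sizes = {}  # content digest -> size in bytes
    for directory in version_dirs:
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                size = path.stat().st_size
                total_bytes += size
                unique_sizes.setdefault(file_digest(path), size)
    unique_bytes = sum(unique_sizes.values())
    saved_bytes = total_bytes - unique_bytes
    saved_fraction = saved_bytes / total_bytes if total_bytes else 0.0
    return {
        "total_bytes": total_bytes,
        "unique_bytes": unique_bytes,
        "saved_bytes": saved_bytes,
        "saved_fraction": saved_fraction,
    }
```

Calling `estimate_dedup_savings(["data/v1", "data/v2"])` on two existing snapshots (the paths are placeholders) returns the fraction of bytes that file-level deduplication would avoid storing. If the whole dataset lives in one large file that changes with every update, the estimate will be close to zero, which suggests that splitting the data into many files that change independently matters more than the tool itself.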

The same question, with identical text, was also submitted by 我的梦境 on 2020-06-25 10:28:08.