project organization with R [duplicate]

Submitted by 一笑奈何 on 2019-12-07 19:36:45

Question


Possible Duplicate:
Workflow for statistical analysis and report writing

I have been programming in R for only a short while and have run into a project organization question that I was hoping somebody could give me some tips on. A lot of the analysis I do is ad hoc: I run something, think about the results, tweak it, and run some more. This is conceptually different from a language like C++, where you think about the entire thing you want to run before coding. It is a huge benefit of interpreted languages. However, I end up saving a lot of .RData files so that I don't have to source my script every time. Does anyone have any good ideas about how to organize my project so I can return to it a month later and still know what each file is associated with?

This is sort of a documentation question, I guess. Should I document my entire project at each leg and be rigorous about cleaning up files that are no longer necessary but were a byproduct of the research? This is my current system, but it is a bit cumbersome. Does anyone have other suggestions?

Per the comment below: One of the key things that I am trying to avoid is the proliferation of .R analysis files and .RData sets that go along with them.


Answer 1:


Some thoughts on research project organisation here:

http://software-carpentry.org/4_0/data/mgmt/

the take-home message being:

  • Use Version Control for your programs
  • Use sensible directory names
  • Use Version Control for your metadata
  • Really, Version Control is a good thing.



Answer 2:


My analysis is a knitr document, with some external .R files which are called from it.

All the data is in a database, but during my analysis the processed data are saved as .RData files. They are recreated from the database only when I delete the .RData files and run the analysis again. It works kind of like a cache: it saves database access and data processing time when I rerun (parts of) my analysis.
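
The cache-like pattern described above could be sketched roughly as follows. This is only an illustration under assumptions: cached() and fetch_and_process() are hypothetical names, and the small data frame stands in for a real database query, which the answer does not show.

```r
# Hypothetical helper: return the object stored in an .RData file if the file
# exists; otherwise build the object, save it, and return it.
cached <- function(cache_file, build_fn) {
  if (file.exists(cache_file)) {
    env <- new.env()
    load(cache_file, envir = env)  # restores the object saved as "result"
    return(env$result)
  }
  result <- build_fn()
  save(result, file = cache_file)
  result
}

# Stand-in for the expensive step (assumption: the real version would query
# the database and process the rows).
fetch_and_process <- function() {
  raw <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.8, 1.2, 3.9))
  raw$scaled <- raw$value / max(raw$value)
  raw
}

cache_file <- file.path(tempdir(), "processed.RData")

processed <- cached(cache_file, fetch_and_process)        # builds and saves
processed_again <- cached(cache_file, fetch_and_process)  # loads from the file
```

Deleting the file (for example with unlink(cache_file)) forces the next call to rebuild the data, which matches the "delete the .RData to refresh" workflow in the answer.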

Using a knitr (Sweave, etc.) document for the analysis lets you easily write a documented workflow with the results included. knitr can also cache the results of the analysis (chunk option cache=TRUE), so small changes usually do not result in a full rerun of all the R code, only of a small section. That saves quite a lot of running time for a bigger analysis.
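
As a rough sketch of how that caching is typically switched on, the call below could go in the first chunk of a knitr document. The answer does not show its own settings, so the "cache/" directory name is an assumption.

```r
# Intended for the first ("setup") chunk of a knitr document.
knitr::opts_chunk$set(
  cache = TRUE,          # reuse stored results while a chunk's code is unchanged
  cache.path = "cache/"  # where cache files are written (assumed directory name)
)
```

With this in place, editing one chunk invalidates only that chunk's cache; chunks that depend on it may need to be rerun by hand unless dependencies are declared.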

(Ah, and as said before: use version control. Another tip: working with knitr and version control is very easy with RStudio.)



Source: https://stackoverflow.com/questions/13036472/project-organization-with-r
