Here is my short story about R and a big data set.
I had a connector from R to an RDBMS where I stored 80 million compounds.
I built queries that gathered a subset of this data, and then did my manipulation on that subset.
R was simply choking on more than 200k rows in memory on my PC.
So working on a subset sized appropriately for your machine is a good approach.
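
A rough sketch of the idea, not my original code: it assumes the DBI and RSQLite packages are installed, uses an in-memory SQLite database as a stand-in for the real RDBMS, and the table and column names (`compounds`, `id`, `mol_weight`) are made up for illustration. It shows two ways to keep R's memory use small: filter in SQL so only the subset reaches R, or fetch a large result in chunks.

```r
# Sketch only: in-memory SQLite stands in for the real RDBMS, and the
# table/column names (compounds, id, mol_weight) are hypothetical.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Small stand-in for the large compounds table (1000 rows).
compounds <- data.frame(
  id = 1:1000,
  mol_weight = seq(100, 599.5, by = 0.5)
)
dbWriteTable(con, "compounds", compounds)

# Option 1: let the database do the filtering, so only matching rows
# are ever loaded into R's memory.
subset_df <- dbGetQuery(
  con,
  "SELECT id, mol_weight FROM compounds WHERE mol_weight < ?",
  params = list(150)
)

# Option 2: if the whole table is needed, stream it in fixed-size chunks
# and keep only running totals instead of all the rows.
res <- dbSendQuery(con, "SELECT mol_weight FROM compounds")
total <- 0
n_rows <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 250)  # at most 250 rows in memory at a time
  total <- total + sum(chunk$mol_weight)
  n_rows <- n_rows + nrow(chunk)
}
dbClearResult(res)

mean_weight <- total / n_rows

dbDisconnect(con)
```

With a real database you would swap the `dbConnect()` call for your own driver and pick a chunk size your machine handles comfortably.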