I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work, and it is great for its out-of-core support.
Why Pandas? Have you tried standard Python?
Consider the standard Python library instead. Pandas is subject to frequent updates, even with the recent release of a stable version.

Code written against only the standard library will keep running across releases.
One way of doing it is to start with an idea of how you want your data to be stored and which questions you want to answer about it. Then draw a schema of how to organise your data (think tables) so that it is easy to query; normalisation is not necessarily the goal.
You can make good use of:

- a list of dictionaries to hold the data in memory, one dict per row,
- generators to process the data row by row without overflowing your RAM,
- list comprehensions to query the data,
- helpers from `collections` such as `Counter` and `defaultdict`,
- whatever on-disk storage format you have chosen (JSON is one option).
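As a minimal sketch of those building blocks (the rows, field names, and values below are invented for illustration):

```python
from collections import Counter

# A tiny "table" held in memory as a list of dicts, one dict per row.
rows = [
    {"city": "Paris", "year": 2020, "sales": 120},
    {"city": "Lyon", "year": 2020, "sales": 75},
    {"city": "Paris", "year": 2021, "sales": 140},
]

# A list comprehension acts as the query layer ("SELECT ... WHERE ...").
paris_sales = [r["sales"] for r in rows if r["city"] == "Paris"]

# Counter gives a quick aggregation with no third-party dependency.
rows_per_city = Counter(r["city"] for r in rows)

print(paris_sales)    # [120, 140]
print(rows_per_city)  # Counter({'Paris': 2, 'Lyon': 1})
```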
RAM and hard drives are becoming cheaper and cheaper with time, and standard Python 3 is widely available and stable.
The fundamental question you are trying to solve is "how do I query large sets of data?". The HDFS architecture is more or less what I am describing here: data modelling, with the data itself stored on disk.
Let's say you have 1000 petabytes of data: there is no way you will be able to hold it in Dask or Pandas. Your best chance is to store it on disk and process it with generators.
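A minimal sketch of that generator approach, assuming the data sits on disk as a JSON-lines file (one JSON object per line; the filename and field names are hypothetical):

```python
import json

def iter_rows(path):
    # Yield one row (a dict) at a time, so only a single line is ever in RAM.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Stream an aggregate over the whole file without loading it into memory.
total = sum(row["sales"] for row in iter_rows("data.jsonl")
            if row["city"] == "Paris")
print(total)
```

Because the generator yields one row at a time, only the running aggregate has to fit in memory, so the same pattern works whether the file is 10 MB or far larger than RAM.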