I have been working with warehousing for a while now.
I am intrigued by columnar databases and the speed they offer for data retrieval.
I have mul
How do columnar databases work? The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from C. J. Date's suppliers-and-parts database:
SNO STATUS CITY SNAME
--- ------ ---- -----
S1 20 London Smith
S2 10 Paris Jones
S3 30 Paris Blake
S4 20 London Clark
S5 30 Athens Adams
would be stored on disk or in memory something like:
S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams
This is in contrast to a traditional row-store, which would store the data more like this:
S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams
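To see the two layouts side by side, here is a minimal Python sketch that produces both strings from the same table. The names `ROWS`, `row_layout`, and `column_layout` are made up for illustration, and real engines use binary pages, compression, and fixed-width encodings rather than concatenated text, so treat this only as a picture of the ordering.

```python
# The supplier table from the example, held as a list of rows.
# Column order: SNO, STATUS, CITY, SNAME.
ROWS = [
    ("S1", 20, "London", "Smith"),
    ("S2", 10, "Paris", "Jones"),
    ("S3", 30, "Paris", "Blake"),
    ("S4", 20, "London", "Clark"),
    ("S5", 30, "Athens", "Adams"),
]


def row_layout(rows):
    """Row-store ordering: all fields of row 1, then all fields of row 2, ..."""
    return ";".join("".join(str(value) for value in row) for row in rows)


def column_layout(rows):
    """Column-store ordering: all values of column 1, then column 2, ..."""
    columns = zip(*rows)  # transpose the rows into columns
    return ";".join("".join(str(value) for value in col) for col in columns)


print(column_layout(ROWS))
print(row_layout(ROWS))
```

The data is identical in both cases; only the order in which the values are laid out changes.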
From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column-store will excel at doing aggregations like totals and averages, because it only has to read the columns involved, but inserting a single row can be expensive, because every column must be touched. The inverse holds true for row-stores. This should be apparent from the layouts above.
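The trade-off can be sketched roughly in Python, using a list of tuples as a stand-in for a row-store and a dict of lists as a stand-in for a column-store. This is an in-memory toy under those assumptions, not how any particular DBMS is implemented, and every function name here is hypothetical.

```python
# Toy row-store: one tuple per row (SNO, STATUS, CITY, SNAME).
row_store = [
    ("S1", 20, "London", "Smith"),
    ("S2", 10, "Paris", "Jones"),
    ("S3", 30, "Paris", "Blake"),
    ("S4", 20, "London", "Clark"),
    ("S5", 30, "Athens", "Adams"),
]

# Toy column-store: one contiguous list per column.
column_store = {
    "SNO": ["S1", "S2", "S3", "S4", "S5"],
    "STATUS": [20, 10, 30, 20, 30],
    "CITY": ["London", "Paris", "Paris", "London", "Athens"],
    "SNAME": ["Smith", "Jones", "Blake", "Clark", "Adams"],
}


def total_status_rows(rows):
    # Has to visit every row; on disk, the unrelated fields
    # (SNO, CITY, SNAME) would be read along with STATUS.
    return sum(row[1] for row in rows)


def total_status_columns(cols):
    # Reads only the STATUS column; the other columns are never touched.
    return sum(cols["STATUS"])


def insert_row_store(rows, new_row):
    # One append: the whole row lands in one place.
    rows.append(new_row)


def insert_column_store(cols, new_row):
    # One append per column: the row is scattered across all columns.
    # Relies on the dict keeping its columns in table order.
    for name, value in zip(cols, new_row):
        cols[name].append(value)


print(total_status_rows(row_store))        # aggregate over the row layout
print(total_status_columns(column_store))  # same aggregate, one column read
```

Both totals agree; the difference is how much data each layout has to touch to compute the aggregate, and how many separate places a single insert has to write to.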
How do they differ from relational databases? A relational database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column-oriented DBMSs may be relational or not, just as row-oriented DBMSs may adhere more or less to relational principles.