20 Billion Rows/Month - HBase / Hive / Greenplum / What?

离开以前 2021-01-29 18:12

I'd like to draw on your wisdom to pick the right solution for a data-warehouse system. Here are some details to help you better understand the problem:

Data is organized in

7 Answers
  •  闹比i (original poster)
    2021-01-29 18:45

    I have had great success with Vertica. I am currently loading anywhere from 200 million to 1 billion rows a day, averaging about 9 billion rows a month, though I have gone as high as 17 billion in a month. I have close to 21 dimensions, and the queries run blazingly fast. We moved off our older system because we simply no longer had a long enough time window to complete the data load.

    We did an exhaustive trial and study of different solutions and looked at practically everything on the market. While both Teradata and Netezza would have suited us, they were simply too expensive for us. Vertica beat them both on price/performance. It is, by the way, a columnar database.

    We have about 80 users now, and that number is expected to grow to about 900 by the end of next year, when we begin the full rollout.

    We use ASP.NET, Dundas and Reporting Services extensively for reports. Vertica is also supposed to play nicely with third-party reporting solutions, though we haven't tried any.

    By the way, what are you going to use for the data load? We are using Informatica and have been very pleased with it. SSIS drove us up the wall.
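The volumes quoted in the answer are easier to compare as sustained ingest rates. The Python sketch below is a back-of-the-envelope calculation, not a benchmark: it assumes a 30-day month, rows spread evenly across the month, and a hypothetical 4-hour load window (the answer does not state its actual window). `sustained_rates` and `required_load_rate` are illustrative helpers, not part of any tool mentioned in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
DAYS_PER_MONTH = 30             # assumption: a 30-day month


def sustained_rates(rows_per_month: float, days: int = DAYS_PER_MONTH) -> tuple[float, float]:
    """Return (rows per day, rows per second) if the monthly volume were spread evenly."""
    per_day = rows_per_month / days
    return per_day, per_day / SECONDS_PER_DAY


def required_load_rate(rows_per_day: float, window_hours: float) -> float:
    """Rows per second needed to finish one day's load inside a batch window."""
    return rows_per_day / (window_hours * 3600)


volumes = {
    "answer, average month": 9e9,    # "averaging about 9 billion rows a month"
    "answer, peak month": 17e9,      # "as high as 17 billion in a month"
    "question, target month": 20e9,  # "20 Billion Rows/Month" in the title
}

for label, rows in volumes.items():
    per_day, per_sec = sustained_rates(rows)
    print(f"{label:>22}: {per_day:>13,.0f} rows/day, about {per_sec:>6,.0f} rows/s")

# Hypothetical: a peak day of 1 billion rows that must load within a 4-hour window.
print(f"{required_load_rate(1e9, 4):,.0f} rows/s")
```

Even the 20-billion-row target averages under 8,000 rows per second, but squeezing a 1-billion-row day into a 4-hour window raises the requirement to roughly 69,000 rows per second. This is why the length of the load window, not just the monthly total, can decide whether a system keeps up.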
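The answer mentions in passing that Vertica is a columnar database. The sketch below is a deliberately simplified model of why column storage helps a query such as "sum one measure grouped by one dimension" over a wide fact table: a row store reads every column of every row, while a column store reads only the columns the query touches. The fact-table shape is an assumption built on the answer's "close to 21 dimensions": one 4-byte key per dimension plus a single 8-byte measure, with no compression, encoding, or indexing, all of which change real-world numbers considerably. `scan_bytes` and `sum_by_dimension` are hypothetical helpers, not Vertica APIs.

```python
from collections import defaultdict

NUM_DIMENSIONS = 21  # from the answer: "close to 21 dimensions"
KEY_BYTES = 4        # assumption: one 32-bit surrogate key per dimension
MEASURE_BYTES = 8    # assumption: a single 64-bit measure per fact row
ROW_WIDTH = NUM_DIMENSIONS * KEY_BYTES + MEASURE_BYTES  # 92 bytes per row


def scan_bytes(num_rows: int, layout: str) -> int:
    """Bytes a naive full scan reads for SUM(measure) GROUP BY one dimension.

    Ignores compression, encoding, indexes and caching.
    """
    if layout == "row":
        return num_rows * ROW_WIDTH  # whole rows are read: all 22 columns
    if layout == "column":
        return num_rows * (KEY_BYTES + MEASURE_BYTES)  # only the 2 columns used
    raise ValueError(f"unknown layout: {layout}")


def sum_by_dimension(columns: dict[str, list[int]], dim: str, measure: str) -> dict[int, int]:
    """Toy column-store aggregation: reads only the `dim` and `measure` columns."""
    totals: defaultdict = defaultdict(int)
    for key, value in zip(columns[dim], columns[measure]):
        totals[key] += value
    return dict(totals)


monthly_rows = 9_000_000_000  # the answer's monthly average
for layout in ("row", "column"):
    print(f"{layout:>6}: {scan_bytes(monthly_rows, layout) / 1e9:,.0f} GB scanned")

toy = {"dim_0": [1, 2, 1, 2], "dim_1": [7, 7, 8, 8], "measure": [10, 20, 30, 40]}
print(sum_by_dimension(toy, "dim_0", "measure"))  # {1: 40, 2: 60}
```

Under these assumptions the column layout reads 12 of every 92 bytes, roughly 7.7 times less data for this query shape (828 GB versus 108 GB for a 9-billion-row month); queries that touch most of the columns would see little or no benefit from the layout alone.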
