data-warehouse

Microsoft Azure Data Warehouse: Flat Tables or Star Schema

ε祈祈猫儿з submitted on 2020-02-21 07:32:51
Question: I am creating a data warehouse model on top of numerous OLTP tables. I can either use (a) a star schema or (b) a flat table model. Many people think a dimensional star schema is no longer required, because most data can be reported from a single table. Additionally, Kimball's star schema was created when performance and storage were constraints, and some claim that with improved technology the data can simply be presented in a single table. Should I still separate the data into dimension/fact tables, or just use the flat…
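For concreteness, here is a minimal sketch of the same report against each model, using hypothetical table and column names; the trade-off is join complexity in the star schema versus duplicated descriptive attributes in the flat table:

```sql
-- Flat model: one wide table, queried directly.
SELECT store_region, product_category, SUM(sale_amount) AS revenue
FROM sales_flat
WHERE sale_date >= DATE '2020-01-01'
GROUP BY store_region, product_category;

-- Star schema: a narrow fact table joined to dimensions.
SELECT st.region, p.category, SUM(f.sale_amount) AS revenue
FROM fact_sales f
JOIN dim_store   st ON st.store_key  = f.store_key
JOIN dim_product p  ON p.product_key = f.product_key
JOIN dim_date    d  ON d.date_key    = f.date_key
WHERE d.calendar_date >= DATE '2020-01-01'
GROUP BY st.region, p.category;
```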

MERGE - UPDATE column values separately, based on logic in WHEN MATCHED block

我与影子孤独终老i submitted on 2020-02-08 10:05:08
Question: Earlier today, I asked this question and got the answer I was looking for. Now I have a follow-up question. What I want: I want the MERGE to compare each column value, per row, in the target table against the corresponding value in the source table, and make updates based on the logic separated by OR in the WHEN MATCHED AND clause. I am afraid that the code I've written (pictured below) will apply every update listed in the THEN UPDATE SET clause if any of the conditions separated by OR in the…
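That fear is well founded: a WHEN MATCHED branch fires as a whole, so every assignment in its UPDATE SET runs once any one OR condition is true. The usual fix is to keep the OR logic as the branch guard but move the per-column logic into CASE expressions, so each column only changes when its own condition holds. A minimal sketch with hypothetical tables tgt and src:

```sql
MERGE INTO tgt AS t
USING src AS s
    ON t.id = s.id
WHEN MATCHED AND (t.col_a <> s.col_a OR t.col_b <> s.col_b) THEN
    UPDATE SET
        -- Each CASE guards its own column: col_a is only overwritten
        -- when col_a itself differs, and likewise for col_b.
        -- (Nullable columns would also need IS NULL handling.)
        t.col_a = CASE WHEN t.col_a <> s.col_a THEN s.col_a ELSE t.col_a END,
        t.col_b = CASE WHEN t.col_b <> s.col_b THEN s.col_b ELSE t.col_b END;
```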

Site-To-Site Data synchronization Over WCF

自闭症网瘾萝莉.ら submitted on 2020-02-04 10:08:25
Question: I'm developing a distributed solution with a web site and a corporate management application. Here is the architecture:

Web Site:
- Database (SQL Server)
- ASP.NET MVC
- Data synchronization services (WCF)

Corporate Management Application:
- Database (SQL Server)
- WinForms application
- Data synchronization services (WCF)

I want to perform site-to-site data synchronization. Note: the Corporate Management Application database is the warehouse datastore. Usually I want the corporate side to ask…

Slowly changing dimensions- SCD1 and SCD2 implementation in Hive

痴心易碎 submitted on 2020-01-22 16:21:05
Question: I am looking for an SCD1 and SCD2 implementation in Hive (1.2.1). I am aware of the workaround for loading SCD1 and SCD2 tables prior to Hive 0.14; here is a link describing that approach: http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/ Now that Hive supports ACID operations, I just want to know whether there is a better or more direct way of loading them. Answer 1: As HDFS is immutable storage, it could be argued that versioning data and keeping history (SCD2)…
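For reference, Hive 1.2.1 predates the MERGE statement, but on Hive 2.2+ with transactional (ACID) tables an SCD2 load can be done directly. A sketch under those assumptions, with hypothetical dim_customer and staging_customer tables and city as the tracked attribute:

```sql
-- Step 1: expire the current row for customers whose tracked
-- attribute changed (requires a transactional target table).
MERGE INTO dim_customer AS d
USING staging_customer AS s
    ON d.customer_id = s.customer_id
WHEN MATCHED AND d.is_current = true AND d.city <> s.city THEN
    UPDATE SET end_date = current_date(), is_current = false;

-- Step 2: insert a new current row for new customers and for the
-- customers just expired above (they no longer have a current row).
INSERT INTO TABLE dim_customer
SELECT s.customer_id, s.name, s.city,
       current_date()               AS start_date,
       CAST('9999-12-31' AS DATE)   AS end_date,
       true                         AS is_current
FROM staging_customer s
LEFT JOIN dim_customer d
       ON d.customer_id = s.customer_id AND d.is_current = true
WHERE d.customer_id IS NULL;
```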

Designing the Logical Model of a Data Warehouse: Fact Tables and Dimension Tables

孤者浪人 submitted on 2020-01-22 02:13:10
Question: Hi, I'm a newbie in data warehousing. For homework I was asked to produce the logical design, the physical design, and the implementation. How would you model this in a data warehouse? I want to design a data warehouse that answers statistical questions about a baseball league. For players, on offense:
• How many at-bats a batter has.
• How many runs were scored.
• How many hits, doubles, and triples were hit.
• How many home runs were hit.
• How many RBIs.
• How many bases on balls.
On defense:
▪ How many outs and double plays a player makes.
▪ …
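A conventional dimensional answer (a sketch only, with hypothetical names) is a fact table at the grain of one row per player per game, with each of the counts above as an additive measure, joined to player and game dimensions:

```sql
CREATE TABLE dim_player (
    player_key  INT PRIMARY KEY,
    player_name VARCHAR(100),
    team        VARCHAR(50),
    position    VARCHAR(20)
);

CREATE TABLE dim_game (
    game_key  INT PRIMARY KEY,
    game_date DATE,
    season    INT,
    home_team VARCHAR(50),
    away_team VARCHAR(50)
);

-- Grain: one row per player per game; every measure is an additive
-- count, so season or career totals are simple SUMs over the facts.
CREATE TABLE fact_player_game (
    player_key INT REFERENCES dim_player (player_key),
    game_key   INT REFERENCES dim_game (game_key),
    at_bats    INT, runs    INT, hits         INT,
    doubles    INT, triples INT, home_runs    INT,
    rbi        INT, walks   INT,
    outs_made  INT, double_plays INT
);

-- Example: season home-run totals per player.
SELECT p.player_name, SUM(f.home_runs) AS home_runs
FROM fact_player_game f
JOIN dim_player p ON p.player_key = f.player_key
JOIN dim_game   g ON g.game_key   = f.game_key
WHERE g.season = 2019
GROUP BY p.player_name;
```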

Staging in ETL: Best Practices?

此生再无相见时 submitted on 2020-01-17 04:16:06
Question: Currently, the architecture I work with takes in a few data sources, one of which is staged locally because it's hosted in the cloud. The others are hosted locally anyway, so my ETL reads them directly from the source. I don't really see the point of creating a stage for the other sources.
1) Is there a distinct benefit to duplicating a locally hosted source into a local stage?
2) Is it better to host the stage on a separate machine or on the same one as the warehouse?
3) If I…

How deep to go when denormalising

こ雲淡風輕ζ submitted on 2020-01-15 09:13:17
Question: I am denormalising an OLTP database for use in a DWH. At the moment I am denormalising study groups. Each study group has a key pointing to one project, each project has a key pointing to one department, each department has a key pointing to one university, and each university has a key pointing to one city. Now, I know that you are supposed to denormalize the sh*t out of your OLTP data, but in this DWH, department will be a dimension of its own, and so will university. Would it suffice to add a key…
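A common compromise, sketched below with hypothetical names, is to flatten the whole chain into the study-group dimension for convenient filtering, while the fact table additionally carries its own department and university keys for the dimensions that must stand alone:

```sql
-- Flatten the OLTP chain studygroup -> project -> department
-- -> university -> city into a single dimension row.
INSERT INTO dim_studygroup
SELECT sg.studygroup_id AS studygroup_key,
       sg.name          AS studygroup_name,
       p.name           AS project_name,
       d.name           AS department_name,
       u.name           AS university_name,
       c.name           AS city_name
FROM studygroup sg
JOIN project    p ON p.project_id    = sg.project_id
JOIN department d ON d.department_id = p.department_id
JOIN university u ON u.university_id = d.university_id
JOIN city       c ON c.city_id       = u.city_id;
```

Facts can then reference dim_department and dim_university directly through their own foreign keys, so those dimensions remain usable on their own while dim_studygroup stays convenient for drill-down.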

What are the pros and cons of loading data directly into Google BigQuery vs going through Cloud Storage first?

匆匆过客 submitted on 2020-01-13 19:13:06
Question: Also, is there anything wrong with doing transforms/joins directly within BigQuery? I'd like to minimize the number of components and steps involved in a data warehouse I'm setting up (simple transaction and inventory data for a chain of retail stores). Answer 1: Loading data via Cloud Storage is the fastest (and cheapest) way. Loading directly can be done from an app (using streaming inserts, which add some additional cost). As for doing the transformation: if what you plan/need to do can be done…
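When the transforms are expressible in SQL, the minimal-component pattern is ELT: load the raw files as-is and reshape them inside BigQuery itself. A sketch in BigQuery Standard SQL, with hypothetical dataset and table names:

```sql
-- Build a reporting table from raw transaction and inventory loads,
-- entirely inside BigQuery (no external ETL component needed).
CREATE OR REPLACE TABLE warehouse.daily_store_sales AS
SELECT t.store_id,
       DATE(t.sold_at)            AS sale_date,
       SUM(t.amount)              AS revenue,
       ANY_VALUE(i.on_hand_units) AS on_hand_units
FROM raw.transactions AS t
LEFT JOIN raw.inventory AS i
       ON i.store_id = t.store_id
      AND i.snapshot_date = DATE(t.sold_at)
GROUP BY store_id, sale_date;
```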

Is a fact table in normalized or de-normalized form?

99封情书 submitted on 2020-01-11 04:55:14
Question: I did a bit of R&D on fact tables, specifically whether they are normalized or denormalized, and I came across some findings which confused me. According to Kimball: "Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with…"

Star-Schema Design [closed]

别说谁变了你拦得住时间么 submitted on 2020-01-09 04:01:05
Question: (Closed as opinion-based.) Is a star schema design essential to a data warehouse, or can you do data warehousing with another design pattern? Answer 1: Using star schemas for a data warehouse system gets you several benefits, and in most cases it is appropriate to use them for the top layer. You may also…