Question
I have 6 dimension tables, all in the form of CSV files. I have to form a star schema using Python, but I'm not sure how to create the fact table. The fact table (in theory) shares at least one column with each dimension table.
How can I create the fact table so that values coming from multiple dimension tables line up correctly in each fact row?
I am not allowed to reveal the code or the exact data, but here is a small example. File 1 contains the columns student_id, student_name. File 2 contains student_id, department_id, department_name, sem_id. File 3 contains student_id, subject_code, subject_score. All 3 dimension tables are CSV files. I need the fact table to contain student_id, student_name, department_id, subject_code. How can I build the fact table in that form? Thank you for your help.
Answer 1:
From some blog posts I have read, doing this in memory in Python does not look like a good way to handle such cases. Still, if the post quoted below makes sense for your situation, you can use it.
Fact Loading
The first step in DW loading is dimensional conformance. With a little cleverness the above processing can all be done in parallel, hogging a lot of CPU time. To do this in parallel, each conformance algorithm forms part of a large OS-level pipeline. The source file must be reformatted to leave empty columns for each dimension's FK reference. Each conformance process reads in the source file and writes out the same format file with one dimension FK filled in. If all of these conformance algorithms form a simple OS pipe, they all run in parallel. It looks something like this.
src2cvs source | conform1 | conform2 | conform3 | load
At the end, you use the RDBMS's bulk loader (or write your own in Python, it's easy) to pick the actual fact values and the dimension FK's out of the source records that are fully populated with all dimension FK's and load these into the fact table.
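As a rough illustration of the conformance idea quoted above, here is a minimal sketch applied to the three example files from the question. It is not the quoted post's implementation: the sample rows are made up, the helper names load_dimension and conform are hypothetical, and the steps are chained as Python generators inside one process rather than as separate programs in an OS pipe. It also assumes that File 3 sets the grain of the fact table, so there is one fact row per student and subject.

```python
import csv
import io


def load_dimension(csv_file, key_column):
    """Index a dimension table by its key column (hypothetical helper)."""
    return {row[key_column]: row for row in csv.DictReader(csv_file)}


def conform(rows, dimension, key_column, fill_columns):
    """One conformance step: fill columns of each source row from one dimension."""
    for row in rows:
        match = dimension.get(row[key_column], {})
        for column in fill_columns:
            # Leave the column empty when the dimension has no matching key.
            row[column] = match.get(column, "")
        yield row


# Made-up sample data standing in for the three CSV files in the question.
students_csv = io.StringIO("student_id,student_name\n1,Ann\n2,Bob\n")
departments_csv = io.StringIO(
    "student_id,department_id,department_name,sem_id\n"
    "1,D10,Physics,S1\n"
    "2,D20,Maths,S1\n"
)
scores_csv = io.StringIO(
    "student_id,subject_code,subject_score\n"
    "1,PH101,78\n"
    "1,PH102,85\n"
    "2,MA101,91\n"
)

students = load_dimension(students_csv, "student_id")
departments = load_dimension(departments_csv, "student_id")

# File 3 is the source; each step fills in columns from one dimension.
source_rows = csv.DictReader(scores_csv)
step1 = conform(source_rows, students, "student_id", ["student_name"])
step2 = conform(step1, departments, "student_id", ["department_id"])

# Keep only the columns requested for the fact table.
fact_columns = ["student_id", "student_name", "department_id", "subject_code"]
fact_rows = [{column: row[column] for column in fact_columns} for row in step2]

# Write the fact table as CSV text (a real run would write to a file).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fact_columns, lineterminator="\n")
writer.writeheader()
writer.writerows(fact_rows)
fact_csv = out.getvalue()
print(fact_csv)
```

Because each step is a generator, rows stream through one at a time and only the dimension lookups are held in memory. With real files, the io.StringIO objects would be replaced by open(...) calls on the CSV paths.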
Answer 2:
Would you like to add any code you're currently stuck on? Please add a Minimal, Complete, and Verifiable example, including the file content and the expected output.
Source: https://stackoverflow.com/questions/51151263/creating-star-schema-from-csv-files-using-python