This is what I learnt from the course
Chapter 1 - Data Warehouse Foundations
Every business needs a data warehouse for business intelligence. A data warehouse sources data from disparate internal and external systems via an ETL (Extract, Transform, Load) process on a regular basis. It is created with relational database systems like MySQL, MSSQL, Postgres, Oracle etc.
You can gain alot from building a data warehouse. For instance, a data warehouse reduces computational stress on the operational database, allowing it focus on transactional rather than analytical activities. Also a data warehouse can act as a single reliable source of historical data from the various operational systems in use.
Data warehouses are built using OLAP databases. OLAP databases differ from OLTP database in that the focus is on efficient analysis not transactions. The need to join tables is greatly reduced by de-normalizing table structures.
Data warehouses are made up of two types of tables - Fact and Dimension tables. What's the difference? Fact or measures table are focused on a "single measurement of a real-word observation" example financial transactions. Dimension tables store descriptive information about the facts in the facts table. Here duplication is allowed to reduce future processing.
You will use staging or integration tables to help with transforming data from data sources into the data warehouse. The data warehouse design can be either star or snowflake. In the star design, all the dimension tables are directly connected to their fact table whereas in a snow flake design, a dimension table can be directly connected to another dimension table.
Data Warehouse Hardware Configuration
- More cores is better than faster care because of the many calculations.
- Provide adequate internet bandwidth.
- Use a simple recovery model because, its not transaction intensive, so the logs wont help much, worst case scenario, we simply rerun the ETL process.
- Increase tempdb's initial size since we will use it a lot and we want to reduce how frequently it will double in size cause that process is expensive
- Ensure that tempdb auto-shrink is set to false
To be continued...