<aside>
☝
Read this note alongside the lecture notes; some points are omitted here because the lecture notes already cover them.
</aside>
Data Warehouses & Data Lakes
Overview
This week focuses on the first level of the storage hierarchy: storage abstractions such as data warehouses, data lakes, and data lakehouses.
Week labs:
- Simple Data Lake
- Build a Data Lakehouse
Conversation with Bill Inmon
- Introduction to Bill Inmon: Known as the creator of the data warehouse and a pioneer in the modern data industry. Began programming in 1965 at White Sands Missile Range, New Mexico.
- Definition of a Data Warehouse: Described as corporate data that consolidates information across various functions (marketing, sales, finance, management) for a unified corporate view.
- Challenges Before Data Warehousing: Before data warehousing, each application was developed in isolation, which made data integration across departments difficult.
- Origins of ETL: Inmon helped create ETL (Extract, Transform, Load) processes to automate the collection, transformation, and loading of data into warehouses, reducing the need for manually coded programs.
- Early Industry Resistance: IBM initially opposed data warehousing because its focus was transaction processing. Inmon’s suggestion that data could be used for more than running transactions drew significant criticism from the industry.
- Role of Marketing: Marketing departments were the early adopters and primary supporters of data warehousing, while technical teams were initially skeptical.
- Development of Textual ETL: Inmon pioneered textual ETL to integrate text data into corporate databases, addressing an area where text was underutilized.
- Legacy of Ed Yourdon: Inmon highlighted Ed Yourdon’s influence on programming, particularly in introducing structured design and analysis, which brought organization and methodology to early coding practices.
Data Warehouse - Key architectural ideas
A data warehouse is suited to OLAP (online analytical processing) workloads rather than OLTP (online transaction processing) workloads (see C3W1 for these terms).
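To make the OLTP/OLAP distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical and not taken from the course. SQLite is used only because it needs no setup; it is not a data warehouse.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "north", 10.0), (2, "south", 25.0), (3, "north", 5.0)],
)

# OLTP-style access: update and read a single row, located by its key.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (12.0, 1))
oltp_row = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style access: scan many rows and aggregate them for analysis.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
conn.close()
```

The first query pattern touches one row at a time, which is typical of application databases. The second reads a whole column across many rows, which is the pattern a data warehouse is designed to serve.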