<aside> ☝
List of notes for this specialization + Lecture notes & Repository & Quizzes + Home page on Coursera. Read this note alongside the lecture notes—some points aren't mentioned here as they're already covered in the lecture notes.
</aside>
The focus of this course is really on turning data into something useful and serving it in a way that creates business value.
Some real scenarios:
Deploying AI systems faces challenges beyond software engineering, including data-driven concept drift. This occurs when training data no longer reflects current realities. Systems must detect shifts, gather new data, and update models, adding complexity to real-world deployment.
Data Modeling: Choosing a coherent data structure that aligns with the business goals and logic.
Huge mistake: data engineers start building a data system without thinking about how they will organize the data to make it useful for the business (thinking that data modeling is only for big companies).
Plan of the course 4:
When you model your data, you go from abstract concepts to concrete implementation.
Conceptual → Entity-Relationship (ER) Diagram
one-to-one relationship
zero-or-one-to-many relationship (or just one-to-many)
Logical: details about the implementation of the conceptual model. ← add types & primary keys and foreign keys
PK (primary key), FK (foreign key)
Physical: details about the implementation of the logical model in a specific DBMS.
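To make the logical → physical step concrete, here is a minimal sketch using Python's built-in `sqlite3` as the specific DBMS. The `customers` and `orders` tables and their columns are hypothetical names chosen for illustration, not taken from the course. It shows a one-to-many relationship expressed with a PK and an FK.

```python
import sqlite3

# Hypothetical schema for illustration only; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,          -- PK
    customer_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,            -- PK
    customer_id INTEGER NOT NULL
                REFERENCES customers(customer_id),  -- FK: one customer, many orders
    order_date  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, "2024-01-05"), (101, 1, "2024-02-10")],
)

# An order pointing at a customer that does not exist violates the FK.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999, '2024-03-01')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1"
).fetchone()[0]
```

The column types and the `PRAGMA` are physical-level details: another DBMS would use different types and enforce foreign keys by default.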
Normalization is typically applied to relational databases.
An example of non-normalized vs normalized data
First normal form (not normalized) vs Third normal form (normalized)
Steps from denormalized form → first normal form → third normal form:
Denormalized form → First normal form (1NF)
The requirements of 1NF: each column must be unique and hold a single value (no nested data), and the table must have a unique primary key.
Split OrderItems into 4 columns & add a new ItemNumber column that combines with OrderID to form a composite primary key.
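A rough sketch of this step in plain Python. The sample data is made up, and the four item columns (`sku`, `name`, `price`, `quantity`) are an assumption; the note only names the first three elsewhere.

```python
# Made-up denormalized rows: OrderItems holds a nested list (violates 1NF).
denormalized = [
    {"OrderID": 1, "CustomerID": 10, "OrderDate": "2024-01-05",
     "OrderItems": [
         {"sku": "A1", "name": "Pen", "price": 2.5, "quantity": 4},
         {"sku": "B2", "name": "Notebook", "price": 6.0, "quantity": 1},
     ]},
    {"OrderID": 2, "CustomerID": 11, "OrderDate": "2024-01-06",
     "OrderItems": [
         {"sku": "A1", "name": "Pen", "price": 2.5, "quantity": 2},
     ]},
]


def to_first_normal_form(orders):
    """Emit one flat row per order item, numbered within its order."""
    rows = []
    for order in orders:
        for item_number, item in enumerate(order["OrderItems"], start=1):
            # Copy the order-level columns, dropping the nested one.
            row = {k: v for k, v in order.items() if k != "OrderItems"}
            row["ItemNumber"] = item_number
            row.update(item)  # the nested fields become ordinary columns
            rows.append(row)
    return rows


rows_1nf = to_first_normal_form(denormalized)

# (OrderID, ItemNumber) acts as the composite primary key.
keys = [(r["OrderID"], r["ItemNumber"]) for r in rows_1nf]
```

Every value is now a single scalar, and each `(OrderID, ItemNumber)` pair identifies exactly one row.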
1NF → 2NF
The requirements of 1NF must be met
Partial dependencies should be removed. Partial dependency = a non-key column depends on only some of the columns in the composite key.
Eg: Columns from CustomerID to OrderDate depend only on OrderID, not on ItemNumber.
Split all columns from CustomerID to OrderDate into another table whose primary key is OrderID.
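A small sketch of the 1NF → 2NF split in plain Python. The rows are made up, and the list of order-level columns (`CustomerID`, `CustomerName`, `address`, `OrderDate`) is an assumption about what "from CustomerID to OrderDate" covers.

```python
# Made-up 1NF rows keyed by (OrderID, ItemNumber).
rows_1nf = [
    {"OrderID": 1, "ItemNumber": 1, "CustomerID": 10, "CustomerName": "Ada",
     "address": "1 Main St", "OrderDate": "2024-01-05",
     "sku": "A1", "name": "Pen", "price": 2.5, "quantity": 4},
    {"OrderID": 1, "ItemNumber": 2, "CustomerID": 10, "CustomerName": "Ada",
     "address": "1 Main St", "OrderDate": "2024-01-05",
     "sku": "B2", "name": "Notebook", "price": 6.0, "quantity": 1},
    {"OrderID": 2, "ItemNumber": 1, "CustomerID": 11, "CustomerName": "Bob",
     "address": "2 Oak Ave", "OrderDate": "2024-01-06",
     "sku": "A1", "name": "Pen", "price": 2.5, "quantity": 2},
]

# Assumed set of columns that depend on OrderID alone (partial dependency).
ORDER_COLUMNS = ("CustomerID", "CustomerName", "address", "OrderDate")


def to_second_normal_form(rows):
    """Move columns that depend only on OrderID into their own table."""
    orders = {}       # primary key: OrderID
    order_items = []  # primary key: (OrderID, ItemNumber)
    for row in rows:
        # Repeated order-level values collapse into one entry per OrderID.
        orders[row["OrderID"]] = {c: row[c] for c in ORDER_COLUMNS}
        order_items.append(
            {k: v for k, v in row.items() if k not in ORDER_COLUMNS}
        )
    return orders, order_items


orders, order_items = to_second_normal_form(rows_1nf)
```

The order-level values that were repeated on every item row are now stored once per order.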
Transitive dependency: a non-key column depends on another non-key column.
Eg: “name” and “price” depend on “sku”. “CustomerName” and “address” depend on “CustomerID”
2NF → 3NF
Split price and name into another table with primary key sku. Split CustomerName and address into another table with primary key CustomerID.
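The 2NF → 3NF split can be sketched the same way. The input data is made up, and the `quantity` column is an assumption; the column names otherwise follow the note.

```python
# Made-up 2NF tables: orders keyed by OrderID, items by (OrderID, ItemNumber).
orders_2nf = {
    1: {"CustomerID": 10, "CustomerName": "Ada", "address": "1 Main St",
        "OrderDate": "2024-01-05"},
    2: {"CustomerID": 10, "CustomerName": "Ada", "address": "1 Main St",
        "OrderDate": "2024-01-09"},
}
order_items_2nf = [
    {"OrderID": 1, "ItemNumber": 1, "sku": "A1", "name": "Pen",
     "price": 2.5, "quantity": 4},
    {"OrderID": 1, "ItemNumber": 2, "sku": "B2", "name": "Notebook",
     "price": 6.0, "quantity": 1},
    {"OrderID": 2, "ItemNumber": 1, "sku": "A1", "name": "Pen",
     "price": 2.5, "quantity": 2},
]

# CustomerName and address depend on CustomerID (a non-key column in orders).
customers = {}  # primary key: CustomerID
orders = {}     # primary key: OrderID, keeps CustomerID as a foreign key
for order_id, order in orders_2nf.items():
    customers[order["CustomerID"]] = {
        "CustomerName": order["CustomerName"],
        "address": order["address"],
    }
    orders[order_id] = {
        "CustomerID": order["CustomerID"],
        "OrderDate": order["OrderDate"],
    }

# name and price depend on sku (a non-key column in order_items).
items = {}        # primary key: sku
order_items = []  # primary key: (OrderID, ItemNumber), keeps sku as a foreign key
for row in order_items_2nf:
    items[row["sku"]] = {"name": row["name"], "price": row["price"]}
    order_items.append({
        "OrderID": row["OrderID"],
        "ItemNumber": row["ItemNumber"],
        "sku": row["sku"],
        "quantity": row["quantity"],
    })
```

Each fact now lives in exactly one place: changing the price of a pen touches one row in `items` rather than every order line that contains it.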
Convention: a normalized database means it’s in the third normal form (3NF).
There's no one-size-fits-all solution, and you might encounter cases where denormalization actually has performance advantages.
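One way to picture the trade-off, as a sketch with Python's built-in `sqlite3`: reading from the 3NF tables requires joins, while a denormalized table built ahead of time can answer the same question with a single-table scan. The `order_report` table and the sample rows are hypothetical, and the sketch makes no claim about actual timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF) tables; sample data is made up.
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES customers(CustomerID),
    OrderDate TEXT
);
CREATE TABLE items (sku TEXT PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE order_items (
    OrderID INTEGER REFERENCES orders(OrderID),
    ItemNumber INTEGER,
    sku TEXT REFERENCES items(sku),
    quantity INTEGER,
    PRIMARY KEY (OrderID, ItemNumber)
);
INSERT INTO customers VALUES (10, 'Ada');
INSERT INTO orders VALUES (1, 10, '2024-01-05');
INSERT INTO items VALUES ('A1', 'Pen', 2.5), ('B2', 'Notebook', 6.0);
INSERT INTO order_items VALUES (1, 1, 'A1', 4), (1, 2, 'B2', 1);

-- Hypothetical denormalized table: the three joins are paid once, at build time.
CREATE TABLE order_report AS
SELECT oi.OrderID, oi.ItemNumber, c.CustomerName, o.OrderDate,
       i.name, i.price, oi.quantity
FROM order_items AS oi
JOIN orders AS o ON o.OrderID = oi.OrderID
JOIN customers AS c ON c.CustomerID = o.CustomerID
JOIN items AS i ON i.sku = oi.sku;
""")

# Reads against the wide table need no joins.
order_total = conn.execute(
    "SELECT SUM(price * quantity) FROM order_report WHERE OrderID = 1"
).fetchone()[0]
report_rows = conn.execute("SELECT COUNT(*) FROM order_report").fetchone()[0]
```

The cost is redundancy: `CustomerName` and `price` are copied onto every report row, so the wide table has to be rebuilt or updated whenever the source tables change.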