<aside> ☝
List of notes for this specialization + Lecture notes & Repository & Quizzes + Home page on Coursera. Read this note alongside the lecture notes—some points aren't mentioned here as they're already covered in the lecture notes.
</aside>
As a data engineer, you’ll write queries to extract data and set up storage solutions for stakeholders to query directly, considering how storage choices affect query speed and performance.
Query: A statement that you write in a specific query language to retrieve or act on data.
In previous course, you’ve worked with some kinds of queries:



Week 3 plan:

Transport System: when you send a query to DBMS, your request arrives in TS and then to Query Processor.
Query Processor: Query Parser + Query Optimizer

Query Parser

Query Optimizer
Execution Engine: excution plan → query result
Storage Engine: already mentioned in prev courses.

Use EXPLAIN keyword before any query to see the cost.

Common vs Advanced SQL statements

The data we will work with

Use DISTINCT and SQL Functions

SELECT DISTINCT
fact_rental.staff_id,
CONCAT(dim_staff.first_name, ' ', dim_staff.last_name) AS staff_name,
fact_rental.customer_id
FROM fact_rental
JOIN dim_staff ON fact_rental.staff_id = dim_staff.staff_id
CASE
CASE
WHEN cond1 THEN result1
WHEN cond2 THEN result2
ELSE result3
END
Use CASE and IN
SELECT
fact_rental.customer_id,
fact_rental.rental_id,
(CASE
WHEN payment_date < return_date THEN 1
ELSE 0
END) AS on_time_payment
FROM fact_rental
JOIN dim_customer ON dim_customer.customer_id = fact_rental.customer_id
WHERE dim_customer.country IN ("United States", "Canada")
AND (fact_retal.rental_date between "2005-05-24" and "2005-07-26")
LIMIT 5

