<aside> ☝
List of notes for this specialization + Lecture notes & Repository & Quizzes + Home page on Coursera. Read this note alongside the lecture notes—some points aren't mentioned here as they're already covered in the lecture notes.
</aside>
As a data engineer, you’ll write queries to extract data and set up storage solutions for stakeholders to query directly, considering how storage choices affect query speed and performance.
Query: A statement that you write in a specific query language to retrieve or act on data.
In previous course, you’ve worked with some kinds of queries:
Week 3 plan:
Transport System: when you send a query to DBMS, your request arrives in TS and then to Query Processor.
Query Processor: Query Parser + Query Optimizer
Query Parser
Query Optimizer
Execution Engine: excution plan → query result
Storage Engine: already mentioned in prev courses.
Use EXPLAIN
keyword before any query to see the cost.
Common vs Advanced SQL statements
The data we will work with
Use DISTINCT
and SQL Functions
SELECT DISTINCT
fact_rental.staff_id,
CONCAT(dim_staff.first_name, ' ', dim_staff.last_name) AS staff_name,
fact_rental.customer_id
FROM fact_rental
JOIN dim_staff ON fact_rental.staff_id = dim_staff.staff_id
CASE
CASE
WHEN cond1 THEN result1
WHEN cond2 THEN result2
ELSE result3
END
Use CASE
and IN
SELECT
fact_rental.customer_id,
fact_rental.rental_id,
(CASE
WHEN payment_date < return_date THEN 1
ELSE 0
END) AS on_time_payment
FROM fact_rental
JOIN dim_customer ON dim_customer.customer_id = fact_rental.customer_id
WHERE dim_customer.country IN ("United States", "Canada")
AND (fact_retal.rental_date between "2005-05-24" and "2005-07-26")
LIMIT 5