<aside> ☝
List of notes for this specialization + Lecture notes & Repository & Quizzes + Home page on Coursera. Read this note alongside the lecture notes—some points aren't mentioned here as they're already covered in the lecture notes.
</aside>
Recall: As a DE, you get raw data somewhere → turn it into something useful → make it available for downstream use cases.
Recall all the labs we’ve done so far:
Course 1 Week 2 Lab: ingest data from RDS into S3 using Glue ETL jobs.
Course 1 Week 4 Lab: ingest data from Kinesis Data Streams and use Kinesis Data Firehose to deliver events to an S3 bucket.
Course 2 Week 1 final lab: troubleshooting some common connection issues when connecting to a database.
Plan for this week:
Data you’re working with is unbounded (a continuous stream of events) - the stream has no particular beginning or end.
If we ingest events individually, one at a time → streaming ingestion.
If we impose some boundaries and ingest all the data within those boundaries → batch ingestion.
Boundaries can be imposed in different ways, e.g., by time interval or by batch size.
→ The more we increase the ingestion frequency, the closer batch ingestion gets to streaming ingestion.
Which one to use depends on the use case and the source system.
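To make the batch-vs-streaming distinction concrete, here is a minimal sketch (plain Python, not from the labs) that imposes size-based boundaries on an unbounded event iterator; the `batch_size` parameter and the event values are made up for illustration:

```python
from itertools import islice

def batches(events, batch_size):
    """Group an (unbounded) iterator of events into fixed-size batches.

    batch_size == 1 is effectively streaming ingestion: each event
    is handed downstream on its own.
    """
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# An "unbounded" stream, truncated here for demonstration.
stream = iter(range(7))
print(list(batches(stream, 3)))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Shrinking `batch_size` (or the time window, in a time-based variant) is exactly the "increase the frequency" direction above.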
Ways to ingest data from databases:
Connectors (JDBC/ODBC API) ← Lab 1 of this week.
Ingestion Tool (AWS Glue ETL)
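The connector approach above follows the standard DB-API pattern: open a connection, pull rows, hand them downstream. The lab uses a JDBC connection to RDS via Glue; here, as a stand-in sketch, an in-memory sqlite database plays the source role, and the `orders` table and its columns are made up:

```python
import sqlite3

# Stand-in for the source database (the lab connects to RDS instead).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Connector pattern: query the source, then pass the rows to the next
# stage (e.g., write them to S3, as the Glue ETL job does in the lab).
rows = conn.execute("SELECT id, amount FROM orders").fetchall()
print(rows)  # → [(1, 9.5), (2, 20.0)]
conn.close()
```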
Ways to ingest data from files: use a secure file transfer protocol like SFTP or SCP.
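As a small illustration of the SCP option (an assumption, not the labs' setup): build the `scp` invocation that would pull a file from a remote host. The user, host, and paths below are hypothetical:

```python
import shlex

def scp_pull_command(user: str, host: str, remote_path: str, local_path: str) -> list:
    """Build the scp invocation that copies a remote file locally."""
    return ["scp", f"{user}@{host}:{remote_path}", local_path]

cmd = scp_pull_command("etl", "files.example.com", "/data/export.csv", "./export.csv")
print(shlex.join(cmd))  # → scp etl@files.example.com:/data/export.csv ./export.csv
```

In practice you would run this with `subprocess.run(cmd)` (or use an SFTP client library) on a schedule, which makes file-based ingestion a batch pattern.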
Ways to ingest data from streaming systems: choose batch or streaming ingestion, or set up a message queue. ← Lab 2 of this week.
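For the message-queue option, here is a minimal in-process sketch using the stdlib `queue` module (in practice a managed service like Kinesis or a broker like Kafka plays this role); the event names and `max_batch` parameter are illustrative:

```python
import queue

# The queue decouples the producer (source system) from the consumer
# (ingestion job): the consumer drains it at its own pace.
q = queue.Queue()
for event in ["click", "view", "purchase"]:
    q.put(event)  # producer side

def drain(q, max_batch):
    """Consumer side: pull up to max_batch events without blocking."""
    batch = []
    while len(batch) < max_batch and not q.empty():
        batch.append(q.get_nowait())
    return batch

print(drain(q, 2))  # → ['click', 'view']
print(drain(q, 2))  # → ['purchase']
```

Note the consumer here is doing micro-batches off the queue; with `max_batch=1` consumed in a loop, the same setup behaves like streaming ingestion.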