<aside>
☝
List of notes for this specialization + Lecture notes & Repository & Quizzes + Home page on Coursera. Read this note alongside the lecture notes; points already covered there are not repeated here.
</aside>
Course 3 overview
- Storage and queries are more complex than they appear.
- Various decisions in data storage impact performance and efficiency.
- Storage solution considerations: data type, data size, data format, access and update patterns.
- Storage hierarchy:
- Management system: organizes data on the raw components and lets you interact with the stored data.
- OLTP (Online Transaction Processing) systems: focus on performing read and write queries with low latency. ← row-oriented storage is more suitable
- OLAP (Online Analytical Processing) systems: focus on applying analytical operations to data (e.g. aggregation, summarization). ← column-oriented storage is more suitable
- Plan:
- Week 1: Trade-off between storage cost and performance.
- Cloud storage paradigms (block, object and file storage)
- Data storage in databases
- Row vs column-oriented databases
- Graph and vector databases
- Characteristics of physical components
- Serialization and compression
- Week 2: How to choose the appropriate abstractions for storing your data.
- Week 3: Queries
- How queries work
- How different storage solutions affect query performance
- Techniques for improving query performance
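The row- versus column-oriented distinction from the overview can be illustrated with a small sketch. This is not how a real database engine lays out data on disk; it only mimics the two layouts with plain Python structures. The sample orders and the helper names `get_order` and `total_amount` are made up for illustration.

```python
# Hypothetical sample data: three orders as (order_id, customer, amount).
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "alice", 7.5),
]


def get_order(row_store, order_id):
    """Row-oriented access: each record is stored together, so reading or
    writing one whole record (typical for OLTP) touches a single place."""
    for row in row_store:
        if row[0] == order_id:
            return row
    return None


# Column-oriented layout: the values of each column are stored together.
columns = {
    "order_id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount(column_store):
    """Column-oriented access: an aggregate (typical for OLAP) reads only
    the one column it needs and skips the others."""
    return sum(column_store["amount"])


print(get_order(rows, 2))
print(total_amount(columns))
```

In the row layout the aggregate would have to walk every full record to pick out `amount`, while in the column layout fetching one complete order means reading from every column list. That is the trade-off behind the OLTP and OLAP arrows above.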
Data Storage Deep Dive
Storage Raw Ingredients - Physical Components of Data Storage
![image.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/70a67195-bc38-429a-9695-1ad1b42ccec8/80c9bc73-81a9-43f6-960e-a162b5993dfc/image.png)
- Persistent Storage Medium: HDD, SSD.
- Volatile Memory: RAM, CPU cache.
Storage Raw Ingredients - Processes Required for Data Storage
Cloud Storage Options: Block, Object and File storage
- File Storage:
- Organizes data in a hierarchical directory structure.
- Suitable for centralized access and easy sharing among users.
- Lower read/write performance, because the system must keep track of the file hierarchy.
- Amazon Elastic File System (EFS)
![image.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/70a67195-bc38-429a-9695-1ad1b42ccec8/d79e4ad1-a5d1-42ec-9677-2dee037d46b6/image.png)
- Block Storage:
- Stores data in fixed-size blocks.
- High performance and low latency for frequent read/write operations.
- Ideal for transactional workloads and virtual machine storage.
- Amazon Elastic Block Store (EBS)
![image.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/70a67195-bc38-429a-9695-1ad1b42ccec8/703c33fe-ecde-4e03-b965-2da7ad36b136/image.png)
- Object Storage:
- Stores data as immutable objects in a flat structure.
- Highly scalable, supporting petabytes of data.
- Best for analytical workloads, data lakes, and large unstructured data storage.
- Amazon S3
![image.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/70a67195-bc38-429a-9695-1ad1b42ccec8/50b5bc16-ae86-48f3-a202-d94d94c2ec45/image.png)
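The three paradigms above differ mainly in how data is addressed. The following is a toy in-memory sketch of that difference, not the API of EFS, EBS, or S3. The block size, the paths, and the names `to_blocks`, `write_file`, and `ObjectStore` are all assumptions made for illustration.

```python
BLOCK_SIZE = 4  # hypothetical; real volumes use much larger blocks


def to_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Block storage: split data into fixed-size blocks (last one zero-padded)."""
    return [data[i:i + size].ljust(size, b"\x00") for i in range(0, len(data), size)]


def write_file(tree: dict, path: str, data: bytes) -> None:
    """File storage: walk (and create) each directory level before writing."""
    *dirs, name = path.strip("/").split("/")
    node = tree
    for d in dirs:
        node = node.setdefault(d, {})
    node[name] = data


class ObjectStore:
    """Object storage: a flat key -> bytes map; objects are replaced, never edited."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)  # whole-object overwrite

    def get(self, key: str) -> bytes:
        return self._objects[key]


# Block storage: one block can be rewritten while the rest stay untouched.
blocks = to_blocks(b"hello world")
blocks[1] = b"O WO"

# File storage: the path is resolved level by level through nested directories.
tree: dict = {}
write_file(tree, "/reports/2024/q1.csv", b"a,b\n1,2\n")

# Object storage: the "/" characters are just part of a single flat key.
store = ObjectStore()
store.put("reports/2024/q1.csv", b"a,b\n1,2\n")

print(blocks)
print(tree)
print(store.get("reports/2024/q1.csv"))
```

The sketch mirrors the bullets: updating a single block is cheap, which suits frequent reads and writes; resolving a path costs one lookup per directory level; and the object store has no hierarchy to traverse, but any change means writing the whole object again.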