4/22 - Relational Data Model, Memory Hierarchy

Different RDBMSs and Spark-based tools serialize data in different binary formats
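A minimal sketch of why this matters: the same logical record produces different bytes under two layouts, and reading one tool's bytes with the other's layout does not recover the record. Both layouts ("tool A", "tool B") are hypothetical and are not the actual on-disk format of any RDBMS or of Spark.

```python
import struct

# One logical record: (id, score)
record = (42, 3.5)

# Hypothetical "tool A": little-endian, 4-byte int then 8-byte double
fmt_a = "<id"
# Hypothetical "tool B": big-endian, 8-byte double first, then 8-byte int
fmt_b = ">dq"

bytes_a = struct.pack(fmt_a, *record)
bytes_b = struct.pack(fmt_b, record[1], record[0])

# Each tool can round-trip its own bytes...
decoded_a = struct.unpack(fmt_a, bytes_a)
decoded_b = struct.unpack(fmt_b, bytes_b)

# ...but interpreting tool B's bytes with tool A's layout yields garbage
misread = struct.unpack(fmt_a, bytes_b[: struct.calcsize(fmt_a)])
```

This is why moving data between such systems usually goes through an export step or a shared, self-describing interchange format.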

mtx (matrix) and df (dataframe) have ordered, numbered rows/cols; a relation is an unordered set of tuples
schema flexibility: mtx cells are all numbers; relation tuples conform to a predefined schema; df rows/cols can all have names, and the cells of one col can be of mixed types
transpose: supported by mtx and df, not by relations
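A small pure-Python sketch of the three models above, assuming plain lists stand in for a matrix, a set of `NamedTuple`s for a relation, and a hypothetical dict with `rows`/`cols`/`data` keys for a dataframe (this is not the pandas API). It shows the matrix transpose, that insertion order does not matter for a relation, a schema check, and that transposing a labeled table can leave a column with mixed types.

```python
from typing import NamedTuple

# Matrix: cells are numbers, addressed by (row, col) position
mtx = [[1, 2, 3],
       [4, 5, 6]]
mtx_t = [list(col) for col in zip(*mtx)]  # transpose: 2x3 -> 3x2

# Relation: an unordered set of tuples that conform to a fixed schema
class Emp(NamedTuple):
    name: str
    dept: str
    salary: int

def conforms(t: Emp) -> bool:
    """Check every attribute value against the declared schema type."""
    return all(isinstance(getattr(t, f), typ)
               for f, typ in Emp.__annotations__.items())

r1 = {Emp("ann", "eng", 100), Emp("bob", "ops", 90)}
r2 = {Emp("bob", "ops", 90), Emp("ann", "eng", 100)}  # same tuples, other order
# A relation has no row order and no row names, so there is nothing a
# "transpose" could turn into attribute names.

# Hypothetical dataframe-like table: ordered, labeled rows and columns
df = {
    "rows": ["ann", "bob"],
    "cols": ["dept", "salary"],
    "data": [["eng", 100], ["ops", 90]],
}

def df_transpose(d: dict) -> dict:
    """Swap row labels with column labels and transpose the cell grid."""
    return {
        "rows": d["cols"],
        "cols": d["rows"],
        "data": [list(col) for col in zip(*d["data"])],
    }

dft = df_transpose(df)
```

After the transpose, column `"ann"` holds `["eng", 100]`, a mix of `str` and `int`. This is one reason a dataframe cannot require a single type per column the way a relational schema does.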

Structured Data

Unstructured Data

Data lake: loose coupling between the file format used for storage and the data/query processing stack (vs. an RDBMS's tight coupling, where storage format and engine come together)
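A minimal sketch of loose coupling, assuming a CSV file in a temp directory stands in for files in a data lake. Two hypothetical, independently written "engines" (`total_salary_by_dept`, `names_in_dept`) query the same file directly, with no shared catalog or storage manager. In an RDBMS, by contrast, the data would be reachable only through that system's own engine.

```python
import csv
import os
import tempfile

# Storage layer: data lives in an open file format, not inside an engine
rows = [("ann", "eng", 100), ("bob", "ops", 90), ("cat", "eng", 120)]
path = os.path.join(tempfile.mkdtemp(), "emp.csv")
with open(path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["name", "dept", "salary"])
    w.writerows(rows)

# Hypothetical "engine 1": group-by aggregation using the csv module
def total_salary_by_dept(p: str) -> dict:
    totals: dict = {}
    with open(p, newline="") as f:
        for rec in csv.DictReader(f):
            totals[rec["dept"]] = totals.get(rec["dept"], 0) + int(rec["salary"])
    return totals

# Hypothetical "engine 2": a filter written separately with plain string splitting
def names_in_dept(p: str, dept: str) -> list:
    with open(p) as f:
        next(f)  # skip header line
        fields = (line.rstrip("\n").split(",") for line in f)
        return [name for name, d, _ in fields if d == dept]
```

The price of this flexibility is that each engine must agree on, and re-parse, the file format on its own.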

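Tradeoffs of Parquet (binary, columnar, compressed, schema stored in the file) vs text-based files such as CSV/JSON (human-readable and widely supported, but larger and must be parsed on every read)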
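A rough sketch of two of these tradeoffs, size and column access. It assumes fixed-width packed arrays (`array` module) as a stand-in for a binary columnar layout. This is not Parquet itself, which also adds encodings, compression, row groups and embedded metadata. Summing one column from the text file means parsing every row. The columnar layout reads only that column's bytes.

```python
import array
import csv
import io

# Toy table of (id, price) rows
n = 1000
ids = [10**12 + i for i in range(n)]
prices = [i * 0.25 for i in range(n)]

# Row-oriented text (CSV): readable, but numbers are stored as digit strings
buf = io.StringIO()
csv.writer(buf).writerows(zip(ids, prices))
csv_text = buf.getvalue()

# Column-oriented binary: each column packed as fixed-width 8-byte values
id_col = array.array("q", ids).tobytes()
price_col = array.array("d", prices).tobytes()

def sum_price_csv(text: str) -> float:
    # Scans and parses every row, including the id field we do not need
    return sum(float(row[1]) for row in csv.reader(io.StringIO(text)))

def sum_price_columnar(col: bytes) -> float:
    # Touches only the price column's bytes; no string parsing
    a = array.array("d")
    a.frombytes(col)
    return sum(a)
```

The text version can still be opened and fixed by hand in any editor, which the binary columns cannot.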