5/24 - Replication, Optimizations and Hybrid Parallelism TODO

Common practices

Task-parallelism and data-parallelism each have pros and cons

Task-par wastes memory/storage because the full dataset is replicated on every worker (violates DRY)
Data-par is painful to implement but scales w/o wasting memory/storage; the trade-off is higher network costs
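
To make the memory/storage trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes task-par keeps one full copy of the dataset on every worker and data-par splits it evenly with no replication; the function names and the 100 GB / 8-worker numbers are hypothetical, chosen only for illustration.

```python
def total_storage_gb(dataset_gb, num_workers, replicated):
    """Cluster-wide storage: one full copy per worker if replicated, else one copy total."""
    return dataset_gb * num_workers if replicated else dataset_gb


def per_worker_storage_gb(dataset_gb, num_workers, replicated):
    """Storage each worker needs: the whole dataset if replicated, else an even shard."""
    return dataset_gb if replicated else dataset_gb / num_workers


# Hypothetical cluster: 100 GB dataset, 8 workers.
task_par_total = total_storage_gb(100, 8, replicated=True)
data_par_total = total_storage_gb(100, 8, replicated=False)
task_par_each = per_worker_storage_gb(100, 8, replicated=True)
data_par_each = per_worker_storage_gb(100, 8, replicated=False)

print(task_par_total, data_par_total)  # cluster-wide footprint
print(task_par_each, data_par_each)    # per-worker footprint
```

Under these assumptions the replicated footprint grows linearly with the number of workers, while the sharded per-worker footprint shrinks with it. The sketch does not model the network traffic that sharding adds.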

Best of both worlds: run task-par on sharded data
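
A minimal single-machine sketch of the hybrid idea, assuming "task-par on sharded data" means scheduling every (task, shard) pair as its own unit of work. The helper names (`make_shards`, `run_hybrid`), the two example tasks, and the use of a thread pool as a stand-in for cluster workers are all assumptions for illustration, not how any particular system implements it.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(rows, num_shards):
    """Split rows into num_shards disjoint shards (round-robin); nothing is replicated."""
    return [rows[i::num_shards] for i in range(num_shards)]


# Hypothetical tasks: each is an independent computation over one shard.
def partial_sum(shard):
    return sum(shard)


def partial_max(shard):
    return max(shard)


def run_hybrid(rows, tasks, num_shards=4, num_workers=4):
    """Run every task on every shard; each (task, shard) pair is one job in the pool."""
    shards = make_shards(rows, num_shards)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {
            (name, shard_id): pool.submit(fn, shard)
            for name, fn in tasks.items()
            for shard_id, shard in enumerate(shards)
        }
        return {key: future.result() for key, future in futures.items()}


rows = list(range(1, 101))
partials = run_hybrid(rows, {"sum": partial_sum, "max": partial_max})

# Combine per-shard partial results into one answer per task.
total = sum(v for (name, _), v in partials.items() if name == "sum")
overall_max = max(v for (name, _), v in partials.items() if name == "max")
print(total, overall_max)
```

Each shard exists once (as in data-par), while different tasks run concurrently over the shards (as in task-par). The final combine step only works here because sum and max can be merged from partial results; tasks without that property would need a different merge.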


Most scalable data systems today support only full task-par (e.g., Dask) or full data-par (e.g., parallel RDBMSs)
