Principle of Locality: Many Programs tend to use data and instructions with addresses near or equal to those they have used recently.
Data Layout - order in which items are laid out in memory / disk
Data Access Pattern: order in which a prgram has to access items in memory
Hardware Efficiency: how close actual execution runtime is to best possible runtime, improved w/ careful data layout and access patterns
Example: matrix multiplication
Suppose data layout in DRAM is in row-major order
$$ C_{n \space by\space m} = A_{n \space by\space p}B_{p\space by\space m} $$
DRAM: A[1;] A[2;] … B[1;] B[2;] …
Caches A[1;] B[1;]
for i = 1 to n
for j = 1 to m
for k = 1 to p
C[i][j] += A[i][k] * B[k][j]
Solution
for i = 1 to n
for j = 1 to m
for k = 1 to p
C[i][j] += A[i][k] * B[k][j]