Concurrency and data locality for sparse linear algebra on modern processors