Data-centric Performance Measurement and Mapping for Highly Parallel Programming Models