The Münster Skeleton Library
The Muenster Skeleton Library (Muesli) is a C++ library that simplifies programming heterogeneous clusters equipped with multi-core CPUs, many-core GPUs, and Xeon Phi coprocessors by implementing so-called algorithmic skeletons. Muesli encapsulates the low-level details of parallel programming inside the library, which raises parallel programming to a higher level of abstraction. Users do not need to deal with MPI, OpenMP, or CUDA directly and can implement parallel programs as if they were sequential. In essence, Muesli makes parallel programming easier, safer, and less error-prone.
Main Features
- Three execution configurations for CPU, GPU, or Xeon Phi based heterogeneous clusters. From a single program source, multiple binaries can be built, one for each kind of cluster: multi-core CPUs, many-core GPUs, or Xeon Phi coprocessors.
- Parallel containers in the form of distributed data structures (1D array and 2D matrix) abstract from the memory hierarchy of heterogeneous clusters and establish coarse-grained parallelization. They provide a flexible data (re)distribution mechanism, automatic memory management, and implicit (lazy) data transfer between different memory areas.
- Data parallel skeletons map, zip, fold, mapStencil, and their variants (InPlace, IndexInPlace) are implemented as member functions of the distributed data structures and establish fine-grained parallelization.
- A task parallel farm skeleton can be used for simultaneous CPU+GPU execution. A dynamic load balancing mechanism ensures a reasonable workload distribution between the different execution units.
- Flexible and convenient mechanisms for implementing and providing the skeleton user functions, covering both functional approaches (C++11 lambdas) and object-oriented approaches (C++ functors). As a key feature, Muesli functors define an interface for passing additional arguments to the user function.
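The two styles of user function and the meaning of the data parallel skeletons can be illustrated without Muesli, using only the C++ standard library. The following is a minimal sequential sketch under stated assumptions: `Scale`, `map_seq`, `zip_seq`, and `fold_seq` are hypothetical names introduced here for illustration and are not part of Muesli's API. The constructor argument of `Scale` plays the role of an additional argument supplied to a user function.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical functor (not a Muesli class): the constructor argument acts as
// an additional argument that is available whenever the user function runs.
struct Scale {
    explicit Scale(float factor) : factor_(factor) {}
    float operator()(float x) const { return factor_ * x; }
private:
    float factor_;
};

// Sequential stand-in for map: applies f to every element.
template <typename F>
std::vector<float> map_seq(const std::vector<float>& in, F f) {
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), f);
    return out;
}

// Sequential stand-in for zip: combines two equally sized inputs element-wise.
template <typename F>
std::vector<float> zip_seq(const std::vector<float>& a,
                           const std::vector<float>& b, F f) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::transform(a.begin(), a.end(), b.begin(), out.begin(), f);
    return out;
}

// Sequential stand-in for fold: reduces all elements to one value with f.
template <typename F>
float fold_seq(const std::vector<float>& in, float init, F f) {
    return std::accumulate(in.begin(), in.end(), init, f);
}
```

A call such as `map_seq(v, Scale(2.0f))` uses the functor style, while `map_seq(v, [](float x) { return x * x; })` uses the lambda style. In a parallel setting the fold function should be associative, since the order in which partial results are combined is generally not fixed.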
Code Example
The following code example computes the Frobenius Norm of a matrix.
    #include <cmath>
    #include "muesli.h"
    #include "dmatrix.h"

    using namespace msl;

    typedef float T; // element type (assumed; the original listing leaves T undefined)

    int main(int argc, char** argv) {
      initSkeletons(argc, argv); // initialize Muesli

      // create distributed matrix
      auto init = [] (int row, int col) {return randomFloat(row, col);};
      DMatrix<T> A(8, 8, Muesli::num_total_procs, 1, init, Distribution::DIST);

      // create user functions
      auto square = [] MSL_GPUFUNC (T a) {return a*a;};
      auto sum = [] MSL_GPUFUNC (T a, T b) {return a+b;};

      // apply skeletons: square every element in place, then sum all elements
      A.mapInPlace(square);
      T f_norm = A.fold(sum);
      printv("||A||_F = %f\n", sqrt(f_norm));

      terminateSkeletons(); // terminate Muesli
      return 0;
    }
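For checking results on small inputs, the same map-then-fold pattern can be written sequentially with the standard library. This is a sketch rather than Muesli code: `frobenius_norm` is a hypothetical helper, and the matrix is assumed to be stored as a flat `std::vector<float>` (the element order does not matter for this norm).

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical sequential reference: square root of the sum of squared elements.
// The argument is taken by value because the map step overwrites the data,
// just as an in-place map would.
float frobenius_norm(std::vector<float> a) {
    // map step: square every element in place
    std::transform(a.begin(), a.end(), a.begin(),
                   [](float x) { return x * x; });
    // fold step: sum all squared elements
    float sum_sq = std::accumulate(a.begin(), a.end(), 0.0f,
                                   [](float x, float y) { return x + y; });
    return std::sqrt(sum_sq);
}
```

For the two-element input `{3, 4}` the squared sum is 25, so the function returns 5.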
The most up to date version of Muesli is v3.0 and can be downloaded here. It provides the features listed above.
There are also older versions of Muesli that provide a slightly different feature set (see below).
Old Versions
- Download Muesli 2.3
  Main Features:
  - Algorithmic skeletons for multi-core clusters (MPI + OpenMP)
  - Parallel containers: Distributed Array, Matrix, Sparse Matrix
  - Data parallel skeletons: map, zip, fold, scan, and variants (InPlace, IndexInPlace)
  - Task parallel skeletons: Pipe, Farm, Filter, Branch&Bound, Divide&Conquer
  - Currying of user functions
- Download Muesli 1.0
  Main Features:
  - Algorithmic skeletons for (multi-core) clusters (MPI)
  - Parallel containers: Distributed Array, Matrix
  - Data parallel skeletons: map, zip, fold, scan
  - Task parallel skeletons: Pipe, Farm, Filter, Loop
  - Currying of user functions
Publications
- Steffen Ernsting and Herbert Kuchen. Data Parallel Algorithmic Skeletons with Accelerator Support. International Journal of Parallel Programming, pages 1–17, 2016. Available as 'Online First': doi: 10.1007/s10766-016-0416-7
- Steffen Ernsting and Herbert Kuchen. Java Implementation of Data Parallel Skeletons on GPUs. In Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing, ParCo 2015, 1–4 September 2015, Edinburgh, Scotland, UK, pages 155–164, 2015.
- Steffen Ernsting and Herbert Kuchen. A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming. International Journal of Parallel Programming, 42(6):968–987, 2014.
- Steffen Ernsting and Herbert Kuchen. A Scalable Farm Skeleton for Heterogeneous Parallel Programming. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), Proceedings of the International Conference on Parallel Computing, ParCo 2013, 10-13 September 2013, Garching (near Munich), Germany, pages 72–81, 2013.
- Steffen Ernsting and Herbert Kuchen. Algorithmic skeletons for multi-core, multi-GPU systems and clusters. IJHPCN, 7(2):129–138, 2012.
- Steffen Ernsting and Herbert Kuchen. Data Parallel Skeletons in Java. In Proceedings of the International Conference on Computational Science, ICCS 2012, Omaha, Nebraska, USA, 4–6 June, 2012, pages 1817–1826, 2012.
- Steffen Ernsting and Herbert Kuchen. Data Parallel Skeletons for GPU Clusters and Multi-GPU Systems. In Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August – 3 September 2011, Ghent, Belgium, pages 509–518, 2011.
- Philipp Ciechanowicz and Herbert Kuchen: Enhancing Muesli's Data Parallel Skeletons for Multi-Core Computer Architectures. In: Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC). Melbourne, Victoria, Australia, pp. 108-113, DOI 10.1109/HPCC.2010.23.
- Philipp Ciechanowicz, Michael Poldner, Herbert Kuchen: The Münster Skeleton Library Muesli - A Comprehensive Overview. ERCIS Working Paper No. 7, 2009.
- Philipp Ciechanowicz: Algorithmic Skeletons for General Sparse Matrices on Multi-Core Processors. In: Proceedings of The 20th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS). Orlando, Florida, USA, 2008, p. 188-197.
- Philipp Ciechanowicz, Stephan Dlugosz, Herbert Kuchen, Ulrich Müller-Funk: Exploiting Training Example Parallelism with a Batch Variant of the ART 2 Classification Algorithm. In: Proceedings of The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) as part of The 26th IASTED International Multi-Conference on Applied Informatics. Innsbruck, Austria, 2008, p. 195-201.
- Michael Poldner, Herbert Kuchen: Task Parallel Skeletons for Divide and Conquer. Proceedings of Workshop of the Working Group Programming Languages and Computing Concepts of the German Computer Science Association GI, Bad Honnef, 2008.
- Michael Poldner, Herbert Kuchen: Optimizing Skeletal Stream Processing for Divide and Conquer. Proceedings of the 3rd International Conference on Software and Data Technologies (ICSOFT), pages 181-189, INSTICC PRESS, 2008.
- Michael Poldner, Herbert Kuchen: Algorithmic Skeletons for Branch and Bound. ICSOFT 2006, CCIS 10, pages 204–219, Springer, 2008.
- Michael Poldner, Herbert Kuchen: On Implementing the Farm Skeleton. Parallel Processing Letters, Vol. 18, No. 1, pages 117-131, March 2008.
- Michael Poldner, Herbert Kuchen: Skeletons for Divide and Conquer Algorithms. Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN), Innsbruck, Austria, February, IASTED/ACTA Press 2008.
- Michael Poldner, Herbert Kuchen: Algorithmic Skeletons for Branch & Bound. Proceedings of 1st International Conference on Software and Data Technology (ICSOFT), Vol. 1, pages 291-300, Setubal, Portugal, 2006.
- Michael Poldner, Herbert Kuchen: Scalable Farms. Proceedings of the International Conference ParCo 2005, NIC Series, Vol. 33, pages 795-802, 2006.
- Michael Poldner, Herbert Kuchen: On Implementing the Farm Skeleton. Proceedings of 3rd International Workshop on High-level Parallel Programming and Applications (HLPP), Warwick, 2005.
- Herbert Kuchen. A Skeleton Library. In Proceedings of the 8th International Euro-Par Conference on Parallel Processing, Euro-Par’02, pages 620–629, London, UK, 2002. Springer-Verlag. ISBN 3-540-44049-6.
- H. Kuchen and J. Striegnitz. Higher-Order Functions and Partial Applications for a C++ Skeleton Library. In Proceedings of the 2002 joint ACM-ISCOPE Conference on Java Grande, pages 122–130. ACM, 2002.
- Herbert Kuchen and Murray Cole. The Integration of Task and Data Parallel Skeletons. Parallel Processing Letters, 12(02):141–155, 2002.
Muesli is available under the MIT license. For more information please refer to the LICENSE.txt file.