SLEEF implements vectorized versions of all C99 real floating-point math functions and can take advantage of the SIMD instructions of modern processors. It is designed to make full use of SIMD computation by minimizing conditional branches and scatter/gather memory access. Our benchmarks show that SLEEF's performance is comparable to that of the best commercial library.
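Avoiding conditional branches usually means evaluating every candidate result in every SIMD lane and then blending them with a bit mask produced by a comparison. The following is a minimal pure-Python sketch of that general idea, assuming Python lists stand in for SIMD lanes; it is not SLEEF code or SLEEF's API, and the piecewise function `piecewise` is hypothetical.

```python
import struct
from typing import List

ALL_ONES = (1 << 64) - 1  # a "true" lane mask: all 64 bits set


def to_bits(x: float) -> int:
    """Reinterpret a double's IEEE 754 bit pattern as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(u: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", u))[0]


def blend(mask: List[int], a: List[float], b: List[float]) -> List[float]:
    """Per lane, take a where the mask is all ones and b where it is all zeros.

    This mimics a SIMD blend/select instruction: the choice is made with
    bitwise AND/OR on the raw bits rather than with an if statement.
    """
    return [
        from_bits((m & to_bits(x)) | (~m & ALL_ONES & to_bits(y)))
        for m, x, y in zip(mask, a, b)
    ]


def piecewise(xs: List[float]) -> List[float]:
    """Hypothetical f(x) = x*x for x < 1, else 2*x - 1, written branch-free per lane."""
    lo = [x * x for x in xs]          # first branch, computed for every lane
    hi = [2.0 * x - 1.0 for x in xs]  # second branch, computed for every lane
    # The comparison yields True/False; scaling turns it into an all-ones/all-zeros mask.
    mask = [ALL_ONES * (x < 1.0) for x in xs]
    return blend(mask, lo, hi)


if __name__ == "__main__":
    print(piecewise([0.5, 1.0, -2.0, 3.0]))
```

A bitwise blend is used instead of arithmetic such as `m * a + (1 - m) * b` because the unselected lane may hold an infinity or NaN, which would poison an arithmetic mix but is simply masked out here.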
UniSIMD provides a unified, low-level macro assembler for ARM and x86 architectures. It defines a subset of SIMD instructions shared by both and a common API, reducing code duplication and variation. Currently, Intel SSE2 (32-bit x86 ISA) and ARM NEON (32-bit ARMv7 ISA) are supported; support for 64-bit targets, with longer registers and wider addressing, will be added later. UniSIMD is a collection of C/C++ macros, so it can easily be included from header files.
Connected Component Labeling (CCL) is a well-known technique for assigning a unique label to each connected component in a binary image. This project provides libraries implementing a parallel CCL algorithm suited to execution on GPGPUs. The archive contains an OpenCL and C implementation that can be used together with the OpenCV library, a single-threaded vectorized CPU implementation using AVX2 instructions, and a Java implementation intended to help with understanding…
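To illustrate why CCL can be parallelized, here is a minimal sketch of iterative min-label propagation, one of the simplest data-parallel CCL schemes. It is not necessarily the algorithm this project implements, and it assumes 4-connectivity with nonzero pixels as foreground. Each sweep computes every pixel's new label only from the previous sweep's labels, so all pixel updates within a sweep are independent, which is the property a GPU implementation exploits.

```python
from typing import List

Image = List[List[int]]


def label_components(img: Image) -> List[List[int]]:
    """Label 4-connected foreground (nonzero) pixels; background pixels get 0.

    Every foreground pixel starts with a unique label (its 1-based raster
    index). In each sweep, a pixel takes the minimum label among itself and
    its foreground 4-neighbours, read from the previous sweep. Sweeps repeat
    until nothing changes, at which point every component carries the
    smallest initial label found in it.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    labels = [[(y * w + x + 1) if img[y][x] else 0 for x in range(w)] for y in range(h)]

    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]  # write to a copy: updates stay independent
        for y in range(h):
            for x in range(w):
                if not labels[y][x]:
                    continue  # background pixel
                best = labels[y][x]
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx]:
                        best = min(best, labels[ny][nx])
                if best != labels[y][x]:
                    new[y][x] = best
                    changed = True
        labels = new
    return labels


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
    ]
    for row in label_components(image):
        print(row)
```

The number of sweeps grows with the longest path inside a component, which is why practical GPU CCL methods add label-equivalence or union-find steps to converge faster.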
dispy is a Python framework for parallel computation. It distributes tasks across multiple processors on a single machine (SMP) or across nodes in a cluster, grid, or cloud. It is suited to data-parallel (SIMD) workloads, in which the same computation is applied independently to many pieces of a large data set. It does not support communication or data sharing between running tasks. It is built on asynchronous sockets and coroutines, using epoll, kqueue, or Windows IOCP.
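The execution model described above (one function, many independent inputs, no communication between tasks, results collected afterwards) can be sketched with the standard library's `concurrent.futures` on a single machine. This is an analogy, not dispy: dispy's own entry points such as `dispy.JobCluster` and its `submit` method dispatch jobs to remote nodes, whereas this sketch uses only a local executor, and the `compute` function is hypothetical.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_independent(compute: Callable[[T], R], inputs: Iterable[T],
                    executor: Executor) -> List[R]:
    """Submit one independent job per input and collect results in submission order.

    Each job receives only its own argument and returns a value; jobs share
    no state and never talk to each other, mirroring the task model above.
    """
    futures = [executor.submit(compute, x) for x in inputs]
    return [f.result() for f in futures]


def compute(n: int) -> int:
    """Hypothetical per-item work: sum of squares of 0..n-1."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_independent(compute, [10, 100, 1000], pool))
```

Because `run_independent` depends only on the `Executor` interface, a `ProcessPoolExecutor` can be passed instead for CPU-bound work, provided `compute` is defined at module level so it can be pickled.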