OpenBLAS is an optimized BLAS (Basic Linear Algebra Subprograms) library based on the GotoBLAS2 1.13 BSD version.
Recent Releases
0.3.28 (21 Sep 2024 11:45)
minor bugfix:
8-Aug-2024
General:
Reworked the unfinished implementation of HUGETLB from GotoBLAS for allocating huge memory pages as buffers on suitable systems.
Changed the unfinished implementation of GEMM3M for the generic target on all architectures to at least forward to regular GEMM.
Improved multithreaded GEMM performance for large non-skinny matrices.
Improved BLAS3 performance on larger multicore systems through improved parallelism.
Improved performance of the initial memory allocation by reducing locking overhead.
Improved performance of GBMV at small problem sizes by introducing
a size barrier for the switch to multithreading.
Added an implementation of the CBLAS_GEMM_BATCH extension.
Fixed miscompilation of CAXPYC and ZAXPYC on all architectures in CMAKE builds (error introduced in 0.3.27).
Fixed corner cases involving the handling of NAN and INFINITY arguments in ?SCAL on all architectures.
Added support for cross-compiling to WEBM with CMAKE (in addition
to the already present makefile support).
Fixed NAN handling and potential accuracy issues in compilations with Intel ICX by supplying a suitable fp-model option by default.
The contents of the github project wiki have been converted into
a new set of documentation included with the source code.
It is now possible to register a callback function that replaces the built-in support for multithreading with an external backend like TBB (openblas_set_threads_callback_function).
Fixed potential duplication of suffixes in shared library naming.
Improved C compiler detection by the build system to tolerate more naming variants for gcc builds.
Removed an unnecessary dependency of the utest on CBLAS.
Fixed spurious error reports from the BLAS extensions utest.
Fixed unwanted invocation of the GEMM3M tests in cross-compilation.
Fixed a flaw in the makefile build that could lead to the pkgconfig file containing an entry of UNKNOWN for the target CPU after installing.
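The callback hook listed above (openblas_set_threads_callback_function) lets an application run OpenBLAS's parallel work items on its own scheduler. A minimal sketch of a serial replacement backend follows.

It rests on several assumptions:

- The two function-pointer typedefs are modeled on the openblas_dojob_callback and openblas_threads_callback types that cblas.h is believed to declare for this feature. They carry hypothetical names here so the sketch compiles on its own. Check them against your installed header.
- The exact meaning of the sync flag should be taken from the OpenBLAS documentation.
- scale_job is a hypothetical stand-in for the work function OpenBLAS would pass in.

```c
#include <stddef.h>

/* Hypothetical stand-ins, ASSUMED to have the same shape as
   openblas_dojob_callback / openblas_threads_callback in cblas.h. */
typedef void (*sketch_dojob_fn)(int thread_num, void *jobdata, int dojob_data);
typedef void (*sketch_threads_fn)(int sync, sketch_dojob_fn dojob, int numjobs,
                                  size_t jobdata_elsize, void *jobdata,
                                  int dojob_data);

/* Serial backend: runs every job on the calling thread, in order.
   A TBB or thread-pool backend would issue these calls concurrently. */
void serial_threads_backend(int sync, sketch_dojob_fn dojob, int numjobs,
                            size_t jobdata_elsize, void *jobdata, int dojob_data)
{
    (void)sync; /* ignored: every job has finished when this loop returns */
    for (int i = 0; i < numjobs; i++) {
        /* Job i's data starts i * jobdata_elsize bytes into jobdata. */
        void *job = (char *)jobdata + (size_t)i * jobdata_elsize;
        dojob(i, job, dojob_data);
    }
}

/* Hypothetical job used only to exercise the backend: scales one double. */
void scale_job(int thread_num, void *jobdata, int dojob_data)
{
    (void)thread_num;
    double *x = (double *)jobdata;
    *x *= dojob_data;
}
```

With OpenBLAS linked, a function of this shape would be passed to openblas_set_threads_callback_function. A parallel version would keep the same addressing contract and change only how the loop iterations are scheduled.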
Integrated from the Reference-LAPACK project:
Fixed uninitialized variables in the LAPACK tests for ?QP3RK.
0.3.27 (04 Apr 2024 22:05)
minor bugfix:
general:
added initial (generic) support for the CSKY architecture
capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
underutilized or idle threads
sped up multithreaded POTRF on all platforms
added extension openblas_set_num_threads_local() that returns the previous thread count
re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
for too small workloads
improved the fallback code used when the precompiled number of threads is exceeded,
and made it callable multiple times during the lifetime of an instance
added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC
fixed a potential buffer overflow in the interface to the GEMMT kernels
fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
fixed unwanted case sensitivity of the character parameters in ?TRTRS
sped up the OpenMP thread management code
fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
added a testsuite for the BLAS extensions
modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
spurious errors
added support for building the benchmark collection with CMAKE
added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
with OpenMP enabled that use clang with gfortran
fixed building on systems with uClibc
added support for calling ?NRM2 with a negative increment value on all architectures
added support for the LLVM18 version of the flang-new compiler
fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
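GEMMT, whose interface received fixes in this release, computes a general matrix product but updates only one triangle of the square result C. The naive reference sketch below makes some simplifying assumptions: column-major storage, no transposes, and double precision. The function name and argument order are hypothetical and are not the OpenBLAS prototype.

```c
#include <stddef.h>

/* C := alpha*A*B + beta*C, touching only the lower (lower != 0) or upper
   triangle of the n x n matrix C, diagonal included.
   A is n x k (leading dimension lda), B is k x n (ldb), C is n x n (ldc). */
void gemmt_naive(int lower, int n, int k, double alpha,
                 const double *A, int lda, const double *B, int ldb,
                 double beta, double *C, int ldc)
{
    for (int j = 0; j < n; j++) {
        /* Rows of column j that lie inside the chosen triangle. */
        int i_begin = lower ? j : 0;
        int i_end   = lower ? n : j + 1;
        for (int i = i_begin; i < i_end; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            double *c = &C[i + (size_t)j * ldc];
            /* BLAS convention: with beta == 0, C is overwritten, not read. */
            *c = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

Entries outside the selected triangle are never read or written, so a tuned kernel can skip roughly half of the work of a full GEMM.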
Integrated fixes from the Reference-LAPACK project:
Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
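This release adds CBLAS interfaces for CAXPYC and ZAXPYC without restating what they compute. The sketch below assumes they are the conjugating variant of AXPY, y := alpha * conj(x) + y, which is how the OpenBLAS generic complex AXPY kernel appears to behave when built with conjugation. It works on interleaved (re, im) double data. zaxpyc_ref is a hypothetical name, and only positive strides are handled.

```c
#include <stddef.h>

/* ASSUMED semantics: y := alpha * conj(x) + y for n complex elements stored
   as (re, im) pairs; incx/incy count complex elements. Sketch only:
   non-positive strides are ignored rather than handled. */
void zaxpyc_ref(int n, const double alpha[2], const double *x, int incx,
                double *y, int incy)
{
    if (n <= 0 || incx <= 0 || incy <= 0)
        return;
    const double ar = alpha[0], ai = alpha[1];
    for (int i = 0; i < n; i++) {
        const double *xi = x + 2 * (size_t)i * (size_t)incx;
        double *yi = y + 2 * (size_t)i * (size_t)incy;
        const double xr = xi[0], xim = -xi[1];  /* conj(x_i) = xr + i*xim */
        yi[0] += ar * xr - ai * xim;            /* Re(alpha * conj(x_i)) */
        yi[1] += ar * xim + ai * xr;            /* Im(alpha * conj(x_i)) */
    }
}
```

Take alpha = i and x = 1 + 2i. Plain ZAXPY would add i(1 + 2i) = -2 + i to y, while this variant adds i(1 - 2i) = 2 + i.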
0.3.26 (03 Jan 2024 13:32)
minor bugfix:
general:
- improved the version of openblas.pc that is created by the CMAKE build
- fixed a CMAKE-specific build problem on older versions of MacOS
- worked around linking problems on old versions of MacOS
- corrected installation location of the lapacke_mangling header in CMAKE builds
- added type declarations for complex variables to the MSVC-specific parts of the LAPACK header
- significantly sped up ?GESV for small problem sizes by introducing a lower bound for multithreading
- imported additions and corrections from the Reference-LAPACK project:
  - added new LAPACK functions for truncated QR with pivoting (Reference-LAPACK PRs 891, 941)
  - handled miscalculation of minimum work array size in corner cases (Reference-LAPACK PR 942)
  - fixed use of uninitialized variables in ?GEDMD and improved inline documentation (PR 959)
  - fixed use of uninitialized variables (and consequential failures) in ?BBCSD (PR 967)
  - added tests for the recently introduced Dynamic Mode Decomposition functions (PR 736)
  - fixed several memory leaks in the LAPACK testsuite (PR 953)
  - fixed counting of testsuite results by the Python script (PR 954)
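The ?GESV speedup listed above comes from not going multithreaded for small systems. The sketch below shows the general shape of such a lower bound combined with a cap on the thread count, so that every thread gets a minimum amount of work. The constant, the n^3 work estimate and the function name are all illustrative assumptions, not OpenBLAS's actual values or code.

```c
#include <stdint.h>

/* Illustrative minimum work per thread (64^3); NOT an OpenBLAS value. */
#define MIN_WORK_PER_THREAD ((int64_t)64 * 64 * 64)

/* Hypothetical helper: thread count for solving an n x n system. */
int threads_for_gesv(int64_t n, int max_threads)
{
    if (n <= 0 || max_threads <= 1)
        return 1;
    int64_t work = n * n * n;               /* LU cost grows roughly like n^3 */
    int64_t t = work / MIN_WORK_PER_THREAD; /* threads that each get enough work */
    if (t < 2)
        return 1;                           /* below the lower bound: stay serial */
    return t < max_threads ? (int)t : max_threads;  /* never exceed the pool */
}
```

With these illustrative numbers and 8 available threads, a system of order 100 would use 3 threads, one of order 1000 would use all 8, and anything up to order 64 would stay single-threaded.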
x86-64:
- fixed computation of CASUM on SkylakeX and newer targets in the special
case that AVX512 is not supported by the compiler or operating environment
- fixed potential undefined behaviour in the CASUM/ZASUM kernels for AVX512 targets
- worked around a problem in the pre-AVX kernels for GEMV
- sped up the thread management code on MS Windows
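The SkylakeX CASUM fix above can be checked against the definition of SCASUM/DZASUM. These routines return the sum of |Re(x_i)| + |Im(x_i)|, not the sum of complex moduli. The scalar reference sketch below uses a hypothetical name, casum_ref, with single precision and interleaved storage.

```c
#include <math.h>
#include <stddef.h>

/* Sum of |Re| + |Im| over n complex elements stored as (re, im) pairs,
   with stride incx counted in complex elements. Like the reference BLAS,
   it returns 0 when n <= 0 or incx <= 0. */
float casum_ref(int n, const float *x, int incx)
{
    float sum = 0.0f;
    if (n <= 0 || incx <= 0)
        return 0.0f;
    for (int i = 0; i < n; i++) {
        const float *xi = x + 2 * (size_t)i * (size_t)incx;
        sum += fabsf(xi[0]) + fabsf(xi[1]);
    }
    return sum;
}
```

For x = (3 - 4i, 1 + 2i) this returns 10, whereas the sum of moduli would be 5 + sqrt(5).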
arm64:
- fixed building of the LAPACK testsuite with Xcode 15 on Apple M1 and newer
- sped up the thread management code on MS Windows
- sped up SGEMM and DGEMM on Neoverse V1 and N1
- sped up ?DOT on SVE-capable targets
- reduced the number of targets in DYNAMIC_ARCH builds by eliminating functionally equivalent ones
- included support for Apple M1 and newer targets in DYNAMIC_ARCH builds
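In a DYNAMIC_ARCH build, kernels for several CPU targets are compiled into one library, and one set is chosen at runtime after detecting the processor. The sketch below illustrates that dispatch. It also shows what eliminating functionally equivalent targets amounts to: two detected core types mapping to one shared kernel table. Every name here (the core_id values, kernel_table and the two dot kernels) is hypothetical, and OpenBLAS's own tables hold many more kernels.

```c
#include <stddef.h>

typedef double (*dot_fn)(size_t n, const double *x, const double *y);

/* Portable fallback kernel. */
static double dot_generic(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Stand-in for a tuned kernel: unrolled by two with two accumulators. */
static double dot_unrolled(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
    }
    if (i < n)               /* odd length: one element left over */
        s0 += x[i] * y[i];
    return s0 + s1;
}

struct kernel_table {
    const char *name;
    dot_fn dot;
};

static const struct kernel_table generic_kernels = { "generic", dot_generic };
static const struct kernel_table tuned_kernels   = { "tuned",   dot_unrolled };

/* Hypothetical results of runtime CPU detection. */
enum core_id { CORE_UNKNOWN, CORE_X1, CORE_X2 };

/* CORE_X1 and CORE_X2 are treated as functionally equivalent, so they
   share one table instead of each shipping an identical copy. */
const struct kernel_table *select_kernels(enum core_id core)
{
    switch (core) {
    case CORE_X1:
    case CORE_X2:
        return &tuned_kernels;
    default:
        return &generic_kernels;
    }
}
```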
power:
- improved the SGEMM kernel for POWER10
- fixed compilation with (very) old versions of gcc