2020-12-17
Free BLAS Isn’t Rubbish
- 2023-07-13: Note new data in BLIS repo.
There’s a myth that proprietary
BLASes, or at least Intel MKL, are much better than free software
ones. (Only CPU implementations are considered here, not GPU ones;
see also the reference in a previous item.) It results in
people insisting on using MKL even on AMD CPUs, which it doesn’t support or isn’t
meant to, in preference to AMD’s BLIS (a version of BLIS). (Presumably
at least no-one would deny that even reference BLAS is infinitely better
than MKL on ARM and POWER, though.)
(MKL was also widely used in
contravention of its licence before that changed to be more permissive.) The
myth is remarkably resistant to data, like the analogous one for the Intel
compiler, but I’ll take another opportunity to refer to some recently obtained data.
Over the time I’ve known OpenBLAS, since its
early days, MKL’s level 3 performance has only been much better than
OpenBLAS’ on new micro-architectures that no-one has taken an interest
in. (There was a hiatus and then backtracking on Skylake-X for a
while, and it never supported Knights Landing.)
BLIS is also fairly good in many cases,
and possibly more robust than OpenBLAS for multi-threading. There’s also
Eigen, though
its primary purpose isn’t to provide vanilla BLAS. In more specialized applications,
for small matrices, the reason MKL improved was competition from
libxsmm (according to its
maintainer),
and BLASFEO’s
limited coverage of somewhat larger ones seems fast.
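
Level 3 comparisons like these are usually quoted as GEMM throughput. As a rough illustration only, here is a minimal Python sketch of how a measured wall time turns into GFLOPS, assuming the conventional count of 2·m·n·k floating-point operations for an m×k by k×n product. All the function names are made up for this example, and the pure-Python multiply is just a stand-in so that the sketch runs without any BLAS installed; it is not how the data referred to here were obtained.

```python
import time


def gemm_flops(m, n, k):
    """Floating-point operations for C = A @ B with A (m x k) and B (k x n)."""
    return 2 * m * n * k  # one multiply and one add per inner-product term


def gflops(m, n, k, seconds):
    """Convert a measured GEMM wall time into GFLOPS."""
    return gemm_flops(m, n, k) / seconds / 1e9


def best_time(run, repeats=5):
    """Return the smallest wall time over several calls to run()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def naive_matmul(a, b):
    """Pure-Python stand-in (hypothetical) for a call into a real BLAS."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


if __name__ == "__main__":
    n = 64
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    t = best_time(lambda: naive_matmul(a, b))
    print(f"n={n}: {gflops(n, n, n, t):.4f} GFLOPS")
```

To compare actual libraries, the stand-in would be swapped for a call that reaches the BLAS under test, for example a double-precision matrix product in a NumPy built against that library. The matrices would also need to be large enough, and the thread count pinned, for the figures to mean much.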
Anyway, I’ll take the opportunity to refer to recent
data
supplementing the
original (which
has since acquired new measurements, but specifically not for recent OpenBLAS),
particularly to be fair to OpenBLAS on SKX, and to add some more systems.
There’s a sample result reproduced below for illustration.
By the way, ATLAS is sometimes referred to as a free optimized BLAS, but is certainly inferior to OpenBLAS on recent hardware, and appears to be dead.