2021-11-05
GCC Code Generation v. Manual Vectorization and Proprietary Compilers
Update (29-03-2022)
On the comparison with proprietary compilers front, there are more recent results published by Polyhedron which make GCC look even worse than previously. On GNU/Linux, ifort 2019.5 and gfortran 7.4 were used, but GCC 9.1 was available when that ifort was released, as far as I can tell, and gfortran's -ffast-math wasn't used (or -fprofile-use and -flto).
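To illustrate why a flag like -ffast-math can matter in a comparison of this sort, here is a minimal sketch in C (not taken from the Polyhedron suite; the function name sum_array is made up for the example). A floating-point sum into a single scalar can only be vectorized in the usual way if the compiler is allowed to reorder the additions, which strict floating-point semantics generally forbid.

```c
#include <stddef.h>

/* Hypothetical example: sum an array into one scalar.
 *
 * A vectorized version keeps several partial sums and combines them at
 * the end, i.e. it adds the elements in a different order.  Since
 * floating-point addition is not associative, that can change the
 * rounded result, so the compiler needs permission to reassociate
 * (for GCC, e.g. -ffast-math) before it will do so.
 *
 * Assumed way to compare, with a file name chosen for illustration:
 *   gcc -O3 -fopt-info-vec -c sum.c
 *   gcc -O3 -ffast-math -fopt-info-vec -c sum.c
 */
double sum_array(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```

Whether a given benchmark actually benefits depends on the code and the target, so this only shows the kind of loop affected, not the size of the effect.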
Many people underestimate GCC in comparison with proprietary compilers, and
perhaps underestimate optimizing compilers in general. (Remember the
experience of even the first optimizing compiler.) I wrote up a couple of
cases of some recent work with it.
Versus Manual Intrinsic Coding
The first case [pdf], Auto-vectorized MPI Reduction Operations: or GCC is
Better than Supposed, Part 1, generalizes and extends vectorization of
reduction operations in OpenMPI. That vectorization was found to be
important, and was previously done only for x86_64, manually with compiler
intrinsics. (But I’m working on a POWER9 system.) In contrast to the position
of the paper about that, it turns out to be easy to get most of the relevant
operations auto-vectorized (with GCC or other compilers), more than were done
manually, and independently of the target architecture. (The easy way
doubtless wouldn’t make a conference paper…)
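As a rough illustration of what auto-vectorizing such operations involves, the sketch below shows two element-wise reduction kernels written as plain C loops. It is not the OpenMPI code; the function names and signatures are invented for the example. The point is only that loops of this shape contain nothing architecture-specific, so the compiler can pick whatever SIMD instructions the target has.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernels, not OpenMPI's actual implementation.
 *
 * Each combines `in` into `inout` element by element, which is the
 * shape of an MPI_SUM- or MPI_MAX-style reduction on a buffer.
 * `restrict` promises the buffers do not overlap, so the compiler can
 * vectorize without a runtime overlap check.
 *
 * Assumed way to check what was vectorized (file name is illustrative):
 *   gcc -O3 -fopt-info-vec -c reduce.c
 */
void sum_float(const float *restrict in, float *restrict inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

void max_int32(const int32_t *restrict in, int32_t *restrict inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] = in[i] > inout[i] ? in[i] : inout[i];
}
```

Because each output element depends only on the matching input elements, no reordering of floating-point operations is needed here, and the same source serves x86_64, POWER, or any other target the compiler supports.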
Versus Proprietary Compilers
The second case [pdf], Proprietary and Free Fortran Compiler Optimizations: or
GCC is Better than Supposed, Part 2, compares the performance of the code
GCC generates on a Fortran benchmark set against that from the other compilers
I have to hand on x86_64 and POWER. The bottom line is that GCC is basically
on par with Intel ifort, and somewhat behind IBM XL. There’s analysis of some
of the cases.
Disclaimer: It’s a long time since I was a GCC maintainer, so I don’t have a particular axe to grind except for free software, engineering, and measurement.