GCC Code Generation v. Manual Vectorization and Proprietary Compilers

Update

29-03-2022: On the comparison with proprietary compilers, Polyhedron has published more recent results that make GCC look even worse than before. On GNU/Linux they used ifort 2019.5 and gfortran 7.4. As far as I can tell, GCC 9.1 was already available when that ifort was released. Also, gfortran was run without -ffast-math, and without -fprofile-use or -flto.

Many people underestimate GCC in comparison with proprietary compilers, and perhaps underestimate optimizing compilers in general. (Remember the experience of even the first optimizing compiler.)

I wrote up a couple of cases of some recent work with it.

Versus Manual Intrinsic Coding

The first case [pdf], "Auto-vectorized MPI Reduction Operations: or GCC is Better than Supposed, Part 1", generalizes and extends the vectorization of reduction operations in OpenMPI. That vectorization was found to be important, but it had previously been done manually with compiler intrinsics, and only for x86_64. (But I'm working on a POWER9 system.)

The paper that introduced the intrinsics takes a different position, but it turns out to be easy to get most of the relevant operations auto-vectorized, with GCC or other compilers, and independently of the target architecture. In fact, more operations get vectorized this way than were done manually. (The easy way doubtless wouldn't make a conference paper…)

Versus Proprietary Compilers

The second case [pdf], "Proprietary and Free Fortran Compiler Optimizations: or GCC is better than Supposed, Part 2", compares the performance of the code GCC generates on a Fortran benchmark set against the other compilers I have to hand on x86_64 and POWER. The bottom line is that GCC is basically on par with Intel ifort, and somewhat behind IBM XL. The paper also analyses some of the individual cases.

Disclaimer: It’s a long time since I was a GCC maintainer, so I don’t have a particular axe to grind except for free software, engineering, and measurement.