2022-01-21
Building Container Images for Foreign Architectures
Update
- 2022-10-12: Add a comparison with native building on the POWER system.
- 2022-09-22: Re-done for Charliecloud 0.27+ commands and almalinux instead of
  centos.
- 2022-02-23: Stray bits in two examples fixed.
Note that CentOS 8 examples don’t work since the distribution was EOLed,
there’s currently no ch-image --force support for a replacement like
almalinux (but see a contribution), and there’s no Docker image of CentOS
Stream 8 (though you could build from a linuxcontainers 8-Stream root). Also
ch-build2dir is due for removal in version 0.27.
I do user support on a GNU/Linux POWER9-based compute system on which users
want to run containers for whatever reasons. (My reasons would typically be
running things for which I already have packages I can’t get installed, but
in many cases it’s possible just to download and unpack the relevant ones
somewhere like ~/.local, and adjust environment variables.)
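As a minimal sketch of the "adjust environment variables" step, assuming the packages were unpacked under ~/.local with a usr/bin and usr/lib64 layout (that layout, and the helper name prepend_path, are illustrative assumptions, not something the system prescribes):

```shell
#!/bin/sh
# Prepend directory $1 to the colon-separated list $2, unless it is
# already present, and print the result.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Hypothetical unpack location; adjust to where the files actually landed.
prefix=$HOME/.local
PATH=$(prepend_path "$prefix/usr/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$prefix/usr/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Other variables (MANPATH, PKG_CONFIG_PATH and so on) can be treated the same way if the unpacked packages need them.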
The first complaint was that people didn’t have what they needed from Docker
Hub, and wanted to build their own images, but thought they couldn’t without
privileges for Docker.
My solution to that is to use Charliecloud. That can be installed trivially,
can build images for mainstream GNU/Linux distributions, and run them,
completely unprivileged (given Linux user namespaces). There may be
alternatives, but Charliecloud is a nice simple, secure (modulo Linux user
namespace vulnerabilities), HPC-oriented system. That means running mostly
un-isolated user programs, just with a namespace to adjust ‘/’, not service
daemons, for instance.
I don’t know the current state of Docker for unprivileged building, but it’s not installed on the system anyway. One ‘unprivileged’ alternative container system image builder is Red Hat’s Buildah, but that’s also not installed, and I don’t know whether you can install and run it without privileges — at least not trivially.
That was a problem raised for users who want to build ppc64le images.
Anyway, Charliecloud could keep them happy building and running container
images on the system from a Dockerfile or minimal root image.
Simple QEMU use
Then there was a statement that suitable ppc64le-architecture container images needed to be built on amd64 — I’m not sure why — and they can’t be.
Can’t is a challenging word (and surprising here). The obvious, heavyweight,
approach is to run in a ppc64le QEMU VM on your desktop, but I couldn’t make
libvirt happy with the POWER9 configuration of a Vagrant box in the limited
time I spent, though it’s probably easy for an expert.
However, it’s not much of a challenge; the
procedure is trivial, at least if your amd64 system is Debian-based and you
can install packages, to the extent I’ve exercised it.
First, you may need to install the current Charliecloud from source (version
0.27+); at least Debian 11’s is too old in several respects, and there’s
currently no backport. The installation is trivial. Then you need the
qemu-user-static package (not plain qemu-user or qemu-user-binfmt).
qemu-user-static installs ‘binfmt’ hooks, to run qemu-ppc64le-static when
exec’ing a ppc64le binary in my case.
That allows foreign-architecture binaries to run in the container like magic
while building the image, most likely for package installations with RUN in a
Dockerfile.
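Before building, it may be worth checking that the hook really is registered. The sketch below reads the kernel’s binfmt_misc entry; the handler name qemu-ppc64le is an assumption about what the Debian package registers, and the function names are made up for illustration.

```shell
#!/bin/sh
# Report the state of a binfmt_misc handler: enabled, disabled or missing.
# $1: handler name (assumed here to be qemu-ppc64le)
# $2: binfmt_misc directory (defaults to the usual mount point)
binfmt_status() {
    entry="${2:-/proc/sys/fs/binfmt_misc}/$1"
    if [ ! -r "$entry" ]; then
        echo missing
    elif [ "$(head -n 1 "$entry")" = enabled ]; then
        echo enabled
    else
        echo disabled
    fi
}

# Print the interpreter the kernel will run for that handler, if any.
binfmt_interpreter() {
    entry="${2:-/proc/sys/fs/binfmt_misc}/$1"
    [ -r "$entry" ] && sed -n 's/^interpreter //p' "$entry"
}
```

Running binfmt_status qemu-ppc64le on the build host should report enabled if the package installation did its job; anything else suggests the RUN steps of a foreign-architecture build will fail to exec.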
Simple pulls
It isn’t actually necessary to use QEMU if all you want to do is pull from an image repository, or make a base image from a rootfs; i.e. something like this works anyway:
$ ch-image build -a ppc64le -f <(echo FROM debian) -t d11 .
$ ch-convert -o dir d11 d11
However, the ppc64le debian image you get in the local store will be used by
a subsequent FROM debian even if you specify --arch amd64 for any subsequent
build (at least with Charliecloud 0.27; you may also need to care about the
cache). Using FROM ... AS ... doesn’t help, but you can pull the image to a
directory and ch-convert if you want to keep it around with a suitable name.
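One way to keep per-architecture copies apart is to derive the directory name from the image and architecture. This is only a sketch: the naming scheme and the function names are hypothetical, and the two commands are the pull and ch-convert invocations used elsewhere in this post.

```shell
#!/bin/sh
# Directory name for an architecture-specific copy, e.g. almalinux:8 and
# ppc64le give almalinux_8-ppc64le (the scheme is just an illustration).
arch_name() {
    printf '%s-%s\n' "$(printf '%s' "$1" | tr ':/' '__')" "$2"
}

# Pull image $1 for architecture $2 and convert it to a directory named
# by arch_name. Set RUN=echo to print the commands instead of running them.
keep_arch_copy() {
    ${RUN:-} ch-image -a "$2" pull "$1" &&
        ${RUN:-} ch-convert -o dir "$1" "$(arch_name "$1" "$2")"
}
```

For instance, RUN=echo keep_arch_copy debian ppc64le shows what would be run without touching the image store.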
QEMU overhead
Obviously you must pay a price speed-wise for QEMU’s architecture emulation
but, at least for things like apt/yum operations, as opposed to building
something, it’s acceptable. I took a long-running example originally used in
anger on the POWER system, and compared cross-building the image with
building natively (on a laptop with an NVMe SSD and Tiger Lake CPU on a
∼50 MB/s download speed network link). The cross ch-image step, the one that
needs foreign binaries, took a bit more than twice as long as the native one,
reproducibly to a few percent.
Comparison of cross- and native-building images to show QEMU overhead
$ ch-image -a ppc64le pull almalinux:8 # arch-independent
[...]
$ cat <<EOF > Dockerfile.a8
FROM almalinux:8
# For installs inside container:
RUN dnf install -y fakeroot epel-release
RUN dnf install -y --setopt=install_weak_deps=false epel-release \
'dnf-command(config-manager)' \
&& dnf config-manager --enable powertools \
&& dnf config-manager --enable epel \
&& dnf config-manager --enable appstream \
&& dnf copr enable loveshack/livhpc -y \
&& dnf install -y --setopt=install_weak_deps=false \
cube-devel opari2 bash-completion libdwarf-devel cube \
scorep-openmpi scorep \
&& dnf clean all
# lustre mount point
RUN mkdir /nobackup
EOF
$ time ch-image build -a ppc64le --force -t a8ppc -f Dockerfile.a8 .
[...]
real 3m42.075s
user 2m46.660s
sys 0m3.939s
$ ch-image pull almalinux:8
$ time ch-image build --force -t a8 -f Dockerfile.a8 .
[...]
real 1m25.537s
user 0m20.557s
sys 0m3.213s
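To turn the time output above into a ratio, something like the following sketch works; the helper names are made up, and it only handles the XmY.YYYs format that bash’s time prints. For the two elapsed times in the listing it gives roughly 2.6.

```shell
#!/bin/sh
# Convert a duration in bash `time` format (e.g. 3m42.075s) to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the ratio of two such durations to two decimal places.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

ratio 3m42.075s 1m25.537s   # elapsed: cross build vs native build
ratio 2m46.660s 0m20.557s   # user CPU: cross build vs native build
```

The user CPU ratio is much larger than the elapsed one, presumably because a good part of the elapsed time is downloading, which emulation doesn’t slow down.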
On the POWER system (AC922 nodes), the native build operation was about the
same speed as on the x86 laptop if the image storage is on either local disk
(SAS of some sort) or /dev/shm. The time is actually dependent on the
filesystem, with the build taking ∼1.8 times as long on the NFS home
filesystem and 3 times on the Lustre filesystem (using the default single
1 MB stripe layout).
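Given that dependence, it may help to point the image storage at a fast local filesystem. The following is a sketch, assuming ch-image honours the CH_IMAGE_STORAGE environment variable as documented; the candidate directories and the pick_storage helper are illustrative, and anything under /dev/shm lives in memory and vanishes on reboot.

```shell
#!/bin/sh
# Print the first argument that is an existing, writable directory;
# return non-zero if none qualifies.
pick_storage() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Prefer /dev/shm, then /tmp, over a network home directory.
if base=$(pick_storage /dev/shm /tmp); then
    CH_IMAGE_STORAGE=$base/$(id -un).ch
    export CH_IMAGE_STORAGE
fi
```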
In contrast with image building, after running configure, the build stage for
a simple C source package under emulation was around 10 times slower than
cross-building it with the Debian powerpc64le cross-tools.
No local privileges?
Suppose you need to do this on a system on which you don’t have privileges to
install the binfmt hooks QEMU uses to execute foreign binaries transparently.
You might think you could do something like this after obtaining a
qemu-ppc64le-static binary (probably by unpacking a suitable distribution
package, as below):
FROM almalinux:8
COPY qemu-ppc64le-static /usr/bin
# need a static sh
COPY busybox /tmp/sh
SHELL ["/tmp/sh", "-c"]
RUN qemu-ppc64le-static ...
That won’t work because it doesn’t account for #! in scripts (e.g. yum/dnf),
or exec’ing subprograms, only running an initial binary under the shell.
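To see which programs in an unpacked image are affected, you can look at their first line: for a #! script the kernel execs the named interpreter, which is itself a foreign binary in the image. A small sketch (the function name is made up):

```shell
#!/bin/sh
# Print the interpreter named on a file's #! line; print nothing for
# files that don't start with #! (e.g. ELF binaries).
shebang_of() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p'
}
```

For example, shebang_of a8p/usr/bin/dnf on an image unpacked into a directory a8p shows the interpreter that would have to run under QEMU as well.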
To solve the problem, you could build and use PRoot (currently x86_64, arm,
and aarch64 only), which has a hook for exec’ing with QEMU.
(proot intercepts system calls made by programs running under it, and
more-or-less emulates sudo and chroot that way, more comprehensively than
fakeroot; see udocker for an interesting use.)
For example:
$ ch-image pull -a ppc64le almalinux:8
$ ch-convert -o dir almalinux:8 a8p
$ proot -q ./qemu-ppc64le-static -S a8p/ yum install -y epel-release
That has overhead from PRoot as well as QEMU, but it’s tolerable.
Apart from foreign images, PRoot may be useful for installing packages in
images for distributions missing --force support in ch-image.
An example of extracting the relevant qemu static binary from Debian (into
./usr/bin), if you have dpkg, is:
$ wget -O- http://ftp.debian.org/debian/pool/main/q/qemu/\
qemu-user-static_6.2+dfsg-1_amd64.deb |
dpkg-deb -x - .
With RPM tools it might be
$ rpm2cpio https://dl.fedoraproject.org/pub/fedora/linux/releases/34/\
Everything/x86_64/os/Packages/q/qemu-user-static-5.2.0-5.fc34.1.x86_64.rpm |
cpio -id
dpkg-deb may be available on an RPM-based system, and rpm2cpio on a Debian
derivative but, if necessary, you can download a .deb file, and extract the
contents:
$ ar x qemu-user-static_6.2+dfsg-1_amd64.deb data.tar.xz
$ tar fx data.tar.xz
That’s quicker and simpler than building a Charliecloud image with a
qemu-user-static package installed, which you might otherwise do to run a
packaged program.
No registry image?
If there isn’t a registry from which you can pull the base image of interest
(others than hub.docker.com are available), you could investigate
https://images.linuxcontainers.org/images, which has images intended for LXC,
but generally useful.
This isn’t such a case, but illustrates the principle:
$ wget https://images.linuxcontainers.org/images/almalinux/\
8/ppc64el/default/20220922_23:08/rootfs.tar.xz
$ ch-image import rootfs.tar.xz ppcalma:8 # can't pipe in
Since there’s nothing like ‘latest’ on linuxcontainers, you have to look for the image du jour for the distribution, unfortunately. (Another date’s image failed because the tarball contained an absolute symbolic link. You can unpack a tarball, fix any such problems, and just import the directory instead.)
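As a sketch of the "fix any such problems" step, the following lists absolute symbolic links in an unpacked root and rewrites them as relative links that stay inside it. Rewriting them this way is an assumption about a suitable fix, not necessarily what the import needs in every case; it also assumes GNU realpath, and the function names are made up.

```shell
#!/bin/sh
# List symlinks under root directory $1 whose targets are absolute paths.
list_abs_symlinks() {
    root=$1
    find "$root" -type l | while IFS= read -r link; do
        case $(readlink "$link") in
            /*) printf '%s\n' "$link" ;;
        esac
    done
}

# Replace each absolute symlink under $1 with a relative one pointing at
# the same path inside the root. Needs GNU realpath for -m, -s and
# --relative-to.
relativize_symlinks() {
    root=$1
    list_abs_symlinks "$root" | while IFS= read -r link; do
        target=$(readlink "$link")
        rel=$(realpath -m -s --relative-to="$(dirname "$link")" "$root$target")
        ln -sfn "$rel" "$link"
    done
}
```

After unpacking the tarball into, say, a directory rootfs, list_abs_symlinks rootfs shows what needs attention, and relativize_symlinks rootfs applies the rewrite before importing the directory.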