2022-01-21 containers, virtualization, updated

Building Container Images for Foreign Architectures

Update

2022-10-12

Add a comparison with native building on the POWER system.

2022-09-22

Re-done for Charliecloud 0.27+ commands and almalinux instead of centos.

2022-02-23

Stray bits in two examples fixed.

Note that CentOS 8 examples don’t work since the distribution was EOLed, there’s currently no ch-image --force support for a replacement like almalinux (but see a contribution), and there’s no Docker image of CentOS Stream 8 (though you could build from a linuxcontainers 8-Stream root). Also ch-build2dir is due for removal in version 0.27.


I do user support on a GNU/Linux POWER9-based compute system on which users want to run containers for whatever reasons. (My reasons would typically be running things for which I already have packages I can’t get installed, but in many cases it’s possible just to download and unpack the relevant ones somewhere like ~/.local, and adjust environment variables.)

The first complaint was that people didn’t have what they needed from Docker Hub, and wanted to build their own images, but thought they couldn’t without privileges for Docker.

My solution to that is to use Charliecloud. That can be installed trivially, can build images for mainstream GNU/Linux distributions, and run them, completely unprivileged (given Linux user namespaces). There may be alternatives, but Charliecloud is a nice simple, secure (modulo Linux user namespace vulnerabilities), HPC-oriented system. That means running mostly un-isolated user programs (just with a namespace to adjust ‘/’), not service daemons, for instance.

I don’t know the current state of Docker for unprivileged building, but it’s not installed on the system anyway. One ‘unprivileged’ alternative container system image builder is Red Hat’s Buildah, but that’s also not installed, and I don’t know whether you can install and run it without privileges — at least not trivially.

That was a problem raised for users who want to build ppc64le images.

Anyway, Charliecloud could keep them happy building and running container images on the system from a Dockerfile or minimal root image.

Simple QEMU use

Then there was a statement that suitable ppc64le-architecture container images needed to be built on amd64 — I’m not sure why — and they can’t be.

Can’t is a challenging word, and surprising here. The obvious, heavyweight approach is to run in a ppc64le QEMU VM on your desktop, but I couldn’t make libvirt happy with the POWER9 configuration of a Vagrant box in the limited time I spent, though it’s probably easy for an expert.

However, it’s not much of a challenge; the procedure is trivial, at least if your amd64 system is Debian-based and you can install packages, to the extent I’ve exercised it.

First, you may need to install the current Charliecloud from source (version 0.27+); at least Debian 11’s is too old in several respects, and there’s currently no backport. The installation is trivial. Then you need the qemu-user-static package (not plain qemu-user or qemu-user-binfmt). qemu-user-static installs ‘binfmt’ hooks, to run qemu-ppc64le-static when execing a ppc64le binary in my case.

That allows foreign-architecture binaries to run in the container like magic while building the image — most likely for package installations with RUN in a Dockerfile.
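To check that the hooks are actually in place before starting a build, something like the following sketch might do. It assumes the handler is registered under the name qemu-ppc64le (the naming I'd expect from Debian's package) and that binfmt_misc is mounted in the usual place; binfmt_status is just an illustrative helper name.

```shell
# Report whether a binfmt_misc handler is registered and enabled.
#   $1: handler name (assumed to follow the qemu-<arch> naming)
#   $2: binfmt_misc mount point (defaults to the usual location)
binfmt_status() {
    entry="${2:-/proc/sys/fs/binfmt_misc}/$1"
    if [ ! -r "$entry" ]; then
        echo "missing"
    elif [ "$(head -n 1 "$entry")" = enabled ]; then
        # The "interpreter" line names the program the kernel will run
        echo "enabled: $(sed -n 's/^interpreter //p' "$entry")"
    else
        echo "disabled"
    fi
}

# Example use:
#   binfmt_status qemu-ppc64le
```

If that reports "missing", foreign binaries run by RUN instructions will fail with an exec format error.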

Simple pulls

It isn’t actually necessary to use QEMU if all you want to do is pull from an image repository, or make a base image from a rootfs; i.e. something like this works anyway:

$ ch-image build -a ppc64le -f <(echo FROM debian) -t d11 .
$ ch-convert -o dir d11 d11

However, the ppc64le debian image you get in the local store will be used by a subsequent FROM debian even if you specify --arch amd64 for any subsequent build (at least with Charliecloud 0.27). You may also need to care about the cache.

Using FROM ... AS ... doesn’t help, but you can pull the image to a directory and ch-convert if you want to keep it around with a suitable name.
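Since it's easy to lose track of which architecture an unpacked image actually contains, here is a rough sketch of a check that reads the ELF header of a binary in the image directory. The helper name elf_arch is made up, and it only knows a few machine codes; it assumes the file you point it at (say d11/bin/bash) is a real ELF binary rather than a script.

```shell
# Print the architecture of an ELF file from its header fields.
elf_arch() {
    f=$1
    # Bytes 0-3 must be the ELF magic number: 0x7f 'E' 'L' 'F'
    if [ "$(od -An -c -N4 "$f" | tr -d ' ')" != '177ELF' ]; then
        echo "not ELF"
        return 1
    fi
    # EI_DATA (offset 5): 1 = little-endian, 2 = big-endian
    data=$(od -An -tu1 -j5 -N1 "$f" | tr -d ' ')
    # e_machine is a 16-bit field at offset 18, in the file's byte order
    set -- $(od -An -tu1 -j18 -N2 "$f")
    if [ "$data" = 2 ]; then
        machine=$(( $1 * 256 + $2 ))
    else
        machine=$(( $2 * 256 + $1 ))
    fi
    case $machine in
        21)  if [ "$data" = 1 ]; then echo ppc64le; else echo ppc64; fi ;;
        62)  echo x86_64 ;;
        183) echo aarch64 ;;
        *)   echo "unknown ($machine)" ;;
    esac
}

# Example use on the directory made by ch-convert above:
#   elf_arch d11/bin/bash
```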

QEMU overhead

Obviously you must pay a price speed-wise for QEMU’s architecture emulation but, at least for things like apt/yum operations, as opposed to building something, it’s acceptable. I took a long-running example originally used in anger on the POWER system, and compared cross-building the image with building natively. (That was on a laptop with an NVMe SSD and Tiger Lake CPU, on a network link with 50 MB/s download speed.)

The cross-ch-image step — the one that needs foreign binaries — took a bit more than twice as long as the native one, reproducibly to a few percent.

Comparison of cross- and native-building images to show QEMU overhead

$ ch-image -a ppc64le pull almalinux:8  # arch-independent
 [...]
$ cat <<EOF > Dockerfile.a8
FROM almalinux:8
# For installs inside container:
RUN dnf install -y fakeroot epel-release
RUN dnf install -y --setopt=install_weak_deps=false epel-release \
                'dnf-command(config-manager)' \
 && dnf config-manager --enable powertools \
 && dnf config-manager --enable epel \
 && dnf config-manager --enable appstream \
 && dnf copr enable loveshack/livhpc -y \
 && dnf install -y --setopt=install_weak_deps=false \
        cube-devel opari2 bash-completion libdwarf-devel cube \
        scorep-openmpi scorep \
 && dnf clean all
# lustre mount point
RUN mkdir /nobackup
EOF
$ time ch-image build -a ppc64le --force -t a8ppc -f Dockerfile.a8 .
 [...]
real	3m42.075s
user	2m46.660s
sys	0m3.939s
$ ch-image pull almalinux:8
$ time ch-image build --force -t a8 -f Dockerfile.a8 .
 [...]
real	1m25.537s
user	0m20.557s
sys	0m3.213s
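For the record, the ratios from the run shown above can be worked out with a couple of throwaway helpers (to_seconds and slowdown are names invented here), which take times in the format bash's time prints:

```shell
# Convert a duration like 3m42.075s to seconds
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of the first duration to the second, to two decimal places
slowdown() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

slowdown 3m42.075s 1m25.537s   # elapsed time: prints 2.60
slowdown 2m46.660s 0m20.557s   # user CPU time: prints 8.11
```

The user CPU ratio is much larger than the elapsed one, which is consistent with a good part of the elapsed time going on downloads rather than emulated computation.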

On the POWER system (AC922 nodes), the native build operation was about the same speed as on the x86 laptop if the image storage is on either local disk (SAS of some sort) or /dev/shm. The time is actually dependent on the filesystem, with the build taking about 1.8 times as long on the NFS home filesystem and 3 times as long on the Lustre filesystem (using the default single 1 MB stripe layout).

In contrast with image building, compiling under emulation is painful: after running configure, the build stage for a simple C source package under QEMU was around 10 times slower than cross-building it with the Debian powerpc64le cross-tools.

No local privileges?

Suppose you need to do this on a system on which you don’t have privileges to install the binfmt hooks QEMU uses to execute foreign binaries transparently. You might think you could do something like this after obtaining a qemu-ppc64le-static binary (probably by unpacking a suitable distribution package, as below):

FROM almalinux:8
COPY qemu-ppc64le-static /usr/bin
# need a static sh
COPY busybox /tmp/sh
SHELL ["/tmp/sh", "-c"]
RUN qemu-ppc64le-static ...

That won’t work because it doesn’t account for #! in scripts (e.g. yum/dnf), or exec’ing subprograms, only running an initial binary under the shell.

To solve the problem, you could build and use PRoot (currently x86_64, arm, and aarch64 only), which has a hook for execing with QEMU. (proot intercepts system calls made by programs running under it, and more-or-less emulates sudo chroot that way, more comprehensively than fakeroot; see udocker for an interesting use.) For example:

$ ch-image pull -a ppc64le almalinux:8
$ ch-convert -o dir almalinux:8 a8p
$ proot -q ./qemu-ppc64le-static -S a8p/ yum install -y epel-release

That has overhead from PRoot as well as QEMU, but it’s tolerable.
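If you're doing that repeatedly, a small wrapper saves typing. This is only a sketch: in_image is a made-up name, and the QEMU variable defaulting to ./qemu-ppc64le-static is my assumption about where the emulator binary sits; the proot options are the ones used above.

```shell
# Run a command as fake root inside an unpacked foreign-architecture image.
#   $1: image directory; remaining arguments: the command to run
#   QEMU: path to the static emulator (defaults to the one used above)
in_image() {
    rootfs=$1
    shift
    if [ ! -d "$rootfs" ]; then
        echo "in_image: $rootfs is not a directory" >&2
        return 1
    fi
    proot -q "${QEMU:-./qemu-ppc64le-static}" -S "$rootfs" "$@"
}

# Example use:
#   in_image a8p/ yum install -y epel-release
```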

Apart from foreign images, PRoot may be useful for installing packages in images for distributions missing --force support in ch-image.

An example of extracting the relevant qemu static binary from Debian (into ./usr/bin), if you have dpkg, is:

$ wget -O- http://ftp.debian.org/debian/pool/main/q/qemu/\
qemu-user-static_6.2+dfsg-1_amd64.deb |
  dpkg-deb -x - .

With RPM tools it might be

$ rpm2cpio https://dl.fedoraproject.org/pub/fedora/linux/releases/34/\
Everything/x86_64/os/Packages/q/qemu-user-static-5.2.0-5.fc34.1.x86_64.rpm |
  cpio -id

dpkg-deb may be available on an RPM-based system, and rpm2cpio on a Debian derivative but, if necessary, you can download a .deb file, and extract the contents:

$ ar x qemu-user-static_6.2+dfsg-1_amd64.deb data.tar.xz
$ tar fx data.tar.xz

That’s quicker and simpler than building a Charliecloud image with a qemu-user-static package installed, which you might otherwise do to run a packaged program.

No registry image?

If there isn’t a registry from which you can pull the base image of interest (others than hub.docker.com are available), you could investigate https://images.linuxcontainers.org/images, which has images intended for LXC, but generally useful.

This isn’t such a case, but illustrates the principle:

$ wget https://images.linuxcontainers.org/images/almalinux/\
8/ppc64el/default/20220922_23:08/rootfs.tar.xz
$ ch-image import rootfs.tar.xz ppcalma:8  # can't pipe in

Since there’s nothing like ‘latest’ on linuxcontainers, you have to look for the image du jour for the distribution, unfortunately. (Another date’s image failed because the tarball contained an absolute symbolic link. You can unpack a tarball, fix any such problems, and just import the directory instead.)
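Both chores can be scripted roughly as follows. This is a sketch under assumptions: that the server returns a directory index containing build names of the form YYYYMMDD_HH:MM (as in the URL above), and that your find supports -lname (GNU find does). The function names are invented for illustration.

```shell
# Pick the most recent build name out of a directory index read on stdin.
# Names of the form YYYYMMDD_HH:MM sort chronologically as plain text.
latest_build() {
    grep -o '[0-9]\{8\}_[0-9]\{2\}:[0-9]\{2\}' | sort -u | tail -n 1
}

# List symbolic links with absolute targets in an unpacked tarball,
# i.e. the ones that may need fixing before importing the directory.
absolute_symlinks() {
    find "$1" -type l -lname '/*'
}

# Example use:
#   base=https://images.linuxcontainers.org/images/almalinux/8/ppc64el/default
#   build=$(wget -qO- "$base/" | latest_build)
#   wget "$base/$build/rootfs.tar.xz"
```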