
Supporting aarch64/ppc64le/s390x/x86_64 in the future when /dev/kvm is not available #2782

Open
miabbott opened this issue Mar 30, 2022 · 21 comments


@miabbott
Member

We need a solution for continuing to use coreos-assembler across all supported architectures where /dev/kvm is not available, in order to support our strongly coupled build+test model that we have adopted.

In the near future, /dev/kvm will not be available on newer ppc64le platforms, which will break our ability to produce CoreOS builds on that architecture. We may be able to limp along on older ppc64le hardware, but we will eventually reach a point where that is no longer possible.

How will we be able to continue to support our opinionated, container-based build+test model using coreos-assembler in that future?

@cgwalters
Member

There are a variety of options.

First, we could use full qemu emulation (I assume this exists and will continue to do so) - it will just be much slower and may have bugs.

Doing this just for the build side and a few quick sanity checks, and then having the rest of the workload passed over to other infrastructure would likely be tenable.
(We really have this issue across the board; I am proud of our integrated build+testing with qemu, but we are overly reliant on it and should do more to schedule tests on available cloud and other platforms, even just on x86_64/aarch64.)

The other option is to have a special case path that talks to a remote virt platform and does a build using a real privileged container that also uses loopback mounts to create the filesystem. This is much like what https://www.osbuild.org/ is doing (except they install as an RPM, not a privileged container, but same idea with loopback)

Or, we could not special case that and do it across the board, but IMO it completely ruins the elegance of our current model of just being a regular unprivileged container that can run via podman locally or in a stock Kubernetes/OCP cluster without any extra setup.

(For example, talking to an external service brings in all the usual problems around resource lifecycles - what cleans up these transient remote worker VMs if a job gets terminated, etc. These are solvable problems and have been solved, but are also nontrivial to deal with operationally)

@nullr0ute

So osbuild/Image Builder already supports all of this functionality. Maybe it's a good time to consider how coreos-assembler may be able to make use of that infrastructure.

@teg

teg commented Apr 5, 2022

The Image Builder team would be very happy to look into collaborating on building CoreOS images through our service in api.openshift.com, if that's something you'd be interested in adopting.

The service is currently used to build RHEL for Edge and (soon) Fedora IoT, as well as all the RHEL cloud and virt images. We aspire to be able to build all Fedora/CentOS/RHEL, so CoreOS is obviously a natural candidate.

The benefits that come to mind would be:

  • you would no longer need to maintain any build infrastructure of your own (you'd still need to orchestrate builds, but you wouldn't need to do the actual building, so you wouldn't need to worry about IBM workers, etc.)
  • we would use the same tooling/infrastructure to build all OSTree-based artefacts, ensuring feature parity of the tools
  • hopefully we'd be able to collaborate more closely, avoid duplicated effort, and move towards uniformity, where it makes sense, between OSTree-based and rpm-based Fedora/CentOS/RHEL.

What would need to happen:

  • we would need to figure out what API extensions would be necessary for the service to be useable to you
  • someone would need to do the work to implement any gaps

Potential challenges:

  • your current developer workflow might be quite different to how we currently develop image definitions, so I don't know how much work it would be to bridge this gap.
  • we would need to figure out when we would all have the time to schedule all the work.

It is worth mentioning that osbuild can be made to work in (privileged) containers, which is how we develop locally, as I know that was a question in the past.

@cverna
Member

cverna commented Apr 6, 2022

I think it is indeed interesting to look back at osbuild; there is value in code reuse and avoiding duplicated effort. Regarding the build infrastructure, there are obvious pros, as @teg mentioned, but there are also cons: not owning the tool we use to build the OS means being slower to innovate in how we build and test the OS. Basically, this adds an external dependency to our build process.

Another point I remember being discussed is that cosa currently also does the testing of the OS. I don't know whether osbuild could support that, but having the same tool build and test the OS is a big advantage of cosa, IMO.

@miabbott
Member Author

miabbott commented Apr 6, 2022

Another point I remember being discussed is that cosa currently also does the testing of the OS. I don't know whether osbuild could support that, but having the same tool build and test the OS is a big advantage of cosa, IMO.

Big emphasis on this. We gain mountains of value by coupling our build+test operations using the same tool.

@cgwalters
Member

I just want to say first I'd love for our teams to work together more! I think just having regular ongoing communication is the first step and so it's great to have this thread!

you would no longer need to maintain any build infrastructure of your own

To repeat what others have said, though: testing is also very core to what we do. And even in a world where we outsource our builds to some distinct infrastructure, there's quite a lot of infrastructure we maintain for that. (Beyond testing there is also the release pipeline, which together form a complete CI/CD system.)

Another way to look at this is that, for us, the build system is an implementation detail; the output of the build system is what we present to our users, and we spend a lot of time on that.

Even on just the build system side, we go through phases where delivering new features sometimes means making deep changes to how we build things. I'm definitely in the process of doing that right now.

But on the flip side, I could imagine that we try to use osbuild internally somehow, and as long as we can have a fast turnaround time on shipping that, I think we can start that integration story. I completely agree that tighter integration long term makes sense, let's figure out how to do it in a way that helps both of us!

@cverna
Member

cverna commented Jul 14, 2022

Is kvm only needed for the build phase, or is it also needed to run our tests? This is not something that is really clear to me right now. In other words, I would like to better understand the scope of our dependency on kvm and, roughly, the impact and effort it would take for us to implement a solution that does not require kvm.
I think we really need to have that discussion now, as finding compatible ppc64le hardware is going to be a challenge in the near future.

@cgwalters
Member

We don't strictly need kvm. That's what #2782 (comment) is touching on.

@dustymabe
Member

Is kvm only needed for the build phase, or is it also needed to run our tests?

I think it's also needed for tests. As Colin mentions we could use emulation for tests, but I think we're going to spend a lot of time chasing issues that aren't real issues. One example that comes to mind is race conditions in other software that we wouldn't hit otherwise.

If we fall back to emulation, we might as well just emulate ppc64le on an x86_64 builder and not worry about ppc64le hardware at all.

@cgwalters
Member

Yes, but to reiterate: I have a strong opinion that we should use qemu for basic sanity testing (does this new systemd boot?) immediately after making a build - and that's easy to do if the systems are co-located. But even that's not strictly required.

But I think we have also evolved to be too qemu-focused. After basic sanity tests are done - and to make this concrete - on ppc64le we would generate a PowerVS image, pass it off to a remote PowerVS instance, boot it there, and run probably most of the kola tests there.
This really applies across all architectures - we don't need to run all kola tests on qemu and AWS all the time, for example. This relates to "test tiering": we should identify tests that must pass all the time versus ones we can run more periodically.
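
To make the tiering idea concrete, here is a minimal sketch of how that split could look with kola as it exists today; the choice of AWS for the periodic tier and the exact flags are illustrative assumptions, not an agreed policy:

# Tier 1: quick sanity tests that gate every build, run against local qemu
$ cosa kola run --basic-qemu-scenarios

# Tier 2: the broader suite, run periodically on a real cloud platform
# (credentials and image/AMI selection are elided here)
$ cosa kola run --platform=aws --parallel 2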

@cgwalters
Member

cgwalters commented Jul 14, 2022

If we fall back to emulation, we might as well just emulate ppc64le on an x86_64 builder and not worry about ppc64le hardware at all.

Mmmm...I think doing some of that could make sense for pull requests in particular. It's much slower, but would in some cases indeed help us do some types of testing. Concretely, I think COPR for example is doing s390x builds via cross-architecture emulation - and that can make sense.

But no matter what we are definitely going to be building and testing across all architectures. We are not going to ship something we didn't test (in a "real" deployment scenario e.g. OpenStack/PowerVS/AWS as opposed to our use of "synthetic" environments like raw qemu). We're just debating where each part happens.

@cgwalters
Member

Concretely, did anyone try a spike on builds/tests using export COSA_NO_KVM=1?
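
For whoever picks this up, a minimal sketch of what such a spike could look like inside a cosa container on a host without /dev/kvm (timings will obviously vary):

# force cosa to fall back to full qemu (TCG) emulation instead of /dev/kvm
$ export COSA_NO_KVM=1
$ time cosa build
# then run just the quick sanity tier under the same emulation
$ cosa kola run --basic-qemu-scenarios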

@cgwalters
Member

cgwalters commented Jul 20, 2022

One thing I think we will need is to teach kola about PowerVS. It looks like there's https://github.com/IBM-Cloud/power-go-client,
which is already used by openshift/installer, for example.
Bigger picture, we may also benefit from using e.g. Terraform in more places, like that project does.

Another variant of this: add env OPENSHIFT_INSTALL_COREOS_PROVISION_ONLY_WITH_IGNITION=foo.ign OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=./builds/ppc64le/fedora-coreos-xyz.powervs openshift-install create bootstrap that runs just the bits of openshift-install that provision an image on powervs, and e.g. we pass it the Ignition config and OS image that we want to test. That's 90% of the heavy lifting of what kola is doing.
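
For comparison, if kola did grow a native powervs platform, an invocation could look roughly like the following; the powervs-specific flags are purely hypothetical placeholders, since none of this exists yet:

# hypothetical sketch - kola has no powervs platform today, and these flags are invented
$ cosa kola run --platform=powervs \
    --powervs-image ./builds/ppc64le/fedora-coreos-xyz.powervs \
    --powervs-region us-south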

@cgwalters
Member

If full qemu emulation doesn't work for "basic build and test scenarios" (and I'd be quite surprised if this was the case, particularly since right now we're using the qemu from Fedora which doesn't have features cut out of it), then I think the fallback here is going to look more like:

  • add a new flavor of "remote" build to cosa that does things outside of qemu - this is a lot like what osbuild does; if we can reuse some of that code, all the better, but I also think conflating the two is going to be messy. One thing that would likely help a lot here is to use the podman remote flow, but with privileged containers (see the sketch below).

Regardless of the kvm/qemu situation, to repeat: I think we clearly need PowerVS support; that can happen in parallel. (Though, out of curiosity, will there be any OpenStack or other remote virt API support for ppc64le going forward, or is that out too?)
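
As a rough sketch of the podman remote flow mentioned above, assuming a remote ppc64le host reachable over ssh and the stock coreos-assembler image (hostname, user, and paths are placeholders):

# point podman at a remote ppc64le host via podman's ssh-based remote flow
$ export CONTAINER_HOST=ssh://builder@ppc64le-host.example.com/run/user/1000/podman/podman.sock
# run cosa there as a privileged container; the proposed "remote build" flavor
# could then use loopback mounts on that host instead of qemu
$ podman --remote run --rm -ti --privileged \
    -v /srv/fcos:/srv --workdir /srv \
    quay.io/coreos-assembler/coreos-assembler:latest build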

@mrcxavier

mrcxavier commented Jul 26, 2022

These are the messages that it reported during "cosa build"

+ kola qemuexec -m 4096 --auto-cpus -U --workdir none --console-to-file /srv/fcos/tmp/build.qemu/runvm-console.txt -- -drive if=none,id=root,format=raw,snapshot=on,file=/srv/fcos/tmp/build.qemu/supermin.build/root,index=1 -device virtio-blk,drive=root -kernel /srv/fcos/tmp/build.qemu/supermin.build/kernel -initrd /srv/fcos/tmp/build.qemu/supermin.build/initrd -no-reboot -nodefaults -device virtio-serial -virtfs local,id=workdir,path=/srv/fcos,security_model=none,mount_tag=workdir -append 'root=/dev/vda console=hvc0 selinux=1 enforcing=0 autorelabel=1' -device virtserialport,chardev=virtioserial0,name=cosa-cmdout -chardev stdio,id=virtioserial0 -drive if=none,id=target,format=qcow2,file=/srv/fcos/tmp/build.qemu/fedora-coreos-36.20220726.dev.0-qemu.ppc64le.qcow2.tmp,cache=unsafe -device virtio-blk,serial=target,drive=target
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
=========================
--- FAIL: basic (801.57s)
        cluster.go:94: kolet:
2022-07-22T18:00:35Z kolet: Found content in /etc/sysconfig/network-scripts
    --- FAIL: basic/NetworkScripts (37.67s)
            cluster.go:97: kolet: Process exited with status 1
=========================

qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ccf-assist=on

@travier
Member

travier commented Jul 26, 2022

@mrcxavier Can we get more details? How much time did the build take?

It looks like the problem in the output above is about a QEMU feature that does not work anymore on new Power.

@cgwalters
Member

cgwalters commented Jul 26, 2022

These are the messages that it reported during "cosa build"

Just from cosa build? You're not also doing cosa kola ...?

2022-07-22T18:00:35Z kolet: Found content in /etc/sysconfig/network-scripts

This seems like an unrelated bug...has this test been passing in current ppc64le pipelines?

it looks like the problem in the output above is about a QEMU feature that does not work anymore on new Power.

AFAICS the output from qemu here is warnings, not fatal errors. If we got this far, ISTM we must have generated a build. So the next question is around which tests run - and how many of those tests to actually run in qemu versus delegating to powervs.

@dustymabe
Member

One thing I think we will need is to teach kola about PowerVS.

I've been feeling for a while that we have a decent amount of virt code (think of our build process that uses supermin and qemu directly) dealing with a lot of intricacies (e.g. platform differences and whatnot) that we should probably try to leverage another project for.

I'm wondering (assuming libvirt has support for PowerVM) if we should try to refactor our code to use libvirt instead and drop a lot of the code that is going to bitrot over time and that we aren't really experts in. Theoretically this would allow us to use PowerVM on Power and KVM everywhere else.

Not sure how feasible this is. Interested in others' thoughts here.

@cgwalters
Member

cgwalters commented Jul 27, 2022

OK first, I am getting confused between "PowerVM" and "PowerVS". I think PowerVM is a bit like KVM+qemu, and PowerVS is more like e.g. OpenStack or KubeVirt?

assuming libvirt has support for PowerVM

It's not listed here: https://libvirt.org/drivers.html

I think the reason may be actually that the PowerVM code is proprietary?

Anyways, to re-summarize my proposal here:

  • Use qemu "full emulation" without KVM for builds and basic tests
  • Generate a PowerVS image, upload to remote PowerVS instance, run tests there

"run tests with PowerVS" requires teaching kola about powervs. Alternatively...one thing I think could work decently well is for us to add a mode to openshift-install which literally just provisions a single node and runs Ignition. It already has all the code to do that; a lot of overlap with what kola is doing around provisioning and ssh, etc.

What you're talking about is really different; it's more like "use PowerVM everywhere we use qemu/kvm". I don't know the feasibility of that. It might indeed be the case that we could install the PowerVM userspace in our container, but the real next problem is that we have a lot of dependencies on qemu specifically. I think a wholesale port to libvirt would only make sense if there were a PowerVM backend for libvirt - as you say.

@mrcxavier

Scenario 1 (export COSA_NO_KVM=1)
sh-5.1# time cosa build
real 34m28.159s
user 32m22.378s
sys 7m46.512s
qemu warning messages were shown

Scenario 2
sh-5.1# time cosa build
real 7m59.758s
user 4m26.571s
sys 0m38.811s

no qemu warning messages

@travier
Member

travier commented Jul 29, 2022

If it takes 8 min for a build I'd say it's perfectly fine. How long does a full kola run take?

$ cosa kola run --basic-qemu-scenarios
$ cosa kola run --parallel 2
