Supporting aarch64/ppc64le/s390x/x86_64 in the future when /dev/kvm is not available
#2782
There are a variety of options. First, we could use full qemu emulation (I assume this exists and will continue to do so) - it will just be much slower and may have bugs. Doing this only for the build side and a few quick sanity checks, and then handing the rest of the workload over to other infrastructure, would likely be tenable.

The other option is to have a special-case path that talks to a remote virt platform and does a build using a real privileged container that also uses loopback mounts to create the filesystem. This is much like what https://www.osbuild.org/ is doing (except they install as an RPM, not a privileged container, but same idea with loopback).

Or, we could not special-case that and do it across the board, but IMO that completely ruins the elegance of our current model of just being a regular unprivileged container that can run via … (For example, talking to an external service brings in all the usual problems around resource lifecycles - what cleans up these transient remote worker VMs if a job gets terminated, etc. These are solvable problems and have been solved, but they are also nontrivial to deal with operationally.)
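To make the emulation fallback concrete, here is a minimal sketch (not coreos-assembler's actual code) of how a build wrapper could prefer KVM when /dev/kvm is present and fall back to plain TCG emulation otherwise; the qemu binary name, memory size, and disk image path are placeholders:

```go
// Minimal sketch: pick a qemu accelerator based on whether /dev/kvm is
// present, falling back to pure TCG emulation (much slower) when it is not.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// qemuAccelArgs returns the accelerator arguments to pass to qemu.
// KVM is preferred; TCG (full emulation) is the slow fallback.
func qemuAccelArgs() []string {
	if _, err := os.Stat("/dev/kvm"); err == nil {
		return []string{"-accel", "kvm"}
	}
	return []string{"-accel", "tcg"}
}

func main() {
	// "qemu-system-ppc64" and "disk.qcow2" are placeholders for whatever
	// binary and image the build process actually boots.
	args := append(qemuAccelArgs(), "-nographic", "-m", "4096",
		"-drive", "file=disk.qcow2,if=virtio")
	cmd := exec.Command("qemu-system-ppc64", args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "qemu failed:", err)
		os.Exit(1)
	}
}
```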
So osbuild/Image Builder already supports all of this functionality. Maybe it's a good time to consider how coreos-assembler may be able to make use of that infrastructure.
The Image Builder team would be very happy to look into collaborating on building CoreOS images through our service at api.openshift.com, if that's something you'd be interested in adopting. The service is currently used to build RHEL for Edge and (soon) Fedora IoT, as well as all the RHEL cloud and virt images. We aspire to be able to build all of Fedora/CentOS/RHEL, so CoreOS is obviously a natural candidate. The benefits that come to mind would be:
What would need to happen:
Potential challenges:
It is worth mentioning that …
I think it is, indeed, interesting to look back at osbuild. There is real value in code reuse and avoiding duplicated effort. In regards to the build infrastructure, there are obviously pros, as @teg mentioned, but there are also cons: not owning the tool we use to build the OS means being slower to innovate on how we build and test the OS; basically, this adds an external dependency to our build process. Another point that I remember being discussed is that cosa currently also does the testing of the OS. I don't know whether osbuild could support that, but having the same tool to build and test the OS is a big advantage of cosa IMO.
Big emphasis on this. We gain mountains of value by coupling our build+test operations using the same tool.
I just want to say first that I'd love for our teams to work together more! I think just having regular, ongoing communication is the first step, so it's great to have this thread!
To repeat what others have said, though: testing is also very core to what we do. And even in a world where we outsource our builds to some distinct infrastructure, there's quite a lot of infrastructure we maintain for that (also, beyond testing, the release pipeline, forming a complete CI/CD system). Another way to look at this is that for us the build system is an implementation detail; the output of the build system is what we present to our users, and we spend a lot of time on that. Even on just the build system side, we go through phases where, to deliver new features, we make deep changes to how we build things - I'm definitely in the middle of doing that right now. But on the flip side, I could imagine us trying to use osbuild internally somehow, and as long as we can keep a fast turnaround time on shipping, I think we can start that integration story. I completely agree that tighter integration makes sense long term; let's figure out how to do it in a way that helps both of us!
Is kvm only needed for the build phase, or is it also needed to run our tests? This is not something that is really clear to me right now. In other words, I would like to better understand the scope of our dependency on kvm, and what the (rough) impact and effort would be for us to implement a solution that does not require kvm.
We don't strictly need kvm. That's what #2782 (comment) is touching on.
I think it's also needed for tests. As Colin mentions, we could use emulation for tests, but I think we're going to spend a lot of time chasing issues that aren't real issues. One example that comes to mind is race conditions in other software that we wouldn't hit otherwise. If we fall back to emulation, we might as well just emulate ppc64le on an x86_64 builder and not bother with ppc64le hardware at all.
Yes, but to reiterate: I have a strong opinion that we should use qemu for basic sanity testing (does this new systemd boot?) immediately after making a build - and that's easy to do if the systems are co-located. But even that's not strictly required. We have also evolved to be too qemu-focused, I think. After basic sanity tests are done - to make this concrete, on ppc64le - we generate a PowerVS image, pass it off to a remote PowerVS instance, boot it there, and run probably most of the kola tests there.
Mmmm... I think doing some of that could make sense for pull requests in particular. It's much slower, but it would in some cases indeed help us do some types of testing. Concretely, I think COPR, for example, is doing s390x builds via cross-architecture emulation - and that can make sense. But no matter what, we are definitely going to be building and testing across all architectures. We are not going to ship something we didn't test (in a "real" deployment scenario, e.g. OpenStack/PowerVS/AWS, as opposed to our use of "synthetic" environments like raw qemu). We're just debating where each part happens.
Concretely, did anyone try a spike on builds/tests using …?
One thing I think we will need is to teach kola about PowerVS. Looks like there's https://github.com/IBM-Cloud/power-go-client. Another variant of this: add …
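To sketch what "teaching kola about PowerVS" could involve, here is a hypothetical Go outline of the shape such a platform backend might take (provision a machine from an uploaded image, reach it over SSH, tear it down). The type and field names here are illustrative, not kola's real API; an implementation could plausibly sit on top of github.com/IBM-Cloud/power-go-client.

```go
// Hypothetical sketch of a kola-style PowerVS platform backend.
// Names and fields are illustrative only.
package powervs

import "context"

// Machine is a booted test instance reachable over SSH.
type Machine interface {
	ID() string
	IP() string
	Destroy(ctx context.Context) error
}

// Cluster provisions machines from an uploaded PowerVS boot image.
type Cluster interface {
	// NewMachine boots an instance from the image and injects Ignition userdata.
	NewMachine(ctx context.Context, ignition string) (Machine, error)
	Destroy(ctx context.Context) error
}

// Options are the credentials and image details such a backend would likely
// need; an implementation could use github.com/IBM-Cloud/power-go-client to
// talk to the PowerVS API.
type Options struct {
	APIKey          string
	CloudInstanceID string
	ImageID         string
	Region          string
}
```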
If full qemu emulation doesn't work for "basic build and test scenarios" (and I'd be quite surprised if this was the case, particularly since right now we're using the qemu from Fedora which doesn't have features cut out of it), then I think the fallback here is going to look more like:
Regardless of the kvm/qemu situation, to repeat: I think we clearly need PowerVS support; that can happen in parallel. (Though out of curiosity, will there be support for e.g. OpenStack or other remote virt APIs on ppc64le going forward, or is that out too?)
These are the messages that it reported during "cosa build":
@mrcxavier Can we get more details? How much time did the build take?
Just from …
This seems like an unrelated bug...has this test been passing in current ppc64le pipelines?
AFAICS the output from qemu here is warnings, not fatal errors. If we got this far, ISTM we must have generated a build. So the next question is which tests run - and how many of those tests we actually run in qemu versus delegating to PowerVS.
I've been feeling for a while that we have a decent amount of virt code (think of our build process that uses supermin and qemu directly) that deals with a lot of intricacies (e.g. platform differences and whatnot), and that we should probably try to leverage another project for it. I'm wondering (assuming libvirt has support for PowerVM) if we should try to refactor our code to use libvirt instead and drop a lot of the code that is going to bitrot over time and that we aren't really experts in. Theoretically this would allow us to use PowerVM on Power and kvm everywhere else. Not sure how feasible this is. Interested in others' thoughts there.
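As a point of reference for the "use libvirt instead of driving qemu directly" idea, here is a minimal sketch using the libvirt Go bindings (libvirt.org/go/libvirt) to boot a transient domain from an XML definition. It assumes a working local libvirt driver (qemu:///session here); whether any such driver exists for PowerVM is exactly the open question raised in the replies below.

```go
// Minimal sketch (not coreos-assembler code) of booting a transient VM
// through libvirt instead of invoking qemu directly.
package main

import (
	"fmt"
	"log"

	libvirt "libvirt.org/go/libvirt"
)

func main() {
	conn, err := libvirt.NewConnect("qemu:///session")
	if err != nil {
		log.Fatalf("connecting to libvirt: %v", err)
	}
	defer conn.Close()

	// A trivial domain definition; a real one would attach the build disk,
	// Ignition config, serial console, etc.
	domainXML := `
<domain type='kvm'>
  <name>cosa-sanity</name>
  <memory unit='MiB'>2048</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
</domain>`

	dom, err := conn.DomainCreateXML(domainXML, 0)
	if err != nil {
		log.Fatalf("creating domain: %v", err)
	}
	defer dom.Free()

	name, _ := dom.GetName()
	fmt.Println("booted transient domain:", name)
}
```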
OK first, I am getting confused between "PowerVM" and "PowerVS". I think PowerVM is a bit like KVM+qemu, and PowerVS is more like e.g. OpenStack or KubeVirt?
It's not listed here: https://libvirt.org/drivers.html. I think the reason may actually be that the PowerVM code is proprietary? Anyways, to re-summarize my proposal here:
"run tests with PowerVS" requires teaching kola about powervs. Alternatively...one thing I think could work decently well is for us to add a mode to What you're talking about is really different, it's more like "use powervm everywhere we use qemu/kvm". I don't know the feasibility of that. It might indeed be the case that we could install the powervm userspace in our container, but the real next problem is we have a lot of dependencies on qemu specifically. I think a wholesale port to use libvirt would only make sense if there were a powervm backend for libvirt - as you say. |
Scenario 2
If it takes 8 min for a build, I'd say it's perfectly fine. How long does a full kola run take?
We need a solution for continuing to use coreos-assembler across all supported architectures where /dev/kvm is not available, in order to support the strongly coupled build+test model that we have adopted.

In the near future, /dev/kvm will not be available for newer ppc64le platforms, which will break our ability to produce CoreOS-based builds on that platform. We may be able to continue to limp along on older ppc64le platforms, but we will eventually reach a point where that is no longer possible.

How will we be able to continue to support our opinionated, container-based build+test model using coreos-assembler in that future?