Integrate power consumption estimation on public cloud virtual machines as a degraded mode #25
@bpetit did you imagine this as a kind of synthetic, 'modelled' sensor that provides numbers based on the detected number of CPUs, RAM and so on?
There's some work likely to be released by the German Ministry of the Environment in December that will provide some numbers that could inform this, as it's based on actual recorded energy usage figures for a known set of machines. However, I haven't seen the underlying data yet. It may be worth speaking to the folks at https://datacenterlight.ch - I know they see openness and transparency as one of their differentiators, and they might be able to share some numbers for a basic version of this.
Hi, First, thanks for getting involved in this project!
I think so. I imagine a sensor that gathers metrics about resource consumption on the machine, plus data/characteristics about the bare metal machine running the VM that the cloud provider agrees to disclose.
I'll have a look at that, thanks!
I'm definitely interested in talking with people from ungleich/datacenterlight to imagine a proof of concept. (I love their work <3)
I'm investigating the feasibility of feeding a centralized database of processor models per cloud provider, which scaphandre would both benefit from and contribute to. If you have access to an instance on any public cloud provider, could you post the content of /proc/cpuinfo in this thread, please? With the name of the cloud provider, the model of the instance, and the content of the file? I'd like to check that this file contains enough data for most cloud providers.
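For what it's worth, extracting the relevant field is cheap: a minimal Rust sketch of the kind of check scaphandre could do, pulling the `model name` entry out of /proc/cpuinfo (the function name is illustrative, not scaphandre's actual API):

```rust
use std::fs;

/// Reads /proc/cpuinfo and returns the first "model name" entry,
/// which is usually enough to identify the host CPU family.
fn cpu_model_name() -> Option<String> {
    let cpuinfo = fs::read_to_string("/proc/cpuinfo").ok()?;
    cpuinfo
        .lines()
        .find(|line| line.starts_with("model name"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, model)| model.trim().to_string())
}

fn main() {
    match cpu_model_name() {
        Some(model) => println!("Detected CPU model: {}", model),
        None => eprintln!("No model name found in /proc/cpuinfo"),
    }
}
```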
@bpetit Here's what I've got on my personal VM:
@bpetit my cloud provider is Ikoula
https://www.ikoula.com/fr/cloud-public Oh, their Cloud offerings seem super nice. VM Micro has basically the same specs and price as my DO droplet, except Ikoula is French, has servers in the EU, is low-carbon, etc… I've been looking for a bog-standard public cloud like this at a competitive price for a while! Sweet. :-)
Hi, I'll give you the cpuinfo of our Bare Metal offers.
First is our Ultimate Performance range:
- Offer: UP-BM2-XL - scaleway-cpuinfo-up-bm2-xl.txt
- Offer: UP-BM2-M
The second one is the Bare Metal General Purpose range:
- Offer: GP-BM1-S & GP-BM1-M (same CPU) - scaleway-cpuinfo-gp-bm1-s_and_m.txt
- Offer: GP-BM1-L
Thanks a lot @jdrouet and @florimondmanca for the data. This seems to confirm that, most of the time, on most cloud providers, an instance's cpuinfo contains enough data to guess the bare metal hypervisor's CPU model. This is great to move forward on that feature. Hi, and thanks a lot @pydubreucq for the data about Scaleway's bare metal machines. Even if this thread is more about guessing the consumption in a virtual machine without having access to the bare metal, this is valuable data. Would you by any chance have the same data for the machines that are running Scaleway cloud's instances?
I missed something :) Sorry for the noise :) I'll try to get this info for Instances.
Scaleway DEV range: DEV1-S, DEV1-M, DEV1-L, DEV1-XL
Scaleway GP range (part 1): GP1-XS
Cloud provider: AWS
Cloud provider: Azure
Cloud provider: OVH
Hi @bpetit, I'm running a CX41 in the Helsinki datacentre with Hetzner - it's got 4 vCPUs, 16 GB of RAM, and 160 GB of disk. I've got a 1 TB block storage device attached to it too. See the attached output. You can see more stats on their page for their cloud servers. You said something interesting here:
Would you elaborate a bit more here?
thanks a lot @pydubreucq @uggla and @mrchrisadams
What I imagine so far is feeding a database of some sort with the following data and their relationships:
With what we can learn from the providers you provided data for, items 1, 2 and 3 could easily be retrieved automatically by scaphandre. Item 4 needs more work, but partial solutions already exist in publicly available data from vendors. There are also some initiatives working on it, such as one from our friends of the Boavizta project (sorry, the post is in French). With such a database, using both those data and the resource consumption (cpu/ram/io/...) from the virtual machines, we could get to some estimation of the power consumed (item 4 is really critical here, as power efficiency may vary a lot from one CPU model to another). Those are just abstract ideas for now; I'd really enjoy having your thoughts on this, and even more to work together on refining those ideas :)
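To make the relationships concrete, here is a rough Rust sketch of how such records could relate. Every field name here is an assumption for the sake of the example, not an actual scaphandre or Boavizta schema:

```rust
/// One row per (provider, instance type) pair, linking a cloud
/// instance to the bare metal CPU it is believed to run on.
struct InstanceRecord {
    provider: String,      // e.g. "scaleway"
    instance_type: String, // e.g. "DEV1-S"
    cpu_model: String,     // as exposed in the VM's /proc/cpuinfo
}

/// Power characteristics of a CPU model, the hard-to-collect part.
struct CpuPowerProfile {
    cpu_model: String,               // joins with InstanceRecord.cpu_model
    power_per_load: Vec<(f64, f64)>, // (load ratio, watts) measurement points
}
```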
Hello @bpetit, Would it be possible to have a wrap-up on this topic? Can we draft the steps to implement this?
I've made a quick drawing of what I imagine. We could think about the steps from there. To describe it a little bit: the structure of the data (imagined on the right of the drawing) is a key point here, but I'll keep a macro view to imagine the concepts first. I think it would be interesting to have that data in a VCS or equivalent, to allow contributions and reviews. Therefore I imagine a local DB, embedded in scaphandre, that would be a serialized/binary version of the structured/collaborative database or repository. This is very macro and not clear yet, but it may enable first discussions on the topic.
@bpetit, thanks for this first draft. Here are some notes about it.
I get your point, but I don't think having it as a binary locally embedded in scaphandre would be an issue for versioning, as the main version of the data could live in a VCS and would be the source of that tiny local DB (we could imagine building new releases of that binary DB every time there are important improvements in the centralized data repository). I think it's interesting to have the data in binary form as it still allows scaphandre to be shipped as a single binary.
This is what I meant by having a central repository which is the real data, and "snapshots" of that data as a local database embedded in scaphandre. This way there is no remote communication and no risk of failure getting the data. We just need to inform users which snapshot version their copy of scaphandre is using, so that they can update if needed.
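A minimal sketch of what that embedding could look like in Rust, assuming a JSON snapshot baked in at build time (the file name and schema are hypothetical):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct CpuProfile {
    model: String,
    max_power_watts: f64,
}

/// A snapshot of the collaborative database, serialized at build time.
#[derive(Deserialize)]
struct Snapshot {
    version: String,
    profiles: Vec<CpuProfile>,
}

// include_bytes! bakes the serialized snapshot into the binary,
// so no network access is needed at runtime.
static SNAPSHOT_BYTES: &[u8] = include_bytes!("cpu_profiles_snapshot.json");

fn load_snapshot() -> Result<Snapshot, serde_json::Error> {
    let snapshot: Snapshot = serde_json::from_slice(SNAPSHOT_BYTES)?;
    // Telling the user which snapshot they run makes stale data visible.
    eprintln!("Using power profile snapshot {}", snapshot.version);
    Ok(snapshot)
}
```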
There is some data on spec.org. The rest is on the websites of each server manufacturer. Boavizta has started aggregating metrics here, but it wouldn't fit our use case in its current version:
But it can still be an interesting basis; maybe we should join forces on building a more generic and complete database. There is also the work done by Teads, which is of high interest. I'll discuss with the author to see how we could collaborate on building such a dataset.
Actually yes. To be accurate regarding the estimation, knowing the CPU model may not be enough, for multiple reasons:
I do think we will need to start small, provide rough estimations while being clear about their lack of accuracy, and then improve. But I think it will require a bit more than just max power consumption to give interesting results. I was thinking of something like idle CPU power consumption, 50% CPU power consumption and 100% CPU power consumption. I also wonder how important the CPU time consumption relative to each core allocation is for power consumption, but this is one of the potential topics Boavizta is about. So I may have data or interesting models to share at some point (all the work in this group is supposed to be open sourced, but as it is volunteer work from all the members, I cannot say when we could effectively build something of that order.)
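With those three points, a first rough model could just be piecewise linear interpolation over utilization. A sketch in Rust, with placeholder numbers (the 10/60/95 W profile is invented for the example):

```rust
/// Estimates CPU power draw (watts) from utilization by linear
/// interpolation between three measured points: idle, 50% load
/// and 100% load. The three values would come from the database
/// discussed above.
fn estimate_cpu_power(utilization: f64, idle_w: f64, half_w: f64, full_w: f64) -> f64 {
    let u = utilization.clamp(0.0, 1.0);
    if u <= 0.5 {
        // Interpolate between idle and the 50% point.
        idle_w + (half_w - idle_w) * (u / 0.5)
    } else {
        // Interpolate between the 50% point and full load.
        half_w + (full_w - half_w) * ((u - 0.5) / 0.5)
    }
}

fn main() {
    // Placeholder profile: 10 W idle, 60 W at 50%, 95 W at 100%.
    println!("{:.1} W", estimate_cpu_power(0.3, 10.0, 60.0, 95.0));
}
```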
@bpetit thanks for your answers. I need to read the article from Teads, which looks interesting.
I have a meeting next Tuesday with some interested people from Boavizta. Would you like to join?
Hi, I just wanted to present here a new source of possible errors in server consumption estimation - because it was too easy, indeed. A few weeks ago I found this article explaining that hardware consumes differently depending on its fabrication. In fact, this paper shows that "Under the same load, power variation among identical systems in the same rack can reach up to 7.8%". I invite you to read these two papers: they show that random and non-controllable factors (position of the machine in the rack, cooling, fabrication...) may have an important impact on the power consumption of the machine. What I am saying is that the cloud estimation will be based on the activity of an emulation of physical components, but these components won't consume equally given these factors, and thus there will always be a relatively important gap between the consumption estimated on the VM and the real power consumed by the machine...
I recently handed over https://github.com/cloud-carbon-footprint/cloud-carbon-coefficients to the Cloud Carbon Footprint project, which provides energy consumption coefficients for the various CPU architectures running on AWS, GCP and Azure. The notebook does the calculations based on the SPECpower database, then groups them by CPU so they can be injected into the main project. This then calculates the carbon footprint based on real usage from cloud billing data.
To give a bit of news on this: with Boavizta, we are about to launch a collaborative database (plus a stress test protocol and aggregation process) of power consumption profiles per hardware: Energizta. This is essentially the database part of the solution that has been described in this thread. Hoping to see many of you contributing to this project! :)
To add to this, here is how this is estimated today in BoaviztAPI. The Energizta project I mentioned before is about providing better data for this kind of power usage modeling.
Problem
Until cloud providers install scaphandre on their hypervisors, we should enable cloud customers to estimate their power usage, and thus their climate impact.
Solution
Integrate statistical models like https://github.com/etsy/cloud-jewels for GCP (and look for other models) as sensors, to enable using scaphandre on cloud providers even if they haven't (yet) implemented a scaphandre-like solution at the hypervisor layer.
Alternatives
Implement a ratio-based approach like powertop's (CPU time consumed by the workload / CPU time consumed globally) in this context (a VM). Mix it with the provider's public information about hypervisor host hardware?
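A minimal sketch of such ratio-based attribution in Rust, assuming the host's power figure comes from public hardware information as suggested above (the function and parameter names are illustrative):

```rust
/// powertop-style attribution: a VM (or process) is charged a share
/// of the host's estimated power proportional to the CPU time it
/// consumed, measured in the same units (e.g. jiffies).
fn attribute_power(vm_cpu_time: u64, total_cpu_time: u64, host_power_watts: f64) -> f64 {
    if total_cpu_time == 0 {
        return 0.0; // avoid dividing by zero on an idle sample
    }
    host_power_watts * (vm_cpu_time as f64 / total_cpu_time as f64)
}

fn main() {
    // e.g. the VM used 1200 of 10000 jiffies on a host estimated at 300 W.
    println!("{:.1} W", attribute_power(1200, 10_000, 300.0));
}
```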
Additional context
This is more than a feature request; it is a starting point for a wider study and discussions.