
Provide a way to generate function based on punctual consumption profile #86

Closed
Tracked by #85
da-ekchajzer opened this issue May 11, 2022 · 12 comments
Labels
help wanted Extra attention is needed methodology

Comments

@da-ekchajzer
Collaborator

da-ekchajzer commented May 11, 2022

Problem

A consumption profile is a function which associates a workload with an electrical_consumption : consumption_profil(workload) = electrical_consumption

This continuous function will be generated from punctual measures at different workloads for a given configuration. The punctual measures could come from our own measurements or from secondary sources.

We want to provide a way to generate continuous consumption profiles (functions) from those punctual measures. Such a process could be used to evaluate the usage impacts of devices or components.

Solution

We should set up a regression process. We call regression the process of defining a continuous relationship (function) between workload and electrical_consumption based on punctual measurements.

The regression shouldn't be linear. From what we have seen, consumption profiles follow a logarithmic rule.

This might be a problem when only two points are given (min 0%, max 100% for instance), since we don't want a linear distribution. We could use existing consumption profiles in the regression process.

Input value

Format

We should have this type of input format:

"workload":{
  "10":30.6,
  "20":34,
  "56":67,
  ...
  "workload %": "power_consumption"
}
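Assuming this format, a first implementation step is turning the mapping into sorted numeric arrays ready for a regression step. A minimal sketch (the function name is illustrative, not from the codebase):

```python
def workload_points(workload: dict) -> tuple[list[float], list[float]]:
    """Convert a {"workload %": power_consumption} mapping into two
    sorted, parallel lists of floats usable by a regression routine."""
    pairs = sorted((float(k), float(v)) for k, v in workload.items())
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return xs, ys

# Example input, in the format described above
xs, ys = workload_points({"56": 67, "10": 30.6, "20": 34})
```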

Data example

Example for AWS server CPUs from TEADS Link:

PkgWatt Idle (0%) PkgWatt CPUStress 10% PkgWatt CPUStress 20% PkgWatt CPUStress 30% PkgWatt CPUStress 40% PkgWatt CPUStress 50% PkgWatt CPUStress 60% PkgWatt CPUStress 70% PkgWatt CPUStress 80% PkgWatt CPUStress 90% PkgWatt CPUStress 100%
51 135 174 212 249 293 330 357 382 404 413
113 146 194 225 244 263 295 295 311 333 387
58 176 241 299 375 448 520 562 590 607 617
59 174 243 299 372 441 522 564 592 606 617
116 148 188 205 222 258 240 277 284 287 346
55 138 178 224 272 307 344 375 401 426 440
110 127 150 188 214 224 224 241 244 240 287
58 147 193 246 298 344 381 417 453 481 492
48 148 198 245 305 361 389 413 447 471 480
57 147 212 270 331 381 417 453 481 513 513
35 100 126 152 178 205 223 238 250 263 272
2 10 16 17 15 15 15 38 40 42 41
26 65 83 96 109 117 122 127 130 134 133
50 98 120 141 159 172 181 190 195 204 207
28 57 88 110 125 136 146 151 156 157 168
32 62 79 91 103 112 120 127 134 138 145
40 64 77 86 92 98 104 107 110 114 120
2 22 41 59 44 41 39 47 90 82 85
17 68 93 106 117 128 135 140 146 150 154
71 107 132 154 167 179 184 195 199 204 204
38 113 134 160 178 203 221 234 243 247 252

Example from SPECpower, aggregated by Cloud Carbon Footprint: Link

Architecture Min Watts (0%) Max Watts (100%)
Skylake 0.6446044454253452 4.193436438541878
Broadwell 0.7128342245989304 3.6853275401069516
Haswell 1.9005681818181814 6.012910353535353
EPYC 2nd Gen 0.4742621527777778 1.6929615162037037
Cascade Lake 0.6389493581523519 3.9673047343937564
EPYC 3rd Gen 0.44538981119791665 2.0193277994791665
Ivy Bridge 3.0369270833333335 8.248611111111112
Sandy Bridge 2.1694411458333334 8.575357663690477

Output value

A function described by its coefficients

@da-ekchajzer
Collaborator Author

@samuelrince I would be interested in your opinion.

@samuelrince
Member

samuelrince commented May 14, 2022

I have dived a little deeper using the original spreadsheet. It looks like making one logarithm-like function model per CPU "model" or "family" (Platinum, Gold, Silver, etc.) could be a good idea?

Platinum

image

image

Here the red one is line 6 from the spreadsheet, corresponding to c5.metal*. I don't know what the star means here, nor why the red curve stands out that much. It looks like it should be on the other graph.

Gold

image

Silver

image

E, E3, E5

image

@da-ekchajzer, let me know what you think of this approach. I guess if we can have the CPU model name, we will have a better estimation of what the power consumption profile should look like for new CPU models.

@samuelrince
Member

Also, I don't understand the second table; it's not the power consumption per CPU architecture, right?

Architecture Min Watts (0%) Max Watts (100%)
Skylake 0.6446044454253452 4.193436438541878
Broadwell 0.7128342245989304 3.6853275401069516
Haswell 1.9005681818181814 6.012910353535353
EPYC 2nd Gen 0.4742621527777778 1.6929615162037037
Cascade Lake 0.6389493581523519 3.9673047343937564
EPYC 3rd Gen 0.44538981119791665 2.0193277994791665
Ivy Bridge 3.0369270833333335 8.248611111111112
Sandy Bridge 2.1694411458333334 8.575357663690477

@da-ekchajzer
Collaborator Author

da-ekchajzer commented May 14, 2022

The second table represents the average consumption of a server depending on the CPU family (also called architecture).

My idea is to generate server consumption profiles per CPU family at first, until we gather data on specific CPUs (and other components) to make consumption profiles based on more precise data (number of cores, CPU model, …).

But either way, we should come up with a generic way of generating a consumption profile from a workload object like the one I mentioned above. As I see in your graphs, you use a linear approach to connect each point with its successor. IMHO this approach is limited:

When few points are given (as in the second table, for instance), the consumption profile will be an affine function.

Example for Skylake family

"workload":{
"0":0.6446044454253452,
"100": 4.193436438541878
}
consumption_profil(x) = ((4.193436438541878 - 0.6446044454253452) / (100 - 0)) * x + 0.6446044454253452

Yet, we know the consumption profile is not linear.

Besides, with this approach, we won't come up with a function defined by its coefficients but with a set of affine functions connecting one point to another.

Cloud Carbon Footprint uses an affine equation (as seen above) to come up with an average watts consumption:
Average Watts (x) = Min Watts + x * (Max Watts - Min Watts)
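That affine formula can be written directly (a sketch; x is assumed here to be the average utilization as a fraction between 0 and 1):

```python
def average_watts(x: float, min_watts: float, max_watts: float) -> float:
    """Affine interpolation between idle (min) and full-load (max) power,
    as used by Cloud Carbon Footprint. x is the utilization in [0, 1]."""
    return min_watts + x * (max_watts - min_watts)

# Skylake row from the table above
half_load = average_watts(0.5, 0.6446044454253452, 4.193436438541878)
```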

I think with the AWS data from teads and the future data we'll have we can be more ambitious and generate more precise functions.

What do you think of using a logarithmic regression (which I am not familiar with) based on the workload object and previous consumption_profil?

Does this make sense to you?

@da-ekchajzer da-ekchajzer added help wanted Extra attention is needed methodology labels May 14, 2022
@samuelrince
Member

I think a logarithmic function is good enough to model the CPU power consumption given the workload, and I understand that often we will only have the min (idle) and max (100%) power consumption.

The idea is to have a more precise model if we have access to the CPU model. Specifically, in Intel's case, if we know the CPU is a Xeon Platinum, Xeon Gold or Xeon Silver, we can use a different base model to compute the "Final model", based on min and max power consumption.

Here is an example:

We receive from the API both the CPU model and the workload, as follows:

{
  "cpu": {
    "model": "Intel Xeon Platinum 8124M"
  },
  "workload": {
    "0": 51,     <<< Power consumption in idle state (in W)
    "100": 413   <<< Power consumption in 100% state (in W)
  }
}

(Maybe not the actual json fields here)

Given that we know it is a Xeon Platinum CPU, we can use a more precise model previously fitted on Xeon Platinum CPU data only; see the following:

image

Here the white curve called "Platinum model" is a power consumption model inferred from all power consumption curves for Xeon Platinum CPUs.

We can then build a second model, called "Final model", using the "Platinum model" and the min and max power consumption values. We obtain the following model in pink:

image

(In blue is the actual CPU power consumption model.)

In the case where we don't have the CPU model but only the min/max workload, we can use a default model (still a log function) built from the whole power consumption dataset. This method will give less precise values, but still better than an affine function.

The log function I use to fit the data is:

power_consumption(workload) = a * ln(b * (workload + c)) + d

I hope my idea is clearer now. Let me know if you think it can be useful or if it is totally overkill.

@da-ekchajzer
Collaborator Author

da-ekchajzer commented May 14, 2022

It is exactly what I was thinking but couldn't explain it so clearly. Thank you.

I think we should work with the CPU family (architecture) / core number rather than the commercial naming (Xeon, …), for several reasons:

  • There are fewer architectures, which makes it easier to have exhaustive data
  • We already implemented a first classification per CPU family and core number
  • Soon we will be able to automatically smart-complete the CPU family from the commercial naming (Smart complete CPU based on CPU name #82)
  • Two commercial names can be given to the same CPU depending on its usage (server, laptop, …)

Could you explain the process with the equations, to ease the implementation part? For example, how do you define a, b, c, d in your equation?
Also, could we apply this mechanism when more than 2 values are given (0%, 50%, 100% for instance)? Does it make sense?

process summary

1 - Input data

{
  "cpu": {
    "family": "skylake",
    "nb_core": 8
  },
  "workload": {
    "0": 51,     <<< Power consumption in idle state (in W)
    "50": 293,   <<< Power consumption in 50% state (in W)
    "100": 413   <<< Power consumption in 100% state (in W)
  }
}

2 - Look for equivalent consumption profile

If it exists: search for an equivalent consumption profile with the same family and core_number
Else if it exists: search for an equivalent consumption profile with the same family
Else: use the default consumption profile and go to (4)

3 - Infer the consumption profile for the current type of CPU

⇒ What magic are you doing here?

4 - Generate the consumption profile equation from 1) the inferred curve and 2) the input data
⇒ What magic are you doing here?

power_consumption(workload) = a * ln(b * (workload + c)) + d
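The fallback logic in step 2 could be sketched as follows (a sketch only: the PROFILES store, its keys, and the coefficient values are hypothetical placeholders, not actual data from the project):

```python
# Hypothetical store of fitted coefficients (a, b, c, d) for the model
# power(w) = a * ln(b * (w + c)) + d, keyed by (family, nb_core) or by
# (family, None) for a family-level profile. Values are made up.
PROFILES = {
    ("skylake", 8): (35.0, 0.2, 5.0, 100.0),
    ("skylake", None): (30.0, 0.3, 4.0, 90.0),
}
DEFAULT_PROFILE = (25.0, 0.5, 2.0, 80.0)

def lookup_profile(family: str, nb_core: int) -> tuple:
    """Step 2: most specific match first (family + core count),
    then family alone, then the default consumption profile."""
    if (family, nb_core) in PROFILES:
        return PROFILES[(family, nb_core)]
    if (family, None) in PROFILES:
        return PROFILES[(family, None)]
    return DEFAULT_PROFILE
```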

@samuelrince
Member

Could you explain the process with the equations, to ease the implementation part? For example, how do you define a, b, c, d in your equation?

The implementation is really easy: it just uses the scipy.optimize.curve_fit function to create all the previous models. Basically, it is an optimization problem where we try to fit a function (power_consumption(workload) = a * ln(b * (workload + c)) + d) to some data points. If we have multiple data points, we can fit one model per CPU and then merge all the models into one by averaging the parameters (a, b, c, d) of the models. I have only set the following constraints on the parameters:

  • a > 0
  • b > 0
  • c > 0

The optimization done in curve_fit to find a, b, c and d is a least squares approximation.
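A minimal sketch of that fit, assuming scipy is available; the data points are synthetic, generated from arbitrary known coefficients rather than real measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_consumption(workload, a, b, c, d):
    # The log model: a * ln(b * (workload + c)) + d
    return a * np.log(b * (workload + c)) + d

# Synthetic data points generated from arbitrary "true" coefficients
workloads = np.array([0.0, 10.0, 25.0, 50.0, 75.0, 100.0])
powers = power_consumption(workloads, 35.0, 0.2, 5.0, 100.0)

# Least squares fit; the lower bounds enforce a > 0, b > 0, c > 0
params, _ = curve_fit(
    power_consumption,
    workloads,
    powers,
    p0=[10.0, 0.5, 1.0, 50.0],
    bounds=([0.0, 0.0, 0.0, -np.inf], [np.inf, np.inf, np.inf, np.inf]),
)
fitted = power_consumption(workloads, *params)
```

Note that a * ln(b * (w + c)) + d equals a * ln(w + c) + (a * ln(b) + d), so b and d are not separately identifiable from the data: the fitted curve matters more than the individual coefficient values, which is worth keeping in mind when averaging parameters across CPUs.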

I can provide you the POC in a notebook if you want (I have to clean it up a bit first).

Also, could we apply this mechanism when more than 2 values are given (0%, 50%, 100% for instance)? Does it make sense?

The optimization process described above can work with 2 or more values. With more values we can expect higher precision. Depending on the number of data points we have, it can be useful to start the optimization process from a base model (like the Platinum model); that way we start with parameters that are already defined and we can just "try to shift the curve" until it meets the min workload and max workload, for instance.
But I think that if we have 3 or more data points as input, we don't need that first step, as the model we try to fit is very simple and regular.
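For the two-point case, one way to interpret "start from a base model and shift the curve" is to keep b and c from the base model and solve a small linear system for a new scale and offset, so that the curve passes exactly through the measured idle and 100% points. A sketch of that interpretation (not necessarily what the POC does; the base coefficients here are made up):

```python
import math

def rescale_base_model(base, p_idle, p_max, w_max=100.0):
    """Given base coefficients (a, b, c, d) of
    power(w) = a * ln(b * (w + c)) + d, return (a', b, c, d') such that
    the curve passes through (0, p_idle) and (w_max, p_max).

    With b and c fixed, power(0) = a'*L0 + d' and power(w_max) = a'*L1 + d',
    where L0 = ln(b*c) and L1 = ln(b*(w_max + c)): two linear equations
    in the two unknowns a' and d'.
    """
    _, b, c, _ = base
    l0 = math.log(b * c)
    l1 = math.log(b * (w_max + c))
    a_new = (p_max - p_idle) / (l1 - l0)
    d_new = p_idle - a_new * l0
    return (a_new, b, c, d_new)

# Hypothetical "Platinum" base model, pinned to measured 51 W / 413 W
a, b, c, d = rescale_base_model((35.0, 0.2, 5.0, 100.0), 51.0, 413.0)
```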

I think we should work with the CPU family (architecture) / core number rather than the commercial naming (Xeon, …), for several reasons:
...

I have tried to put the family (or architecture?) in front of each CPU; tell me if you see an error, but I think it is OK. It gives me this:

CPU model CPU family
Intel Xeon E-2278G Coffee Lake
Intel Xeon E3 1240v6 Sandy Bridge
Intel Xeon E5-2660 Sandy Bridge
Intel Xeon E5-2686 v4 Broadwell
Intel Xeon Gold 5120 Skylake
Intel Xeon Gold 5218 Cascade Lake
Intel Xeon Gold 6230R Cascade Lake
Intel Xeon Platinum 8124M Skylake
Intel Xeon Platinum 8151 Skylake
Intel Xeon Platinum 8175M Skylake
Intel Xeon Platinum 8252C Cascade Lake
Intel Xeon Platinum 8259CL Cascade Lake
Intel Xeon Platinum 8275CL Cascade Lake
Intel Xeon Silver 4110 Skylake
Intel Xeon Silver 4114 Skylake
Intel Xeon Silver 4210R Cascade Lake
Intel Xeon Silver 4214 Cascade Lake

(I've removed the ones with * for now because I don't understand why they look so weird on the graphs...)

Given that classification I can plot all CPU power consumption curves per family.

image

image

image

There is only one CPU for each of Coffee Lake and Broadwell, so I haven't plotted them.

You see that on each graph we clearly have different CPU profiles even though they are from the same family/architecture. And on the first 2 graphs we can see that the Platinum ones are always close together at the top, then Gold ones, and then Silver ones at the bottom.

That is why I first grouped them by CPU "model" (Platinum, Gold, Silver, E3, E5, E): when you plot them together, their profiles look very similar even though they are not from the same family/architecture or launch year.

If we take into account the number of CPU cores in addition to the CPU architecture, it is still not really satisfying:

image

image

image

(The number after the full CPU name is the number of cores, e.g. "Intel Xeon Platinum 8275CL 24" has 24 cores.)

You have CPUs with fewer cores above CPUs with more cores, and vice versa.

Let me know what you think; maybe it is a subject to discuss in a meeting? But in the end, at this stage, I am only convinced by grouping CPUs by their "model". Of course, if we have more data we can then consider the architecture and number of cores, but only within the same CPU model group.

@da-ekchajzer
Collaborator Author

da-ekchajzer commented May 15, 2022

Thank you for the explanations.

From your work it seems very clear that the CPU model is the best strategy. As you mentioned, it would be nice to find data on other CPUs (AMD for instance) to validate this.

@github-benjamin-davy since this strategy is based on your data, I think your opinion would be valuable.

I think a Jupyter notebook is a good input for the implementation if you can provide it.

I thought we could begin by implementing it as a route (POST /cpu/consumption_profil) which takes a CPU object and a workload object and returns the coefficients a, b, c, d of the function.

The usage of the consumption profile will be implemented in #87 and #88

@da-ekchajzer
Collaborator Author

It makes me think that we should add a model attribute to the CPU object:

  • family (skylake, rome, …)
  • model (Ryzen 3, Xeon Gold, Xeon E, Core i7, …)
  • name (Xeon Platinum 8153, ...)

I will modify #82 to make it possible to complete family and model from cpu name.

@github-benjamin-davy
Collaborator

Hello here, I'll try to catch up on the discussion,

@samuelrince the * is simply used as a way to exclude some lines from the VLOOKUP in the spreadsheet (some lines refer to underclocked machines).

I would say that the most essential characteristic of the CPU is its TDP, which should most of the time be close to the max consumption, from what I've experienced (however, two CPUs with the same TDP might not have exactly the same behavior). As you have seen, CPUs from the same family can have very different power consumption for the same number of cores (it depends on voltage & frequency).

@samuelrince
Member

Hey @da-ekchajzer you can take a look at this notebook as a working implementation.

POC_cpu_workload_power_consumption.zip

@da-ekchajzer
Collaborator Author

Implemented as a router for CPU in #113
