Estimate RAM consumption profile from discrete workload data points #115

Closed
samuelrince opened this issue Aug 23, 2022 · 8 comments

@samuelrince
Member

samuelrince commented Aug 23, 2022

Problem

The electrical consumption of memory (RAM) components depends on the workload. We want to estimate the following function:

$$ CP_{ram} : \mathnormal{workload} \in [0,1] \mapsto \mathnormal{power} \in \mathbb{R}^{+} $$

In our context we can make the following assumptions:

  • The underlying function is a strictly positive smooth function
  • The function can be approximated using standard regression analysis techniques

This issue is related to #86 where we solved the same problem for CPU consumption profiles.

Solution

Implement a new consumption profile model dedicated to RAM components that will infer the parameters of the consumption profile function described above. It should take into account any relevant features to estimate these parameters.

Possible features:

  • Workload data points
{
    "load_percentage": 50,
    "power_watt": 80
}
  • Categorical features (RAM manufacturer, RAM launch year, RAM frequency, etc.)

Using workload data, we can fit a regression model based on, for instance, log-like functions. The categorical features can help select fine-tuned initial parameters for the regression. If no workload data is provided, we can return the fine-tuned model as the output.
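As a rough illustration of the regression idea, here is a minimal sketch that fits one possible log-like form, `power = a + b * ln(1 + workload)`, by ordinary least squares. The functional form, the function names `fit_log_profile` and `predict_power`, and every data point except the 50% / 80 W example above are assumptions made for illustration, not the model this project uses.

```python
import math


def fit_log_profile(points):
    """Fit power = a + b * ln(1 + workload) by ordinary least squares.

    `points` is a list of dicts with "load_percentage" (0-100) and
    "power_watt", matching the workload data point format shown above.
    The log-like form itself is an assumption chosen for illustration.
    """
    xs = [math.log1p(p["load_percentage"] / 100.0) for p in points]
    ys = [float(p["power_watt"]) for p in points]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two workload data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct load values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_power(a, b, workload):
    """Evaluate the fitted profile for a workload in [0, 1]."""
    return a + b * math.log1p(workload)


# Illustrative data only: the 0% and 100% points are made up.
points = [
    {"load_percentage": 0, "power_watt": 40},
    {"load_percentage": 50, "power_watt": 80},
    {"load_percentage": 100, "power_watt": 100},
]
a, b = fit_log_profile(points)
print(f"a={a:.2f} W, b={b:.2f} W, power at 75% load: {predict_power(a, b, 0.75):.1f} W")
```

A closed-form fit keeps the sketch dependency-free. A richer log-like form with more parameters would need an iterative optimizer, which is where initial parameters selected from the categorical features could come in.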

Additional context or elements

@samuelrince
Member Author

After some more digging into the data, here is what I found.

First, consumption profiles from the "same type" of instances look similar (at least for c5 instances), so the instance type is useful information to collect.

Screen Shot 2022-09-01 at 7 19 00 PM

Second, consumption profiles from different servers with the same memory capacity also look similar.

Screen Shot 2022-09-01 at 7 21 25 PM

Screen Shot 2022-09-01 at 7 21 10 PM

On the last one there is roughly 40 Watts of difference between the max and min.

There are consumption profiles that are almost constant given any workload:

  • n2.xlarge.x86 at ~100 W
  • c3.small.x86 at ~2.95 W
  • GP-BM1-S at ~2.65 W

Are these outliers?

We also see a big difference between using the CPU stress test and other types of stress tests (that put more pressure on the memory).

| product_name | ramwatt_cpu_stress_100 | ramwatt_vmstress_100 | diff_vmstress_percent | ramwatt_maximize_100 | diff_maximize_percent |
|---|---|---|---|---|---|
| c5n.metal | 85.0 | 153.0 | +80% | 169.0 | +99% |
| c5.metal | 90.0 | 210.0 | +133% | 210.0 | +133% |
| c5.metal* | 94.0 | 214.0 | +128% | 210.0 | +123% |
| r5.metal | 277.0 | 510.0 | +84% | 510.0 | +84% |
| m5.metal | 132.0 | 360.0 | +173% | 384.0 | +191% |
| z1d.metal | 111.0 | 256.0 | +131% | 230.0 | +107% |
| m5zn.metal | 89.0 | 184.0 | +107% | 160.0 | +80% |
| i3.metal | 22.0 | 52.0 | +136% | 57.0 | +159% |
| GP-BM1-S | 2.6 | 5.4 | +108% | 3.8 | +46% |
| HC-BM1-XS | 28.0 | 68.0 | +143% | 62.0 | +121% |
| HC-BM1-L | 104.0 | 216.0 | +108% | 196.0 | +88% |
| Lenovo ST550 | 72.0 | 131.0 | +82% | 108.0 | +50% |
| c3.small.x86 | 3.0 | 6.1 | +103% | 5.4 | +80% |
| s3.xlarge.x86 | 55.0 | 142.0 | +158% | 110.0 | +100% |
| n2.xlarge.x86 | 99.0 | 173.0 | +75% | 230.0 | +132% |
| 2xIntelGold6230R | 43.0 | 151.0 | +251% | 143.0 | +233% |
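For reference, the `diff_*_percent` columns appear to be the relative increase over the CPU stress test value, rounded to the nearest percent. The sketch below reproduces three rows under that assumption; the helper name `percent_increase` is made up for this example.

```python
def percent_increase(baseline_watt, stressed_watt):
    """Relative increase of a stress-test reading over the CPU stress baseline, in %."""
    return round((stressed_watt - baseline_watt) / baseline_watt * 100)


# (ramwatt_cpu_stress_100, ramwatt_vmstress_100, ramwatt_maximize_100), copied from the table
rows = {
    "c5n.metal": (85.0, 153.0, 169.0),
    "Lenovo ST550": (72.0, 131.0, 108.0),
    "2xIntelGold6230R": (43.0, 151.0, 143.0),
}

for name, (cpu, vmstress, maximize) in rows.items():
    print(
        f"{name}: vmstress {percent_increase(cpu, vmstress):+d}%, "
        f"maximize {percent_increase(cpu, maximize):+d}%"
    )
```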

This makes me question the relevance of estimating the RAM consumption profile from CPU workload. I guess this may work most of the time for average processing workloads, but if we use a server for a more specific type of workload, like databases or VMs, the profile we estimate will be way off the underlying reality.

So at first glance we can say that:

  • Estimating the consumption profile of the memory will depend on the quantity of memory installed. (This is obvious but worth noting, I guess)
  • Similar cloud instances can have similar memory consumption profiles. (c5 vs c5n)
  • CPU workload can be a good enough variable to guess the memory consumption, though it might be wrong for specific applications that are very memory intensive.

Also, I haven't found a lot of data on the memory itself (manufacturer, launch year, etc.). There is some information in the spreadsheet, but not enough to estimate anything based on these potential variables.

@github-benjamin-davy I haven't found anything on the type of memory bank used in cloud instances so if you have information on that I am interested.

@samuelrince
Member Author

samuelrince commented Sep 1, 2022

Some graphs on the potential outliers:

  • n2.xlarge.x86 in golden:

Consumption at 0% is weirdly high compared to others.

Screen Shot 2022-09-01 at 8 07 43 PM

  • c3.small.x86 and GP-BM1-S

Screen Shot 2022-09-01 at 8 10 30 PM

@github-benjamin-davy
Collaborator

Hello @samuelrince, thanks a lot for this work! I fully agree with your conclusions: memory consumption will vary a lot depending on the type of workload, and there is some form of efficiency (consumption per GB of memory) with newer machines with dense memory DIMMs.
The outliers we see could be related to architectures that didn't properly support RAM consumption reporting with RAPL, so on my side I didn't consider them. I also think that the idle measurement might have some limitations with RAPL (@bpetit do you have feedback on this?). Ideally, we would need more measurements on other hardware, especially with wattmeters.
Regarding the memory bank info for cloud hardware, I started to collect it in the spreadsheet you linked, using the `dmidecode -t memory` command on bare metal machines. The number of DIMMs should be consistent per generation and memory quantity, I guess.
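To make the collection step concrete, here is a minimal sketch of how the text output of `dmidecode -t memory` could be summarized into a DIMM count and a total capacity. The sample text is hand-written for illustration (it is not a measurement from the spreadsheet), the field names are the ones the command typically prints and can vary between versions and hardware, and the helper names `parse_memory_devices` and `size_in_gb` are hypothetical.

```python
# Hand-written sample imitating `dmidecode -t memory` output (illustrative only).
SAMPLE = """\
Handle 0x0011, DMI type 17, 40 bytes
Memory Device
	Size: 32 GB
	Type: DDR4
	Speed: 2933 MT/s
	Manufacturer: Samsung

Handle 0x0012, DMI type 17, 40 bytes
Memory Device
	Size: 32768 MB
	Type: DDR4
	Speed: 2933 MT/s
	Manufacturer: Samsung

Handle 0x0013, DMI type 17, 40 bytes
Memory Device
	Size: No Module Installed
	Type: Unknown
"""


def parse_memory_devices(text):
    """Return one dict of "Key: value" fields per "Memory Device" block."""
    devices = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            current = {}
            devices.append(current)
        elif not stripped:
            current = None  # a blank line ends the current block
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices


def size_in_gb(size_field):
    """Convert a Size field such as "32 GB" or "32768 MB" to GB; 0.0 for empty slots."""
    parts = size_field.split()
    if len(parts) != 2 or not parts[0].isdigit():
        return 0.0
    value, unit = int(parts[0]), parts[1].upper()
    return {"MB": value / 1024, "GB": float(value), "TB": value * 1024.0}.get(unit, 0.0)


devices = parse_memory_devices(SAMPLE)
populated = [d for d in devices if size_in_gb(d.get("Size", "")) > 0]
total_gb = sum(size_in_gb(d["Size"]) for d in populated)
print(f"{len(populated)} DIMMs populated out of {len(devices)} slots, {total_gb:.0f} GB total")
```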

@da-ekchajzer
Collaborator

Thank you for your work @samuelrince and your feedback @github-benjamin-davy.

@samuelrince: Do you think that we could implement a first "dumb" consumption profile for RAM that generates/uses a profile based only on the RAM quantity (based on the RAM stress test?). Or at least a fixed factor proportional to the RAM quantity?

@github-benjamin-davy: To collect more measurements on a variety of hardware, we should let the community repeat the work that you have done on their own servers. Why not adapt your code during a hackathon so that it writes to an open database?

I was thinking that the feature we are developing could interest various international organizations such as Cloud Carbon Footprint, the SDIA and the GSF. They could also make a call to their own communities.

@github-benjamin-davy
Collaborator

Great idea @da-ekchajzer! Centralizing these measurements and having some dataviz to compare several tests could be helpful as well. Ideally, it would also be nice to add support for AMD machines.

@samuelrince
Member Author

> @samuelrince: Do you think that we could implement a first "dumb" consumption profile for RAM that generates/uses a profile based only on the RAM quantity (based on the RAM stress test?). Or at least a fixed factor proportional to the RAM quantity?

Thanks @da-ekchajzer, I'll make a prototype as you suggested. In the future, if we gather more data on RAM consumption profiles (with categorical features), we will update the feature accordingly.

@samuelrince
Member Author

As discussed, I've finished the implementation of the RAM consumption profile as a simple constant function (independent of the workload). The model is determined using RAM capacity only. Let's review this feature when we have more data!
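As a minimal sketch of what such a constant profile could look like: power is the same for every workload and depends only on capacity. The class name `ConstantRamProfile`, the linear-in-capacity form, and the `watt_per_gb` value are assumptions for illustration; the actual implementation and its factor are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstantRamProfile:
    """Workload-independent RAM consumption profile (illustrative sketch)."""

    capacity_gb: float
    watt_per_gb: float  # hypothetical factor, to be derived from measurements

    def power(self, workload: float) -> float:
        """Return the estimated power in watts for a workload in [0, 1]."""
        if not 0.0 <= workload <= 1.0:
            raise ValueError("workload must be in [0, 1]")
        # Constant profile: the workload is validated but does not change the result.
        return self.capacity_gb * self.watt_per_gb


# 0.5 W/GB is a placeholder value, not a measured one.
profile = ConstantRamProfile(capacity_gb=128, watt_per_gb=0.5)
print(profile.power(0.0), profile.power(1.0))
```

Keeping the `workload` argument in the signature means a workload-dependent profile could later replace this one without changing callers.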

@da-ekchajzer
Collaborator

Can we close this issue, @samuelrince?
