Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix long execution times caused by isAdRelated #330

Merged
merged 2 commits into from
Feb 18, 2022

Conversation

tollmanz
Copy link
Contributor

@tollmanz tollmanz commented Feb 17, 2022

Problem

Running Lighthouse tests with the publisher-ads-lighthouse-plugin increases runtime significantly.

Via the CLI on an M1 MacBook Pro, the following increase in runtime was observed for the following sites

There is a relationship between the number of resources loaded on a page and the increase in execution time.

This issue was initially detected when adding the plugin to a suite of tests evaluating ~160 different domains. Average Lighthouse run time increased from 20s to 34s.

Reproduction Steps

  • Environment (CLI or web service): cli and node package
  • Version: 1.5.5, Lighthouse 9.2.0 and 9.3.1
  • URL: https://www.cnn.com
  • Settings: n/a

To reproduce:

  1. Run

    lighthouse https://www.cnn.com --plugins=lighthouse-plugin-publisher-ads --chrome-flags="--headless"
    ...
    LH:status Auditing: Cumulative ad shift +2ms
    LH:status Auditing: Deprecated GPT API Usage +8s
    LH:status Auditing: GPT Errors +1ms
    LH:status Auditing: Total ad JS blocking time +1ms
    LH:status Generating results... +2ms
    
  2. Note the 8s time attributed (incorrectly) to Auditing: Deprecated GPT API Usage

Solution

Through profiling and manually removing audits, the audit causing this issue is "Cumulative ad shift", not "Deprecated GPT API Usage" as the output suggests. To demonstrate this, comment out the following two lines in plugin.js and run the reproduction steps above:

{path: `${PLUGIN_PATH}/audits/cumulative-ad-shift`},

{id: 'cumulative-ad-shift', weight: 5, group: 'metrics'},

The performance issue will disappear, implicating the "Cumulative ad shift" audit as the source of the problem.

Generating a flame chart using clinic will clearly show the issue:

npx clinic flame -- node $(which lighthouse) https://www.cnn.com --plugins=lighthouse-plugin-publisher-ads --chrome-flags="--headless"

Screen Shot 2022-02-16 at 9 29 20 PM

The function taking a lot of time is onParseError from Node. This function is called when toURL receives an undefined value. In the try statement, the value is passed to new URL() and throws. Apparently, parsing and throwing this error is expensive.

In my testing, I console logged url in toURL and found that it is undefined up to ~1 million times when testing a site that loads a lot (e.g., 500) of resources.

The function's comment states:

This function is guaranteed not to throw. Be sure to validate URL format
before calling this method if needed.

As such, my fix in this PR is to validate the URL before it ever gets to toURL in isAdRelated. isAdRelated seems to be the root of the majority of the toURL calls. As such, this PR changes the function to return false whenever it receives falsey url value. If it's falsey, there is no way this is ad related so it makes sense that all of the other logic in this method can be skipped.

The fix in this PR reduces the runtime dramatically (8s to 104ms):

lighthouse https://www.cnn.com --plugins=lighthouse-plugin-publisher-ads --chrome-flags="--headless"
LH:status Auditing: Cumulative ad shift +1ms
LH:status Auditing: Deprecated GPT API Usage +104ms
LH:status Auditing: GPT Errors +1ms
LH:status Auditing: Total ad JS blocking time +1ms
LH:status Generating results... +1ms

Note that no tests were previously written for isAdRelated. This PR adds those tests and attempts to evaluate each branch. The tests pass before and after applying the patch, except for the tests of the undefined url that are handled with the patch.

@tollmanz tollmanz changed the title Return early when URL is not available in isAdRelated to fix performance issue Fix long execution times caused by isAdRelated Feb 17, 2022
@jburger424 jburger424 self-requested a review February 17, 2022 14:55
@tollmanz
Copy link
Contributor Author

Thanks for running the workflows! I'll review and attempt to fix the issues with the Smoke Tests. I can reproduce it locally so looks like it caught a legit problem.

It seems that the Lighthouse CI tests tend to fail, but I'll still look into it and see if I can resolve it.

@tollmanz
Copy link
Contributor Author

I've executed the smoke tests locally with and without the fix and I get the same failures. Seems that something else may have previously introduced a regression in these tests. I've not had time to take a closer look, but should be able to later today.

@tollmanz
Copy link
Contributor Author

tollmanz commented Feb 17, 2022

After spending some time with this, I'm seeing the following:

  • cumulative-ad-shift-0 is failing its initial attempts on master and this branch regularly, but eventually, the 2nd or 3rd attempt does pass. It feels like a race condition. I'm noticing that the first few tries at this test generally fail. The Chrome browser launches, loads the page and quickly exits. When the tests pass, Chrome stays open for a few seconds longer. I'm not sure if there is some condition that the test is waiting for before collecting data and closing the browser, but it seems like it, especially because the errors tend to indicate that there are fewer shifts than expected even when I can visually see the expected errors
  • gpt-errors-overall-0 is failing on master and this branch consistently. The output suggests that a console warning should be thrown to indicate that an invalid size was requested. The actual output shows no console messages were thrown. When running the page locally, I see no console messages. I'm thinking that this could be due to a change in the GPT library

@jburger424
Copy link
Contributor

Thanks for the contribution Zack! Really appreciate the time you spent putting together such a thorough analysis and finding such a simple fix :) This is really quite an impressive delta in time!

It does appear that the failures are not due to your test, I'll update the test assertions later today to unblock this.

We definitely want to credit you in the release notes for this valuable fix, please confirm that that would be ok.

@tollmanz
Copy link
Contributor Author

It does appear that the failures are not due to your test, I'll update the test assertions later today to unblock this.

Excellent! We are excited to get this running in our test suite.

We definitely want to credit you in the release notes for this valuable fix, please confirm that that would be ok.

Definitely ok! Thanks!

@jburger424
Copy link
Contributor

Updated the tests in #333, I went ahead and merged the changes in here.

@jburger424 jburger424 merged commit 4ff793c into googleads:master Feb 18, 2022
@tollmanz
Copy link
Contributor Author

Thank you for moving this along so quickly @jburger424!

@josephrussell-cn
Copy link

@jburger424 thanks for merging this PR. We are looking to integrate the publisher ads plugin into our lighthouse workflow and would like for it to include these performance improvements as it is a non-trivial impact on the execution time without it. When do you anticipate a new version will be published to npm that includes this change? Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants