Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSoC: Distributed error reporting #12489

Open
wants to merge 45 commits into
base: develop
Choose a base branch
from
Open

Conversation

akolson
Copy link
Member

@akolson akolson commented Jul 25, 2024

Summary

This pr implements the distributed error reporting feature as part of the Google Summer of Code(GSoC) 2024 program. See the epic for details.

References

Closes #12214

Reviewer guidance

All tests associated with the implementation must run successfully


Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@akolson akolson added DEV: backend Python, databases, networking, filesystem... gsoc A GSoC project task labels Jul 25, 2024
@akolson akolson added this to the Distributed Error Reporting milestone Jul 25, 2024
Copy link
Contributor

github-actions bot commented Jul 25, 2024

Build Artifacts

@akolson akolson changed the title Distributed error reporting GSoC: Distributed error reporting Jul 29, 2024
Copy link
Member

@bjester bjester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main blockers: we should add more logic to remove parameters like passwords from request data, and we should have request timeouts configured on the report requests

},
};

Resource.client = jest.fn();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, whenever you mock something in Python or JS, you want to ensure that the original implementation can be restored after the test is completed. That approach can keep tests from interfering with other tests, because of their use of mocks.

Since this is a direct replace of Resource.client, there isn't a way for it to be restored. So it would be better to use mock.spyOn or mock.replaceProperty here, and do so in the beforeEach. Then instead of clearAllMocks in afterEach (which only clears the mock state), I would suggest using restoreAllMocks as that would ensure any mocks are restored to what they should be (assuming the appropriate approach was used to create the mock in the first place).

height: window.screen.height,
available_width: window.screen.availWidth,
available_height: window.screen.availHeight,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was discussion about using the screen size breakpoints instead of the actual width and height. Is that the case, because it doesn't look like it? The reason is that it protects privacy. Specific sizes can be used to identify users, which reduces the anonymity of the data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should definitely make this update. Although I am also noticing that this file shouldn't exist, because it has been moved into the plugin to make this behaviour pluggable.

request_headers.pop("Cookie", None)

request_get = dict(request.GET)
request_get.pop("token", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, probably for POST, we should ensure passwords are not sent?

error_report.context = context

error_report.save()
logger.error("ErrorReports: Database updated.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels more like info-type logging?

ping_once(started, server=server)
pingback_id = ping_once(started, server=server)
if pingback_id:
ping_error_reports.enqueue(args=(server, pingback_id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is creating two different pathways that hinges on the pingback_id. In utils.py, there already exists logic dependent on if "id" in data:, which is the same condition here. It seems like this fits alongside the existing logic there.

Comment on lines 81 to 95
Vue.config.errorHandler = function (err, vm) {
logging.error(`Unexpected Error: ${err}`);
const error = new VueErrorReport(err, vm);
ErrorReportResource.report(error);
};

window.addEventListener('error', e => {
logging.error(`Unexpected Error: ${e.error}`);
const error = new JavascriptErrorReport(e);
ErrorReportResource.report(error);
});

window.addEventListener('unhandledrejection', event => {
event.preventDefault();
logging.error(`Unhandled Rejection: ${event.reason}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the unhandledrejection listener will prevent default logging of the error, so in regards to that and the other logging statements, I'm concerned whether these are suppressing necessary log information, i.e. a stack trace, that developers would need? If logging.error outputs a stack trace, that may not be the same trace as the error itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to check whether we are in a prod or dev env in the frontend? If yes then we can skip the preventDefault and use logging.debug instead when in dev mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, as error_reports is a plugin now, can we unplug it by default in dev environment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unplugging the plugin sounds fine for development. Although, having a proper stack trace logged to the console is helpful regardless of environment. On line 40 of this file, you can see that the it configures the logger based off environment.

What I'm trying to point out is if logging.error eventually feeds into console.error, the stack trace it prints would likely be different for:

`Unhandled Rejection: ${event.reason}`

then it would be for:

event.reason

assuming event.reason is an Error object.

join_url(server, "/api/v1/errors/report/"),
data=errors_json,
headers={"Content-Type": "application/json"},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is using raw python requests, lets ensure this has explicit timeouts configured, and ideally separate timeouts for connection vs request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use NetworkClient here ? The other ping-back task already uses NetworkClient

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if it's compatible, using the NetworkClient is just fine!

@rtibbles rtibbles added the DEV: Core JS API Changes related to, or to the Core JS API label Nov 5, 2024
@thesujai
Copy link
Contributor

I am just seeing the reviews here

@rtibbles rtibbles force-pushed the distributed-error-reporting branch from 0c82a59 to 1d90422 Compare March 10, 2025 21:56
@rtibbles rtibbles force-pushed the distributed-error-reporting branch from 716e662 to b5ba36f Compare March 11, 2025 00:23
@rtibbles rtibbles force-pushed the distributed-error-reporting branch from d408e96 to 8b3b22a Compare March 11, 2025 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DEV: backend Python, databases, networking, filesystem... DEV: Core JS API Changes related to, or to the Core JS API DEV: frontend gsoc A GSoC project task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distributed error reporting
5 participants