Capture line number for code frames #7786

milesgranger · 2023-04-18T08:19:17Z

Tests added / passed
Passes pre-commit run --all-files

github-actions · 2023-04-18T09:20:38Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      19 files +      15       19 suites +15 9h 58m 43s ⏱️ + 8h 45m 44s
  3 642 tests +    426   3 532 ✔️ +    549   108 💤 -   125 2 ❌ +2
32 892 runs +26 561 31 213 ✔️ +25 493 1 677 💤 +1 066 2 ❌ +2

For more details on these failures, see this check.

Results for commit b913b59. ± Comparison against base commit ff6327f.

♻️ This comment has been updated with latest results.

jrbourbeau

Thanks @milesgranger

distributed/client.py

milesgranger · 2023-05-08T09:02:11Z

@ntabris IIRC, you had mentioned you wanted to check out how this would affect the platform? I'm reasonably sure, as it is now, it shouldn't affect much of anything. All callers immediately drop the use of the line number and only use the src code lines as before so I think it shouldn't have any side effects.

I was thinking this would be strictly implementing capturing the line numbers, then in a later PR actually making use of them in.

ntabris · 2023-05-08T12:22:31Z

Okay, I didn't realize that. Doing this in multiple steps makes it unclear to me when review of design should happen. Would code from this PR be in scope when reviewing the PR which makes use of this code? (What's the reason for doing this in multiple PR? It seems like a single unit from design and UX perspective.)

milesgranger · 2023-05-08T12:26:27Z

What's the reason for doing this in multiple PR?

Because the original issue seemed to be, in my interpretation, a feature for the platform indicating the line number; and I'm not sure where that would be. If I'm mistaken, I'd be happy to do this in one; just need a bit of guidance on where the indicator would go, in which caller.

shughes-uk · 2023-05-08T19:12:03Z

@milesgranger i'm a bit clueless to the internals of dask but here is where where we are capturing code frames and sending them to the platform

https://github.com/coiled/platform/blob/0b6bfe34da7a2b1e1b68a77ef4103a14e500bef6/analytics/preload_scripts/telemetry.py#L402-L464

We run this as a scheduler plugin on clusters. We just need to know when a PR goes in that would break that code (or when it might need an update)

milesgranger · 2023-05-09T06:49:59Z

Thanks @shughes-uk, seems like we'd be in the first situation, this PR would break that code. As code would no longer have the type Tuple[str, ...] but Tuple[SourceCode, ...] where SourceCode is a namedtuple with the original 'code' and 'line_number':

Alternatively, I suppose we could update the sending to scheduler to have another line_number key that when zipped with code are aligned if you really wanted to avoid breaking anything on your end.

milesgranger · 2023-05-15T09:52:55Z

Okay @shughes-uk, I've updated it to send the line number, so as it stands now, this patch would break the code you've linked to.

jacobtomlinson

Oooh would this be useful for adding line numbers to performance reports?

hendrikmakait · 2023-05-17T12:12:28Z

I am wondering whether we should capture the line number within the file - line 7174 in test_client.py - or the line number within the frame - line 5 in

def test_computation_object_code_dask_compute(client): 
    ...

From what I understand, in this PR we capture the former, which might not be very useful without capturing the source file (name). It might also not suffice if one wanted to add indicators to the calling line. @shughes-uk: What were you thinking about when creating #7779? Similarly, @jacobtomlinson: What exactly would you be looking for to add line numbers to performance reports?

ntabris · 2023-05-17T12:33:58Z

distributed/client.py

@@ -2992,7 +2994,7 @@ def run(
    @staticmethod
    def _get_computation_code(
        stacklevel: int | None = None, nframes: int = 1
-    ) -> tuple[str, ...]:
+    ) -> tuple[SourceCode, ...]:


Assuming we want to use this data, whatever type we return here is going to be what gets passed to update_graph

distributed/distributed/scheduler.py

Line 4269 in fbab38a

code: tuple[str] | None = None,

and used as Computation.code, which then needs to be read/parsed/used by the analytics plugin for this to be useful on the Coiled platform side.

Given that it's a pain to coordinate schema changes and that tuples don't give a lot of flexibility, what do you think of switching here to a dict? We're making a breaking change (from str to something else) anyway, so why not make it to something more flexible so that in the future we can avoid breaking changes. From my perspective this would make things easier to maintain.

Is it not viable to import the SourceCode obj on the platform side? AFAIK, a schema change in the dict would be just as inconvenient as a change in a namedtuple.

Aside from really not liking to pass around generic dicts, it would also require a change in Computation.code from being a SotrtedSet to something else since dict isn't hashable.

Hm, I guess the plugin could import SourceCode to see what's available. Things are more complicated because plugin has to be fairly agnostic about dask/distributed version but this should work (with a fallback to current way of interpreting code when SourceCode isn't importable).

jacobtomlinson · 2023-05-18T12:28:07Z

@hendrikmakait taking this performance report as an example there is a code snippet at the bottom. It would be nice if it included the file name and line numbers for where that snippet came from. I was curious if this change gets us part of the way to implementing that.

milesgranger · 2023-05-22T08:06:03Z

Thanks for the review @hendrikmakait I've updated to make line number reference the current source code/frame 75cfdcb, give a holler if this isn't what you had in mind.

hendrikmakait · 2023-05-22T15:39:45Z

To summarize an offline discussion with @ntabris and @milesgranger:

We care about knowing what line number triggered a computation/submission within a frame. It would also be nice to know the file number the frame's code belongs to and its absolute position within the file (as per @jacobtomlinson's comment). Whatever changes we make in this PR should allow us to add those attributes. If it's easy to do it would be nice if this PR also added those attributes.

We do not care about deduplicating equal code frames with different line numbers (but we do care about deduplicating equal code frames with equal line numbers which might be caused by a for-loop). We assume that the remaining duplicates will not cause issues on the scheduler in terms of size and could easily be deduplicated by the consumer. Given that we feel a need to generally iterate on computations, we do not want to sink resources into building a nice tree-like deduplication.

We should run an A/B test before merging this PR to avoid an unexpected performance impact.

cc @fjetter in case you see any red flags with this approach.

hendrikmakait

Let's fire off an A/B test to see if this has a performance impact!

hendrikmakait · 2023-05-23T08:41:21Z

distributed/client.py

@@ -130,6 +130,13 @@
 TOPIC_PREFIX_FORWARDED_LOG_RECORD = "forwarded-log-record"


+class SourceCode(NamedTuple):


nit: Maybe Frame might be a better name? If so, I'd also suggest renaming lineno_frame to firstlineno.

❤️ for the typed NamedTuple!

distributed/client.py

hendrikmakait · 2023-05-23T08:52:04Z

distributed/client.py

@@ -3035,8 +3044,15 @@ def _get_computation_code(
                or fr.f_code.co_name in ("<listcomp>", "<dictcomp>")
            ):
                continue
+
+            kwargs = dict(


This kwargs dict feels awkward, but I don't see an obvious way of restructuring the code to remove the need for it. Do you? Please ignore this if you don't see an obvious change jumping out at you, this is not blocking this PR.

fjetter · 2023-05-24T15:38:03Z

cc @fjetter in case you see any red flags with this approach.

Haven't looked at the code but what you are describing sounds good.

hendrikmakait · 2023-05-26T09:02:09Z

As assumed, there's no visible effect in the A/B test: https://github.com/coiled/benchmarks/actions/runs/5081854642

ntabris

I've made the relevant change in Coiled (and manually confirmed that Coiled will continue to work with existing distributed and with change made in the PR), so LGTM.

hendrikmakait

Thanks, @milesgranger!

milesgranger requested a review from fjetter as a code owner April 18, 2023 08:19

jrbourbeau reviewed Apr 18, 2023

View reviewed changes

distributed/client.py Show resolved Hide resolved

distributed/client.py Outdated Show resolved Hide resolved

distributed/client.py Outdated Show resolved Hide resolved

milesgranger force-pushed the 7779-line-number-code-frames branch from e7912ab to ec7a971 Compare May 9, 2023 07:14

milesgranger requested a review from jacobtomlinson as a code owner May 15, 2023 08:22

milesgranger force-pushed the 7779-line-number-code-frames branch from 7b6a7dd to 0d6dac1 Compare May 15, 2023 08:30

milesgranger added 2 commits May 15, 2023 10:53

Minor: fix concat of NoneType and str

baef315

Capture line number for code frames

8bc2357

milesgranger force-pushed the 7779-line-number-code-frames branch from 0d6dac1 to 8bc2357 Compare May 15, 2023 08:53

jacobtomlinson reviewed May 15, 2023

View reviewed changes

hendrikmakait added the needs review Needs review from a contributor. label May 16, 2023

hendrikmakait assigned hendrikmakait and unassigned hendrikmakait May 17, 2023

hendrikmakait self-requested a review May 17, 2023 09:53

ntabris reviewed May 17, 2023

View reviewed changes

Make line_number relative to calling source code

75cfdcb

hendrikmakait mentioned this pull request May 22, 2023

Computations meta-issue #7830

Open

SourceCode - keep filename, relative and frame line numbers

c6941a0

hendrikmakait reviewed May 23, 2023

View reviewed changes

Merge branch 'main' into 7779-line-number-code-frames

0f1710f

hendrikmakait requested review from ntabris and hendrikmakait May 26, 2023 13:32

ntabris approved these changes May 26, 2023

View reviewed changes

Fix IPython computation test

b913b59

hendrikmakait approved these changes May 30, 2023

View reviewed changes

hendrikmakait merged commit 8dad1fc into dask:main May 30, 2023

milesgranger deleted the 7779-line-number-code-frames branch June 1, 2023 12:39

fjetter mentioned this pull request Jun 20, 2023

Spans: capture code snippets #7930

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capture line number for code frames #7786

Capture line number for code frames #7786

milesgranger commented Apr 18, 2023 •

edited

Loading

github-actions bot commented Apr 18, 2023 •

edited

Loading

jrbourbeau left a comment

milesgranger commented May 8, 2023

ntabris commented May 8, 2023

milesgranger commented May 8, 2023

shughes-uk commented May 8, 2023 •

edited

Loading

milesgranger commented May 9, 2023 •

edited

Loading

milesgranger commented May 15, 2023

jacobtomlinson left a comment

hendrikmakait commented May 17, 2023

ntabris May 17, 2023

milesgranger May 22, 2023

ntabris May 22, 2023

jacobtomlinson commented May 18, 2023 •

edited

Loading

milesgranger commented May 22, 2023 •

edited

Loading

hendrikmakait commented May 22, 2023 •

edited

Loading

hendrikmakait left a comment

hendrikmakait May 23, 2023

hendrikmakait May 23, 2023 •

edited

Loading

fjetter commented May 24, 2023

hendrikmakait commented May 26, 2023

ntabris left a comment

hendrikmakait left a comment

		@@ -130,6 +130,13 @@
		TOPIC_PREFIX_FORWARDED_LOG_RECORD = "forwarded-log-record"


		class SourceCode(NamedTuple):

Capture line number for code frames #7786

Capture line number for code frames #7786

Conversation

milesgranger commented Apr 18, 2023 • edited Loading

github-actions bot commented Apr 18, 2023 • edited Loading

Unit Test Results

jrbourbeau left a comment

Choose a reason for hiding this comment

milesgranger commented May 8, 2023

ntabris commented May 8, 2023

milesgranger commented May 8, 2023

shughes-uk commented May 8, 2023 • edited Loading

milesgranger commented May 9, 2023 • edited Loading

milesgranger commented May 15, 2023

jacobtomlinson left a comment

Choose a reason for hiding this comment

hendrikmakait commented May 17, 2023

ntabris May 17, 2023

Choose a reason for hiding this comment

milesgranger May 22, 2023

Choose a reason for hiding this comment

ntabris May 22, 2023

Choose a reason for hiding this comment

jacobtomlinson commented May 18, 2023 • edited Loading

milesgranger commented May 22, 2023 • edited Loading

hendrikmakait commented May 22, 2023 • edited Loading

hendrikmakait left a comment

Choose a reason for hiding this comment

hendrikmakait May 23, 2023

Choose a reason for hiding this comment

hendrikmakait May 23, 2023 • edited Loading

Choose a reason for hiding this comment

fjetter commented May 24, 2023

hendrikmakait commented May 26, 2023

ntabris left a comment

Choose a reason for hiding this comment

hendrikmakait left a comment

Choose a reason for hiding this comment

milesgranger commented Apr 18, 2023 •

edited

Loading

github-actions bot commented Apr 18, 2023 •

edited

Loading

shughes-uk commented May 8, 2023 •

edited

Loading

milesgranger commented May 9, 2023 •

edited

Loading

jacobtomlinson commented May 18, 2023 •

edited

Loading

milesgranger commented May 22, 2023 •

edited

Loading

hendrikmakait commented May 22, 2023 •

edited

Loading

hendrikmakait May 23, 2023 •

edited

Loading