WIP - Resolves #73 - Implement first order differences of prediction trajectories #77

alexmlong
Collaborator

@alexmlong alexmlong commented Apr 23, 2023

I'm not sure if I'm going down the right path so I'm just committing what I've got so far for a "midway" review.

@alexmlong alexmlong changed the title WIP - Initial untested attempt at first order diff WIP - Resolves #73 - Implement first order differences of prediction trajectories Apr 23, 2023
@alexmlong alexmlong requested a review from levmckinney April 23, 2023 22:36
@levmckinney levmckinney linked an issue Apr 23, 2023 that may be closed by this pull request

# for each input token:
# for each layer (starting at 2nd layer):
# get top k tokens for that layer
Collaborator

This should be vectorized. We don't want to loop over the token index: with 40k+ tokens per layer, that is going to be slow. Instead, we should use an element-wise subtraction. If layer[i, :] and layer[i+1, :] are NDArrays, then we can just do layer[i+1, :] - layer[i, :].

In addition, we should not be thinking about the top k tokens at this point, since in the end the tokens that have changed the most may not be among the top k tokens in either layer. We may need to do something in the code that selects the top k tokens later, but don't worry about that just yet.
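
To make both points concrete, here is a minimal NumPy sketch. It assumes the trajectory is stored as an array whose first axis indexes layers and whose last axis indexes the vocabulary. That layout, the function name `first_order_differences`, and the toy numbers are illustrative assumptions, not the code in this PR.

```python
import numpy as np


def first_order_differences(trajectory):
    """Return layer[i+1] - layer[i] for every pair of adjacent layers.

    `trajectory` is assumed to have shape (num_layers, ..., vocab_size).
    Slicing along the first axis does all the subtractions at once,
    so there is no Python loop over tokens or layers.
    """
    trajectory = np.asarray(trajectory)
    if trajectory.shape[0] < 2:
        raise ValueError("need at least two layers to take a difference")
    return trajectory[1:] - trajectory[:-1]


# Toy trajectory: 2 layers, vocabulary of 4 tokens (made-up values).
trajectory = np.array(
    [
        [5.0, 4.0, 0.0, -3.0],  # layer 0
        [5.1, 4.2, 0.1, 0.5],   # layer 1
    ]
)

diffs = first_order_differences(trajectory)  # shape (1, 4)

# Token whose value changed the most between the two layers.
most_changed = int(np.argmax(np.abs(diffs[0])))

# Indices of the top-k tokens in each layer, taken separately.
k = 2
top_k_per_layer = np.argsort(trajectory, axis=-1)[:, -k:]

# Whether the most-changed token appears in either layer's top k.
in_top_k = bool(np.isin(most_changed, top_k_per_layer).any())
```

In this toy case the token with the largest change is index 3, which is outside the top 2 of both layers, so filtering to the top k before taking differences would hide it.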

Successfully merging this pull request may close these issues.

Implement first order differences of prediction trajectories