Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update TrackClusterMergeSplitter to output track-cluster associations (PFA0) #1699

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

ruse-traveler
Copy link
Contributor

@ruse-traveler ruse-traveler commented Jan 9, 2025

Briefly, what does this PR introduce?

This PR updates the TrackClusterMergeSplitter algorithm to output both edm4eic::TrackClusterMatch and MC paritcle-cluster associations. In this process, it reaps what was sown by originally writing the algorithm to operate on protoclusters rather than clusters: the algorithm will now ingest fully formed cluster and update relevant quantities.

What kind of change does this PR introduce?

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

No.

Does this PR change default behavior?

Yes. Track-cluster and MC particle-cluster associations will now be produced by the algorithm.

@ruse-traveler ruse-traveler marked this pull request as ready for review February 4, 2025 22:53
Comment on lines 528 to 535
// --------------------------------------------------------------------------
//! Calculate cluster shape parameters
// --------------------------------------------------------------------------
/*! Calculation originally written by Chao Peng, Dhevan Gangadharan,
* and Sebouh Paul. Code is copied from the `CalorimeterClusterRecoCoG`
* algorithm.
*/
void TrackClusterMergeSplitter::calculate_shape_parameters(edm4eic::MutableCluster& clust) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this factorizes so well, we might as well move it to a separate factory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is already a big PR. If you decide to go along with my suggestion, you could refactor existing code separately, and then rebase this to re-use that new facility)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very much in favor of that! Keeps it nice and modular 😉

But how would this work in the data model? Would we have one collection without shape parameters filled in (likely not saved by default) and then one with them filled in (saved by default)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started work on this in #1734 !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase done! One new wrinkle I can immediately spot is how to handle the track-cluster matches: those will be pointing back to the split/merged clusters without cluster shapes...

A proposal: what if we had a new meta algorithm for copying associations? This could be useful for other algorithms that are one-to-one transformations. I have some ideas about how this might be implemented...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding general studies using the matches, aren't these a bit special? In the sense that they were used to construct resulting clusters, but they might be not what a track-cluster matcher would find by looking at those clusters.

I think, we can be optimistic about our ability to follow up with a relation re-writer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding general studies using the matches, aren't these a bit special? In the sense that they were used to construct resulting clusters, but they might be not what a track-cluster matcher would find by looking at those clusters.

True! The matching procedure here does differ in that regard! But it's not clear to me how we would propagate the track-merged cluster relation downstream without outputting updated track-cluster matches...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only talking about what is being written to the file. IIRC "dangling" relations are not allowed (there is a bug in PODIO which allows for OneToOne relations to not get checked, but that may be fixed at some point). To fix those, we would need to also save clusters without the shapes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh! I understand now! 😅
I'm absolutely good with disabling them in the output! 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed changes to disable the track-cluster associations!

Comment on lines 63 to 64
using MatrixF = std::vector<std::vector<float>>;
using VecMatrix = std::vector<MatrixF>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like, in some places you try to use maps, and here you give up and just rely on array indices? Is it true that every hit has a weight related to every cluster and every projection?

Copy link
Contributor Author

@ruse-traveler ruse-traveler Feb 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I do think we need a weight that's related to every cluster and every projection pointing to the merged cluster:

  • the former is because the merged cluster will (naturally) be composed of all of the hits of the clusters being merged,
  • and the latter is because we'll need to know the sum of all projections' momenta for the weight, so that summing over the split clusters should give the merged cluster energy before splitting.

(Unless I'm misunderstanding how the splitting should work!)

All that being siad, I probably could rework it to use maps instead of indices. I was thinking of this map of weights (hit_weight = hit_weight(cluster, projection)) as a "matrix", and this was an easy way of implementing a dynamic 2D matrix of floats 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I do think we need a weight that's related to every cluster and every projection pointing to the merged cluster:

I think you're right. Looks like this only involves hits from the cluster group.

All that being siad, I probably could rework it to use maps instead of indices. I was thinking of this map of weights (hit_weight = hit_weight(cluster, projection)) as a "matrix", and this was an easy way of implementing a dynamic 2D matrix of floats 🤷

Ideally we write the code to maximize clarity. Having a consistent use of maps could help with that, if there is a neat way to use those. Using maps also provides basic sanity checks to the code.

If you have a preference to use tensors here, then just remember std::vector<std::vector<std::vector< has triple dereferences/allocations and spoils data locality, so it won't win that much in terms of performance. We already use eigen library, that could be better for your case, perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I'll give it some thought and let you know which route (map or eigen) I go with!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After mulling it over, I think it a natural way might be to actually use a mix of maps and eigen: for each projection, we know how many clusters and hits we're going to need so we could easily define a map of projections onto MatrixXd! Let me give it a shot...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand corrected: it was much easier to just use std::map<edm4eic::CalorimeterHit, double>. Just pushed some changes that rewrite the splitting calculation to use that (it's a little cleaner now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants