Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proximity matrix diagonal should be set to 0 #38

Open
complexly opened this issue Jan 27, 2025 · 1 comment
Open

Proximity matrix diagonal should be set to 0 #38

complexly opened this issue Jan 27, 2025 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@complexly
Copy link
Member

complexly commented Jan 27, 2025

I wasn't using this piece of code before and just find that calc_proximity.py creates a proximity matrix with diagonal 1, which was used in following calculations of density/coi/cog/etc.

In density regressions, we tend to use density to capture implied comparative advantage from other products, and add a regression to the mean term to capture its own effect.

Usually, the diagonal of proximity is explicitly set to 0 (or subtract an identity matrix as in Stata version), otherwise when doing the normalization step or knn step, the product itself will have an influence on its density (in knn version, the product itself is the nearest neighbor).

The suggested change is to add a row before returning phi matrix at line 29 and 44 in calc_proximity.py:
np.fill_diagonal(phi, 0)

However, this might be a breaking change for other pieces of code (e.g. knn, density, coi/cog), and I would suggest a cautious examination before implementing this.

@shreyasgm
Copy link
Member

Great catch, thanks Yang. I think it makes sense to keep the proximities as having diagonal 1, but in the density calculations, I'll add in the 0 diagonal fill after some tests. The normalization step is likely affected by this but I'm not sure that the knn part is affected. I need to check but because we supply a square precomputed distance matrix, I think sklearn's knn implementation might ignore the point itself when computing nearest neighbors. Will take a look when I get some time. Thanks again for the catch!

@shreyasgm shreyasgm self-assigned this Jan 28, 2025
@shreyasgm shreyasgm added the bug Something isn't working label Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants