I wasn't using this piece of code before and just found that calc_proximity.py creates a proximity matrix with 1s on the diagonal, which is then used in the subsequent calculations of density/coi/cog/etc.
In density regressions, we tend to use density to capture the implied comparative advantage from other products, and add a regression-to-the-mean term to capture the product's own effect.
Usually, the diagonal of the proximity matrix is explicitly set to 0 (or an identity matrix is subtracted, as in the Stata version). Otherwise, in the normalization step or the knn step, the product itself influences its own density (in the knn version, the product itself is the nearest neighbor).
The suggested change is to add one line before returning the phi matrix at lines 29 and 44 in calc_proximity.py: np.fill_diagonal(phi, 0)
However, this might be a breaking change for other pieces of code (e.g. knn, density, coi/cog), so I would suggest examining them carefully before implementing it.
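To make the effect concrete, here is a minimal sketch of how the diagonal feeds into density. It does not use the package's own functions: the `density` helper, the toy `phi` and the toy `mcp` matrix are made up for illustration, and the formula assumed is the usual proximity-weighted share (sum of proximities to products the country has, divided by the sum of all proximities).

```python
import numpy as np

def density(mcp, phi):
    """Hypothetical helper: proximity-weighted share of products a country has.

    mcp: (countries x products) 0/1 presence matrix
    phi: (products x products) symmetric proximity matrix
    """
    return (mcp @ phi) / phi.sum(axis=0)

# Toy proximity matrix with 1s on the diagonal
phi = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])

# One country that has only product 0
mcp = np.array([[1, 0, 0]])

# np.fill_diagonal works in place and returns None, so copy first
phi_no_self = phi.copy()
np.fill_diagonal(phi_no_self, 0)

d_with = density(mcp, phi)             # product 0 contributes to its own density
d_without = density(mcp, phi_no_self)  # product 0's density comes only from other products

print(d_with)
print(d_without)
```

With the diagonal at 1, the density of product 0 is nonzero purely because the country already has product 0. With the diagonal at 0, it drops to zero, and the densities of the other products change too because the normalizing sums shrink.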
Great catch, thanks Yang. I think it makes sense to keep the proximities with diagonal 1, but in the density calculations I'll add the 0 diagonal fill after some tests. The normalization step is likely affected by this, but I'm not sure that the knn part is. I need to check, but because we supply a square precomputed distance matrix, I think sklearn's knn implementation might ignore the point itself when computing nearest neighbors. Will take a look when I get some time. Thanks again for the catch!
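Whatever sklearn turns out to do, one way to guarantee that a product is never its own neighbor is to mask the diagonal before picking neighbors. The sketch below uses NumPy only and does not settle the sklearn question. The `knn_indices` helper is hypothetical, and taking distance as `1 - phi` is an assumption, not necessarily what the package does.

```python
import numpy as np

def knn_indices(dist, k, exclude_self=True):
    """Hypothetical helper: indices of the k nearest columns for each row
    of a square precomputed distance matrix."""
    d = np.array(dist, dtype=float)  # work on a copy
    if exclude_self:
        # An infinite self-distance means a point can never be picked
        # as its own neighbor
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1, kind="stable")[:, :k]

phi = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])

# Assumed conversion: diagonal proximity 1 becomes self-distance 0
dist = 1.0 - phi

with_self = knn_indices(dist, k=2, exclude_self=False)
without_self = knn_indices(dist, k=2)

print(with_self)     # first column is each product's own index
print(without_self)  # only other products appear
```

If the self-distance is left at 0, each product takes up one of its own k neighbor slots, which is the concern raised in the issue.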