Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

glmnet.lasso_maxCoef(): incorrect coefficient sorting? [bug] #32

Open
ptaitAtMcMaster opened this issue Aug 24, 2023 · 0 comments
Open

Comments

@ptaitAtMcMaster
Copy link

Hi,
on line 129 of fitfuns.R, it looks like glmnet.lasso_maxCoef() is not sorting the coefficients in the right order. The code is:
selected <- order(coef(fit)[-1])[1:q]

Here is a reproducible example that would capture lines 126 and 129:

library(glmnet)
library(stabs)
set.seed(36)

data(QuickStartExample)
x <- QuickStartExample$x
y <- QuickStartExample$y

lambda.1se <- cv.glmnet(x, y)$lambda.1se # 0.1322761

fit <- glmnet(x, y, lambda = lambda.1se)
fit$df # 8

## coefficients minus the intercept
coef_vec <- coef(fit)[-1] 
coef_vec
# 1.3019591  0.0000000  0.6422423  0.0000000 -0.7892393  0.4944803  0.0000000  0.2943201  0.0000000  0.0000000  0.1058430
# 0.0000000  0.0000000 -1.0402308  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000 -0.9791165

##  line 129 :: order(coef(fit)[-1])[1:q]
i1 <- order(coef_vec); i1
# [1] 14 20  5  2  4  7  9 10 12 13 15 16 17 18 19 11  8  6  3  1
q <- 9; coef_vec[i1[1:q]]
# -1.0402308 -0.9791165 -0.7892393  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000

I would have expected the sorting to be done on the absolute value of the coefficients. Assuming this is the intent, two possible solutions are:

i2 <- order(abs(coef_vec), decreasing = TRUE); i2
# [1]  1 14 20  5  3  6  8 11  2  4  7  9 10 12 13 15 16 17 18 19
coef_vec[i2[1:q]]
# 1.3019591 -1.0402308 -0.9791165 -0.7892393  0.6422423  0.4944803  0.2943201  0.1058430  0.0000000

or

selected <- predict(fit, type = "nonzero")
selected <- selected[[length(selected)]] ## length is 1 when lambda is a scalar
selected
#  1  3  5  6  8 11 14 20

I believe the later solution is used in glmnet.lasso() on lines 31 and 32 of fitfuns.R

Thanks
Peter

## verify that the same coefficients are chosen 
all(selected %in% i2[1:fit$df])
# TRUE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant