ed84c Posted September 30, 2020

Hello

I'm looking for someone to point me in the right direction so I can start researching a solution to the problem below.

Problem

Linear regression with high dimensionality: p = 29, n ≈ 5000, and the input variables are generally quite highly correlated.

When the model is used for prediction, data sets regularly have one or more missing input parameters. At the moment I just refit a least-squares (LSQ) solution from the training data with that input deleted. This seems to lead to quite unstable results. Stability is important for my application, in some senses more so than absolute accuracy.

--

Regularisation (e.g. ridge) feels like it should help. But (and I'm not formally trained in stats) as I understand it, regularisation reduces the variance of the model fitted with all input variables present, and doesn't necessarily do anything for model stability when input parameters are deleted.

Thanks in advance.
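A minimal sketch of the refit-on-missing approach described above, assuming NumPy and a training matrix `X` (n × p) with target `y`. The helper names `fit_ridge_subset` and `predict_with_missing`, the choice to standardise the inputs before penalising, and the `alpha` values are all illustrative assumptions, not anything from the thread. With `alpha=0` it reproduces the plain LSQ refit; with `alpha > 0` it is the ridge variant of the same refit, so the two can be compared on the same missing-input pattern.

```python
import numpy as np

def fit_ridge_subset(X, y, keep, alpha=0.0):
    """Hypothetical helper: fit y ~ X[:, keep] by ridge; alpha=0 gives plain LSQ.

    Inputs are standardised (an assumption) so the L2 penalty treats every
    column on the same scale; the intercept is not penalised.
    """
    Xs = X[:, keep]
    x_mean = Xs.mean(axis=0)
    x_std = Xs.std(axis=0)
    y_mean = y.mean()
    Z = (Xs - x_mean) / x_std
    # Penalised normal equations: (Z^T Z + alpha I) beta_z = Z^T (y - mean(y))
    A = Z.T @ Z + alpha * np.eye(Z.shape[1])
    beta_z = np.linalg.solve(A, Z.T @ (y - y_mean))
    # Map coefficients back to the original (unstandardised) input units
    beta = beta_z / x_std
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict_with_missing(X_train, y_train, x_new, alpha=0.0):
    """Hypothetical helper: predict one row; missing inputs are marked np.nan."""
    keep = np.flatnonzero(~np.isnan(x_new))  # indices of inputs that are present
    intercept, beta = fit_ridge_subset(X_train, y_train, keep, alpha)
    return intercept + x_new[keep] @ beta
```

Refitting for every row repeats the same solve; if the same missing-input patterns recur, the fits could be cached keyed on `tuple(keep)`. One way to probe stability is to take rows with nothing missing, blank out one input, and compare the reduced-model predictions with the full-model ones for a few values of `alpha`. Ridge tends to spread weight across correlated columns rather than concentrating it on one, which may make such deletions less disruptive, but that is something to check on the actual data.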
Prometheus Posted September 30, 2020

Have you looked at Partial Least Squares (PLS)? It deals nicely with highly correlated variables by projecting them onto a lower-dimensional subspace before fitting the model.
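To make the "project onto a subspace, then fit" idea concrete, here is a minimal PLS1 (single response) sketch using the NIPALS iteration, assuming NumPy. The function name `pls1_fit` is hypothetical; scikit-learn's `sklearn.cross_decomposition.PLSRegression` is a maintained implementation of the same technique. Each component finds a direction `w` in input space that has maximal covariance with the (deflated) response, projects the inputs onto it to get scores `t`, and removes that component before finding the next.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Hypothetical helper: PLS1 via NIPALS; returns (intercept, coef)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    E = X - x_mean          # X residual, deflated after each component
    f = y - y_mean          # y residual, deflated after each component
    p = X.shape[1]
    W = np.zeros((p, n_components))  # weight vectors (projection directions)
    P = np.zeros((p, n_components))  # X loadings
    q = np.zeros(n_components)       # y loadings
    for k in range(n_components):
        w = E.T @ f                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                    # scores: inputs projected onto w
        tt = t @ t
        P[:, k] = E.T @ t / tt
        q[k] = f @ t / tt
        W[:, k] = w
        E = E - np.outer(t, P[:, k])  # remove this component from X
        f = f - q[k] * t              # and from y
    # Regression coefficients in the original input space
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return intercept, coef
```

With `n_components` equal to the number of (linearly independent) inputs, this reproduces ordinary least squares; the regularising effect comes from keeping fewer components, a number usually chosen by cross-validation. PLS does not by itself remove the need to refit when an input is missing, so it would presumably slot in where the LSQ refit is used now, with the hope that the low-rank fit is less sensitive to which columns are present.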
ed84c (Author) Posted September 30, 2020

Thanks - I have looked at it very briefly; I'll have a bit more of a dig into it.