$\begingroup$

I am attempting to understand how each independent variable affects the probability of each level of my dependent variable, which is ordinal (0, 1 and 2). Therefore, I am attempting to use ordinal logistic regression. The issue is that the frequency of each dependent variable is imbalanced, and therefore the average probability of the minority dependent variable (class) is much lower than the other two. I have tried SMOTE and naïve oversampling; however, these are stochastic, so they can alter the real distributions of my data and lead to a different result each time.

In addition, does it make sense to fit my ordinal logistic regression on all independent variables if I want to analyse the effect of each one? Ideally, I want to find the cut-off points for my numerical variables that best separate each dependent variable, while also establishing how changes in each affect the odds of each class.

asked Nov 17 at 17:05
$\endgroup$
  • $\begingroup$ "the average probability of the minority dependent variable (class) is much lower than the other two." Could you please clarify why you find this problematic? $\endgroup$ Commented Nov 17 at 17:51
  • $\begingroup$ I am trying to establish the thresholds at which another class becomes more likely; currently there is no cross-over for this class since its probability is always less than the other two. For example, in my KDE plot I can see where the density crosses over. $\endgroup$ Commented Nov 17 at 17:56
  • $\begingroup$ If a category is overall unlikely, it might be that it is always unlikely. $\endgroup$ Commented Nov 17 at 18:33
  • $\begingroup$ I don't have time to post a full answer for a few days, but in the meantime see this page and its links for extensive discussion about unbalanced classes and oversampling. $\endgroup$ Commented Nov 19 at 15:56

1 Answer

$\begingroup$

... the frequency of each dependent variable is imbalanced, and therefore the average probability of the minority dependent variable (class) is much lower than the other two.

As discussed extensively on this page, if class imbalance in your data matches class imbalance in the population of interest then there isn't typically a problem that can be solved by over/under sampling once you have collected the data.* As @Dave said in a comment: "If a category is overall unlikely, it might be that it is always unlikely." Estimates involving the minority class will have lower precision than those for the other classes, but over/under sampling won't fix that in terms of estimating associations in the underlying population.
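
To make the "always unlikely" point concrete, here is a minimal sketch of class probabilities under a proportional-odds model with one predictor. The thresholds and slope are hypothetical values chosen for illustration, not estimates from your data. With closely spaced thresholds, the middle class never becomes the most likely class anywhere on the grid, so there is no "cross-over" to find even though the model is working as intended.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def class_probs(x, thresholds, beta):
    """Proportional-odds model: P(Y <= k | x) = expit(theta_k - beta * x).

    Returns the probabilities of classes 0, 1, ..., len(thresholds).
    """
    cum = [expit(t - beta * x) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical parameters (assumptions for illustration only).
thresholds = [-0.2, 0.2]
beta = 1.0

grid = [i / 10 for i in range(-50, 51)]  # x from -5 to 5
probs_on_grid = [class_probs(x, thresholds, beta) for x in grid]

# Largest probability the middle class (1) reaches anywhere on the grid.
max_middle = max(p[1] for p in probs_on_grid)

# Is class 1 ever the single most likely class?
middle_ever_most_likely = any(p[1] >= max(p[0], p[2]) for p in probs_on_grid)

print(f"max P(Y=1 | x) on grid: {max_middle:.3f}")
print(f"class 1 ever most likely: {middle_ever_most_likely}")
```

Resampling the data would change the fitted probabilities, but those would then describe the resampled data rather than the population you care about.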

... does it make sense to fit my ordinal logistic regression on all independent variables if I want to analyse the effect of each one?

Absolutely. If independent variables are correlated, as they typically are in observational studies, then in practice there is no independent "effect of each one." Omitting any outcome-associated predictor will lead to bias in the estimates of included predictors that are correlated with it (omitted-variable bias), even in ordinary least squares. In logistic regression the problem is even worse, as you can have omitted-variable bias even if an omitted predictor isn't correlated with the included predictors. See this page.
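
The last point can be checked with a small exact calculation. This is a sketch with hypothetical coefficients for a binary logistic model (not your ordinal model): a binary predictor `x`, and a binary predictor `z` that is independent of `x` but strongly associated with the outcome. Averaging over `z` to mimic omitting it gives a log-odds ratio for `x` that is closer to zero than the coefficient in the full model.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical full model: logit P(y=1 | x, z) = b0 + b1*x + b2*z
b0, b1, b2 = -1.5, 1.0, 3.0
p_z = 0.5  # P(z = 1), the same for x = 0 and x = 1 (z independent of x)

def marginal_prob(x):
    """P(y=1 | x) after averaging over the omitted predictor z."""
    return (1 - p_z) * expit(b0 + b1 * x) + p_z * expit(b0 + b1 * x + b2)

conditional_log_or = b1
marginal_log_or = logit(marginal_prob(1)) - logit(marginal_prob(0))

print(f"log-odds ratio for x with z in the model: {conditional_log_or:.3f}")
print(f"log-odds ratio for x with z omitted:      {marginal_log_or:.3f}")
```

Here `z` is uncorrelated with `x` by construction, yet leaving it out still changes the log-odds ratio for `x`. In a linear model fit by least squares, omitting an uncorrelated predictor would not bias the coefficient in this way.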

I want to find the cut-off points for my numerical variables that best separate each dependent variable, while also establishing how changes in each affect the odds of each class.

"Cut-off points" can be misleading if they are chosen without regard to the relative risks of misclassification. If you look for cutoffs that maximize the "accuracy" of class assignment, you are making an implicit assumption that all misclassifications have the same cost. That's often not the case. This page has extensive discussion, with links to further study.
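
As a rough sketch of why costs matter, suppose a model gives the class probabilities below for one case, and compare the class that minimizes expected cost under two cost matrices. The probabilities and costs are made-up numbers for illustration, and `expected_costs` and `best_class` are hypothetical helper names. With equal costs the choice matches "pick the most probable class"; once missing class 2 is made ten times as costly, the least probable class becomes the best assignment.

```python
def expected_costs(probs, cost):
    """Expected cost of assigning each class.

    probs[i]   : predicted probability that the true class is i
    cost[i][j] : cost of assigning class j when the true class is i
    """
    n = len(probs)
    return [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]

def best_class(probs, cost):
    """Class assignment with the lowest expected cost."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)

# Made-up predicted probabilities for classes 0, 1, 2.
probs = [0.5, 0.3, 0.2]

# Every misclassification costs the same (what "accuracy" assumes).
equal_cost = [[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]]

# Missing a true class 2 is ten times as costly as other errors.
unequal_cost = [[0, 1, 1],
                [1, 0, 1],
                [10, 10, 0]]

choice_equal = best_class(probs, equal_cost)
choice_unequal = best_class(probs, unequal_cost)

print("equal costs   ->", choice_equal)
print("unequal costs ->", choice_unequal)
```

The same probability model supports both decisions, which is one reason to report the estimated probabilities and leave the cutoff to whoever knows the costs.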


*As discussed on this page, there can be practical considerations that support over/under sampling at the data acquisition stage.

answered 17 hours ago
$\endgroup$
