$\begingroup$

I have one binary dependent variable and one continuous independent variable. The outcome of the dependent variable is very noisy: a lot of randomness goes into it.

The relationship between the independent variable and the dependent variable is linear. I have 2,000 data points to train on. Some possibilities are:

  • Logistic regression - simplest option
  • SVM (support vector machines)
  • Naive Bayes
  • Random forests - I see these do well on Kaggle, but I have a simple one-variable linear relationship, so a random forest doesn't seem necessary here.
COOLSerdash
asked Jun 10, 2013 at 22:17
$\endgroup$
  • $\begingroup$ It's next to impossible that the binary response is actually linear in the independent variable, unless the IV is very restricted in range. And if it were actually linear, why would you use logistic regression, which is nonlinear? Lastly, you state that it's linear with apparent certainty. Where does that certainty come from? $\endgroup$ Commented Jun 11, 2013 at 1:13

1 Answer

$\begingroup$

You should try all of the models you listed and compare them by cross-validation. It's the name of the site, after all!
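As a rough illustration of that advice, here is a minimal sketch of 5-fold cross-validation using only the Python standard library. Everything in it is an assumption made for the example, not something from the question: the data are synthetic (`make_data` draws noisy labels from an assumed logistic relationship with slope 0.8), the helper names are hypothetical, and only two of the listed models (logistic regression and Gaussian naive Bayes) plus a constant baseline are implemented. An SVM or a random forest from a library would plug into the same loop. Held-out log loss is used as the score because, with very noisy labels, it rewards well-calibrated probabilities more than raw accuracy does.

```python
import math
import random


def make_data(n=2000, seed=0):
    """Synthetic stand-in for the asker's data: one continuous x, noisy binary y."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed generating process: P(y=1) = sigmoid(0.8 * x), so labels are noisy.
    ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-0.8 * x)) else 0 for x in xs]
    return xs, ys


def fit_baseline(xs, ys):
    """Ignore x and always predict the training-set frequency of class 1."""
    rate = sum(ys) / len(ys)
    return lambda x: rate


def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """One-variable logistic regression fitted by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit_gaussian_nb(xs, ys):
    """Gaussian naive Bayes: one normal density per class, weighted by the class prior."""
    stats = {}
    for c in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
        stats[c] = (len(vals) / len(xs), mean, var)

    def predict(x):
        like = {}
        for c, (prior, mean, var) in stats.items():
            density = math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
            like[c] = prior * density
        return like[1] / (like[0] + like[1])

    return predict


def cross_val_log_loss(fit, xs, ys, k=5, seed=0):
    """Mean held-out log loss over k folds (lower is better)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    losses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        loss = 0.0
        for i in held_out:
            # Clip probabilities so log() never sees exactly 0 or 1.
            p = min(max(model(xs[i]), 1e-12), 1 - 1e-12)
            loss -= math.log(p) if ys[i] == 1 else math.log(1 - p)
        losses.append(loss / len(held_out))
    return sum(losses) / k


xs, ys = make_data()
scores = {
    "baseline (class frequency)": cross_val_log_loss(fit_baseline, xs, ys),
    "logistic regression": cross_val_log_loss(fit_logistic, xs, ys),
    "Gaussian naive Bayes": cross_val_log_loss(fit_gaussian_nb, xs, ys),
}
for name, loss in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: mean held-out log loss = {loss:.4f}")
```

Every model is scored on the same folds, so the comparison is like for like. The constant baseline is worth keeping in the table: when the outcome is mostly random, it shows how much any model actually gains from the predictor.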

answered Jun 10, 2013 at 23:10
$\endgroup$
