I have one binary categorical dependent variable and one continuous independent variable. The outcome is very noisy: a lot of randomness determines the value of the dependent variable.
The relationship between the independent variable and the dependent variable is linear. I have 2,000 data points to train on. Some possibilities are:
- Logistic regression - simplest option
- SVM (support vector machines)
- Naive Bayes
- Random forests - I see these do well on Kaggle, but I have a simple one-variable linear relationship, so a random forest doesn't seem necessary here.
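For concreteness, the simplest option can be sketched in plain Python. This is a minimal sketch on simulated data, not the asker's data: it assumes x is drawn from a standard normal, uses 2,000 observations, a true intercept of -0.5 and a true slope of 1.0 (all assumptions), and fits a one-predictor logistic regression by Newton-Raphson. The function names `simulate` and `fit_logistic` are hypothetical; in practice a statistics library would normally do the fitting.

```python
import math
import random


def simulate(n, b0, b1, seed=0):
    """Hypothetical data: x ~ N(0, 1), y ~ Bernoulli(sigmoid(b0 + b1 * x))."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def fit_logistic(xs, ys, iters=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient (g0, g1) and the 2x2 matrix X^T W X, accumulated by hand.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X^T W X)^{-1} X^T (y - p), with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


xs, ys = simulate(2000, b0=-0.5, b1=1.0)
b0_hat, b1_hat = fit_logistic(xs, ys)
print(f"intercept={b0_hat:.3f}, slope={b1_hat:.3f}")
```

With a slope this modest relative to the spread of x, many outcomes could go either way, which loosely mimics the noisy setting described in the question; the estimates should still land near the assumed true values with 2,000 points.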
COOLSerdash
asked Jun 10, 2013 at 22:17
It's next to impossible that the binary response is actually linear in the independent variable, unless the IV is very restricted in range; and if it were actually linear, why would you use logistic regression? Isn't that nonlinear? Lastly, you state that it's linear with apparent certainty. Where does that certainty arise? (Glen_b, commented Jun 11, 2013 at 1:13)
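The distinction raised in the comment can be written out. With notation chosen here for illustration ($p(x) = P(y = 1 \mid x)$, coefficients $\beta_0, \beta_1$), logistic regression is linear in the log-odds, not in the probability:

$$\operatorname{logit} p(x) = \log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x \quad\Longleftrightarrow\quad p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

By contrast, a model that is truly linear in the probability, $p(x) = \alpha + \gamma x$ with $\gamma > 0$, gives $p(x) > 1$ whenever $x > (1 - \alpha)/\gamma$, which is why a binary response can only be linear in $x$ over a restricted range.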
1 Answer
You should try all of the models you listed and cross-validate them. It's the name of the site (Cross Validated)!
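A minimal sketch of this advice, under stated assumptions: plain-Python 10-fold cross-validation on simulated data of the kind the question describes (one continuous predictor, 2,000 noisy binary outcomes; the data-generating model is an assumption). Only two of the listed models are hand-written here, logistic regression and Gaussian naive Bayes, plus a base-rate baseline that ignores x; SVMs and random forests would normally come from a library such as scikit-learn. All function names are hypothetical. Held-out log loss is used as the score because, with a noisy outcome, accuracy can hide differences in predicted probabilities.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Hypothetical data: one continuous x, noisy binary y (true model assumed logistic)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-0.5 + 1.0 * x) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, iters=25):
    """One-predictor logistic regression via Newton-Raphson; returns x -> P(y=1|x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return lambda x: sigmoid(b0 + b1 * x)


def fit_gaussian_nb(xs, ys):
    """Gaussian naive Bayes with a single feature; returns x -> P(y=1|x)."""
    stats = {}
    for c in (0, 1):
        xc = [x for x, y in zip(xs, ys) if y == c]
        mu = sum(xc) / len(xc)
        var = sum((x - mu) ** 2 for x in xc) / len(xc)
        stats[c] = (len(xc) / len(xs), mu, var)

    def predict(x):
        # Class prior times class-conditional normal density, then normalise.
        dens = {c: prior * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
                for c, (prior, mu, var) in stats.items()}
        return dens[1] / (dens[0] + dens[1])

    return predict


def fit_base_rate(xs, ys):
    """Baseline that ignores x and predicts the training-set proportion of 1s."""
    rate = sum(ys) / len(ys)
    return lambda x: rate


def cross_validate(fit, xs, ys, k=10, seed=1):
    """Mean held-out log loss over k folds (lower is better)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    losses = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            # Clip so a prediction of exactly 0 or 1 cannot produce log(0).
            p = min(max(model(xs[i]), 1e-12), 1 - 1e-12)
            losses.append(-math.log(p if ys[i] == 1 else 1 - p))
    return sum(losses) / len(losses)


xs, ys = simulate(2000)
scores = {name: cross_validate(fit, xs, ys)
          for name, fit in [("logistic", fit_logistic),
                            ("gaussian_nb", fit_gaussian_nb),
                            ("base_rate", fit_base_rate)]}
for name, loss in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} mean log loss = {loss:.4f}")
```

Any other candidate can be added by writing a `fit` function that returns a probability function; the essential point is that every model is scored only on folds it was not trained on.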
answered Jun 10, 2013 at 23:10