Machine Learning

A Few Practical Thoughts on Supervised Learning

2018-04-01 04:45:00 +0000
Classification, Logistic Regression, Kaggle, Regularization, Cross-Validation

tl;dr

Motivation

There are many valuable notebooks on Kaggle (although I have read only a few of the most-voted ones), and many of them feed categorical variables to the model as they are, without converting them to dummy variables. Comments pointing out that dummies should be used have also received many votes. This is my motivation: how do dummies improve the model?

Since I think other, similar topics could be covered here as well, the main questions are as follows:

Only logistic regression is used as the training method here, and there is no comparison among different methods, simply because I do not know other, more sophisticated methods in much detail. I chose logistic regression because it is simple and, for me, easy to understand.
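To make the dummy-variable question concrete, here is a minimal sketch of the two encodings being compared. It is not code from the notebook: the column name `embarked`, its values, and the helper functions `integer_codes` and `dummy_columns` are all made up for illustration, and plain Python is used so that nothing depends on a particular library.

```python
# Toy categorical column; the name and values are hypothetical.
embarked = ["S", "C", "Q", "S", "C"]


def integer_codes(values):
    """Encode each category as an integer, in sorted order of the levels."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


def dummy_columns(values, drop_first=True):
    """Build one 0/1 column per category.

    With drop_first=True the first level (in sorted order) is left out
    and acts as the baseline, which avoids a redundant column.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return {level: [1 if v == level else 0 for v in values] for level in levels}


codes = integer_codes(embarked)
dummies = dummy_columns(embarked)

print(codes)    # a single numeric feature
print(dummies)  # one 0/1 feature per non-baseline level
```

With integer codes, logistic regression fits a single coefficient for the whole column, so the log-odds are forced to change by the same amount from one code to the next, even though the order of the codes is arbitrary. With dummies, each non-baseline level gets its own coefficient, so the model can assign each category its own effect.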

Notebook for more

This Kyso notebook is just a clone of my Kaggle notebook. If any text or tables are rendered badly, please check the Kaggle one.

References