Today I want to explain linear regression. It is one of the simplest statistical learning models and can be implemented efficiently in just a couple of lines of Python. Being simple, however, does not mean it isn’t useful: it is very practical for exploring relationships between the features of a dataset and for making predictions on a target value. That is why I think it’s important to understand how the method works and how its different parameters affect the outcome.
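To show just how little code is needed, here is a minimal sketch using NumPy’s least-squares solver on some made-up toy data (the variable names and numbers below are purely illustrative, not from any real dataset):

```python
import numpy as np

# Toy data: 100 samples, 2 features (placeholders for your own dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept is learned as an extra weight
X_b = np.c_[np.ones(len(X)), X]

# Ordinary least squares: find theta minimizing ||X_b @ theta - y||^2
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta)  # roughly [1.0, 3.0, -2.0]
```

The heavy lifting is a single call to `np.linalg.lstsq`, which is what makes the method so cheap to implement.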
This is Part 2 of my decision tree series. Here we will see how we can build a decision tree algorithmically using Leo Breiman’s (one of the big, big names in decision trees) CART algorithm.
Ok, so as we saw in previous parts, the CART algorithm allows us to build decision trees. Up until now we have grown these trees until all leaves are pure, meaning each leaf contains examples of only one class (for classification trees). However, this can overfit the training data, which decreases the generalizability of our model and therefore its usefulness. This is where cost-complexity pruning comes into play.
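Before we build it ourselves, here is a quick sketch of what cost-complexity pruning looks like in practice, using scikit-learn’s ready-made implementation rather than our own from-scratch code (the dataset and the choice of alpha below are just illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree is grown until all leaves are pure and tends to overfit
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# cost_complexity_pruning_path gives the effective alphas at which
# subtrees get pruned away; a larger alpha yields a smaller tree
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with a moderate alpha from the path (an illustrative choice;
# in practice you would pick alpha by cross-validation)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
pruned_tree.fit(X_train, y_train)

print("unpruned test accuracy:", full_tree.score(X_test, y_test))
print("pruned test accuracy:  ", pruned_tree.score(X_test, y_test))
```

In the rest of this post we will unpack what those alphas actually mean and how the pruning path is computed.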
If you haven’t read my post on linear regression, I invite you to do so here; in short, it is a method for modelling the relationship between variables \(X_i\) and a target feature \(y\) with a linear model. The modelling is done by learning a weight \(\theta_i\) for each \(X_i\), supposing that our model looks something like this:
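\[
\hat{y} = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n
\]

where \(\hat{y}\) is the model’s prediction and \(\theta_0\) is an intercept term.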