Decision Trees, Part 3

Understanding how pruning works in the context of decision trees.

Introduction

Thank you for visiting my blog today!

In previous posts, I discussed the decision tree model and how it mathematically works its way along different branches to make decisions. A quick example of a decision tree: if it rains, I don’t go to the baseball game; otherwise I go. Assuming it doesn’t rain and I go to the game, if one team is winning by 5 or more runs by the eighth inning I leave early; otherwise I stay. Here the target variable is whether I will be at the stadium under each unique situation. Decision tree models are very intuitive because it is part of human nature to build mental decision trees in everyday life. In machine learning, decision trees are a common classifier, but they have flaws that can, in fact, be addressed.

Let me give you an example. Say we are trying to figure out whether Patrick Mahomes (NFL player) is going to throw for 300 yards and 3 touchdowns in a given game, and among the other information we know about the game, we know that he is playing the Chicago Bears (as a Bears fan it is hard for me to mention Mahomes and the Bears in the same sentence). For all you non-football fans: Patrick Mahomes, as long as he stays with the Chiefs, will almost never play the Bears. I’m serious about this. Because of how the NFL schedule is structured, the two teams play each other only once every four years (with the exception of possibly meeting in the Super Bowl, which leads to a maximum of five meetings in a four-year period). The problem with this situation is that whether or not the Bears win the game, we haven’t really learned anything meaningful: the matchup is so uncommon that a fluke may occur, and the observation only clouds our analysis. I would imagine that if we had the exact same situation but did not mention that the opposing team was the Bears, and all we knew was the teams’ rankings and other statistics, we would predict the game’s outcome (and Mahomes’s performance) with much higher accuracy. That’s what pruning is all about: getting rid of the extra information that doesn’t teach you anything.

This is not a novel idea. Amos Tversky and Daniel Kahneman showed that adding extra information to a problem or question often confuses people and leads them to the wrong conclusions. Say I told you that there is an NFL player who averages 100 rushing yards per game and 20 receiving yards per game. If I asked whether it is more likely that he is an NFL running back or an NFL player, you might think with absolute certainty that he is a running back (and you’d probably be right), but since running backs are a subset of NFL players, it is actually more likely that he is an NFL player. You may laugh when you read this, but it is a fairly common mistake.

Anyway, pruning is the process of cutting off the lower branches of a tree and eliminating noise. Getting back to the first example of whether I stay at the baseball game: to prune that tree, if we were to assume I never leave games early, the only feature that matters is the presence of rain, and we could cut the tree after the first branch. So let’s see how this actually works.

Fundamental Concept

The fundamental concept is pretty simple: we look at each additional continuation of the tree and check whether we actually learn a significant amount of information, or whether that extra step doesn’t really tell us anything new. In Python, you can even specify how deep you want your trees to be. However, that still leaves the question of how we statistically determine a good stopping point. We can guess-and-check in Python with relative ease until we find a satisfying stopping point, but that isn’t really so exciting.
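
To make the "specify how deep you want your trees to be" point concrete, here is a minimal sketch using scikit-learn's max_depth parameter (I'm assuming scikit-learn and its built-in breast cancer dataset here purely for illustration; any tree library with a depth limit works the same way):

```python
# A minimal sketch: limiting tree depth up front (pre-pruning / early stopping).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unrestricted tree keeps splitting until its leaves are (nearly) pure,
# which tends to overfit; max_depth stops it after a fixed number of levels.
for depth in [None, 2, 3, 5]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train={tree.score(X_train, y_train):.3f}, "
          f"test={tree.score(X_test, y_test):.3f}")
```

Comparing the train and test accuracies for each depth is exactly the guess-and-check approach described above.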

Statistical Method

So this is actually somewhat complex. There are two common methods: the cost complexity method and the chi-square method. The cost complexity method operates similarly to optimizing a linear regression using gradient descent in the sense that you minimize a cost function. Using this method you create many trees of varying size and solve for the one that leads to the highest accuracy. This concept is not novel to statisticians or data scientists, as optimizing a cost function is a key tool in creating models. The chi-square method uses the familiar chi-square test to check whether each subsequent branch of the tree actually matters. In the football example above, we learn next to nothing from a game between the Bears and Chiefs, as they can play a maximum of 13 games against each other in a ten-year period, and that assumes they meet in ten straight Super Bowls. The Chiefs probably average around 195 games in a ten-year period, including a minimum of 20 games against each of the Broncos, Raiders, and Chargers over that span. In Python, scipy.stats has a built-in chi-square function that allows you to perform the test. I currently have a blog in the works with more information on the chi-squared test and will provide the link when it is released.
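
Here is a rough sketch of both ideas. For cost complexity I'm assuming scikit-learn, which exposes the pruning path directly via cost_complexity_pruning_path and the ccp_alpha parameter; for the chi-square check I use scipy.stats.chi2_contingency on a made-up contingency table for a hypothetical split (the numbers are illustrative, not from any real dataset):

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Cost complexity pruning: grow a full tree, get the candidate alphas,
#    and keep the pruned tree with the best held-out accuracy.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    score = tree.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score
print(f"best ccp_alpha={best_alpha:.5f}, test accuracy={best_score:.3f}")

# 2) Chi-square check on one candidate split: rows are the left/right
#    children, columns are class counts (hypothetical numbers). A large
#    p-value means the split barely changes the class distribution, so the
#    branch is a candidate for pruning.
table = np.array([[30, 10],
                  [28, 12]])
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p-value={p_value:.3f}")
```

In practice you would run the chi-square test at each candidate branch and prune the ones whose splits fail to show a significant difference.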

Conclusion

Decision trees are fun and intuitive and drive our everyday lives. The issue with them is that they can easily overfit. Pruning trees allows us to take a step back and look for broader trends and stories instead of focusing on features that are too specific to provide any helpful information. Hopefully, this blog can help you start thinking about ways to improve decision trees!

Thanks for reading!
