Pokemon: EDA and Battle Modeling
Exploratory data analysis and models for determining which Pokemon tend to have the biggest pure advantages in battle given no items in play. Thanks for visiting my blog today! Context Pokémon is currently celebrating 25 years in the year 2021 and I think this is the perfect time to reflect on Pokémon from a statistical…
Introductory NLP Models
Building your first Natural Language Processing-based classification model. Thanks for stopping by today! Introduction A popular and exciting application of data science programming is understanding language and the way people speak. Natural Language Processing (NLP) is a field within data science that examines all things language-related. This could be assigning a movie rating from 1-10…
Statistical Power, Part 2
Determining power in hypothesis testing for confidence in test results Introduction Thanks for visiting my blog. Today’s post concerns power in statistical tests. Power, simply put, is a metric of how reliable a statistical test is. One of the inputs we will see later on pertains to effect size, which I have a previous blog on.…
Statistical Power, Part 1
An examination of the role effect size plays in whether one can trust their tests or not Introduction Thanks for visiting my blog today! Today’s blog concerns statistical power and the role it plays in hypothesis testing. In very simple terms, power is a number between 0% and 100% that tells us how much we…
Z Scores in Hypothesis Testing
Understanding the use of z scores in performing statistical inquiries Introduction Thanks for visiting my blog today! Today’s blog concerns hypothesis testing and the use of z scores to answer questions. We’ll talk about what a z score is, when it needs to be used, and how to interpret the results from a z score-related…
Was Gretzky Really THAT Great?
(I used to genuinely question Gretzky’s greatness and the data helped me make my decision) A unique approach to evaluate NHL players by accounting for era effects Introduction Thank you for visiting my blog! Let’s get right into this blog as there isn’t a whole lot to be said in the introduction other than that we are…
Decision Trees, Part 3
Understanding how pruning works in the context of decision trees. Introduction Thank you for visiting my blog today! In previous posts, I discussed the decision tree model and how it mathematically goes along its different branches to make decisions. A quick example of a decision tree is the following: if it rains I don’t go…
Dealing With Imbalanced Datasets The Easy Way
Imposing data balance in order to have meaningful and accurate models. Introduction Thanks for visiting my blog today! Today’s blog will discuss what to do with imbalanced data. Let me quickly explain what I’m talking about for all you non-data scientists. If I am screening people to see if they have a disease and I…
Abbey Code
A Picture Is Worth 1000 Words. In This Case, I Decided To Stick With 400. Introduction Thanks for visiting my blog today! For those of you who may not know, I love music and also play a couple instruments. One of my top 3 favorite bands of all time is The Beatles (#1 is Chumbawamba).…
Basic AutoGluon Models
Learning the basics of a useful AutoML library Introduction Thank you for visiting my blog today! Recently, I was introduced to AutoGluon, an interesting library geared toward building fast and accurate machine learning models. I don’t claim credit for any of the fancy backend code or functionality. However, I would like…
Decision Trees, Part 2
Using a sample data set to create and test a decision tree model. Introduction Welcome to my blog and thanks for dropping by! Today, I have my second installment in my decision tree blog series. I’m going to create and run a model leveraging decision trees using data I found on a public data repository.…
Decision Trees, Part 1
Understanding how to build a decision tree using statistical methods. Introduction Thanks for visiting my blog today! Life is complex and largely guided by the decisions we make. Decisions are also complex and are usually the result of a cascade of other decisions and logic flowing through our heads. Just like we all make various…
Sink or Swim
Effectively Predicting the Outcome of a Shark Tank Pitch Introduction Thank you for visiting my blog today! Recently, during my quarantine, I have found myself watching a lot of Shark Tank. In case you are living under a rock, Shark Tank is a thrilling (and often parodied) reality TV show (currently on CNBC) where hopeful…
Linear Regression, Part 3
Simple Multiple Linear Regression in Python Introduction Thank you for visiting my blog! Today, I’m going to be doing the third part of my series in linear regression. I originally intended to end on part 3, but I have considered continuing with further blogs. We’ll see what happens. In my first blog, I introduced the…
Linear Regression, Part 2
Building an Understanding of Gradient Descent Using Computer Programming Introduction Thank you for visiting my blog. Today’s blog is the second blog in a series I am doing on linear regression. If you are reading this blog, I hope you have a fundamental understanding of what linear regression is and what a linear regression model…
Linear Regression, Part 1
Acquiring a baseline understanding of the ideas and concepts that drive linear regression. Introduction Thanks for visiting my blog! Today, I’d like to do my first part in a multi-part blog series discussing linear regression. In part 1, I will go through the basic ideas and concepts that describe what a linear regression is at…
Machine Learning Metrics and Confusion Matrices
Understanding the Elements and Metrics Derived from Confusion Matrices when Evaluating Model Performance in Machine Learning Classification Introduction Thanks for visiting my blog. I hope my readers get the joke displayed above. If they don’t and they’re also millennials, they missed out on some great childhood fun. What are confusion matrices? Why do they matter?…
Devin Booker vs. Gregg Popovich (vs. Patrick McCaw)
Developing a process to predict which NBA teams and players will end up in the playoffs. Introduction Hello! Thanks for visiting my blog. After spending the summer being reminded of just how incredible Michael Jordan was as a competitor and leader, the NBA is finally making a comeback and the [NBA] “bubble” is in full…
Sell Phones
Understanding the main influences and driving factors behind how a phone’s price is determined. Introduction If you made it this far it means you were not completely turned off by my terrible attempt to use a pun in the title and I thank you for that. This blog will probably be pretty quick I think.…
Exciting Graphs for Boring Maths
Understanding style options available in Seaborn that can enhance your explanatory graphs and other visuals. Introduction Welcome to my blog! Today, I will be discussing ways to leverage Seaborn (I will often use the ‘sns’ abbreviation later on) to spice up Python visuals. Seaborn [and Matplotlib – abbreviated as ‘plt’] are powerful libraries that can…
Capstone Project at The Flatiron School
Introduction Thank you for visiting my blog today. Today I would like to discuss my capstone project that I worked on as I graduated from the Flatiron School data science boot camp (Chicago Campus). At the Flatiron School, I began to build a foundation in Python and its applications in data science. It was an…
Method to the Madness
Imposing Structure and Guidelines in Exploratory Data Analysis Introduction I like to start off my blogs with the inspiration that led me to write that blog and this blog is no different. I recently spent time working on a project. Part of my general process when performing data science inquiries and building models is to…
Out Of Office
Predicting Absenteeism At Work Introduction It’s important to keep track of who does and does not show up to work when they are supposed to. I found some interesting data online that gives information on how much work from a range of 0 to 40 hours any employee is expected to miss in a certain…
Feature Selection in Data Science
Introduction Oftentimes, when addressing data sets with many features, reducing features and simplifying your data can be helpful. Usually, one particular juncture where you remove a lot of data or features is by reducing correlation using a filter of 70% or so. (Having highly correlated variables usually leads to overfitting.) However, you can continue…
Feature Scaling In Machine Learning
Accounting for the Effect of Magnitude in Comparing Features and Building Predictive Models Introduction The inspiration for this blog post comes from some hypothesis testing I performed on a recent project. I needed to put all my data on the same scale in order to compare it. If I wanted to compare the population of…
Encoding Categorical Data
Introduction (PLEASE READ – Later on in this blog I describe target encoding without naming it as such. I wrote this blog before I knew target encoding was a popular thing and I am glad to have learned that it is a common encoding method. If you read later on, I will include a quick-fix…
Bank Shot: Inside The Vault
Introduction Thanks for visiting my blog today! To provide some context to today’s blog, I have another blog I have yet to release called “Bank Shot” which talks about NBA salaries. Over there, I talked a little bit about contracts and designing a model to predict NBA player salaries. However, that analysis was pretty short…