Inquiries

Pokemon: EDA and Battle Modeling

Exploratory data analysis and models for determining which Pokémon tend to have the biggest pure advantages in battle when no items are in play. Thanks for visiting my blog today! Context: Pokémon is celebrating its 25th anniversary in 2021, and I think this is the perfect time to reflect on Pokémon from a statistical…

by josephcohen94 March 8, 2021 (updated March 12, 2021)

Introductory NLP Models

Building your first Natural Language Processing-based classification model. Thanks for stopping by today! Introduction: A popular and exciting application of data science programming is understanding language and the way people speak. Natural Language Processing (NLP) is a field within data science that examines all things language-related. This could be assigning a movie rating from 1 to 10…

by josephcohen94 March 5, 2021 (updated March 7, 2021)

Statistical Power, Part 2

Determining power in hypothesis testing for confidence in test results. Introduction: Thanks for visiting my blog. Today’s post concerns power in statistical tests. Power, simply put, is a measure of how reliable a statistical test is: the probability that the test detects an effect when one truly exists. One of the inputs we will see later on pertains to effect size, which I covered in a previous blog.…

by josephcohen94 February 8, 2021

Statistical Power, Part 1

An examination of the role effect size plays in whether one can trust their tests or not. Introduction: Thanks for visiting my blog today! Today’s blog concerns statistical power and the role it plays in hypothesis testing. In very simple terms, power is a number between 0% and 100% that tells us how much we…

by josephcohen94 December 24, 2020 (updated December 25, 2020)

Z Scores in Hypothesis Testing

Understanding the use of z scores in performing statistical inquiries. Introduction: Thanks for visiting my blog today! Today’s blog concerns hypothesis testing and the use of z scores to answer questions. We’ll talk about what a z score is, when it needs to be used, and how to interpret the results from a z score-related…

by josephcohen94 December 7, 2020 (updated December 11, 2020)

Was Gretzky Really THAT Great?

(I used to genuinely question Gretzky’s greatness, and the data helped me make up my mind.) A unique approach to evaluating NHL players by accounting for era effects. Introduction: Thank you for visiting my blog! Let’s get right into this blog, as there isn’t a whole lot to be said in the introduction other than that we are…

by josephcohen94 November 20, 2020 (updated December 25, 2020)

Decision Trees, Part 3

Understanding how pruning works in the context of decision trees. Introduction: Thank you for visiting my blog today! In previous posts, I discussed the decision tree model and how it mathematically moves along its branches to make decisions. A quick example of a decision tree is the following: if it rains, I don’t go…

by josephcohen94 November 6, 2020

Dealing With Imbalanced Datasets The Easy Way

Imposing data balance in order to have meaningful and accurate models. Introduction: Thanks for visiting my blog today! Today’s blog will discuss what to do with imbalanced data. Let me quickly explain what I’m talking about for all you non-data scientists. If I am screening people to see if they have a disease and I…

by josephcohen94 October 30, 2020 (updated November 24, 2020)

Abbey Code

A Picture Is Worth 1000 Words. In This Case, I Decided To Stick With 400. Introduction: Thanks for visiting my blog today! For those of you who may not know, I love music and also play a couple of instruments. One of my top 3 favorite bands of all time is The Beatles (#1 is Chumbawamba).…

by josephcohen94 October 2, 2020

Basic AutoGluon Models

Learning the basics of a useful AutoML library. Introduction: Thank you for visiting my blog today! Recently, I was introduced to AutoGluon, an interesting library geared toward building fast and accurate machine learning models. I don’t claim credit for any of the fancy backend code or functionality. However, I would like…

by josephcohen94 September 25, 2020 (updated September 29, 2020)

Decision Trees, Part 2

Using a sample data set to create and test a decision tree model. Introduction: Welcome to my blog, and thanks for dropping by! Today, I’m presenting the second installment of my decision tree blog series. I’m going to create and run a decision tree model using data I found on a public data repository.…

by josephcohen94 September 18, 2020

Decision Trees, Part 1

Understanding how to build a decision tree using statistical methods. Introduction: Thanks for visiting my blog today! Life is complex and largely guided by the decisions we make. Decisions are also complex and are usually the result of a cascade of other decisions and logic flowing through our heads. Just like we all make various…

by josephcohen94 September 10, 2020 (updated September 11, 2020)

Sink or Swim

Effectively Predicting the Outcome of a Shark Tank Pitch. Introduction: Thank you for visiting my blog today! Recently, during my quarantine, I have found myself watching a lot of Shark Tank. In case you have been living under a rock, Shark Tank is a thrilling (and often parodied) reality TV show (currently on CNBC) where hopeful…

by josephcohen94 September 4, 2020 (updated September 3, 2020)

Linear Regression, Part 3

Simple Multiple Linear Regression in Python. Introduction: Thank you for visiting my blog! Today, I’m presenting the third part of my series on linear regression. I originally intended to end on part 3, but I am considering continuing with further blogs. We’ll see what happens. In my first blog, I introduced the…

by josephcohen94 August 28, 2020

Linear Regression, Part 2

Building an Understanding of Gradient Descent Using Computer Programming. Introduction: Thank you for visiting my blog. Today’s blog is the second in a series I am doing on linear regression. If you are reading this blog, I hope you have a fundamental understanding of what linear regression is and what a linear regression model…

by josephcohen94 August 21, 2020

Linear Regression, Part 1

Acquiring a baseline understanding of the ideas and concepts that drive linear regression. Introduction: Thanks for visiting my blog! Today, I’d like to begin a multi-part blog series discussing linear regression. In part 1, I will go through the basic ideas and concepts that describe what a linear regression is at…

by josephcohen94 August 12, 2020 (updated August 14, 2020)

Machine Learning Metrics and Confusion Matrices

Understanding the Elements and Metrics Derived from Confusion Matrices when Evaluating Model Performance in Machine Learning Classification. Introduction: Thanks for visiting my blog. I hope my readers get the joke displayed above. If they don’t, and they’re also millennials, they missed out on some great childhood fun. What are confusion matrices? Why do they matter?…

by josephcohen94 August 2, 2020 (updated August 7, 2020)

Devin Booker vs. Gregg Popovich (vs. Patrick McCaw)

Developing a process to predict which NBA teams and players will end up in the playoffs. Introduction: Hello! Thanks for visiting my blog. After a summer of reminders of just how incredible Michael Jordan was as a competitor and leader, the NBA is finally making a comeback, and the [NBA] “bubble” is in full…

by josephcohen94 July 24, 2020 (updated July 26, 2020)

Sell Phones

Understanding the main influences and driving factors behind how a phone’s price is determined. Introduction: If you made it this far, it means you were not completely turned off by my terrible attempt at a pun in the title, and I thank you for that. This blog will probably be pretty quick.…

by josephcohen94 July 17, 2020

Exciting Graphs for Boring Maths

Understanding style options available in Seaborn that can enhance your explanatory graphs and other visuals. Introduction: Welcome to my blog! Today, I will be discussing ways to leverage Seaborn (which I will often abbreviate as ‘sns’ later on) to spice up Python visuals. Seaborn [and Matplotlib, abbreviated as ‘plt’] are powerful libraries that can…

by josephcohen94 July 10, 2020

Capstone Project at The Flatiron School

Introduction: Thank you for visiting my blog today. Today, I would like to discuss the capstone project I worked on as I graduated from the Flatiron School data science boot camp (Chicago campus). At the Flatiron School, I began to build a foundation in Python and its applications in data science. It was an…

by josephcohen94 July 3, 2020 (updated July 6, 2020)

Method to the Madness

Imposing Structure and Guidelines in Exploratory Data Analysis. Introduction: I like to start off my blogs with the inspiration that led me to write them, and this blog is no different. I recently spent time working on a project. Part of my general process when performing data science inquiries and building models is to…

by josephcohen94 June 26, 2020 (updated July 16, 2020)

Out Of Office

Predicting Absenteeism At Work. Introduction: It’s important to keep track of who does and does not show up to work when they are supposed to. I found some interesting data online that records how many hours of work, in a range of 0 to 40, an employee is expected to miss in a certain…

by josephcohen94 June 12, 2020

Feature Selection in Data Science

Introduction: Oftentimes, when working with data sets that have many features, reducing the number of features and simplifying your data can be helpful. One common point at which you remove a lot of features is when reducing correlation using a filter of 70% or so. (Keeping highly correlated variables often contributes to overfitting.) However, you can continue…

by josephcohen94 June 5, 2020 (updated June 8, 2020)

Feature Scaling In Machine Learning

Accounting for the Effect of Magnitude in Comparing Features and Building Predictive Models. Introduction: The inspiration for this blog post comes from some hypothesis testing I performed on a recent project. I needed to put all my data on the same scale in order to compare it. If I wanted to compare the population of…

by josephcohen94 May 6, 2020 (updated June 19, 2020)

Encoding Categorical Data

Introduction: (PLEASE READ: Later in this blog, I describe target encoding without naming it as such. I wrote this blog before I knew target encoding was a popular technique, and I am glad to have learned that it is a common encoding method. If you read on, I will include a quick-fix…

by josephcohen94 April 19, 2020 (updated December 3, 2020)

Bank Shot: Inside The Vault

Introduction: Thanks for visiting my blog today! To provide some context for today’s blog, I have another blog, not yet released, called “Bank Shot,” which talks about NBA salaries. In that post, I discuss contracts and a model to predict NBA player salaries. However, that analysis was pretty short…

by josephcohen94 April 7, 2020 (updated October 9, 2020)


Data Science and Beyond with Joseph Cohen
