Ryan Gust

Developer, LLM & Generative AI Aficionado, Data Science Practitioner

Notebooks

python

Clustering on the World Happiness Report 2019

35 minute read

Attempting to quantify happiness. Building clustering models on the 2019 World Happiness Report.

Tags: AffinityPropagation, Agglomerative, clustering, GMM, happiness, KMeans, python

Model Interpretability with XGBoost and the Agaricus Dataset

43 minute read

What to do when things go too well. Building and comparing XGBoost and Random Forest models on the Agaricus dataset (Mushroom Database).

Tags: agaricus, LIME, python, SHAP, synthetic

Dimensionality Reduction and Feature Analysis

25 minute read

2016 Kaggle Caravan Insurance Challenge (Part 2 of 2). Dimensionality reduction and feature analysis.

Tags: dimensionalityReduction, featureImportance, featureSelection, PCA, python, RFE, t-SNE, UMAP, unbalanced

Modeling on Imbalanced Data: Caravan Insurance

16 minute read

2016 Kaggle Caravan Insurance Challenge (Part 1 of 2). Dealing with imbalanced data.

Tags: Bagging, Boosting, imbalanced, oversampling, python, RandomForest, SMOTE, undersampling

Multiple Linear Regression

10 minute read

Getting started with modeling. Multiple approaches to Multiple Linear Regression using the US DoT airfare dataset.

Tags: airfare, linear, python, regression

T-tests and Analysis of Variance (ANOVA)

4 minute read

Analyzing a mock voters dataset using ANOVA, t-tests, and Tukey’s Range Test.

Tags: ANOVA, python, statistics, ttests, voting

Plotting Distributions

2 minute read

Plotting a few common statistical functions, namely PDF, CDF, and iCDF.

Tags: functions, plotting, probability, python, statistics

Introduction: Fortune 500 Companies

6 minute read

A brief introduction to data analysis with Python using the Fortune 500 dataset.

Tags: EDA, fortune500, introduction, python

r

Multiple Linear Regression

20 minute read

Getting started with modeling. Multiple approaches to Multiple Linear Regression using the classic Boston Housing dataset.

Tags: airfare, linear, regression, r

Statistical Power

5 minute read

Analyzing the classic sleep dataset using two-sample and paired t-tests, and calculating statistical power.

Tags: power, r, sleep, statistics, ttests

Plotting Distributions

4 minute read

Plotting a few common statistical functions, namely PDF, CDF, and iCDF.

Tags: functions, plotting, probability, r, statistics

Gradient Boosting with XGBoost and the Agaricus Dataset

10 minute read

What to do when things go too well. Building and comparing XGBoost and Random Forest models on the Agaricus dataset (Mushroom Database).

Tags: agaricus, RandomForest, r, synthetic, XGBoost

Clustering on the World Happiness Report 2016

9 minute read

Attempting to quantify happiness. Building clustering models on the 2016 World Happiness Report.

Tags: AffinityPropagation, Agglomerative, clustering, GMM, happiness, KMeans, r

Dimensionality Reduction and Feature Analysis

26 minute read

2016 Kaggle Caravan Insurance Challenge (Part 2 of 2). Dimensionality reduction and feature analysis.

Tags: dimensionalityReduction, featureSelection, PCA, RFE, r, t-SNE, UMAP, unbalanced

Multi-Model Approach to Imbalanced Data with Caravan Dataset

12 minute read

2016 Kaggle Caravan Insurance Challenge (Part 1 of 2). Dealing with imbalanced data.

Tags: Bagging, Boosting, imbalanced, oversampling, RandomForest, r, SMOTE, undersampling

Introduction: Fortune 500 Companies R

6 minute read

A brief introduction to data analysis with R using the Fortune 500 dataset.

Tags: EDA, fortune500, introduction, r