Clustering on the World Happiness Report 2019
Attempting to quantify happiness. Building clustering models on the 2019 World Happiness Report.
Tags: AffinityPropagation, Agglomerative, clustering, GMM, happiness, KMeans, python
What to do when things go too well. Building and comparing XGBoost and Random Forest models on the Agaricus dataset (Mushroom Database).
Tags: agaricus, LIME, python, SHAP, synthetic
2016 Kaggle Caravan Insurance Challenge (Part 2 of 2). Dimensionality reduction and feature analysis.
Tags: dimensionalityReduction, featureImportance, featureSelection, PCA, python, RFE, t-SNE, UMAP, unbalanced
2016 Kaggle Caravan Insurance Challenge (Part 1 of 2). Dealing with imbalanced data.
Tags: Bagging, Boosting, imbalanced, oversampling, python, RandomForest, SMOTE, undersampling
Getting started with modeling. Multiple approaches to Multiple Linear Regression using the US DoT airfare dataset.
Tags: airfare, linear, python, regression
Analyzing a mock voters dataset using ANOVA, t-tests, and Tukey’s Range Test.
Tags: ANOVA, python, statistics, ttests, voting
Plotting a few common statistical functions: PDF, CDF, and iCDF.
Tags: functions, plotting, probability, python, statistics
A brief introduction to data analysis with Python using the Fortune 500 dataset.
Tags: EDA, fortune500, introduction, python
Getting started with modeling. Multiple approaches to Multiple Linear Regression using the classic Boston Housing dataset.
Tags: airfare, linear, regression, r
Analyzing the classic sleep dataset using two-sample and paired t-tests, and calculating statistical power.
Tags: power, r, sleep, statistics, ttests
Plotting a few common statistical functions: PDF, CDF, and iCDF.
Tags: functions, plotting, probability, r, statistics
What to do when things go too well. Building and comparing XGBoost and Random Forest models on the Agaricus dataset (Mushroom Database).
Tags: agaricus, RandomForest, r, synthetic, XGBoost
Attempting to quantify happiness. Building clustering models on the 2016 World Happiness Report.
Tags: AffinityPropagation, Agglomerative, clustering, GMM, happiness, KMeans, r
2016 Kaggle Caravan Insurance Challenge (Part 2 of 2). Dimensionality reduction and feature analysis.
Tags: dimensionalityReduction, featureSelection, PCA, RFE, r, t-SNE, UMAP, unbalanced
2016 Kaggle Caravan Insurance Challenge (Part 1 of 2). Dealing with imbalanced data.
Tags: Bagging, Boosting, imbalanced, oversampling, RandomForest, r, SMOTE, undersampling
A brief introduction to data analysis with R using the Fortune 500 dataset.
Tags: EDA, fortune500, introduction, r