Plotting Distributions

Overview

This notebook demonstrates how to plot a few staple statistical functions with Python.

For a normal distribution and Student's T distributions (degrees of freedom = 4, 8, 12, 30), the following will be plotted:

  • Probability Density Function (PDF) - A function of a continuous random variable, whose integral across an interval denotes the probability that the variable's value lies within the same interval.
  • Cumulative Distribution Function (CDF) - A function whose value is the probability that a corresponding continuous random variable has a value less than or equal to the function's argument.
  • Quantile Function/Inverse Cumulative Distribution Function - A function that determines the value of the variable associated with a specific probability, such that the probability of the variable being less than or equal to that value equals the given probability.
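
The three definitions above can be checked numerically. The following is a minimal sketch, assuming a standard normal distribution and an illustrative interval [-1, 1] (neither the interval nor the variable names come from the notebook itself): integrating the PDF over the interval should match the difference of two CDF values, and the quantile function should undo the CDF.

```python
import scipy.stats as ss
from scipy.integrate import quad

a, b = -1.0, 1.0  # illustrative interval, in standard deviations (an assumption)

# PDF: integrating the density over [a, b] gives P(a <= X <= b)
area, _ = quad(ss.norm.pdf, a, b)

# CDF: the same probability is the difference of two CDF values
cdf_diff = ss.norm.cdf(b) - ss.norm.cdf(a)

# Quantile function: ppf inverts cdf, so ppf(cdf(x)) recovers x
x = 1.5
x_back = ss.norm.ppf(ss.norm.cdf(x))

print(f"Integral of PDF over [{a}, {b}]: {area:.6f}")
print(f"CDF(b) - CDF(a):                {cdf_diff:.6f}")
print(f"ppf(cdf({x})) = {x_back:.6f}")
```

The first two printed values should agree to numerical precision, which is the sense in which the CDF is the running integral of the PDF.
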
In [5]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as ss

Data

The data used for this notebook is generated with numpy's linspace function on a symmetric interval from -e to e. Read as z-scores, this interval covers ~99.34% of a normal distribution, leaving ~0.33% in each tail (so the upper endpoint sits near the 99.67th percentile).

The variable naming convention uses x_norm and x_t for input values and y_norm_<func> and y_t_<func> for function output values.

In [6]:
mu = 0; sigma = 1
x_norm = np.linspace(-np.e, np.e,100) # Very close to np.linspace(ss.norm.ppf(0.0033),ss.norm.ppf(0.9967), 100) 
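
As a quick sanity check of the interval choice, the sketch below computes how much of a standard normal distribution lies between -e and e, and where the 0.33% quantile falls. The variable names are illustrative and not part of the notebook.

```python
import numpy as np
import scipy.stats as ss

upper_tail = ss.norm.sf(np.e)   # P(Z > e); sf is the survival function, 1 - cdf
central = 1 - 2 * upper_tail    # P(-e <= Z <= e), using symmetry of the normal
z_low = ss.norm.ppf(0.0033)     # z-score that leaves 0.33% in the lower tail

print(f"Mass in each tail beyond |e|: {upper_tail:.4%}")
print(f"Mass inside [-e, e]:          {central:.4%}")
print(f"ppf(0.0033) = {z_low:.4f} versus -e = {-np.e:.4f}")
```
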

Exploratory Data Analysis

This demonstration uses an artificially generated dataset, so no preprocessing is required.

Normal distribution

In [7]:
y_norm_pdf = ss.norm.pdf(x_norm, loc=mu, scale=sigma)
y_norm_cdf = ss.norm.cdf(x_norm, loc=mu, scale=sigma)
p_norm = np.linspace(0.0033, 0.9967, 100) # Probabilities; ppf expects inputs in [0, 1], not z-scores
y_norm_icdf= ss.norm.ppf(p_norm, loc=mu, scale=sigma)
In [4]:
fig, axes = plt.subplots(1, 3, figsize=(14,4))
ax1, ax2, ax3 = axes

ax1.plot(x_norm, y_norm_pdf, 'r-', lw=2, alpha=0.6, label='PDF')
ax1.set_xlabel('x (in SDs)')
ax1.set_ylabel('Probability density')

ax2.plot(x_norm, y_norm_cdf, 'b-', lw=2, alpha=0.6, label='CDF')
ax2.set_xlabel('x (in SDs)')
ax2.set_ylabel('Probability')

ax3.plot(p_norm, y_norm_icdf, 'g-', lw=2, alpha=0.6, label='$CDF^{-1}$') # Probabilities on the x-axis
ax3.set_xlabel('Probability')
ax3.set_ylabel('Quantile')

for ax in axes:
    ax.set_title(r'Normal: $\mu$=%.1f, $\sigma^2$=%.1f' % (mu, sigma**2)) # Raw string for LaTeX; the title reports the variance
    ax.legend(loc='upper left', frameon=False)
plt.tight_layout()
plt.show()
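
One detail worth keeping in mind when plotting a quantile function: scipy's ppf takes probabilities, so its argument must lie in [0, 1]. The sketch below uses a few illustrative inputs (not values from the notebook) to show what happens at and beyond the edges of that range.

```python
import numpy as np
import scipy.stats as ss

# Illustrative inputs: two invalid "probabilities", the two endpoints, and the median
p = np.array([-0.5, 0.0, 0.5, 1.0, 1.5])
q = ss.norm.ppf(p)

for p_i, q_i in zip(p, q):
    print(f"ppf({p_i:+.1f}) = {q_i}")
```

Inputs outside [0, 1] come back as nan, and the endpoints 0 and 1 map to -inf and +inf. This is why a quantile plot is usually drawn over a probability grid that stays strictly inside (0, 1).
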

Student's T Distribution

In [70]:
fig, axes = plt.subplots(1, 3, figsize=(15,4))
ax1, ax2, ax3 = axes

ax1c = iter(plt.cm.autumn(np.linspace(0,1,4)[::-1]))
ax2c = iter(plt.cm.winter(np.linspace(0,1,4)[::-1]))
ax3c = iter(plt.cm.summer(np.linspace(0,1,4)[::-1]))

p_t = np.linspace(0.0033, 0.9967, 100) # Probabilities for the quantile function; ppf expects inputs in [0, 1]

for df in [4,8,12,30]:
    x_t = np.linspace(ss.t.ppf(0.0033, df), ss.t.ppf(0.9967, df), 100) # Set close to norm for consistency

    ax1.plot(x_t, ss.t.pdf(x_t, df), c=next(ax1c), lw=2, alpha=0.6, label=f'PDF: {df} DoF')
    ax2.plot(x_t, ss.t.cdf(x_t, df), c=next(ax2c), lw=2, alpha=0.6, label=f'CDF: {df} DoF')
    ax3.plot(p_t, ss.t.ppf(p_t, df), c=next(ax3c), lw=2, alpha=0.6, label='$CDF^{-1}$: %d DoF' % (df))

ax1.plot(x_norm, y_norm_pdf, '-.',color='black', lw=2, alpha=0.6, label='PDF: norm')
ax1.update({'xlabel':'x (in SDs)', 'ylabel':'Probability density', 'title':"Student's T: Probability Density"})

ax2.plot(x_norm, y_norm_cdf, '-.',color='black', lw=2, alpha=0.6, label='CDF: norm')
ax2.update({'xlabel':'x (in SDs)', 'ylabel':'Probability', 'title':"Student's T: Cumulative Distribution"})

ax3.plot(y_norm_cdf, x_norm,'-.',color='black', lw=2, alpha=0.6, label='$CDF^{-1}$: norm') # (CDF value, x) pairs trace the normal quantile function
ax3.update({'xlabel':'Probability', 'ylabel':'Quantile', 'title':"Student's T: Inverse CDF"})

for a in axes:
    a.legend()
plt.tight_layout()
plt.show()

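As the degrees of freedom increase, each Student's T curve moves closer to the corresponding normal curve: the heavier tails shrink, and in the limit of infinite degrees of freedom the T distribution coincides with the standard normal. In statistics, degrees of freedom are the number of independent values that remain free to vary when a quantity is estimated; for a t-statistic computed from a single sample of n observations, there are n - 1.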
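
The convergence toward the normal distribution can also be quantified rather than judged by eye. The sketch below is one possible check, assuming an illustrative grid from -4 to 4 and the 97.5% quantile as the comparison point (both are choices made here, not taken from the notebook).

```python
import numpy as np
import scipy.stats as ss

x = np.linspace(-4, 4, 401)   # illustrative evaluation grid (an assumption)
dofs = [4, 8, 12, 30]         # same degrees of freedom as plotted above

# Largest pointwise gap between each T density and the standard normal density
max_pdf_gap = [np.max(np.abs(ss.t.pdf(x, df) - ss.norm.pdf(x))) for df in dofs]

# 97.5% quantile: heavier tails push it further out than the normal value
q975 = [ss.t.ppf(0.975, df) for df in dofs]
z975 = ss.norm.ppf(0.975)

for df, gap, q in zip(dofs, max_pdf_gap, q975):
    print(f"DoF={df:2d}  max |pdf_t - pdf_norm| = {gap:.4f}  t quantile(0.975) = {q:.3f}")
print(f"normal quantile(0.975) = {z975:.3f}")
```

Both the density gap and the quantile should shrink toward the normal values as the degrees of freedom grow.
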

Conclusions

This notebook demonstrates the calculation and plotting of three fundamental statistical functions:

  • Probability Density Function
  • Cumulative Distribution Function
  • Quantile Function/Inverse Cumulative Distribution Function

for a normal distribution and Student's T distributions with varying degrees of freedom.

Future work

Future work could involve using an actual dataset rather than a contrived one to look for interesting real-world insights, exploring additional distributions such as Weibull, Gamma, Beta, or Chi-Square, and applying other statistical functions such as the survival function and the moment generating function.