my-masters-degree-in-statistics

A curated list of resources for my custom-designed Master's degree in Statistics with a concentration in Machine Learning at Louisiana State University.

Yeah, it's a lot! I wanted to learn a variety of interesting topics with connections to Machine Learning, so I ended up taking a ton of classes.

Contents



Theory

Linear Algebra

Probably one of the top 3 most relevant areas of study for statisticians. You could get away with little knowledge of Abstract Algebra, but you really need to master Linear Algebra. Go beyond the usual algebraic matrix calculations - try to get a deep, almost philosophical view of concepts like projectivity, spectral decomposition, sparse matrices, etc.
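
To make the spectral decomposition idea concrete, here is a minimal sketch, assuming NumPy is available. The matrix `A` is made up purely for illustration. It rebuilds a symmetric matrix from its eigenvalues and eigenvectors and shows that each eigenvector defines an orthogonal projection.

```python
import numpy as np

# Hypothetical symmetric matrix, chosen only for illustration.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is meant for symmetric (Hermitian) matrices: it returns real
# eigenvalues in ascending order and orthonormal eigenvectors as columns of Q.
eigenvalues, Q = np.linalg.eigh(A)

# Spectral decomposition: A = Q diag(lambda) Q^T.
A_rebuilt = Q @ np.diag(eigenvalues) @ Q.T

# Each outer product q_i q_i^T is an orthogonal projection onto the span
# of eigenvector q_i; together the projections sum to the identity.
projections = [np.outer(Q[:, i], Q[:, i]) for i in range(A.shape[0])]
identity_rebuilt = sum(projections)

print(np.allclose(A, A_rebuilt))
print(np.allclose(identity_rebuilt, np.eye(3)))
```

Reading `A` as a weighted sum of projections, rather than as a grid of numbers, is the kind of geometric view this section is encouraging.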

return to contents


Numerical Analysis

I HATED this subject as an undergrad - I don't recall why though :) Now, I can barely imagine computational statistics without some approximate or numerical solutions. Almost all statistical software provides numerical solutions for probability distributions, simulation and optimization problems. Here too, go beyond the usual undergrad numerical topics - at least try numerical implementations/solutions to eigen-problems, smoothing and optimization, splines, random number generation, and Monte Carlo simulations. This subject also provides good practice in computing and design in general.
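
As a small taste of Monte Carlo simulation, the sketch below estimates an integral by averaging the integrand at uniform random draws. The function name `mc_integrate`, the integrand and the sample size are my own choices for illustration. The exact value is available here through `math.erf`, which makes it easy to see how close the estimate gets.

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] by averaging f at n uniform draws."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n

rng = random.Random(0)  # fixed seed so the run is reproducible

# Integrand exp(-x^2) on [0, 1]; its exact integral is sqrt(pi)/2 * erf(1).
estimate = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0, 100_000, rng)
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)

print(estimate, exact)
```

The error shrinks at roughly the rate 1/sqrt(n), so four times as many draws buys only about twice the accuracy.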

return to contents


Partial Differential Equations

Personally, this is the most exciting field in applied math that I have encountered so far. The overarching theme is how quantities that vary over space and time relate to their rates of change. I took this class because of its deep connections with uncertainty quantification, optimization, network analysis, image analysis and stochastic differential equations (most notably, the Black-Scholes model in options pricing).
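
To illustrate the link between spatial curvature and the rate of change in time, here is a rough sketch of an explicit finite-difference (FTCS) scheme for the 1D heat equation u_t = alpha * u_xx with zero boundary values. The grid size, time step and initial condition are assumptions picked so that a closed-form solution exists for comparison. This is not taken from the class material.

```python
import math

def heat_ftcs(u0, alpha, dx, dt, steps):
    """Explicit (FTCS) scheme for u_t = alpha * u_xx with fixed end values."""
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError("unstable: need alpha * dt / dx^2 <= 0.5")
    u = list(u0)
    for _ in range(steps):
        new = u[:]  # end points are copied and never updated
        for i in range(1, len(u) - 1):
            # time change is proportional to the discrete second derivative in space
            new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        u = new
    return u

n = 51                      # number of grid points on [0, 1]
dx = 1.0 / (n - 1)
alpha = 1.0
dt = 0.4 * dx**2            # keeps r = 0.4, inside the stability limit
steps = 500
x = [i * dx for i in range(n)]
u0 = [math.sin(math.pi * xi) for xi in x]

u = heat_ftcs(u0, alpha, dx, dt, steps)

# For this initial condition the exact solution is exp(-alpha*pi^2*t) * sin(pi*x).
t = steps * dt
exact_mid = math.exp(-alpha * math.pi**2 * t)
print(u[n // 2], exact_mid)
```

The stability check matters: with a time step that is too large relative to the grid spacing, the explicit scheme blows up instead of diffusing.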

return to contents


Probability and Statistics

Well, this section is self-explanatory. I recommend you thoroughly understand the following concepts: Transformations, Independence and Copulas, Convergence, Sufficiency, Parameter Estimation and Inference, and the Exponential Family of Distributions. Dr. Luis Escobar of LSU has amazing lecture notes on these topics.
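
One classic example of a transformation is the inverse-CDF method: if U is Uniform(0, 1), then X = -ln(1 - U) / rate follows an Exponential(rate) distribution. The sketch below is only an illustration of that fact. The function name and the rate value are assumptions of mine, not something from the lecture notes.

```python
import math
import random

def sample_exponential(rate, n, rng):
    """Draw n Exponential(rate) variates via the inverse CDF: X = -ln(1 - U) / rate."""
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is defined.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

rng = random.Random(1)  # fixed seed for reproducibility
rate = 2.0              # hypothetical rate parameter
draws = sample_exponential(rate, 50_000, rng)

# The sample mean should be close to the theoretical mean 1 / rate.
sample_mean = sum(draws) / len(draws)
print(sample_mean, 1.0 / rate)
```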

return to contents



Advanced Theory

Here are the bitter pills that you just have to swallow :)

Analysis

Real Analysis

return to contents


Measure Theory

return to contents


Stochastic Processes



Convex Optimization

This module is a MUST. Enough said!

return to contents



Applied

Statistical Methods

This was mostly a slightly refined version of undergrad statistical methods with little bits of design of experiments. This class, I believe, was designed for grad students with little to no background in math/stats. For folks with some stats background, I highly recommend the resources below, especially Dr. Cosma Shalizi's book, which is, hands down, the best introductory stats book I have ever read.

return to contents


Statistical Computing

This module is almost as important as Linear Algebra.

Some of these are more 'pseudo' than actual statistical computing, but very useful content all the same.

return to contents


Generalized Linear Models

Regression Analysis

This stand-alone class on regression is one of the best classes I took at LSU. The professor, Dr. Brian Marx, effortlessly walks you through the rationale behind the nitty-gritties of regression and generalized modelling in general. You do not want to miss his classes. Alan Agresti's book below is invaluable!

return to contents


Categorical Data Analysis

Again, Dr. Brian D. Marx does a masterful job of teaching this class. This class introduces the concepts of Generalized Linear Modelling, with the latter half of the class focusing mainly on the analysis of categorical/discrete data.

return to contents


Generalized Nonlinear Models

I became interested in this field after my Fall 2019 summer internship.

My current reading material is Generalized Linear and Nonlinear Models for Correlated Data: Theory and Applications Using SAS.

return to contents


Time Series Analysis

Best taken after the modules on Generalized Linear Models. I found Dr. Robert Nau's notes on Time Series Analysis particularly useful. Dr. Shalizi does an equally good job.

return to contents

Resampling Methods for Time Series

return to contents

Time Series Anomaly Detection

Just follow this amazing group, the Event and Pattern Detection Laboratory at Carnegie Mellon University, and this NYU professor, Daniel B. Neill (his other link).

Below is a selection of the links most relevant to me from this exhaustive list.

return to contents

Some Anomaly Detection Software

| Name | Language | Pitch | License |
| --- | --- | --- | --- |
| Twitter's AnomalyDetection | R | AnomalyDetection is an open-source R package to detect anomalies which is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. | GPL |
| Linkedin's luminol | Python | Luminol is a light weight python library for time series data analysis. The two major functionalities it supports are anomaly detection and correlation. | Apache-2.0 |
| Donut | Python | Donut is an unsupervised anomaly detection algorithm for seasonal KPIs, based on Variational Autoencoders. | - |
| NASA's Telemanom | Python | A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions. | custom |
| banpei | Python | Outlier detection (Hotelling's theory) and Change point detection (Singular spectrum transformation) for time-series. | MIT |
| CAD | Python | Contextual Anomaly Detection for real-time AD on streaming data (winner algorithm of the 2016 NAB competition). | AGPL |

return to contents

Time Series Changepoint Models

This content is complementary to Time Series Anomaly Detection.

return to contents


Design of Experiments

Maybe the MVP in the entire master's program. With the current era of 'big data', there is an unnecessarily high focus on prediction, but prediction in itself is sometimes useless without control. Randomized experiments are one way to test competing ideas or solutions, and thus control predictions or outcomes. The professor of this course, Dr. David C. Blouin, is a seasoned expert in experimental design. I recommend taking this class after the linear modelling and/or data mining classes, because this class (DoE) introduces an almost artistic yet scientific paradigm to problem solving which you would not appreciate if you took it before those classes. This class is project-heavy - roughly 2 projects every week - which helps in honing this almost artistic skill.
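
To show how randomization lets you test competing treatments, here is a minimal sketch of a randomization (permutation) test for a difference in means. The outcome values are made up for illustration, and `randomization_test` is a hypothetical helper of mine, not something from the course.

```python
import random

def randomization_test(treated, control, n_perm, rng):
    """Two-sided randomization test for a difference in group means."""
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(n_perm):
        # Re-assign units to groups at random, as the experiment could have done.
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Made-up outcomes for illustration only.
treated = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
control = [11.2, 11.5, 10.9, 11.9, 11.1, 11.4]

observed, p_value = randomization_test(treated, control, 5000, random.Random(42))
print(observed, p_value)
```

The p-value is justified by the random assignment itself and needs no distributional assumption about the outcomes, which is a large part of why randomized designs give you control and not just prediction.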

Although not necessary, I recommend you take this class simultaneously with an introductory course in Causal Inference.

One critique of this class is that there is little emphasis on testable problem formulation - the statistician is relegated to only churning out design solutions (that is, model specification and analysis), and not appropriate problem formulation or experimentation. The book below offers approaches to optimal experimentation for a given problem or purpose.

return to contents



Advanced

Computational Linear Algebra

Basically, numerical methods for linear algebra.

return to contents


Advanced Statistical Computing

Although it is one of the most important skills a modern statistician needs to acquire, this field is barely teased in graduate statistics programs in the US. You can think of the techniques discussed in this module as machine translations or implementations of statistical algorithms. The module answers, or at least raises, questions like how computers solve statistical problems. It covers sampling and/or resampling techniques, the EM and other deterministic algorithms, the whole gamut of [Markov Chain] Monte Carlo algorithms including the rather famous Gibbs Sampler, and some popular machine learning optimization algorithms like Stochastic Gradient Descent. Concepts on convergence and numerical optimization are further explored. Ideas on cloud computing and parallel computing are also introduced.
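
As one concrete instance of the MCMC family mentioned above, here is a minimal sketch of a Gibbs sampler for a standard bivariate normal with correlation `rho`. The target distribution, the burn-in length and the function name are assumptions chosen because the full conditionals are simple: x given y is Normal(rho * y, 1 - rho^2), and symmetrically for y given x.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0                  # arbitrary starting point
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.gauss(rho * y, sd)   # draw x from its conditional given y
        y = rng.gauss(rho * x, sd)   # draw y from its conditional given x
        if i >= burn_in:             # discard the early, start-dependent draws
            samples.append((x, y))
    return samples

rho = 0.8
samples = gibbs_bivariate_normal(rho, 20_000, 1_000, random.Random(7))

# Check the sampler by estimating the correlation from the draws.
n = len(samples)
mean_x = sum(s[0] for s in samples) / n
mean_y = sum(s[1] for s in samples) / n
cov = sum((s[0] - mean_x) * (s[1] - mean_y) for s in samples) / n
var_x = sum((s[0] - mean_x) ** 2 for s in samples) / n
var_y = sum((s[1] - mean_y) ** 2 for s in samples) / n
corr = cov / math.sqrt(var_x * var_y)
print(corr)
```

Successive draws are correlated, so the effective sample size is smaller than the number of draws. This is one reason the convergence ideas in this module matter in practice.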

Best taken in parallel with the Bayesian Analysis and Advanced Statistical Modeling modules, and after the Generalized Linear Models modules. Having a good background in both Linear Algebra and Real Analysis will for sure make life a little easier in this module.

Computing for Data Science provides a practical introduction to modern techniques for computing with data, teaching advanced use of the R system and exploring connections to other environments such as C, Python, Java, and databases. It also provides hands-on practice with some computing paradigms, including parallel, cluster and map/reduce style computation.

return to contents


Bayesian Analysis

I struggled in this class partly because I had a terrible load of 18 credits that semester - which is A LOT for graduate school. I spent the subsequent semester vacation reviewing the material and I found it surprisingly coherent and accessible. I found reading materials from other schools and practicing the coding very useful.

I have come to like it so much that I am even considering a lifetime of research in Bayesian frameworks.

I would strongly recommend the following materials first before any advanced material:

As always, it is good to look up similar content from other schools or programs:

return to contents


Multivariate Statistics

I took this class after the GLM modules, which made this class at LSU somewhat redundant. I suggest you read the freely available NC State class on Nonlinear Models for Univariate and Multivariate Response as a supplement.

This course is vital if you are interested in any high-dimensional statistics or manifold learning in general, so I suggest you look elsewhere for relevant materials.

return to contents


Statistical Learning

With the recent 'Data Science' craze there is a litany of dumbed-down materials on SL. Although good for beginners, these are woefully lacking for statisticians hoping to engineer new methodologies, improve or critique current techniques.

Dive into these as much as you can - they are the best materials on Introduction to Statistical Machine Learning out there. Take particular note of concepts regarding RE-presentation of data, and projectivity.

return to contents


Deep Learning

Stanford University has made this topic and all its allied topics amazingly accessible, and freely available online. Just bury yourself in these 4 classes on Deep Learning:

I found these other materials also useful:

return to contents


Network Analysis and Causal Inference

Discovering this area of statistics has been more than fulfilling - it bridges several disciplines like Philosophy (although more subtly), Behavioral Psychology and Economics, Geometric Topology and Probabilistic Modelling. I have dedicated most of my weekends to perusing as much material on these topics as I can.

My current reading materials or courses are based on the following:

return to contents


Advanced Statistical Modelling

This module revisits Generalized Linear Modelling and Statistical Machine Learning, emphasizing model-building skill and intuition with relatively advanced statistical rigor and computation. Empirical model selection techniques (e.g. basis degree, effective degrees of freedom) and regularization, and extensions to linear regression via basis functions and kernels are further explored. The relationship between randomness and determinism is also further probed.
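
To tie basis functions, regularization and effective degrees of freedom together, here is a small sketch assuming NumPy is available. It uses a polynomial basis and a plain ridge penalty, both chosen by me for simplicity and not taken from the module, and computes the effective degrees of freedom as the trace of the hat matrix.

```python
import numpy as np

def ridge_effective_df(X, lam):
    """Effective degrees of freedom of ridge regression: trace of the hat matrix."""
    p = X.shape[1]
    # Hat matrix H = X (X^T X + lam * I)^{-1} X^T maps observed y to fitted y.
    # Note: this simple version penalizes every column, including the intercept.
    hat = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.trace(hat)

# Degree-5 polynomial basis on an evenly spaced grid (columns 1, x, ..., x^5).
x = np.linspace(-1.0, 1.0, 50)
X = np.vander(x, N=6, increasing=True)

# Hypothetical penalty values, for illustration only.
df_values = {lam: ridge_effective_df(X, lam) for lam in (0.0, 0.1, 1.0, 10.0)}
for lam, df in df_values.items():
    print(lam, df)
```

With no penalty, the effective degrees of freedom equal the number of basis columns. As the penalty grows, they shrink, which gives a continuous way to compare model complexity across fits.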

return to contents


Probabilistic Graphical Models

return to contents


Longitudinal Data Analysis

I took this class simultaneously with Dr. Kevin S. McCarter's Multivariate Analysis.

return to contents


Survival and Reliability Analysis

Spontaneous and somewhat hobby-like reading on:

I have also been trying to learn how to model survival data with Deep Learning algorithms by reading some of the papers from the link below.

return to contents


Databases

return to contents



Potential PhD Research Areas

soon...


Personal and Practicum Consulting Projects

soon...


MOOCs

List incomplete...

return to contents


People and Programs

soon...


Miscs

soon...