Part 0: Intro
- Time Series - Cheat Sheet v2.1 Time series analysis is one of the most challenging machine learning techniques. To start research on modelling time series with multi-agent techniques, it is essential to identify the different components of the time series.
- Classical time series methods (+cheat sheet) Autoregression (AR): The autoregression (AR) method models the next step in the sequence as a linear function of the observations at prior time steps. The method is suitable for univariate time series without trend and seasonal components.
- This cheat sheet shines with its complete section on time series and statistics. There are methods for calculating covariance, correlation, and regression here. So, if you are using pandas for some advanced statistics or any kind of scientific work, this is going to be your cheat sheet. Where to go from here?
This is where cheat sheets can be a useful tool to save you time and frustration. Below you are going to find some of the best cheat sheets I’ve found across the internet. The cheat sheets are by no means exhaustive, but they still cover many topics, such as the data.table library in R, scikit-learn and R Markdown.
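The autoregression method described above can be illustrated with its simplest case, AR(1), where the next value is a linear function of only the previous one. The following is a minimal sketch in plain Python, not a reference implementation: the function names `fit_ar1` and `forecast_next` and the noise-free toy series are assumptions made for illustration, and the fit is ordinary least squares.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    prev = series[:-1]   # observations at the prior time step
    curr = series[1:]    # observations to be predicted
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observation."""
    return c + phi * series[-1]


# Hypothetical noise-free series generated by x[t] = 2 + 0.5 * x[t-1]
series = [0.0]
for _ in range(10):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
next_value = forecast_next(series, c, phi)
```

Because the toy series has no noise, the fit recovers the generating coefficients. Real data would add a noise term and usually more than one lag.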
Why
Deep Learning is a powerful toolset, but it also involves a steep learning curve and a radical paradigm shift.
For those new to Deep Learning, there are many levers to learn and different approaches to try out. Even more frustratingly, designing deep learning architectures can be equal parts art and science, without some of the rigorous backing found in longer-studied linear models.
In this article, we’ll work through some of the basic principles of deep learning, by discussing the fundamental building blocks in this exciting field. Take a look at some of the primary ingredients of getting started below, and don’t forget to bookmark this page as your Deep Learning cheat sheet!
FAQ
What is a layer?
A layer is an atomic unit within a deep learning architecture. Networks are generally composed by adding successive layers.
What properties do all layers have?
Almost all layers will have:
- Weights (free parameters), which create a linear combination of the outputs from the previous layer.
- An activation, which allows for non-linearities
- A bias node, an equivalent to one incoming variable that is always set to 1
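The three properties listed above can be sketched as a single forward pass through a dense layer. This is a minimal illustration in plain Python, not how any particular library implements it; the name `dense_forward`, the weight layout, and the example numbers are assumptions.

```python
def relu(z):
    """Activation: introduces the non-linearity."""
    return max(0.0, z)


def dense_forward(inputs, weights, biases, activation=relu):
    """Compute activation(W x + b) for one layer.

    weights[j][i] connects input i to output unit j (assumed layout).
    """
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        # Linear combination of the previous layer's outputs
        z = sum(w * x for w, x in zip(unit_weights, inputs))
        # The bias behaves like a weight on an extra input fixed at 1
        z += bias * 1.0
        outputs.append(activation(z))
    return outputs


layer_out = dense_forward(
    inputs=[1.0, 2.0],
    weights=[[0.5, -1.0], [1.0, 1.0]],
    biases=[0.0, 0.5],
)
# layer_out == [0.0, 3.5]
```

The first unit's linear combination is negative (-1.5), so ReLU clips it to 0. The second unit passes through unchanged at 3.5.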
What changes between layer types?
There are many different layers for many different use cases. Different layers may allow for combining adjacent inputs (convolutional layers), or dealing with multiple timesteps in a single observation (RNN layers).
Difference between DL book and Keras Layers
Frustratingly, there is some inconsistency in how layers are referred to and utilized. The Deep Learning Book commonly refers to architectures (whole networks) rather than specific layers. For example, its discussion of a convolutional neural network treats the convolutional layer as a sub-component of the network.
1D vs 2D
Some layers have 1D and 2D varieties. A good rule of thumb is:
- 1D: Temporal (time series, text)
- 2D: Spatial (image)
Cheat sheet
Part 1: Standard layers
Input
- Simple pass through
- Needs to align w/ shape of upcoming layers
Embedding
- Categorical / text to vector
- Vector can be used with other (linear) algorithms
- Can use transfer learning / pre-trained embeddings (see example)
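An embedding can be thought of as a lookup table from each category or token to a vector. The sketch below shows that idea in plain Python; the names `build_embedding` and `embed` and the toy vocabulary are assumptions. In a trained network the table entries would be learned or loaded from pre-trained vectors rather than drawn at random.

```python
import random


def build_embedding(vocab, dim, seed=0):
    """Create a token -> row index map and a table of small random vectors."""
    rng = random.Random(seed)
    index = {token: i for i, token in enumerate(vocab)}
    table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in vocab]
    return index, table


def embed(tokens, index, table):
    """Replace each token with its vector from the table."""
    return [table[index[token]] for token in tokens]


index, table = build_embedding(["red", "green", "blue"], dim=4)
vectors = embed(["blue", "red", "blue"], index, table)
```

Repeated tokens map to the same vector, which is what lets downstream layers treat the category consistently wherever it appears.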
Dense layers
- Vanilla, default layer
- Many different activations
- Probably want to use ReLU activation
Dropout
- Helpful for regularization
- Generally should not be used after input layer
- Can select fraction of units (p) to be dropped
- Activations are rescaled (at train or test time, depending on the implementation), so the average activation is the same for both
- Nothing is dropped at test time
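The dropout behavior listed above can be sketched as follows. This is a minimal version of the common "inverted dropout" variant, which rescales at train time. The function name `dropout` and its parameters are assumptions for illustration, and libraries may differ in detail.

```python
import random


def dropout(values, p, training, rng=random):
    """Zero each value with probability p during training.

    Kept values are divided by (1 - p) so the expected value matches
    the untouched values that are used at test time.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # nothing is dropped at test time
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(42)
activations = [1.0] * 1000
train_out = dropout(activations, p=0.5, training=True, rng=rng)
test_out = dropout(activations, p=0.5, training=False)
```

With `p=0.5`, each training output is either 0.0 or 2.0, and the average stays close to the original 1.0.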
Part 2: Specialized layers
Convolutional layers
- Take a subset of input
- Create a linear combination of the elements in that subset
- Replace subset (multiple values) with the linear combination (single value)
- Weights for linear combination are learned
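The steps above can be sketched as a 1D convolution in plain Python. This is a simplified illustration: stride 1, no padding, and no bias or activation. The name `conv1d` is an assumption, and the kernel here is fixed by hand, whereas in a real layer its weights would be learned.

```python
def conv1d(inputs, kernel):
    """Slide the kernel over the input, one output value per window."""
    k = len(kernel)
    return [
        # Linear combination of the elements in this subset (window)
        sum(w * x for w, x in zip(kernel, inputs[start:start + k]))
        for start in range(len(inputs) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]
feature_map = conv1d(signal, kernel)
# feature_map == [-2.0, -2.0]
```

Each window of three inputs is replaced by a single value, so a length-4 input with a length-3 kernel yields two outputs.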
Time series & text layers
- Helpful when input has a specific order
- Time series (e.g. stock closing prices for 1 week)
- Text (e.g. words on a page, given in a certain order)
- Text data is generally preceded by an embedding layer
- Generally should be paired w/ RMSprop optimizer
Simple RNN
- Each time step is concatenated with the last time step's output
- This concatenated input is fed into a dense layer equivalent
- The output of the dense layer equivalent is this time step's output
- Generally, only the output from the last time step is used
- Special handling is needed for the first time step, which has no previous output
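The Simple RNN steps above can be sketched in plain Python. This is an illustrative version, not a library implementation. The name `simple_rnn`, the tanh activation, the weight layout, and the choice of a zero vector as the "previous output" for the first time step are all assumptions.

```python
import math


def simple_rnn(sequence, weights, biases):
    """Run a simple RNN over a sequence of input vectors.

    weights[j] holds one weight per entry of (input + previous output)
    for output unit j (assumed layout).
    """
    units = len(biases)
    state = [0.0] * units  # first time step: no previous output, use zeros
    outputs = []
    for x_t in sequence:
        # Concatenate this time step's input with the last output
        combined = list(x_t) + state
        # Dense layer equivalent applied to the concatenated input
        state = [
            math.tanh(sum(w * v for w, v in zip(row, combined)) + b)
            for row, b in zip(weights, biases)
        ]
        outputs.append(state)
    return outputs


sequence = [[1.0], [0.5], [-1.0]]             # 3 time steps, 1 feature each
weights = [[0.5, 0.1, 0.0], [-0.3, 0.0, 0.2]]  # 2 units, 1 input + 2 state
biases = [0.0, 0.1]
all_outputs = simple_rnn(sequence, weights, biases)
last_output = all_outputs[-1]  # usually only this is passed onward
```

Returning every time step's output keeps the sketch flexible. Taking `all_outputs[-1]` matches the common case where only the final output is used.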
LSTM
- Improvement on Simple RNN, with internal 'memory state'
- Helps avoid the issue of exploding / vanishing gradients
Utility layers
- There for utility use!
This post updates a previous very popular post 50+ Data Science, Machine Learning Cheat Sheets by Bhavya Geethika. If we missed some popular cheat sheets, add them in the comments below.
Cheatsheets on Python, R and Numpy, Scipy, Pandas
Data science is a multi-disciplinary field. Thus, there are thousands of packages and hundreds of programming functions out there in the data science world! An aspiring data enthusiast need not know them all. A cheat sheet or reference card is a compilation of the most-used commands to help you learn a language’s syntax at a faster rate. Here are the most important ones, captured in a few compact pages.
Mastering data science involves an understanding of statistics and mathematics, plus programming knowledge (especially in R, Python and SQL), and then combining all of these with business understanding and human instinct to derive the insights that drive decisions.
Here are the cheat sheets by category:
Cheat sheets for Python:
Python is a popular choice for beginners, yet still powerful enough to back some of the world’s most popular products and applications. Its design makes the programming experience feel almost as natural as writing in English. The Python basics and Python Debugger cheat sheets for beginners cover important syntax to get started. Community-provided libraries such as NumPy, SciPy, scikit-learn and pandas are heavily relied on, and the NumPy/SciPy/Pandas Cheat Sheet provides a quick refresher on these.
- Python Cheat Sheet by DaveChild via cheatography.com
- Python Basics Reference sheet via cogsci.rpi.edu
- OverAPI.com Python cheatsheet
- Python 3 Cheat Sheet by Laurent Pointal
Cheat sheets for R:
The R ecosystem has been expanding so much that a lot of referencing is needed. The R Reference Card covers most of the R world in a few pages. RStudio has also published a series of cheat sheets to make things easier for the R community. The data visualization with ggplot2 cheat sheet seems to be a favorite, as it helps when you are creating graphs of your results.
At cran.r-project.org:
At Rstudio.com:
- R markdown cheatsheet, part 2
Others:
- DataCamp’s Data Analysis the data.table way
Cheat sheets for MySQL & SQL:
For a data scientist, the basics of SQL are as important as any other language. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. SQL cheatsheets provide a 5-minute quick guide to learning it, and then you may explore Hive & MySQL!
- SQL for dummies cheat sheet
Cheat sheets for Spark, Scala, Java:
Apache Spark is an engine for large-scale data processing. For certain applications, such as iterative machine learning, Spark can be up to 100x faster than Hadoop (using MapReduce). The essentials of Apache Spark cheatsheet explains its place in the big data ecosystem, walks through setup and creation of a basic Spark application, and explains commonly used actions and operations.
- Dzone.com’s Apache Spark reference card
- DZone.com’s Scala reference card
- Openkd.info’s Scala on Spark cheat sheet
- Java cheat sheet at MIT.edu
- Cheat Sheets for Java at Princeton.edu
Cheat sheets for Hadoop & Hive:
Hadoop emerged as an untraditional tool to solve what was thought to be unsolvable by providing an open source software framework for the parallel processing of massive amounts of data. Explore the Hadoop cheatsheets to find useful commands for working with Hadoop on the command line. A combination of SQL & Hive functions is another one to check out.
Cheat sheets for web application framework Django:
Django is a free and open source web application framework, written in Python. If you are new to Django, you can go over these cheatsheets to pick up the key concepts quickly and then dive into each one at a deeper level.
- Django cheat sheet part 1, part 2, part 3, part 4
Cheat sheets for Machine learning:
We often find ourselves spending time wondering which algorithm is best, and then going back to our big books for reference! These cheat sheets give an idea of both the nature of your data and the problem you're working to address, and then suggest an algorithm for you to try.
- Machine Learning cheat sheet at scikit-learn.org
- Scikit-Learn Cheat Sheet: Python Machine Learning from yhat (added by GP)
- Patterns for Predictive Learning cheat sheet at Dzone.com
- Equations and tricks Machine Learning cheat sheet at Github.com
- Supervised learning superstitions cheatsheet at Github.com
Cheat sheets for Matlab/Octave
MATLAB (MATrix LABoratory) was developed by MathWorks in 1984. MATLAB has been the most popular language for numeric computation used in academia. It is suitable for tackling basically every possible science and engineering task, with several highly optimized toolboxes. MATLAB is not an open-source tool; however, there is a free alternative, GNU Octave, a re-implementation that follows the same syntactic rules, so most code is compatible with MATLAB.
Cheat sheets for Cross Reference between languages