
Data Science, Machine Learning & Artificial Intelligence
How to learn "Machine Learning" and "Artificial Intelligence"
updated September 10, 2018
All my slides are here:
https://goo.gl/3v8DAS
I have uploaded some stuff here:
http://myhash.com/ai/
===========================================
Unix:
You need basic working knowledge of Unix / Linux.
 buy yourself a MacBook and learn to work from the terminal
 book - "Learning the UNIX Operating System" (O'Reilly)
 tutorials on youtube
grep, find, set vs env, running script vs sourcing script
 vi editor:
 http://www.levselector.com/vi.html
 tutorials on youtube
===========================================
Python:
anaconda: https://www.anaconda.com/download/
ipython,
ipython notebook (Jupyter)
python versions 2.7 vs 3.x
numpy, pandas DataFrame, pd.read_csv(file), df.to_csv(file)
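A tiny pandas/numpy sketch of the read_csv / DataFrame / to_csv round trip (the file name "example.csv" is just an example):

import numpy as np
import pandas as pd

# build a small DataFrame from a numpy array and save / reload it
df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
df.to_csv("example.csv", index=False)   # write to a CSV file

df2 = pd.read_csv("example.csv")        # read it back into a DataFrame
print(df2.head())                       # first rows
print(df2.describe())                   # quick summary statistics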
===========================================
Math:
 calculus:
e = 2.71828... = lim( (1 + 1/n)**n ) as n → infinity
derivatives and integrals,
Dirac Delta Function,
multivariate calculus, partial derivatives,
gradient,
 linear algebra:
vector, matrix, dot product of vectors and matrices,
vector spaces, base vectors, matrices of space transformations,
determinant, rank,
eigenvectors, eigenvalues,
inverse matrix,
tensor
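A quick numpy sketch of two of these ideas - approximating e from its limit definition and finding eigenvalues/eigenvectors (the matrix A is an arbitrary example):

import numpy as np

# e as the limit of (1 + 1/n)**n for large n
n = 1_000_000
print((1 + 1/n) ** n)          # ~2.71828, compare with np.e

# eigenvalues / eigenvectors and inverse of a small example matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                 # [3. 1.]
print(eigvecs)                 # columns are the eigenvectors
print(np.linalg.inv(A) @ A)    # inverse matrix check: gives the identity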
===========================================
Probability & Statistics:
 probability:
definition,
combinatorics, subsets, C(n,m) = n!/(m!*(n-m)!)
Venn Diagrams,
Conditional Probability, Bayes Theorem,
random variable, probability distribution, expected value,
variance, standard deviation,
discrete and continuous cases,
PDF (Probability Density Function) vs CDF (Cumulative Distribution Function),
Binomial Distribution,
Poisson and Exponential Distributions,
Uniform Distribution,
Central Limit Theorem and Normal (Gaussian) Distribution
density ~ exp(-x^2/2)
 statistics:
Sample mean (average), Sample Standard Deviation (why /(N-1) ?), Median
Linear Regression - draw a line through points,
OLS = Ordinary Least Squares (minimizing quadratic error),
R2 = Coefficient of Determination
Confidence interval, z-score
not needed: Student's t-distribution, Chi-squared distribution, F-distribution, Gamma distribution
Stochastic Processes, Time Series Analysis,
Random Walk, Brownian motion, Diffusion,
Poisson Process/distribution, exponential distribution
white noise, gaussian noise
Markov Process
Monte Carlo method, MCMC (Markov Chain Monte Carlo)
Correlation function, Autocorrelation
Fourier Analysis, filtering to reduce noise
Extracting Signal from Noise by synchronization (S/N improves ~sqrt(N)),
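A small numpy sketch of two of these points - the Central Limit Theorem (means of uniform samples look Gaussian) and the /(N-1) in the sample standard deviation (sample sizes are arbitrary):

import numpy as np

np.random.seed(0)

# Central Limit Theorem: means of many uniform samples are ~normally distributed
means = np.random.uniform(0, 1, size=(10000, 30)).mean(axis=1)
print(means.mean(), means.std())     # ~0.5 and ~sqrt(1/12/30) ~ 0.053

# sample standard deviation: ddof=1 gives the /(N-1) (unbiased) version
x = np.random.normal(loc=10, scale=2, size=50)
print(np.std(x, ddof=0), np.std(x, ddof=1))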
===========================================
You need to learn the meaning of these words:
Data Science (DS) - use computers to process data from different sources
(CSV files, Databases), apply statistics, make graphs
Machine Learning (ML) - subset of DS to extract patterns from data
to do predictions.
ML Example - Linear Regression (draw a straight line through points)
Input data - array of points: data = [(x0,y0), (x1,y1), ...]
model function:
def linear_regression(x, a, b):
    y = a*x + b
    return y
training of the model: find the two numbers (slope a and intercept b)
which give the best fit between the model and the data
(minimize the error)
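A minimal numpy sketch of this training step using ordinary least squares (the data points are made up):

import numpy as np

def linear_regression(x, a, b):
    return a * x + b

# toy data: roughly y = 2x + 1 plus noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# "training": find slope a and intercept b that minimize the squared error (OLS)
a, b = np.polyfit(x, y, deg=1)
print(a, b)                          # roughly 2 and 1
print(linear_regression(x, a, b))    # model predictions for the training points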
ML - many types of models (linear regression and logistic regression,
support vector machines, K-nearest neighbors, K-means clustering,
decision trees (Random Forest, XGBoost, etc.), Neural Networks, etc.)
Deep Learning (DL) - ML implemented using multi-layered structures (networks)
Artificial Intelligence (AI) - DL algorithms trained to perform functions
which are usually associated only with humans
(vision, speech comprehension, autonomous driving, etc.)
data cleaning, scaling, normalization
synthetic data augmentation
highly unbalanced data (minority & majority class, imputing data in the minority class)
sparse matrix, sparse matrix data representation (Yale format)
 https://en.wikipedia.org/wiki/Sparse_matrix
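A small scipy sketch of a sparse representation (CSR format, closely related to the Yale format); the matrix is an arbitrary example:

import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 0]])
sparse = csr_matrix(dense)
print(sparse.data)       # [3 4 5] - only the non-zero values are stored
print(sparse.indices)    # column index of each stored value
print(sparse.indptr)     # row pointer array
print(sparse.toarray())  # back to a dense array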
feature engineering (extraction/preparation)
Dimensionality Reduction,


PCA (Principal Components Analysis) 

LDA (Linear Discriminant Analysis) 
statistical method to find a linear combination of features to achieve separation of two or more classes.
Used for dimensionality reduction before classification. Similar to PCA (Principal Component Analysis).
Note: LDA also stands for Latent Dirichlet Allocation - a generative probabilistic model (to find topics in texts).
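A minimal sklearn sketch of PCA for dimensionality reduction (the random data is just for illustration):

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
X = np.random.rand(100, 5)               # 100 samples, 5 features

pca = PCA(n_components=2)                # keep the 2 strongest components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)     # fraction of variance kept by each component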
Regression 
regress vs progress; simplify (for example, from 100 (x,y) pairs to 2 numbers (slope, intercept))
Linear regression 
OLS = Ordinary Least Squares 
Logistic regression 
classification:
1 var to 1 binary, multivariate to 1 binary, or
multivariate to several classes (multinomial)
We model the log-odds (logit)
as a linear combination of some predictors:
logit(p) = log(p/(1-p)) = b0 + b1*x1 + b2*x2 + ... + bN*xN
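A minimal sklearn sketch of logistic regression as a binary classifier (the toy data is made up):

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data: one feature, binary label that flips around x ~ 2.5
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)
print(clf.coef_, clf.intercept_)      # the b1 and b0 of the logit formula above
print(clf.predict([[2.2], [3.8]]))    # predicted classes
print(clf.predict_proba([[2.2]]))     # probability for each class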
Propensity 

Bayesian theorem / approach 

K Nearest Neighbors (KNN) - tuning "K"

K-means clustering

SVM = Support Vector Machines 

Decision Tree 
Ensemble Methods, bagging, boosting, Random Forest, XGBoost 
softmax
training and test data, overfitting
Regularization (ridge regression, LASSO)
dropout, adding noise
bias-variance tradeoff
Determining feature importance
Supervised vs unsupervised Machine Learning
Stochastic Gradient Descent
NLP = Natural Language Processing
sentiment analysis
text as a 'bag of words'
TF-IDF = Term frequency–inverse document frequency
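A minimal sklearn sketch of the bag-of-words / TF-IDF idea (the three tiny documents are made up):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)         # sparse matrix: documents x vocabulary
print(vectorizer.get_feature_names_out())  # vocabulary (older sklearn: get_feature_names())
print(X.shape)                             # (3, vocabulary size)
print(X.toarray().round(2))                # TF-IDF weight of each word in each document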

Classifier
Outlier/anomaly Detection
ROC curve = Receiver Operating Characteristic curve
True Positive Rate vs False Positive Rate (TPR vs FPR)
Precision P = TP/(TP+FP)
Recall = TPR = TP/(TP+FN)
F1 score = 2/(1/Recall + 1/Precision) (harmonic mean of precision and recall)
confusion matrix = actual (0,1) vs predicted (0,1)
+------------+--------------+---------------+
|            | predicted No | predicted Yes |
+------------+--------------+---------------+
| Actual No  |   TN = 50    |   FP = 10     |
| Actual Yes |   FN = 5     |   TP = 100    |
+------------+--------------+---------------+
where
TP, TN, FP, FN = True/False Positive/Negative
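A minimal sklearn sketch computing these metrics from actual vs predicted labels (the label arrays are made up):

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]    # actual labels
y_pred = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]    # model predictions

print(confusion_matrix(y_true, y_pred))    # rows: actual, columns: predicted
print(precision_score(y_true, y_pred))     # TP / (TP + FP)
print(recall_score(y_true, y_pred))        # TP / (TP + FN) = TPR
print(f1_score(y_true, y_pred))            # harmonic mean of precision and recall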
ML/DL libraries and tools:
Scikit-learn (sklearn) - http://scikit-learn.org/stable/
TF = TensorFlow - https://www.tensorflow.org/
PyTorch - https://pytorch.org/
Keras - https://keras.io/
XGBoost - https://xgboost.ai/
MXNet (Apache DL library) - https://mxnet.apache.org/
fastText - https://fasttext.cc/
NLTK - Natural Language Toolkit - https://www.nltk.org/
CNTK = Microsoft Cognitive Toolkit - https://cntk.ai/
H2O - https://www.h2o.ai/ - parallel training and execution on cloud
SageMaker - https://aws.amazon.com/sagemaker/ - on Amazon Cloud
Google AI - https://cloud.google.com/products/ai/ - Cloud AutoML, Cloud Machine Learning Engine, Cloud TPUs, BigQuery ML, ...
Microsoft Azure ML - https://azure.microsoft.com/en-us/overview/machine-learning/

Daily time series - detecting seasonality with Fourier Transforms
removing trend
Moving-Window Averages
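A minimal pandas sketch of a moving-window (rolling) average on a daily time series (the synthetic series is made up):

import numpy as np
import pandas as pd

# synthetic daily series: weekly cycle + slow trend + noise
dates = pd.date_range("2018-01-01", periods=120, freq="D")
values = (np.sin(2 * np.pi * np.arange(120) / 7)   # weekly seasonality
          + 0.05 * np.arange(120)                  # trend
          + np.random.normal(0, 0.3, 120))         # noise
ts = pd.Series(values, index=dates)

smoothed = ts.rolling(window=7).mean()   # 7-day moving-window average
detrended = ts - smoothed                # crude removal of trend/seasonality
print(smoothed.tail())
print(detrended.tail())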
Recommendation Engine
Collaborative Filtering
Neural Networks (NN)
perceptron, multilayer perceptron
The XOR problem, hidden layers, nonlinearity, ReLU
Feed-Forward Network
nodes, connections, weights, biases, activations
learning as optimization problem
cost function, objective function, loss function, regret
backpropagation, SGD (Stochastic Gradient Descent)
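A minimal Keras sketch of a multi-layer perceptron solving the XOR problem (assumes TensorFlow's bundled Keras; a standalone keras install works the same way):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# XOR is not linearly separable - a hidden layer with a nonlinearity is needed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

model = Sequential([
    Dense(8, activation="relu", input_shape=(2,)),   # hidden layer (ReLU nonlinearity)
    Dense(1, activation="sigmoid"),                   # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy")  # loss function + optimizer
model.fit(X, y, epochs=3000, verbose=0)     # backpropagation / gradient descent under the hood
print(model.predict(X).round(3))            # should approach [0, 1, 1, 0]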
Boltzmann Machine
RBM (Restricted Boltzmann Machine, 2006),
Deep Belief Network
CNN  Convolutional Neural Network
layers: convolutional, pooling, fully-connected
Short overview and comparison of CNNs:
 https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5
 LeNet-5 (1998 - by Yann LeCun, 60K parameters),
MNIST database  handwritten digits (60K training, 10K testing)
ImageNet  14 Mln images, 1000 classes
 AlexNet (2012 - Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever, 60 Mln parameters)
 VGGNet (2014, 138 Mln parameters)
 GoogLeNet (2014, inception blocks, 22 layers, 4 Mln parameters)
 ResNet (2015 - Residual Neural Network, 152 layers, 25 Mln parameters, Microsoft Research)
 Google AutoML, NASNet architecture (2017 - https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html )
YOLO - You Only Look Once - real-time object detection
Image segmentation (semantic segmentation)
U-Net: Convolutional Networks for Biomedical Image Segmentation

AutoEncoder
Variational AutoEncoder
seq2seq, word2vec, embeddings (king - man + woman ~= queen)

RNN - Recurrent Neural Network
BRNN - Bidirectional RNN
Exploding/Vanishing Gradient problem
LSTM - Long Short-Term Memory (1997 - Sepp Hochreiter and Jürgen Schmidhuber)
GRU - Gated Recurrent Unit (2014, Univ. of Montreal, Canada) - simpler than LSTM
Attention (2014) - https://arxiv.org/abs/1409.0473
Attention Is All You Need (2017, Transformer) - https://arxiv.org/abs/1706.03762
Read this:
 https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/
Google Translate:
 https://arxiv.org/pdf/1609.08144v2.pdf - 31 authors
 deep LSTM network with 8 encoder and 8 decoder layers
 parallelism - decreasing training time using attention and residual connections
 attention mechanism - connects the bottom layer of the decoder to the top layer of the encoder
 low-precision arithmetic for inference computations
Chatbots:
(Natural Language Comprehension => Understanding Intent => Action)
Amazon Connect + Amazon Lex
Google Dialog Flow

regularization - overfitting & dropout
Changing learning rate dynamically, ADAM optimizer
batch processing, mini-batch, batch size, SGD
Cross Entropy
Softmax classification (uses same formula, but different meaning)
KL-divergence (Kullback–Leibler divergence, also called relative entropy)
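A small numpy sketch of softmax and cross-entropy (the logits and the true label are arbitrary):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])    # raw scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())             # probabilities summing to 1

y_true = np.array([1.0, 0.0, 0.0])    # one-hot true label (class 0)
cross_entropy = -np.sum(y_true * np.log(probs))
print(cross_entropy)                  # smaller when the model puts mass on the true class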
GANs (Generative Adversarial Networks)
Adversarial Examples
Reinforcement Learning, Deep Reinforcement Learning
Agent, Policy, Reward, Regret
AlphaGo, DeepMind
multi-armed bandit problem - a fixed limited set of resources
must be allocated between competing (alternative) choices
in a way that maximizes their expected gain,
when each choice's properties are only partially known
at the time of allocation, and may become better understood
as time passes or by allocating resources to the choice.
The name comes from imagining a gambler at a row of
slot machines (sometimes known as "one-armed bandits").
===========================================
Good courses about Deep Learning (DL) and ML:
 https://www.coursera.org/specializations/deep-learning
 https://www.coursera.org/specializations/machine-learning-tensorflow-gcp
 http://www.fast.ai
 https://www.udacity.com/course/deep-learning--ud730
etc.
About Coursera:
 https://www.ted.com/talks/daphne_koller_what_we_re_learning_from_online_education
================================================================
deeplearning.ai - 5 courses - all videos on youtube:
courses 1,2,3 of 5 - 98 videos:
 https://www.youtube.com/watch?v=7PiK4wtfvbA&list=PLBAGcD3siRDguyYYzhVwZ3tLvOyyG5k6K
course 1: videos 1-41
course 2: videos 41-70
course 3: videos 71-98
course 4 of 5 - 43 videos
Convolutional Neural Network (CNN)
 https://www.youtube.com/watch?v=Z91YCMvxdo0&list=PLBAGcD3siRDjBU8sKRk0zX9pMz9qeVxud
course 5 of 5 - 33 videos
Recurrent Neural Networks (RNN)
 https://www.youtube.com/watch?v=5VlbK7tfD8&list=PLBAGcD3siRDittPwQDGIIAWkjzRucAc7
fast.ai videos on youtube (parts 1,2 - 14 videos):
 https://www.youtube.com/watch?v=IPBSB1HLNLo&list=PLCdvEQLhYkYmKTKWTrH7bHtQ1CsKZaQBl
================================================================
You can also read this book:
 http://www.deeplearningbook.org
or find youtube videos where people discuss chapters of this book.
The book is comprehensive - but difficult to read.
You will need to do a lot of internet browsing to clarify things.
Audio interviews:
(TWiML&AI) podcast
 https://twimlai.com 
For Russian speakers - a good channel on YouTube
 https://www.youtube.com/watch?v=MYp3OwkiJAs
 https://www.youtube.com/channel/UCQj_dwbIydi588xrfjWSL5g/videos
Google's Tensor Processing Units (TPUs):
 https://www.wired.com/2017/05/google-rattles-tech-world-new-ai-chip/
Here are some links related to ML & AI
Nice 2-year-old tutorial with pictures:
 http://www.iro.umontreal.ca/%7Ebengioy/talks/DLTutorialNIPS2015.pdf
New online publication:
 http://distill.pub
Christopher Olah has a great blog with
very clear explanations of DL concepts
 http://colah.github.io/
 https://github.com/colah/
Nice short online book about DL:
 http://neuralnetworksanddeeplearning.com/index.html
Good 1hr lecture by Yann LeCun:
 https://www.youtube.com/watch?v=IbjF5VjniVE
Stanford - 15 lectures (CS231n) - Fei-Fei Li & Andrej Karpathy & Justin Johnson
 https://www.youtube.com/channel/UC2__PIf36huAgKFumlOIs6A
Oxford - Deep Learning lectures - Nando de Freitas
 https://www.youtube.com/user/ProfNandoDF/videos
Hinton lectures (Neural Networks for Machine Learning)
 https://www.youtube.com/user/colinmcd94/videos
Ian Goodfellow PhD Defense Presentation
 https://www.youtube.com/watch?v=ckoD_bE8Bhs
GANs - short 5-min video by Siraj Raval (he has lots of videos)
 https://www.youtube.com/watch?v=deyOX6Mt_As
The Great A.I. Awakening - by Gideon Lewis-Kraus, Dec. 14, 2016
 https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
Google Translate research paper (Sep 2016)
 https://arxiv.org/pdf/1609.08144v2.pdf
Andrew Ng: Artificial Intelligence is the New Electricity
 https://www.youtube.com/watch?v=21EiKfQYZXc
Andrew Ng  The State of Artificial Intelligence (Dec 15, 2017)
 https://www.youtube.com/watch?v=NKpuX_yzdYs
Follow on Facebook:
 Yann LeCun
 Adversarial Training
 Deep AI
 Deep Learning Patterns, Methodology and Strategy
 Montreal.AI
 …
Newsletters, meetups, courses:
 https://opendatascience.com/
 http://DataScienceWeekly.org
 NYCMachineLearninglist@meetup.com
 NYC Artificial Intelligence & Deep Learning@meetup.com
 http://machinelearningmastery.com
 http://byteacademy.co/all-courses/data-science-mini-courses/
 https://www.coursera.org/courses/?languages=en&query=deep+learning
 https://www.udacity.com/course/deep-learning--ud730
 https://www.youtube.com/watch?v=MYp3OwkiJAs  (in Russian)
 ...
TensorFlow is an open source software library
Written by Google Brain Team (C++, Python)
Available for Linux, Mac, Windows
Deep Learning Frameworks Compared (5:10):
 https://www.youtube.com/watch?v=MDP9FfsNx60
TFLearn – a beginner's wrapper around TensorFlow:
 http://tflearn.org