Selectorweb.com New York

Data Science, Machine Learning & Artificial Intelligence

Intro ------------------------------

```How to learn "Machine Learning" and "Artificial Intelligence"
updated September 10, 2018

All my slides are here:
https://goo.gl/3v8DAS

I have uploaded some stuff here:
http://myhash.com/ai/

===========================================
Unix:
You need basic working knowledge of Unix / Linux.
- buy yourself a Macbook and learn to work from terminal.
- book - Learning the UNIX Operating System O'Reilly
grep, find, set vs env, running script vs sourcing script
- vi editor:
- http://www.levselector.com/vi.html

===========================================
Python:
ipython,
ipython notebook (Jupyter)
python versions 2.7 vs 3.x
===========================================
Math:
- calculus:
e = 2.71828... = lim (1+1/n)**n as n -> infinity
derivatives and integrals,
Dirac Delta Function,
multivariate calculus, partial derivatives,
- linear algebra:
vector, matrix, dot-product of vectors and matrices,
vector spaces, basis vectors, matrices of space transformations,
determinant, rank,
eigenvectors, eigenvalues,
inverse matrix,
tensor
===========================================
Probability & Statistics:

- probability:
definition,
combinatorics, subsets,  C(n,m) = n!/(m!*(n-m)!)
Venn Diagrams,
Conditional Probability, Bayes Theorem,
random variable, probability distribution, expected value,
variance, standard deviation,
discrete and continuous cases,
PDF (Probability Density Function) vs CDF (Cumulative Distribution Function),
Binomial Distribution,
Poisson and Exponential Distributions,
Uniform Distribution,
Central Limit Theorem and Normal (Gaussian) Distribution
exp(-x^2/2)

- statistics:
Sample mean (average), Sample Standard Deviation (why /(N-1) ?), Median
Linear Regression - draw line through points,
OLS = Ordinary Least Squares (minimizing quadratic error),
R2  = Coefficient of Determination

Confidence interval, z-score
not needed: Student’s t-distribution, Chi-squared distribution, F-distribution, Gamma distribution
Stochastic Processes, Time Series Analysis,
Random Walk, Brownian motion, Diffusion,
Poisson Process/distribution, exponential distribution
white noise, gaussian noise
Markov Process
Monte Carlo method, MCMC (Markov Chain Monte Carlo)
Correlation function, Autocorrelation
Fourier Analysis, filtering to reduce noise
Extracting Signal from Noise by synchronization (S/N improves ~sqrt(N)),

===========================================
You need to learn the meaning of these words:
```
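A few of the math and statistics items above can be checked numerically. This is a minimal sketch using only the Python standard library. The helper name n_choose_m and the sample values are made up for illustration.

```python
import math
import statistics

# e as the limit of (1 + 1/n)**n for large n
n = 1_000_000
e_approx = (1 + 1 / n) ** n

# C(n, m) = n! / (m! * (n-m)!)
def n_choose_m(n, m):
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

# Sample standard deviation divides by N-1 instead of N,
# because the mean was estimated from the same sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative data
mean = sum(sample) / len(sample)
sum_sq = sum((x - mean) ** 2 for x in sample)
std_population = math.sqrt(sum_sq / len(sample))        # divide by N
std_sample = math.sqrt(sum_sq / (len(sample) - 1))      # divide by N-1

print(e_approx, math.e)
print(n_choose_m(5, 2))
print(std_population, std_sample, statistics.stdev(sample))
```

The N-1 version is always a bit larger than the N version. The difference matters for small samples and fades for large N.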
```
Data Science (DS) - using a computer to process data from different sources
(CSV files, databases), apply statistics, make graphs.

Machine Learning (ML) - a subset of DS that extracts patterns from data
in order to make predictions.

ML Example - Linear Regression (draw a straight line through points)

  Input data - array of points:
    data = [(x0,y0), (x1,y1), ...]

  model function:
    def linear_regression(x, a, b):
        y = a*x + b
        return y

  training of the model: find the two numbers (slope a and intercept b)
  that give the best fit between the model and the data (minimize the error).

ML - many types of models (linear regression and logistic regression,
support vector machines, K-nearest neighbors, K-means clustering,
decision trees (RandomForest, XGBoost, etc.), Neural Networks, etc.)

Deep Learning (DL) - ML implemented using multi-layered structures (Networks)

Artificial Intelligence (AI) - a DL algorithm trained to perform functions
which are usually associated only with humans
(vision, speech comprehension, autonomous driving, etc.)
```
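The "training" step of the linear regression example can be sketched in a few lines. This uses the closed-form Ordinary Least Squares solution for one variable. The function names fit_line and r_squared and the data points are made up for illustration.

```python
def fit_line(points):
    """Return slope a and intercept b minimizing the squared error."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b

def r_squared(points, a, b):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

# Illustrative points lying close to the line y = 2x + 1
data = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.0)]
a, b = fit_line(data)
print(a, b, r_squared(data, a, b))
```

For these points the fit gives a slope near 2 and an intercept near 1, with R2 very close to 1.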
```     data cleaning, scaling, normalization
synthetic data augmentation
highly-unbalanced data (minority & majority class, imputing data in minority class)

sparse matrix, sparse matrix data representation (Yale format)
- https://en.wikipedia.org/wiki/Sparse_matrix

feature engineering (extraction/preparation)```
```
Dimensionality Reduction:
  PCA (Principal Component Analysis)
  LDA (Linear Discriminant Analysis) - statistical method to find
    a linear combination of features to achieve separation
    of two or more classes.
    Used for dimensionality reduction before classification.
    Similar to PCA.
    Note: LDA also stands for Latent Dirichlet Allocation -
    a generative probabilistic model (to find topics in texts).

Regression
  regress vs progress,
  simplify (for example, from 100 (x,y) pairs to 2 numbers (slope, intercept))
Linear regression
  OLS = Ordinary Least Squares
Logistic regression
  classification: 1 var to 1 binary, multi-var to 1 binary,
  or multi-var to several classes (multinomial)
  We model the log-odds as a linear combination of some predictors:
    logit(p) = log(p/(1-p)) = b0 + b1*x1 + b2*x2 + . . . + bN*xN
Propensity
Bayes theorem / Bayesian approach
K Nearest Neighbors (KNN) - tuning "K"
K-means clustering
SVM = Support Vector Machines
Decision Tree
Ensemble Methods, bagging, boosting, Random Forest, XGBoost
```
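The logit formula for logistic regression can be turned around to get a probability from the linear combination. A minimal sketch, with made-up names (sigmoid, predict_proba) and made-up coefficients:

```python
import math

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: maps any real number to (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_proba(x, b0, coeffs):
    """p such that logit(p) = b0 + b1*x1 + ... + bN*xN."""
    z = b0 + sum(b * xi for b, xi in zip(coeffs, x))
    return sigmoid(z)

# Illustrative coefficients, not fitted to any real data
b0, coeffs = -1.0, [0.5, 2.0]
p = predict_proba([1.0, 0.25], b0, coeffs)
print(p, logit(p))
```

Here the linear part is -1.0 + 0.5*1.0 + 2.0*0.25 = 0, so the predicted probability is 0.5 and its log-odds are 0.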
```     softmax

training and test data, overfitting
Regularization (ridge regression, LASSO)

Determining feature importance

Supervised vs unsupervised Machine Learning
```

```
NLP = Natural Language Processing
  sentiment analysis
  text as a 'bag of words'
  TF-IDF = Term frequency–inverse document frequency
```
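A bag-of-words TF-IDF weighting can be sketched without any library. There are several TF-IDF variants; this one assumes raw term frequency and unsmoothed idf = log(N / df). The function name tf_idf and the three toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per document."""
    tokenized = [d.lower().split() for d in docs]   # bag of words
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))                # count each word once per doc
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
weights = tf_idf(docs)
print(sorted(weights[0].items(), key=lambda kv: -kv[1]))
```

In the first document the rare word "cat" gets a higher weight than the frequent word "the", even though "the" appears twice.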

---

```
Classifier
Outlier/anomaly Detection
ROC curve = Receiver Operating Characteristic curve
  True Positive Rate vs False Positive Rate (TPR vs FPR)
  TPR = TP/(TP+FN),  FPR = FP/(FP+TN)
  Precision P = TP/(TP+FP)
  Specificity = 1-FPR
  Recall = TPR
  F1 score = 2/(1/TPR + 1/P)

confusion matrix = actual (0,1) vs predicted (0,1)

------------+--------------+---------------
            | predicted No | predicted Yes
------------+--------------+---------------
Actual No   |    TN=50     |    FP=10
Actual Yes  |    FN=5      |    TP=100
------------+--------------+---------------
where TP, TN, FP, FN = True/False Positive/Negative
```
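The classifier metrics can be computed directly from the four counts in the confusion matrix above. A minimal sketch using the standard definitions (precision is TP/(TP+FP), recall is TP/(TP+FN)). The variable names are made up for illustration.

```python
# Counts taken from the confusion matrix example
TN, FP, FN, TP = 50, 10, 5, 100

tpr = TP / (TP + FN)            # recall, True Positive Rate
fpr = FP / (FP + TN)            # False Positive Rate
precision = TP / (TP + FP)
specificity = 1 - fpr           # True Negative Rate
f1 = 2 / (1 / tpr + 1 / precision)
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(f"recall={tpr:.3f} fpr={fpr:.3f} precision={precision:.3f}")
print(f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

For these counts recall is about 0.952, precision about 0.909 and F1 about 0.930.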
```
ML/DL libraries and tools:
Scikit-learn (sklearn) - http://scikit-learn.org/stable/
TF = TensorFlow - https://www.tensorflow.org/
PyTorch - https://pytorch.org/
Keras - https://keras.io/
XGBoost - https://xgboost.ai/
MXNet (Apache DL library) - https://mxnet.apache.org/
fastText - https://fasttext.cc/
NLTK - Natural Language Toolkit - https://www.nltk.org/
CNTK = Microsoft Cognitive Toolkit - https://cntk.ai/
H2O - https://www.h2o.ai/ - parallel training and execution on cloud
SageMaker - https://aws.amazon.com/sagemaker/ - on Amazon Cloud
Google AI - https://cloud.google.com/products/ai/ - Cloud AutoML, Cloud Machine Learning Engine, Cloud TPUs, BigQuery ML, ...
Microsoft Azure ML - https://azure.microsoft.com/en-us/overview/machine-learning/ -

---
Daily time-series - detecting seasonality with Fourier Transforms
removing trend
Moving-Window Averages

Recommendation Engine
Collaborative Filtering

Neural Networks (NN)
perceptron, multilayer perceptron
The XOR problem, hidden layers, non-linearity, ReLU
Feed-Forward Network
nodes, connections, weights, biases, activations

learning as optimization problem
cost function, objective function, loss function, regret

Boltzmann Machine
RBM (Restricted Boltzmann Machine, 2006),
Deep Belief Network```
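The XOR problem mentioned above shows why a hidden layer and a non-linearity are needed: no single straight line separates the XOR outputs, but two ReLU hidden nodes are enough. A minimal sketch with hand-picked (not trained) weights. The function names are made up for illustration.

```python
def relu(z):
    return max(0.0, float(z))

def xor_net(x1, x2):
    # hidden layer: two nodes, both with weights (1, 1)
    h1 = relu(x1 + x2)          # bias 0
    h2 = relu(x1 + x2 - 1)      # bias -1
    # output node: weights (1, -2), bias 0, no activation
    return h1 - 2 * h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

Without the ReLU the network collapses to a single linear function of x1 and x2, which cannot produce XOR.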
```
CNN - Convolutional Neural Network
  layers: convolutional, pooling, fully-connected

Short overview and comparison of CNNs:
- https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

- LeNet-5 (1998 - by Yann LeCun, 60K parameters),
    MNIST database - handwritten digits (60K training, 10K testing)

ImageNet - 14 Mln images, 1000 classes
- AlexNet (2012 - Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever, 60 Mln parameters)
- VGGNet (2014, 138 Mln parameters)
- GoogLeNet (2014, inception blocks, 22 layers, 4 Mln parameters)
- ResNet (2015 - Residual Neural Network, 152 layers, 25 Mln parameters, Microsoft Research)
- Google AutoML, NASNet architecture (2017 - https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html )

YOLO - You Only Look Once - real time object detection
Image segmentation (semantic segmentation)
U-Net: Convolutional Networks for Biomedical Image Segmentation

AutoEncoder
Variational AutoEncoder
seq2seq
word2vec, embeddings (king - man + woman ~= queen)

RNN - Recurrent NN
BRNN - Bi-directional RNN
Exploding/Vanishing Gradient problem
LSTM - Long Short Term Memory (1997 - Sepp Hochreiter and Jürgen Schmidhuber)
GRU - Gated Recurrent Unit (2014, Univ. of Montreal, Canada) - simpler than LSTM

Attention (2014)
- https://arxiv.org/abs/1409.0473
Attention Is All You Need (2017, Transformer)
- https://arxiv.org/abs/1706.03762
Read this:
- https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/

Google Translate:
- https://arxiv.org/pdf/1609.08144v2.pdf - 31 authors
  deep LSTM network
  8 encoder and 8 decoder layers
  parallelism - decreasing training time
  using attention and residual connections
  attention mechanism - connects the bottom layer of the decoder
      to the top layer of the encoder
  low-precision arithmetic for inference computations

Chatbots: (Natural Language Comprehension => Understanding Intent => Action)
  Amazon Connect + Amazon Lex
  Google Dialogflow
```
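The core idea of attention is a weighted average of values, where the weights come from comparing a query with keys. This is a minimal sketch of the scaled dot-product form used in the 2017 Transformer paper (the 2014 paper uses an additive scoring function instead). It handles one query, uses plain lists, and has no learned parameters. The toy keys, values and query are made up for illustration.

```python
import math

def softmax(scores):
    m = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Return (weights, context) for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]
    return weights, context

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
query = [5.0, 0.0]                          # points along the first key
weights, context = attention(query, keys, values)
print(weights, context)
```

The query is aligned with the first key, so most of the weight goes to the first value.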
```
regularization - overfitting & dropout
Changing learning rate dynamically, ADAM optimizer
batch-processing, minibatch, batch-size, SGD

Cross Entropy
Softmax classification (uses same formula, but different meaning)
KL-divergence (Kullback–Leibler divergence, also called relative entropy)

```
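Softmax, cross entropy and KL-divergence are closely connected: cross entropy H(p,q) equals the entropy H(p) plus KL(p||q). A minimal sketch in plain Python. The function names and the example distributions are made up for illustration.

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q): p is the true distribution, q is the predicted one."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q), also called relative entropy."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]                          # illustrative true distribution
q = softmax([2.0, 1.0, 0.0])                 # model output turned into probabilities
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```

When p is a one-hot label, H(p) is zero, so minimizing cross entropy is the same as minimizing KL-divergence.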
```
Reinforcement Learning, Deep Reinforcement Learning
Agent, Policy, Reward, Regret
AlphaGo, DeepMind

multi-armed bandit problem - a fixed limited set of resources
must be allocated between competing (alternative) choices
in a way that maximizes their expected gain,
when each choice's properties are only partially known
at the time of allocation, and may become better understood
as time passes or by allocating resources to the choice.
The name comes from imagining a gambler at a row of slot machines
(sometimes known as "one-armed bandits").
```
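One simple strategy for the multi-armed bandit problem is epsilon-greedy: most of the time pull the arm with the best estimated payoff, and sometimes pull a random arm to keep learning. This is a minimal sketch, one of many possible strategies. The function name, the payoff probabilities and the parameter values are made up for illustration.

```python
import random

def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate slot machines that pay 1 with the given probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)          # pulls per arm
    estimates = [0.0] * len(true_probs)     # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_probs))    # explore
        else:
            arm = max(range(len(true_probs)), key=lambda i: estimates[i])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, estimates, total_reward)
```

The agent never sees true_probs. It only sees rewards, yet the best arm ends up with most of the pulls. The reward lost compared with always pulling the best arm is the regret.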
```

===========================================

Good courses about Deep Learning (DL) and ML:
- https://www.coursera.org/specializations/deep-learning
- https://www.coursera.org/specializations/machine-learning-tensorflow-gcp
- http://www.fast.ai
- https://www.udacity.com/course/deep-learning--ud730
etc.

- https://www.ted.com/talks/daphne_koller_what_we_re_learning_from_online_education

================================================================
deeplearning.ai - 5 courses - all videos on youtube:

courses 1,2,3 of 5 - 98 videos:
course 1: video 1-41
course 2: video 41-70
course 3: video 71-98

course 4 of 5 - 43 videos
Convolutional Neural Network (CNN)

course 5 of 5 - 33 videos
Recurrent Neural Networks (RNN)

fast.ai videos on youtube (parts 1,2 - 14 videos):

================================================================
You can also read this book:
- http://www.deeplearningbook.org
or find youtube videos where people discuss chapters of this book.
The book is comprehensive - but difficult to read.
You will need to do a lot of internet browsing to clarify things.

Audio interviews:
(TWiML&AI) podcast
- https://twimlai.com -

For Russian speakers - a good channel on Youtube

Here are some links related to ML & AI

Nice 2-y old tutorial with pictures:
- http://www.iro.umontreal.ca/%7Ebengioy/talks/DL-Tutorial-NIPS2015.pdf

New online publication:
- http://distill.pub

Christopher Olah has a great blog with
very clear explanations of DL concepts
- http://colah.github.io/
- https://github.com/colah/

Nice short online book about DL:
- http://neuralnetworksanddeeplearning.com/index.html

Good 1-hr lecture by Yann LeCun:

Stanford - 15 lectures ( CS231n ) Fei-Fei Li & Andrej Karpathy & Justin Johnson

Oxford - Deep Learning lectures - Nando de Freitas

Hinton lectures (Neural Networks for Machine Learning)

Ian Goodfellow PhD Defense Presentation

GANs - short 5min video by Siraj Raval (he has lots of videos)

The Great A.I. Awakening - by Gideon Lewis-Kraus, Dec. 14, 2016
- https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html

Google Translate research paper (Sep 2016)
- https://arxiv.org/pdf/1609.08144v2.pdf

Andrew Ng: Artificial Intelligence is the New Electricity

Andrew Ng - The State of Artificial Intelligence (Dec 15, 2017)

- Yann LeCun
- Deep AI
- Deep Learning Patterns, Methodology and Strategy
- Montreal.AI
- …

- https://opendatascience.com/
- http://DataScienceWeekly.org
- NYC-Machine-Learning-list@meetup.com
- NYC Artificial Intelligence & Deep Learning@meetup.com
- http://machinelearningmastery.com
- https://www.coursera.org/courses/?languages=en&query=deep+learning
- https://www.udacity.com/course/deep-learning--ud730
```