Hands-on in Machine Learning for Beginners
Artificial Intelligence (AI) is the field of science and engineering concerned with building machines that can reason, learn, and act intelligently. It is a simple definition, yet it carries the weight of what may be the next stage of human evolution. Most of us became aware of AI only recently, when cameras began tagging our smiling faces, phones began unlocking with a fingerprint, and Alexa began responding to our commands. However, AI’s history goes back almost seven decades: the field was formally founded in 1956 at a conference at Dartmouth College in Hanover, New Hampshire, USA, where the term “Artificial Intelligence” was also coined. For almost three decades after that, AI was developed by universities and governments as small projects or futuristic side endeavors. Although DARPA-funded projects had already reached early milestones in applying AI in the US, the major turning point came in 1997, when IBM’s Deep Blue became the first computer to defeat a reigning world chess champion in a match, beating Russian grandmaster Garry Kasparov.
Since then, AI development has not looked back and has attracted interest in almost every industry. It would not be an exaggeration to say that it has disrupted industries to the point where business models are undergoing major changes. Take Retail and Ecommerce, for example, where AI chatbots help retailers collect customer data and turn it into meaningful insights, and where AI helps maintain CRM applications in retail stores through automated data entry, ad personalization, account insights, and much more. Likewise, AI has triggered major upheavals in Manufacturing and Production with predictive maintenance, in Telecom with AI-based network management in the cloud, and in Supply Chain and Logistics, where it helps forecast inventory, demand, and supply and ultimately makes supply chain decision making more optimized and agile.
However, two big industries that have gained significant ground in AI and have adopted mature AI technologies are Transportation and Healthcare. Together, these two industries have started to reshape economies and will be at the frontier of applying cutting-edge AI research and solutions.
But how did AI evolve to such a massive scale? Was it a scientific miracle or an engineering feat? The answer is both. AI combines science and engineering to enable a machine to reason and perform like, or better than, a human. This means understanding the science behind capturing sensory inputs (just as a human brain does) and applying it through multi-disciplinary engineering techniques to determine the result. Take, for example, the chatbot that we usually interact with while logging a support ticket. The chatbot is supposed to take inputs from a user and respond like a human with some rational output. Here, the inputs could be either a voice command or simple text. These inputs are processed by an algorithm that analyzes the characteristics of the text or voice as well as its tone and urgency. To build such a human-like system, the science of natural language has to be combined with sophisticated algorithms that process the inputs. In summary, AI systems are never purely science or purely engineering but use both in some form or other.
With AI now at the center of every major technical disruption, we need to understand the sub-disciplines of AI that are responsible for these changes. AI has long been an umbrella term covering other fields such as Machine Learning and Deep Learning. These two terms have recently gained significance and have established a clear identity of their own because of their distinct applications. The figure below depicts the subsystems of Artificial Intelligence.
Now that we know that Machine Learning and Deep Learning are two critical subsystems of AI let’s dive into them to get a better understanding.
Machine Learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in the data and make better decisions in the future based on the examples that we provide. The primary aim is to enable machines to learn automatically, without human intervention or assistance, and to take actions accordingly.
A more formal definition of Machine Learning, due to Tom Mitchell, is: “a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E”. Let’s try to understand this through an example. Say your email program watches which emails you do or do not mark as spam. In such an email client, you might click the Spam button to report some emails as spam but not others. Over time, your email program will learn to better filter spam from the hordes of email you receive every day. To map this example onto the definition: classifying emails is the task T, watching you label emails as spam or not spam is the experience E, and the fraction of emails correctly classified could be the performance measure P.
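The T/E/P definition can be made concrete with a toy spam filter. The sketch below is only an illustration: the emails, the `learn_spam_words` rule (keep words seen in spam but never in normal mail) and all names are hypothetical and far simpler than what a real email client does.

```python
def learn_spam_words(labelled_emails):
    """Experience E: collect words seen in spam but never in normal mail."""
    spam, ham = set(), set()
    for text, is_spam in labelled_emails:
        (spam if is_spam else ham).update(text.lower().split())
    return spam - ham


def accuracy(spam_words, labelled_emails):
    """Performance P: fraction of emails classified correctly (task T)."""
    correct = sum(
        any(word in spam_words for word in text.lower().split()) == is_spam
        for text, is_spam in labelled_emails
    )
    return correct / len(labelled_emails)


# Hypothetical emails the user has already labelled with the Spam button.
experience = [
    ("win a free lottery ticket", True),
    ("meeting moved to 3 pm", False),
    ("cheap credit offer inside", True),
    ("lunch at noon", False),
]
# Hypothetical unseen emails used only to measure P.
held_out = [
    ("claim your free prize", True),
    ("new credit offer", True),
    ("project meeting at noon", False),
    ("see you at lunch", False),
]

# P measured after 0, 2 and 4 labelled examples.
scores = {n: accuracy(learn_spam_words(experience[:n]), held_out) for n in (0, 2, 4)}
print(scores)
```

On this toy data the measured P rises as more labelled examples (E) are used, which is exactly what the definition calls learning.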
Contrary to common perception, Machine Learning is not all about statistics. Rather, machine learning is a process of working with data/inputs, building the learning model, evaluating the model (using statistics) and optimizing the model to provide the best predictions or actions. All the above steps could be represented as:
Machine Learning: X -> X, where X ∈ {Exploration, Representation, Evaluation, Optimization}
To summarize, exploration is a process involving data collection, data cleaning and data rebuilding so that clean and unambiguous data is ready as an input. Representation is a process of transforming the data from one space to another so that it can be more easily interpreted or used by subsequent algorithms. Evaluation is where the heart of Machine Learning lies. It is where all the transformed data is processed by your algorithm or model to learn about data behavior or hidden trends, and where the predictions produced by the model are assessed. Lastly, optimization is where the hyperparameters are tuned to improve the performance of the model.
As the representation above suggests, learning is an iterative process in which a model must constantly learn from new data and improve its performance and prediction accuracy. Sometimes, when the nature of the data changes significantly over time, the old models prove ineffective and have to be replaced by entirely new models and tuning parameters.
Now that we’ve learned the inner workings of Machine Learning, let’s have a look at the primary techniques used.
Supervised Learning
As the name suggests, this branch of Machine Learning involves training a machine to learn from existing data and known outcomes. Supervised Learning handles two categories of problems: Classification, which identifies the category an input belongs to, and Regression, which predicts a continuous numerical output. Let’s understand Classification first. Recall the spam filtering example discussed earlier. In order to identify whether a mail is spam or not, your program or machine has to be trained to recognize the characteristics of spam mail. For this you may supply keywords that usually appear in spam mail, e.g. “lottery”, “sale”, “free credit” etc. You’ll also have to tag or label the mails that contain these keywords in their body or subject line. Now you train your machine with all these ‘labelled’ mails. Once your machine is trained to a certain degree of accuracy, you can use it to classify an entirely new set of mails as spam or not. This is how supervised learning is used for classification problems. Though spam filtering is an example of binary classification, supervised learning also covers multi-class classification problems, e.g. identifying whether an animal image shows a fish, a dog or a cat.
Regression, on the other hand, establishes a relation between various inputs and the desired output. For example, let’s say you have a car data set containing year of manufacture, horsepower, make, number of insurance claims, kilometers on the odometer, current condition etc. (the kind of inputs Kelley Blue Book, KBB, takes) along with the resale value of these cars. Your regression program or model will determine an equation or relation between all these independent inputs and the resale value. Once you have this relation, you will be able to predict the resale value for any new set of inputs.
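As a minimal sketch of what "determining a relation" means, the snippet below fits a linear equation to a tiny, made-up car data set with NumPy least squares. The numbers, the two chosen inputs and the exactly linear pricing rule are assumptions made purely for illustration; real resale data is noisy and needs many more features.

```python
import numpy as np

# Hypothetical cars: age in years and odometer reading in thousands of km.
age_years = np.array([1.0, 3.0, 5.0, 7.0, 10.0])
km_thousands = np.array([10.0, 45.0, 60.0, 100.0, 120.0])

# Made-up resale values that follow an exact linear rule (no noise).
resale_value = 30000.0 - 1500.0 * age_years - 50.0 * km_thousands

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(age_years), age_years, km_thousands])

# Least squares finds the coefficients relating inputs to resale value.
coef, *_ = np.linalg.lstsq(X, resale_value, rcond=None)

# Predict the resale value of a new car: 4 years old, 50,000 km.
new_car = np.array([1.0, 4.0, 50.0])
predicted_value = float(new_car @ coef)
print(coef, predicted_value)
```

Because the toy data is exactly linear, the fitted coefficients recover the rule used to generate it; with real data the fit would only approximate the relation.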
Let’s understand Supervised Learning with the help of Housing Price determination Regression problem from Kaggle (https://www.kaggle.com/c/home-data-for-ml-course/overview).
This problem asks us to build an ML model that estimates the price of a house. The data set provides information about other houses in the vicinity, such as area in sq-ft, number of rooms, age of the house, number of bathrooms etc., together with the current market price of these houses. We’ll go over simple steps to build an ML model by feeding all this data to our machine (algorithm) so that it learns how house features relate to prices. This exercise, in which the house features are fed along with the target price, is called Training. In machine learning terminology the house attributes are called ‘Input Features’ and the price of the house is known as the ‘Target Feature’.
In the next step we’ll test this ML model on a further set of houses to ensure that our model has learnt well and can predict with some threshold accuracy. Your machine is then ready, and you can feed in the features of the house you’ve identified to determine a suitable price.
We’ll be using Jupyter Notebook to develop code using Python v 3.7
Step 1: Data Processing
1a: Data Cleaning
In this step we address data quality issues such as missing values, outliers, data pollution etc. Start by importing the libraries (NumPy, pandas and scikit-learn are the primary ones).
i. Download the housing data set from Kaggle link and import the csv file in jupyter notebook to create a Dataframe
ii. Check for the features
The results show that the data set has 81 features including house prices. As we observe not all features have numerical values. Features ‘Street’, ‘Utilities’ etc have categorical data which we’ll discuss how to handle in sections below.
iii. Remove unwarranted or irrelevant data
Identify and Remove any null values or arbitrary characters (eg. #, & etc) which intrinsically do not belong to a feature data type. This also includes removing any duplicate entries in rows or columns.
In the snippet above we can clearly observe that ‘LotFrontage’ feature has 259 rows with null values.
We need to treat all such cases of null, NaN or stray characters before we can use our data to build a model. One technique is to replace the NaN values of a feature with either the median or the mean value of that feature. Rows in which most values are NaN or null should be removed, as they don’t contribute to the data set. The right approach depends on how the data is distributed as well as on the impact on the overall data volume if rows with NaN or null are removed.
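A minimal pandas sketch of this clean-up is shown below. The small DataFrame is a made-up stand-in that borrows three column names from the Kaggle data set; the choice of median imputation and the `thresh=2` cut-off are assumptions for illustration, not the only reasonable settings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data (the real file has 81 columns).
raw = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, np.nan, 70.0, np.nan],
    "LotArea": [8450.0, 9600.0, 11250.0, 9550.0, 14260.0, np.nan],
    "SalePrice": [208500.0, 181500.0, 223500.0, 140000.0, 250000.0, np.nan],
})

# Count missing values per feature.
null_counts = raw.isnull().sum()

cleaned = (
    raw.dropna(thresh=2)     # drop rows with fewer than 2 non-null values
       .drop_duplicates()    # drop exact duplicate rows
       .assign(              # fill remaining gaps with the column median
           LotFrontage=lambda d: d["LotFrontage"].fillna(d["LotFrontage"].median())
       )
)
print(null_counts)
print(cleaned)
```

Whether to impute with the median, the mean, or to drop rows instead depends on the distribution of the feature and on how much data would be lost.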
iv. Handling features with strings or ‘object’ data type
As we observed, some features (‘MSZoning’, ‘LotShape’ etc.) hold categorical, i.e. non-numerical, data. This type of data must either be converted to numerical values or dropped from the data set before it can be processed further. Two well-known techniques for the conversion are Label Encoding, which maps each category to an integer, and One Hot Encoding, which creates one 0/1 column per category.
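Both encodings can be sketched with pandas alone, as below. The four rows are made-up values for two of the real column names, and which encoding suits which feature is an assumption here; scikit-learn's `LabelEncoder` and `OneHotEncoder` offer the same ideas as reusable transformers.

```python
import pandas as pd

# Hypothetical sample of two categorical columns from the housing data.
df = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "LotShape": ["Reg", "IR1", "Reg", "Reg"],
})

# Label encoding: each category becomes an integer code
# (codes follow the sorted category order: IR1 -> 0, Reg -> 1).
df["LotShape_code"] = df["LotShape"].astype("category").cat.codes

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["MSZoning"], prefix="MSZoning")

print(df)
print(one_hot)
```

Label encoding implies an ordering between categories, so one-hot encoding is often the safer choice for features whose categories have no natural order.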
1b: Data Visualization
Once the data is cleaned and pre-processed, an important step is to visualize how the data in each feature is distributed and how the features are correlated. Pair plots, box plots and heatmaps are the primary techniques for visualizing your data. They reveal outliers and show how your data is statistically distributed.
The image below shows the distribution plots (pair plot and box plot) for a few features:
Step 2: Feature Selection and Engineering
This step involves selecting the right attributes or features for building our model. Feature selection needs knowledge of the problem domain, as this helps identify the best set of features. For example, in the housing problem we may categorize area of the house, building material used, utilities etc. as high priority, while alley pavement, tilted tiling of the roof etc. could be categorized as low priority features.
There are many feature selection techniques, mainly categorized as Filter, Wrapper, Embedded and Hybrid methods.
Feature selection is a vast topic in itself and is beyond the scope of this article.
Feature Engineering is another technique, which can be applied to create new and more meaningful features out of existing features. For example, we have summed up ‘LotFrontageArea’, ‘LotArea’ etc. to come up with a new feature, ‘TotalHouseArea’.
For our example we’ll use Filter based feature selection technique that selects the best 15 features and lists them down based on their scores:
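One common filter-based approach is scikit-learn's `SelectKBest`, which scores each feature independently against the target and keeps the top k. The sketch below is an assumption about how such a step could look, not the article's exact code: it uses synthetic data, generic feature names `f0`..`f5`, the `f_regression` score and k=2 instead of 15 to stay small.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: only features f0 and f3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
feature_names = np.array(["f0", "f1", "f2", "f3", "f4", "f5"])

# Score every feature against the target and keep the best k.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = list(feature_names[selector.get_support()])

# List features from highest to lowest score.
ranking = sorted(zip(feature_names, selector.scores_), key=lambda p: -p[1])
for name, score in ranking:
    print(f"{name}: {score:.1f}")
print("selected:", selected)
```

Filter methods like this look at one feature at a time, so they are fast but can miss features that are only useful in combination.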
Step 3: Creating Training and Testing Data Sets
Split the entire data into training and test sets. As a rule of thumb, the data is split in a 70:30 ratio between training and test sets.
The other important step after creating training and test sets is normalizing or standardizing your data so that all features are on a common scale. To avoid data leakage between the two sets, compute the scaling statistics (for example the mean and standard deviation) on the training set only, and then apply that same transformation to the test set. Z-score standardization, available in scikit-learn as StandardScaler, is a popular method for scaling.
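The split and the scaling can be sketched with scikit-learn as follows. The data is synthetic and the `random_state` values are arbitrary assumptions; the point of the sketch is the order of operations, with the scaler fitted on the training set only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected house features and prices.
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0])

# 70:30 split between training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Learn mean and standard deviation from the training data only ...
scaler = StandardScaler().fit(X_train)

# ... then apply the same transformation to both sets.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0), X_test_scaled.mean(axis=0))
```

The scaled test set will not have exactly zero mean and unit variance, and that is expected: it is being measured with the training set's yardstick, just as new, unseen houses would be.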
Step 4: Training Model
The data is now ready for training a model. As we have a regression problem at hand, we’ll use grid search, a technique that tries combinations of hyperparameter values for a chosen regression model and keeps the best-performing one. There are scores of regression models that could be applied, such as linear regression, random forest regressor and gradient boosting regressor, to name a few.
4a: Define the Hyperparameters (This is an optional step)
“Hyperparameters are parameters whose values are set prior to the commencement of the learning process” (wiki)
4b: Fit the model
4c: Find best parameters for higher performance of the model
4d: Check Training model performance
Training performance is quite high. This means the model found by grid search fits our features and training data set very well.
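Steps 4a to 4d can be sketched with scikit-learn's `GridSearchCV`. The regressor (`RandomForestRegressor`), the parameter grid, the R² scoring and the synthetic data are all assumptions chosen to keep the example small; they are not necessarily the settings used for the results described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the housing features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(scale=1.0, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 4a: hyperparameter values to try (an illustrative grid).
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 6]}

# 4b: fit one model per combination, scored with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

# 4c: best combination found by the search.
best_params = search.best_params_

# 4d: R^2 of the refitted best model on the training data.
train_r2 = search.score(X_train, y_train)
print(best_params, round(train_r2, 3))
```

A high training score alone says little about generalization, which is why the held-out test set is evaluated next.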
Step 5: Testing Model
The model shows very high accuracy on the test data as well. The issue that models most often face at this stage is high variance, or over-fitting, where the score on test data is much lower than the score on training data. Finding the right bias-variance balance for your model needs careful selection of features, feature engineering and a sensible split of training and testing data.
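Over-fitting shows up as a gap between training and test scores. The sketch below illustrates this on synthetic noisy data with two decision trees, one unrestricted and one limited to depth 3; the data, the models and the depth limit are assumptions chosen only to make the effect visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data: a sine curve plus random noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# An unrestricted tree can memorize the training data, noise included.
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# A shallow tree is forced to capture only the broad trend.
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# Gap between training and test R^2: a large gap signals high variance.
deep_gap = deep.score(X_train, y_train) - deep.score(X_test, y_test)
shallow_gap = shallow.score(X_train, y_train) - shallow.score(X_test, y_test)
print(f"deep tree gap: {deep_gap:.2f}, shallow tree gap: {shallow_gap:.2f}")
```

The unrestricted tree scores almost perfectly on training data but loses much of that on the test set, while the constrained tree trades some training accuracy for a smaller gap.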
Unsupervised Learning
This type of learning is mostly used for identifying clusters or groups within a huge set of data. Unsupervised learning fits well with problems where you have raw, unlabeled data and must segregate the data as a first step. For instance, suppose you lead a technical support center and at the end of the month you analyze all the categories of incidents/tickets that your team has handled. One major issue at hand is the huge time taken to resolve all these incidents. In this scenario you could use unsupervised learning to identify which kinds of incidents fall into one group: are they technically complex, or are they FAQs that consume your team’s bandwidth? In this way unsupervised learning gives you the clusters, and you can then prioritize which one to address first.
One major difference from supervised learning is that the data used for unsupervised learning is not labeled. This makes unsupervised learning an easy starting point, as there is no dependency on labelled data.
To understand how unsupervised algorithms work, we’ll use an example of bank data. This data contains various banking parameters of commercial entities, such as Industrial Risk, Management Risk, Credibility and Competitiveness, and the class each entity belongs to, Bankruptcy or Non-Bankruptcy.
The ratings used for the input features are P = Positive, A = Average and N = Negative, while B = Bankruptcy and NB = Non-Bankruptcy are used for the Target Feature, i.e. the class.
Let’s understand the working of unsupervised learning through Python code. There are primarily three techniques used in unsupervised learning: Clustering, Anomaly Detection and Neural Networks. We’ll be using the K-Means Clustering technique.
Step 1: Data Processing
Encode categorical values as numerical values. Though label encoding could have been used here, we use another technique and replace the values directly with custom discrete values.
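A direct replacement of this kind can be sketched with a plain dictionary and pandas `map`. The three rows below are made up, and the numeric values are partly assumed: P = 10 matches the rating value mentioned later in this section, while the values for A and N and the class codes are illustrative choices.

```python
import pandas as pd

# Hypothetical sample of the bank data.
bank = pd.DataFrame({
    "IndustrialRisk": ["P", "N", "A"],
    "ManagementRisk": ["P", "N", "A"],
    "Class": ["NB", "B", "NB"],
})

# Custom discrete values (A=5, N=0 and the class codes are assumptions).
rating_map = {"P": 10, "A": 5, "N": 0}
class_map = {"NB": 1, "B": 0}

feature_cols = ["IndustrialRisk", "ManagementRisk"]
encoded = bank.copy()
# Replace each rating letter with its number, column by column.
encoded[feature_cols] = encoded[feature_cols].apply(lambda col: col.map(rating_map))
encoded["Class"] = encoded["Class"].map(class_map)
print(encoded)
```

Unlike automatic label encoding, an explicit mapping keeps the ordering of the ratings (Negative below Average below Positive) under your control.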
Next, we visualize the statistical distribution of data
Step 2: Feature Engineering if required
Step 3: Normalize data
Step 4: Group data into similar clusters
The elbow graph shows the potential number of clusters. The best number of clusters is the ‘elbow’ of the curve: the point up to which the error or distortion drops sharply and after which adding more clusters reduces it only marginally. In the example above we find the best case with 2 clusters.
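The numbers behind an elbow graph can be computed with scikit-learn's `KMeans`, whose `inertia_` attribute is the distortion (sum of squared distances to the nearest cluster centre). The sketch below uses two synthetic, well separated groups of points, and its rule for picking the elbow (the k reached by the largest single drop) is a crude assumption that only works for clear-cut cases like this one.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic, well separated groups of points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
group_b = rng.normal(loc=10.0, scale=1.0, size=(100, 2))
X = np.vstack([group_a, group_b])

# Distortion (inertia) for k = 1..6 clusters.
ks = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks
]

# How much the distortion falls each time one more cluster is added.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]

# Crude elbow pick: the k reached by the largest drop.
elbow_k = ks[int(np.argmax(drops)) + 1]
print([round(v, 1) for v in inertias], "elbow at k =", elbow_k)
```

Plotting `ks` against `inertias` gives the elbow graph itself; on real data the bend is often less sharp and is usually judged by eye.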
Step 5: Label each data with cluster number
Now our data is ready to be analyzed further for each cluster. For example, we observe that entities with Positive ratings (10) for Industrial Risk and Management Risk always belong to cluster 1, i.e. Non-Bankruptcy, and we can infer that entities with positive ratings on various parameters have a lower chance of bankruptcy.
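Labelling each row with its cluster and comparing clusters against the known class can be sketched as below. The eight rows are made-up encoded ratings (with the assumed values P=10, A=5, N=0), not the real bank data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical encoded ratings for eight entities.
data = pd.DataFrame({
    "IndustrialRisk": [10, 10, 10, 0, 0, 0, 10, 0],
    "ManagementRisk": [10, 10, 5, 0, 0, 5, 10, 0],
    "Credibility": [10, 5, 10, 0, 5, 0, 10, 0],
    "Class": ["NB", "NB", "NB", "B", "B", "B", "NB", "B"],
})

# Cluster on the input features only; the class is not shown to K-Means.
features = data.drop(columns="Class")
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach the cluster number to every row.
data["Cluster"] = kmeans.labels_

# Compare the clusters found with the known classes.
summary = pd.crosstab(data["Cluster"], data["Class"])
print(summary)
```

Cluster numbers are arbitrary, so which number corresponds to Non-Bankruptcy has to be read off a comparison like this one rather than assumed.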
Reinforcement Learning
This technique employs a system of rewards and penalties to compel the computer to solve a problem by itself. Human involvement is limited to changing the environment and tweaking the system of rewards and penalties. As the computer maximizes the reward, it is prone to seeking unexpected ways of doing it. Reinforcement learning is useful when there is no “proper way” to perform a task, yet there are rules the model must follow to perform its duties correctly. Moreover, just like unsupervised learning reinforcement learning also doesn’t take any labelled inputs to plan its actions.
The best way to understand how reinforcement learning works is to picture playing chess against an opponent that is an AI-based reinforcement learning program, which weighs the rewards and penalties of every candidate move before acting. Each time it makes a move, it estimates not only its chances of winning but also the rewards and penalties that the move is likely to bring. Reinforcement learning has so far been implemented most successfully in games and simulations; indeed, it was Go and Atari games that popularized it. However, reinforcement learning still has a long way to go, especially in mission-critical applications like autonomous driving.
Code implementation for reinforcement learning is usually complex and requires an understanding of the underlying mathematics (Q-functions and the Bellman equation). If you are interested in understanding the basic coding, I encourage you to visit:
(https://www.freecodecamp.org/news/a-brief-introduction-to-reinforcement-learning-7799af5840db/)
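To give a flavour of the Q-function and the Bellman equation without the full machinery, here is a minimal sketch on a made-up five-cell corridor in which the agent is rewarded for reaching the right-most cell. It uses Q-value iteration, which sweeps over every state and action with a known environment; Q-learning proper would instead update from sampled moves with a learning rate. The environment, the reward of 1 and the discount factor 0.9 are all assumptions.

```python
N_STATES = 5        # cells 0..4; cell 4 is the goal (terminal state)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor for future rewards


def step(state, action):
    """Deterministic environment: return (next_state, reward)."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


# Q[s][a] estimates the long-term reward of taking action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(50):                    # repeated sweeps until values settle
    for s in range(N_STATES - 1):      # the terminal state keeps Q = 0
        for a in ACTIONS:
            s2, r = step(s, a)
            # Bellman update: immediate reward + discounted best future value.
            Q[s][a] = r + GAMMA * max(Q[s2])

# Greedy policy: in each state pick the action with the highest Q-value.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(Q)
print(policy)
```

The resulting policy moves right in every cell, and the Q-values shrink by the discount factor with each extra step needed to reach the goal.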
Deep Learning
Deep learning is a type of machine learning (ML) and artificial intelligence (AI) that imitates the way humans gain certain types of knowledge. Deep learning is an important element of data science, which includes statistics and predictive modeling. It is extremely beneficial to data scientists who are tasked with collecting, analyzing and interpreting large amounts of data; deep learning makes this process faster and easier.
At its simplest, deep learning can be thought of as a way to automate predictive analytics. While many traditional machine learning algorithms are linear or comparatively shallow, deep learning algorithms are stacked in a hierarchy of increasing complexity and abstraction.
To understand Deep Learning, suppose you are building a machine to identify a car. Initially you pass it a huge data set of car images. Deep learning will start identifying the features of a car, such as doors, tires, windshield, head lamps etc. It does this using Neural Networks, algorithms loosely modelled on the human brain. Just as the human brain has billions of neurons, a neural network contains nodes and layers that together form a complex algorithm for feature extraction. Finally, when supplied with images of vehicles, this neural network will be able to differentiate cars from trucks, vans or bicycles because of the distinguishing features it has learned.
We’ll be using Google Colab to implement the Deep Learning model for MNIST data. The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. Our problem is to design a deep learning model using Neural Networks to identify handwritten digits (from 0–9).
Step 1: Install Tensorflow and Load libraries
Step 2: Collect or Load the data
TensorFlow ships with a loader for the MNIST database. The database contains 70,000 images of handwritten digits. Loading the data also gives us the split into Train and Test sets.
The train and test input sets (trainX and testX) contain the pixel intensity values (0–255) of the images, while the output sets (trainY, testY) contain the digit each image represents.
As we observe below, the first image is a handwritten digit ‘5’.
Step 3: Convert output labels to binary classes
The output labels trainY and testY hold a single digit (from 0–9) for each input row. Since we are building a multi-class neural network, we’ll have to convert these labels to one-hot (binary class matrix) format, in which each label becomes a vector of ten 0/1 values.
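The conversion itself is simple enough to sketch with NumPy. The five labels below are a made-up sample standing in for trainY; in Keras the same result is commonly obtained with `tf.keras.utils.to_categorical`.

```python
import numpy as np

# Hypothetical sample of digit labels (stand-in for trainY).
labels = np.array([5, 0, 4, 1, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for digit i,
# so indexing with the labels converts all of them at once.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot[0])   # vector for the digit 5: a single 1 at index 5
```

Taking `argmax` along each row recovers the original digit, which is also how the model's predictions are turned back into digits later.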
We’ll be using Keras Neural Network library for our example.
Step 4: Prepare inputs for model
Keras Neural Network Model can be Sequential or Functional. We’ll be using sequential API as it adds Neural network layers linearly while building model.
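A common way to prepare image inputs for a stack of fully connected layers is to flatten each 28x28 image into a vector of 784 values and scale the pixels to the range 0 to 1. The sketch below assumes that this is the kind of preparation needed and uses random pixel values as a stand-in for trainX, so no download is required.

```python
import numpy as np

# Random stand-in for a batch of 28x28 grayscale images (values 0..255).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-long vector and scale pixels to [0, 1].
flat = images.reshape(len(images), -1).astype("float32") / 255.0

print(flat.shape, flat.min(), flat.max())
```

Scaling keeps the inputs in a small range, which generally helps gradient-based training, especially with saturating activations such as sigmoid.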
Step 5: Build and Compile Model
We’ve built a 4-layer model with ‘sigmoid’ activation. The more layers we add, the more complex the neural network becomes and hence the more computationally expensive it is. Apart from sigmoid, we could also use the ‘relu’ activation function. Observe that the last layer (output layer) has ‘softmax’ as its activation function, because we are building a multi-class neural network.
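What softmax contributes can be seen in a few lines of NumPy: it turns the raw scores of the output layer into probabilities that sum to 1, and categorical cross-entropy then penalizes a low probability on the true class. The three-class scores below are made up for illustration, and the helper functions are a sketch rather than the Keras implementation.

```python
import numpy as np


def softmax(z):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_prob):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.sum(y_true * np.log(y_prob + 1e-12), axis=-1).mean())


# Made-up raw scores for one sample and three classes.
logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)

# One-hot label saying the true class is class 0.
y_true = np.array([[1.0, 0.0, 0.0]])
loss = categorical_crossentropy(y_true, probs)

print(probs.round(3), round(loss, 3))
```

The predicted class is simply the index of the largest probability, and the loss shrinks towards 0 as the probability on the true class approaches 1.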
The loss is chosen as ‘categorical_crossentropy’ for the same reason: this is a multi-class problem. Had it been a binary classification, we could have chosen ‘sigmoid’ and binary cross-entropy respectively.
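A model of this shape could be built and compiled with the Keras Sequential API roughly as follows. This is a sketch under assumptions: the layer widths (128, 64, 32), the flattened 784-value input and the ‘sgd’ optimizer are illustrative guesses, since only the sigmoid and softmax activations and the loss are stated above.

```python
import tensorflow as tf

# Four Dense layers: three hidden sigmoid layers and a softmax output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                      # flattened 28x28 image
    tf.keras.layers.Dense(128, activation="sigmoid"),  # assumed width
    tf.keras.layers.Dense(64, activation="sigmoid"),   # assumed width
    tf.keras.layers.Dense(32, activation="sigmoid"),   # assumed width
    tf.keras.layers.Dense(10, activation="softmax"),   # one unit per digit
])

# Multi-class setup: categorical cross-entropy loss, accuracy as the metric.
model.compile(
    optimizer="sgd",                  # assumed optimizer
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```

Swapping ‘sigmoid’ for ‘relu’ in the hidden layers is a one-word change here, which makes it easy to compare the two activations.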
Step 6: Review Model (Optional)
Step 7: Training the Model
As can be observed, the final accuracy of this model after 20 epochs is 89.65%. Training for more epochs, together with a suitable batch size, may provide even higher accuracy. Batch size is usually set to 32, 64 or 128, and rarely 256.
Step 8: Validate results
i. Input image for first data set
ii. Model prediction
As we can see our model is identifying the handwritten image correctly as ‘7’.
Summary:
It would not be an exaggeration to say that the 21st century will be the century of technological breakthroughs powered by AI. Be it Automotive, Retail, Logistics, Genetics or any other field, AI will become a vital component and will be pervasive at the grassroots level. The next wave of the AI revolution, though, will ride on a more human-centric AI, as humans ought to be its key beneficiaries. Undoubtedly, then, AI will help take human intelligence and evolution to the next level.
References:
1. https://expertsystem.com/machine-learning-definition/
2. Machine Learning by Coursera
4. https://towardsdatascience.com/no-machine-learning-is-not-just-glorified-statistics-26d3952234e3
5. https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/
6. https://intellipaat.com/blog/supervised-learning-vs-unsupervised-learning-vs-reinforcement-learning/
7. https://searchenterpriseai.techtarget.com/definition/deep-learning-deep-neural-network