The top machine learning skills you need
If you're a programmer who's decided to specialize in machine learning, you've made a very wise choice.
Demand for machine learning programmers is growing, and that trend is not expected to change, since data just keeps getting bigger and bigger.
If you want to keep yourself as employable as possible, it's imperative that you master the most sought-after machine learning skills. But before we dive into the algorithms, here’s a bit of background knowledge:
We have reached a stage in our technological development where computers no longer need a human to spell out every rule they follow. Given enough examples, they can work out the rules for themselves.
Yes, you read that correctly. In a sense, computers can now write their own programs.
While this doesn’t mean you should worry about your computer autonomously compiling your recent search history into a catchy pop song, it does mean that an exciting new world of data analysis has dawned on us.
Welcome to the wonderful world of machine learning.
What is machine learning?
Machine learning, in its simplest form, is the practice of teaching computers how to learn from data.
This field of computer science was pioneered by engineers who figured out how to program a computer to recognize patterns. It has since evolved into a family of techniques that let a program adjust its own behavior depending on the data it's exposed to.
Ever notice those oddly relevant movie recommendations on Netflix?
That’s machine learning.
Traffic predictions in your navigation app?
Yep, machine learning.
Weird warm smell on the bus?
Nope, not machine learning. Definitely not machine learning.
Why is machine learning important?
Machine learning is important because data is important.
Hidden within the deep crevices of data are the answers to the important questions big businesses are asking.
Why do customers do what they do?
What could be done to increase sales?
In order to extract the right answers to these questions, you need to build models that capture the patterns hidden in that data.
Big data is very dynamic and always evolving. You therefore need machine learning that is capable of adapting to such ever-changing data.
How does machine learning relate to artificial intelligence?
Contrary to the opinions of many, machine learning and artificial intelligence are not the same thing. Machine learning is a subfield of artificial intelligence: one of several ways of getting a computer to behave intelligently.
Think of artificial intelligence as a “brain,” and machine learning algorithms as the neurons of this brain.
With the stage set, let’s now dive into the top machine learning skills your employer will love!
The top must-know machine learning algorithms
To illustrate the application of machine learning and its most useful algorithms, in this post we will run through the solution to the following case study:
You have had a history of rocky relationships and you want to anticipate when your current spouse is likely to break up with you. In this case we will need to implement a method of machine learning known as “Supervised Learning.”
Supervised learning algorithms can predict future actions based on historical data.
To tackle this problem you’d start by compiling all of the data points pertaining to all of your previous successful relationships (we’re being generous here and assuming you’ve had at least some success).
This data could contain details such as:
- Behavioral patterns
- Things that were said
- Appendages that were not ridiculed
All of this data would then be compiled into an array and labelled as “correct” or “things are going well, Mom.”
You would then compare all new data against this ultimate standard in order to track the success of your current relationship.
A set of machine learning algorithms will recognize all of the worrisome patterns within the new data set that could lead to your relationship failing miserably.
These algorithms are super useful to have in your machine learning toolbox, so let’s go over each of them:
Classification is the task of assigning each data point to one of a set of predefined categories.
Which categories you use depends on the outcome that is being predicted.
Classifications can either be:
- Binary: two possible categories, such as yes or no
- Multi-class: more than two possible categories, such as car make or model
In our scenario, not laughing at a joke you’ve made would be classified as “not wife material.”
Here are the 3 most popular methods of classifying data:
1. Decision Trees
A decision tree uses a sequence of rules to split data using a tree-like model.
Data is constantly fed into the tree and split until the results are as “pure” as possible, meaning each resulting group contains, as far as possible, data points from a single category.
Figure 2: Decision Tree - medium.com
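To make “purity” a little more concrete, here is a minimal sketch of how a tree might pick a single split. It assumes Gini impurity as the purity measure (one common choice, not the only one), and the relationship data, feature names and labels are made up purely for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 means every label in the group is the same."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def best_split(rows, labels):
    """Try every feature/threshold pair and keep the one with the purest groups.

    Returns (feature_index, threshold, weighted_impurity).
    """
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put something on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

# Hypothetical data: [texts_per_day, arguments_per_week]
rows = [[20, 0], [15, 1], [18, 0], [3, 4], [5, 5], [2, 3]]
labels = ["going well", "going well", "going well",
          "worrying", "worrying", "worrying"]

split = best_split(rows, labels)
print(split)  # feature index, threshold, impurity after the split
```

A full decision tree simply repeats this search on each resulting group until the groups are pure enough or some depth limit is reached.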
2. K-Nearest Neighbours (KNN)
KNN is one of the oldest and simplest classification methods.
This algorithm classifies data based on its relationship to neighbouring data.
For example, if the majority of the K data points closest to a new data point share a given label, the new data point is classified under that label. Here K is simply the number of neighbours consulted.
Figure 3: analyticsvidhya.com
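The neighbour vote can be sketched in a few lines of plain Python. This is an illustrative toy rather than a production implementation: it assumes Euclidean distance, and the data, labels and the helper name `knn_predict` are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical data: [texts_per_day, arguments_per_week]
points = [[20, 0], [15, 1], [18, 0], [3, 4], [5, 5], [2, 3]]
labels = ["going well", "going well", "going well",
          "worrying", "worrying", "worrying"]

print(knn_predict(points, labels, [17, 1]))
print(knn_predict(points, labels, [4, 4]))
```

In practice you would usually scale the features first, since a feature measured in large units can otherwise dominate the distance.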
3. Random Forest
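Random forest algorithms are a very popular classifier: they tend to be highly accurate and reasonably fast to train.
Random forests combine the outputs of many decision trees, each trained on a random sample of the data, by taking a majority vote (or, for numeric predictions, an average) in order to create a highly accurate classification.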
Random forest algorithms LOVE big sets of data.
Figure 4: medium.com
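Here is a rough sketch of the idea behind a random forest, under some loud simplifications: each “tree” is only a one-split stump, each stump sees a bootstrap sample of the rows and a single randomly chosen feature, and the final answer is a majority vote. Real libraries grow much deeper trees; the data and all function names below are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity: 0.0 means the group holds a single category."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def fit_stump(rows, labels, feature):
    """Fit a one-split tree: (feature, threshold, left_label, right_label)."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    if best is None:
        # Every sampled row had the same value, so predict a constant label.
        label = majority(labels)
        return (feature, float("inf"), label, label)
    _, threshold, left_label, right_label = best
    return (feature, threshold, left_label, right_label)

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own random view of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample rows with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        # Random feature selection: each stump sees only one feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict_forest(forest, row):
    """Let every stump vote, then return the most popular label."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return majority(votes)

# Hypothetical data: [texts_per_day, arguments_per_week]
rows = [[20, 0], [15, 1], [18, 0], [3, 4], [5, 5], [2, 3]]
labels = ["going well", "going well", "going well",
          "worrying", "worrying", "worrying"]

forest = fit_forest(rows, labels)
print(predict_forest(forest, [25, 0]))
print(predict_forest(forest, [1, 6]))
```

Any single stump can be wrong, for instance when its bootstrap sample happens to contain only one category, but the vote across many stumps smooths those mistakes out.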
The supervised machine learning method of classification is used in tandem with other machine learning algorithms in order to create accurate predictions.
Some of the primary ones are outlined below:
Regression predicts a continuous number rather than a category. It can be thought of as fitting a line or curve through multi-dimensional data.
There are different regression models you can implement depending on the complexity of the data spread.
Figure 5: Polynomial Regression - medium.com
Figure 6: Linear regression - medium.com
Figure 7: Support vector regression - medium.com
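The simplest of these models, linear regression with a single input, can be fitted with the ordinary least squares formulas. The sketch below assumes exactly that setup; the numbers and the function name `fit_line` are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: arguments per week vs. nights spent on the couch
arguments = [1, 2, 3, 4, 5]
couch_nights = [3, 5, 7, 9, 11]

slope, intercept = fit_line(arguments, couch_nights)
predicted = slope * 6 + intercept  # estimate for 6 arguments per week
print(slope, intercept, predicted)
```

Polynomial and support vector regression follow the same pattern of fitting a function to the data, but allow the fitted shape to bend.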
Regression algorithms, tree based algorithms and neural networks can all be used to formulate a prediction of the most likely outcome.
When making a prediction, in order to satisfy your employer, you need to also provide an explanation of that prediction.
Prediction explanations outline the key variables that dictated the outcome of a given scenario.
In our scenario, a prediction explanation would provide the much needed “closure” to our failed relationship.
One of the most widely used prediction explanation libraries for your machine learning algorithms is SHAP (SHapley Additive exPlanations). SHAP creates beautiful visualizations that clearly outline the relevance of key features to a certain prediction.
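SHAP is built on Shapley values, an idea from game theory for sharing credit fairly among contributors. The sketch below is not the SHAP library's API. It is a brute-force illustration, assuming a tiny made-up model (`breakup_risk`) and a baseline of “typical” feature values, that averages each feature's contribution over every order in which the features could be revealed.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Only practical for a handful of features: the number of orderings grows
    factorially, which is why real libraries rely on approximations.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start from the baseline values
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]  # reveal one feature
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]

def breakup_risk(features):
    """Hypothetical model: inputs are [texts_per_day, arguments_per_week, body_odor]."""
    texts, arguments, odor = features
    return 0.5 - 0.02 * texts + 0.05 * arguments + 0.3 * odor

instance = [2, 4, 1]    # the relationship being explained
baseline = [10, 1, 0]   # an assumed "typical" relationship

values = shapley_values(breakup_risk, instance, baseline)
for name, value in zip(["texts", "arguments", "body odor"], values):
    print(f"{name}: {value:+.2f}")
```

A useful sanity check is that the values add up to the difference between the model's output for the instance and its output for the baseline.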
Predictions can be illustrated via different visualization models.
Here is a list of some of the most important ones to be aware of:
- Summary plot
- Dependence plot
- Model explainer
- Prediction explainer
Here is an example of a summary plot pertaining to the market value of a given home.
Figure 8: Summary Plot - towardsdatascience.com
The list of acronyms on the left represents different factors that affect the price of a home. The factor at the very top (LSTAT) refers to the share of the population considered lower status. The factor directly below that (RM) represents the average number of rooms per home. The bar on the right represents the relevance of each feature to the overall outcome (price of a home).
We can see that the top two factors that affect the price of a given home in this data set are LSTAT (Lower status of the population) and RM (Number of rooms).
With some color association we can see that the lower the LSTAT value and the higher the RM value, the more likely a house is to have a high valuation.
Which makes sense, really. If you have a residential area with fewer “lower status” individuals and more homes with a high number of rooms, there is a high chance of a home in this area having a high market value.
In our scenario, a summary plot would be the most relevant prediction explanation model to use.
An extremely positive SHAP value for the feature “body odor” might help you finally pinpoint the reason for all of your failed relationships and therefore increase the odds of your next one being a success.