Machine learning has gone mainstream with a bang over the last year. Its seemingly “instantaneous” emergence is driven not only by ever more powerful GPU hardware and cheap cloud environments, but also by an explosion of open source frameworks designed to abstract away the hardest parts of machine learning and make its techniques accessible to a wider class of developers.
Below are 15 machine learning frameworks that were either newly released or substantially revised in the last year. The tools on this list were chosen because they address a specific machine learning problem, bring a novel simplicity to their problem domain, or both.
Apache Singa
Deep-learning frameworks power heavy-duty machine learning functions, including image recognition and natural language processing. Singa, an Apache Incubator project, is an open source framework intended to make it easy to train deep-learning models on massive volumes of data.
Singa offers a simple programming model for training deep-learning networks across a cluster of machines, and it supports many common training jobs, such as convolutional neural networks, recurrent neural networks, and restricted Boltzmann machines. Models can be trained synchronously (one after the other) or asynchronously (side by side), on both CPU and GPU clusters, with FPGA support planned. Singa also uses Apache Zookeeper to simplify cluster setup.
Microsoft Distributed Machine Learning Toolkit
Developing machine learning applications that run across many machines can be tricky. Microsoft’s Distributed Machine Learning Toolkit (DMTK) is a framework that tackles the problem of distributing different kinds of machine learning jobs across a cluster of systems.
DMTK is billed as a framework rather than a full-blown, out-of-the-box solution, so the number of algorithms bundled with it is small. Its design lets users make the most of clusters with limited resources. For example, each node in the cluster has a local cache, which reduces the traffic to the central server node that provides parameters for the job in question.
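The local-cache idea can be illustrated with a minimal sketch. This is not DMTK's API: `ParameterServer`, `Worker`, and the `sync_every` policy are hypothetical names and assumptions chosen only to show how caching on each node cuts round trips to the central server.

```python
class ParameterServer:
    """Hypothetical central node that owns the shared model parameters."""

    def __init__(self, params):
        self._params = dict(params)
        self.requests = 0  # round trips to this node, counted for illustration

    def pull(self, key):
        self.requests += 1
        return self._params[key]

    def push(self, key, delta):
        self.requests += 1
        self._params[key] += delta


class Worker:
    """Hypothetical cluster node that keeps a local parameter cache."""

    def __init__(self, server, sync_every=4):
        self.server = server
        self.sync_every = sync_every  # assumed policy: sync after N local updates
        self.cache = {}
        self.pending = {}
        self.steps = 0

    def read(self, key):
        # Only contact the server when the value is not cached locally.
        if key not in self.cache:
            self.cache[key] = self.server.pull(key)
        return self.cache[key]

    def update(self, key, delta):
        # Apply the update locally and remember it for the next sync.
        self.cache[key] = self.read(key) + delta
        self.pending[key] = self.pending.get(key, 0.0) + delta
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.flush()

    def flush(self):
        # Send one accumulated delta per parameter instead of one per update.
        for key, delta in self.pending.items():
            self.server.push(key, delta)
        self.pending.clear()
        self.cache.clear()  # force a fresh pull on the next read


def demo():
    server = ParameterServer({"w": 0.0})
    worker = Worker(server, sync_every=4)
    for _ in range(8):
        worker.update("w", 0.5)
    requests = server.requests
    return server.pull("w"), requests
```

In this sketch, eight updates cost four round trips to the server; pulling and pushing on every update would cost sixteen. The trade-off is that a worker may briefly compute with slightly stale parameters.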
Marvin
Marvin, a recent neural network framework, is a product of the Princeton Vision Group. According to its creators, Marvin was “born to be hacked.” The framework relies on only a few files written in C++ and the CUDA GPU framework. Despite its deliberately minimal code, Marvin comes with a number of pre-trained models that can be reused with proper citation and, like the project’s own code, contributed to through pull requests.
Brainstorm
Brainstorm’s goal is to make deep neural networks fast, flexible and fun. It already includes support for various recurrent neural network models, such as LSTM.
Brainstorm uses Python to provide two data management APIs, or “handlers”: one for CPU computation via the NumPy library, and one that leverages GPUs via CUDA. Most of the work is done through Python scripting, so do not expect a rich front-end GUI other than one you bring yourself.
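The handler idea, where the same network code runs on interchangeable backends, can be sketched roughly as follows. None of these names come from Brainstorm itself: `Handler`, `PurePythonHandler`, `RecordingHandler`, and `dense_forward` are hypothetical, and plain Python lists stand in for the NumPy and CUDA backends so the example stays dependency-free.

```python
import math
from abc import ABC, abstractmethod


class Handler(ABC):
    """Hypothetical backend interface: the only math the network may use."""

    @abstractmethod
    def dot(self, a, b): ...

    @abstractmethod
    def tanh(self, values): ...


class PurePythonHandler(Handler):
    """Stand-in for a CPU backend (a real one might wrap NumPy)."""

    def dot(self, a, b):
        return sum(x * y for x, y in zip(a, b))

    def tanh(self, values):
        return [math.tanh(v) for v in values]


class RecordingHandler(Handler):
    """Stand-in for a second backend: delegates the math and records each call."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = []

    def dot(self, a, b):
        self.calls.append("dot")
        return self.inner.dot(a, b)

    def tanh(self, values):
        self.calls.append("tanh")
        return self.inner.tanh(values)


def dense_forward(handler, weights, inputs):
    """One dense layer, written only against the Handler interface."""
    pre_activations = [handler.dot(row, inputs) for row in weights]
    return handler.tanh(pre_activations)
```

Because `dense_forward` only talks to the handler, swapping the backend does not require touching the layer code, which is the design benefit such an abstraction is after.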
Apache Spark MLlib
Although Apache Spark is probably best known as a member of the Hadoop family, this in-memory data processing framework was born outside of Hadoop and is making a name for itself outside the Hadoop ecosystem as well. Thanks to its growing library of algorithms, which can be applied to in-memory data at high speed, Spark has become a popular tool for machine learning.
Amazon Machine Learning
Amazon’s approach to cloud services follows a pattern, and Amazon Machine Learning is no exception: it provides the basics, brings in a core audience that cares, and lets them build on top of it. This way Amazon finds out what its audience needs and delivers just that.
Amazon Machine Learning shows how machine learning is becoming a practicality rather than a luxury.
Neon
Neon is an open source project created by Nervana, which is now part of Intel. It uses pluggable modules so that the heavy lifting can be done on CPUs, GPUs, or Nervana’s own silicon.
Neon is written mainly in Python, with a few pieces in C++ and assembly for speed. This makes it immediately available to others doing data science work in Python, or in any other framework that has Python bindings.
Apache Mahout
Apache Mahout was originally built for scalable machine learning on Hadoop, long before Apache Spark seized that throne. After a long period of relatively minimal activity, the framework has been revitalized with new additions.
Microsoft Computational Network Toolkit
Microsoft’s Computational Network Toolkit (CNTK) lets users build neural networks as a directed graph, much like Google TensorFlow. Microsoft also considers CNTK comparable to Torch, Caffe and Theano, except for its ability to achieve greater speed by exploiting multiple CPUs and multiple GPUs in parallel. The company claims that running CNTK on GPU clusters on Azure allowed it to massively speed up speech recognition training for Cortana.
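To make the "network as a directed graph" idea concrete, here is a minimal sketch in plain Python. It does not use CNTK's or TensorFlow's actual APIs; `Node` and `evaluate` are hypothetical names, and the graph computes a single rectified unit, max(0, w * x + b).

```python
import operator


class Node:
    """Hypothetical graph node: a named input, or an operation on other nodes."""

    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn              # None marks an input node fed from outside
        self.inputs = tuple(inputs)


def evaluate(node, feed, cache=None):
    """Evaluate a node by first evaluating the nodes its edges point back to."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]   # shared subgraphs are computed only once
    if node.fn is None:
        value = feed[node.name]
    else:
        value = node.fn(*(evaluate(i, feed, cache) for i in node.inputs))
    cache[node.name] = value
    return value


# Build the graph: out = max(0, w * x + b)
x, w, b = Node("x"), Node("w"), Node("b")
product = Node("product", operator.mul, (w, x))
total = Node("total", operator.add, (product, b))
out = Node("out", lambda v: max(0.0, v), (total,))
```

Calling `evaluate(out, {"x": 5.0, "w": 3.0, "b": -10.0})` walks the graph from the output back to its inputs. Real toolkits use the same structure to decide which operations can run in parallel and on which device.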
Google TensorFlow
Like Microsoft’s CNTK and DMTK, TensorFlow is a machine learning tool designed to scale across multiple nodes. And like Google’s Kubernetes, TensorFlow was built to solve problems internally at Google before being released as an open source product.
Veles
Veles is a distributed platform for deep-learning applications. Like Google’s TensorFlow and Microsoft’s DMTK, it is written in C++ but uses Python to automate and coordinate work between nodes. Veles analyzes datasets and automatically normalizes them before they are fed to the cluster. A REST API then allows the trained model to be used in production immediately.
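The kind of normalization step described above might look like the sketch below. The source does not say which scheme Veles applies, so per-column z-score scaling is an assumption here, and `normalize_columns` is a hypothetical helper rather than part of Veles.

```python
from statistics import mean, pstdev


def normalize_columns(rows):
    """Scale every feature column to zero mean and unit variance (z-score)."""
    columns = list(zip(*rows))
    stats = []
    for column in columns:
        mu = mean(column)
        sigma = pstdev(column)
        # A constant column has zero spread; divide by 1 to avoid a ZeroDivisionError.
        stats.append((mu, sigma if sigma > 0 else 1.0))
    return [
        [(value - mu) / sigma for value, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]
```

Putting features on a comparable scale before training generally helps gradient-based learners converge, which is presumably why a platform would do it automatically.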
H2O
Now in its third revision, H2O offers access to machine learning algorithms through common development environments (Scala, Python, R and Java), data sources (S3, NoSQL, HDFS and SQL) and big data systems (Spark and Hadoop). It is meant to be used as an end-to-end solution for collecting data, building models and serving predictions. For example, users can export models as Java code, which allows predictions to be served on many platforms and in a variety of environments.
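The appeal of exporting a model as source code is that the result has no dependency on the training framework. The sketch below illustrates the idea only: it emits Python rather than Java, handles just a linear model, and `export_linear_model` and `load_exported` are hypothetical helpers, not H2O functions.

```python
def export_linear_model(weights, bias, name="predict"):
    """Return Python source for a dependency-free scoring function."""
    terms = " + ".join(f"{w!r} * features[{i}]" for i, w in enumerate(weights))
    return f"def {name}(features):\n    return {terms} + {bias!r}\n"


def load_exported(source, name="predict"):
    """Compile exported source in an empty namespace, as a serving host might."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


source = export_linear_model([2.0, 3.0], 1.0)
predict = load_exported(source)
```

The generated `source` string is an ordinary function definition that could be saved to a file and shipped on its own; the training library is not needed to call `predict`.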
Microsoft Azure ML Studio
Microsoft’s Azure ML Studio lets users build and train models, then turn them into APIs that other services can consume. It is a pay-as-you-go machine learning service with free-tier, hourly and monthly versions. Free-tier users get up to 10GB of storage for model data per account, and users can connect their own Azure storage to the service for larger models.
mlpack 2
mlpack is a C++-based machine learning library that was first launched in 2011. According to its creators, mlpack 2 is designed for three things: ease of use, scalability, and speed. Users can work with it through a C++ API for more advanced work, or through a cache of command-line executables for quick-and-dirty operations.
Caffe2
Caffe2 is a deep-learning framework designed with modularity, expression, and speed in mind. It descends from Caffe, which was created for machine vision projects in 2013. The framework has since diversified to cover other applications, including multimedia and speech.
Because speed is the top priority for Caffe2, the framework is written in C++ with CUDA acceleration support, and it can switch between GPU and CPU processing when required.
Machine learning frameworks are essential for the advancement of artificial intelligence, which in turn has the potential to dramatically influence the future of human beings.
If you have any questions feel free to post in our comment section below. Also, let us know what your favorite machine learning framework is.