Intel oneAPI’s Unified Programming Model for Python Machine Learning – The New Stack

The popular Scikit-learn Python machine learning toolkit is a simple and powerful framework for classical machine learning. If you are training models based on linear regression, logistic regression, decision tree, or random forest algorithms, Scikit-learn is the first choice.
In classical machine learning, one is expected to perform feature engineering (identifying the right attributes) and handpick the right algorithms aligned with the business problem. It is the right approach for most problems based on structured data stored in relational databases, spreadsheets, and flat files.
On the other hand, deep learning is a subset of machine learning that relies on large datasets and massive computational power to identify high-level features and hidden patterns in the data. When training models on unstructured data such as images, video, and audio, deep learning techniques based on well-defined neural network architectures are preferred by ML engineers and researchers.
In addition to Scikit-learn, advanced AI frameworks such as TensorFlow, PyTorch, Apache MXNet, and XGBoost may be used for training models on structured or unstructured datasets, covering the wide variety of algorithms used in deep learning and classical machine learning workflows. ML researchers and engineers need versions of these frameworks that have been optimized for accelerated performance. That AI acceleration is delivered by a combination of hardware and software.
Deep learning frameworks such as Apache MXNet, TensorFlow, and PyTorch utilize acceleration software based on NVIDIA CUDA and cuDNN, which provide interfaces to the underlying NVIDIA GPUs. AMD provides a similar combination through the Heterogeneous-compute Interface for Portability (HIP) and ROCm, which provide access to AMD GPUs. AI acceleration in these cases squarely focuses on the GPU and its software drivers, runtime, and libraries. Deep learning frameworks are tightly integrated with this acceleration software to speed up the training and inference of deep learning models on the respective GPUs.
While GPUs are used extensively in deep learning training, CPUs are more ubiquitous across the full end-to-end AI workflow: data preprocessing/analytics as well as machine and deep learning modeling/deployment. In fact, you may be surprised to learn that Intel Xeon Scalable processors are the most widely used server platform for AI, from the cloud to the edge.
Intel has been at the forefront of an initiative called oneAPI: a cross-industry, open, standards-based unified programming model targeting multiple architectures, including the aforementioned CPUs and GPUs, as well as FPGAs and other AI accelerators. oneAPI is available to developers as a set of toolkits aligned with HPC, AI, IoT, and ray-tracing use cases.
The Intel oneAPI AI Analytics Toolkit (AI Kit) targets data scientists and AI engineers through familiar Python tools and frameworks. It is part of Intel's end-to-end suite of AI developer tools and comes with optimized AI frameworks for Scikit-learn, XGBoost, TensorFlow, and PyTorch.
The most interesting components of the AI Kit for developers and data scientists working on a machine learning workflow are the Intel Distribution of Modin and the Intel Extension for Scikit-learn, both highly optimized for the CPU and promising a 10-100X performance boost. Best of all, these frameworks are fully compatible with Pandas and stock Scikit-learn, delivering drop-in replacements.
Let's take a closer look at the Intel Distribution of Modin and the Intel Extension for Scikit-learn.
Intel Distribution of Modin
Intel Distribution of Modin is a performant, parallel, distributed, pandas-compatible DataFrame acceleration library designed to make data scientists more productive. The library is fully compatible with the Pandas API, is powered by OmniSci on the back end, and provides accelerated analytics on Intel platforms.
Modin is compatible with Pandas while enabling distributed data processing through Ray and Dask. It is a drop-in replacement that turns single-threaded Pandas into a multithreaded one, using all of the CPU cores and immediately speeding up data processing workflows. Modin is especially good on large datasets, where pandas would either run out of memory or become extremely slow.
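To make the Ray/Dask choice explicit, Modin's configuration API can pin the execution engine before the first modin.pandas import. A minimal configuration sketch, assuming Modin and Ray (or Dask) are installed:

```python
# Pin Modin's execution engine before importing modin.pandas;
# without this, Modin auto-selects an available engine.
import modin.config as cfg

cfg.Engine.put("ray")        # or "dask"

import modin.pandas as pd    # subsequent pd.* operations run on Ray
```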
Modin also has a rich frontend supporting SQL, spreadsheets, and Jupyter notebooks.
Data scientists can simply switch to Modin to take advantage of parallelized data processing while continuing to use the familiar Pandas API.
Installing Modin is easy. It's available through the Conda package manager as part of the Intel oneAPI AI Analytics Toolkit.

conda create -n aikit-modin intel-aikit-modin -c intel -c conda-forge
conda activate aikit-modin


The code snippet below shows how simple it is to use Modin:

import modin.pandas as pd
df = pd.read_csv('~/trips_data.csv')
df.groupby("cab_type").size()
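Since the snippet above depends on a local trips_data.csv, here is a self-contained variant of the same groupby; the try/except fallback to stock pandas is only for illustration and keeps the sketch runnable without Modin:

```python
# The same groupby works unchanged whether pd is Modin or stock pandas.
try:
    import modin.pandas as pd   # parallel, drop-in replacement
except ImportError:
    import pandas as pd         # stock pandas exposes the same API

df = pd.DataFrame({
    "cab_type": ["uber", "lyft", "uber", "uber"],
    "fare": [10.5, 8.0, 12.0, 7.5],
})
counts = df.groupby("cab_type").size()
print(counts["uber"])  # count of uber rows
```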

Intel Extension for Scikit-learn
The Intel Extension for Scikit-learn provides optimized implementations of many Scikit-learn algorithms that are consistent with the original versions while delivering faster results. The package simply falls back to Scikit-learn's original behavior when you use algorithms or parameters not supported by the extension, which makes for a seamless developer experience. With no code to rewrite, your ML application works as before, only faster.
The Intel Extension for Scikit-learn supports the popular classical machine learning algorithms typically used by Scikit-learn developers.

The acceleration is achieved by patching, which replaces the stock Scikit-learn algorithms with the optimized versions provided by the extension. The module can be installed through conda or pip.

pip install scikit-learn-intelex
conda install scikit-learn-intelex -c conda-forge


As a drop-in replacement, using the Intel Extension for Scikit-learn is easy. Just add the lines below to your code:

from sklearnex import patch_sklearn
patch_sklearn()
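Putting the pieces together, here is a sketch of an end-to-end run; the try/except fallback to stock Scikit-learn is only for illustration, and the key point is that patch_sklearn() must run before the Scikit-learn estimators are imported:

```python
# Patch before importing any scikit-learn estimators; the code below
# is unchanged, stock scikit-learn API either way.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # extension not installed; stock scikit-learn is used instead

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])      # a toy dataset: y = 2x

model = LinearRegression().fit(X, y)
prediction = model.predict(np.array([[5.0]]))[0]
print(round(float(prediction), 3))
```

The extension also exposes unpatch_sklearn() to revert to stock behavior, and patch_sklearn() accepts a list of algorithm names when only specific estimators should be patched.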

In an upcoming tutorial, I will demonstrate how to install and use the Intel oneAPI AI Analytics Toolkit for training a linear regression model. Stay tuned.

Feature image via scikit-learn.
