Low-memory technique for deep-learning recommendation systems — ScienceDaily

A breakthrough low-memory technique by Rice University computer scientists could put one of the most resource-intensive forms of artificial intelligence — deep-learning recommendation models (DLRM) — within reach of small companies.
DLRM recommendation systems are a popular form of AI that learns to make suggestions users will find relevant. But with top-of-the-line training models requiring more than 100 terabytes of memory and supercomputer-scale processing, they have only been available to a short list of technology giants with deep pockets.
Rice’s “random offset block embedding array,” or ROBE Array, could change that. It is an algorithmic approach for slashing the size of DLRM memory structures called embedding tables, and it will be presented this week at the Conference on Machine Learning and Systems (MLSys 2022) in Santa Clara, California, where it earned Outstanding Paper honors.
“Using just 100 megabytes of memory and a single GPU, we showed we could match the training times and double the inference efficiency of state-of-the-art DLRM training methods that require 100 gigabytes of memory and multiple processors,” said Anshumali Shrivastava, an associate professor of computer science at Rice who is presenting the research at MLSys 2022 with ROBE Array co-creators Aditya Desai, a Rice graduate student in Shrivastava’s research group, and Li Chou, a former postdoctoral researcher at Rice who is now at West Texas A&M University.
“ROBE Array sets a new baseline for DLRM compression,” Shrivastava said. “And it brings DLRM within reach of everyday users who do not have access to the high-end hardware or the engineering expertise one needs to train models that are hundreds of terabytes in size.”
DLRM systems are machine learning algorithms that learn from data. For example, a recommendation system that suggests products for customers would be trained with data from past transactions, including the search terms users provided, which products they were offered and which, if any, they purchased. One way to improve the accuracy of recommendations is to sort training data into more categories. For example, rather than putting all shampoos in a single category, a company could create categories for men’s, women’s and children’s shampoos.

For training, these categorical representations are organized in memory structures called embedding tables, and Desai said the size of those tables “has exploded” due to increased categorization.
“Embedding tables now account for more than 99.9% of the overall memory footprint of DLRM models,” Desai said. “This leads to a host of problems. For example, they cannot be trained in a purely parallel fashion because the model has to be broken into pieces and distributed across multiple training nodes and GPUs. And after they are trained and in production, looking up information in embedding tables accounts for about 80% of the time required to return a recommendation to a user.”
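To make that scale concrete, a quick back-of-the-envelope calculation (the figures here are illustrative, not taken from the paper) shows how a single categorical feature can swamp memory:

```python
# Hypothetical sizing of one embedding table (illustrative numbers,
# not figures from the ROBE Array paper).
num_ids = 100_000_000       # distinct category IDs for one feature
emb_dim = 128               # embedding vector dimension
bytes_per_float32 = 4       # standard single-precision weight

table_bytes = num_ids * emb_dim * bytes_per_float32
print(f"{table_bytes / 1e9:.1f} GB")   # 51.2 GB for just one feature
```

A production model with dozens of such features, each with its own table, quickly reaches the hundreds-of-terabytes range the article describes.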
Shrivastava said ROBE Array does away with the need to store embedding tables by using a data-indexing method called hashing to create “a single array of learned parameters that is a compressed representation of the embedding table.” Accessing embedding information from the array can then be performed “using GPU-friendly universal hashing,” he said.
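The idea can be sketched in a few lines of NumPy. This is a minimal illustration of hashed block lookup in the spirit of ROBE Array, not the authors’ implementation: all class and parameter names are invented here, and details such as the hash family, block size and training procedure follow the paper rather than this toy.

```python
import numpy as np

class RobeArraySketch:
    """Toy sketch of a ROBE-style hashed embedding (illustrative only).

    Instead of one (num_categories x dim) table per feature, every
    embedding is assembled from a single flat array of learned weights.
    A universal hash maps each (category, block) pair to an offset, and
    a contiguous block of weights is read from there, wrapping around
    the end of the array.
    """

    def __init__(self, array_size, emb_dim, block_size=8, seed=0):
        assert emb_dim % block_size == 0
        rng = np.random.default_rng(seed)
        self.z = array_size            # total learned parameters shared by all IDs
        self.d = emb_dim
        self.b = block_size            # contiguous chunk fetched per hash
        self.weights = rng.normal(0.0, 0.01, array_size).astype(np.float32)
        # Multiply-add universal hash over a Mersenne prime modulus.
        self.p = (1 << 61) - 1
        self.a = int(rng.integers(1, self.p)) | 1
        self.c = int(rng.integers(0, self.p))

    def _hash(self, key):
        return ((self.a * key + self.c) % self.p) % self.z

    def lookup(self, category_id):
        """Assemble a d-dimensional embedding from hashed weight blocks."""
        out = np.empty(self.d, dtype=np.float32)
        n_blocks = self.d // self.b
        for blk in range(n_blocks):
            start = self._hash(category_id * n_blocks + blk)
            idx = (start + np.arange(self.b)) % self.z   # wraparound read
            out[blk * self.b:(blk + 1) * self.b] = self.weights[idx]
        return out
```

Because the array size `z` is fixed independently of how many category IDs exist, memory no longer grows with categorization; the lookup is deterministic, so the same ID always maps to the same slice of shared parameters, which is what lets gradient updates train the compressed array.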
Shrivastava, Desai and Chou tested ROBE Array using the widely used DLRM MLPerf benchmark, which measures how fast a system can train models to a target quality metric. Using a number of benchmark data sets, they found ROBE Array could match or beat previously published DLRM techniques in terms of training accuracy even after compressing the model by three orders of magnitude.
“Our results clearly show that most deep-learning benchmarks can be completely overturned by fundamental algorithms,” Shrivastava said. “Given the global chip shortage, this is welcome news for the future of AI.”
ROBE Array is not Shrivastava’s first big splash at MLSys. At MLSys 2020, his group unveiled SLIDE, a “sub-linear deep learning engine” that ran on commodity CPUs and could outperform GPU-based trainers. They followed up at MLSys 2021, showing vectorization and memory optimization accelerators could boost SLIDE’s performance, allowing it to train deep neural nets up to 15 times faster than top GPU systems.
The ROBE Array research was supported by the National Science Foundation (1652131, 1838177), the Air Force Office of Scientific Research (YIP-FA9550-18-1-0152), the Office of Naval Research, Intel and VMware.
Story Source:
Materials provided by Rice University. Original written by Jade Boyd. Note: Content may be edited for style and length.