Why Is There so Much Hype Around Amazon Inferentia?


Remember the film "Ex Machina"? Its most captivating moment is when Ava, an AI robot, becomes self-aware and turns out to be far more than her creator ever imagined. We cannot claim to have witnessed anything like that in real life, but we are witnessing the beginning of a golden age: the age of AI (artificial intelligence), built on machine learning (ML). Debates such as "will AI take over humans?" only underline how significant this era has become. Tech giants such as Microsoft, Google, and Facebook have already started rolling out AI-enabled products and services.

Compute Power Is the Key for ML

ML-powered products depend heavily on the computational power at their disposal. The rule of thumb in the ML and AI domain is simple: the more cutting-edge compute you have, the easier it is to work on ML and AI projects. An ML practitioner may wait hours, days, or even months for a model to train, and most of that variability comes down to the computational power available.

One thing is clear: the future of self-learning technologies such as ML and AI depends on dedicated, purpose-built hardware that can deliver the computational power these models need. Nvidia and Intel have long produced such chips, and their biggest customers are the tech giants.

Then, in November 2018, Amazon made an unexpected announcement: it would build its own machine learning chip, called Amazon Inferentia.

What Makes Amazon Inferentia Chips so Important?

ML developers, AI scientists, and cloud evangelists all have plenty of questions about Amazon Inferentia. To put everything into perspective, we need a quick detour into how machine learning works. Typically, any machine learning project that turns into a product or service involves two phases: training and inference.

Training Phase

Training, as the name suggests, is the process of feeding data into a machine so that it learns patterns from a given data set. The machine fits complex mathematical functions to the data, and this is typically a one-time (though expensive) process aimed at making the model smarter. The training phase is comparable to a classroom: a professor teaching students a particular topic. At this stage, the professor does the heavy lifting.

Inference Phase

Once it has learned those patterns, the machine is ready for the inference phase. How good an ML model really is only becomes clear during inference, when the "educated" system responds to new inputs. Unlike training, inference is not a one-time process; millions of people may use the trained model at the same time. To continue the analogy, inference is like students applying what they learned in real-world situations. At this stage, the students matter most.
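To make the two phases concrete, here is a minimal, generic sketch. It uses scikit-learn purely for illustration (the article itself names no framework): fit corresponds to the training phase, predict to the inference phase.

    # Minimal illustration of the two phases of an ML project.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Training phase: a one-time, relatively expensive step where the model
    # "studies" labelled data (the professor teaching the class).
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # Inference phase: the trained model answers new questions, potentially
    # millions of times, without being retrained (the students applying
    # what they learned).
    print(model.predict(X[:5]))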

Amazon has long preferred to own its entire technology stack, even when that means starting from scratch. Amazon Web Services (AWS) relied on chips from Nvidia and Intel for years, but with Amazon Inferentia it introduced a chip dedicated specifically to the inference phase, making it generally available through the Amazon EC2 Inf1 instances launched at re:Invent 2019.

Deep Dive Into Amazon Inferentia

The end of the last decade saw massive demand for accelerating deep learning across a wide variety of applications. ML now powers dynamic pricing, image search, personalized recommendations, automated customer support, and much more.

The number of such applications will only increase in the coming years. The problem is that ML remains complex and costly, and infrastructure optimized for executing ML workloads is still scarce.

Besides that, Amazon keeps a close watch on its arch-rivals. Google announced its first custom machine learning chip, the Tensor Processing Unit (TPU), in 2016 and now offers third-generation TPUs as a cloud service. With the resources and technology at Amazon's disposal, building its own chip was a fairly obvious move.

Meet the Creator of Amazon Inferentia

Amazon acquired Annapurna Labs, an Israeli chip start-up, in 2015. Engineers from Amazon and Annapurna Labs went on to build the Arm-based AWS Graviton processor and the Amazon Inferentia chip.

Image source: perspectives.mvdirona.com

Technical Specifications

Each Amazon Inferentia chip is made up of four NeuronCores. Each NeuronCore implements a "high-performance systolic array matrix-multiply engine" (fancy words for a grid of tightly interconnected compute units that performs matrix operations with very low response time).

As per the technical definition: "A systolic array is a homogeneous network of tightly coupled data processing units (DPUs), called cells or nodes, used in parallel computer architectures. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself, and passes it downstream."
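To make the idea more tangible, here is a toy, purely illustrative Python simulation of a systolic-style matrix multiply. This is not Inferentia's actual hardware design; it is a deliberately simplified sketch of partial results being accumulated step by step instead of every cell re-reading a shared memory.

    # Toy simulation of the systolic-array idea behind a matrix-multiply engine.
    # Simplification: each "cell" (i, j) keeps a partial-sum register and, at
    # every step, consumes one operand from A and one from B and accumulates
    # the product. Real hardware skews the data wavefront between neighbours.
    import numpy as np

    def systolic_style_matmul(A, B):
        m, k = A.shape
        k2, n = B.shape
        assert k == k2, "inner dimensions must match"
        C = np.zeros((m, n))            # one partial-sum register per cell
        for step in range(k):           # one wavefront of operands per step
            for i in range(m):
                for j in range(n):
                    C[i, j] += A[i, step] * B[step, j]
        return C

    A = np.random.rand(4, 3)
    B = np.random.rand(3, 5)
    assert np.allclose(systolic_style_matmul(A, B), A @ B)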


Each chip, with its four NeuronCores, can execute up to 128 TOPS (tera operations per second). It supports the FP16, BF16, and INT8 data types. One interesting detail is that AWS Inferentia can use BFloat16 to take a model trained in 32-bit precision and run it at the speed of a 16-bit model.
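As a hedged, framework-level illustration (not Inferentia-specific), this is roughly what casting an FP32-trained model down to bfloat16 for inference looks like in PyTorch:

    # Sketch: running an FP32-trained model in bfloat16 for inference.
    # Requires a PyTorch build with bfloat16 support on your device.
    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)              # stands in for any FP32-trained model
    model.eval()

    bf16_model = model.to(torch.bfloat16)   # weights now use half the memory
    x = torch.randn(1, 128).to(torch.bfloat16)
    with torch.no_grad():
        y = bf16_model(x)
    print(y.dtype)                          # torch.bfloat16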

Low Latency for Real-Time Output

At re:Invent 2019 you probably heard that Inferentia delivers lower latency. What does that mean in practice? As ML models grow more sophisticated they also grow larger, and moving them into and out of memory becomes a dominant cost. That traffic adds latency and magnifies the compute problem. Amazon Inferentia is built to address exactly this: its large on-chip memory lets a model be partitioned across several NeuronCores and streamed at full speed through the pipeline of cores, avoiding the delay caused by external memory accesses.
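As a sketch of how this partitioning is typically requested, the AWS Neuron SDK's PyTorch integration exposes a compiler option for spreading a model across NeuronCores. The exact package, function, and flag names below are assumptions that should be verified against the Neuron documentation for your SDK version.

    # Hedged sketch: compile a model so it is pipelined across the four
    # NeuronCores of one Inferentia chip. Flag name is an assumption; check
    # the AWS Neuron SDK docs for your version.
    import torch
    import torch_neuron                      # AWS Neuron SDK PyTorch integration
    from torchvision import models

    model = models.resnet50(pretrained=True).eval()
    example = torch.zeros(1, 3, 224, 224)

    neuron_model = torch.neuron.trace(
        model,
        example_inputs=[example],
        compiler_args=["--neuroncore-pipeline-cores", "4"],  # assumed flag
    )
    neuron_model.save("resnet50_pipeline_neuron.pt")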

Supports All the Frameworks

ML practitioners work with a broad spectrum of frameworks, and AWS makes it simple to target Inferentia from nearly any of them. To run on Inferentia, a model must first be compiled into a representation that is optimized for the hardware. That may sound like expert-level work, but it is not: the compilation can be done either through command-line tools in the AWS Neuron SDK or through its application APIs.
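For example, here is a minimal sketch of compiling a stock torchvision model with the Neuron SDK's PyTorch API. The package and function names are based on the torch-neuron integration and should be checked against the SDK version you install; the same compilation can also be driven from the SDK's command-line tools.

    # Hedged sketch: trace a model into an Inferentia-optimized representation.
    import torch
    import torch_neuron                      # provided by the AWS Neuron SDK
    from torchvision import models

    model = models.resnet50(pretrained=True)
    model.eval()

    example_image = torch.zeros(1, 3, 224, 224, dtype=torch.float32)

    # Compile the FP32 model for Inferentia; unsupported operators fall back
    # to running on the CPU.
    neuron_model = torch.neuron.trace(model, example_inputs=[example_image])
    neuron_model.save("resnet50_neuron.pt")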

Democratizing Access to the Hardware Required for ML

Running ML models for hours, weeks, or even months is a costly affair. Organizations building and operating ML applications may not be able to bear the expense of owning, running, and maintaining high-capacity hardware themselves.

So far, AWS has not released standalone Inferentia pricing; the chip is available only through Amazon EC2 Inf1 instances (instances powered by Inferentia chips). But customers' push to reduce the cost of the inference phase almost certainly paved the way for Amazon Inferentia.
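As an illustration, launching an Inf1 instance programmatically could look like the boto3 sketch below; the AMI ID and key pair name are hypothetical placeholders, not real values.

    # Hedged sketch: launch an Inferentia-powered EC2 Inf1 instance with boto3.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder; use a Deep Learning AMI from your region
        InstanceType="inf1.xlarge",        # smallest Inferentia-powered instance size
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",             # hypothetical key pair name
    )
    print(response["Instances"][0]["InstanceId"])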

What’s Next in Machine Learning for AWS?

AWS has made more than a dozen announcements of ML-related programs and products. Chief among them is Amazon SageMaker, which has been a gift to the organizations and individuals championing ML.
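SageMaker can also host models on Inferentia-backed instance types. The following sketch uses the SageMaker Python SDK; the S3 path, IAM role, entry-point script, and framework versions are hypothetical placeholders to be replaced with values from your own environment.

    # Hedged sketch: deploy a compiled model to a SageMaker endpoint on an
    # Inferentia-backed (ml.inf1.*) instance.
    from sagemaker.pytorch import PyTorchModel

    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

    model = PyTorchModel(
        model_data="s3://my-bucket/resnet50_neuron.tar.gz",  # placeholder artifact
        role=role,
        entry_point="inference.py",      # hypothetical inference handler
        framework_version="1.5.1",       # match your environment
        py_version="py3",
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.inf1.xlarge",  # Inferentia-backed SageMaker instance
    )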

AWS also plans to bring Inferentia chips to more of its services and instance types, adding further depth to AWS's compute portfolio. Amazon's strategy of building best-in-industry custom chips will only flourish if it can keep delivering that hardware at a lightning pace.

Thank you!