Prof. Brett H. Meyer was invited to present at the 2022 Edge Intelligence Workshop (EIW) on September 20. The two day workshop featured a number of prominent researchers working to develop or optimize machine learning algorithms on the resource-constrained devices at the edge of the Internet, accompanied by a wide variety of student posters. Professor Meyer’s talk focused on recent research at the McGill Edge Intelligence Lab on BERT inference latency modeling, and inference and throughput optimization through task pipelining and mapping on the Kirin 970 SoC.

RSSL students Negin Firouzian and Lily Li presented posters at the workshop on. Lily presented her work, BERT Inference Energy Predictor for Efficient Hardware-aware NAS; and, Negin presented his, Latency and Accuracy Predictors for Efficient BERT Hardwareaware NAS.

Transforming Intelligence for the Edge: Challenges and Opportunities in Modeling, Optimization, and Deployment

Abstract

The state-of-the-art in deep learning has advanced rapidly, with especially significant improvements in algorithm performance on diverse computer vision and natural language processing. Our capacity to deploy such models at the edge lags considerably, however, for a variety of reasons. First, models that achieve SOTA performance are often large and complex, and far beyond what resource-constrained edge devices can manage. Second, the software environments available for deployment to edge devices generally cannot keep pace with the evolution of models and their components. Case in point: though quantization has been a standard practice for complexity reduction for years, support for edge devices remains remarkably uneven, and even varies across different frameworks for the same hardware.

In this talk, I will explore recent work supporting the optimization and deployment of transformer-based models to resourced-constrained heterogeneous systems at the edge. Heterogeneous multiprocessors offer a wide variety of opportunities for improving edge inference but navigating the multitude of options is time consuming. Efficient decision making must be supported by power and performance modeling and estimation. Along the way, I will also present our recent experiences addressing the various challenges that emerge when attempting to make SOTA models run on edge hardware, and what it means to practically optimize such systems.

Download Presentation