AAIEA 2020
Accelerating AI for Embedded Autonomy
September 24, 2020, 9AM-5PM EDT (Virtual)
Hosted by ESWEEK

The Workshop on Accelerating Artificial Intelligence for Embedded Autonomy aims to gather researchers and practitioners in autonomy, automated reasoning, planning algorithms, and embedded systems to discuss novel hardware and software architectures that can accelerate the wide variety of AI algorithms demanded by advanced autonomous and intelligent systems.

Preliminary Program



10:15 - 10:30
Introduction to the workshop
Workshop organizers
10:30 - 11:00
The Graphcore Intelligence Processing Unit - A new processor architecture for machine intelligence
Dr. George Mathew
AI Applications Specialist, Graphcore
11:00 - 11:30
Automated Design of Efficient Deep Neural Networks for the Edge
Bichen Wu
University of California, Berkeley
11:30 - 12:00
Analog resistive crossbar arrays for deep learning acceleration
Dr. Martin M. Frank
IBM T.J. Watson Research Center
12:00 - 13:00
Lunch break
13:00 - 13:45 : Keynote
Human-Centric Computing
Jan M. Rabaey
Donald O. Pederson Distinguished Professor, EECS Department, University of California, Berkeley
13:45 - 14:00
Break
14:00 - 14:30
Implementing Deep Learning on a Small UAS for an ISR Mission
Dr. Arjuna Flenner
Senior Technical Leader – AI & Image Processing, Advanced & Special Programs, GE Aviation Systems
14:30 - 15:00
Autonomy at the Edge: Ultra-Low Power Hardware Design for Intelligent Micro-robotics
Arijit Raychowdhury
Professor, Electrical and Computer Engineering, Georgia Institute of Technology
15:00 - 15:30
Real-time Motion Planning for the Masses
Dan Sorin
Addy Professor of Electrical and Computer Engineering and of Computer Science, Duke University
15:30 - 16:00
Concluding Remarks and next steps
Enabling the Future of Embedded Autonomy

Abstracts

The Graphcore Intelligence Processing Unit - A new processor architecture for machine intelligence
Dr. George Mathew
AI Applications Specialist, Graphcore
We present the architecture of the Intelligence Processing Unit (IPU) developed by Graphcore for machine intelligence. The second-generation IPU has a massively parallel MIMD architecture with 1472 independent processor cores and a tightly coupled 900MB In-Processor-Memory. It uses the Bulk Synchronous Parallel (BSP) paradigm for efficient parallel programming. We will present the various device configurations, which range from standalone desktop appliances to large-scale solutions that support training and inference of models with hundreds of billions of parameters. We will also briefly describe our software stack, which compiles computational graphs defined in popular machine learning frameworks for efficient execution on the IPU.
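The BSP paradigm the abstract mentions structures execution as repeated supersteps: independent local compute, then a bulk data exchange, then a global barrier. A minimal, hypothetical Python sketch of that pattern (purely illustrative; this is not Graphcore's Poplar API):

```python
# Sketch of the Bulk Synchronous Parallel (BSP) model: each superstep is
# local compute, then bulk exchange, then a global barrier. Illustrative
# only -- not Graphcore's actual programming interface.

def bsp_run(states, compute, exchange, supersteps):
    """Run `supersteps` BSP rounds over per-worker `states`."""
    for _ in range(supersteps):
        # Phase 1: every worker computes independently on its local state.
        states = [compute(s) for s in states]
        # Phase 2: bulk communication between workers.
        states = exchange(states)
        # Phase 3: barrier -- implicit here, since no worker begins the
        # next superstep until all have finished this one.
    return states

# Toy example: each worker increments its value, then the exchange phase
# performs an all-reduce so every worker sees the global sum.
def compute(x):
    return x + 1

def exchange(states):
    total = sum(states)
    return [total] * len(states)

print(bsp_run([0, 1, 2], compute, exchange, supersteps=2))  # [21, 21, 21]
```

The barrier is what makes BSP easy to reason about: communication happens only at superstep boundaries, so there are no data races within a superstep.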

Analog resistive crossbar arrays for deep learning acceleration
Dr. Martin M. Frank
IBM T.J. Watson Research Center
Chip architectures based on resistive crossbar arrays have the potential to surpass digital accelerators in terms of deep learning performance and energy efficiency. In such circuits, neural network weights can be represented by conductances of analog resistive devices at each crosspoint, allowing for a parallel vector-matrix multiplication operation to be performed. Noting that deep learning algorithms are rather robust to reduced arithmetic precision, we discuss criteria analog devices have to meet in order to be suitable for deep learning inference or training. We then give an overview of candidate device technologies, some of which were originally developed for non-volatile memory applications. Finally, we discuss algorithmic innovations designed to accommodate analog device non-idealities.
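The crossbar operation the abstract describes can be sketched numerically: weights are stored as non-negative device conductances (a G+/G- pair per signed weight, a common scheme assumed here for illustration), inputs are applied as row voltages, and each column current is the dot product by Ohm's and Kirchhoff's laws. A small NumPy simulation, with multiplicative noise standing in for device non-idealities:

```python
import numpy as np

# Sketch of analog in-memory vector-matrix multiplication on a resistive
# crossbar. Weights live as conductances G, inputs are row voltages x, and
# each column current is sum_i G[i, j] * x[i]. Values are illustrative.

rng = np.random.default_rng(0)

W = rng.uniform(-1.0, 1.0, size=(4, 3))  # target weight matrix
x = rng.uniform(0.0, 1.0, size=4)        # input activations (voltages)

# Physical conductances are non-negative, so each signed weight is the
# difference of two devices (a G+ / G- pair) read differentially.
G_pos = np.clip(W, 0, None)
G_neg = np.clip(-W, 0, None)

# Device non-ideality: multiplicative programming noise on conductances.
noise = 0.02
G_pos_dev = G_pos * (1 + noise * rng.standard_normal(G_pos.shape))
G_neg_dev = G_neg * (1 + noise * rng.standard_normal(G_neg.shape))

# Differential column currents recover the signed matrix-vector product.
y_analog = x @ G_pos_dev - x @ G_neg_dev
y_exact = x @ W

print("exact :", y_exact)
print("analog:", y_analog)  # close to exact, perturbed by device noise
```

The perturbed result illustrates the abstract's point: the computation tolerates modest analog error, which is why reduced-precision robustness matters for this hardware.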

Implementing Deep Learning on a Small UAS for an ISR Mission
Dr. Arjuna Flenner
Senior Technical Leader – AI & Image Processing, Advanced & Special Programs, GE Aviation Systems
Small autonomous UAVs are increasingly used in hostile environments, but to complete their missions they currently require a communication link to an end user, who must interpret the sensor feeds and provide control input to the UAV. In future environments the communication link may be jammed, and the mission must be completed with no human input; autonomous systems therefore need to transform raw data into actionable information entirely on their own. Recent advances in deep learning provide a valuable tool for extracting actionable information in the visual domain, but most deep learning inference is performed on a general-purpose graphics processing unit (GP-GPU), whose power and size, weight, and power (SWaP) requirements limit its use to larger platforms. In this work, we implemented the YOLOv3 deep network on a small UAV with an autonomous autopilot, and the vehicle flew a pre-determined route for object detection. Due to the small size of the UAV, we used an FPGA for the neural network inference, which received imagery from a Trillium camera. We also performed an essential fusion task by computing the geolocation of object detections onboard the aircraft. The balsawood Bushmaster airframe, together with camera, batteries, and processing hardware, yielded a low-weight, low-power aircraft. In this talk, we discuss how to modify a network to run on the low-power FPGA. This modification was further complicated by a lack of training data, so we discuss using transfer learning to adapt from ground-level imagery to aerial imagery, and why transfer learning is harder to implement on an FPGA than on a conventional GP-GPU. Finally, we present our test flight, in which the UAV detected and geolocated objects from the air with no user input.
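The abstract does not say exactly how the network was modified for the FPGA, but a standard step for FPGA deployment (assumed here purely for illustration) is post-training quantization of floating-point weights to fixed-point. A minimal sketch of symmetric 8-bit quantization:

```python
import numpy as np

# Hypothetical illustration of one common network modification for FPGA
# deployment: symmetric per-tensor post-training quantization to int8.
# The speakers' actual method is not specified in the abstract.

def quantize_int8(w):
    """Map float weights to int8 plus a scale factor."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max quantization error:", np.max(np.abs(w - w_hat)))
```

With rounding to the nearest level, the per-weight error is bounded by half the scale, which is why low-precision fixed-point arithmetic is viable on resource-constrained FPGAs.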

Autonomy at the Edge: Ultra-Low Power Hardware Design for Intelligent Micro-robotics
Arijit Raychowdhury
Professor, Electrical and Computer Engineering, Georgia Institute of Technology
As we make extraordinary progress towards “ubiquitous intelligence”, AI and machine learning are progressively moving from the cloud to the edge devices. This is enabled by hardware-constrained ultra-low-power (μW to mW) circuit and system designs. In this talk, I will discuss the promise and outlook for such small-system AI, and elaborate on some of our recent work on enabling autonomy in sensor nodes and robotics.

Real-time Motion Planning for the Masses
Dan Sorin
Addy Professor of Electrical and Computer Engineering and of Computer Science, Duke University
For decades, we have been led to expect the wide deployment of robots, yet robots remain mostly limited to fenced-in industrial workcells and vacuum cleaners. Until recently, the limiter has been a combination of insufficient computer vision, which is no longer a major problem, and the inability to plan motion in real time. At Realtime Robotics, a startup spun out of research at Duke University, we have developed the Realtime Controller, which can perform real-time motion planning for as many as four robots at a time. In this talk, I will present the problem of motion planning, explain why it is hard to perform quickly, and describe how we achieved real-time performance with co-designed special-purpose hardware and software. I will also discuss the interesting problems that have arisen along the way.