DNN Model Execution Caching

The Ripcord project proposes new infrastructure for improving the performance of deep learning model serving. Our work explores the promise of model execution caching as a means of improving the performance of cloud-based deep inference serving. Model execution caching calls for a CDN-like shared infrastructure designed for workloads that see requests for a large and diverse set of models: that is, workloads where the aggregate volume of requests is high but no single model is popular enough to merit a dedicated server.

We present our vision of model execution caching in the context of a hypothetical system (EdgeServe) and discuss the unique challenges and opportunities it raises (DIDL19).
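The core idea above can be sketched as an in-memory cache of loaded models that serves hits directly and evicts the least recently used model when capacity is reached. This is a minimal illustrative sketch, not the EdgeServe design; the class and parameter names are hypothetical.

```python
from collections import OrderedDict

class ModelExecutionCache:
    """Hypothetical sketch of model execution caching: serve popular
    models from memory, load on a miss, and evict the least recently
    used model once capacity is reached."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity    # max number of resident models
        self.load_fn = load_fn      # loads a model by ID on a cache miss
        self.cache = OrderedDict()  # model_id -> loaded model
        self.hits = self.misses = 0

    def get(self, model_id):
        if model_id in self.cache:
            self.cache.move_to_end(model_id)    # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict the coldest model
            self.cache[model_id] = self.load_fn(model_id)
        return self.cache[model_id]
```

Because no single model merits a dedicated server, the interesting questions are in the policy (what to keep resident) rather than the mechanism, which is why a shared CDN-like tier fits this workload.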

Confidential and Private Deep Learning on End-user Devices

Providing users with control over their personal data, while still allowing them to benefit from the utility of deep learning, is one of the key challenges of contemporary computer science. Our work on the Capr-DL project is focused on performing deep learning operations directly on a personal device, with a trusted framework, allowing both users to retain control over their private data and companies to retain control over their proprietary models.

Our work in this area starts with leveraging model partitioning to circumvent the severe resource constraints of the trusted framework (confidential deep learning and user-controlled privacy and confidentiality).
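Model partitioning in this setting can be sketched as splitting a sequential model so that only some layers execute inside the resource-constrained trusted framework, with just an intermediate activation crossing the boundary. The function names and split point below are illustrative assumptions; which layers belong inside the trusted environment depends on the threat model and the framework's resource limits.

```python
def partition(layers, split_index):
    """Split a sequential model into a part that runs outside the
    trusted framework and a part that runs inside it (hypothetical)."""
    return layers[:split_index], layers[split_index:]

def run_partitioned(untrusted, trusted, x):
    for layer in untrusted:
        x = layer(x)   # runs on the unconstrained device runtime
    # Only the intermediate activation crosses the boundary, so the
    # layers kept inside the trusted framework (and their weights)
    # are never exposed to the untrusted side.
    for layer in trusted:
        x = layer(x)   # runs inside the resource-constrained TEE
    return x
```

The partitioned pipeline computes the same result as running all layers in sequence; the split only changes where each stage executes.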

Mobile-aware Cloud Resource Management

Modern mobile applications increasingly rely on cloud data centers for both compute and storage. To guarantee performance for customers, cloud platforms typically provide dynamic provisioning mechanisms that adjust resources to meet fluctuating demand. Modern mobile workloads, however, exhibit three distinct characteristics: a new type of spatial fluctuation, fluctuation on shorter time scales, and more frequent fluctuation. These characteristics make current provisioning approaches less effective. The MOBILESCALE project proposes new research on resource management for mobile workloads, which differ significantly from traditional cloud workloads.

Our work in this area includes harnessing cheaper yet revocable transient resources (CloudCoaster and transient distributed training); and pooling together resources from multiple cloud providers (multi-cloud resources).
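A standard way to make revocable transient resources usable for long-running training is periodic checkpointing, so a revocation only loses the work done since the last checkpoint. The sketch below is a hypothetical illustration of that trade-off (the function, its parameters, and the revocation model are assumptions, not the CloudCoaster design):

```python
import random

def train_on_transient(steps, checkpoint_every, revoke_prob=0.0, rng=None):
    """Hypothetical sketch: run training on a revocable (transient)
    instance, checkpointing so a revocation only loses the work done
    since the last checkpoint."""
    rng = rng or random.Random(0)  # seeded for a reproducible sketch
    checkpoint = 0                 # last durably saved step
    step = checkpoint
    lost = 0                       # steps of work redone after revocations
    while step < steps:
        step += 1                  # one unit of training work
        if step % checkpoint_every == 0:
            checkpoint = step      # persist model state to durable storage
        if rng.random() < revoke_prob:
            lost += step - checkpoint
            step = checkpoint      # instance revoked: resume from checkpoint
    return step, lost
```

Choosing `checkpoint_every` trades checkpointing overhead against work lost per revocation, which is exactly why transient resources need workload-aware management.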

Embedded Systems Security

Embedded systems form the core of critical infrastructure, perform auxiliary processing on mobile phones, and permeate homes as smart devices. Yet, embedded software security lags behind traditional desktop security. While myriad defenses exist for general-purpose systems (e.g., desktops and servers), embedded systems present several unique challenges for software security such as greater hardware diversity, limited resources (e.g., memory and power), and lack of support for common abstractions like virtual memory.

Our work in this area includes defenses for protecting embedded software from control-flow hijacking attacks (Recfish and Silhouette); FPGA architectures that balance the throughput and resource requirements of AES (Drab-Locus); and techniques for generating secure random numbers (Erhard-RNG).

Efficient Distributed Deep Learning

The Cornucopia project focuses on improving the resource utilization and job response time for distributed deep learning—an emerging datacenter workload. Our work explores the promise of model-specific training speedup, combining both systems and machine learning optimizations. Our current work includes more informed neural architecture searches, and distributed training performance characterization and modeling (ICAC’19 and Google Blog Interview).
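Performance characterization of distributed training often starts from a simple analytical model: per-step time is local compute plus gradient synchronization, where ring all-reduce transfers 2(n-1)/n of the model per worker per step. The sketch below is an illustrative first-order model under those assumptions, not our published characterization:

```python
def ring_allreduce_time(model_bytes, bandwidth, workers):
    """Communication time per step under the standard ring all-reduce
    model: each worker sends/receives 2*(n-1)/n of the model."""
    if workers == 1:
        return 0.0
    return 2 * (workers - 1) / workers * model_bytes / bandwidth

def training_speedup(t_compute, model_bytes, bandwidth, workers):
    """First-order model of data-parallel training speedup: aggregate
    throughput of n workers relative to a single worker, with per-step
    time = local compute + gradient synchronization."""
    t_step = t_compute + ring_allreduce_time(model_bytes, bandwidth, workers)
    return workers * t_compute / t_step
```

Even this crude model shows why speedup is model-specific: larger models (or slower networks) raise the synchronization term and pull scaling further from linear.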

Efficient Mobile Deep Inference

An ever-increasing number of mobile applications leverage deep learning models to provide novel and useful features, such as real-time language translation and object recognition. However, the current mobile inference paradigm requires application developers to statically trade off inference accuracy against inference speed at development time. As a result, mobile user experience suffers under dynamic inference scenarios and heterogeneous device capabilities. The MODI project proposes new research in designing and implementing a mobile-aware deep inference platform that combines innovations in both algorithm and system optimizations.

Our work presents the mobile deep inference vision (MODI) motivated by empirical measurements (on-device vs. cloud inference and performance characterizations); runtime model selection algorithms that balance inference speed and accuracy (ModiPick); and multi-tenancy GPU inference (Perseus).
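Runtime model selection in the spirit of ModiPick can be sketched as follows: among the model variants whose estimated inference latency fits the current budget, pick the most accurate, and fall back to the fastest variant when nothing fits. This is a hypothetical sketch; the data layout and selection policy are assumptions, not the ModiPick algorithm.

```python
def select_model(models, latency_budget_ms):
    """Hypothetical runtime model selection: maximize accuracy subject
    to a latency budget; fall back to the fastest model if no variant
    fits the budget."""
    feasible = [m for m in models if m["latency_ms"] <= latency_budget_ms]
    if feasible:
        return max(feasible, key=lambda m: m["accuracy"])
    return min(models, key=lambda m: m["latency_ms"])
```

Deferring this choice to runtime is what removes the static development-time trade-off: the same application can pick a small model on a loaded device and a large one when the budget allows.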