DNN Model Execution Caching

DNN models, especially popular CNN models, are often served directly from GPU memory. GPU memory, however, is scarce relative to the number of models a serving system must host. The RIPCORD project focuses on improving DNN serving through better GPU memory management, intelligent model selection, and request routing. We believe it is necessary to rethink caching and to co-design deep learning models to achieve improved performance.
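The caching problem can be illustrated with a minimal sketch, assuming a fixed GPU memory budget and an LRU eviction policy (a common baseline, not RIPCORD's design). All names here are hypothetical, and `load_fn` stands in for whatever actually moves a model's weights onto the GPU:

```python
from collections import OrderedDict

class ModelCache:
    """LRU cache for DNN models under a fixed GPU memory budget (in MB)."""

    def __init__(self, capacity_mb, load_fn):
        self.capacity_mb = capacity_mb
        self.load_fn = load_fn        # hypothetical loader: name -> GPU-resident model
        self.used_mb = 0
        self.models = OrderedDict()   # name -> (model, size_mb), oldest first

    def get(self, name, size_mb):
        if name in self.models:
            self.models.move_to_end(name)   # cache hit: refresh recency
            return self.models[name][0]
        # Cache miss: evict least-recently-used models until the new one fits.
        while self.models and self.used_mb + size_mb > self.capacity_mb:
            _, (_, evicted_mb) = self.models.popitem(last=False)
            self.used_mb -= evicted_mb
        model = self.load_fn(name)
        self.models[name] = (model, size_mb)
        self.used_mb += size_mb
        return model
```

Even this toy version shows the tension the project targets: with many models and little memory, eviction choices directly determine how often an expensive load stalls an inference request.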

Confidential Deep Learning on End-user Devices

Performing deep learning on end-user devices provides fast offline inference and helps protect user privacy. However, running models on untrusted client devices exposes proprietary model information: the operating system or other applications on the device may be manipulated to copy and redistribute it, infringing on the model provider's intellectual property. CAPR-DL leverages ARM TrustZone, a hardware-based security feature present in most phones, to confidentially run a proprietary model on an untrusted end-user device.
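The trust boundary can be made concrete with a toy sketch (not CAPR-DL's implementation): plaintext weights exist only inside the secure world, while the normal world ever holds only ciphertext. Here XOR stands in for real authenticated encryption, and a plain Python function stands in for a TrustZone trusted application:

```python
# Conceptual sketch only: XOR is a stand-in for real encryption, and
# trusted_inference() is a stand-in for a TrustZone trusted application.

SECURE_WORLD_KEY = 0x5A  # provisioned to the TEE only, never to the OS

def encrypt_weights(weights):
    # Done by the model provider before shipping the model to devices.
    return bytes(b ^ SECURE_WORLD_KEY for b in weights)

def trusted_inference(encrypted_weights, input_bytes):
    # Runs inside the secure world: decrypt, run a toy "inference"
    # (a dot product), and return only the result to the normal world.
    weights = bytes(b ^ SECURE_WORLD_KEY for b in encrypted_weights)
    return sum(w * x for w, x in zip(weights, input_bytes)) % 256

# Normal world: handles only ciphertext; plaintext weights never leave the TEE.
ciphertext = encrypt_weights(bytes([3, 1, 4]))
result = trusted_inference(ciphertext, bytes([2, 2, 2]))
```

The point of the sketch is the data flow, not the cryptography: a compromised OS can inspect `ciphertext` and `result`, but never the decrypted weights.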

Mobile-aware Cloud Resource Management

Modern mobile applications increasingly rely on cloud data centers for both compute and storage. To guarantee performance for customers, cloud platforms typically provide dynamic provisioning to adjust resources to fluctuating workload demand. Modern mobile workloads, however, exhibit three distinct characteristics that make current provisioning approaches less effective: a new type of spatial fluctuation, fluctuation on shorter time scales, and more frequent fluctuation. The MOBILESCALE project proposes new research on resource management for mobile workloads, which differ significantly from traditional cloud workloads.
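A minimal sketch of the kind of threshold-based provisioning common today (all thresholds hypothetical) helps show why fast, frequent fluctuation defeats it: the cool-down guard that prevents oscillation also prevents the policy from tracking short-time-scale mobile workload spikes.

```python
def provision(current_servers, utilization, target=0.6, cooldown_ok=True):
    """Toy reactive provisioning policy: scale on utilization thresholds.

    `cooldown_ok` models the usual rule that no scaling action is taken
    shortly after a previous one -- exactly the window in which a
    short-lived mobile workload spike comes and goes unserved.
    """
    if not cooldown_ok:
        return current_servers      # still in cool-down: do nothing
    if utilization > 0.8:           # overloaded: scale out toward target
        extra = max(1, int(current_servers * (utilization / target - 1)))
        return current_servers + extra
    if utilization < 0.3 and current_servers > 1:
        return current_servers - 1  # underloaded: scale in gradually
    return current_servers
```

With fluctuation on shorter time scales, utilization may already have returned to normal by the time `cooldown_ok` becomes true again, so the policy either lags demand or thrashes.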

Efficient Mobile Deep Inference

An ever-increasing number of mobile applications leverage deep learning models to provide novel and useful features, such as real-time language translation and object recognition. However, the current mobile inference paradigm requires application developers to make a static trade-off between inference accuracy and inference speed at development time. As a result, mobile user experience suffers under dynamic inference scenarios and heterogeneous device capabilities. The MODI project proposes new research in designing and implementing a mobile-aware deep inference platform that combines innovations in both algorithmic and system optimizations.
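The contrast with a static trade-off can be sketched as runtime model selection. The variant names, accuracies, and latencies below are made up for illustration; the idea is simply to pick, per request, the most accurate variant that still meets the current latency budget on the current device rather than hard-coding one variant at development time:

```python
# Hypothetical model variants: (name, accuracy, latency in ms on a reference device)
VARIANTS = [
    ("tiny",   0.65,  20),
    ("medium", 0.74,  80),
    ("large",  0.81, 300),
]

def select_model(latency_budget_ms, device_speedup=1.0):
    """Return the most accurate variant whose estimated on-device latency
    fits the budget; fall back to the fastest variant otherwise."""
    best = VARIANTS[0]  # fastest variant as the fallback
    for name, acc, ref_latency in VARIANTS:
        if ref_latency / device_speedup <= latency_budget_ms and acc > best[1]:
            best = (name, acc, ref_latency)
    return best[0]
```

A faster device (`device_speedup > 1`) or a relaxed latency budget lets the same application transparently serve a more accurate model, which is the flexibility a static development-time choice gives up.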