This document discusses optimizing deep learning inference on Intel processor graphics using the OpenVINO™ toolkit. Some key points include:
- Running inference on client devices offers advantages over the cloud, such as privacy, bandwidth savings, and responsiveness.
- OpenVINO™ provides tools to optimize models for Intel hardware, achieving 5-10x speedups on Intel GPUs over CPU baselines (a minimal inference sketch follows this list).
- A case study demonstrates optimizing a deep image matting model with OpenVINO™, reducing inference time from 2.35 seconds to 291 milliseconds on an Intel GPU.
- Emerging technologies such as federated learning, which could further improve privacy for on-device inference, are also discussed.
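
The following is a minimal sketch of running an optimized model on the integrated GPU with the OpenVINO™ Python runtime, assuming a converted IR model with a static input shape; the model filename is a placeholder, not the one used in the case study.

```python
import numpy as np
import openvino as ov  # OpenVINO Python runtime

# Load the Intermediate Representation (IR) produced by the model converter.
core = ov.Core()
model = core.read_model("deep_image_matting.xml")  # placeholder model path

# Compile for the integrated GPU; fall back to CPU if no GPU device is available.
device = "GPU" if "GPU" in core.available_devices else "CPU"
compiled_model = core.compile_model(model, device)

# Run a single inference on dummy data shaped like the model's (static) input.
input_shape = list(compiled_model.input(0).shape)
input_tensor = np.random.rand(*input_shape).astype(np.float32)
result = compiled_model([input_tensor])[compiled_model.output(0)]
print(f"Ran inference on {device}; output shape: {result.shape}")
```

Compiling the same model for "GPU" instead of "CPU" is the only change needed to target Intel processor graphics, which is what enables the latency reductions described in the case study.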