Edge inference

Edge inference is the process of running machine learning models, including deep learning models, on local devices (edge devices) such as smartphones, IoT devices, embedded systems, and edge servers rather than on centralized cloud computing infrastructure.^[1]^[2]^[3]^[4] It is a key feature of edge computing: because data is processed where it is generated, less of it has to be sent to remote servers, which enables real-time processing, low latency, and improved privacy.^[5]
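
As an illustration of the data-reduction point, the following minimal Python sketch compares what a device would upload if inference ran remotely with what it would upload if inference ran locally. Everything in it is a hypothetical assumption made for illustration: the weights, the sample readings, and the function names `infer_on_device`, `cloud_payload`, and `edge_payload` do not come from the cited sources, and a real deployment would typically use an optimized on-device runtime rather than a hand-written classifier.

```python
import json

# Hypothetical fixed weights for a tiny linear classifier (illustrative only).
WEIGHTS = [0.8, -0.5, 0.3]
BIAS = -0.1


def infer_on_device(features):
    """Run the model locally and return a class label (0 or 1)."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1 if score > 0 else 0


def cloud_payload(readings):
    """Bytes uploaded if every raw reading is sent for remote inference."""
    return json.dumps({"readings": readings}).encode("utf-8")


def edge_payload(readings):
    """Bytes uploaded if inference runs on the device and only labels are sent."""
    labels = [infer_on_device(r) for r in readings]
    return json.dumps({"labels": labels}).encode("utf-8")


# Hypothetical sensor readings, three features each.
readings = [[0.5, 0.2, 0.9], [0.1, 0.9, 0.0], [1.0, 1.0, 1.0]]

print("remote inference upload:", len(cloud_payload(readings)), "bytes")
print("edge inference upload:  ", len(edge_payload(readings)), "bytes")
```

In this sketch the raw readings never leave the device when `edge_payload` is used; only the predicted labels are transmitted, which is the mechanism behind the bandwidth and privacy benefits described above.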

References

^ Shi, Weisong; Cao, Jie; Zhang, Quan; Li, Youhuizi; Xu, Lanyu (2016). "Edge Computing: Vision and Challenges". IEEE Internet of Things Journal. 3 (5): 637–646. Bibcode:2016IITJ....3..637S. doi:10.1109/JIOT.2016.2579198.
^ Ngo, Dat; Park, Hyun-Cheol; Kang, Bongsoon (19 June 2025). "Edge Intelligence: A Review of Deep Neural Network Inference in Resource-Limited Environments". Electronics. 14 (12): 2495. doi:10.3390/electronics14122495. ISSN 2079-9292.
^ "What is AI inference? How it works and examples". Google Cloud. Retrieved 27 April 2026.
^ Vayner, Seva (22 July 2024). "What is AI inference at the edge, and why is it important for businesses?". TechRadar. Retrieved 27 April 2026.
^ Satyanarayanan, Mahadev (2017). "The Emergence of Edge Computing". Computer. 50 (1): 30–39. Bibcode:2017Compr..50a..30S. doi:10.1109/MC.2017.9.