On 7 October at 10:15, Shivananda Rangappa Poojara will defend his thesis "Design and Orchestration of Scalable, Event-Driven Serverless Data Pipelines for Internet of Things (IoT) Applications" to obtain the degree of Doctor of Philosophy (in Computer Science).
Supervisors:
Lecturer Pelle Jakovits, University of Tartu
Visiting Professor Satish Narayana Srirama, University of Tartu
Opponents:
Professor Mohammad Abdullah Al Faruque, University of California (United States)
Associate Professor Nicolas Ferry, Université Côte d'Azur (France)
Summary
Over the years, IoT applications have become widespread in several domains, including smart manufacturing. Consider a use case in a smart factory, where video surveillance cameras are used to detect an incorrect pose of factory workers operating a sensitive device or machine. For video analytics, the video streams are collected and split into frames (images), the worker's pose is detected and the face identified, the frames are annotated with the worker's name, and finally the administration and the worker are alerted about an incorrect pose that could harm the person or the machine.
Assume that a few hundred cameras and other connected sensors are used for intelligent decision-making. Handling the flow of video data and controlling the complete life cycle of its processing then involves many tasks, including data collection, routing, filtering, analysis, detection, annotation, alerting, storage, and other operations, and this eventually becomes challenging. Data pipelines are one of the popular mechanisms used to simplify the design of such a chain of data processing activities.
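As a rough illustration of the pipeline idea, the video analytics steps described above can be sketched as a chain of small stages, each consuming the output of the previous one. This is a minimal sketch and not the design proposed in the thesis: the `Frame` type, all function names, and the placeholder detection rule are hypothetical, and real stages would call video decoding and machine learning models instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Frame:
    """Hypothetical record that travels through the pipeline."""
    camera_id: str
    index: int
    pose_ok: Optional[bool] = None
    worker: Optional[str] = None
    alerts: List[str] = field(default_factory=list)


Stage = Callable[[Iterator[Frame]], Iterator[Frame]]


def split_into_frames(streams: Dict[str, int]) -> Iterator[Frame]:
    """Stand-in for stream decoding: yield n frames per camera."""
    for camera_id, n in streams.items():
        for i in range(n):
            yield Frame(camera_id, i)


def detect_pose(frames: Iterator[Frame]) -> Iterator[Frame]:
    for f in frames:
        # Placeholder rule (assumption): every third frame is flagged.
        f.pose_ok = f.index % 3 != 2
        yield f


def identify_and_annotate(frames: Iterator[Frame]) -> Iterator[Frame]:
    for f in frames:
        # Placeholder for face identification; attaches a made-up name.
        f.worker = f"worker-at-{f.camera_id}"
        yield f


def alert(frames: Iterator[Frame]) -> Iterator[Frame]:
    for f in frames:
        if f.pose_ok is False:
            f.alerts.append(f"incorrect pose: {f.worker}, frame {f.index}")
        yield f


def run_pipeline(source: Iterator[Frame], stages: List[Stage]) -> List[Frame]:
    """Feed the source through each stage in order and collect the results."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)


results = run_pipeline(
    split_into_frames({"cam-1": 3, "cam-2": 4}),
    [detect_pose, identify_and_annotate, alert],
)
flagged = [f for f in results if f.alerts]
```

Because each stage only depends on the records it receives, a stage can be replaced, reordered, or deployed separately without touching the others, which is the property that makes pipelines attractive for this kind of processing chain.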
Moving all of the video data to a distant cloud consumes considerable bandwidth and increases latency. These challenges can be addressed by processing the video data in-house, on the factory floor (edge) or on nearby fog devices. However, large and expensive data processing clusters (such as Apache Flink or Spark) and off-the-shelf tools could be unreliable there because of resource constraints and the event-driven nature of IoT applications. Combining serverless (Function-as-a-Service, FaaS) computing with data pipelines, to create serverless data pipelines, addresses this problem.
The thesis aims to solve the challenges of IoT data processing. Its goal is three-fold. First, we investigate the bottlenecks in existing container-based data processing. Second, we propose design approaches for creating serverless data pipelines for complex data processing activities and provide a suitability analysis for IoT developers. Third, we provide solutions for handling stochastic workloads with scalable serverless data pipelines and provide a suitability analysis of various scaling approaches across different workload patterns.