Date: June 25, 2020
Time: 2:30-3:30pm ET

Abstract

Modern database systems are increasingly distributed and struggle to keep query completion time low on large volumes of data. In this paper, we use programmable switches in the network to partially offload query computation. While switches provide high performance, their resource and programming constraints make it difficult to implement diverse queries. To fit within these constraints, we introduce the concept of data pruning: filtering out entries that are guaranteed not to affect the output. The database system then runs the same query on the pruned data, which significantly reduces processing time. We propose pruning algorithms for a variety of queries. We implement our system, Cheetah, on a Barefoot Tofino switch and Spark. Our evaluation on multiple workloads shows a 40-200% improvement in query completion time compared to Spark.
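
To make the pruning idea concrete, here is a minimal sketch for a DISTINCT query. It is an illustration only and not Cheetah's actual algorithm or switch code: the function names, the LRU eviction policy, and the cache size are all assumptions chosen for the example. A small bounded cache stands in for the switch's limited memory. An entry is dropped only when it is known to be a duplicate, so the downstream system can run the unchanged DISTINCT query on the pruned stream and get the same answer.

```python
from collections import OrderedDict


def prune_distinct(stream, cache_size):
    """Hypothetical pruning stage: forward entries that might be new,
    drop entries that are guaranteed duplicates."""
    # Bounded cache of recently seen values (assumed LRU eviction),
    # standing in for the limited state available on a switch.
    cache = OrderedDict()
    for value in stream:
        if value in cache:
            # Already forwarded once, so dropping it cannot change DISTINCT.
            cache.move_to_end(value)
            continue
        cache[value] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently seen value
        yield value


def distinct(entries):
    """The unchanged query, run by the database on whatever it receives."""
    return set(entries)


data = [3, 1, 3, 3, 2, 1, 4, 3, 2, 2]
pruned = list(prune_distinct(data, cache_size=2))

print(len(data), "entries in,", len(pruned), "entries forwarded")
print(distinct(pruned) == distinct(data))
```

Because the cache is smaller than the number of distinct values, some duplicates still get through. That is acceptable: pruning only has to be safe, not complete, since the database removes the remaining duplicates when it runs the query.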

Bio

Syed is a second-year PhD student at Harvard University advised by Dr. Minlan Yu. He has worked on problems involving programmable networks, cluster management, and the intersection of networking and database design.