Skip to main content

Paper accepted for publication in PVLDB Volume 10

The paper "Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models" by Martin Kiefer, Max Heimel, Sebastian Breß and Volker Markl will be published in PVLDB Vol 10 and will be presented at VLDB 2018 in Rio de Janeiro, Brazil


Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operations. Kernel Density Estimation (KDE) is a statistical method to estimate multi-variate probability distributions from a data sample. Previously, we demonstrated how to build a modern, self-tuning selectivity estimator for range scans based on KDE that outperforms state-of-the-art multidimensional histograms and is efficient to evaluate on graphics cards. In this paper, we show how to extend these bandwidth-optimized KDE models to estimate the result size of single and multiple joins. In particular, we propose two approaches: (1) Building a KDE model from a sample drawn from the join result. (2) Efficiently combining the information from base table KDE models. We evaluated our KDE-based join estimators on a variety of synthetic and real-world datasets, demonstrating that they are superior to state-of-the art join estimators based on sketching or sampling.