Abstract: After nearly four decades and hundreds of scientific papers, relational query optimization can hardly be characterized as anything but a huge scientific and commercial success. IDC estimated the 2016 market for relational database products at about $40B, out of a total database market of $45.1B. And SQL still dominates database application development and is widely recognized as the most successful declarative language. None of this would have been possible without the success of query optimization, which transforms a declarative SQL statement of what data the user needs into an “optimal” execution plan, i.e., a detailed, procedural specification of how that data will be accessed and processed.
So are we “there” yet? Are we done? Are all the big and interesting problems solved? Is query optimization as an area of scientific inquiry dead, relegated to incremental improvements and mere engineering? Why do we continue to see so many papers on query optimization?
In this talk, I argue that current research appears to be incremental because we are largely attacking the wrong problems while ignoring much harder and more significant ones. We are solving the problems we know how to solve, not the problems that need solving. Fundamentally, query optimizers are mathematical models of the performance of alternative plans. Any such model that is based upon invalid assumptions, or that is not systematically validated throughout its parametric space, is not worth the paper on which it is written, because it will inevitably yield wrong results at unknown points in that space. Many of the remaining problems caused by invalid assumptions are contained in the Achilles’ heel of query optimization: the underlying and ubiquitous cardinality model, which estimates the number of rows resulting from each operation in the execution plan.
Using real-world examples and customer “war stories”, as well as the underlying math, I will illustrate how easy it is to be off by orders of magnitude. Paradoxically, increasing the detail of optimizer models in response to these challenges may actually increase the brittleness of an optimizer!
Yikes! What is a conscientious query optimizer guru to do?
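To make the cardinality problem concrete, here is a hedged sketch (not from the talk itself) of the textbook failure mode: estimating the selectivity of a conjunction by multiplying per-predicate selectivities, as most optimizers do under an independence assumption. The table, column names, and values (a hypothetical cars table with correlated make and model columns) are illustrative inventions; when the predicates are correlated, the product systematically underestimates the true cardinality.

```python
# Hypothetical illustration: the independence assumption in cardinality
# estimation versus the true row count for correlated predicates.
import random

random.seed(0)
N = 100_000
# Synthetic "cars" table in which model fully determines make,
# i.e., the two columns are perfectly correlated.
makes_models = [("Honda", "Accord"), ("Honda", "Civic"),
                ("Ford", "F150"), ("Toyota", "Camry")]
rows = [random.choice(makes_models) for _ in range(N)]

# Per-predicate selectivities, as an optimizer would derive from
# single-column statistics.
sel_make = sum(1 for make, _ in rows if make == "Honda") / N
sel_model = sum(1 for _, model in rows if model == "Accord") / N

# Optimizer-style estimate under independence: |R| * sel1 * sel2.
estimated = N * sel_make * sel_model

# True cardinality of make = 'Honda' AND model = 'Accord'.
actual = sum(1 for make, model in rows
             if make == "Honda" and model == "Accord")

print(f"estimated ~ {estimated:.0f} rows, actual = {actual} rows")
```

Here the estimate is low by roughly the inverse of the redundant predicate’s selectivity; with longer conjunctions of correlated predicates, such errors compound multiplicatively, which is one way estimates end up off by orders of magnitude.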
Bio: Dr. Guy M. Lohman recently retired from IBM’s Almaden Research Center in San Jose, California, where he worked for over 34 years as a Distinguished Research Staff Member and Manager.
His group contributed BLU Acceleration to DB2 for Linux, UNIX, and Windows (LUW) 10.5 (2013), as well as the query engines of the IBM Smart Analytics Optimizer for DB2 for z/OS V1.1 (2010) and the Informix Warehouse Accelerator (2011) products. He was the architect of the DB2 LUW Query Optimizer and was responsible for its development from 1992 to 1997 (Versions 2–5), as well as for its Visual Explain, efficient sampling, and Index Advisor. Dr. Lohman was elected to the IBM Academy of Technology in 2002 and named an IBM Master Inventor in 2011.
He was the General Co-Chair (with Prof. Sang Cha) of the 2015 IEEE ICDE Conference and General Chair of the 2013 ACM Symposium on Cloud Computing.
He has been awarded 40 U.S. patents and is the author or co-author of over 80 papers in the refereed academic literature.