Abstract: Imagine you are in charge of a high-volume stream of social interactions, and you would like to watch for certain graph structures in the stream of interactions. For example, Twitter recommends „who to follow“ by looking for accounts followed by at least two accounts you follow, a structure which can be described by a four node graph. There can be a substantial number of these motifs in a graph, but what if you only need to observe the changes as they happen, rather than enumerate all instances at once?
This work is an instance of the more general problem of maintaining cyclic (non-treelike) joins as the underlying relations change. We will first summarize recent work in worst-case optimal join evaluation (Ngo et al., 2014) which shows how to evaluate cyclic joins using resources asymptotically bounded by the largest possible output of the join (a property standard trees of binary joins cannot satisfy). We then build up a dataflow version of this algorithm, extend it to efficiently respond to changes in its inputs, and describe and evaluate its implementation in timely dataflow*. This project is joint work with Khaled Ammar and Semih Salihoglu, both of University of Waterloo.
*: Timely dataflow‘s cycles are not required for this implementation, so it could also be suitable for other, less crazy streaming systems.
Bio: Frank McSherry is an independent scientist working on scalable computation. His recent work focuses on data-parallel dataflow computation, in particular elevating their expressive power.
He currently develops and maintains several related projects (https://github.com/frankmcsherry) and writes a somewhat sassy blog (https://github.com/frankmcsherry/blog). Independently, he has also done foundational work on differential privacy, and continues to maintain an active presence in this community.