Data Management

Through automatic optimization and parallelization, declarative specifications will give a growing number of data analysts access to novel Big Data analytics, benefiting the economy, science, and society. We will develop methods that automatically optimize and parallelize declarative data analysis specifications and adapt them to a given computer architecture. To this end, we will address the following research questions:

  • How can different algorithms for the iterative processing of large ordered and unordered data collections be characterized?
  • Which equivalence-preserving transformations of partial programs within a data analysis program actually improve its performance?
  • Should a given part of an iterative data analysis program be optimized and parallelized as a data flow or as a control flow?
  • Should fault tolerance for a given data analysis program be realized pessimistically through checkpointing, optimistically through compensating operations, or through a hybrid of both?
  • How can fault tolerance and consistency requirements for intermediate results be optimized automatically, particularly in data analyses with high data rates that require pipelining within the processing?
  • How can the choice between pipelining and materialization be made automatically, based on the processing requirements for latency, data volume, and data rate?
  • How can we ensure the numerical stability of data analysis algorithms when they are subjected to optimizing transformations?
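
To illustrate why numerical stability is at stake, consider a sum, which an optimizer may parallelize by splitting the input into partitions and combining partial results. Floating-point addition is not associative, so such a reordering can change the result. The following is a minimal Python sketch, assuming a simple partition-then-combine strategy; the function names `sequential_sum` and `partitioned_sum` are hypothetical and only stand in for a naive reduction and its parallelized rewrite.

```python
import math


def sequential_sum(values):
    # Left-to-right reduction, as a naive single-threaded loop computes it.
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, num_partitions):
    # Hypothetical parallelizing rewrite: split the input into contiguous
    # partitions, reduce each partition independently, then combine the
    # rounded partial sums.
    size = math.ceil(len(values) / num_partitions)
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)


# The exact sum of these values is 2.0, but 1.0 is absorbed whenever it is
# added to a value of magnitude 1e20.
data = [1e20, 1.0, -1e20, 1.0]

print(sequential_sum(data))      # 1.0
print(partitioned_sum(data, 2))  # 0.0
print(math.fsum(data))           # 2.0
```

The two plans are algebraically equivalent yet return different values, and neither matches the exact result. `math.fsum` returns the correctly rounded sum regardless of input order, but applying it per partition and combining single rounded floats would still lose the small terms here, so a stability-aware optimizer would need to reason about such rewrites explicitly.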

In this way, we will dramatically simplify the creation of data analysis programs, broaden the user base of Big Data analytics, and drastically reduce the cost of creating complex Big Data analyses.