« Achieving deep big data analysis will require technological breakthroughs that unite research advances in machine learning and database management systems. » Volker Markl, Director of the BBDC
« Let the machines learn! » Klaus-Robert Müller, Co-Director of the BBDC
Big data is often defined as any data set that cannot be handled using today’s widely available mainstream techniques and technologies. The challenges of handling big data are often described using the 3 Vs (volume, variety, and velocity): a high volume of data from a variety of data sources, arriving at high velocity, that must be analysed to achieve an economic benefit. However, the 3 Vs fail to reflect the complexity of “Big Data” in its entirety. From a technical perspective, the real complexity stems from the fact that complex predictive and prescriptive analytic methods need to be applied to huge, heterogeneous data sets. Moreover, “Big Data” (also often called “Smart Data”) has a much wider scope, presenting challenges and opportunities in five dimensions: technological, application, economic, legal, and social.
Data Scientist - Bridging the Talent Gap
According to the Harvard Business Review, data scientist is “The Sexiest Job of the 21st Century”. Data scientists are often regarded as wizards who deliver value from big data. These wizards need expertise in three very distinct subject areas, namely scalable data management, data analysis, and domain expertise. However, it is a challenge to find these jacks-of-all-trades who cover all three areas; or, as the Wall Street Journal puts it, “Big Data’s Problem is Little Talent”. Naturally, finding talented data scientists is a prerequisite for putting big data to good use. If data analysis were specified using a declarative language, data scientists would no longer have to worry about low-level programming; instead, they would be free to concentrate on their data analysis problem. The goal of the Berlin Big Data Center is to help bridge the big data talent gap by researching and developing novel technology. Our starting point is the Apache Flink system. We aim to enable deep analytics of huge, heterogeneous data sets with low latency by developing advanced, scalable data analysis and machine learning methods. Our goal is to allow these methods to be specified in a declarative way and to optimize and parallelize them automatically, empowering data scientists to focus on the analysis problem at hand rather than on systems programming.
Read more about it in the article accompanying Volker Markl's VLDB keynote, "Breaking the Chains: On Declarative Data Analysis and Data Independence in the Big Data Era".