References

2019

Derakhshan, Behrouz; Rezaei Mahdiraji, Alireza; Rabl, Tilmann; Markl, Volker
Continuous Deployment of Machine Learning Pipelines
International Conference on Extending Database Technology. International Conference on Extending Database Technology (EDBT-2019), March 25-29, Lisbon, Portugal
Publisher: OpenProceedings,
2019
ISBN: 978-3-89318-081-3
Alt, Christoph; Hübner, Marc; Hennig, Leonhard
Improving Relation Extraction by Pre-trained Language Representations
Proceedings of AKBC 2019. Automated Knowledge Base Construction (AKBC-2019), May 20-22, Amherst, Massachusetts, United States, page 1--18.
Publisher: OpenReview,
2019
Traub, Jonas; Grulich, Philipp; Rodríguez Cuéllar, Alejandro; Breß, Sebastian; Katsifodimos, Asterios; Rabl, Tilmann; Markl, Volker
Efficient Window Aggregation with General Stream Slicing
22nd International Conference on Extending Database Technology (EDBT). International Conference on Extending Database Technology (EDBT-2019), 22nd, March 26-29, Lisbon, Portugal
Publisher: OpenProceedings,
2019
Zeuch, Steffen; Del Monte, Bonaventura; Karimov, Jeyhun; Lutz, Clemens; Renz, Manuel; Traub, Jonas; Breß, Sebastian; Rabl, Tilmann; Markl, Volker
Analyzing Efficient Stream Processing on Modern Hardware
Proceedings of the VLDB Endowment (PVLDB), 12(5):516--530
2019
Awad, Ahmed; Traub, Jonas; Sakr, Sherif
Adaptive Watermarks: A Concept Drift-based Approach for Predicting Event-Time Progress in Data Streams
21st International Conference on Extending Database Technology (EDBT). International Conference on Extending Database Technology (EDBT-2018), 21st, March 26-29, Vienna, Austria
Publisher: OpenProceedings,
2019
Zhao, Guoguang; Zhao, Jianyu; Li, Yang; Alt, Christoph; Schwarzenberg, Robert; Hennig, Leonhard; Schaffer, Stefan; Schmeier, Sven; Hu, Changjian; Xu, Feiyu
MOLI: Smart Conversation Agent for Mobile Customer Service
Information, 10(2):1--14
February 2019
Mahdavi, Mohammad; Abedjan, Ziawasch; Castro Fernandez, Raul; Madden, Samuel; Ouzzani, Mourad; Stonebraker, Michael; Tang, Nan
Raha: A Configuration-Free Error Detection System
SIGMOD
2019
Abedjan, Ziawasch
Data Profiling
Encyclopedia of Big Data Technologies.
2019
Çakal, Öykü Özlem; Mahdavi, Mohammad; Abedjan, Ziawasch
CLRL: Feature Engineering for Cross-Language Record Linkage
EDBT, page 678--681.
2019
Esmailoghli, Mahdi; Redyuk, Sergey; Martinez, Ricardo; Abedjan, Ziawasch; Rabl, Tilmann; Markl, Volker
Explanation of Air Pollution Using External Data Sources
BTW, page 297--300.
2019
Abedjan, Ziawasch; Boujemaa, Nozha; Campbell, Stuart; Casla, Patricia; Chatterjea, Supriyo; Consoli, Sergio; Costa Soria, Cristóbal; Czech, Paul; Despenic, Marija; Garattini, Chiara; Hamelinck, Dirk; Heinrich, Adrienne; Kraaij, Wessel; Kustra, Jacek; Lojo, Aizea; Martin Sanchez, Marga; Angel Mayer, Miguel; Melideo, Matteo; Menasalvas, Ernestina; Moller Aarestrup, Frank; Narro Artigot, Elvira; Petkovic, Milan; Reforgiato Recupero, Diego; Rodríguez González, Alejandro; Roesems Kerremans, Gisele; Roller, Roland; Romão, Mário; Rüping, Stefan; Sasaki, Felix; Spek, Wouter; Stojanovic, Nenad; Thoms, Jack; Vasiljevs, Andrejs; Verachtert, Wilfried; Wuyts, Roel
Data Science in Healthcare: Benefits, Challenges and Opportunities
Data Science for Healthcare - Methodologies and Applications
page 3--38.
2019

2018

Chen Xu, Rudi Poepsel Lemaitre, Juan Soto, Volker Markl
Fault-Tolerance for Distributed Iterative Dataflows in Action
PVLDB 11(12):1990-1993
2018

Abstract: Distributed dataflow systems (DDS) are widely employed in graph processing and machine learning (ML), where many of these algorithms are iterative in nature. Typically, DDS achieve fault-tolerance using checkpointing mechanisms or they exploit algorithmic properties to enable fault-tolerance without the need for checkpoints. Recently, for graph processing, we proposed utilizing unblocking checkpointing, to parallelize the execution pipeline and checkpoint writing, as well as confined recovery, to enable fast recovery upon partial node failures. Furthermore, for ML algorithms implemented using broadcast variables, we proposed utilizing replica recovery, to leverage broadcast variable replicas and facilitate failure recovery checkpointing-free. In this demonstration, we showcase these fault-tolerance techniques using Apache Flink. Attendees will be able to: (i) run representative iterative algorithms including PageRank, Connected Components, and K-Means, (ii) explore the internal behavior of DDS under the influence of unblocking checkpointing, and (iii) trigger failures, to observe the effects of confined recovery and replica recovery.

Traub, Jonas; Grulich, Philipp; Rodríguez Cuéllar, Alejandro; Breß, Sebastian; Katsifodimos, Asterios; Rabl, Tilmann; Markl, Volker
Scotty: Efficient Window Aggregation for out-of-order Stream Processing
page 1300-1303.
2018

Abstract: Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not support out-of-order processing and session windows. Out-of-order processing is a key requirement to deal with delayed tuples in case of source failures such as temporary sensor outages. Session windows are widely used to separate different periods of user activity from each other. In this paper, we present Scotty, a high throughput operator for window discretization and aggregation. Scotty splits streams into non-overlapping slices and computes partial aggregates per slice. These partial aggregates are shared among all concurrent queries with arbitrary combinations of tumbling, sliding, and session windows. Scotty introduces the first slicing technique which (1) enables stream slicing for session windows in addition to tumbling and sliding windows and (2) processes out-of-order tuples efficiently. Our technique is generally applicable to a broad group of dataflow systems which use a unified batch and stream processing model. Our experiments show that we achieve a throughput an order of magnitude higher than alternative state-of-the-art solutions.

Quoc-Cuong To, Juan Soto, Volker Markl
A survey of state management in big data processing systems
VLDB J. 27(6):847-872
2018

Abstract: The concept of state and its applications vary widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Heron, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays, particularly, for iterative batch and stream processing, in this survey, we present examples of state as an enabler, discuss the alternative approaches used to handle and implement state, capture the many facets of state management, and highlight new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to open problems.

Niklas Stoehr, Johannes Meyer, Volker Markl, Qiushi Bai, Taewoo Kim, De-Yu Chen, Chen Li
Heatflip: Temporal-Spatial Sampling for Progressive Heat Maps on Social Media Data
page 3723-3732.
2018

Abstract: Keyword-based heat maps are a natural way to explore and analyze the spatial properties of social media data. Dealing with large datasets, there may be many different keywords, making offline pre-computations very hard. Interactive frameworks that exploit database sampling can address this challenge. We present a novel middleware technique called Heatflip, which issues diametrically opposed samples into the temporal and spatial dimensions of the data stored in an external database. Spatial samples provide insights into the temporal distribution and vice versa. The progressive exploration approach benefits from adaptive indexing and combines the retrieval and visualization of the data in a middleware layer. Without any a priori knowledge of the underlying data, the middleware can generate accurate heat maps in 85% shorter processing times than conventional systems. In this paper, we discuss the analytical background of Heatflip, showcase its scalability, and validate its performance when visualizing large amounts of social media data.

Seibert, Felix; Peters, Mathias; Schintke, Florian
Improving I/O Performance Through Colocating Interrelated Input Data and Near-Optimal Load Balancing.
Proceedings of the IPDPSW; Fourth IEEE International Workshop on High Performance Big Data, Deep Learning and Cloud Computing (HPBDC), Volume 2018, page 448-457.
2018

Note: Best paper award

Renner, T.; Müller, J.; Kao, O.
Endolith: A Blockchain-Based Framework to Enhance Data Retention in Cloud Storages
26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), Cambridge, page 627-634.
2018

Abstract: Blockchains like Bitcoin and Ethereum have seen significant adoption in the past few years and show promise to design applications without any centralized reliance on third parties. In this paper, we present Endolith, an auditing framework for verifying file integrity and tracking file history without third party reliance using a smart contract-based blockchain. Annotated files are continuously monitored and metadata about changes including file hashes are stored tamper-proof on the blockchain. Based on this, Endolith can prove that a file stored a long time ago has not been changed without authorization or, if it has, track when it was changed and by whom. The Endolith implementation is based on Ethereum and the Hadoop Distributed File System (HDFS). Our evaluation on a public blockchain network shows that Endolith is efficient for files that are infrequently modified but often accessed, which are common characteristics of data archives.

Clemens Lutz, Sebastian Breß, Tilmann Rabl, Steffen Zeuch, Volker Markl
Efficient and Scalable k‑Means on GPUs
Datenbank-Spektrum 18(3):157-169
2018

Abstract: k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We identify two main shortcomings of this approach. First, it requires expensive data exchange between processors when switching between the two processing steps point assignment and centroid update. Second, even when processing both steps of k-means on the same processor, points still need to be read two times within an iteration, leading to inefficient use of memory bandwidth. In this paper, we present a novel approach for centroid update that allows us to efficiently process both phases of k-means on GPUs. We fuse point assignment and centroid update to execute one iteration with a single pass over the points. Our evaluation shows that our k-means approach scales to very large data sets. Overall, we achieve up to 20× higher throughput compared to the state-of-the-art approach.

Jeyhun Karimov, Tilmann Rabl, Volker Markl
PolyBench: The First Benchmark for Polystores
page 24-41.
2018

Abstract: Modern business intelligence requires data processing not only across a huge variety of domains but also across different paradigms, such as relational, stream, and graph models. This variety is a challenge for existing systems that typically only support a single or few different data models. Polystores were proposed as a solution for this challenge and received wide attention both in academia and in industry. These are systems that integrate different specialized data processing engines to enable fast processing of a large variety of data models. Yet, there is no standard to assess the performance of polystores. The goal of this work is to develop the first benchmark for polystores. To capture the flexibility of polystores, we focus on high level features in order to enable an execution of our benchmark suite on a large set of polystore solutions.

Thamsen, Lauritz; Verbitskiy, Ilya; Rabier, Benjamin; Kao, Odej
Learning Efficient Co-locations for Scheduling Distributed Dataflows in Shared Clusters
In Services Transactions on Big Data (Vol. 4, No. 1). Services Society.
2018

Abstract: Resource management systems like YARN or Mesos allow sharing cluster resources by running data-parallel processing jobs in temporarily reserved containers. Containers, in this context, are logical leases of resources as, for instance, a number of cores and main memory, allocated on a particular node. Typically, containers are used without resource isolation to achieve high degrees of overall resource utilization despite the often fluctuating resource usage of single analytic jobs. However, some combinations of jobs utilize the resources better and interfere less with each other when running on the same nodes than others. This paper presents an approach for improving the resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared cluster environments. Using a reinforcement learning algorithm, the scheduler continuously learns which jobs are best executed simultaneously on the cluster. We evaluated a prototype implementation of our approach with Hadoop YARN, exemplary Flink jobs from different application domains, and a cluster of commodity nodes. Even though the measure we use to assess the goodness of schedules can still be improved, the results of our evaluation show that our approach increases resource utilization and job throughput.

Janßen, Gerrit; Verbitskiy, Ilya; Renner, Thomas; Thamsen, Lauritz
Scheduling Stream Processing Tasks on Geo-Distributed Heterogeneous Resources.
2018 IEEE International Conference on Big Data (IEEE BigData). Presented at the First International Workshop on the Internet of Things Data Analytics (IoTDA)
2018

Abstract: Low-latency processing of data streams from distributed sensors is becoming increasingly important for a growing number of IoT applications. In these environments sensor data collected at the edge of the network is typically transmitted in a number of hops: from devices to intermediate resources to clusters of cloud resources. Scheduling processing tasks of dataflow jobs on all the resources of these environments can significantly reduce application latencies and network congestion. However, for this, schedulers need to take the heterogeneity of processing resources and network topologies into account. This paper examines multiple methods for scheduling distributed dataflow tasks on geo-distributed, heterogeneous resources. For this, we developed an optimization function that incorporates the latencies, bandwidths, and computational resources of heterogeneous topologies. We evaluated the different placement methods in a virtual geo-distributed and heterogeneous environment with an IoT application. Our results show that metaheuristic methods that take service quality metrics into account can find significantly better placements than methods that only take topologies into account, with latencies reduced by almost 50%.

Schmidtke, Robert; Schintke, Florian; Schütt, Thorsten
From Application to Disk: Tracing I/O Through the Big Data Stack.
High Performance Computing ISC High Performance 2018 International Workshops
2018
Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl
Generating custom code for efficient query execution on heterogeneous processors
VLDB J. 27(6):797-822
2018

Abstract: Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall in order to deliver improved performance. Currently, database engines have to be manually optimized for each processor, which is a costly and error-prone process. In this paper, we propose concepts to adapt to and to exploit the performance enhancements of modern processors automatically. Our core idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations. Our experimental results show that the performance of code variants may diverge up to two orders of magnitude. In order to achieve peak performance, we generate custom code for each processor. We show that our approach finds an efficient custom code variant for multi-core CPUs, GPUs, and MICs.

Behrens, Tobias; Rosenfeld, Viktor; Traub, Jonas; Breß, Sebastian; Markl, Volker
Efficient SIMD Vectorization for Hashing in OpenCL
page 489-492.
2018

Abstract: Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-level APIs in which functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex and difficult-to-maintain code. OpenCL is a parallel programming framework which provides a higher abstraction level than intrinsics and is portable to different processors. Thus, OpenCL avoids processor dependencies, which results in improved code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our results show that OpenCL-based vectorization is competitive to intrinsics on CPUs but not on Xeon Phi coprocessors.

Verbitskiy, Ilya; Thamsen, Lauritz; Renner, Thomas; Kao, Odej
CoBell: Runtime Prediction for Distributed Dataflow Jobs in Shared Clusters
10th IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
2018

Thamsen, Lauritz; Renner, Thomas; Verbitskiy, Ilya; Kao, Odej
Adaptive Resource Management for Distributed Data Analytics
In Lucio Grandinetti, Seyedeh Leili Mirtaheri, Reza Shahbazian, Thomas Sterling, Vladimir Voevodin (eds.), Advances in Parallel Computing – Big Data and HPC: Ecosystem and Convergence. IOS Press
2018

Abstract: Increasingly large datasets make scalable and distributed data analytics necessary. Frameworks such as Spark and Flink help users in efficiently utilizing cluster resources for their data analytics jobs. It is, however, usually difficult to anticipate the runtime behavior and resource demands of these distributed data analytics jobs. Yet, many resource management decisions would benefit from such information. Addressing this general problem, this chapter presents our vision of adaptive resource management and reviews recent work in this area. The key idea is that workloads should be monitored for trends, patterns, and recurring jobs. These monitoring statistics should be analyzed and used for a cluster resource management calibrated to the actual workload. In this chapter, we motivate and present the idea of adaptive resource management. We also introduce a general system architecture and we review specific adaptive techniques for data placement, resource allocation, and job scheduling in the context of our architecture.

2017

Traub, Jonas; Steenbergen, Nikolaas; Grulich, Philipp; Rabl, Tilmann; Markl, Volker
I²: Interactive Real-Time Visualization for Streaming Data
in Proc. 20th International Conference on Extending Database Technology (EDBT), March 21-24, 2017.
March 2017
Schütt, Kristof T.; Kindermans, Pieter-Jan; Sauceda, Huziel E.; Chmiela, Stefan; Tkatchenko, Alexandre; Müller, Klaus-Robert
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions
Neural Information Processing Systems (NIPS)
2017
Schütt, Kristof T.; Arbabzadah, Farhad; Chmiela, Stefan; Müller, Klaus R.; Tkatchenko, Alexandre
Quantum-chemical insights from deep tensor neural networks
In: Nature Communications 8,
January 2017
Rohrmann, Till; Schelter, Sebastian; Rabl, Tilmann; Markl, Volker
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems
in BTW 2017 (pp. 269-288)
March 2017
Renner, Thomas; Müller, Johannes; Thamsen, Lauritz; Kao, Odej
Addressing Hadoop's Small File Problem With an Appendable Archive File Format.
In the Proceedings of the Computing Frontiers Conference (CF).
2017
Montavon, Grégoire; Samek, Wojciech; Müller, Klaus-Robert
Methods for interpreting and understanding deep neural networks
Digital Signal Processing, Vol. 73:1-15, February 2018
2017
Kunft, Andreas; Katsifodimos, Asterios; Schelter, Sebastian; Rabl, Tilmann; Markl, Volker
BlockJoin: Efficient Matrix Partitioning Through Joins
Proceedings of the VLDB Endowment (PVLDB) Volume 10
2017
Kiefer, Martin; Heimel, Max; Breß, Sebastian; Markl, Volker
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models
Proceedings of the VLDB Endowment (PVLDB) Volume 10
2017
Gupta, P.; Gramatke, A.; Einspanier, R.; Schütte, M.; von Kleist, M.; Sharbati, J.
In silico cytotoxicity assessment on cultured rat intestinal cells deduced from cellular impedance measurement.
Accepted for: Toxicology in Vitro,
2017
Görnitz, N.; Lima, L. A.; Müller, K. R.; Kloft, M.; Nakajima, S.
Support Vector Data Descriptions and K-means Clustering: One Class?
IEEE Transactions on Neural Networks and Learning Systems, Volume PP, Issue 99:1-13
2017
Goldsmith, B. R.; Boley, M.; Vreeken, J.; Scheffler, M.; Ghiringhelli, L. M.
Uncovering structure-property relationships of materials by subgroup discovery.
In: New J. Phys. 19: 013031,
2017
Giotsas, V.; Smaragdakis, G.; Feldmann, A.; Berger, A.; Aben, E.
Detecting Peering Infrastructure Outages in the Wild.
In: ACM SIGCOMM
2017
Ghiringhelli, L. M.; Vybiral, J.; Ahmetcik, E.; Ouyang, R.; Levchenko, S. V.; Draxl, C.; Scheffler, M.
Learning physical descriptors for materials science by compressed sensing.
In: New J. Phys. 19: 023017,
2017
Gelß, P.; Klus, S.; Matera, S.; Schütte, Ch.
Nearest-Neighbor Interaction Systems in the Tensor Train Format.
Accepted for: J. Comp. Physics,
2017
Feldmann, Anja; Hauswirth, M.; Markl, V.
Enabling Wide Area Data Analytics with CDPPs (Collaborative Distributed Processing Pipelines).
In: IEEE Int. Conference on Distributed Computing Systems (IEEE ICDCS), Blue-Sky Ideas / Vision Track,
2017
Deng, D.; Fernandez, R.; Abedjan, Z.; Wang, S.; Stonebraker, M.; Elmagarmid, A.; Ilyas, I.; Madden, S.; Ouzzani, M.; Tang, N.
The Data Civilizer System.
In: CIDR
2017
Conrad, T.; Genzel, M.; Cvetkovic, N.; Wulkow, N.; Leichtle, A.; Vybiral, J.; Kutyniok, G.; Schütte, Ch.
Sparse Proteomics Analysis - a compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data.
In: BMC Bioinformatics, 18(160),
2017
Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.; Schütt, K. T.; Müller, K.-R.
Machine learning of accurate energy-conserving molecular force fields.
In: Science Advances, 3(5),
2017
Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K. R.
Bypassing the Kohn-Sham equations with machine learning
Nature Communications, 8(1), 872.
2017
Bosse, S.; Maniry, D.; Müller, K. R.; Wiegand, Th.; Samek, W.
Deep neural networks for no-reference and full-reference image quality assessment
IEEE Transactions on Image Processing.
2017
Boley, M.; Goldsmith, B.; Ghiringhelli, L.; Vreeken, J.
Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery.
In: Data Mining and Knowledge Discovery 5(2017),
2017
Boden, Ch.; Spina, A.; Rabl, T.; Markl, V.
Benchmarking Data Flow Systems for Scalable Machine Learning.
In: BeyondMR@SIGMOD
2017
Bergen, E.; Edlich, St.
Post-Debugging in Large Scale Analytic Systems.
In: Datenbanksysteme für Business, Technologie und Web (BTW), Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), page 65-72.
2017
Alber, Maximilian; Zimmert, Julian; Dogan, Urun; Kloft, Marius
Distributed optimization of multi-class SVMs
PLOS ONE 12(6): e0178161,
2017
Alber, Maximilian; Kindermans, Pieter-Jan; Schütt, Kristof T.; Müller, Klaus-Robert; Sha, Fei
An Empirical Study on The Properties of Random Bases for Kernel Methods
Advances in Neural Information Processing Systems 30 (NIPS)
2017

2016

Yukawa, Masahiro; Müller, Klaus-Robert
Why Does a Hilbertian Metric Work Efficiently in Online Learning With Kernels?
In: IEEE Signal Processing Letters, 23(10):1424-1428
2016
Xu, Chen; Holzmer, Marcus; Kaul, Manohar; Markl, Volker
Efficient Fault-tolerance for Iterative Graph Processing on Distributed Dataflow Systems
In: 32nd IEEE International Conference on Data Engineering (ICDE)
2016
Vidovic, Marina M.-C.; Görnitz, Nico; Müller, Klaus R; Kloft, Marius
Feature Importance Measure for Non-linear Learning Algorithms
Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems,
November 2016
Verbitskiy, Ilya; Thamsen, Lauritz; Kao, Odej
When to Use a Distributed Dataflow Engine: Evaluating the Performance of Apache Flink.
International IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), IEEE
2016
Treder, Matthias S.; Porbadnigk, Anne K.; Forooz, Shahbazi; Müller, Klaus-Robert; Blankertz, Benjamin
The LDA beamformer: Optimal estimation of ERP source time series using linear discriminant analysis
In: NeuroImage, 129:279-291
2016
Thamsen, Lauritz; Verbitskiy, Ilya; Schmidt, Florian; Renner, Thomas; Kao, Odej
Selecting Resources for Distributed Dataflow Systems According to Runtime Targets.
In the Proceedings of the IEEE 35th International Performance Computing and Communications Conference (IPCCC).
2016
Thamsen, Lauritz; Renner, Thomas; Kao, Odej
Continuously Improving the Resource Utilization of Iterative Parallel Dataflows.
In the Proceedings of the IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW). Presented at the International Workshop on Big Data and Cloud Performance (DCPerf). IEEE.
2016
Thamsen, Lauritz; Renner, Thomas; Byfeld, Marvin; Paeschke, Markus; Schröder, Daniel; Böhm, Felix
Visually Programming Dataflows for Distributed Data Analytics.
In the Proceedings of the IEEE International Conference on Big Data (Big Data).
2016
Schmidtke, Robert; Laubender, Guido; Steinke, Thomas
Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink
In: Cray User Group (CUG) 2016 Proceedings
2016
Sannelli, C.; Vidaurre, C.; Müller, K. R.; Blankertz, B.
Ensembles of adaptive spatial filters increase BCI performance: an online evaluation.
In: Journal of neural engineering, 13(4), 046003
2016
Samek, Wojciech; Blythe, Duncan A. J.; Curio, Gabriel; Müller, Klaus-Robert; Blankertz, Benjamin; Nikulin, Vadim V.
Multiscale temporal neural dynamics predict performance in a complex sensorimotor task
In: NeuroImage, 141:291-303
2016
Renner, Thomas; Thamsen, Lauritz; Kao, Odej
CoLoc: Distributed Data and Container Colocation for Data-Intensive Applications.
In the Proceedings of the IEEE International Conference on Big Data (Big Data).
2016
Pronobis, W.; Panknin, D.; Kirschnick, J.; Srinivasan, V.; Samek, W.; Markl, V.; Kaul, M.; Müller, K.-R.; Nakajima, S.
Sharing Hash Codes for Multiple Purposes.
arXiv:1609.03219,
2016
Nakajima, S.; Tomioka, R.; Sugiyama, M.; Babacan, S. D.
Condition for Perfect Dimensionality Recovery by Variational Bayesian PCA.
In: Journal of Machine Learning Research, 16(3): 3757-3811,
2016
Min, B. K.; Dähne, S.; Ahn, M. H.; Noh, Y. K.; Müller, K. R.
Decoding of top-down cognitive processing for SSVEP-controlled BMI.
In: Sci. Rep. 6, 36267,
2016
Lapuschkin, S.; Binder, A.; Montavon, G.; Müller, K. R.; Samek, W.
Analyzing classifiers: Fisher vectors and deep neural networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, page 2912-2920.
2016
Lapuschkin, Sebastian; Binder, Alexander; Montavon, Grégoire; Müller, Klaus-Robert; Samek, Wojciech
The LRP Toolbox for Artificial Neural Networks
In: Journal of Machine Learning Research, 17(114):1-5
2016
Krause, S.; Xu, F.; Uszkoreit, H.; Weißenborn, D.
Event Linking with Sentential Features from Convolutional Neural Networks.
In: Proceedings of CoNLL, Association for Computational Linguistics
2016
Krause, Sebastian; Hennig, Leonhard; Moro, Andrea; Weissenborn, Dirk; Xu, Feiyu; Uszkoreit, Hans; Navigli, Roberto
Sar-graphs: A language resource connecting linguistic knowledge with semantic relations from knowledge graphs.
In: Journal of Web Semantics: Science, Services and Agents on the World Wide Web,
2016
Koltai, P.; Ciccotti, G.; Schütte, Ch.
On Markov state models for non-equilibrium molecular dynamics.
In: The Journal of Chemical Physics 145, 174103,
2016

Note: (Editors' Choice of The Journal of Chemical Physics)

Kindermans, Pieter-Jan; Schütt, Kristof T.; Müller, Klaus R.; Dähne, Sven
Investigating the influence of noise and distractors on the interpretation of neural networks
Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems,
November 2016
Jugel, U.; Jerzak, Z.; Markl, V.
Big data on a few pixels.
In: IEEE International Conference on Big Data (Big Data), page 895-900.
2016
Jugel, U.; Jerzak, Z.; Hackenbroich, G.; Markl, V.
VDDA: automatic visualization-driven data aggregation in relational databases
In: VLDB J. 25(1): 53-77,
2016

Note: (VLDB Best Paper Award)

Höhne, Johannes; Bartz, Daniel; Hebart, Martin N.; Müller, Klaus-Robert; Blankertz, Benjamin
Analyzing neuroimaging data with subclasses: a shrinkage approach
NeuroImage, 124:740-751
2016
Herb, Tobias; Thamsen, Lauritz; Renner, Thomas; Kao, Odej
Aura: A Flexible Dataflow Engine for Scalable Data Processing.
in Andreas Knüpfer, Tobias Hilbrich, Christoph Niethammer, José Gracia, Wolfgang E. Nagel, Michael M. Resch (eds.), Tools for High Performance Computing 2015. Springer.
2016
Hennig, L.; Thomas, Ph.; Ai, R.; Kirschnick, J.; Wang, H.; Pannier, J.; Zimmermann, N.; Schmeier, S.; Xu, F.; Ostwald, J.; Uszkoreit, H.
Real-Time Discovery and Geospatial Visualization of Mobility and Industry Events from Large-Scale, Heterogeneous Data Streams.
In: Proceedings of ACL
2016
Gül, S.; Meyer, J. T.; Hellge, C.; Schierl, Th.; Samek, W.
Hybrid Video Object Tracking in H.265/HEVC Video Streams.
In: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), 1-5
2016
Ghiringhelli, L. M.; Carbogno, C.; Levchenko, S.; Mohamed, F.; Huhs, G.; Lueders, M.; Oliveira, M.; Scheffler, M.
Towards a Common Format for Computational Materials Science Data.
Published as: "Ψk Scientific Highlight of the Month", 131,
2016
Gabryszak, Aleksandra; Krause, Sebastian; Hennig, Leonhard; Xu, Feiyu; Uszkoreit, Hans
Relation- and phrase-level linking of FrameNet with Sar-graphs.
In: The 10th International Conference on Language Resources and Evaluation, LREC
2016
Fuerst, Carlo; Schmid, Stefan; Suresh, Lalith; Costa, Paolo
Kraken: Online and Elastic Resource Reservations for Multitenant Datacenters.
Proc. 35th IEEE Conference on Computer Communications (INFOCOM).
2016
Fajerski, J.; Noack, M.; Reinefeld, A.; Schintke, F.; Steinke, Th.
Fast In-Memory Checkpointing with POSIX API for Legacy Exascale-Applications.
H.J. Bungartz, P. Neumann, W.E. Nagel (Eds): Software for Exascale Computing, 2013-2015, Springer Lecture Notes in Computational Science and Engineering, 113: 427-441,
2016
Eichler, Kathrin; Xu, Feiyu; Uszkoreit, Hans; Hennig, Leonhard; Krause, Sebastian
TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns.
In: The 10th International Conference on Language Resources and Evaluation, LREC
2016
Bosse, S.; Maniry, D.; Wiegand, Th.; Samek, W.
A Deep Neural Network for Image Quality Assessment.
In: Proceedings of the IEEE International Conference on Image Processing (ICIP) , page 3773-77.
2016
Bosse, S.; Maniry, D.; Müller, K.-R.; Wiegand, Th.; Samek, W.
Neural Network-Based Full-Reference Image Quality Assessment.
In: Proceedings of the Picture Coding Symposium (PCS), 1-5 , page 3773-77.
2016
Bosse, S.; Chen, Q.; Siekmann, M.; Samek, W.; Wiegand, Th.
Shearlet-based reduced reference image quality assessment.
In: IEEE International Conference on Image Processing (ICIP)
2016
Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert
Robust Statistical Detection of Power-Law Cross-Correlation
Scientific Reports, 6:27089
2016
Binder, Alexander; Bach, Sebastian; Montavon, Grégoire; Müller, Klaus-Robert; Samek, Wojciech
Layer-Wise Relevance Propagation for Deep Neural Network Architectures
In: Information Science and Applications (ICISA), page 913-922.
2016
Bauer, Alexander; Nakajima, Shinichi; Müller, K. R.
Efficient Exact Inference With Loss Augmented Objective in Structured Learning.
IEEE transactions on neural networks and learning systems Volume PP, no. 99 , page 1 - 14.
August 2016
Arbabzadah, Farhad; Montavon, Grégoire; Müller, Klaus-Robert; Samek, Wojciech
Identifying individual facial expressions by deconstructing a neural network
In: German Conference on Pattern Recognition , page 344 - 354.
Springer International Publishing
2016
Alexandrov, A.; Salzmann, A.; Krastev, G.; Katsifodimos, A.; Markl, V.
Emma in Action: Declarative Dataflows for Scalable Data Analysis.
In: SIGMOD Record, Volume 45(1), page 51-58.
2016
Alber, Maximilian; Zimmert, Julian; Dogan, Urun; Kloft, Marius
Distributed Optimization of Multi-Class SVMs
Extreme Classification NIPS 2016 Workshop,
November 2016
Abedjan, Z.; Chu, X.; Deng, D.; Fernandez, R.; Ilyas, I.; Ouzzani, M.; Papotti, P.; Stonebraker, M.; Tang, N.
Detecting Data Errors: Where are we and What needs to be done?
PVLDB 9(12):993-1004,
2016

2015

Weißenborn, Dirk; Hennig, Leonhard; Xu, Feiyu; Uszkoreit, Hans
Multi-objective Optimization for the Joint Disambiguation of Nouns and Named Entities
In: 53rd Annual Meeting of the Association for Computational Linguistics, ACL, page 596-605.
2015
Weißenborn, Dirk; Xu, Feiyu; Uszkoreit, Hans
DFKI: Multi-objective Optimization for the Joint Disambiguation of Entities and Nouns & Deep Verb Sense Disambiguation
In: 9th International Workshop on Semantic Evaluation (SemEval 2015)
2015
Rosenfeld, Viktor; Heimel, Max; Viebig, Christoph; Markl, Volker
The Operator Variant Selection Problem on Heterogeneous Hardware,
In: ADMS@VLDB 2015, page 1-12.
2015
Renner, Thomas; Thamsen, Lauritz; Kao, Odej
Network-Aware Resource Management for Scalable Data Analytics Frameworks.
In Proceedings of the First Workshop on Data-Centric Infrastructure for Big Data Science (DIBS) 2015, co-located with the 2015 IEEE International Conference on BigData (BigData). IEEE.
2015
Reinefeld, A.; Schütt, Ch.; Döbbelin, R.
Fast Memory Access for Data Intensive Applications.
In: Forschung im HLRN-Verbund 2015, Konrad-Zuse-Zentrum für Informationstechnik Berlin (Hrsg.): 304-305,
2015
Pujol, Enric; Hohlfeld, Oliver; Feldmann, Anja
Annoyed Users: Ads and Ad-Block Usage in the Wild.
In Proceedings of the 2015 Internet Measurement Conference (IMC '15), ACM, New York, NY, USA, page 93-106.
2015
Nakajima, S.; Tomioka, R.; Sugiyama, M.; Babacan, S.D.
Condition for Perfect Dimensionality Recovery by Variational Bayesian PCA,
Journal of Machine Learning Research, vol.16, pp.3757-3811
2015
Lohrmann, Björn; Janacik, Peter; Kao, Odej
Elastic Stream Processing with Latency Guarantees
In: IEEE 35th International Conference on Distributed Computing Systems (ICDCS) , page 399-410.
July 2015
Li, Hong; Krause, Sebastian; Xu, Feiyu; Moro, Andrea; Uszkoreit, Hans; Navigli, Roberto
Improvement of n-ary Relation Extraction by Adding Lexical Semantics to Distant-Supervision Rule Learning.
In: International Conference on Agents and Artificial Intelligence (ICAART)
2015
Krause, Sebastian; Hennig, Leonhard; Gabryszak, Aleksandra; Xu, Feiyu; Uszkoreit, Hans
Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with Language.
Workshop on Linked Data in Linguistics: Resources and Applications, co-located with the Annual Meeting of the Association for Computational Linguistics (LDL @ ACL),
2015
Krause, Sebastian; Alfonseca, Enrique; Filippova, Katja; Pighin, Daniele
Learning a Distributed Representation for Event Patterns.
In: Conference of the North American Chapter of the ACL – Human Language Technologies (NAACL HLT)
2015
Herb, Tobias; Renner, Thomas; Kao, Odej
Aura: A Flexible Dataflow Engine for Scalable Data Processing.
In: Tools for High Performance Computing: 117-126,
2015
Hennig, Leonhard; Li, Hong; Krause, Sebastian; Xu, Feiyu; Uszkoreit, Hans
A Web-based Collaborative Evaluation Tool for Automatically Learned Relation Extraction Patterns.
Annual Meeting of the Association for Computational Linguistics (ACL), System Demonstrations,
2015
Hansen, S.T.; Winkler, I.; Hansen, L.K.; Müller, K.-R.; Dähne, S.
Fusing simultaneous EEG and fMRI using functional and anatomical information,
In International Workshop on Pattern Recognition in Neuroimaging, 2015. IEEE.
2015
Ghiringhelli, L.M.; Vybiral, J.; Levchenko, S.V.; Draxl, C.; Scheffler, M.
Big Data of Materials Science: Critical Role of the Descriptor,
Phys. Rev. Lett. 114 (10),
March 2015
Dudoladov, Sergey; Katsifodimos, Asterios; Xu, Chen; Ewen, Stephan; Markl, Volker; Schelter, Sebastian; Tzoumas, Kostas
Optimistic Recovery for Iterative Dataflows in Action.
In Proceedings of the 2015 ACM SIGMOD International conference on Management of Data (SIGMOD '15).
2015
Djurdjevac, C.; Banisch, R.; Schütte, Ch.
Modularity of Directed Networks: Cycle Decomposition Approach.
In: Journal of Computational Dynamics 2(1): 1-24,
2015
Dähne, S.; Goltz, D.; Gundlach, C.; Mehnert, J.; Villringer, A.; Haufe, S.; Müller, K.-R.
Multivariate Fusion of EEG oscillations and fMRI using multimodal Source Power Co-modulation (mSPoC),
In Annual Meeting of the Organization for Human Brain Mapping (OHBM).
2015
Carbone, P.; Katsifodimos, A.; Ewen, St.; Markl, V.; Haridi, S.; Tzoumas, K.
Apache Flink™: Stream and Batch Processing in a Single Engine.
In: IEEE Data Eng. Bull. 38(4),
2015
Bhattacharya, S.; Sonin, B.; Jumonville, C.J.; Ghiringhelli, L.M.; Marom, N.
Computational Design of Nanoclusters by Property-Based Genetic Algorithms: Tuning the Electronic Properties of (TiO2)n Clusters.
In: Phys. Rev. B 91, 241115,
June 2015
Bergen, E.; Edlich, St.
Towards a Taxonomy for Error Management in Big Data Analytics Systems.
In: Proceedings of the Research Day 2015, Beuth University of Applied Sciences Berlin, page 80-83.
2015
Bauer, Alexander; Braun, Mikio; Müller, Klaus-Robert
Accurate Max-Margin Training for Parsing with Context-Free Grammars,
IEEE Transactions on Neural Networks and Learning Systems. Volume 28(1) , page 44-45.
2015
Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.-R.; Samek, W.
On Pixel-wise Explanations for Non-Linear Classifier Decisions by Layer-wise Relevance Propagation,
In: PLOS ONE 10(7),
2015
Alexandrov, Alexander; Kunft, Andreas; Katsifodimos, Asterios; Schüler, Felix; Thamsen, Lauritz; Kao, Odej; Herb, Tobias; Markl, Volker
Implicit Parallelism through Deep Language Embedding
In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15).
2015
Ai, Renlong; Krause, Sebastian; Kasper, Walter; Xu, Feiyu; Uszkoreit, Hans
Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations.
Workshop on Natural Language Processing Techniques for Educational Applications at the Annual Meeting of the Association for Computational Linguistics (NLP-TEA @ ACL),
2015

2014

Nakajima, S.; Sato, I.; Sugiyama, M.; Watanabe, K.; Kobayashi, H.
Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity than MAP,
Twenty-Eighth Annual Conference on Neural Information Processing Systems (NIPS2014).
2014
Bauer, A.; Görnitz, N.; Biegler, F.; Müller, K.-R.; Kloft, M.
Efficient algorithms for exact inference in sequence labeling SVMs.
In: IEEE transactions on neural networks and learning systems 25(5): 870-881,
2014