Innovation Lab

In our Innovation Lab, student teams of 5-6 people spend a semester working on Data Science projects. Students are intensively supervised by a team of experienced coaches over the whole project period. We welcome projects from industry as well as research institutions and NGOs; please contact us if you have an interesting problem to solve. Our students will deliver a proof of concept using state-of-the-art machine learning techniques, with the option to purchase the usage rights.


Argument Mining for Peer-Reviewing

In this project, students applied techniques from Argument Mining to a new corpus of peer reviews for scientific publications. Students implemented a pipeline for the automatic identification of arguments in peer reviews and demonstrated empirically the importance of arguments in the decision-making process. The work was presented at AAAI-21.


Modeling of Earth’s Radiation Environment

The sun continually emits electrically charged particles. These particles are accelerated or decelerated by the Earth’s magnetic field. High-energy particles can pose severe threats to satellite operations and affect electricity plants on the ground. In this project, which is a collaboration with LMU Geophysics, the students developed predictive models for proton intensities in space based on geomagnetic and solar activity indices. The models were applied to investigate the correlation between proton intensities and measurement corruptions of an existing spacecraft, and to forecast proton intensities so that satellite operators can protect their instruments.

Entity Alignment

Knowledge graphs (KGs) are a way to represent facts in a structured form that machines can efficiently process. There exist several large-scale common-knowledge KGs, such as Wikidata or the Google Knowledge Graph, but also more specialized ones, for instance, biomedical ones such as Hetionet. To combine information from different sources, entities from one graph have to be recognized in the other one, despite potentially additional labels, descriptions, or associated data. This task is commonly referred to as Entity Alignment (EA). While humans can easily collect and combine information about an entity from different sources, the task remains challenging for Machine Learning methods. In this project, the students investigated several state-of-the-art entity alignment methods based on Graph Neural Networks (GNNs) and Generative Adversarial Networks (GANs). They re-implemented the techniques in a common framework, compared the code published by the authors to the method described in the papers, and tried to reproduce the reported results.

Generalization of Argument Mining Models

In this project, we studied how well several state-of-the-art argument detection models generalize across multiple argumentation schemes.

KDD Cup 2020

In this project, we used data from the KDD Cup 2020 to create a realistic taxi-dispatching simulation environment for Reinforcement Learning. The data was analyzed, cleaned, and used to model the agent’s idle movement within the simulation and the taxi requests of passengers. Different kinds of policies were then implemented and evaluated, e.g., using the Kuhn-Munkres algorithm and a value-based Reinforcement Learning algorithm.
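
To illustrate how a Kuhn-Munkres dispatching policy can work, the following is a minimal sketch, not the project's actual code. It assumes a hypothetical cost matrix in which rows are idle taxis, columns are open requests, and each entry is a pickup cost (for example, a distance); the function name `hungarian` and the example numbers are illustrative only.

```python
def hungarian(cost):
    """Kuhn-Munkres (Hungarian) algorithm for an n x m cost matrix, n <= m.

    Returns (total_cost, assignment), where assignment[i] is the column
    (request) assigned to row (taxi) i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: row matched to column j (1-indexed, 0 = free)
    way = [0] * (m + 1)   # predecessor columns on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Hypothetical pickup costs: 3 idle taxis (rows) x 3 open requests (columns).
pickup_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = hungarian(pickup_cost)
print(total, assignment)
```

In a simulation, such an assignment would typically be recomputed at every dispatching step over the currently idle taxis and open requests.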

Contact-Averse Reinforcement Learning

In this project, we applied Multi-agent Reinforcement Learning techniques to teach agents to avoid contact with each other while at the same time trying to get to their target destination as quickly as possible. For that, a flexible grid environment with different agent observations and rewards was implemented. Then, we trained Deep Q-Learning agents to navigate the environments and avoid each other, comparing them against a baseline of shortest-path agents that ignore other agents.

Sustainability Mirror for European Cities

A team of 5 students, in cooperation with a large national energy provider, worked on a dashboard for various sustainability metrics for cities, making available data for air pollution, urbanization, renewable energy production and more. They combined multiple open data sources which are available for most major cities to provide metrics that are comparable across cities and can be updated automatically.


Energy Consumption Prediction Challenge

A team worked on the Energy Consumption Prediction Challenge and built a well-performing model along with an analytics dashboard that lets users predict energy consumption for custom buildings (see the GitHub page). In the second phase of the project, the team built a webpage that allows users to predict the energy consumption of their buildings based on the model from the challenge.

IASS Social Sustainability Barometer

Another team built an interactive dashboard for exploring results from the IASS Social Sustainability Barometer in collaboration with the Institute for Advanced Sustainability Studies in Potsdam, Germany. The dashboard, built with R and Shiny, shows the results of the social sustainability survey geographically and over time.

mlr3forecasting: Time Series Forecasting with Machine Learning

Students developed mlr3forecasting, an R package for time series forecasting with machine learning. Its goal is to facilitate time series forecasting, e.g., predicting global temperatures. It extends the popular machine learning framework mlr3.

Disaster Prediction Challenge

Students trained deep neural network models for the xView2 disaster prediction challenge. The challenge’s goal is to find and localize damage from natural disasters on satellite images. The students built a model based on a U-Net architecture with a ResNet backbone and adaptive loss functions that was able to classify and localize damage types.


Anomaly Detection in X-Ray Images

In this project, the goal was to find anomalies in X-Ray images in a completely unsupervised fashion, without using labeled data. This project was done in cooperation with Deepc, a start-up company founded by LMU students.

Cross-Region Spatial Interpolation

In this project, the students dealt with spatial interpolation of weather information in mountain regions. Our industry partner provided us with data from different regions. One of the main challenges in the project is to learn a model that can be applied to new regions without the need for re-training.


The students participated in a real Data Science challenge, where they had to compete against 800 teams. The goal of the challenge was to determine the best route for a user among the alternatives proposed by a transportation app.
Poster, Presentation

Knowledge graphs are a versatile tool to represent structured data used, for example, in the Wikidata project or for the Google Knowledge Graph. Link prediction aims at predicting missing links in order to enrich the knowledge base. This project’s focus is on combining different models into an ensemble in order to exploit the individual models’ strengths.
Poster, Presentation

Super Resolution

In object detection challenges, neural architectures often fail to detect small objects in images. Thanks to recent developments in the research area of super-resolution, it is now possible to improve the quality of images. The students applied recently introduced super-resolution techniques to improve object detection performance.

Knowledge-based Argumentation

Argument mining is one of the hardest problems in Natural Language Processing. The main challenge is that arguments are structurally similar to purely informative texts, and only differ semantically. In this project, students utilized background information from knowledge graphs for better argument mining.
Poster, Presentation

The Edmonton Tree Project

Aerial images offer a cost-efficient way to automatically generate census information about the biodiversity in urban environments. In this project, the students developed a neural-network-based object detection and recognition method for registering and classifying trees in the city of Edmonton.
Poster, Presentation

Accessibility to public transport in Munich

A group of students worked together with Green City EV, analyzing accessibility for residents of the city of Munich with respect to several modes of public transportation. As a result, students created an interactive dashboard that visualized access to public transportation at the borough level and allowed for analyzing the influence of so-called ‘mobility stations’, hubs where public transport and private transport (carsharing, bikesharing) are brought together.



A group of two students developed the open-source software ‘ggparty’. ggparty is a successful extension for the famous R package ‘ggplot2’, which allows for visualization of tree-structured models from the ‘partykit’ package. It provides the necessary tools to create clearly structured and highly customizable visualizations for tree-objects of the class ‘party’.


mlr playground

In teaching, it is often vital to visualize properties of machine learning algorithms so that students can better understand their inner workings. Two students worked on the mlr playground, a web app for education that lets users ‘play’ with machine learning models. More information can be found on the project website.


A Labeling Tool for Object Detection with Active Learning

The goal of supervised learning is to learn a function that maps an input to an output based on input-output pairs. At training time, deep learning algorithms generally require a large number of labeled training instances, which are fairly rare in many domains. In practice, sets of labeled data are often curated manually, which is not only an unattractive job but also time-consuming and expensive.

In this project, which is an industry project commissioned by Harman International Inc., the students developed a labeling tool for object detection in image data which is additionally supported by active learning to reduce the amount of manual labeling effort. While training a deep object detection network in the background, the tool automatically selects unlabeled images that are, with respect to some evaluation metric, expected to improve the object detection network most. These images are shown to human labelers and subsequently used for training the network.
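
The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not name the selection criterion, so the sketch swaps in a simple uncertainty-sampling score (average of one minus the detection confidence), and the function names, the input format, and the choice to treat images without detections as maximally uncertain are all hypothetical.

```python
def uncertainty(confidences):
    """Average (1 - confidence) over all detections in one image.

    Assumption: an image with no detections is treated as maximally
    uncertain, since the detector may be missing objects entirely.
    """
    if not confidences:
        return 1.0
    return sum(1.0 - c for c in confidences) / len(confidences)


def select_for_labeling(predictions, k):
    """Return the k unlabeled image ids the detector is least sure about.

    predictions: dict mapping image id -> list of detection confidences
    produced by the current detection network (hypothetical format).
    """
    ranked = sorted(
        predictions,
        key=lambda image_id: uncertainty(predictions[image_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical detector outputs for four unlabeled images.
predictions = {
    "img_a": [0.95, 0.90],
    "img_b": [0.55, 0.60],
    "img_c": [],
    "img_d": [0.80],
}
print(select_for_labeling(predictions, k=2))
```

The selected images would then be shown to human labelers, added to the training set, and the loop repeated after the network has been updated.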

Is this movie worth watching? Predicting the IMDb rating based on heterogeneous information

Have you ever seen a movie trailer and asked yourself whether you should spend the money watching this movie at the cinema? What are the decisive factors: the cast, the budget, the genre, the plot?

In this project, a group of students tackled this question by developing an AI approach to predict a movie’s average IMDb rating. Via the web interface, the user can provide any information about a movie, e.g., the plot as text, the movie’s poster as an image file, and/or simply meta information such as actors, duration, genre, and so on. This information is collected, preprocessed, and given as input to the multi-component neural network in the backend, which performs the actual regression for predicting the IMDb movie rating.

KDD Cup 2018 - Forecasting the Air Pollution

The KDD Cup is an annual competition in Data Mining and Knowledge Discovery organized by SIGKDD alongside the KDD conference.

In 2018, the participants were requested to predict the concentration of several air pollutants for London and Beijing. Given the historical measurement data from several air quality stations and weather data from meteorology stations, the task was to combine the weather forecast for the next 48 hours with this data to obtain a forecast of the air pollutants’ concentrations. Inspired by this competition, the group built a system that automatically retrieves the data, stores them in a database, and trains a variety of machine learning models on the collected data. Also, the process of evaluating the different models and their configurations is automated.

An appealing web interface allows exploring the data, the predictions of the models, and their quality.

Explainable AI - Investigating the Activations within Deep Convolutional Networks

Despite the overwhelming success of deep convolutional networks in a broad variety of applications, most prominently image classification, their inner workings are still not fully understood. These networks comprise many layers, and each layer consists of a number of so-called channels. For a fixed layer and channel, the values computed across different image locations are called activations and measure the presence of the learned feature. It is known that deeper layers learn more abstract and invariant features than early layers. In order to facilitate a better understanding of these features, the group built a system for explorative analysis of the activations of the famous Inception V3 model. Via a web interface, users can investigate the distribution of activation values faceted by class labels.

Moreover, the tool shows a ranking of image patches that yielded the highest activation values, i.e. the patches that a feature responds to the most.
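
A ranking of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than the tool's implementation: it takes precomputed activation maps for one fixed layer and channel as plain nested lists (a hypothetical input format), and returns the locations with the highest activation values across all images.

```python
import heapq


def top_activations(activation_maps, k):
    """Rank image locations by activation for one fixed layer and channel.

    activation_maps: dict mapping image id -> 2D list of activation values
    (hypothetical format; a real system would read these from the network).
    Returns k tuples (value, image_id, row, col), highest activation first.
    """
    candidates = (
        (value, image_id, r, c)
        for image_id, amap in activation_maps.items()
        for r, row in enumerate(amap)
        for c, value in enumerate(row)
    )
    return heapq.nlargest(k, candidates)


# Hypothetical 2x2 activation maps of a single channel for two images.
maps = {
    "cat.jpg": [[0.1, 0.9], [0.3, 0.2]],
    "dog.jpg": [[0.7, 0.0], [0.4, 0.8]],
}
for value, image_id, row, col in top_activations(maps, k=3):
    print(image_id, (row, col), value)
```

Each returned location corresponds to a patch of the input image (the receptive field of that activation), which is what such a tool would crop and display.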

Automatic statistician for exploratory data analysis

In this project, the student team developed an R package, AEDA, that automatically analyzes data, streamlining the manual steps of exploratory data analysis. The package automates a wide range of common data science tasks: basic data summaries, correlation analysis, cluster analysis, principal component analysis, and more. All results are automatically compiled into a report.


Explaining machine learning predictions with game theory

In this project, the students developed an R package, ShapleyR, that implements Shapley values for explaining machine learning predictions. Shapley values are a concept from game theory that can also be used to better understand the decision making of machine learning models.
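
To make the idea concrete, here is a minimal sketch of the exact Shapley value computation for a tiny game. It does not show the ShapleyR API; the function name `shapley_values` and the toy value function are assumptions chosen for illustration. Each feature is treated as a "player", and `value(S)` stands for the model output when only the features in `S` are present.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions.

    players: list of hashable player (feature) names.
    value: function mapping a frozenset of players to a number.
    The cost grows exponentially with the number of players, so this is
    only practical for small examples.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            # Weight of a coalition of size r that does not contain p.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical model: 2*a + 3*b + 4*a*b, with a, b in {0, 1}."""
    a = 1 if "a" in coalition else 0
    b = 1 if "b" in coalition else 0
    return 2 * a + 3 * b + 4 * a * b


phi = shapley_values(["a", "b"], toy_value)
print(phi)
```

In this toy game the interaction term is split evenly between the two features, and the Shapley values sum to the value of the full coalition, which is the efficiency property that makes them attractive for attributing a prediction to individual features.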

Interactive plots and tables for benchmarking results

The students developed an R package ‘benchmarkViz’ for comparing the results of benchmark studies. The project involved proposing a standard format in which arbitrary benchmark results, be it run time of algorithms or the performance of machine learning models, can be encoded and easily visualized.


Interactive plots for interpretation of global and local effects

The students implemented a dashboard for explaining machine learning models. The dashboard allows users to upload a model and automatically visualize it with different types of plots.



Prof. Dr. Thomas Seidl

Head of group
Databases and Data Mining

Prof. Dr. Bernd Bischl

Head of group
Statistical Learning and Data Science

Prof. Dr. Dieter Kranzlmüller

Leibniz Rechenzentrum

Evgeniy Faerman


Florian Pfisterer


Christoph Molnar


Collaborate with us

We are always looking for project partners from industry and academia. The Innovation Lab is an excellent opportunity for you to get smart and eager students to work on your data science and machine learning projects. Teams of 5-6 students work for 3-4 months on your project, supervised by experienced researchers from LMU Munich’s Innovation Lab. In our experience, students are highly motivated to work on real projects, which has resulted in many successful project completions to the satisfaction of our project partners.

Specifically, we are looking for projects that involve some application of methods in Machine Learning or Data Science and that provide interesting challenges to be solved by our students. We especially welcome projects with relatively concrete goals, which can then be implemented by students.

If you are interested, feel free to drop us a line. We will gladly help you find out whether your project is suitable for the Innovation Lab and work out the right scope.

Promoted by the Bavarian State Ministry of Science and Art and coordinated by BiDT