Knowledge Discovery in Databases, Spatial Data Mining, Spatio-Temporal Indexing, Bioinformatics.
My current research focuses on the following sub-areas:
Cluster analysis is a primary method for database mining, where the goal is to find the "natural" groups in a data set based on a similarity or dissimilarity function for pairs of objects. It is often used as a first step when studying a data set, in order to focus further analysis on interesting subgroups. Without special support, however, most clustering algorithms have a high computational complexity. The goal of this project is to develop highly scalable but still effective clustering methods, based on data summarization and suitable index structures.
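As a rough illustration of what "data summarization" can mean here, the following sketch compresses a group of points into a few sufficient statistics (count, per-dimension sum, sum of squared norms), from which a centroid and a radius can be recovered without revisiting the raw data. This is an assumed, generic scheme in the spirit of clustering-feature summaries, not necessarily the method developed in the project; the names `Summary`, `summarize`, `merge`, `centroid`, and `radius` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Summary:
    """Sufficient statistics for a non-empty group of d-dimensional points."""
    n: int              # number of points summarized
    linear_sum: list    # per-dimension sum of coordinates
    square_sum: float   # sum of squared Euclidean norms


def summarize(points):
    """Compress a non-empty list of points into a Summary (hypothetical helper)."""
    dim = len(points[0])
    linear_sum = [0.0] * dim
    square_sum = 0.0
    for p in points:
        for i, x in enumerate(p):
            linear_sum[i] += x
            square_sum += x * x
    return Summary(len(points), linear_sum, square_sum)


def merge(a, b):
    """Summaries are additive, so two groups combine without the raw points."""
    return Summary(
        a.n + b.n,
        [x + y for x, y in zip(a.linear_sum, b.linear_sum)],
        a.square_sum + b.square_sum,
    )


def centroid(s):
    return [x / s.n for x in s.linear_sum]


def radius(s):
    """Root mean squared distance of the summarized points to their centroid."""
    c = centroid(s)
    variance = s.square_sum / s.n - sum(x * x for x in c)
    return sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors
```

Because the statistics are additive, a clustering algorithm could operate on a small number of such summaries instead of on every object, which is one way scalability can be gained at a modest cost in accuracy.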
- Spatial Data Mining
The main difference between data mining in relational and in spatial databases (such as geographic information systems) is that attributes of the neighbors of some object of interest may have an influence on the object. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood (such as topological, distance and direction relations). The main objective of this project is to develop effective and efficient data mining techniques that take neighborhood relations into account when looking for patterns in a spatial database.
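To make the idea of implicit neighborhood relations concrete, here is a minimal sketch for point objects only. It assumes a simple Euclidean distance relation and a coarse four-way direction relation, and it aggregates an attribute over an object's neighbors. The function names, the dictionary layout with a `pos` key, and the distance threshold are illustrative assumptions; topological relations between extended objects are not covered.

```python
from math import hypot


def within_distance(a, b, max_dist):
    """Distance relation: True if points a and b are at most max_dist apart."""
    return hypot(b[0] - a[0], b[1] - a[1]) <= max_dist


def direction(a, b):
    """Coarse direction relation of b relative to a."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return "same"
    if abs(dx) >= abs(dy):
        return "east" if dx > 0 else "west"
    return "north" if dy > 0 else "south"


def neighbors(objects, target, max_dist):
    """All objects other than target that lie within max_dist of it."""
    return [
        o for o in objects
        if o is not target and within_distance(target["pos"], o["pos"], max_dist)
    ]


def neighborhood_mean(objects, target, max_dist, attr):
    """Mean of an attribute over the target's neighbors, or None if it has none."""
    values = [o[attr] for o in neighbors(objects, target, max_dist)]
    return sum(values) / len(values) if values else None
```

A derived value such as `neighborhood_mean` could then be used as an extra attribute of each object, so that a pattern search can take the neighbors' influence into account.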
- Data Mining in Biological Databases
Biological databases contain heterogeneous information such as annotated genomic sequence information, results of microarray experiments, molecular structures and properties of proteins, etc. In addition, more and more databases from the medical domain, containing medical records and other information on diseases, are becoming available. This situation makes it possible, in principle, to derive new knowledge about complex biological systems by correlating the information in those different databases (e.g., information about diseases and their relation to sub-cellular processes). The objective of this project is to develop a general framework and methods for integrated data mining in biological and bio-medical data sets. This involves the development of suitable representations of heterogeneous and complex biological data, as well as the development of new methods for integrated data mining in these data sets.
- Spatio-Temporal Indexing and Querying
More and more dynamic location data is becoming available, e.g., through GPS, sensor networks, mobile networks, etc. These data sets offer great potential for advanced services and analyses, but also pose new challenges with respect to the storage and querying capabilities of database systems. The objectives of this project currently include the development of efficient and effective index structures for spatio-temporal data that meet real-world requirements such as scalability with respect to database size, short update time, and fast query response time even for complex spatio-temporal queries. We also aim for a tight integration of the developed structures into commercial object-relational database systems.
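For readers unfamiliar with the problem, the sketch below shows one very simple way to index timestamped positions: a uniform grid over space combined with fixed-length time slots. It is only an assumed baseline that illustrates the trade-off between cheap updates and range-query cost, not one of the index structures developed in this project; the class `GridTimeIndex` and its parameters `cell_size` and `time_slot` are hypothetical.

```python
from collections import defaultdict


class GridTimeIndex:
    """Uniform space grid plus fixed time slots (illustrative baseline only)."""

    def __init__(self, cell_size, time_slot):
        self.cell_size = cell_size
        self.time_slot = time_slot
        self.buckets = defaultdict(list)

    def _key(self, x, y, t):
        # Map a position and timestamp to its grid cell and time slot.
        return (
            int(x // self.cell_size),
            int(y // self.cell_size),
            int(t // self.time_slot),
        )

    def insert(self, obj_id, x, y, t):
        # An update is a single append to one bucket.
        self.buckets[self._key(x, y, t)].append((obj_id, x, y, t))

    def query(self, x_min, x_max, y_min, y_max, t_min, t_max):
        """Return all records inside the closed space-time box."""
        kx0, ky0, kt0 = self._key(x_min, y_min, t_min)
        kx1, ky1, kt1 = self._key(x_max, y_max, t_max)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for kt in range(kt0, kt1 + 1):
                    # .get avoids creating empty buckets during a query.
                    for rec in self.buckets.get((kx, ky, kt), ()):
                        _, x, y, t = rec
                        if (x_min <= x <= x_max and y_min <= y <= y_max
                                and t_min <= t <= t_max):
                            hits.append(rec)
        return hits
```

A fixed grid like this keeps updates short, but query cost grows with the number of cells a query box overlaps, and skewed data can overload individual buckets. Limitations of this kind are part of what motivates more adaptive spatio-temporal index structures.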