Description
Data stream systems support queries on continuously arriving data. They offer query facilities similar to those of relational database systems. However, a data stream system evaluates its queries continuously as the data arrives and discards the data afterwards, which makes it possible to process high volumes of rapidly arriving data. Example applications include monitoring IT infrastructure and processing data from wireless sensor networks in wildland or animal surveillance scenarios. Compared to centralized data stream systems, distributed data stream systems can lower resource demand, improve performance, and increase the lifetime of wireless sensor networks. This is particularly true if the data sources are already distributed and the hosts of the data sources take part in query processing.
In my dissertation, I investigate the interdependencies between logical query optimization and the assignment of operators to hosts in distributed data stream systems running on heterogeneous hosts. In particular, I discuss the mathematical representation of selected optimization goals and of constraints, such as resource limits, for cost-based query optimization. I propose a technique to estimate whether logical query optimization steps may interfere with the subsequent assignment of operators to hosts. I adapt well-known heuristic algorithms to the problem of assigning operators to hosts and compare the resulting algorithms in an evaluation. Moreover, I propose an algorithm for load balancing by multiple instantiation of operators.
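To give a rough idea of the kind of problem meant by assigning operators to hosts under resource limits, the following minimal sketch places query operators on heterogeneous hosts with a simple greedy heuristic. It is an illustration only and not one of the algorithms from the dissertation: the function name `greedy_assign`, the single scalar load and capacity per operator and host, and the example operator and host names are all assumptions made for this sketch.

```python
from typing import Dict


def greedy_assign(op_load: Dict[str, int],
                  host_capacity: Dict[str, int]) -> Dict[str, str]:
    """Assign each operator to a host without exceeding host capacity.

    Hypothetical model: every operator has one scalar load and every
    host has one scalar capacity. Operators are placed in order of
    decreasing load, each on the host with the most remaining capacity.
    """
    remaining = dict(host_capacity)  # capacity still free on each host
    placement: Dict[str, str] = {}

    # Place the heaviest operators first; break ties by operator name.
    for op in sorted(op_load, key=lambda o: (-op_load[o], o)):
        load = op_load[op]
        # Hosts that can still accommodate this operator.
        candidates = [h for h in remaining if remaining[h] >= load]
        if not candidates:
            raise ValueError(f"no host has enough capacity for {op!r}")
        # Pick the host with the most free capacity; break ties by name.
        host = min(candidates, key=lambda h: (-remaining[h], h))
        placement[op] = host
        remaining[host] -= load

    return placement


if __name__ == "__main__":
    # Assumed example: three operators, two hosts of different capacity.
    operators = {"filter": 2, "join": 5, "aggregate": 3}
    hosts = {"sensor_node": 4, "gateway": 8}
    print(greedy_assign(operators, hosts))
```

A cost-based optimizer of the kind described above would replace the scalar load with a richer cost model (for example network traffic or energy use) and could reorder operators before placement, which is where logical optimization and operator assignment start to interact.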