Author: Wei Zhang (Texas Tech University)
Advisors: Yong Chen (Texas Tech University), Suren Byna (Lawrence Berkeley National Laboratory)
Abstract: Scientific experiments, observations, and simulations often store their datasets in various scientific file formats. Regrettably, efficiently finding the datasets that interest scientists remains challenging because of the diverse characteristics of metadata, the vast number of datasets, and the sheer size of the datasets. This research begins with an empirical study that investigates the essential aspects of the metadata search problem. To address the metadata search challenges posed by self-describing data formats, it then proposes a self-contained metadata indexing and querying service that gives scientists DBMS-independent, high-performance metadata search. Finally, to address the challenge of metadata search in distributed (inter-node) settings, this research proposes a distributed adaptive radix tree that balances the workload across nodes while supporting efficient metadata search.