SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Semantic Search for Self-Describing Scientific Data Formats

Authors: Chenxu Niu and Wei Zhang (Texas Tech University), Suren Byna (Lawrence Berkeley National Laboratory), and Yong Chen (Texas Tech University)

Abstract: Finding datasets relevant to their needs is often a daunting task for scientists, especially when the data are stored in self-describing file formats, which are widely used in scientific applications. Existing solutions extract the metadata and answer search queries by matching keywords against it through exact or partial lexical matching. However, these approaches cannot capture the semantic meaning of the metadata content and therefore cannot answer queries at the semantic level. We propose a novel semantic search solution for self-describing datasets that captures the semantic meaning of dataset metadata and supports search at the semantic level. We have evaluated our approach and compared it against existing solutions, and the results show that it delivers efficient semantic search performance.
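The gap between lexical and semantic matching can be illustrated with a small sketch. It is a toy illustration, not the authors' method. The dataset names, the metadata terms (as might be extracted from a self-describing file such as HDF5 or netCDF), the 3-dimensional word vectors, and the 0.9 similarity threshold are all hypothetical. Lexical search uses an exact or partial (substring) match. The semantic variant swaps in a generic technique, cosine similarity between word embeddings, to show how a query such as "thermal" can find datasets whose metadata never contains that word.

```python
import math

# Hypothetical metadata terms extracted from three self-describing files.
METADATA = {
    "dataset_a": ["temperature", "pressure"],
    "dataset_b": ["heat", "humidity"],
    "dataset_c": ["velocity"],
}

# Hypothetical toy word vectors, invented for illustration only.
EMBEDDINGS = {
    "temperature": [0.9, 0.1, 0.0],
    "heat": [0.8, 0.2, 0.1],
    "pressure": [0.1, 0.9, 0.0],
    "humidity": [0.3, 0.6, 0.2],
    "velocity": [0.0, 0.1, 0.9],
    "thermal": [0.85, 0.15, 0.05],
}


def lexical_search(query, metadata):
    """Return datasets with a metadata term that contains the query (partial lexical match)."""
    q = query.lower()
    return sorted(
        name for name, terms in metadata.items() if any(q in t.lower() for t in terms)
    )


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_search(query, metadata, embeddings, threshold=0.9):
    """Rank datasets by the best similarity between the query and any of their terms."""
    qv = embeddings[query]
    hits = []
    for name, terms in metadata.items():
        best = max(
            (cosine(qv, embeddings[t]) for t in terms if t in embeddings), default=0.0
        )
        if best >= threshold:
            hits.append((name, best))
    # Highest similarity first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    print(lexical_search("thermal", METADATA))               # no lexical hit
    print(semantic_search("thermal", METADATA, EMBEDDINGS))  # temperature/heat datasets
```

With these toy vectors, the lexical search for "thermal" returns nothing. The semantic search returns `dataset_a` and `dataset_b` because "temperature" and "heat" lie close to "thermal" in the vector space. A real system would use learned embeddings rather than hand-written vectors.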

Best Poster Finalist (BP): no
