32nd Annual Fall Workshop

Title: Data Dependent Frameworks for High-Dimensional Problems

Abstract: We discuss data-dependent frameworks for designing efficient data structures for high-dimensional search problems: preprocess a dataset X of n points so that, given a query y, one can efficiently find x in X satisfying F(x,y)=1 for some predicate F. A typical example is the Nearest Neighbor Search problem: when x,y are vectors in a metric space and F(x,y) is 1 iff x and y are close enough. This formulation captures other classic problems such as Partial Match.

In recent years, data-dependent frameworks led to dramatic improvements in the performance of a number of classic algorithms. A key idea in such frameworks is that the enabling data transformation (e.g., embedding) depends on the structure of the overall dataset X --- in contrast to the standard data-oblivious methods, such as the classic Johnson-Lindenstrauss dimension reduction.