Locally sensitive hashing for hiearchical data structures
Locality Sensitivity Hashing (LSH) is a powerful approach to similarity (or distance) estimation, which exploits a family of randomized hash functions to map the similar data instances to the same buckets with a higher probability than the dissimilar ones. In other words, LSH algorithms produce comparable hash values for similar values unlike the cryptographic hash algorithms that produces entirely different outputs for similar values. Yet traditional LSH algorithms cannot preserve the structure information represented as relations between elements, which is useful to hash hierarchical data structures like trees and graphs.
JSON is a powerful text based format that supports hierarchical data structures. Using JSON format, we conveniently represent arrays and unordered maps hierarchically.
In this project, you will investigate the application of hierarchical LSH techniques to estimate the similarity between JSON documents.
Also see a recent survey on LSH techniques on the structured data: https://arxiv.org/pdf/2204.11209.pdf