Detailed Information

Cited 0 times in Web of Science; cited 3 times in Scopus

Fast and scalable vector similarity joins with MapReduce

Authors
Yang, Byoungju; Kim, Hyun Joon; Shim, Junho; Lee, Dongjoo; Lee, Sang-goo
Issue Date
Jun-2016
Publisher
SPRINGER
Keywords
Similarity join; MapReduce; Cosine similarity; Filtering
Citation
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, v.46, no.3, pp 473 - 497
Pages
25
Journal Title
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Volume
46
Number
3
Start Page
473
End Page
497
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/9774
DOI
10.1007/s10844-015-0363-6
ISSN
0925-9902
1573-7675
Abstract
Vector similarity join, which finds similar pairs of vector objects, is a computationally expensive process. As the number of vectors increases, the time needed for the join operation grows in proportion to the square of the number of vectors. Various filtering techniques have been proposed to reduce this computational load. Meanwhile, MapReduce algorithms have been studied for managing large datasets. Recent improvements, however, still suffer from long computation times and limited scalability. In this paper, we propose FACET (FAst and sCalable maprEduce similariTy join), a MapReduce algorithm that efficiently solves the vector similarity join problem on large datasets. FACET is an exact all-pairs join algorithm composed of two stages. In the first stage, we use our own novel filtering techniques to eliminate dissimilar pairs and generate non-redundant candidate pairs. The second stage matches the candidate pairs with the vector data so that the similar pairs are produced as output. Both stages exploit the parallelism offered by MapReduce. The algorithm is currently designed for cosine similarity and the self-join case; extensions to other similarity measures and to the R-S join case are also discussed. We provide an I/O analysis of the algorithm and evaluate its performance on multiple real-world datasets. The experimental results show that our algorithm performs 40% better on average, and up to 800% better, than previous state-of-the-art MapReduce algorithms.
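To make the abstract's cost argument and two-stage structure concrete, the following is a minimal single-machine sketch of an exact cosine self-join. It is not FACET: the abstract does not describe FACET's filters, so this sketch swaps in a simple, well-known inverted-index filter (two vectors with no shared nonzero dimension have cosine 0, so for a threshold t > 0 they can never match). The function names (`naive_self_join`, `stage1_candidates`, `stage2_verify`) and the sparse `{dimension: weight}` representation are hypothetical choices for illustration; the "map" and "reduce" steps are simulated with ordinary loops rather than a real MapReduce framework.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as {dim: weight} dicts."""
    dot = sum(w * y[d] for d, w in x.items() if d in y)
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)


def naive_self_join(vectors, t):
    """Baseline: compare every pair, i.e. n*(n-1)/2 cosine computations."""
    return {
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if cosine(vectors[i], vectors[j]) >= t
    }


def stage1_candidates(vectors):
    """Hypothetical stage 1 (NOT FACET's filter): inverted-index candidates.

    'Map': emit (dimension, vector id) for every nonzero weight.
    'Reduce': within each dimension, pair up the ids; the set keeps each
    candidate pair only once even if the two vectors share many dimensions.
    """
    postings = defaultdict(list)
    for vid, vec in enumerate(vectors):
        for d, w in vec.items():
            if w != 0:
                postings[d].append(vid)
    candidates = set()
    for ids in postings.values():
        for i, j in combinations(sorted(ids), 2):
            candidates.add((i, j))
    return candidates


def stage2_verify(vectors, candidates, t):
    """Stage 2: join candidate ids with the vector data and keep exact matches."""
    return {(i, j) for i, j in candidates if cosine(vectors[i], vectors[j]) >= t}


def two_stage_self_join(vectors, t):
    # The filter is only safe for t > 0: non-overlapping pairs have cosine 0.
    if t <= 0:
        raise ValueError("threshold must be positive for this filter")
    return stage2_verify(vectors, stage1_candidates(vectors), t)


if __name__ == "__main__":
    data = [{0: 1.0, 1: 1.0}, {0: 1.0, 1: 0.9}, {2: 1.0}, {1: 1.0, 2: 1.0}]
    print(sorted(stage1_candidates(data)))   # 4 of the 6 possible pairs
    print(sorted(two_stage_self_join(data, 0.9)))
```

On the toy data, stage 1 discards pairs (0, 2) and (1, 2) without computing their similarity, and the two-stage result equals the brute-force result because the filter never removes a true match. Real systems such as the one described above use much stronger filters and distribute both stages across a cluster; this sketch only illustrates the candidate-generation/verification split.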
Appears in
Collections
College of Engineering > Division of Software > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Shim, Junho
College of Engineering (Division of Software (Advanced))
