Detailed Information

Cited 0 times in Web of Science; cited 7 times in Scopus

Efficient Filtering Techniques for Cosine Similarity Joins

Authors
Lee, Dongjoo; Park, Jaehui; Shim, Junho; Lee, Sang-goo
Issue Date
Apr-2011
Publisher
INT INFORMATION INST
Keywords
similarity join; cosine similarity join; inverted index; prefix filtering; length filtering
Citation
INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, v.14, no.4, pp. 1265-1290
Pages
26
Journal Title
INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL
Volume
14
Number
4
Start Page
1265
End Page
1290
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/12604
ISSN
1343-4500
1344-8994
Abstract
Similarity join, an operation that finds all pairs of similar objects in a large collection, is widely used to solve problems in many application domains. Existing similarity join algorithms use filtering techniques based on an inverted index to avoid unnecessary similarity computations. However, they filter out dissimilar pairs inefficiently when an aggregate weighted similarity function, such as cosine similarity, is used to quantify the similarity between objects. This is mainly because of the loose filtering conditions the existing algorithms adopt. In this paper, we formalize the filtering conditions adopted by previous algorithms and devise new similarity upper bounds that yield tighter filtering conditions for cosine similarity joins over weight vectors. Our algorithm efficiently filters out dissimilar pairs by exploiting these new upper bounds. Empirical evaluations on large-scale datasets demonstrate that our algorithm outperforms a state-of-the-art algorithm. In addition, we show that our algorithm can be extended to Dice and Tanimoto similarity joins over weight vectors.
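For readers unfamiliar with the filter-then-verify pattern the abstract refers to, the sketch below illustrates it under stated assumptions. It is a generic example, not the authors' algorithm: it assumes sparse vectors stored as Python dicts with non-negative weights and a threshold 0 < t <= 1, uses an inverted index so that only pairs sharing at least one feature become candidates, and prunes candidates with a simple, well-known bound (for unit-normalized non-negative vectors, dot(x, y) <= maxweight(x) * |y|_1) rather than the tighter upper bounds proposed in the paper. The function names `normalize`, `cosine`, and `cosine_join` are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse non-negative weight vector {feature: weight} to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()} if norm > 0 else {}


def cosine(x, y):
    """Cosine similarity of two unit-normalized sparse vectors (their dot product)."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector
    return sum(w * y.get(f, 0.0) for f, w in x.items())


def cosine_join(vectors, t):
    """Return (i, j, sim) for all pairs i < j with cosine similarity >= t.

    Assumption: weights are non-negative and 0 < t <= 1, so pairs that
    share no feature (similarity 0) can never qualify.
    """
    vs = [normalize(v) for v in vectors]
    max_w = [max(v.values(), default=0.0) for v in vs]  # largest weight per vector
    l1 = [sum(v.values()) for v in vs]                   # L1 norm per vector
    index = defaultdict(list)  # inverted index: feature -> ids of earlier vectors
    result = []
    for j, y in enumerate(vs):
        # Candidate generation: probe the inverted index with y's features.
        candidates = set()
        for f in y:
            candidates.update(index[f])
        for i in sorted(candidates):
            # Filtering: a cheap upper bound on dot(x, y); skip if it cannot reach t.
            bound = min(max_w[i] * l1[j], max_w[j] * l1[i])
            if bound < t:
                continue
            # Verification: compute the exact similarity for surviving candidates.
            s = cosine(vs[i], y)
            if s >= t:
                result.append((i, j, s))
        # Index y only after probing, so each pair is reported once (i < j).
        for f in y:
            index[f].append(j)
    return result
```

For example, `cosine_join([{"a": 1, "b": 1}, {"a": 1, "b": 1}, {"c": 1}, {"a": 1, "c": 1}], 0.6)` reports the pairs (0, 1) and (2, 3); the pair (0, 2) is never even considered because the two vectors share no feature. The bound used here is deliberately loose, which is exactly the kind of weakness the abstract says tighter upper bounds are meant to address.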
Appears in Collections
College of Engineering > Division of Software > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Shim, Junho
College of Engineering (Division of Software (Advanced))
