Efficient Filtering Techniques for Cosine Similarity Joins
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Dongjoo | - |
dc.contributor.author | Park, Jaehui | - |
dc.contributor.author | Shim, Junho | - |
dc.contributor.author | Lee, Sang-goo | - |
dc.date.available | 2021-02-22T13:17:44Z | - |
dc.date.issued | 2011-04 | - |
dc.identifier.issn | 1343-4500 | - |
dc.identifier.issn | 1344-8994 | - |
dc.identifier.uri | https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/12604 | - |
dc.description.abstract | Similarity join, an operation that finds all pairs of similar objects in a large collection, is widely used to solve various problems in many application domains. Existing similarity join algorithms use filtering techniques based on an inverted index to avoid unnecessary similarity computations. However, they are inefficient at filtering out dissimilar pairs when an aggregate weighted similarity function, such as cosine similarity, is used to quantify the similarity between objects. This is mainly due to the loose filtering conditions that existing algorithms adopt. In this paper, we formalize the filtering conditions adopted by previous algorithms and devise new similarity upper bounds that yield tighter filtering conditions for cosine similarity joins over weight vectors. Our algorithm efficiently filters out dissimilar pairs by exploiting these new upper bounds. Empirical evaluations on large-scale datasets show that our algorithm outperforms a state-of-the-art algorithm. In addition, we show that our algorithm can be extended to Dice and Tanimoto similarity joins over weight vectors. | - |
dc.format.extent | 26 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | INT INFORMATION INST | - |
dc.title | Efficient Filtering Techniques for Cosine Similarity Joins | - |
dc.type | Article | - |
dc.publisher.location | Japan | - |
dc.identifier.scopusid | 2-s2.0-84860123275 | - |
dc.identifier.wosid | 000292055400014 | - |
dc.identifier.bibliographicCitation | INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, v.14, no.4, pp 1265 - 1290 | - |
dc.citation.title | INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL | - |
dc.citation.volume | 14 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 1265 | - |
dc.citation.endPage | 1290 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Engineering, Multidisciplinary | - |
dc.subject.keywordPlus | ALGORITHM | - |
dc.subject.keywordAuthor | similarity join | - |
dc.subject.keywordAuthor | cosine similarity join | - |
dc.subject.keywordAuthor | inverted index | - |
dc.subject.keywordAuthor | prefix filtering | - |
dc.subject.keywordAuthor | length filtering | - |
dc.identifier.url | http://www.information-iii.org/abs_e2.html#No4-2011 | - |
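To make the abstract's terms concrete (inverted index, filtering conditions, similarity upper bounds), here is a minimal sketch of a generic cosine similarity join with prefix filtering. It is *not* the paper's algorithm: the tighter upper bounds the authors propose are not given in this record, so the sketch uses only a simple baseline-style condition. It assumes sparse vectors are Python dicts of non-negative weights, that vectors are L2-normalized so cosine similarity equals the dot product, and that features follow a fixed global order (descending document frequency here, an arbitrary choice; any fixed order keeps the filter correct). The function names `normalize`, `dot` and `cosine_join` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # hypothetical representation: feature -> non-negative weight
EPS = 1e-12  # slack so that floating-point rounding never prunes a true result


def normalize(v: SparseVec) -> SparseVec:
    """Scale v to unit L2 norm so that cosine similarity equals the dot product."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {f: w / norm for f, w in v.items()} if norm > 0 else {}


def dot(x: SparseVec, y: SparseVec) -> float:
    """Sparse dot product, iterating over the shorter vector."""
    if len(x) > len(y):
        x, y = y, x
    return sum(w * y[f] for f, w in x.items() if f in y)


def cosine_join(vectors: List[SparseVec], t: float) -> List[Tuple[int, int, float]]:
    """Return sorted (i, j, sim) with i < j and cosine(vectors[i], vectors[j]) >= t."""
    if not 0.0 < t <= 1.0:
        raise ValueError("threshold t must be in (0, 1]")
    vecs = [normalize(v) for v in vectors]

    # Global statistics: per-feature maximum weight and document frequency.
    max_w: Dict[str, float] = defaultdict(float)
    df: Dict[str, int] = defaultdict(int)
    for v in vecs:
        for f, w in v.items():
            max_w[f] = max(max_w[f], w)
            df[f] += 1

    # Fixed global feature order (assumed: descending document frequency).
    order = sorted(max_w, key=lambda f: (-df[f], f))
    rank = {f: r for r, f in enumerate(order)}

    # Prefix filtering: index only the suffix of each vector. The unindexed
    # prefix satisfies sum(max_w[f] * w) < t, so no partner can reach the
    # threshold through those features alone.
    index: Dict[str, List[int]] = defaultdict(list)
    for j, v in enumerate(vecs):
        bound = 0.0
        for f in sorted(v, key=rank.__getitem__):
            bound += max_w[f] * v[f]
            if bound >= t - EPS:
                index[f].append(j)

    l1 = [sum(v.values()) for v in vecs]
    top = [max(v.values(), default=0.0) for v in vecs]

    results = []
    for i, x in enumerate(vecs):
        # Probe the index with every feature of x; keep only earlier ids.
        candidates = set()
        for f in x:
            for j in index.get(f, ()):
                if j < i:
                    candidates.add(j)
        for j in sorted(candidates):
            # Cheap upper bound for non-negative weights: x.y <= max(x) * |y|_1.
            if min(top[i] * l1[j], top[j] * l1[i]) < t - EPS:
                continue
            s = dot(x, vecs[j])  # exact verification
            if s >= t:
                results.append((j, i, s))
    results.sort()
    return results
```

For example, `cosine_join([{"a": 1, "b": 1}, {"a": 1, "b": 1, "c": 1}, {"d": 1}, {"a": 2, "b": 2}], 0.8)` returns the pairs (0, 1), (0, 3) and (1, 3); at threshold 0.9 only (0, 3) survives. Each filter only discards pairs whose similarity provably stays below `t`, so tightening those bounds, as the paper proposes, reduces the candidates that reach exact verification without changing the result.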