Detailed Information

Cited 0 times in Web of Science; cited 23 times in Scopus

An efficient similarity join algorithm with cosine similarity predicate

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lee D. | -
dc.contributor.author | Park J. | -
dc.contributor.author | Shim J. | -
dc.contributor.author | Lee S.-G. | -
dc.date.available | 2021-02-22T14:03:00Z | -
dc.date.issued | 2010-08 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.uri | https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/13606 | -
dc.description.abstract | Given a large collection of objects, finding all pairs of similar objects, namely similarity join, is widely used to solve various problems in many application domains. Computation time of similarity join is a critical issue, since similarity join requires computing similarity values for all possible pairs of objects. Several existing algorithms adopt prefix filtering to avoid unnecessary similarity computation; however, existing algorithms implementing prefix filtering are inefficient at filtering out object pairs, in particular when an aggregate weighted similarity function, such as cosine similarity, is used to quantify similarity values between objects. This is mostly caused by the large prefixes the algorithms select. In this paper, we propose an alternative method to select small prefixes by exploiting the relationship between the arithmetic mean and the geometric mean of elements' weights. A new algorithm, MMJoin, implementing the proposed methods dramatically reduces the average size of prefixes without much overhead. Finally, it saves much computation time. We demonstrate that our algorithm outperforms a state-of-the-art one with empirical evaluation on large-scale real-world datasets. © 2010 Springer-Verlag. | -
dc.format.extent | 15 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Springer Verlag | -
dc.title | An efficient similarity join algorithm with cosine similarity predicate | -
dc.type | Article | -
dc.publisher.location | Germany | -
dc.identifier.doi | 10.1007/978-3-642-15251-1_33 | -
dc.identifier.scopusid | 2-s2.0-78049390973 | -
dc.identifier.bibliographicCitation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v.6262 LNCS, no.PART 2, pp 422 - 436 | -
dc.citation.title | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | -
dc.citation.volume | 6262 LNCS | -
dc.citation.number | PART 2 | -
dc.citation.startPage | 422 | -
dc.citation.endPage | 436 | -
dc.type.docType | Conference Paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Alternative methods | -
dc.subject.keywordPlus | Arithmetic mean | -
dc.subject.keywordPlus | Average size | -
dc.subject.keywordPlus | Computation time | -
dc.subject.keywordPlus | Cosine similarity | -
dc.subject.keywordPlus | Critical issues | -
dc.subject.keywordPlus | Empirical evaluations | -
dc.subject.keywordPlus | Geometric mean | -
dc.subject.keywordPlus | Real-world datasets | -
dc.subject.keywordPlus | Similarity computation | -
dc.subject.keywordPlus | Similarity functions | -
dc.subject.keywordPlus | Similarity join | -
dc.subject.keywordPlus | Expert systems | -
dc.subject.keywordPlus | Problem solving | -
dc.subject.keywordPlus | Algorithms | -
dc.identifier.url | https://link.springer.com/chapter/10.1007%2F978-3-642-15251-1_33 | -
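
Illustrative note: the abstract describes a similarity join under a cosine similarity threshold, sped up by prefix filtering. The sketch below is not MMJoin and does not reproduce the paper's prefix selection, which this record does not describe in detail. It is a minimal, generic prefix filter under stated assumptions: vectors are sparse dicts scaled to unit length, the threshold is positive, and the prefix is chosen with a Cauchy-Schwarz bound (if the features left out of a vector's prefix have Euclidean norm below the threshold, any qualifying partner must share a prefix feature). All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def normalize(vec):
    """Scale a sparse vector (dict: feature -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()}


def cosine(x, y):
    """Dot product of two unit-length sparse vectors (their cosine similarity)."""
    return sum(w * y[f] for f, w in x.items() if f in y)


def prefix(vec, threshold):
    """Leading features (in sorted order) kept until the remaining suffix
    has Euclidean norm below the threshold (assumed bound, not the paper's)."""
    remaining = sum(w * w for w in vec.values())  # squared norm of the suffix
    kept = []
    for f in sorted(vec):
        # Small slack guards against floating-point rounding of `remaining`.
        if remaining < threshold * threshold - 1e-12:
            break
        kept.append(f)
        remaining -= vec[f] * vec[f]
    return kept


def similarity_join(vectors, threshold):
    """Return (i, j, similarity) for all i < j with cosine >= threshold."""
    index = defaultdict(list)  # feature -> ids of vectors with it in their prefix
    results = []
    for j, y in enumerate(vectors):
        # Probe with every feature of y; only prefixes of earlier vectors are indexed.
        candidates = {i for f in y for i in index.get(f, ())}
        for i in sorted(candidates):
            sim = cosine(vectors[i], y)
            if sim >= threshold:
                results.append((i, j, sim))
        for f in prefix(y, threshold):
            index[f].append(j)
    return results


def brute_force_join(vectors, threshold):
    """Baseline that compares every pair, as the abstract's cost argument assumes."""
    results = []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            results.append((i, j, sim))
    return results
```

On small inputs the filtered join and the brute-force baseline should return the same pairs; the filter only reduces how many pairs are verified. Shorter prefixes mean fewer index entries and fewer candidates, which is the quantity the abstract says MMJoin reduces.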
Appears in Collections: College of Engineering > Division of Software > 1. Journal Articles