An efficient similarity join algorithm with cosine similarity predicate

Lee D.; Park J.; Shim J.; Lee S.-G.

doi:10.1007/978-3-642-15251-1_33

Detailed Information

Cited 0 times in Web of Science

Cited 23 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee D.	-
dc.contributor.author	Park J.	-
dc.contributor.author	Shim J.	-
dc.contributor.author	Lee S.-G.	-
dc.date.available	2021-02-22T14:03:00Z	-
dc.date.issued	2010-08	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/13606	-
dc.description.abstract	Given a large collection of objects, finding all pairs of similar objects, known as similarity join, is widely used to solve problems in many application domains. The computation time of a similarity join is a critical issue, since it requires computing similarity values for all possible pairs of objects. Several existing algorithms adopt prefix filtering to avoid unnecessary similarity computation; however, these algorithms are inefficient at filtering out object pairs, particularly when an aggregate weighted similarity function, such as cosine similarity, is used to quantify similarity between objects. This is mostly caused by the large prefixes the algorithms select. In this paper, we propose an alternative method that selects small prefixes by exploiting the relationship between the arithmetic mean and the geometric mean of elements' weights. A new algorithm, MMJoin, implementing the proposed method dramatically reduces the average prefix size without much overhead and, as a result, saves substantial computation time. Empirical evaluation on large-scale real-world datasets demonstrates that our algorithm outperforms a state-of-the-art algorithm. © 2010 Springer-Verlag.	-
dc.format.extent	15	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Springer Verlag	-
dc.title	An efficient similarity join algorithm with cosine similarity predicate	-
dc.type	Article	-
dc.publisher.location	Germany	-
dc.identifier.doi	10.1007/978-3-642-15251-1_33	-
dc.identifier.scopusid	2-s2.0-78049390973	-
dc.identifier.bibliographicCitation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v.6262 LNCS, no.PART 2, pp 422 - 436	-
dc.citation.title	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	-
dc.citation.volume	6262 LNCS	-
dc.citation.number	PART 2	-
dc.citation.startPage	422	-
dc.citation.endPage	436	-
dc.type.docType	Conference Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Alternative methods	-
dc.subject.keywordPlus	Arithmetic mean	-
dc.subject.keywordPlus	Average size	-
dc.subject.keywordPlus	Computation time	-
dc.subject.keywordPlus	Cosine similarity	-
dc.subject.keywordPlus	Critical issues	-
dc.subject.keywordPlus	Empirical evaluations	-
dc.subject.keywordPlus	Geometric mean	-
dc.subject.keywordPlus	Real-world datasets	-
dc.subject.keywordPlus	Similarity computation	-
dc.subject.keywordPlus	Similarity functions	-
dc.subject.keywordPlus	Similarity join	-
dc.subject.keywordPlus	Expert systems	-
dc.subject.keywordPlus	Problem solving	-
dc.subject.keywordPlus	Algorithms	-
dc.identifier.url	https://link.springer.com/chapter/10.1007%2F978-3-642-15251-1_33	-
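
The abstract refers to prefix filtering as the way existing similarity join algorithms avoid computing similarity for every pair. The following is a minimal Python sketch of that general idea for a cosine-similarity self-join. It is not MMJoin and does not reproduce the paper's arithmetic/geometric-mean prefix selection, which this record does not describe in detail. It assumes sparse, non-empty, non-zero vectors given as feature-to-weight dictionaries, and it uses a plain Cauchy-Schwarz bound: once the part of a unit vector outside its prefix has norm below the threshold, any qualifying partner must share a feature with the prefix. The names `normalize`, `prefix_features`, and `cosine_join` are hypothetical.

```python
import math
from collections import defaultdict

EPS = 1e-9  # slack so floating-point rounding never shortens a prefix


def normalize(vec):
    """Scale a sparse vector (dict: feature -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()}


def prefix_features(vec, threshold):
    """Take heaviest features first until the leftover suffix norm < threshold."""
    items = sorted(vec.items(), key=lambda fw: -abs(fw[1]))
    remaining = sum(w * w for _, w in items)  # squared norm of the untaken part
    prefix = []
    for f, w in items:
        if math.sqrt(max(remaining, 0.0)) < threshold - EPS:
            break
        prefix.append(f)
        remaining -= w * w
    return prefix


def cosine_join(vectors, threshold):
    """Return (j, i, similarity) for all pairs j < i with cosine >= threshold."""
    vecs = [normalize(v) for v in vectors]
    index = defaultdict(list)  # feature -> ids of earlier vectors whose prefix holds it
    results = []
    for i, x in enumerate(vecs):
        # Probe with every feature of x; only prefixes of earlier vectors are indexed.
        candidates = set()
        for f in x:
            candidates.update(index.get(f, ()))
        # Verify each surviving candidate with an exact dot product.
        for j in sorted(candidates):
            y = vecs[j]
            sim = sum(w * y[f] for f, w in x.items() if f in y)
            if sim >= threshold:
                results.append((j, i, sim))
        for f in prefix_features(x, threshold):
            index[f].append(i)
    return results
```

In this sketch a higher threshold yields shorter prefixes, so fewer index entries and fewer candidate pairs reach the exact verification step. The abstract's point is that the prefixes chosen by existing algorithms tend to be large under weighted functions such as cosine similarity, which is the cost MMJoin is reported to reduce.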

Files in This Item: Go to Link

Appears in Collections: College of Engineering > Division of Software > 1. Journal Articles

Related Researcher

Shim, Junho: College of Engineering (Division of Software (Advanced))

Sookmyung Women's University. Cheongpa-ro 47-gil 100 (Cheongpa-dong 2ga), Yongsan-gu, Seoul, 04310, Korea. 02-710-9127

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
