Dot product and cosine similarity

계토 2025. 5. 17. 16:23

Prerequisites (short version)

vector: a quantity that has both a magnitude and a direction.
scalar: a quantity that is completely described by its magnitude.
norm: a way of measuring the size (length or magnitude) of a vector.
- l1 norm (Manhattan norm): the sum of the absolute values of the vector's components. $||\mathbf{x}||_1:=\sum^n_{i=1}|x_i|$
  - Treats every dimension equally: each component contributes in proportion to its absolute value.
- l2 norm (Euclidean norm): the square root of the sum of the squares of the components. $||\mathbf{x}||_2:=\sqrt{\sum^n_{i=1}x_i^2}$
  - In Euclidean space, the length of a vector is its l2 norm.
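
The two norm formulas above can be sketched in a few lines of plain Python. The function names `l1_norm` and `l2_norm` are chosen here for illustration and do not come from any library.

```python
import math

def l1_norm(x):
    """Manhattan norm: sum of the absolute values of the components."""
    return sum(abs(xi) for xi in x)

def l2_norm(x):
    """Euclidean norm: square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))

x = [3.0, -4.0]
print(l1_norm(x))  # |3| + |-4| = 7.0
print(l2_norm(x))  # sqrt(9 + 16) = 5.0
```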

dot product

Given a = [a1, a2, ..., an] and b = [b1, b2, ..., bn], a⋅b = $\sum^n_{i=1}{a_ib_i}=a_1b_1+a_2b_2+\dots+a_nb_n$

The dot product of two vectors is a scalar, which is why it is also called the scalar product.
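
As a minimal sketch of the component-wise definition, the helper below (named `dot` here purely for illustration) multiplies matching components and sums them. The length check is an added assumption, since the definition only makes sense for vectors of the same dimension.

```python
def dot(a, b):
    """Sum of component-wise products; the result is a scalar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(ai * bi for ai, bi in zip(a, b))

a = [1, 2, 3]
b = [4, 5, 6]
print(dot(a, b))  # 1*4 + 2*5 + 3*6 = 32
print(dot(a, a))  # 1 + 4 + 9 = 14, the squared l2 norm of a
```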
In Euclidean space, the dot product of two vectors can equivalently be written as $\mathbf{a\cdot b}=||\mathbf{a}|| \ ||\mathbf{b}|| \ \cos\theta$, where $\theta$ is the angle between them.
- Why does this hold? See (https://www.mit.edu/~hlb/StantonGrant/18.02/details/tex/lec1snip2-dotprod.pdf)
  - [1] $\mathbf{a\cdot a}=||\mathbf{a}||^2$, because in three dimensions $\mathbf{a\cdot a}=a_1^2+a_2^2+a_3^2$.
  - Take a, b, and a vector c that joins the tips of the two vectors, so that the three form a triangle.
  - [2] By the law of cosines, $||\mathbf{c}||^2=||\mathbf{a}||^2+||\mathbf{b}||^2-2||\mathbf{a}|| \ ||\mathbf{b}|| \ \cos\theta$
  - By the triangle law of vector addition, c = a - b.
  - [3] $||\mathbf{c}||^2$ = c⋅c = (a-b)⋅(a-b) = a⋅a - a⋅b - b⋅a + b⋅b = $||\mathbf{a}||^2+||\mathbf{b}||^2-2\mathbf{a\cdot b}$, where the first step $||\mathbf{c}||^2$ = c⋅c follows from [1] and the last step uses a⋅b = b⋅a.
  - Combining [2] and [3] gives $||\mathbf{a}||^2+||\mathbf{b}||^2-2||\mathbf{a}|| \ ||\mathbf{b}|| \ \cos\theta=||\mathbf{a}||^2+||\mathbf{b}||^2-2\mathbf{a\cdot b}$, so $2||\mathbf{a}|| \ ||\mathbf{b}|| \ \cos\theta=2\mathbf{a\cdot b}$, that is, $\mathbf{a\cdot b}=||\mathbf{a}|| \ ||\mathbf{b}|| \ \cos\theta$.
- Here $||\mathbf{a}||$ and $||\mathbf{b}||$ are l2 norms.
- In other words, the dot product of a and b is (the magnitude of a) x (the magnitude of b) x (the cosine of the angle between a and b).
  - If $\mathbf{a}$ and $\mathbf{b}$ are orthogonal, i.e. the angle between them is 90 degrees, then cos 90° = 0 and so $\mathbf{a\cdot b}=0$.
  - If they are codirectional, i.e. the angle is 0, then cos 0 = 1 and so $\mathbf{a\cdot b}=||\mathbf{a}|| \ ||\mathbf{b}||$.
- So to find the angle between nonzero vectors a and b, compute a⋅b / (||a|| ||b||), which gives $\cos\theta$.
  - This is one of several reasons the dot product is useful.
  - $\cos\theta$ matters because it is exactly the cosine similarity of the two vectors.
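
The relationship between the dot product and the angle can be checked numerically. The sketch below uses hypothetical helpers (`dot`, `norm`, `angle_degrees`) written for this post. It assumes nonzero input vectors, and the clamp before `math.acos` is an added guard against floating-point drift, not part of the formula.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    # l2 norm, using a.a = ||a||^2
    return math.sqrt(dot(a, a))

def angle_degrees(a, b):
    """Angle between two nonzero vectors, recovered from cos(theta)."""
    cos_theta = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

print(angle_degrees([1, 0], [0, 1]))  # orthogonal vectors: 90 degrees
print(angle_degrees([1, 2], [2, 4]))  # codirectional vectors: about 0 degrees

# Expansion of ||a - b||^2 used in the derivation: ||c||^2 = ||a||^2 + ||b||^2 - 2 a.b
a, b = [3.0, 1.0], [1.0, 2.0]
c = [ai - bi for ai, bi in zip(a, b)]
lhs = dot(c, c)
rhs = dot(a, a) + dot(b, b) - 2 * dot(a, b)
print(lhs, rhs)  # both sides equal 5.0
```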

cosine similarity

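The cosine of the angle between the two vectors.
As shown above, it can be computed as a⋅b / (||a|| ||b||).
As the formula shows, it ignores the magnitudes of the vectors and measures only direction, i.e. the angle. The more closely two vectors point in the same direction (the smaller the angle between them), the more similar they are considered to be.
In practice, divide each vector by its norm to normalize it, (a / ||a||) ⋅ (b / ||b||), and then take the inner product.
- A normalized vector has the same direction as the original vector but magnitude 1.
The value is 1 for exactly the same direction and -1 for opposite directions, so it always lies between -1 and 1.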
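
A minimal sketch of the normalize-then-inner-product recipe, in plain Python. The names `normalize` and `cosine_similarity` are illustrative, and rejecting the zero vector is an assumption made here, since its direction is undefined.

```python
import math

def normalize(v):
    """Return v scaled to unit length (same direction, magnitude 1)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in v]

def cosine_similarity(a, b):
    """Normalize both vectors, then take their inner product."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

print(cosine_similarity([1, 2, 3], [2, 4, 6]))     # same direction: close to 1
print(cosine_similarity([1, 2, 3], [-1, -2, -3]))  # opposite direction: close to -1
print(cosine_similarity([1, 0], [0, 1]))           # orthogonal: 0.0
```

Because of floating-point rounding, the first two results may differ from 1 and -1 in the last few digits.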

dot product and cosine similarity

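The dot product can also be used as a similarity metric. However, it reflects both directional similarity and the magnitudes of the vectors, and it is very sensitive to magnitude.
Cosine similarity is a similarity measure that considers only the direction (angle) of the vectors.
The difference between the two (https://medium.com/advanced-deep-learning/understanding-vector-similarity-b9c10f7506de):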
- Suppose there are documents made up of only three words, and each document is represented by the frequencies of those three words.
  - document 1: word A appears 10 times, B 20 times, C 30 times --> vector for document = [10, 20, 30]
  - document 2: [100, 200, 300]
  - document 3: [30, 20, 10]
  - document 4: [200, 300, 100]
- dot product results: 1 vs 2: 14000, 1 vs 3: 1000, 1 vs 4: 11000
- cosine similarity results: 1 vs 2: 1.0, 1 vs 3: 0.71, 1 vs 4: 0.79
- So the dot product gives more weight to absolute word counts (the vector magnitude changes with the word count). When document 1 is compared with document 2 or 4, which have much larger absolute word counts, the result is large. Direction still plays a role: documents 2 and 4 have the same magnitude, yet 1 vs 2 (same direction) scores higher than 1 vs 4.
- Cosine similarity considers only direction. It can be read as looking at the overall word distribution, i.e. the pattern of word usage, rather than absolute counts.
  - It captures the word distribution / word usage pattern regardless of text length and scale. Two documents with similar content but different lengths can therefore be measured as similar under cosine similarity, as long as their word usage patterns are similar.
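
The document comparison above can be reproduced with a short script. This is a sketch using only the four frequency vectors from the example. The `docs` dictionary and the helper names are introduced here for illustration.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cosine_similarity(a, b):
    # a.b / (||a|| ||b||), with l2 norms computed from a.a and b.b
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

docs = {
    1: [10, 20, 30],
    2: [100, 200, 300],
    3: [30, 20, 10],
    4: [200, 300, 100],
}

for other in (2, 3, 4):
    d = dot(docs[1], docs[other])
    c = cosine_similarity(docs[1], docs[other])
    print(f"1 vs {other}: dot={d}, cosine={c:.2f}")
```

For 1 vs 4 the exact cosine is 11000 / 14000, about 0.786. Documents 2 and 4 have the same l2 norm, so the gap between their dot products with document 1 comes from direction alone.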
The above is a very simple example, but modern methods that learn embeddings, whether word embeddings or sentence embeddings, are trained so that words with similar 'meaning' point in similar directions. In other words, the embeddings are trained so that the angle between vectors can reflect semantic similarity. This is what makes it reasonable to measure the semantic similarity of word or sentence embeddings with cosine similarity.
- high cosine similarity = similar meaning = similar direction in embedding space.

Sources

https://youtu.be/0iNrGpwZwog?si=54lcRdPrNprRpt5c

https://medium.com/advanced-deep-learning/understanding-vector-similarity-b9c10f7506de

https://www.mit.edu/~hlb/StantonGrant/18.02/details/tex/lec1snip2-dotprod.pdf

https://en.wikipedia.org/wiki/Scalar_(mathematics)

https://en.wikipedia.org/wiki/Dot_product

https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)
