
5 Hours to Actually Understand Cosine Similarity

  • Writer: Renee Li
  • Jun 9
  • 2 min read

Today I spent about five hours building a much better understanding of cosine similarity. I know the formula now, and I know how to calculate it. More importantly, I now understand that a pretrained model (a neural network trained on enormous amounts of data) turns each sentence into exactly one dot: a single point in a space with many dimensions, stored as a list of numbers called a vector or embedding.
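Here is a minimal sketch of the calculation itself, in plain Python. The three "embeddings" are numbers I made up for illustration; a real pretrained model would produce vectors with hundreds of dimensions, and the function name is just my own choice.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    # Top of the formula: multiply matching components, then add them up.
    dot = sum(x * y for x, y in zip(a, b))
    # Bottom of the formula: the length of each vector (Pythagoras).
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors standing in for sentence embeddings.
cat = [0.9, 0.1, 0.3]
kitten = [0.8, 0.2, 0.4]
invoice = [0.1, 0.9, 0.0]

print(cosine_similarity(cat, kitten))   # close to 1: nearly the same direction
print(cosine_similarity(cat, invoice))  # much lower: very different direction
```

The result always falls between -1 and 1, where 1 means the two vectors point the same way and 0 means they are perpendicular.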


All of those dots are then scattered across a space. How we group, categorise, or cluster them depends on the algorithm we use. The algorithm organises the dots by how close they are to one another, and it needs a measure of "closeness" to do that. Cosine similarity is one such measure: it compares the angle between the two arrows drawn from the origin to each dot, and ignores how long those arrows are. Straight-line (Euclidean) distance between the dots is another option.
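The two measures can disagree, which is easiest to see with a tiny example. This is only a sketch with made-up 2D vectors; the variable names are mine.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    # Straight-line distance: Pythagoras applied to the differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

short = [1.0, 2.0]
stretched = [10.0, 20.0]  # same direction as `short`, ten times longer
nearby = [2.0, 1.0]       # sits close to `short`, but points elsewhere

# By angle, `stretched` is the better match for `short`.
print(cosine_similarity(short, stretched), cosine_similarity(short, nearby))

# By straight-line distance, `nearby` is the better match.
print(euclidean_distance(short, stretched), euclidean_distance(short, nearby))
```

So which dots count as "close" depends on the measure you pick, not only on the clustering algorithm.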


In theory, if we didn't have to worry about energy, resources, and speed, we could simply use cosine similarity to compare every sentence with every other one. For a small data pool that works fine. The trouble is that it scales badly: n sentences need n(n − 1)/2 comparisons, so a thousand sentences already means roughly half a million pairs, and a million sentences means roughly 500 billion. That is exactly why we need an algorithm: to cut the time and save resources while still arriving at the clustering outcome we're after.
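To make the "compare everything with everything" idea concrete, here is a brute-force sketch along with the pair count for a few sizes. The helper names `all_pairs` and `pair_count` are hypothetical, and the four vectors are made up.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def all_pairs(vectors):
    """Brute force: score every unordered pair of vectors exactly once."""
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

def pair_count(n):
    """How many comparisons brute force needs for n vectors."""
    return n * (n - 1) // 2

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = all_pairs(vectors)
print(len(scores))  # 6 pairs for 4 vectors

for n in (10, 1_000, 1_000_000):
    print(n, pair_count(n))
```

Four vectors are no problem at all; the pair count for a million is what makes brute force impractical.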


I also went back and rethought Pythagoras (a² + b² = c²) and the Euclidean distance built on it. It ended up teaching me more about the cosine similarity formula than I expected, specifically the dot product (the "agreement") part on top.
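For reference, this is the formula written out in two dimensions. One caution about notation: in Pythagoras above, a and b are the two sides of a triangle, while here a and b are two arrows with components (a₁, a₂) and (b₁, b₂), and θ is the angle between them.

```latex
\cos\theta
  = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{a_1 b_1 + a_2 b_2}{\sqrt{a_1^2 + a_2^2}\;\sqrt{b_1^2 + b_2^2}}
```

The dot product sits on top, and each square root on the bottom is Pythagoras giving the length of one arrow.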


When I really understood why Pythagoras adds the two sides, something clicked. The across-direction and the up-direction are perpendicular, so neither one interferes with the other, and their squared contributions simply add. The top of the cosine formula does the same thing. It goes direction by direction, multiplies the matching pieces, and adds them up. I'd been wondering why we sum those products instead of keeping just one; Pythagoras was the answer. Independent directions combine by adding. That's true when you're measuring an arrow's length (a² + b², then a square root), and it's true when you're measuring how much two arrows agree (a₁b₁ + a₂b₂). Same idea, two different questions. In fact an arrow's squared length is just its dot product with itself. Once Pythagoras made "add across independent directions" feel natural, the dot product stopped looking arbitrary and started looking obvious.
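The link between the two is short enough to check in a few lines. This is a sketch using the classic 3-4-5 triangle; `dot` and `length` are my own helper names.

```python
import math

def dot(a, b):
    """Multiply matching components, then add across directions."""
    return sum(x * y for x, y in zip(a, b))

def length(a):
    """Pythagoras: square root of the arrow's dot product with itself."""
    return math.sqrt(dot(a, a))

a = [3.0, 4.0]
b = [4.0, 3.0]

print(dot(a, a))   # 3*3 + 4*4 = 25.0, the squared length
print(length(a))   # 5.0, the hypotenuse of the 3-4-5 triangle
print(dot(a, b))   # 3*4 + 4*3 = 24.0, the "agreement" between a and b

# Dividing by both lengths turns agreement into a cosine: 24 / (5 * 5).
print(dot(a, b) / (length(a) * length(b)))

# Perpendicular arrows share nothing in any direction, so they agree by 0.
print(dot([1.0, 0.0], [0.0, 1.0]))
```

Length and agreement use the same sum across directions; the only difference is whether the arrow is paired with itself or with another arrow.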
