MAVIS
Measuring Similarity

MAVIS measures similarity between code vectors using cosine similarity, a metric that scores how similar two vectors are by the angle between them in a multi-dimensional space rather than by their magnitude.

Formula:

$$\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \|B\|}$$

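Where:

A · B is the dot product of vectors A and B
‖A‖ and ‖B‖ are the Euclidean norms (magnitudes) of A and B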

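Key Properties:

The result always lies between -1 and 1
A value of 1 means the vectors point in the same direction
A value of 0 means the vectors are orthogonal (no directional similarity)
A value of -1 means the vectors point in opposite directions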
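
The formula can be illustrated with a minimal Python sketch. This is not MAVIS's actual implementation, which the text does not show; the function name `cosine_similarity` and the two-dimensional sample vectors are assumptions chosen only to make the boundary values of the metric visible.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))        # A . B
    norm_a = math.sqrt(sum(x * x for x in a))     # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))     # ||B||
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not real code embeddings.
same = cosine_similarity([1.0, 2.0], [1.0, 2.0])          # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])    # opposite direction

print(same, orthogonal, opposite)
```

The three calls should land at (or within floating-point rounding of) the top, middle, and bottom of the metric's range. The zero-vector check is a design choice of this sketch: the formula divides by the norms, so a zero vector has no defined direction.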

Why Use Cosine Similarity?

It focuses on orientation, not magnitude, making it useful for:

Because it is unaffected by vector length, cosine similarity is well suited to cases where input vectors vary in scale but direction encodes meaning. This fits MAVIS, which seeks to identify code intent rather than scan for known vulnerable patterns or signatures. That focus should make MAVIS more powerful than many other code analysis solutions, and it should also prove more difficult to fool, whether intentionally or unintentionally.
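
The claim about vector length can be checked with a short sketch. The vectors below are hypothetical stand-ins, not MAVIS embeddings, and the helper names are assumptions for illustration. The sketch scales one vector by a constant and compares cosine similarity with Euclidean distance, a magnitude-sensitive measure used here only for contrast.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


base = [3.0, 4.0]
scaled = [x * 10 for x in base]   # same direction, ten times the length

cos = cosine_similarity(base, scaled)     # direction only
dist = euclidean_distance(base, scaled)   # grows with the scale factor

print(cos, dist)
```

Scaling changes the Euclidean distance between the two vectors but leaves their cosine similarity at 1, which is the behavior the paragraph above relies on when direction, not magnitude, carries the meaning.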