When Junior College Vector Math Saved My Real-World AI Project
How 11th & 12th Grade Math Became the Cornerstone of My AI Career
In 11th grade, I struggled with vectors. The teacher showed us dot products, cross products, and the cosine of the angle between two vectors. I passed the exam by memorizing formulas I didn't fully understand. Six years later, those same formulas sit at the core of the AI system that achieved ~99% LLM token reduction in production. Math class turned out to matter.
The Problem That Brought Me Back to 11th Grade
At Idyllic Services, I was building a RAG (Retrieval-Augmented Generation) pipeline. The core challenge: given a user's query, find the most relevant documents from a knowledge base of thousands, then pass only those relevant documents to the LLM instead of the entire knowledge base.
This is fundamentally a search problem. But not keyword search — semantic search. The user might ask "how do I handle a refund?" and the relevant document might say "customer reimbursement process" — no keyword overlap, but same meaning.
The solution: vector embeddings and similarity search on AWS OpenSearch.
What Vector Embeddings Actually Are
An embedding model takes text and converts it to a high-dimensional vector — a list of numbers, typically 768 or 1536 dimensions. The magic: semantically similar text ends up in similar regions of this vector space.
"How do I handle a refund?" and "customer reimbursement process" — different words, but similar meaning, so their vectors end up close to each other in 1536-dimensional space.
from openai import OpenAI

client = OpenAI()

def embed_text(text):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding  # List of 1536 floats
Here's Where 11th Grade Math Appears
To find which stored document is most similar to a query, you compare vectors. The metric used is cosine similarity — the cosine of the angle between two vectors.
Remember this formula from 11th grade?
# Cosine similarity — exactly what we learned in high school
# cos(θ) = (A · B) / (|A| × |B|)
import numpy as np

def cosine_similarity(vec_a, vec_b):
    dot_product = np.dot(vec_a, vec_b)   # The dot product!
    magnitude_a = np.linalg.norm(vec_a)  # |A|
    magnitude_b = np.linalg.norm(vec_b)  # |B|
    return dot_product / (magnitude_a * magnitude_b)

# Similarity of  1.0 = identical direction = same meaning
# Similarity of  0.0 = perpendicular = unrelated
# Similarity of -1.0 = opposite direction = opposite meaning
That's it. The core of semantic search in production AI is the dot product formula from 11th grade physics class. Same formula. Same math. Applied to 1536-dimensional vectors instead of 3-dimensional arrows.
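A quick sketch shows the two functions above agreeing with that intuition (assuming an OpenAI API key is configured; the third phrase is just an unrelated control I've added for contrast):

vec_refund = embed_text("How do I handle a refund?")
vec_reimburse = embed_text("customer reimbursement process")
vec_unrelated = embed_text("today's cricket score")

# Exact values vary by embedding model, but the related pair scores
# noticeably higher than the unrelated pair
print(cosine_similarity(vec_refund, vec_reimburse))
print(cosine_similarity(vec_refund, vec_unrelated))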
"The moment I recognized the cosine similarity formula in my production code, I went back and re-read my 11th grade notes. They made complete sense for the first time."
The Production Implementation on AWS OpenSearch
In practice, I didn't implement cosine similarity manually — OpenSearch (Amazon's open-source fork of Elasticsearch, run as a managed AWS service) handles it at scale. The workflow:
- Index documents: embed each document and store the vector in OpenSearch alongside the text (sketched right after this list)
- Query: embed the user's query, then ask OpenSearch for the K most similar vectors
- Retrieve: get the top-K most relevant documents
- Generate: pass only those K documents (instead of the entire knowledge base) to the LLM
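Step 1 happens once, at ingestion time. Here's a minimal sketch of that indexing step using the opensearch-py client (the host, index mapping, and documents list are illustrative placeholders, not the production setup):

from opensearchpy import OpenSearch  # pip install opensearch-py

# Placeholder client: point this at your OpenSearch domain
opensearch_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# One-time setup: a k-NN-enabled index with a 1536-dimension vector field
opensearch_client.indices.create(
    index="knowledge-base",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": 1536}
            }
        }
    }
)

# Embed each document and store the vector alongside its text
for doc_id, doc_text in enumerate(documents):  # documents = your corpus
    opensearch_client.index(
        index="knowledge-base",
        id=doc_id,
        body={"text": doc_text, "embedding": embed_text(doc_text)}
    )

Steps 2 through 4 are the query path: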
def semantic_search(query, top_k=3):
    # Step 1: Embed the query (embed_text is defined above)
    query_vector = embed_text(query)

    # Step 2: Ask OpenSearch for the k most similar vectors
    search_body = {
        "size": top_k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_vector,
                    "k": top_k
                }
            }
        }
    }
    response = opensearch_client.search(
        index="knowledge-base",
        body=search_body
    )

    # Step 3: Return the text of the matched documents
    return [hit["_source"]["text"]
            for hit in response["hits"]["hits"]]
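Calling it is then a one-liner (the results depend on whatever is indexed):

for doc in semantic_search("How do I handle a refund?"):
    print(doc[:100])  # preview each of the top-3 matches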
Why This Produced ~99% Token Reduction
Before this system, the pipeline sent the entire knowledge base to the LLM as context: thousands of documents' worth of tokens in every prompt, expensive API calls, slow responses.
After implementing semantic search, the LLM receives only the 3 most relevant documents — roughly 99% fewer tokens, drastically lower cost, dramatically faster responses. Same answer quality, because the vector math selects the right information.
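To make that arithmetic concrete, with purely illustrative numbers rather than the actual production figures:

docs_in_kb = 1000          # documents in the knowledge base (illustrative)
avg_tokens_per_doc = 500   # average document length in tokens (illustrative)
top_k = 3                  # documents actually sent to the LLM

before = docs_in_kb * avg_tokens_per_doc  # 500,000 context tokens per call
after = top_k * avg_tokens_per_doc        # 1,500 context tokens per call
print(f"Token reduction: {1 - after / before:.1%}")  # 99.7%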
The Irony and the Lesson
The irony is perfect: the math I struggled with in junior college — the formulas I memorized without understanding — turned out to be the literal mathematical foundation of one of the most important features in modern AI systems. RAG pipelines, semantic search, nearest-neighbor retrieval — all fundamentally linear algebra.
The lesson I take from this: foundational mathematics is not abstract. It's infrastructure. You might not see the application for years. But one day you'll be in a production system and you'll recognize the formula, and suddenly the past and present will connect in a way that feels almost magical.
Learn the fundamentals thoroughly. Not because they'll be on the exam. Because they'll be in your production code.
Want to talk about RAG pipelines, vector search, or AI architecture? Let's connect.
