Curiosity Is My Biggest Advantage — I Always Ask Why
I'm not the smartest person in any room I've been in. I'm probably not the fastest learner either. But I have one advantage that few people seem to fully exploit: I never stop asking why. Not just once — at every layer, every abstraction, until I actually understand what's happening.
Two Kinds of Engineers
When most engineers encounter a new tool or technology, they ask: "How do I use this?" They read the quickstart, copy the example code, get it working, ship it. Done.
I can't do that. My first question is always: "Why does this work this way? What problem was it solving? Why did someone build it like this and not some other way?"
This habit is slower in the short run. It takes longer to get something working when you're also asking why it works. But it creates a depth of understanding that pays massive dividends later — when things break, when requirements change, when you need to optimize something nobody else can figure out.
Why Transformers, Not Just How
When I first encountered transformer architecture for LLMs, I didn't just learn the API. I went back and read the original attention paper. I spent time understanding what self-attention actually computes — why you multiply query, key, and value matrices, what that dot product represents geometrically, why multi-head attention captures different relationship types simultaneously.
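The computation described above — queries compared against keys, with the resulting weights averaging the values — can be sketched in a few lines of NumPy. This is a toy single-head version with random matrices, not production code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    # Each query is compared against every key: the dot product measures
    # how geometrically aligned the query and key vectors are.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns alignment scores into weights that sum to 1 per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                       # 4 tokens, 8-dim head (toy sizes)
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one contextualized vector per token
```

Multi-head attention simply runs several of these in parallel with different learned projections, letting each head specialize in a different relationship type.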
This deep understanding of attention mechanisms later helped me optimize prompts for massive token reduction. If you don't understand that attention is computed across the entire context window, you won't understand why shorter, more precise contexts produce faster, cheaper inference. You'll just follow rules without knowing where they come from.
"The why is where the leverage lives. The how just gets you started."
Why Async Helps Throughput
When I needed to improve throughput in the agentic AI pipeline at Idyllic Services, I asked: why is the system slow? Not "how do I make it faster" — that question leads to premature optimization. Why is it slow?
I traced the bottleneck to I/O waiting — the synchronous LLM calls were blocking while waiting for API responses. The CPUs were idle during those waits.
Asking why led me to async orchestration. Understanding why async works — the event loop, cooperative multitasking, the fundamental difference between I/O-bound and CPU-bound tasks — let me apply it correctly instead of cargo-culting async/await without understanding the model. The result: ~10× throughput improvement.
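The core idea — that concurrent I/O waits overlap instead of stacking up — is easy to demonstrate. Here's a minimal sketch where `asyncio.sleep` stands in for a network-bound LLM API call; the function names are illustrative, not from the actual pipeline:

```python
import asyncio
import time

async def call_llm(prompt: str) -> str:
    # Stand-in for a network call to an LLM API: the coroutine yields
    # control to the event loop while "waiting" on I/O.
    await asyncio.sleep(0.1)
    return f"response to {prompt!r}"

async def main() -> None:
    prompts = [f"prompt {i}" for i in range(10)]
    start = time.perf_counter()
    # gather schedules all calls concurrently; total wall time is roughly
    # one call's latency, not ten times it.
    results = await asyncio.gather(*(call_llm(p) for p in prompts))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} calls in {elapsed:.2f}s")

asyncio.run(main())
```

Run synchronously, ten 0.1-second calls take about a second; under the event loop they finish in roughly the time of one. That difference only applies to I/O-bound work — CPU-bound tasks gain nothing from this model.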
Why Vector Embeddings Capture Meaning
Most people use vector embeddings as magic — text goes in, numbers come out, semantic search works somehow. I couldn't accept that. I wanted to understand why words with similar meanings end up close in embedding space.
Learning that embedding models are trained to predict surrounding context — that the meaning of a word is, in a deep sense, the company it keeps — changed how I designed RAG pipelines. I understood what kinds of queries would work well, why certain document chunking strategies fail, and why some similarity thresholds are meaningful and others aren't. All from asking why.
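The "company it keeps" idea can be seen even without a trained model. Here's a toy sketch using raw co-occurrence counts as crude embedding vectors — real embedding models learn dense vectors from the same signal, but the geometric intuition is identical. The corpus and all values here are invented for illustration:

```python
import numpy as np

# Toy corpus: "cat" and "dog" appear in similar contexts; "stock" does not.
sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "the stock price fell today",
    "the stock market rose today",
]

# Each word's "embedding" is its co-occurrence counts within sentences.
vocab = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}
co = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    words = s.split()
    for w in words:
        for c in words:
            if w != c:
                co[index[w], index[c]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(co[index["cat"]], co[index["dog"]]))    # high: shared contexts
print(cosine(co[index["cat"]], co[index["stock"]]))  # lower: disjoint contexts
```

Words that occur in similar contexts end up pointing in similar directions — which is exactly the property that makes cosine similarity a meaningful signal for semantic search, and why chunking that destroys local context degrades retrieval.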
The 99% Token Reduction: A Case Study in Asking Why
The ~99% LLM token reduction I achieved in production is the most concrete result of my curiosity-driven approach. Here's exactly how it happened — four layers of asking why:
- Why do tokens cost money? → Led to understanding tokenization, context windows, and pricing models
- Why is the pipeline sending this much context? → Discovered we were passing entire conversation histories when only recent turns were relevant
- Why does the LLM need all this information? → Realized most context could be extracted upstream with lightweight preprocessing, reducing what we sent
- Why are we calling the LLM for routing decisions? → Replaced brittle if-else routing with a supervisor agent, eliminating redundant LLM calls entirely
Each "why" led to a specific optimization. Four levels of questioning, one massive result.
How to Build This Habit
Curiosity isn't a personality trait you either have or don't. It's a habit built by practicing one behavior: when you encounter something that works, pause before moving on and ask, "Why does this work?"
Start small. When you use a Python list comprehension, ask why it's faster than an equivalent for loop. When you add a database index, ask what structure that index creates internally. When you call an API, ask which HTTP method you're using and why REST constrains itself to a small set of verbs.
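You can answer the first of those questions yourself in under a minute. A quick sketch with `timeit` (sizes and iteration counts chosen arbitrarily) shows the difference — the comprehension avoids the repeated `out.append` attribute lookup and method-call overhead of the explicit loop:

```python
from timeit import timeit

def with_loop(n):
    out = []
    for i in range(n):
        out.append(i * i)  # attribute lookup + method call on every iteration
    return out

def with_comprehension(n):
    # Comprehensions build the list with specialized bytecode instead.
    return [i * i for i in range(n)]

n = 100_000
loop_t = timeit(lambda: with_loop(n), number=50)
comp_t = timeit(lambda: with_comprehension(n), number=50)
print(f"loop: {loop_t:.3f}s  comprehension: {comp_t:.3f}s")
```

Exact timings vary by machine and interpreter version, but the comprehension is typically faster — and now you know the mechanism, not just the rule of thumb.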
One level deeper. Every time. That's the entire practice. It compounds over months and years into a depth of understanding that produces results nobody else can explain.
Want to dig into AI architecture, optimization, or engineering philosophy? Let's have that conversation.
Get In Touch