The Precision-First Approach to Financial AI

The "Hallucination" Tax in Fintech

In the race to build the biggest LLM, we’ve overlooked a critical flaw: Generative AI is a probability engine, not a calculation engine. For most users, a chatbot is simply a shortcut to data. When an investor asks, "What is my 1-year expected return?", a +/- 2% "hallucination" isn't a minor quirk—it's a financial liability.

This is why we built RIIA (Risk Informed Investment Approach) using a deterministic, local-first architecture. Read more about the project here. We traded generative creativity for mathematical certainty.

The Architecture: Semantic Routing

Instead of sending raw text to a massive model in the cloud, RIIA uses a three-layer local pipeline:

The Brain (Sentence Transformers): We use all-MiniLM-L6-v2 to map user queries to one of 20 predefined "Investment Intents." A confidence threshold (0.42) gates the result: the system answers only when the best-matching intent clears that score, and declines to answer otherwise.
The Muscle (Deterministic Handlers): Once an intent is identified (e.g., stress_crash_10), the system triggers a Python function. This function runs real-time math against local CSV data. There is no "generation"—only calculation.
The Memory (MTime Caching): Indicator DataFrames are cached and recomputed only when the modification time (mtime) of the underlying data files changes on disk, which keeps repeated queries fast.
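
To make the three layers concrete, here is a minimal sketch of how they might fit together. It is not RIIA's actual code. To stay dependency-free it swaps the all-MiniLM-L6-v2 embedding for a toy bag-of-words vector and uses plain lists instead of DataFrames. The intent phrasings, the return_1y intent, the CSV layout (a "close" column), and all function names other than stress_crash_10 are assumptions made for illustration.

```python
import csv
import math
import os
import re
from collections import Counter

THRESHOLD = 0.42  # confidence threshold quoted in the post

# Hypothetical intent catalogue: intent name -> one example phrasing.
INTENT_EXAMPLES = {
    "stress_crash_10": "what happens to my portfolio if the market crashes 10 percent",
    "return_1y": "what is my one year return",
}

def embed(text):
    """Toy bag-of-words vector, standing in for a sentence-transformer embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(query):
    """Layer 1: return the best-matching intent, or None below THRESHOLD."""
    q = embed(query)
    scores = {name: cosine(q, embed(ex)) for name, ex in INTENT_EXAMPLES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= THRESHOLD else None

_cache = {}  # path -> (mtime in nanoseconds, closing prices)

def load_closes(path):
    """Layer 3: re-read the CSV only when its modification time changes."""
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    with open(path, newline="") as handle:
        closes = [float(row["close"]) for row in csv.DictReader(handle)]
    _cache[path] = (mtime, closes)
    return closes

def stress_crash_10(path):
    """Layer 2: latest close after a 10% crash, computed rather than generated."""
    return round(load_closes(path)[-1] * 0.90, 2)

def return_1y(path):
    """Layer 2: percent return from first to last row (assumes the file spans one year)."""
    closes = load_closes(path)
    return round((closes[-1] / closes[0] - 1) * 100, 2)

HANDLERS = {"stress_crash_10": stress_crash_10, "return_1y": return_1y}

def answer(query, path):
    """Route the query, then run the matching handler; None means 'not sure'."""
    intent = route(query)
    if intent is None:
        return None  # refuse rather than guess
    return HANDLERS[intent](path)
```

The design point is that the model only ever chooses which function to run. Every number in the answer comes from the handler, so a query that falls below the threshold yields a refusal instead of a plausible-looking figure.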

Why Local Beats Cloud

Privacy: Portfolio data never leaves the application boundary. No external API calls.
Speed: Near-instant response times. No "thinking" dots or token-streaming lag.
Reliability: RIIA works offline and doesn't suffer from "Model Drift" or API downtime.

Further Improvements

Closed-Loop Feedback: Linking the "Outcome Analyst" back to the "Strategy Engine" to automatically refine allocation levels based on chat-driven backtests.
The near-term focus is on contextual comparison and parameter persistence:
1. Historical Sensitivity Analysis: Since we already have the "Stress Scenario" handlers, the next step is allowing users to "tune" these local models via chat—e.g., "What if the crash was 15% instead of 10%?"
2. Local "Data Discovery": Using the mtime caching logic to not just update data, but to alert the researcher when new anomalies are detected in the underlying CSVs (e.g., "I noticed a 3-standard-deviation move in NVIDIA while updating your cache").
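
Both ideas above reduce to small deterministic functions. The sketch below is one possible shape, not the project's implementation. The function names, the regex used to pull a percentage out of the chat message, and the choice to compare the latest daily return against the history of earlier returns are all assumptions.

```python
import re
import statistics

def parse_crash_pct(query, default=10.0):
    """Pull the first percentage out of a chat message; fall back to the default."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", query)
    return float(match.group(1)) if match else default

def stressed_value(value, crash_pct):
    """Value after an instantaneous drop of crash_pct percent."""
    return round(value * (1 - crash_pct / 100), 2)

def is_anomalous_move(prices, z_limit=3.0):
    """True if the latest daily return sits more than z_limit standard
    deviations away from the mean of the earlier daily returns."""
    returns = [later / earlier - 1 for earlier, later in zip(prices, prices[1:])]
    history, latest = returns[:-1], returns[-1]
    if len(history) < 2:
        return False  # not enough history to estimate a standard deviation
    sigma = statistics.stdev(history)
    if sigma == 0:
        return False  # no variation to measure against
    return abs(latest - statistics.mean(history)) / sigma > z_limit
```

With this shape, "What if the crash was 15% instead of 10%?" changes a single argument to an existing handler, and the anomaly check can run at the moment the cache reloads a file.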
Cross-Instrument Intelligence: Moving beyond "single-ticker" queries. By implementing a session-based instrument stack, users can ask "What's the volatility of ASML?" followed by "Now compare that to the NIFTY benchmark," and RIIA will understand the context of the previous query.
Enhancing the "Research Memory" (Stateless to State-Aware): The scenario: a researcher asks, "Show me the 1-year return for ASML," then follows up with, "Compare this to NVIDIA." The implementation: use a "breadcrumb" state. Instead of remembering only the last instrument, store the last 3 instruments mentioned in a session.
Slot-Filling Mechanism: Parameters missing from a follow-up query are filled in from the session state. This keeps the system deterministic but makes it feel "smarter."
Source Metadata: Every answer returns its source metadata. Example response: "The 1-year rolling return for NVIDIA is 42%. [Source: DAILY-DATA/NVIDIA_manual.csv, Last Updated: 2026-04-20]"
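
A rough sketch of the breadcrumb state, slot filling, and source tagging follows. It is an illustration under stated assumptions rather than a finished design: the Session class, the fill_slots and with_source names, the pronoun list, and the small instrument set are all hypothetical, and real ticker recognition would need more than whitespace tokenizing.

```python
import string
from collections import deque

KNOWN_INSTRUMENTS = {"ASML", "NVIDIA", "NIFTY"}  # hypothetical instrument list
PRONOUNS = {"this", "that", "it"}                # words that point at earlier context

class Session:
    """Remembers the most recently mentioned instruments (3 by default)."""

    def __init__(self, depth=3):
        self.breadcrumbs = deque(maxlen=depth)

    def fill_slots(self, query):
        """Return the instruments a query refers to, filling gaps from history."""
        tokens = [t.strip(string.punctuation) for t in query.split()]
        mentioned = [t.upper() for t in tokens if t.upper() in KNOWN_INSTRUMENTS]
        refers_back = any(t.lower() in PRONOUNS for t in tokens)
        slots = list(mentioned)
        if refers_back or not mentioned:
            # Fill the missing slot with the most recent instrument not already named.
            for previous in reversed(self.breadcrumbs):
                if previous not in slots:
                    slots.insert(0, previous)
                    break
        for name in mentioned:
            if name in self.breadcrumbs:
                self.breadcrumbs.remove(name)  # move a repeat to the newest position
            self.breadcrumbs.append(name)
        return slots

def with_source(text, path, last_updated):
    """Append the data file and its update date to an answer."""
    return f"{text} [Source: {path}, Last Updated: {last_updated}]"
```

Because the lookup is a plain rule over a bounded history, the same two queries in the same order always resolve to the same instruments, which preserves the deterministic behavior the rest of the pipeline relies on.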

In an era of AI noise, RIIA is a reminder that sometimes the best intelligence is just a very smart, very fast calculator that understands what you’re asking for.

AI Practitioner Journey
