Aditya Velpula
I don't just analyze data. I architect systems that think.
M.S. Data Analytics · George Mason University · 2024 to 2026
Building AI Systems That Scale
1,630 Policy Sources → Actionable Intelligence
I build production AI systems for real users. Right now I'm leading the backend of a national-security RAG platform that lets JAG analysts query 1,630 policy sources across 21 countries in seconds instead of hours. Most LLM projects look great in a notebook and fall apart in prod. I focus on the unglamorous middle: retrieval that actually retrieves, evals that catch regressions before users do, and orchestration that survives partial failures.
Where I've Built
Three roles, one through-line: shipping intelligent systems from first principles.
AI Engineer · DAPSE Capstone
Hybrid RAG Pipeline
Hybrid search engine built on SQLite FTS5 + FAISS + Reciprocal Rank Fusion, with authority-weighted reranking and self-correcting retrieval loops (the pipeline detects low-confidence results, rewrites the query, and retries before anything reaches the LLM). It hits nDCG@5 = 0.832 and Precision@5 = 0.954 on a hand-built evaluation suite. A 7-stage async LLM orchestration pipeline on the OpenAI SDK runs with checkpoint recovery, per-model circuit breakers, and Langfuse observability, and routing simple queries to GPT-5-nano cut average API cost by ~80%.
Shipped to production on a GMU OpenStack VM with FastAPI + SSE streaming, Docker behind nginx, token auth, and rate limiting. 1,481 automated tests across retrieval, scoring, and composition certified the system for hand-off to NSI under the DAPSE 3.0 program. Lesson learned: production AI is a systems problem first and a model problem second; the model is one node in a graph of retrieval, ranking, orchestration, evals, and recovery.
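A minimal sketch of per-model circuit breakers with cheap-first routing, assuming a simple length-and-question-mark heuristic decides what counts as a "simple" query and a hypothetical `gpt-5` model serves as the heavier tier (only GPT-5-nano is named above). `call_model` stands in for whatever OpenAI SDK wrapper the pipeline uses; this is an illustration, not the production code.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable

CHEAP_MODEL = "gpt-5-nano"  # named in the write-up
STRONG_MODEL = "gpt-5"      # assumption: the heavier tier is not named


@dataclass
class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; once `cooldown_s`
    seconds have passed it lets a trial call through (half-open)."""
    max_failures: int = 3
    cooldown_s: float = 30.0
    failures: int = 0
    opened_at: float | None = None

    def allow(self, now: float) -> bool:
        return self.opened_at is None or now - self.opened_at >= self.cooldown_s

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success closes the breaker
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now  # (re)open the breaker


def route(query: str) -> list[str]:
    """Hypothetical heuristic: short, single-question queries try the cheap model first."""
    simple = len(query.split()) <= 12 and query.count("?") <= 1
    return [CHEAP_MODEL, STRONG_MODEL] if simple else [STRONG_MODEL, CHEAP_MODEL]


def call_with_breakers(
    query: str,
    breakers: dict[str, CircuitBreaker],
    call_model: Callable[[str, str], str],
    now: Callable[[], float] = time.monotonic,
) -> tuple[str, str]:
    last_error: Exception | None = None
    for model in route(query):
        breaker = breakers[model]
        if not breaker.allow(now()):
            continue  # breaker open: skip this model and fall through to the next
        try:
            answer = call_model(model, query)
        except Exception as exc:
            breaker.record(False, now())
            last_error = exc
            continue
        breaker.record(True, now())
        return model, answer
    raise RuntimeError("no model available") from last_error
```

After its cooldown a breaker is half-open: one trial call either closes it or re-opens it, so a failing model stops absorbing retries while the other tier keeps answering.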
Graduate Teaching Assistant · AIT-580 Data Analytics
Selected by Prof. Harry J. Foxwell to support graduate sections of AIT-580 Data Analytics. Taught 100+ students across 4 sections over 2 semesters, covering SQL, Python, R, and AWS Cloud. Redesigned grading rubrics that cut regrade volume and shaved roughly 30% off grading time per assignment. Overhauled the lab guides and built supplementary Jupyter notebooks that lifted student assignment scores by about 15% vs. the prior cohort. Ran weekly office hours that translated dense engineering concepts (data modeling, ETL design, query plans) into 5-minute explanations students could actually use under deadline pressure.
AI Engineer · Internship
Collaborated with backend engineers to integrate APIs and dynamic geospatial data rendering into the company's web platform, improving navigation flows and UI reliability for end users. Built responsive layouts in HTML5, CSS3, and JavaScript that worked consistently across the team's full device range. Established the team's first Git-based PR workflow with automated linting, replacing ad-hoc commits and cutting code-review friction across the engineering team. Lesson learned: the shortest path from idea to shipped feature usually runs through better tooling, not better code.
Cyber Security Specialist · Internship
Built foundational skills in identifying vulnerabilities, testing web applications, and understanding security workflows. Hands-on experience with Burp Suite, Wireshark, DNS Discovery, and Bugcrowd for web application penetration testing and vulnerability identification.
Web Development Intern
Built responsive, user-friendly websites end to end using WordPress, HTML, and CSS. Completed multiple full website builds applying modern design principles and responsive layout techniques.
Six Stages, One Query
Scroll through the pipeline. Each stage is a real piece of the system in production.
Ingest
PDF, HTML, and DOCX policy documents pulled from 21 countries, then cleaned and normalized into a single schema with provenance tagging on every artifact. The catalog grew through the DAPSE 3.0 hand-off cycle.
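One way to picture "a single schema with provenance tagging" is a record that keeps the source URL and original format next to the normalized text. This is a sketch under assumptions: the field names and the content-hash ID scheme are hypothetical, not the DAPSE schema.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDoc:
    doc_id: str         # hypothetical: first 16 hex chars of SHA-256 of the text
    country: str
    source_url: str     # provenance: where the artifact came from
    source_format: str  # "pdf", "html", or "docx"
    text: str


def normalize(raw: str) -> str:
    """Collapse the whitespace runs that PDF/HTML/DOCX extraction leaves behind."""
    return re.sub(r"\s+", " ", raw).strip()


def ingest(raw_text: str, country: str, source_url: str, source_format: str) -> PolicyDoc:
    text = normalize(raw_text)
    doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return PolicyDoc(doc_id, country, source_url, source_format.lower(), text)
```

Hashing the normalized text gives the same document fetched from two URLs the same ID, which turns deduplication into a dictionary lookup.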
Chunk
Section-aware segmentation with parent-child chunking, plus contextual headers prepended before embedding. Policy docs are full of structural meaning (treaty articles, footnotes, schedules), so naive paragraph splits would wreck retrieval. 25,634 policy objectives are extracted from the same pass.
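A simplified sketch of parent-child chunking with contextual headers, assuming fixed word windows inside each section. The real segmenter handles structure (treaty articles, footnotes, schedules) that this does not model; the function names, header format, and 60-word window are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    parent_id: str
    text: str  # header + child body: this is what gets embedded


def chunk_section(doc_title: str, section_path: list[str], section_text: str,
                  child_size: int = 60) -> list[Chunk]:
    """Split one section (the parent) into word windows (the children).
    Windows never cross a section boundary, and each child is prefixed with a
    contextual header naming its document and structural location."""
    location = " > ".join(section_path)
    parent_id = f"{doc_title}::{location}"
    header = f"[{doc_title} | {location}]"
    words = section_text.split()
    return [
        Chunk(f"{parent_id}#{i // child_size}", parent_id,
              f"{header} {' '.join(words[i:i + child_size])}")
        for i in range(0, len(words), child_size)
    ]


def expand_to_parents(hits: list[Chunk], parents: dict[str, str]) -> list[str]:
    """Match on small children, then hand the LLM full parent sections
    (deduplicated, in rank order)."""
    seen: list[str] = []
    for hit in hits:
        if hit.parent_id not in seen:
            seen.append(hit.parent_id)
    return [parents[pid] for pid in seen]
```

Small children keep each embedding focused, while expanding hits back to the parent section restores the surrounding legal context.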
Embed + Index
OpenAI embeddings stored in FAISS for dense semantic search. SQLite FTS5 provides the lexical sidecar over the same chunks. Two indexes, one query plane, both on the same VM.
Hybrid Retrieve
FTS5 + FAISS results blend through Reciprocal Rank Fusion, with authority-weighted reranking on top. Self-correcting retrieval loops detect low-confidence results, rewrite the query, and retry before anything reaches the LLM. Lands nDCG@5 = 0.832 and Precision@5 = 0.954 on the hand-built eval suite.
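A sketch of the fusion step, assuming each index returns a ranked list of chunk IDs. `rrf` uses the standard Reciprocal Rank Fusion formula with the commonly used constant k = 60. The tier multipliers in `TIER_WEIGHT` and the retry helper's `search`, `rewrite`, and `confidence` callables are hypothetical stand-ins, not production values.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical multipliers for the four source tiers described in the Verify stage.
TIER_WEIGHT = {
    "binding_legal": 1.0,
    "official_nonbinding": 0.85,
    "trusted_secondary": 0.7,
    "other": 0.5,
}


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: score(d) = sum over ranked lists of 1 / (k + rank_d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def fuse_and_rerank(lexical: list[str], dense: list[str], tiers: dict[str, str],
                    k: int = 60) -> list[tuple[str, float]]:
    """Fuse lexical (FTS5) and dense (FAISS) rankings, then scale by source authority."""
    fused = rrf([lexical, dense], k)
    weighted = {d: s * TIER_WEIGHT[tiers.get(d, "other")] for d, s in fused.items()}
    return sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)


def retrieve_with_retry(query: str,
                        search: Callable[[str], list],
                        rewrite: Callable[[str], str],
                        confidence: Callable[[list], float],
                        threshold: float,
                        max_tries: int = 3) -> tuple[str, list]:
    """Self-correcting loop: while results look weak, rewrite the query and search
    again. Returns the query that produced the final results."""
    results = search(query)
    for _ in range(max_tries - 1):
        if results and confidence(results) >= threshold:
            break
        query = rewrite(query)
        results = search(query)
    return query, results
```

RRF only looks at ranks, so FTS5's BM25 scores and FAISS similarity scores never have to be put on a common scale.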
Verify
Every source carries a tier (binding legal, official non-binding, trusted secondary, other). The tier rides through retrieval, reranking, and composition, then 5 quality gates downgrade or kill any finding that doesn't hold up. Evidence-First reasoning enforces a 3-sentence BLUF (bottom line up front) cap on every brief.
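An illustrative sketch of tier-aware gating and the 3-sentence cap. The five real gates aren't listed here, so the two below (kill uncited findings, downgrade findings backed only by lower-tier sources) are assumptions, as are the `Finding` fields.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, replace

LOW_TIERS = {"trusted_secondary", "other"}


@dataclass(frozen=True)
class Finding:
    claim: str
    citations: tuple[str, ...]  # IDs of retrieved chunks backing the claim
    best_tier: str              # strongest tier among the cited sources
    confidence: str = "high"


def gate(f: Finding) -> Finding | None:
    """Two illustrative gates (not the real five): kill, then downgrade."""
    if not f.citations:
        return None                             # no evidence: drop the finding
    if f.best_tier in LOW_TIERS:
        return replace(f, confidence="medium")  # weak evidence: downgrade it
    return f


def bluf(text: str, max_sentences: int = 3) -> str:
    """Cap a brief at its first `max_sentences` sentences.
    Naive splitter: abbreviations such as "Art." are split too."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])
```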
Respond
FastAPI + SSE streams the response token-by-token from a GMU OpenStack VM behind Docker + nginx. Every claim traces back to a retrieved chunk with a clickable citation. Saves an estimated 90% on policy-lookup time vs. the manual workflow; 1,481 passing tests certified the system for hand-off under DAPSE 3.0.
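A sketch of the Server-Sent Events framing, assuming tokens arrive as an iterable of strings and citations go out as a trailing named event (the `citations` and `done` event names are hypothetical). In FastAPI, a generator like `stream_answer` can be returned through `StreamingResponse(..., media_type="text/event-stream")`.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def sse_event(data: str, event: str | None = None) -> str:
    """One SSE frame: an optional `event:` line, one `data:` line per line of
    payload, and a blank line that terminates the frame."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], citations: list[dict]) -> Iterator[str]:
    for token in tokens:
        yield sse_event(token)  # unnamed frames arrive as "message" events
    # Hypothetical trailing events: citation metadata, then an end marker.
    yield sse_event(json.dumps(citations), event="citations")
    yield sse_event("[DONE]", event="done")
```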
Things I've Shipped
Nine projects, end-to-end. Tap any card for the architecture, metrics, and stack.

Pulse · Real-Time Global News Intelligence Globe
Public 3D-Earth platform that aggregates 1,000+ live events from GDELT 2.0, 45 subreddits, and 45+ RSS feeds (including Reuters, BBC, AP, Al Jazeera, NHK, and Hacker News), geolocates and clusters them onto the globe, and streams Claude Opus 4.7 analyst briefings (what happened, why it matters, key actors, severity, 12-hour forecast) when a cluster is clicked. Next.js 16 + React 19 + Three.js with 8K NASA textures and custom GLSL shaders for the day/night terminator, atmosphere, and sentiment-reactive auroras. A fingerprint cache, per-IP rate limiting, and lazy AI generation keep the Claude bill sane.
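A hedged sketch of those three cost controls, assuming a cluster is identified by the set of its event IDs and the briefing call is passed in as `generate`. The class name, limits, and window size are hypothetical.

```python
import hashlib
import time
from collections import defaultdict, deque


class RateLimited(Exception):
    pass


class BriefingGate:
    """Hypothetical guard in front of the briefing model: a fingerprint cache so
    each cluster is summarized once, a per-IP sliding-window limit, and lazy
    generation (the model runs only when a cluster is actually requested)."""

    def __init__(self, generate, max_calls=5, window_s=60.0, clock=time.monotonic):
        self.generate = generate         # e.g. a wrapper around the LLM call
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.cache = {}                  # fingerprint -> briefing
        self.calls = defaultdict(deque)  # ip -> timestamps of recent model calls

    @staticmethod
    def fingerprint(event_ids):
        # Order-insensitive: the same set of events always maps to the same key.
        return hashlib.sha256("|".join(sorted(event_ids)).encode()).hexdigest()

    def brief(self, ip, event_ids):
        key = self.fingerprint(event_ids)
        if key in self.cache:
            return self.cache[key]       # cache hits are free and not rate limited
        now = self.clock()
        window = self.calls[ip]
        while window and now - window[0] >= self.window_s:
            window.popleft()             # forget calls older than the window
        if len(window) >= self.max_calls:
            raise RateLimited(ip)
        window.append(now)
        self.cache[key] = self.generate(sorted(event_ids))
        return self.cache[key]
```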

DAPSE · Arctic Policy Intelligence Engine
Production RAG backend for a national-security JAG decision-support system, ingesting 1,630 policy sources across 21 countries into 257K embedded chunks. Saves an estimated 90% on policy-lookup time for end users. Hybrid search (SQLite FTS5 + FAISS + RRF) with authority-weighted reranking and self-correcting retrieval loops hits nDCG@5 = 0.832 and Precision@5 = 0.954 on a hand-built eval suite. 7-stage async LLM pipeline on the OpenAI SDK with checkpoint recovery and per-model circuit breakers; routing simple queries to GPT-5-nano cut average API cost ~80%. Shipped to a GMU OpenStack VM (FastAPI + SSE, Docker behind nginx) and certified for NSI hand-off via 1,481 automated tests.
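For reference, the two reported metrics can be computed per query as below, assuming graded relevance labels with linear gain (some suites use 2**rel - 1 instead) and a suite-level score taken as the mean over queries. The function names and the `qrels` format are illustrative.

```python
from __future__ import annotations

import math


def dcg(gains: list[float]) -> float:
    """Linear-gain DCG: sum of rel_i / log2(i + 1), with ranks starting at 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids: list[str], qrels: dict[str, int], k: int = 5) -> float:
    """qrels maps doc_id -> graded relevance (0 or missing = not relevant)."""
    gains = [qrels.get(d, 0) for d in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def precision_at_k(ranked_ids: list[str], qrels: dict[str, int], k: int = 5) -> float:
    """Fraction of the top-k results judged relevant (always divides by k)."""
    return sum(1 for d in ranked_ids[:k] if qrels.get(d, 0) > 0) / k
```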
Ticket Resolution & SLA Breach Prediction
End-to-end ITSM analytics pipeline predicting ticket resolution time and flagging SLA breach risk before closure. Built on a realistic 5,000-ticket synthetic dataset simulating ServiceNow/Jira logs. Gradient-boosting models beat baselines for both regression and classification; results surface through a Power BI dashboard for proactive service management.
Wildfire Risk Prediction
ML pipeline fusing MODIS satellite fire data, NOAA climate variables, and NDVI vegetation indices to predict wildfire risk. Random Forest + XGBoost with careful feature engineering reached AUC-ROC 0.99. Python visualizations of high-risk zones and key predictors (elevation, humidity, thermal anomalies) support proactive response.
U.S. Electricity-Rate Analytics
Analyzed 320K+ electricity rate records (2020 to 2023) through the DIKW framework, using Python, SQL, and statistical testing (t-tests, regression) to expose material pricing differences between IOU and Non-IOU utilities across sectors and states. Cluster models + forecasts highlight geographic trends and inflation effects for regulators.
License Plate Detection
Real-time license plate recognition combining YOLO object localization with Tesseract OCR for character extraction, plus OpenCV preprocessing (binarization, denoising, perspective correction) for plates under motion blur or low contrast. Boosted detection accuracy through dataset augmentation and bounding-box refinement. B.Tech capstone project.
Hybrid Movie Recommender
Hybrid recommendation engine blending content-based filtering, collaborative filtering, sentiment analysis, and Jaccard similarity for diverse personalized suggestions. Responsive Flask web app with real-time search, tunable parameters, and evaluation metrics for diversity, novelty, and serendipity.
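A minimal sketch of how Jaccard similarity can feed a blended score, assuming genre sets on the content side, precomputed collaborative and sentiment scores in [0, 1], and hypothetical 0.4/0.4/0.2 weights. It is not the project's exact formula.

```python
from __future__ import annotations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(content_sim: float, collab_score: float, sentiment: float,
                 weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Hypothetical linear blend; each input is assumed pre-scaled to [0, 1]."""
    w_content, w_collab, w_sent = weights
    return w_content * content_sim + w_collab * collab_score + w_sent * sentiment


def recommend(seed_genres: set, candidates: dict[str, set], collab: dict[str, float],
              sentiment: dict[str, float], top_n: int = 3) -> list[str]:
    scored = [
        (hybrid_score(jaccard(seed_genres, genres),
                      collab.get(title, 0.0),
                      sentiment.get(title, 0.5)),  # neutral sentiment if unknown
         title)
        for title, genres in candidates.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]
```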
Obesity Risk Analytics
Cloud-native data pipeline predicting county-level obesity trends from CDC BRFSS data, supervised by Prof. Harry Foxwell at GMU. Raw records flow through S3 → AWS Glue DataBrew → RDS, then EDA and modeling in Python (Pandas, Seaborn, Scikit-learn) and R (tidyverse, ggplot2). Three model families compared: regression (interpretable baseline), Random Forest (non-linear + importance ranking), and ARIMA (trend forecasting). End-to-end ownership from raw CDC data to predictive outputs.
Support Circle
Full-stack virtual support platform helping people fight addictions. React frontend for a responsive chat-first UX, Python backend handling auth, data management, and secure real-time messaging + notifications so users get continuous peer support during recovery.
Skills & Stack
Forty-one tools, four domains, one daily driver. Bars show depth at a glance.
AI / ML & Data
Languages
Cloud & Infrastructure
Tools & Frameworks
AWS Certifications
Six active credentials. Tap any badge to verify it on AWS, or copy the validation ID.