Available for opportunities

Aditya Velpula

RAG Pipelines · LLM Systems · Cloud Data Infrastructure
90% Policy-Lookup Time Saved
1,630 Policy Sources · 21 Countries
0.954 Precision@5

I don't just analyze data. I architect systems that think.

M.S. Data Analytics · George Mason University · 2024 to 2026

Building AI Systems That Scale

1,630 Policy Sources → Actionable Intelligence

SYSTEM PROFILE
ENGINEER: Aditya Velpula
FOCUS: AI Engineering · GenAI · Production ML
DOMAINS: LLM Systems · RAG · Vector Search · MLOps · Computer Vision · NLP · Cloud Data
EDUCATION: M.S. Data Analytics (GMU)
STATUS: ACTIVE

I build production AI systems for real users. Right now I'm leading the backend of a national-security RAG platform that lets JAG analysts query 1,630 policy sources across 21 countries in seconds instead of hours. Most LLM projects look great in a notebook and fall apart in prod. I focus on the unglamorous middle: retrieval that actually retrieves, evals that catch regressions before users do, and orchestration that survives partial failures.

EXPERIENCE

Where I've Built

Five roles, one through-line: shipping intelligent systems from first principles.

FEATURED ROLE · Jan 2026 to May 2026

AI Engineer · DAPSE Capstone

NSI (National Security Innovations Inc.) · Apprenticeship via GMU · Arlington, VA · Hybrid
Lead backend developer on the Arctic Policy Assistant, a production RAG platform shipped with NSI through GMU's capstone program. Lets JAG analysts query 1,630 policy sources across 21 countries in seconds instead of hours, saving an estimated 90%+ on policy-lookup time. Owned end-to-end system design and shipped to production on a GMU OpenStack VM (FastAPI + SSE streaming, Docker behind nginx, token auth, rate limiting). Validated through 1,481 automated tests and certified for NSI hand-off under DAPSE 3.0.

Hybrid RAG Pipeline

Hybrid search engine built on SQLite FTS5 + FAISS + Reciprocal Rank Fusion, with authority-weighted reranking and self-correcting retrieval loops (the pipeline detects low-confidence results, rewrites the query, and retries before anything reaches the LLM). Hit nDCG@5 = 0.832 and Precision@5 = 0.954 on a hand-built evaluation suite. A 7-stage async LLM orchestration pipeline on the OpenAI SDK runs with checkpoint recovery, per-model circuit breakers, and Langfuse observability; cut average API cost ~80% by routing simple queries to GPT-5-nano.
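To make the per-model circuit breaker and cheap-model routing concrete, here is a minimal sketch of the idea. It is not the production code: the `CircuitBreaker` class, the thresholds, the word-count heuristic for "simple", and the placeholder name `larger-model` are all assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then rejects calls
    until `cooldown_s` seconds have passed (hypothetical thresholds)."""
    max_failures: int = 3
    cooldown_s: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and allow traffic again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


def pick_model(query: str, breakers: Dict[str, CircuitBreaker],
               now: Optional[float] = None) -> Optional[str]:
    """Prefer the cheap model for short queries; skip models whose breaker is open."""
    simple = len(query.split()) <= 12  # assumed heuristic, not the real router
    order = ["gpt-5-nano", "larger-model"] if simple else ["larger-model", "gpt-5-nano"]
    for name in order:
        if breakers[name].allow(now):
            return name
    return None  # every model is currently tripped
```

Keeping one breaker per model means a failing cheap tier degrades to the larger model instead of failing the whole request.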

Shipped to production on a GMU OpenStack VM with FastAPI + SSE streaming, Docker behind nginx, token auth, and rate limiting. 1,481 automated tests across retrieval, scoring, and composition certified the system for hand-off to NSI under the DAPSE 3.0 program. Lesson learned: production AI is a systems problem first and a model problem second; the model is one node in a graph of retrieval, ranking, orchestration, evals, and recovery.

Pipeline overview:
1,630 Policy Documents (RAW INPUT · 21 COUNTRIES)
Chunking Engine (257K CHUNKS · 25,634 OBJECTIVES)
Vector Store (FAISS + SQLITE FTS5)
Hybrid Retrieval (DENSE + LEXICAL)
Multi-tier LLM Verification (FACT CHECK)
Structured Response + Citations (OUTPUT)
User Query (INPUT)

Graduate Teaching Assistant · AIT-580 Data Analytics

George Mason University · College of Engineering and Computing · Fairfax, VA · On-site
Aug 2025 to May 2026

Selected by Prof. Harry J. Foxwell to support graduate sections of AIT-580 Data Analytics. Taught 100+ students across 4 sections over 2 semesters, covering SQL, Python, R, and AWS Cloud. Redesigned grading rubrics that cut regrade volume and shaved roughly 30% off grading time per assignment. Overhauled the lab guides and built supplementary Jupyter notebooks that lifted student assignment scores by about 15% vs. the prior cohort. Ran weekly office hours that translated dense engineering concepts (data modeling, ETL design, query plans) into 5-minute explanations students could actually use under deadline pressure.

100+ Students Taught
4 AIT-580 Sections (2 semesters)
~30% Grading time saved per assignment
~15% Student score lift vs prior cohort
Python · SQL · R · AWS Cloud · Jupyter

AI Engineer · Internship

Indgeos Geospatial · India · Hybrid
Nov 2023 to Jul 2024

Collaborated with backend engineers to integrate APIs and dynamic geospatial data rendering into the company's web platform, improving navigation flows and UI reliability for end users. Built responsive layouts in HTML5, CSS3, and JavaScript that worked consistently across the team's full device range. Established the team's first Git-based PR workflow with automated linting, replacing ad-hoc commits and cutting code-review friction across the engineering team. Lesson learned: the shortest path from idea to shipped feature usually runs through better tooling, not better code.

HTML5 · CSS3 · JavaScript · REST APIs · Git · Linting

Cyber Security Specialist · Internship

Supraja Technologies · India · Remote
Nov 2021 to Jan 2022

Built foundational skills in identifying vulnerabilities, testing web applications, and understanding security workflows. Hands-on experience with Burp Suite, Wireshark, DNS Discovery, and Bugcrowd for web application penetration testing and vulnerability identification.

Burp Suite · Wireshark · Bugcrowd · Pen Testing

Web Development Intern

Brainovision Solutions India · Remote
Dec 2020 to Feb 2021

Built responsive, user-friendly websites end to end using WordPress, HTML, and CSS. Completed multiple full website builds applying modern design principles and responsive layout techniques.

WordPress · HTML · CSS · Responsive Design

HOW DAPSE WORKS

Six Stages, One Query

Each stage is a real piece of the system in production.

STAGE 01

Ingest

1,630 sources

PDF, HTML, and DOCX policy documents pulled from 21 countries, then cleaned and normalized into a single schema with provenance tagging on every artifact. The catalog grew through the DAPSE 3.0 hand-off cycle.

STAGE 02

Chunk

257K chunks

Section-aware segmentation with parent-child chunking, plus contextual headers prepended before embedding. Policy docs are full of structural meaning (treaty articles, footnotes, schedules), so naive paragraph splits would wreck retrieval. 25,634 policy objectives are extracted from the same pass.
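A minimal sketch of parent-child chunking with a contextual header, assuming sections have already been detected upstream. The `Chunk` fields, the header format, and the word budget are hypothetical and only illustrate the shape of the technique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    chunk_id: str
    parent_id: str  # the section (parent) this child chunk came from
    text: str       # header-prefixed text that would be sent to the embedder


def chunk_section(doc_title: str, section_id: str, heading: str,
                  body: str, max_words: int = 40) -> List[Chunk]:
    """Split one section on paragraph boundaries and prepend a contextual
    header so every child chunk still says where it came from."""
    header = f"[{doc_title} > {heading}]"
    chunks: List[Chunk] = []
    buffer: List[str] = []

    def flush() -> None:
        if buffer:
            text = header + "\n" + "\n".join(buffer)
            chunks.append(Chunk(f"{section_id}#{len(chunks)}", section_id, text))
            buffer.clear()

    for para in (p.strip() for p in body.split("\n\n")):
        if not para:
            continue
        words_so_far = sum(len(p.split()) for p in buffer)
        # Start a new child chunk when adding this paragraph would exceed the budget.
        if buffer and words_so_far + len(para.split()) > max_words:
            flush()
        buffer.append(para)
    flush()
    return chunks
```

Because each child keeps its `parent_id`, a retriever can match on the small chunk and then pull the full section back in for context.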

STAGE 03

Embed + Index

FAISS + SQLite FTS5

OpenAI embeddings stored in FAISS for dense semantic search. SQLite FTS5 provides the lexical sidecar over the same chunks. Two indexes, one query plane, both on the same VM.
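The two-index layout can be sketched with the standard library alone. This is an illustration under stated assumptions: the sample chunks are made up, the 3-dimensional vectors stand in for real embeddings, and a brute-force cosine scan stands in for the FAISS index. It also assumes the local SQLite build ships with FTS5 enabled.

```python
import math
import sqlite3

# Made-up sample chunks keyed by chunk id.
chunks = {
    1: "Coastal states may regulate vessel traffic in ice-covered waters.",
    2: "The council publishes non-binding guidance on search and rescue.",
    3: "Icebreaker escort fees are set by the port authority.",
}

# Lexical sidecar: an FTS5 virtual table whose rowid is the chunk id.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunk_fts USING fts5(text)")
db.executemany("INSERT INTO chunk_fts(rowid, text) VALUES (?, ?)", list(chunks.items()))


def lexical_search(query: str, k: int = 5) -> list:
    """Chunk ids ordered by FTS5's built-in BM25 rank, best match first."""
    rows = db.execute(
        "SELECT rowid FROM chunk_fts WHERE chunk_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    )
    return [row[0] for row in rows]


# Dense side: toy vectors over the same chunk ids (stand-in for FAISS).
vectors = {1: [0.9, 0.1, 0.0], 2: [0.1, 0.9, 0.1], 3: [0.7, 0.0, 0.6]}


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dense_search(query_vec, k: int = 5) -> list:
    """Chunk ids ordered by cosine similarity to the query vector."""
    ranked = sorted(vectors, key=lambda cid: cosine(query_vec, vectors[cid]), reverse=True)
    return ranked[:k]
```

Both functions return ids from the same chunk id space, which is what lets a later fusion step combine them.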

STAGE 04

Hybrid Retrieve

nDCG@5 = 0.832

FTS5 + FAISS results blend through Reciprocal Rank Fusion, with authority-weighted reranking on top. Self-correcting retrieval loops detect low-confidence results, rewrite the query, and retry before anything reaches the LLM. Lands nDCG@5 = 0.832 and Precision@5 = 0.954 on the hand-built eval suite.
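Reciprocal Rank Fusion itself is only a few lines. The sketch below uses the commonly cited constant `k = 60`; the optional `authority` multiplier is an assumption about how an authority weight could be layered on top, not a description of the production reranker.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60,
                           authority: Optional[Dict[str, float]] = None) -> List[str]:
    """Fuse several ranked lists: each doc scores sum(1 / (k + rank)) over
    the lists it appears in, with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    if authority:
        # Hypothetical authority weighting: scale the fused score per document.
        for doc_id in list(scores):
            scores[doc_id] *= authority.get(doc_id, 1.0)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


lexical = ["a", "b", "c"]   # e.g. ids from the FTS5 side
dense = ["b", "d", "a"]     # e.g. ids from the FAISS side
fused = reciprocal_rank_fusion([lexical, dense])
```

RRF only looks at rank positions, so BM25 scores and cosine similarities never have to be normalized onto a shared scale.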

STAGE 05

Verify

4-tier authority + 5 gates

Every source carries a tier (binding legal, official non-binding, trusted secondary, other). The tier rides through retrieval, reranking, and composition, then 5 quality gates downgrade or kill any finding that doesn't hold up. Evidence-First reasoning enforces a 3-sentence BLUF cap on every brief.
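A rough sketch of how a source tier and a sentence cap could be represented. The four tier names come from the description above; the `Finding` shape, the single example gate, and the naive sentence splitter are assumptions, and the five production gates are not reproduced here.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Tier(IntEnum):
    BINDING_LEGAL = 1
    OFFICIAL_NON_BINDING = 2
    TRUSTED_SECONDARY = 3
    OTHER = 4


@dataclass
class Finding:
    text: str
    source_tiers: List[Tier]
    confidence: str = "high"


def apply_gate(finding: Finding) -> Optional[Finding]:
    """One hypothetical gate: kill uncited findings, downgrade those whose
    best source is trusted-secondary or weaker."""
    if not finding.source_tiers:
        return None
    if min(finding.source_tiers) >= Tier.TRUSTED_SECONDARY:
        finding.confidence = "low"
    return finding


def cap_bluf(text: str, max_sentences: int = 3) -> str:
    """Keep only the first `max_sentences` sentences. The splitter is naive
    and will mis-split abbreviations such as "U.S."."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])
```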

STAGE 06

Respond

90% lookup time saved

FastAPI + SSE streams the response token-by-token from a GMU OpenStack VM behind Docker + nginx. Every claim traces back to a retrieved chunk with a clickable citation. Saves an estimated 90% on policy-lookup time vs. the manual workflow; 1,481 passing tests certified the system for hand-off under DAPSE 3.0.
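The wire format behind token-by-token streaming is plain text. Below is a minimal, framework-free sketch of framing tokens and citations as Server-Sent Events; the event names (`token`, `citations`, `done`) and payload shapes are assumptions, not the platform's actual protocol.

```python
import json
from typing import Dict, Iterable, Iterator, List


def sse_events(tokens: Iterable[str], citations: List[Dict]) -> Iterator[str]:
    """Frame a token stream as SSE messages: an `event:` line, a `data:`
    line, and a blank line that terminates each message."""
    for tok in tokens:
        # JSON-encode the payload so a newline inside a token cannot break framing.
        payload = json.dumps({"text": tok})
        yield f"event: token\ndata: {payload}\n\n"
    yield f"event: citations\ndata: {json.dumps(citations)}\n\n"
    yield "event: done\ndata: {}\n\n"


events = list(sse_events(["Hel", "lo"], [{"chunk_id": "art-4#0"}]))
```

In a FastAPI app, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.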

SELECTED WORK

Things I've Shipped

Nine projects, end-to-end, each with its architecture, metrics, and stack.

Solo Build · Live at global-pulse-ai.site · Apr 2026 to Present

Pulse · Real-Time Global News Intelligence Globe

Public 3D-Earth platform that aggregates 1,000+ live events from 45+ sources (GDELT 2.0, 45 subreddits, 45+ RSS feeds including Reuters, BBC, AP, Al Jazeera, NHK, Hacker News), geolocates and clusters them onto the globe, and streams Claude Opus 4.7 analyst briefings (what happened, why it matters, key actors, severity, 12-hour forecast) on cluster click. Next.js 16 + React 19 + Three.js with 8K NASA textures and custom GLSL shaders for day/night terminator, atmosphere, and sentiment-reactive auroras. Fingerprint cache + per-IP rate limit + lazy AI keep the Claude bill sane.
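The cost controls are language-agnostic, so here is a small Python sketch of the pattern (the site itself is Next.js/TypeScript). How the real fingerprint is computed is not stated; hashing the sorted event ids is an assumption, as are the class names and limits.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, Optional


def cluster_fingerprint(event_ids: Iterable[str]) -> str:
    """Stable hash of a cluster's membership, independent of event order."""
    joined = "|".join(sorted(event_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class BriefingCache:
    """Lazy cache: the expensive generator runs only on the first request
    for a given cluster fingerprint."""

    def __init__(self, generate: Callable[[list], str]) -> None:
        self._generate = generate
        self._store: Dict[str, str] = {}

    def get(self, event_ids: list) -> str:
        key = cluster_fingerprint(event_ids)
        if key not in self._store:
            self._store[key] = self._generate(event_ids)
        return self._store[key]


class SlidingWindowLimiter:
    """Allow at most `max_calls` per client IP within `window_s` seconds."""

    def __init__(self, max_calls: int = 5, window_s: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop requests that fell out of the window
        if len(hits) >= self.max_calls:
            return False
        hits.append(now)
        return True
```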

1,000+ Live events surfaced
45+ News sources fused
~1s First briefing token
Next.js 16 · React 19 · TypeScript · Three.js · react-three-fiber

AI Engineer · NSI Apprenticeship · Jan to May 2026

DAPSE · Arctic Policy Intelligence Engine

Production RAG backend for a national-security JAG decision-support system, ingesting 1,630 policy sources across 21 countries into 257K embedded chunks. Saves an estimated 90% on policy-lookup time for end users. Hybrid search (SQLite FTS5 + FAISS + RRF) with authority-weighted reranking and self-correcting retrieval loops hits nDCG@5 = 0.832 and Precision@5 = 0.954 on a hand-built eval suite. 7-stage async LLM pipeline on the OpenAI SDK with checkpoint recovery and per-model circuit breakers; routing simple queries to GPT-5-nano cut average API cost ~80%. Shipped to a GMU OpenStack VM (FastAPI + SSE, Docker behind nginx) and certified for NSI hand-off via 1,481 automated tests.
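For readers unfamiliar with the two retrieval metrics, here is how Precision@5 and nDCG@5 are commonly computed for a single query. This is the textbook linear-gain formulation; how the hand-built suite grades relevance or aggregates across queries is not stated, so treat the details as assumptions.

```python
import math
from typing import Dict, List, Set


def precision_at_k(ranked_ids: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


def ndcg_at_k(ranked_ids: List[str], gains: Dict[str, float], k: int = 5) -> float:
    """DCG of the returned order divided by the DCG of the ideal order."""

    def dcg(values: List[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(d, 0.0) for d in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

A suite-level score would usually be the mean of these per-query values.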

90% Policy-lookup time saved
1,630 Policy sources · 21 countries
257,000 Chunks indexed
Python · FastAPI · SSE · FAISS · SQLite FTS5
SLA · 78%
ITSM Analytics · Oct to Nov 2025

Ticket Resolution & SLA Breach Prediction

End-to-end ITSM analytics pipeline predicting ticket resolution time and flagging SLA breach risk before closure. Built on a realistic 5,000-ticket synthetic dataset simulating ServiceNow/Jira logs. Gradient-boosting models beat baselines for both regression and classification; results surface through a Power BI dashboard for proactive service management.

5,000+ Tickets Modelled
Python · XGBoost · Scikit-learn · Pandas · Power BI

Climate Data × Machine Learning

Wildfire Risk Prediction

ML pipeline fusing MODIS satellite fire data, NOAA climate variables, and NDVI vegetation indices to predict wildfire risk. Random Forest + XGBoost with careful feature engineering reached AUC-ROC 0.99. Python visualisations of high-risk zones and key predictors (elevation, humidity, thermal anomalies) support proactive response.

0.99 AUC-ROC
Python · XGBoost · Random Forest · GeoPandas · NOAA

DIKW-Driven IOU vs Non-IOU Pricing Study

U.S. Electricity-Rate Analytics

Analysed 320K+ electricity rate records (2020 to 2023) through the DIKW framework, using Python, SQL, and statistical testing (t-tests, regression) to expose material pricing differences between IOU and Non-IOU utilities across sectors and states. Cluster models + forecasts highlight geographic trends and inflation effects for regulators.
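As an illustration of the kind of two-sample test involved, here is a Welch's t-statistic computed with the standard library. The choice of Welch's variant is an assumption (the study only says "t-tests"), and the sample rates are made-up numbers, not values from the dataset.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two
    independent samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of each sample mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Illustrative rates in $/kWh (synthetic, for demonstration only).
iou_rates = [0.12, 0.14, 0.13, 0.15]
non_iou_rates = [0.10, 0.11, 0.09, 0.12]
t_stat, dof = welch_t(iou_rates, non_iou_rates)
```

Turning the statistic into a p-value needs the Student's t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.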

320,000+ Records Analysed
Python · SQL · Statistical Modeling · Clustering · Forecasting

Real-Time YOLO + Tesseract OCR Pipeline

License Plate Detection

Real-time license plate recognition combining YOLO object localization with Tesseract OCR for character extraction, plus OpenCV preprocessing (binarization, denoising, perspective correction) for plates under motion blur or low contrast. Boosted detection accuracy through dataset augmentation and bounding-box refinement. B.Tech capstone project.

Python · YOLO · Tesseract OCR · OpenCV · Computer Vision

Collaborative + Content + Sentiment

Hybrid Movie Recommender

Hybrid recommendation engine blending content-based filtering, collaborative filtering, sentiment analysis, and Jaccard similarity for diverse personalised suggestions. Responsive Flask web app with real-time search, tunable parameters, and evaluation metrics for diversity, novelty, and serendipity.
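The Jaccard component is the simplest piece to show. This sketch ranks titles by tag-set overlap; the catalog, the tag vocabulary, and the tie-breaking rule are hypothetical, and the other signals (collaborative, content-based, sentiment) are not modelled here.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(title: str, catalog: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Rank other titles by Jaccard similarity of their tag sets."""
    target = catalog[title]
    scored = [(jaccard(target, tags), other)
              for other, tags in catalog.items() if other != title]
    # Highest similarity first; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Hypothetical catalog of titles and genre tags.
catalog = {
    "A": {"crime", "thriller", "drama"},
    "B": {"crime", "thriller", "action"},
    "C": {"animation", "family"},
    "D": {"crime", "thriller", "drama", "mystery"},
}
```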

Python · Flask · Recommender Systems · NLP · Sentiment Analysis

End-to-End AWS Data Pipeline · Prof. Foxwell

Obesity Risk Analytics

Cloud-native data pipeline predicting county-level obesity trends from CDC BRFSS data, supervised by Prof. Harry Foxwell at GMU. Raw records flow through S3 → AWS Glue DataBrew → RDS, then EDA and modeling in Python (Pandas, Seaborn, Scikit-learn) and R (tidyverse, ggplot2). Three model families compared: regression (interpretable baseline), Random Forest (non-linear + importance ranking), and ARIMA (trend forecasting). End-to-end ownership from raw CDC data to predictive outputs.

AWS · S3 · Glue DataBrew · RDS · Python

Virtual Addiction-Support Platform

Support Circle

Full-stack virtual support platform helping people fight addictions. React frontend for a responsive chat-first UX, Python backend handling auth, data management, and secure real-time messaging + notifications so users get continuous peer support during recovery.

React · Python · Flask · WebSockets · Authentication

WHAT I WORK WITH

Skills & Stack

Forty-one tools, four domains, one daily driver. Each skill is tagged with a depth level.

Total Skills: 41
Expert / Primary: 11
Advanced: 16
Domains: 4
Levels: Primary · Expert · Advanced · Proficient · Intermediate · Working

AI / ML & Data

Production AI systems and orchestration
12
LLMs · GPT-5 family, Claude, prompt engineering · Expert
RAG Pipelines · Production retrieval systems · Expert
Hybrid Retrieval · BM25 + FAISS, RRF fusion · Expert
FAISS · Dense vector search (IVF) · Expert
Vector Databases · Indexing and ANN search · Advanced
Agentic Workflows · Claude, OpenAI agent loops · Advanced
Multi-Tier Verification · Draft / critique / escalate · Advanced
Langfuse · LLM observability + cost tracing · Advanced
OpenAI API · Completions, embeddings, tools · Expert
Prompt Engineering · Structured outputs + rubrics · Expert
scikit-learn · Classical ML pipelines · Expert
XGBoost · Gradient boosting · Advanced

Languages

Daily-driver and supporting
08
Python · 5+ years, every project · Primary
SQL · Complex queries, optimization · Advanced
TypeScript · Typed full-stack web · Proficient
JavaScript · Full-stack web · Proficient
C++ · Systems programming · Proficient
Java · Enterprise applications · Proficient
R · Statistical analysis · Proficient
Bash · Automation, ops scripts · Proficient

Cloud & Infrastructure

AWS-certified, cloud-native delivery
09
AWS · Certified, primary platform · Advanced
S3 / Glue / Athena · Lake-house pipelines · Advanced
Lambda / API Gateway · Serverless APIs · Advanced
Docker · Containerized deployments · Advanced
Kubernetes · Orchestration · Intermediate
Terraform · Infrastructure-as-code · Intermediate
CI / CD · GitHub Actions, deploys · Advanced
Azure · Working knowledge · Working
GCP · Working knowledge · Working

Tools & Frameworks

Building, shipping, monitoring
12
FastAPI · Production APIs · Expert
Flask · Lightweight services · Advanced
Pandas · Data manipulation · Expert
NumPy · Numerical computing · Expert
Plotly / Matplotlib · Visualization · Advanced
Power BI / Tableau · BI dashboards · Advanced
Git · Branching, code review · Advanced
Spark · Distributed computing · Intermediate
Airflow · Workflow orchestration · Intermediate
MLflow · Experiment tracking · Intermediate
PostgreSQL · Relational data · Advanced
Redis · Caching, sessions · Intermediate

Let's build intelligent systems together.

Master of Science in Data Analytics · George Mason University · Aug 2024 to May 2026