Available for opportunities

Aditya Velpula

RAG Pipelines · LLM Systems · Cloud Data Infrastructure


90%

Policy-Lookup Time Saved

1,630

Policy Sources · 21 Countries

0.954

Precision@5

I don't just analyze data. I architect systems that think.

M.S. Data Analytics · George Mason University · 2024 to 2026

Building AI Systems That Scale

1,630 Policy Sources → Actionable Intelligence

SYSTEM PROFILE

ENGINEER: Aditya Velpula

FOCUS: AI Engineering · GenAI · Production ML

DOMAINS: LLM Systems · RAG · Vector Search · MLOps · Computer Vision · NLP · Cloud Data

EDUCATION: M.S. Data Analytics (GMU)

STATUS: ACTIVE

I build production AI systems for real users. Right now I'm leading the backend of a national-security RAG platform that lets JAG analysts query 1,630 policy sources across 21 countries in seconds instead of hours. Most LLM projects look great in a notebook and fall apart in prod. I focus on the unglamorous middle: retrieval that actually retrieves, evals that catch regressions before users do, and orchestration that survives partial failures.

EXPERIENCE

Where I've Built

Five roles, one through-line: shipping intelligent systems from first principles.

FEATURED ROLE · Jan 2026 to May 2026

AI Engineer · DAPSE Capstone

NSI (National Security Innovations Inc.) · Apprenticeship via GMU · Arlington, VA · Hybrid

Lead backend developer on the Arctic Policy Assistant, a production RAG platform shipped with NSI through GMU's capstone program. Lets JAG analysts query 1,630 policy sources across 21 countries in seconds instead of hours, saving an estimated 90%+ on policy-lookup time. Owned end-to-end system design and shipped to production on a GMU OpenStack VM (FastAPI + SSE streaming, Docker behind nginx, token auth, rate limiting). Validated through 1,481 automated tests and certified for NSI hand-off under DAPSE 3.0.

Hybrid RAG Pipeline

Hybrid search engine built on SQLite FTS5 + FAISS + Reciprocal Rank Fusion, with authority-weighted reranking and self-correcting retrieval loops (the pipeline detects low-confidence results, rewrites the query, and retries before anything reaches the LLM). Hit nDCG@5 = 0.832 and Precision@5 = 0.954 on a hand-built evaluation suite. A 7-stage async LLM orchestration pipeline on the OpenAI SDK runs with checkpoint recovery, per-model circuit breakers, and Langfuse observability; cut average API cost ~80% by routing simple queries to GPT-5-nano.
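The per-model circuit breakers mentioned above can be pictured with a minimal sketch. This is an illustration of the general pattern, not the production code: the class, the thresholds (`max_failures`, `reset_after`), and the `invoke` callback are all hypothetical, and the real pipeline's failure policy may differ.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows a trial call
    again once `reset_after` seconds have passed (half-open state)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_fallback(models, breakers, invoke):
    """Try models in order, skipping any whose breaker is currently open."""
    for model in models:
        breaker = breakers[model]
        if not breaker.allow():
            continue
        try:
            result = invoke(model)  # hypothetical callback that calls the LLM
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return model, result
    raise RuntimeError("all models unavailable")
```

Keeping one breaker per model means a failing model is skipped for a cool-down period while the remaining models keep serving requests.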

The 1,481 automated tests span retrieval, scoring, and composition, and certified the system for hand-off to NSI under the DAPSE 3.0 program. Lesson learned: production AI is a systems problem first and a model problem second; the model is one node in a graph of retrieval, ranking, orchestration, evals, and recovery.

Graduate Teaching Assistant · AIT-580 Data Analytics

George Mason University · College of Engineering and Computing · Fairfax, VA · On-site

Aug 2025 to May 2026

Selected by Prof. Harry J. Foxwell to support graduate sections of AIT-580 Data Analytics. Taught 100+ students across 4 sections over 2 semesters, covering SQL, Python, R, and AWS Cloud. Redesigned grading rubrics that cut regrade volume and shaved roughly 30% off grading time per assignment. Overhauled the lab guides and built supplementary Jupyter notebooks that lifted student assignment scores by about 15% vs. the prior cohort. Ran weekly office hours that translated dense engineering concepts (data modeling, ETL design, query plans) into 5-minute explanations students could actually use under deadline pressure.

100+

Students Taught

4

AIT-580 Sections (2 semesters)

~30%

Grading time saved per assignment

~15%

Student score lift vs prior cohort

Python · SQL · R · AWS Cloud · Jupyter


AI Engineer · Internship

Indgeos Geospatial · India · Hybrid

Nov 2023 to Jul 2024

Collaborated with backend engineers to integrate APIs and dynamic geospatial data rendering into the company's web platform, improving navigation flows and UI reliability for end users. Built responsive layouts in HTML5, CSS3, and JavaScript that worked consistently across the team's full device range. Established the team's first Git-based PR workflow with automated linting, replacing ad-hoc commits and cutting code-review friction across the engineering team. Lesson learned: the shortest path from idea to shipped feature usually runs through better tooling, not better code.

HTML5 · CSS3 · JavaScript · REST APIs · Git · Linting

Cyber Security Specialist · Internship

Supraja Technologies · India · Remote

Nov 2021 to Jan 2022

Built foundational skills in identifying vulnerabilities, testing web applications, and understanding security workflows. Gained hands-on experience with Burp Suite, Wireshark, DNS discovery, and Bugcrowd for web application penetration testing.

Burp Suite · Wireshark · Bugcrowd · Pen Testing


Web Development Intern

Brainovision Solutions India · Remote

Dec 2020 to Feb 2021

Built responsive, user-friendly websites end to end using WordPress, HTML, and CSS. Completed multiple full website builds applying modern design principles and responsive layout techniques.

WordPress · HTML · CSS · Responsive Design

HOW DAPSE WORKS

Six Stages, One Query

Scroll through the pipeline. Each stage is a real piece of the system in production.

STAGE 01

Ingest

1,630 sources

PDF, HTML, and DOCX policy documents pulled from 21 countries, then cleaned and normalized into a single schema with provenance tagging on every artifact. The catalog grew through the DAPSE 3.0 hand-off cycle.

STAGE 02

Chunk

257K chunks

Section-aware segmentation with parent-child chunking, plus contextual headers prepended before embedding. Policy docs are full of structural meaning (treaty articles, footnotes, schedules), so naive paragraph splits would wreck retrieval. 25,634 policy objectives are extracted from the same pass.
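A minimal sketch of what parent-child chunking with contextual headers can look like, assuming the document has already been split into `(heading, body)` sections. The `Chunk` fields, the `[title > heading]` header format, and the word-based size limit are assumptions for illustration, not the DAPSE schema.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    parent_id: str  # the section this child chunk belongs to
    text: str       # what would be embedded: contextual header + body slice


def chunk_sections(doc_title, sections, max_words=40):
    """Split each (heading, body) section into word-bounded child chunks.

    Every child is prefixed with a contextual header, so the embedded text
    still says which document and section it came from.
    """
    chunks = []
    for s_idx, (heading, body) in enumerate(sections):
        parent_id = f"sec-{s_idx}"
        header = f"[{doc_title} > {heading}]"
        words = body.split()
        for c_idx, start in enumerate(range(0, len(words), max_words)):
            piece = " ".join(words[start:start + max_words])
            chunks.append(Chunk(f"{parent_id}-c{c_idx}", parent_id, f"{header} {piece}"))
    return chunks
```

Because each child keeps its `parent_id`, a retriever can match on the small chunk and then hand the whole parent section to the LLM for context.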

STAGE 03

Embed + Index

FAISS + SQLite FTS5

OpenAI embeddings stored in FAISS for dense semantic search. SQLite FTS5 provides the lexical sidecar over the same chunks. Two indexes, one query plane, both on the same VM.
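The lexical side of this setup can be sketched with Python's built-in `sqlite3` module, assuming the local SQLite build includes the FTS5 extension. The table name, column names, and sample rows below are made up for illustration; the dense FAISS index is not shown.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table and load (chunk_id, body) rows."""
    conn = sqlite3.connect(":memory:")
    # chunk_id is stored but not tokenized; body is the searchable text.
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(chunk_id UNINDEXED, body)")
    conn.executemany("INSERT INTO chunks (chunk_id, body) VALUES (?, ?)", rows)
    return conn


def lexical_search(conn, query, k=5):
    """Return up to k chunk ids, best BM25 match first."""
    rows = conn.execute(
        # bm25() returns lower values for better matches, so sort ascending.
        "SELECT chunk_id FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, k),
    ).fetchall()
    return [row[0] for row in rows]
```

User input would need escaping before it reaches `MATCH`, since FTS5 treats characters such as quotes and hyphens as query syntax.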

STAGE 04

Hybrid Retrieve

nDCG@5 = 0.832

FTS5 + FAISS results blend through Reciprocal Rank Fusion, with authority-weighted reranking on top. Self-correcting retrieval loops detect low-confidence results, rewrite the query, and retry before anything reaches the LLM. Lands nDCG@5 = 0.832 and Precision@5 = 0.954 on the hand-built eval suite.
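Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below shows that formula followed by a simple authority multiplier. The constant `k = 60` is a commonly used default, and the tier weights are hypothetical; neither is a value confirmed for DAPSE.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one {doc_id: score} map."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def rerank_by_authority(scores, tiers, tier_weight):
    """Scale fused scores by a per-tier weight and return doc ids, best first."""
    weighted = {doc: score * tier_weight[tiers[doc]] for doc, score in scores.items()}
    # Ties are broken by doc id so the output order is deterministic.
    return sorted(weighted, key=lambda doc: (-weighted[doc], doc))


lexical = ["a", "b", "c"]  # e.g. FTS5 order
dense = ["b", "c", "d"]    # e.g. FAISS order
fused = reciprocal_rank_fusion([lexical, dense])

tiers = {"a": 1, "b": 3, "c": 1, "d": 4}        # hypothetical tier per doc
tier_weight = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7}  # hypothetical weights
ranked = rerank_by_authority(fused, tiers, tier_weight)
```

In this toy input, document `b` leads on fused score alone, but the tier-1 document `c` moves ahead once the authority weights are applied.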

STAGE 05

Verify

4-tier authority + 5 gates

Every source carries a tier (binding legal, official non-binding, trusted secondary, other). The tier rides through retrieval, reranking, and composition, then 5 quality gates downgrade or kill any finding that doesn't hold up. Evidence-First reasoning enforces a 3-sentence BLUF cap on every brief.
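One way to picture gates that downgrade or kill a finding is a chain of small functions, each returning the finding unchanged, a downgraded copy, or `None`. The two gates below are hypothetical examples; the source does not name the five real gates, and the `Finding` fields are assumptions.

```python
from dataclasses import dataclass, replace

TIERS = {1: "binding legal", 2: "official non-binding", 3: "trusted secondary", 4: "other"}


@dataclass(frozen=True)
class Finding:
    claim: str
    tier: int          # 1 (strongest) to 4 (weakest), see TIERS
    citations: tuple   # chunk ids backing the claim
    confidence: str = "high"


def gate_has_citation(finding):
    # Hypothetical gate: a finding with no supporting chunk is killed.
    return finding if finding.citations else None


def gate_weak_source(finding):
    # Hypothetical gate: findings resting on the lowest tier are downgraded.
    return replace(finding, confidence="low") if finding.tier == 4 else finding


def run_gates(findings, gates):
    """Pass each finding through every gate; drop it as soon as one returns None."""
    kept = []
    for finding in findings:
        for gate in gates:
            finding = gate(finding)
            if finding is None:
                break
        if finding is not None:
            kept.append(finding)
    return kept
```

Carrying the tier on the finding itself is what lets a late-stage gate act on source authority without re-querying the index.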

STAGE 06

Respond

90% lookup time saved

FastAPI + SSE streams the response token-by-token from a GMU OpenStack VM behind Docker + nginx. Every claim traces back to a retrieved chunk with a clickable citation. Saves an estimated 90% on policy-lookup time vs. the manual workflow; 1,481 passing tests certified the system for hand-off under DAPSE 3.0.
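Server-Sent Events are plain text frames: optional `event:` and `data:` lines terminated by a blank line. The generator below sketches how token and citation frames could be produced. The event names and payload shapes are assumptions, not the DAPSE wire format.

```python
import json


def sse_events(tokens, citations):
    """Yield SSE-formatted frames: one per token, then citations, then done."""
    for token in tokens:
        payload = json.dumps({"text": token})
        yield f"event: token\ndata: {payload}\n\n"
    # Citations are sent once, after the answer text has streamed.
    yield f"event: citations\ndata: {json.dumps(citations)}\n\n"
    yield "event: done\ndata: {}\n\n"
```

In a FastAPI route, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.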

SELECTED WORK

Things I've Shipped

Nine projects, end-to-end. Tap any card for the architecture, metrics, and stack.


Solo Build · Live at global-pulse-ai.site · Apr 2026 to Present

Pulse · Real-Time Global News Intelligence Globe

Public 3D-Earth platform that aggregates 1,000+ live events from 45+ sources (GDELT 2.0, 45 subreddits, 45+ RSS feeds including Reuters, BBC, AP, Al Jazeera, NHK, Hacker News), geolocates and clusters them onto the globe, and streams Claude Opus 4.7 analyst briefings (what happened, why it matters, key actors, severity, 12-hour forecast) on cluster click. Next.js 16 + React 19 + Three.js with 8K NASA textures and custom GLSL shaders for day/night terminator, atmosphere, and sentiment-reactive auroras. Fingerprint cache + per-IP rate limit + lazy AI keep the Claude bill sane.
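The cost controls named above (fingerprint cache, per-IP rate limit, lazy generation) can be sketched as follows. Pulse itself is a Next.js app; this is shown in Python purely for illustration, and the class names, the choice of SHA-256 over sorted event ids, and the sliding-window policy are all assumptions.

```python
import hashlib
import time


def fingerprint(event_ids):
    """Stable key for a cluster: hash of its sorted event ids."""
    joined = "|".join(sorted(event_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class BriefingCache:
    """Generates a briefing lazily, only on the first request for a cluster."""

    def __init__(self, generate):
        self.generate = generate  # hypothetical callback that calls the LLM
        self.store = {}
        self.calls = 0

    def get(self, event_ids):
        key = fingerprint(event_ids)
        if key not in self.store:
            self.calls += 1
            self.store[key] = self.generate(event_ids)
        return self.store[key]


class RateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = {}

    def allow(self, ip):
        now = self.clock()
        recent = [t for t in self.hits.get(ip, []) if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self.hits[ip] = recent
        return allowed
```

Sorting the ids before hashing means the same cluster yields the same key regardless of event order, so repeat clicks reuse the cached briefing instead of triggering a new model call.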

1,000+

Live events surfaced

45+

News sources fused

~1s

First briefing token

Next.js 16 · React 19 · TypeScript · Three.js · react-three-fiber


AI Engineer · NSI Apprenticeship · Jan to May 2026

DAPSE · Arctic Policy Intelligence Engine

Production RAG backend for a national-security JAG decision-support system, ingesting 1,630 policy sources across 21 countries into 257K embedded chunks. Saves an estimated 90% on policy-lookup time for end users. Hybrid search (SQLite FTS5 + FAISS + RRF) with authority-weighted reranking and self-correcting retrieval loops hits nDCG@5 = 0.832 and Precision@5 = 0.954 on a hand-built eval suite. 7-stage async LLM pipeline on the OpenAI SDK with checkpoint recovery and per-model circuit breakers; routing simple queries to GPT-5-nano cut average API cost ~80%. Shipped to a GMU OpenStack VM (FastAPI + SSE, Docker behind nginx) and certified for NSI hand-off via 1,481 automated tests.
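For readers unfamiliar with the two retrieval metrics quoted here, the sketch below computes Precision@k and nDCG@k for a single query from a list of graded relevance labels, in ranked order. It is a simplified illustration: the ideal ordering is taken from the retrieved list only, whereas a full evaluation would normally build it from every judged document for the query, and the DAPSE eval suite may define relevance differently.

```python
import math


def precision_at_k(relevances, k=5):
    """Fraction of the top-k results with a relevance label above zero."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain: each label is divided by log2(position + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """DCG normalized by the DCG of the same labels in ideal (descending) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Precision@k only counts how many of the top results are relevant, while nDCG@k also rewards placing the most relevant results first.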

90%

Policy-lookup time saved

1,630

Policy sources · 21 countries

257,000

Chunks indexed

Python · FastAPI · SSE · FAISS · SQLite FTS5

ITSM Analytics · Oct to Nov 2025

Ticket Resolution & SLA Breach Prediction

End-to-end ITSM analytics pipeline predicting ticket resolution time and flagging SLA breach risk before closure. Built on a realistic 5,000-ticket synthetic dataset simulating ServiceNow/Jira logs. Gradient-boosting models beat baselines for both regression and classification; results surface through a Power BI dashboard for proactive service management.

5,000+

Tickets Modelled

Python · XGBoost · Scikit-learn · Pandas · Power BI

Climate Data × Machine Learning

Wildfire Risk Prediction

ML pipeline fusing MODIS satellite fire data, NOAA climate variables, and NDVI vegetation indices to predict wildfire risk. Random Forest + XGBoost with careful feature engineering reached AUC-ROC 0.99. Python visualisations of high-risk zones and key predictors (elevation, humidity, thermal anomalies) support proactive response.
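AUC-ROC can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A dependency-free sketch of that definition follows; the pairwise loop is fine for small samples but not how a library would compute it at scale, and it is not the project's evaluation code.

```python
def auc_roc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because a value near 1.0 means almost every fire example outranks every non-fire example, a score that high is usually worth checking for label leakage before it is trusted.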

0.99

AUC-ROC

Python · XGBoost · Random Forest · GeoPandas · NOAA

DIKW-Driven IOU vs Non-IOU Pricing Study

U.S. Electricity-Rate Analytics

Analysed 320K+ electricity rate records (2020 to 2023) through the DIKW framework, using Python, SQL, and statistical testing (t-tests, regression) to expose material pricing differences between IOU and Non-IOU utilities across sectors and states. Cluster models + forecasts highlight geographic trends and inflation effects for regulators.
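As an illustration of the kind of two-group comparison described here, the sketch below computes a Welch's t statistic, which does not assume equal variances, using only the standard library. Welch's variant is an assumption, since the source says only "t-tests", and the function expects two lists of rates, for example one for IOU and one for Non-IOU utilities.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples.

    Each sample needs at least two values, and the samples must not both
    have zero variance (the standard error would be zero).
    """
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error
```

A large absolute value indicates that the gap between the group means is big relative to the sampling noise; turning it into a p-value additionally requires the Welch-Satterthwaite degrees of freedom.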

320,000+

Records Analysed

Python · SQL · Statistical Modeling · Clustering · Forecasting

Real-Time YOLO + Tesseract OCR Pipeline

License Plate Detection

Real-time license plate recognition combining YOLO object localization with Tesseract OCR for character extraction, plus OpenCV preprocessing (binarization, denoising, perspective correction) for plates under motion blur or low contrast. Boosted detection accuracy through dataset augmentation and bounding-box refinement. B.Tech capstone project.

Python · YOLO · Tesseract OCR · OpenCV · Computer Vision


Collaborative + Content + Sentiment

Hybrid Movie Recommender

Hybrid recommendation engine blending content-based filtering, collaborative filtering, sentiment analysis, and Jaccard similarity for diverse personalised suggestions. Responsive Flask web app with real-time search, tunable parameters, and evaluation metrics for diversity, novelty, and serendipity.
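Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to genre tags as one content-based signal. The catalog structure and the function names are hypothetical, and the real engine blends this with collaborative and sentiment signals that are not shown.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_genre_overlap(liked_genres, catalog, top_n=3):
    """Rank catalog titles by Jaccard overlap with the user's liked genres."""
    scored = [(jaccard(liked_genres, genres), title) for title, genres in catalog.items()]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]
```

Unlike raw overlap counts, Jaccard normalizes by the union, so a title tagged with many genres does not win simply by matching everything.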

Python · Flask · Recommender Systems · NLP · Sentiment Analysis

End-to-End AWS Data Pipeline · Prof. Foxwell

Obesity Risk Analytics

Cloud-native data pipeline predicting county-level obesity trends from CDC BRFSS data, supervised by Prof. Harry Foxwell at GMU. Raw records flow through S3 → AWS Glue DataBrew → RDS, then EDA and modeling in Python (Pandas, Seaborn, Scikit-learn) and R (tidyverse, ggplot2). Three model families compared: regression (interpretable baseline), Random Forest (non-linear + importance ranking), and ARIMA (trend forecasting). End-to-end ownership from raw CDC data to predictive outputs.

AWS · S3 · Glue DataBrew · RDS · Python

Virtual Addiction-Support Platform

Support Circle

Full-stack virtual support platform helping people fight addictions. React frontend for a responsive chat-first UX, Python backend handling auth, data management, and secure real-time messaging + notifications so users get continuous peer support during recovery.

React · Python · Flask · WebSockets · Authentication

WHAT I WORK WITH

Skills & Stack

Forty-one tools, four domains, one daily driver. Bars show depth at a glance.

Proficiency levels: Primary · Expert · Advanced · Proficient · Intermediate · Working

AI / ML & Data

Production AI systems and orchestration

LLMs

GPT-5 family, Claude, prompt engineering

Expert

RAG Pipelines

Production retrieval systems

Expert

Hybrid Retrieval

BM25 + FAISS, RRF fusion

Expert

FAISS

Dense vector search (IVF)

Expert

Vector Databases

Indexing and ANN search

Advanced

Agentic Workflows

Claude, OpenAI agent loops

Advanced

Multi-Tier Verification

Draft / critique / escalate

Advanced

Langfuse

LLM observability + cost tracing

Advanced

OpenAI API

Completions, embeddings, tools

Expert

Prompt Engineering

Structured outputs + rubrics

Expert

scikit-learn

Classical ML pipelines

Expert

XGBoost

Gradient boosting

Advanced

Languages

Daily-driver and supporting

Python

5+ years, every project

Primary

SQL

Complex queries, optimization

Advanced

TypeScript

Typed full-stack web

Proficient

JavaScript

Full-stack web

Proficient

C++

Systems programming

Proficient

Java

Enterprise applications

Proficient

R

Statistical analysis

Proficient

Bash

Automation, ops scripts

Proficient

Cloud & Infrastructure

AWS-certified, cloud-native delivery

AWS

Certified, primary platform

Advanced

S3 / Glue / Athena

Lake-house pipelines

Advanced

Lambda / API Gateway

Serverless APIs

Advanced

Docker

Containerized deployments

Advanced

Kubernetes

Orchestration

Intermediate

Terraform

Infrastructure-as-code

Intermediate

CI / CD

GitHub Actions, deploys

Advanced

Azure

Working knowledge

Working

GCP

Working knowledge

Working

Tools & Frameworks

Building, shipping, monitoring

FastAPI

Production APIs

Expert

Flask

Lightweight services

Advanced

Pandas

Data manipulation

Expert

NumPy

Numerical computing

Expert

Plotly / Matplotlib

Visualization

Advanced

Power BI / Tableau

BI dashboards

Advanced

Git

Branching, code review

Advanced

Spark

Distributed computing

Intermediate

Airflow

Workflow orchestration

Intermediate

MLflow

Experiment tracking

Intermediate

PostgreSQL

Relational data

Advanced

Redis

Caching, sessions

Intermediate

VERIFIED

AWS Certifications

Six active credentials. Tap any badge to verify it on AWS, or copy the validation ID.


PROFESSIONAL · EARLY ADOPTER

Let's build intelligent systems together.

GitHub

Master of Science in Data Analytics · George Mason University · Aug 2024 to May 2026


AWS Certifications

AWS Certified Generative AI Developer - Professional

AWS Certified Solutions Architect - Associate

AWS Certified Data Engineer - Associate

AWS Certified Machine Learning Engineer - Associate

AWS Certified AI Practitioner

AWS Academy Graduate · Data Engineering

AWS Academy Graduate · Cloud Foundations
