RAG, Agentic RAG, CAG, NLP, LLM, Regulatory AI

ReguGrounded

Python, Groq, LangChain. Covers EU AI Act, NYC Local Law 144, Colorado AI Act, NIST AI RMF.

GitHub
Purpose

A RAG-based regulatory compliance system built to reduce hallucination in AI-generated legal guidance. The goal was grounded, citation-backed answers to compliance questions across major AI regulations. The framework is intentionally modular and transferable to other regulatory domains beyond AI.

My Role

I owned the reasoning layer: the RLM engine, reasoning orchestrator, answer synthesizer, query interface, and evaluation framework. I also diagnosed and rewrote a broken evaluation pipeline and migrated the LLM backend to Groq after Gemini API quota limits became a bottleneck.

Relevance

As AI gets embedded in high-stakes decisions, regulatory compliance is becoming a product requirement, not an afterthought. This project sits at the intersection of AI capability and governance, which is exactly where I want to build.

Computer Vision, Uncertainty Modeling, Deep Learning, ResNet-18, BayesCap, Transformer

Uncertainty-Aware Gaze Estimation

PyTorch, ResNet-18, Transformer backbone, BayesCap uncertainty heads.

GitHub
Purpose

Gaze estimation is the mechanism. The real focus is BayesCap, an uncertainty modeling layer that asks a harder question: can a model learn to know what it does not know? Rather than producing a confident output and moving on, the goal was a system that could accurately flag when it was unsure.

Relevance

We rely on models blindly. This project explored whether a model could learn from its own errors and communicate uncertainty instead of masking it. That question matters deeply for any AI product making decisions that affect real people.

Limitations

A small dataset and limited compute meant we could not achieve the results we wanted with BayesCap. The direction was right; the constraints were real.

OLS Regression, Econometrics, Hypothesis Testing, Panel Data

Honors Thesis: Outsourcing Intelligence

Python, Stata. OLS regression, econometric modeling, and hypothesis testing across 52 publicly traded tech companies.

Available Upon Request
Purpose

An empirical study examining whether AI investment by major tech firms produced measurable changes in employment outcomes. Using 16 regression models across two years of data, I tested whether companies accelerating AI adoption experienced greater workforce changes, both immediately and with a one-year lag. AI investment was scaled two ways: as a percentage of revenue, and per thousand employees, to allow fair comparisons across firms of different sizes.
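The two scalings described above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the `FirmYear` record, its field names, and the two example firms are all hypothetical, and the numbers are made up to show why scaling matters.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """One firm-year observation. All field names are hypothetical."""
    ai_investment: float  # AI spending, in dollars
    revenue: float        # total revenue, in dollars
    employees: int        # reported headcount


def ai_pct_of_revenue(obs: FirmYear) -> float:
    """AI investment as a percentage of revenue."""
    return 100.0 * obs.ai_investment / obs.revenue


def ai_per_thousand_employees(obs: FirmYear) -> float:
    """AI investment in dollars per thousand employees."""
    return obs.ai_investment / (obs.employees / 1000.0)


# Made-up firms: one is 100x larger than the other in every dimension.
large = FirmYear(ai_investment=2_000_000_000, revenue=100_000_000_000, employees=200_000)
small = FirmYear(ai_investment=20_000_000, revenue=1_000_000_000, employees=2_000)

for name, firm in [("large", large), ("small", small)]:
    print(name, ai_pct_of_revenue(firm), ai_per_thousand_employees(firm))
```

In raw dollars the larger firm spends 100 times more, but both scaled measures come out identical for the two firms, which is the kind of size-neutral comparison the scaling is meant to allow.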

Key Finding

AI investment showed no statistically significant effect on net employment change across the hardware, software, or consulting sectors. The only consistent predictor of hiring was prior-year revenue growth. This challenges the dominant narrative around AI-driven layoffs and suggests that short-term workforce decisions are driven more by business cycle dynamics than automation activity alone.

Relevance

Understanding AI at the company level, not just the model level, is what separates product leaders from engineers. This research sharpened how I think about AI as a business and strategic decision, and how to separate data-backed insight from media narrative.

Limitations

With only 52 companies, the sample is small. Inconsistencies in how firms report headcount (for example, whether they include contractors or part-time workers) and COVID-era hiring surges were not fully accounted for. These factors likely introduced measurement bias and limit how far the findings generalize.

Agentic AI, RAG, FastAPI, React, IBM Granite, ChromaDB

Pod Incident Storyteller

Python, FastAPI, React, Ollama, ChromaDB, sentence-transformers. Fully local, no external APIs.

GitHub
Purpose

Built for a hackathon, this tool turns raw Podman container logs from a single incident into an interactive, evidence-linked narrative. A 4-agent pipeline (Triage, Narrator, Advisor, Verifier) classifies log phases, writes a plain-English incident story, generates ranked remediation recommendations, and fact-checks every claim against the original log lines.
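The fact-checking step could look something like the sketch below. It is an assumption about the general shape of the idea, not the project's actual Verifier: the `Claim` structure, the `verify` function, and the substring-matching rule are hypothetical, and the sample log lines are invented for illustration rather than real Podman output.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """A narrative sentence plus the evidence it cites (hypothetical structure)."""
    text: str            # sentence from the generated incident story
    line_numbers: list   # 1-based log line numbers cited as evidence
    quote: str           # text the claim says appears in each cited line


def verify(claim: Claim, log_lines: list) -> bool:
    """Accept a claim only if every cited line exists and contains the quote."""
    if not claim.line_numbers:
        return False  # a claim with no evidence cannot be verified
    for n in claim.line_numbers:
        if n < 1 or n > len(log_lines):
            return False  # cites a line that is not in the log
        if claim.quote.lower() not in log_lines[n - 1].lower():
            return False  # cited line does not support the claim
    return True


# Invented sample lines, not real container output.
logs = [
    "12:00:01 app starting",
    "12:00:05 error: connection refused",
    "12:00:06 process exited with code 1",
]

supported = Claim("The app could not reach its dependency.", [2], "connection refused")
unsupported = Claim("The app ran out of memory.", [3], "out of memory")

print(verify(supported, logs), verify(unsupported, logs))
```

Exact substring matching is deliberately strict here: it rejects anything the log does not literally say, at the cost of also rejecting paraphrased evidence.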

Relevance

On-call engineers drown in raw logs. This project asked what it would look like to make incident data human-readable and actionable without sacrificing traceability. Every claim links back to evidence, and every recommendation has a confidence score. That tension between speed and trust is a real product problem in AI-powered developer tooling.

Computer Vision, Deep Learning, Bias Testing, CNN, Transfer Learning

Facial Emotion Recognition

Python, TensorFlow, OpenCV, scikit-learn. Trained on the AffectNet dataset.

GitHub
Purpose

A deep learning classifier trained to recognize facial emotions from high-resolution image data using the AffectNet dataset. The system uses transfer learning and includes testing for bias across different emotion classifications.

Relevance

Emotion recognition sits at a sensitive intersection of AI capability and human impact. Including bias testing was not an afterthought. It was the point. Any model that reads human emotion needs to be held to a higher standard of fairness, and this project treated that as a first-class concern.

Computer Vision, Multi-Label Classification, PyTorch, ResNet-50, CNN

CarDD Vehicle Damage Classifier

PyTorch, ResNet-50, COCO-format annotations. Classifies 6 damage types: dent, scratch, crack, glass shatter, lamp broken, tire flat.

GitHub
Purpose

A multi-label classification system that identifies vehicle damage types from images. Built on a ResNet-50 backbone with a custom classification head, the model outputs simultaneous predictions across 6 damage categories using binary cross-entropy loss and per-class F1 evaluation.
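To make the multi-label output and per-class F1 concrete, here is a small sketch in plain Python rather than PyTorch. It is not the project's code: the function names and the 0.5 threshold are assumptions, and the example labels are made up. The key idea it shows is that each of the 6 classes gets its own independent yes/no decision, which is the setup binary cross-entropy trains for.

```python
import math

CLASSES = ["dent", "scratch", "crack", "glass shatter", "lamp broken", "tire flat"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Turn one image's raw scores into a multi-hot vector.

    Each class is thresholded independently, so several can be 1 at once.
    The 0.5 threshold is an assumed default.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def per_class_f1(y_true, y_pred):
    """F1 for each class over a batch of multi-hot label vectors."""
    scores = {}
    for c, name in enumerate(CLASSES):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        denom = 2 * tp + fp + fn
        # Convention used here: a class with no positives and no predictions scores 0.0.
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


# Made-up example: image 1 has a dent and shattered glass, image 2 has a scratch.
y_true = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0]]
y_pred = [predict_labels([2.0, -1.0, 0.3, 3.0, -3.0, -0.5]), [0, 0, 0, 0, 0, 0]]

for name, score in per_class_f1(y_true, y_pred).items():
    print(f"{name}: {score:.2f}")
```

Reporting F1 per class instead of one averaged number makes it visible when the model handles common damage types well but misses rarer ones.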

Relevance

Multi-label classification is harder than single-label classification because the labels are not mutually exclusive: a single image can show a dent and a cracked windshield at the same time. Building this sharpened how I think about model outputs that reflect real-world complexity, which matters a lot for any AI product operating in messy, uncontrolled environments.