Professional RAG Pipeline
Two-stage retrieval (BGE-base → MS-MARCO cross-encoder) with SHA-256 semantic cache and per-query token tracking. Built from scratch — no LangChain — and deployed as a containerized service.
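For readers who want the shape of that pipeline, here is a minimal sketch of two-stage retrieval behind a SHA-256-keyed cache. It is an illustration under stated assumptions, not the project's actual code: `TwoStageRetriever`, `cache_key`, `overlap_score`, and `jaccard_score` are hypothetical names, and the two toy scorers only stand in for a BGE-base bi-encoder and an MS-MARCO cross-encoder. It also assumes the cache key is a hash of the normalized query text, so it matches repeated queries only after case and whitespace normalization. How the real cache builds its keys is not stated here.

```python
import hashlib
from typing import Callable, Dict, List

# (query, document) -> relevance score; higher means more relevant.
Scorer = Callable[[str, str], float]


def cache_key(query: str) -> str:
    """SHA-256 hex digest of the lowercased, whitespace-collapsed query (assumed scheme)."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def overlap_score(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of shared words (stand-in for a dense retriever)."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard_score(query: str, doc: str) -> float:
    """Toy second-stage scorer: Jaccard similarity (stand-in for a cross-encoder)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


class TwoStageRetriever:
    """Hypothetical retrieve-then-rerank wrapper with an in-memory result cache."""

    def __init__(self, docs: List[str], first_stage: Scorer, reranker: Scorer,
                 top_k: int = 10, top_n: int = 3) -> None:
        self.docs = docs
        self.first_stage = first_stage
        self.reranker = reranker
        self.top_k = top_k      # candidates kept after stage 1
        self.top_n = top_n      # results returned after stage 2
        self.cache_hits = 0
        self._cache: Dict[str, List[str]] = {}

    def retrieve(self, query: str) -> List[str]:
        key = cache_key(query)
        if key in self._cache:
            self.cache_hits += 1
            return list(self._cache[key])  # copy so callers cannot mutate the cache

        # Stage 1: score every document with the cheap scorer, keep top_k.
        candidates = sorted(
            self.docs, key=lambda d: self.first_stage(query, d), reverse=True
        )[: self.top_k]

        # Stage 2: rescore only the candidates with the expensive scorer, keep top_n.
        reranked = sorted(
            candidates, key=lambda d: self.reranker(query, d), reverse=True
        )[: self.top_n]

        self._cache[key] = reranked
        return list(reranked)
```

The point of the split is that the expensive scorer only ever sees `top_k` candidates rather than the whole corpus, and a repeated query skips both stages entirely.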
Hi, I'm Vikhyat — an LLM & RAG infrastructure engineer who likes shipping AI that has to actually work on real hardware, for real people.
I spent two years at GE HealthCare getting FDA-regulated MRI models onto Kubernetes — the kind of work where a missed edge case shows up in an operating room, so you learn to care a lot about latency, reproducibility, and the boring parts of MLOps. These days I'm finishing my M.S. in Computer Engineering at Virginia Tech (GPA 4.0), researching brain-inspired autonomous systems and building RAG pipelines from scratch — no LangChain, just the parts I actually need. Outside of work I'm usually tinkering with multi-agent setups, reading neuroscience papers I'm only half-qualified for, or chasing whatever rabbit hole the latest paper sends me down.
LangGraph + Llama 3 multi-agent pipeline for grammar-constrained SDF XML generation, automating authoring of 100+ ROS 2 / Gazebo environments with stateful task graphs.
Developed a Conv1D Autoencoder-based multimodal pipeline on the WESAD dataset for unsupervised compression and three-class emotion classification of physiological signals (BVP, EDA, EMG, TEMP), achieving 83.5% accuracy across 15 subjects.
TensorRT-optimized Swin Transformer UNETR for FDA-regulated MRI. ONNX export, FP16 with selective FP32, custom CUDA plugins, and TensorRT layer fusion enabling real-time intraoperative use.
Conflict-resolution framework on a MISD architecture for 3D autonomous UAV navigation. Validated across three real-world-mapped simulation environments with statistically significant gains.
Most autonomous systems force a single resolved output. Biology doesn't. CANavigator runs three navigation algorithms in parallel and only commits when the environment demands a response — and it beats the fastest single algorithm on both time and energy simultaneously.
Building ProfessionalRAG: how I picked BGE-base + an MS-MARCO cross-encoder over a single dense retriever, and what the latency / cost / faithfulness tradeoffs actually look like in production.
I'm focused on LLM & RAG infrastructure roles: production retrieval systems, eval pipelines, and inference cost / latency optimization. My background in FDA-regulated medical AI is a bonus for clinical-grade deployments. US-authorized, no sponsorship needed, open to relocation.