I design pipelines and models that stay correct under late data, scale, and real-world failure.
Currently working on audit analytics, agentic backends, and production forecasting pipelines.
LinkedIn · GitHub · GitLab · Kaggle · Stack Overflow
- Designing ingestion and modeling systems for messy, high-volume event data
- Production ML and LLM workflows with evaluation, monitoring, and deployment hygiene
- Resilient integrations that handle rate limits, backfills, and schema drift with safe retries (a minimal sketch follows this list)
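
The retry-and-backoff handling mentioned above, sketched against a hypothetical rate-limited HTTP API. The endpoint, status handling, and limits here are illustrative assumptions, not code from any project listed on this page:

```python
import random
import time

import requests

# Hypothetical endpoint and client settings; illustration only.
BASE_URL = "https://api.example.com/v1/events"
MAX_RETRIES = 5


def fetch_events(page_token: str | None = None) -> dict:
    """Fetch one page of events, retrying on rate limits and transient errors."""
    for attempt in range(MAX_RETRIES):
        resp = requests.get(BASE_URL, params={"pageToken": page_token}, timeout=30)
        if resp.status_code == 429:
            # Respect Retry-After when present (assumed to be in seconds),
            # otherwise back off exponentially.
            delay = float(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(delay + random.uniform(0, 1))  # jitter avoids synchronized retries
            continue
        if resp.status_code >= 500:
            time.sleep(2 ** attempt + random.uniform(0, 1))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Gave up after repeated rate limiting or server errors")
```
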
- Building a Google Workspace audit analytics pipeline with overlap-safe ingestion
- Developing agentic backend workflows using LLMs
- Writing about real-world data failures and system design tradeoffs
- LLM-powered research tooling for an AI teaching companion, combining retrieval systems, experimentation workflows, and backend services
- Geospatial satellite data pipelines surfacing mineral exploration signals from noisy remote sensing data
- Near-real-time energy forecasting pipelines spanning SCADA + weather data ingestion, data warehousing, and ML-driven grid balancing
- Semantic search, recommender systems, and data products powering infrastructure intelligence and risk assessment for institutional investments
- OCR-driven clinical data processing pipelines transforming unstructured medical documents into usable datasets
Pinned repositories below reflect the work above.
Python · NumPy · pandas · Plotly · Folium · scikit-learn · TensorFlow · OpenCV · PyTorch · FastAPI · Selenium · JavaScript · MySQL · PostgreSQL · MongoDB · Elasticsearch · Kibana · Git · Docker · Kubernetes · Terraform · GitHub Actions · Google Cloud Platform · AWS
Open to Data Engineering, MLOps, and Platform roles. Best reached via LinkedIn or email.
This README is generated every 24 hours!
Last refresh: 02:10:32 GMT+0000 (Coordinated Universal Time)