Krishna89287/README.md

Hi, I'm Krishna

Senior Software / AI Engineer based in Munich, Germany, with around 14 years across DevOps, SRE, cloud and, more recently, building production AI and RAG systems. I currently work at Audi on AI image-generation pipelines.

What I enjoy: taking messy operational problems and turning them into reliable, automated systems, and lately into agentic AI tools that actually ship and keep a human in the loop.

  • Currently: AI and automation engineering at Audi; finished my M.Sc. in Business Analytics and Data Science (June 2026).
  • Working on: agentic RAG, LLM observability, and AI-driven operations automation.
  • Tools I reach for: Python, FastAPI, LangChain / LangGraph, AWS, Kubernetes, Docker, Terraform, Prometheus and Grafana.
  • Ask me about: RAG, agentic workflows, MLOps, or Linux/Unix automation.

Selected projects

| Project | What it does |
| --- | --- |
| enterprise-agentic-rag-azure | Production agentic RAG with LangGraph, guardrails, evals and observability |
| ai-ops-incident-agent | Triages incidents, suggests root cause, drafts change tickets for human review |
| rag-support-assistant | RAG support assistant with citations, guardrails, automated evaluation and KPIs |
| graphrag-knowledge-assistant | Multi-hop RAG over a knowledge graph |
| llm-observability-platform | Tracks LLM cost, latency, tokens and answer faithfulness |
| cloud-native-platform-aws | Internal developer platform: Terraform EKS, ArgoCD GitOps, Prometheus, SLOs |

Each repo has a short architecture diagram, a runnable quickstart, and sample output, so you can see how it works in a minute.

Recent projects

A newer set across AI, platform and data. Each one runs with a single `make demo`, ships a full test suite, and shows real output in its README.

| Project | What it does |
| --- | --- |
| realtime-stream-inference | Anomaly detection over event streams with queue backpressure and p99 latency tracking |
| ai-incident-copilot | Collapses Alertmanager alerts into incidents, scores severity, and suggests a runbook |
| slo-error-budget | Error budget, burn rate, and multi-window paging from the SRE workbook |
| kubernetes-resource-rightsizer | Right-sizes CPU and memory from real usage, flags throttling and OOM risk |
| agent-trajectory-eval | Scores an agent run on tool choice, forbidden tools, redundant steps, and budget |
| llm-semantic-cache | Caches LLM responses by prompt similarity to cut repeat cost and latency |
| llm-finetune-toolkit | Validates, splits, formats, and evaluates supervised fine-tuning datasets |
| ab-test-analyzer | A/B test significance, confidence intervals, and sample-size planning |
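
To give a flavour of the ideas behind slo-error-budget, here is a minimal sketch of error budget, burn rate, and a two-window paging check. It is not code from the repo: the function names, the example numbers, and the default threshold of 14.4 (the burn rate the SRE workbook pairs with a 1-hour long window and a 5-minute short window) are assumptions chosen for illustration.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A burn rate of 1.0 spends the budget exactly over the SLO period;
    higher values spend it proportionally faster.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    return (bad / total) / error_budget(slo_target)


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows exceed the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening. The default threshold is an assumption.
    """
    return long_window_rate >= threshold and short_window_rate >= threshold


if __name__ == "__main__":
    slo = 0.999  # hypothetical 99.9% availability target
    long_rate = burn_rate(bad=20, total=1_000, slo_target=slo)   # e.g. last hour
    short_rate = burn_rate(bad=3, total=100, slo_target=slo)     # e.g. last 5 minutes
    print(f"long={long_rate:.1f} short={short_rate:.1f} "
          f"page={should_page(long_rate, short_rate)}")
```

Requiring both windows to agree is what keeps a brief spike that has already recovered from paging anyone.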

Background

  • 14 years across system engineering, Linux administration, DevOps, SRE and cloud, now focused on AI engineering.
  • Certifications: AWS Solutions Architect Associate, AWS ML Specialty, Databricks ML Professional, RHCE, RHCSA.

Reach me
