Shashank Shekhar Pandey – AI Engineer and Founder of Quantora Analytics

Founder & AI Engineer

Shashank Shekhar Pandey

AI Engineer  |  Generative AI  |  Agentic AI  |  Voice AI  |  RAG & LLM Infrastructure

AI Engineer with 4+ years of experience building and deploying production AI systems across Machine Learning, Generative AI, Voice AI, Retrieval-Augmented Generation (RAG), and LLM infrastructure. Experienced in designing scalable agentic workflows, real-time conversational AI systems, hybrid retrieval architectures, and fine-tuning pipelines using AWS and GCP. Demonstrated impact through latency optimization, infrastructure cost reduction, process automation, and enterprise AI adoption.

About Shashank Shekhar Pandey

Shashank Shekhar Pandey is a practitioner-first AI Engineer based in Chandigarh / Mohali, India. He has strong hands-on expertise across the end-to-end AI lifecycle, from model development, prompt engineering, and RAG architecture to FastAPI backends, vector databases, LLM fine-tuning, and cloud-native MLOps deployments on AWS and GCP.

He is the Founder of Quantora Analytics, an enterprise AI consulting company that helps businesses build production-ready AI systems rather than demos. His engineering roles include AI Engineer at Bizetikk (Chandigarh, current), AI/ML Developer at Eminence Technology (Mohali), Data Scientist at PAI Solutions (Delhi), and Machine Learning Engineer at ECPL (Ludhiana).

He is recognized for building low-latency voice AI agents, enterprise-scale RAG pipelines with hybrid retrieval, multi-agent orchestration systems, and LLM fine-tuning workflows that reduce hallucinations and cut infrastructure costs. His work spans FinTech, Healthcare, EdTech, and enterprise SaaS.

Technical Skills

Core expertise across the full AI stack

Generative AI & LLMs

LLMs · RAG · Prompt Engineering · Function Calling · LoRA · QLoRA · PEFT · Quantization (4-bit/8-bit) · LLM Evaluation · Hallucination Benchmarking · Hugging Face · LLaMA · Mistral

Agentic AI

LangChain · LangGraph · CrewAI · AutoGen · Multi-Agent Systems · Tool Calling · Agent Memory · State Management · Workflow Orchestration

Voice AI

Deepgram ASR · AWS Transcribe · Piper TTS · Plivo Telephony · Real-Time STT/TTS · WebSockets · Conversational AI · Voice Agents

Retrieval & Search

FAISS · Pinecone · ChromaDB · BGE Embeddings · BM25 · Reciprocal Rank Fusion · Cross-Encoder Reranking · Semantic Chunking · Hybrid Retrieval

Machine Learning

PyTorch · TensorFlow · scikit-learn · XGBoost · LightGBM · Random Forest · Prophet

Cloud & AI Infrastructure

AWS Bedrock · AWS EC2 · AWS S3 · SageMaker · GCP Vertex AI · Docker · CI/CD · vLLM · LangSmith

Application Development

FastAPI · REST APIs · WebSockets · Async Python · Bash

Databases & Data Engineering

PostgreSQL · MongoDB · Snowflake · SQL · ETL Pipelines · Feature Engineering

Experience

Professional journey — 4+ years in production AI

AI Engineer

Bizetikk  ·  Chandigarh

Dec 2025 – Present
  • Architected and deployed a production-grade real-time Voice AI platform integrating Deepgram ASR, GPT-4o Mini, Plivo Telephony, Piper TTS, LangGraph, and WebSockets for appointment scheduling, order tracking, and customer support workflows.
  • Engineered bidirectional streaming speech pipelines with conversational memory and state-driven workflow orchestration, enabling natural multi-turn interactions with sub-second response latency.
  • Implemented LangGraph-based workflow management, tool invocation, and context retention mechanisms to improve dialogue continuity and task completion rates.
  • Designed and built an automated call auditing platform leveraging Amazon S3, AWS Transcribe, and LLM-based evaluation workflows — reducing manual QA effort by 65% and review turnaround time by 70%.
  • Designed and scaled a hybrid RAG architecture across 1,200+ enterprise documents using BGE embeddings, BM25 retrieval, Reciprocal Rank Fusion, and cross-encoder reranking.
  • Reduced GPU memory requirements by 60% through deployment of 4-bit and 8-bit quantized inference pipelines using bitsandbytes on AWS EC2.
  • Fine-tuned open-source LLMs using LoRA and QLoRA with PEFT on domain-specific datasets, achieving a measured 28% reduction in hallucination rates through structured benchmarking.
Deepgram ASR · LangGraph · Voice AI · RAG · LoRA/QLoRA · AWS · Plivo

AI / ML Developer

Eminence Technology  ·  Mohali

Jan 2025 – Dec 2025
  • Reduced end-to-end LLM response latency by 40% through LangChain workflow optimization, retrieval pipeline improvements, and intelligent caching strategies.
  • Architected StoryTime AI using multi-model orchestration across LLaMA, Groq, and GPT-family models with task-specific routing logic, improving user engagement by 35% and narrative consistency by 20%.
  • Built AI-powered parental guidance systems using FastAPI, LangGraph orchestration, custom tools, and LangSmith observability, significantly improving workflow autonomy and debugging efficiency.
  • Developed multi-modal AI services integrating REST APIs, WebSockets, and TTS pipelines, increasing recommendation accuracy by 25%.
LangChain · LangGraph · FastAPI · LLaMA · LangSmith · Multi-model

Data Scientist

Pai Solutions Pvt Ltd  ·  Delhi

Dec 2022 – Dec 2024
  • Built and deployed a LLaMA-based NL-to-SQL platform enabling business users to query enterprise data through natural language, eliminating manual reporting workflows.
  • Reduced forecasting MAPE by 30% through development of a Prophet-based demand forecasting solution deployed on GCP Vertex AI.
  • Improved marketing campaign ROI by 25% through predictive modeling using Snowflake-based feature engineering, LightGBM training pipelines, and AWS SageMaker deployment.
  • Increased revenue per user by 15% through customer segmentation and behavioral analytics models delivered via automated Docker-based CI/CD workflows.
LLaMA · NL-to-SQL · Prophet · LightGBM · GCP Vertex AI · Snowflake · SageMaker

Machine Learning Engineer

ECPL  ·  Ludhiana

Dec 2021 – Dec 2022
  • Developed and deployed revenue forecasting and churn prediction models using Random Forest and XGBoost, with hyperparameters tuned through Bayesian optimization and grid search.
  • Built robust preprocessing pipelines including outlier handling, feature engineering, categorical encoding, and missing-value treatment — reducing pipeline failures by 40%.
  • Delivered Tableau and Power BI dashboards adopted by business leadership for operational reporting and strategic decision-making.
XGBoost · Random Forest · Bayesian Optimization · Power BI · Tableau

Founder & Director

Quantora Analytics

2023 – Present

Founded Quantora Analytics to deliver end-to-end AI consulting, from strategy and architecture to deployment and MLOps. He leads architecture, client delivery, and the Quantora AI Academy internship program, and builds production AI systems for enterprise clients in India and internationally.

AI Strategy · Generative AI · MLOps · Data Engineering · LLMOps · AI Consulting

Education & Certifications

Academic background & professional credentials

MBA – Data Analytics and Intelligence

2024

Graduate studies in analytics strategy, intelligence systems, AI architecture, and data-driven business decision-making.

Certifications

  • TensorFlow Developer Certificate · Google
  • Machine Learning Professional · IBM
  • Google Business Intelligence Professional · Google / Coursera
  • Statistics with Python · Stanford Online
  • Generative AI Architecture · Edureka

Measured Impact

Results that speak for themselves

  • 65% reduction in manual QA effort via the automated call auditing platform
  • 60% GPU memory reduction via 4-bit / 8-bit quantized inference pipelines
  • 28% reduction in LLM hallucination rates via LoRA/QLoRA fine-tuning
  • 40% LLM response latency reduction via pipeline optimization & caching
  • 30% forecasting MAPE reduction via Prophet-based demand forecasting on GCP
  • 25% marketing campaign ROI improvement via LightGBM predictive modeling

Signature Projects

Production AI systems built by Shashank

Real-Time Voice AI Platform

Production-grade voice calling system integrating Deepgram ASR, GPT-4o Mini, Plivo, Piper TTS, LangGraph, and WebSockets. Handles appointment scheduling, order tracking, and support workflows with sub-second latency.

Deepgram · LangGraph · Plivo · WebSockets

Hybrid RAG over 1,200+ Documents

Enterprise-scale hybrid retrieval architecture using BGE embeddings, BM25, Reciprocal Rank Fusion, and cross-encoder reranking — significantly improving relevance over dense-only retrieval.

BGE · BM25 · RRF · Pinecone
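The fusion step in this architecture can be illustrated with a minimal Python sketch of standard Reciprocal Rank Fusion. It assumes each retriever (for example BM25 and a dense BGE-embedding search) returns an ordered list of document IDs, and it uses k = 60, the constant commonly used with RRF. The function name and the sample document IDs are hypothetical; this is an illustration of the technique, not the production code behind the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. Documents missing from a list get nothing from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a BM25 retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores and cosine similarities onto a common scale before merging. A cross-encoder reranker would then typically rescore only the top fused candidates, keeping the expensive model off the full result set.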

Automated Call Auditing Platform

LLM-powered QA system using Amazon S3, AWS Transcribe, rubric-based scoring, compliance validation, and sentiment analysis — reducing manual review effort by 65% and turnaround by 70%.

AWS Transcribe · S3 · LLM Eval

StoryTime AI – Multi-Model Orchestration

AI storytelling platform with task-specific routing across LLaMA, Groq, and GPT models. Improved user engagement by 35% and narrative consistency by 20% through intelligent orchestration.

LLaMA · Groq · GPT-4o · LangChain

LLaMA-based NL-to-SQL Platform

Enables business users to query enterprise databases in natural language, removing the dependency on manual SQL reporting. Deployed on GCP with a FastAPI backend.

LLaMA · FastAPI · GCP · PostgreSQL

Agentic Parental Guidance System

Decision-first multi-domain agent that routes parents' questions to health, psychology, routine, and nutrition domains, built with FastAPI, LangGraph, and LangSmith observability.

LangGraph · FastAPI · LangSmith · Pinecone

FAQ

Frequently asked about Shashank Shekhar Pandey

Shashank Shekhar Pandey is an AI Engineer with 4+ years of experience and the Founder of Quantora Analytics. He specializes in Generative AI, Agentic AI, Voice AI, RAG pipelines, and LLM infrastructure, building production-grade AI systems for enterprise teams in India and internationally.

He is known for shipping real-time voice AI agents with sub-second latency, large-scale hybrid RAG architectures, LLM fine-tuning pipelines (LoRA/QLoRA) that reduce hallucinations, multi-agent orchestration systems, and automated AI evaluation workflows — all with measurable production impact.

Shashank is based in Chandigarh / Mohali, India, and works with clients globally across AI consulting, contract engineering, and advisory engagements.

Reach out by email at shashank.datawiz808@gmail.com, call +91 79798 01671, connect on LinkedIn, read his articles on Medium, or use the Quantora Analytics contact form. He accepts select consulting engagements and high-impact AI engineering projects.

Work with Shashank

Let's build AI that works in production.

Available for AI architecture consulting, contract engineering, and strategic advisory through Quantora Analytics.