Internship Curriculum

3-Month Data Science Sprint

A comprehensive, hands-on roadmap to take you from Python basics to deploying GenAI applications. Includes weekly assignments, real-world projects, and code reviews.

Month 1

Foundation & Fundamentals

Weeks 1-2: Python & SQL Mastery

Master the tools of the trade. Write clean, efficient Python code and query databases like a pro.

Python Advanced: Decorators, Generators, Context Managers, OOP patterns.
Pandas & NumPy: Vectorization, Data cleaning, Merging/Joining datasets.
SQL: Complex Joins, Window Functions (RANK, LEAD/LAG), CTEs.
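
To make the SQL bullet concrete, here is a minimal sketch that combines a CTE with RANK and LAG. It runs against an in-memory SQLite database through Python's built-in sqlite3 module; the `sales` table and its columns are made up for illustration, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, revenue REAL);
    INSERT INTO sales VALUES
        ('North', '2024-01', 100), ('North', '2024-02', 150),
        ('South', '2024-01', 200), ('South', '2024-02', 120);
""")

query = """
WITH monthly AS (                -- CTE: name an intermediate result
    SELECT region, month, SUM(revenue) AS revenue
    FROM sales
    GROUP BY region, month
)
SELECT
    region,
    month,
    revenue,
    -- Rank regions against each other within the same month.
    RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS rank_in_month,
    -- Compare each month with the previous month of the same region.
    revenue - LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS change_vs_prev
FROM monthly
ORDER BY region, month;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

LAG returns NULL for the first month of each region, so `change_vs_prev` comes back as None there.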

📝 Assignment 1: Sales Data Analysis

Clean a raw CSV of 50k retail transactions and generate a monthly sales report using Pandas.
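
One possible starting point for this assignment is sketched below. It assumes pandas is installed and that the CSV has `order_id`, `order_date`, and `amount` columns; the real file's schema is not specified here, so treat those names and the tiny inline sample as placeholders.

```python
import io

import pandas as pd

# Tiny stand-in for the raw file; column names are assumptions.
RAW_CSV = """order_id,order_date,amount
1,2024-01-05,100.0
2,2024-01-20,50.5
2,2024-01-20,50.5
3,2024-02-03,
4,not-a-date,20.0
5,2024-02-14,80.0
"""


def monthly_sales_report(csv_source):
    """Clean raw transactions and aggregate them by calendar month."""
    df = pd.read_csv(csv_source)
    df = df.drop_duplicates()  # remove exact duplicate rows
    # Invalid dates and amounts become NaT/NaN instead of raising.
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df = df.dropna(subset=["order_date", "amount"])
    month = df["order_date"].dt.to_period("M")
    return df.groupby(month)["amount"].agg(total_sales="sum", orders="count")


report = monthly_sales_report(io.StringIO(RAW_CSV))
print(report)
```

Dropping rows with missing values is only one cleaning policy; depending on the data, imputing or flagging them may be more appropriate.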

Weeks 3-4: EDA & Visualization

Turn data into insights. Learn to tell stories with data using modern visualization libraries.

Visualization: Matplotlib customization, Seaborn statistical plots, Plotly interactive charts.
EDA Techniques: Outlier detection, Distribution analysis, Correlation matrices.
Streamlit Basics: Building simple data apps.
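
As an illustration of the outlier detection topic, the sketch below applies the common 1.5 x IQR rule using only the standard library. The function name, the sample data, and the choice of the IQR rule are assumptions made for the example; other rules (z-scores, isolation forests) are equally valid.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


daily_orders = [10, 12, 11, 13, 12, 11, 95]  # made-up sample
outliers = iqr_outliers(daily_orders)
print(outliers)
```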

🚀 Project 1: Market Trends Dashboard

Build an interactive dashboard using Streamlit that visualizes stock market data (using yfinance).

Month 2

Machine Learning Core

Weeks 5-6: Supervised & Unsupervised Learning

Understand the math and code behind predictive models.

Regression: Linear Regression, Logistic Regression (a classifier despite its name), Metrics (RMSE, R²).
Classification: Decision Trees, Random Forests, SVM.
Clustering: K-Means, Hierarchical Clustering, PCA for dimensionality reduction.
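
The two regression metrics named above can be written out directly from their definitions. This is a plain-Python sketch with made-up numbers; in practice a library such as scikit-learn would normally be used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the share of variance explained."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]  # illustrative targets
y_pred = [2.5, 5.5, 7.0, 9.5]  # illustrative predictions
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```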

📝 Assignment 2: House Price Prediction

Build a regression model to predict house prices. Optimize hyperparameters using grid search (for example, scikit-learn's GridSearchCV).

Week 7: Prompt Engineering Fundamentals

Learn to communicate effectively with AI models.

Core Concepts: Context windows, Temperature, Top-P, Frequency penalty.
Techniques: Role prompting, Delimiters, Output formatting (JSON/Markdown).
Tools: OpenAI Playground, Anthropic Console, Cursor AI features.
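
To show what Temperature and Top-P do to a model's next-token distribution, here is a toy sketch over three made-up logits. It illustrates the idea only; real providers apply these settings inside their own sampling code, and the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution, higher flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
warm = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.5)
nucleus = top_p_filter(warm, top_p=0.8)
print(warm, cold, nucleus)
```

With these numbers the colder setting puts more mass on the top token, and the nucleus filter drops the least likely token entirely.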

⚡ Mini-Project: "Code Refactoring Bot"

Design a system prompt that takes messy Python code and outputs clean, PEP-8 compliant code with docstrings.

Week 8: Deep Learning Intro & Large-Scale Projects

Step into the world of Neural Networks and Industrial AI.

Neural Networks: Perceptrons, Backpropagation, Activation Functions.
Model Interpretability: SHAP values, LIME, Feature Importance.
Production ML: Model Drift detection, A/B Testing concepts.

🚀 Project 2: End-to-End Customer Churn Prediction Platform

Build a robust Churn Prediction System for a Telecom dataset. The project involves:

Advanced Feature Engineering & Selection.
Training XGBoost/LightGBM models with Hyperparameter tuning.
Explainable AI: Integrating SHAP plots to explain why a specific customer is at risk.
Deployment: Serving the model via FastAPI with a Drift Monitoring dashboard.
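
The project does not prescribe a drift metric. One common choice is the Population Stability Index (PSI), sketched here in plain Python as an assumption about what the monitoring dashboard could compute. The bin count, the epsilon floor, and the sample data are illustrative choices.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a new one.

    Bin edges come from the reference sample; values outside that range
    are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at a tiny epsilon so the logarithm is always defined.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]  # training-time feature values
shifted = [v + 0.5 for v in reference]     # simulated production drift
print(psi(reference, reference), psi(reference, shifted))
```

A frequently quoted rule of thumb treats PSI above roughly 0.2 as a significant shift, but the threshold should be validated against your own data.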

Month 3

Advanced AI & Capstone

Week 9: Advanced Prompt Engineering

Master the art of controlling Large Language Models.

Prompting Frameworks: Zero-shot, Few-shot, Chain-of-Thought (CoT), ReAct.
System Prompts: Designing robust system instructions for role-playing agents.
Guardrails: Preventing hallucinations and jailbreaks using NeMo Guardrails or custom logic.
Evaluation: Measuring prompt performance using RAGAS or custom metrics.
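
The difference between zero-shot, few-shot, and Chain-of-Thought prompting is easiest to see in how the prompt text is assembled. The helper below is a hypothetical sketch (the function name, labels, and wording are not from any library) that builds all three variants as plain strings.

```python
def build_prompt(task, question, examples=(), chain_of_thought=False):
    """Assemble a prompt: no examples = zero-shot, some examples = few-shot."""
    parts = [task.strip()]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    if chain_of_thought:
        parts.append("Think step by step, then give the final answer on its own line.")
    parts.append(f"Input: {question}\nOutput:")
    return "\n\n".join(parts)


task = "Classify the sentiment of the input as positive or negative."
zero_shot = build_prompt(task, "The battery died after a week.")
few_shot = build_prompt(
    task,
    "The battery died after a week.",
    examples=[("I love this phone.", "positive"), ("Screen cracked on day one.", "negative")],
)
cot = build_prompt(task, "The battery died after a week.", chain_of_thought=True)
print(few_shot)
```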

Week 10: NLP & Transformers

Work with the latest GenAI technologies.

NLP Basics: Tokenization, Embeddings (Word2Vec, GloVe).
Transformers: Attention mechanism, BERT, GPT architecture.
LLM Application: LangChain basics, Memory management, Tool usage.
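
The attention mechanism listed above reduces to a few lines of arithmetic. This is a minimal, single-head scaled dot-product attention sketch in plain Python with made-up vectors; it omits the learned projection matrices, masking, and batching that real Transformer layers use.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query: softmax(q . k / sqrt(d_k)) weights over the values."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
print(weights, outputs)
```

The query points in the same direction as the first key, so the first value receives the larger weight.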

📝 Assignment 3: Sentiment Analysis API

Deploy a FastAPI endpoint that takes text and returns sentiment using a Hugging Face model.

Weeks 11-12: Capstone Project

The final test. Build something production-ready.

RAG (Retrieval-Augmented Generation): Vector Databases (Pinecone/Chroma), Context injection.
Deployment: Dockerizing the app, Deploying to Cloud (AWS/Render).
Presentation: Demo day presentation.

🏆 Capstone: "Chat with your PDF"

Build a RAG application where users upload a PDF and ask questions about it. Tech stack: LangChain, OpenAI, Streamlit, FAISS.
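
The retrieve-then-inject flow at the heart of this capstone can be sketched without any external services. The example below is a toy stand-in: it uses bag-of-words cosine similarity instead of real embeddings and FAISS, hard-coded text chunks instead of a parsed PDF, and it stops at building the prompt rather than calling an LLM. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real app would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Context injection: place the retrieved chunks ahead of the question."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


chunks = [
    "The warranty lasts two years from the purchase date.",
    "Shipping takes five business days.",
    "Returns are accepted within thirty days.",
]
question = "How long does the warranty last?"
top_chunk = retrieve(question, chunks, k=1)[0]
print(build_prompt(question, chunks, k=1))
```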

Bonus

Interview Preparation Kit (50 Questions)

A curated list of the most frequently asked questions in Data Science and ML interviews.

🐍 Python & Programming (10)

What is the difference between a list and a tuple? Why would you use one over the other?
Explain list comprehensions and how they differ from generator expressions.
How does memory management work in Python? Explain garbage collection.
What are decorators? Write a simple decorator that times a function.
Explain the difference between a shallow copy (`copy.copy`) and a deep copy (`copy.deepcopy`).
What is the Global Interpreter Lock (GIL) and how does it affect multithreading?
How do you handle exceptions in Python? Explain `try`, `except`, `else`, and `finally`.
What is the difference between `is` and `==`?
Explain the `with` statement and Context Managers.
How would you optimize a Python script that is running slowly?
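
For the decorator question above, one reasonable answer looks like the sketch below. The names `timed` and `sum_of_squares` are illustrative; the key points an interviewer usually looks for are `functools.wraps` and a monotonic clock such as `time.perf_counter`.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failures are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f}s")

    return wrapper


@timed
def sum_of_squares(n):
    return sum(i * i for i in range(n))


result = sum_of_squares(1000)
```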

📊 SQL & Databases (10)

What is the difference between `INNER JOIN`, `LEFT JOIN`, and `FULL OUTER JOIN`?
Explain the difference between `WHERE` and `HAVING` clauses.
What are Window Functions? Explain `RANK()` vs `DENSE_RANK()`.
What is a Common Table Expression (CTE) and when should you use it?
Explain the difference between `DELETE`, `TRUNCATE`, and `DROP`.
How do you optimize a slow SQL query? (Indexing, execution plans).
What is Normalization vs Denormalization? When would you use each?
Explain ACID properties in databases.
How do you handle NULL values in SQL?
Write a query to find the second highest salary in a table.
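
Two common answers to the second-highest-salary question are sketched below, run through Python's built-in sqlite3 module so they can be tried locally. The `employees` table and its rows are made up, and the window-function variant assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical data with a tie at the top salary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 90000), ('Ben', 120000), ('Cy', 120000), ('Dee', 75000);
""")

# Approach 1: the largest salary strictly below the overall maximum.
subquery_sql = """
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
"""
second_highest = conn.execute(subquery_sql).fetchone()[0]

# Approach 2: DENSE_RANK gives tied salaries the same rank without gaps.
dense_rank_sql = """
SELECT DISTINCT salary FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
) WHERE rnk = 2;
"""
via_dense_rank = conn.execute(dense_rank_sql).fetchone()[0]
conn.close()

print(second_highest, via_dense_rank)
```

With the tie at the top, RANK() would assign ranks 1, 1, 3, 4 and leave no row at rank 2, which is why DENSE_RANK() is the safer choice for this query.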

🤖 Machine Learning (10)

What is the Bias-Variance Tradeoff? How do you manage it?
Explain the difference between L1 (Lasso) and L2 (Ridge) regularization.
How does a Random Forest work? What is bagging?
What is Gradient Boosting? How does it differ from Random Forest?
Explain Precision, Recall, and F1-Score. When should you prioritize one over the other?
What is the ROC Curve and AUC?
How do you handle imbalanced datasets? (SMOTE, Class weights).
Explain K-Means clustering. How do you choose K? (Elbow method).
What is Cross-Validation and why is it important?
Explain the Curse of Dimensionality.
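
For the Precision, Recall, and F1-Score question, it helps to be able to compute all three from raw predictions. A minimal plain-Python sketch with made-up binary labels:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # illustrative ground truth
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative predictions
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Which metric to prioritize depends on error costs: recall when missing a positive is expensive (fraud, disease screening), precision when false alarms are expensive (spam filtering).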

🧠 Deep Learning & NLP (10)

What is Backpropagation? Explain the chain rule.
What is the Vanishing Gradient problem? How do ReLU activations or LSTMs mitigate it?
Explain the architecture of a CNN (Convolution, Pooling, Fully Connected).
What is Transfer Learning? When should you use it?
What is an Activation Function? Why do we need non-linearity?
Explain the Attention Mechanism in simple terms.
What are Word Embeddings? (Word2Vec vs BERT embeddings).
What is a Transformer? Explain Encoder vs Decoder architectures.
What is RAG (Retrieval-Augmented Generation)?
How do you fine-tune an LLM? (PEFT, LoRA).
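
The backpropagation question above is often easier to answer with a worked case. The sketch below applies the chain rule to a single sigmoid neuron with a squared-error loss and checks the result against a finite-difference estimate. The setup (one neuron, one training example, these particular numbers) is an illustrative assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one example."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


def gradients(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw, and likewise for b."""
    a = sigmoid(w * x + b)
    dL_da = a - y            # derivative of 0.5 * (a - y)^2
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz  # dz/dw = x, dz/db = 1


w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, y)

# Finite-difference check of the analytic gradient.
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_b = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(grad_w, numeric_w, grad_b, numeric_b)
```

The `da_dz` factor is at most 0.25 for a sigmoid, which is one way to see why stacking many sigmoid layers shrinks gradients.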

💡 Behavioral & System Design (10)

Tell me about a time your model failed in production. How did you fix it?
How do you explain a complex technical concept to a non-technical stakeholder?
Describe a challenging data cleaning problem you faced.
How would you design a Recommendation System for Netflix?
How do you handle conflicting requirements from different teams?
What is your process for starting a new data science project?
How do you stay updated with the latest AI trends?
Tell me about a time you had to learn a new tool quickly.
How do you measure the business impact of your model?
If you had unlimited computing power, what would you build?
