This intensive 10-day program equips interns with essential ML fundamentals, practical skills, and research methods. Each day combines lectures from established courses with hands-on exercises and assessments using real datasets.
Target Audience: Summer interns, 6-month interns, JRFs, and prospective lab members
Prerequisites: Basic Python programming knowledge
Format: Daily 4-5 hour sessions with lectures, practice, and assessment
Day 1: NumPy Fundamentals
Array Computing for Data Science
Learning Objectives
- Master NumPy array operations and indexing
- Understand broadcasting and vectorization
- Apply NumPy to mathematical computations
Daily Exercises
Exercise 1A: Array Fundamentals (30 minutes)
- Create arrays using different methods (zeros, ones, arange, linspace)
- Practice array indexing and slicing
- Reshape and transpose operations
- Array concatenation and splitting
Exercise 1B: Mathematical Operations (45 minutes)
- Element-wise operations vs matrix operations
- Statistical functions (mean, std, min, max)
- Sorting and searching in arrays
- Random number generation and seeding
Exercise 1C: Broadcasting Practice (30 minutes)
- Add scalar to array
- Operations between arrays of different shapes
- Create meshgrids for plotting
- Normalize arrays using broadcasting
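As a minimal sketch of the broadcasting ideas in Exercise 1C, the snippet below normalizes columns, centers rows, and builds a small grid. The array values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: 4 samples (rows) x 3 features (columns).
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0],
              [4.0, 40.0, 400.0]])

# Column statistics have shape (3,); broadcasting stretches them across all rows.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)   # each column scaled to [0, 1]

# Row-wise centering needs keepdims=True so the (4, 1) means align with the (4, 3) array.
row_mean = X.mean(axis=1, keepdims=True)
X_row_centered = X - row_mean

# A coordinate grid via broadcasting: (5, 1) against (1, 3) gives (5, 3).
x = np.linspace(0.0, 1.0, 3)
y = np.linspace(0.0, 2.0, 5)
grid = y[:, None] + x[None, :]
```

Dropping `keepdims=True` in the row-wise case would raise a shape error, which is a useful thing to try deliberately.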
Daily Assessment: Activity Recognition Analysis (60 minutes)
Dataset: UCI Human Activity Recognition
Tasks:
- Load and explore accelerometer/gyroscope data
- Calculate basic statistics for each activity type
- Find patterns in sensor readings for walking vs sitting
- Create visualizations using NumPy operations
- Identify which sensors are most informative
Deliverable: Jupyter notebook with clean code and insights
Day 2: Pandas & Matplotlib
Data Manipulation and Visualization
Learning Objectives
- Master DataFrame operations and data cleaning
- Create professional visualizations
- Combine data from multiple sources
Daily Exercises
Exercise 2A: DataFrame Operations (45 minutes)
- Load CSV data and inspect structure
- Handle missing values (dropna, fillna, interpolate)
- Filter and query operations
- Group by operations and aggregations
- Merge and join different datasets
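The operations in Exercise 2A can be sketched on a toy example. The two DataFrames below (`readings`, `sensors`) and their column names are hypothetical, chosen only to show group-wise filling, filtering, aggregation, and a left join.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with one missing value.
readings = pd.DataFrame({
    "sensor_id": [1, 1, 2, 2, 3],
    "value": [10.0, np.nan, 30.0, 50.0, 20.0],
})
# Hypothetical lookup table; sensor 3 has no entry.
sensors = pd.DataFrame({
    "sensor_id": [1, 2],
    "location": ["roof", "lab"],
})

# Fill each missing value with the mean of its own sensor group.
readings["value"] = readings.groupby("sensor_id")["value"].transform(
    lambda s: s.fillna(s.mean())
)

# Filter, then aggregate per sensor.
high = readings.query("value >= 20")
summary = readings.groupby("sensor_id")["value"].agg(["mean", "max"]).reset_index()

# Left join keeps sensor 3 and leaves its location as NaN.
merged = summary.merge(sensors, on="sensor_id", how="left")
```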
Exercise 2B: Data Cleaning Pipeline (45 minutes)
- Remove duplicates and outliers
- Convert data types appropriately
- Create derived columns
- Handle datetime data
- Export cleaned data
Exercise 2C: Visualization Mastery (60 minutes)
- Line plots with multiple series
- Scatter plots with color coding
- Histograms and distribution plots
- Subplots and figure customization
- Save high-quality figures
Daily Assessment: Weather Data Analysis (90 minutes)
Dataset: UCI Weather Dataset or built-in Seaborn Flights Dataset
Tasks:
- Load and clean weather data (temperature, humidity, pressure)
- Handle missing values using simple methods
- Create basic time-based features (month, season)
- Calculate monthly averages and trends
- Create simple visualizations:
- Temperature trends over time
- Correlation between weather variables
- Monthly distribution plots
- Simple seasonal patterns
- Write 1-page summary of findings
Deliverable: Clean dataset + basic EDA notebook
Day 3: Python Mastery Assessment
Comprehensive Skills Evaluation
📚 Review Materials
- Consolidate learning from Days 1-2
- Python Data Science Handbook (Chapters 1-4)
40 Challenge Questions (180 minutes)
NumPy Section (Questions 1-15)
- Create a 3D array (5×4×3) filled with random integers from 1-100
- Find all elements greater than 50 and replace with their square root
- Compute the covariance matrix for a 2D dataset
- Implement matrix multiplication without using np.dot()
- Create a function to normalize arrays to 0-1 range
- Find the indices of the maximum value in each row of a 2D array
- Create a moving average function using array slicing
- Solve a system of linear equations using NumPy
- Create a function to compute pairwise distances between points
- Implement k-means centroid update using broadcasting
- Create a 2D Gaussian kernel for image filtering
- Find connected components in a binary image (0s and 1s)
- Implement efficient computation of Euclidean distance matrix
- Create a function to remove outliers using z-score
- Compute eigenvalues and eigenvectors for PCA implementation
Pandas Section (Questions 16-30)
- Load multiple CSV files and combine them efficiently
- Create a function to detect and handle different types of missing data
- Implement time-based resampling for irregular time series
- Create a pivot table with multiple aggregation functions
- Implement efficient groupby operations on large datasets
- Create a function to standardize column names
- Merge datasets with different time zones
- Implement outlier detection using IQR method
- Create a function to generate summary statistics report
- Handle categorical data encoding efficiently
- Implement sliding window operations on time series
- Create a function to validate data quality
- Implement efficient data type optimization
- Create custom aggregation functions for groupby
- Handle hierarchical/multi-index DataFrames
Matplotlib Section (Questions 31-40)
- Create a publication-ready figure with subplots
- Implement interactive plots with widgets
- Create custom colormaps for scientific data
- Design a dashboard-style multi-panel figure
- Create animations for time series data
- Implement error bars and confidence intervals
- Create geographic plots using basemap principles
- Design custom plot styles and themes
- Create 3D visualizations for scientific data
- Implement plot export pipeline for publications
Daily Assessment: Multi-Dataset Analysis (120 minutes)
Dataset: Seaborn Tips + Seaborn Flights
Tasks:
- Load both tips and flights datasets
- Clean and explore each dataset separately
- Create 4 visualizations for tips data (bill vs tip patterns)
- Create 4 visualizations for flights data (passenger trends)
- Write simple summary comparing patterns in both datasets
Evaluation Criteria:
- Code quality and readability (40%)
- Visualization clarity (30%)
- Data insights (30%)
Day 4: Machine Learning Introduction
Foundations of Supervised Learning
Learning Objectives
- Understand supervised vs unsupervised learning
- Master the ML workflow and evaluation
- Implement basic algorithms from scratch
Daily Exercises
Exercise 4A: ML Workflow (60 minutes)
- Implement train-validation-test splits
- Create cross-validation from scratch
- Implement basic performance metrics
- Practice bias-variance tradeoff concepts
- Create learning curves
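One possible shape for "cross-validation from scratch" is sketched below with NumPy only. The helper names (`k_fold_indices`, `cross_validate_linear`) and the synthetic data are assumptions for illustration, and ordinary least squares stands in for whatever model is being evaluated.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def cross_validate_linear(X, y, k=5):
    """Mean validation MSE of ordinary least squares over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        # Add a bias column and solve least squares on the training fold only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        w, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
        scores.append(mse(y[val_idx], A_val @ w))
    return float(np.mean(scores))

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)
cv_mse = cross_validate_linear(X, y, k=5)
```

The key property to verify is that every sample appears in exactly one validation fold and that the model never sees its validation fold during fitting.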
Daily Assessment: Housing Price Prediction (90 minutes)
Dataset: UCI Boston Housing Dataset
Tasks:
- Predict house prices using neighborhood features
- Compare linear regression, ridge, and lasso performance
- Create simple feature combinations (e.g., rooms per capita)
- Create basic model evaluation with scatter plots
- Interpret which features matter most for price
Bonus: Implement simple gradient descent from scratch
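For the bonus, a minimal sketch of batch gradient descent on mean squared error might look like the following. The learning rate, step count, and synthetic features are illustrative assumptions; real housing features would need to be standardized first for a fixed learning rate like this to behave well.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_steps=500):
    """Fit y = X @ w + b approximately by batch gradient descent on mean squared error."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_steps):
        residual = X @ w + b - y              # shape (n,)
        grad_w = 2.0 / n * (X.T @ residual)   # d(MSE)/dw
        grad_b = 2.0 / n * residual.sum()     # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic, standardized features (hypothetical stand-ins for housing features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + 0.5
w, b = gradient_descent(X, y)
```

Because the synthetic targets are noise-free, the recovered `w` and `b` should match the generating values closely, which makes this a convenient correctness check before moving to real data.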
Day 5: Tree-Based Methods
Non-Linear Models and Ensemble Learning
Learning Objectives
- Master decision tree algorithms
- Understand ensemble methods
- Apply to complex real-world problems
Daily Exercises
Exercise 5A: Decision Tree Implementation (90 minutes)
- Build decision tree from scratch using Gini impurity
- Implement tree pruning techniques
- Visualize decision boundaries
- Compare with scikit-learn implementation
- Analyze tree depth vs performance
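The core of a from-scratch decision tree is the impurity calculation and the search for a split. The sketch below covers only that step for a single numeric feature; the function names and toy data are illustrative, and a full tree would apply `best_split` recursively across all features.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a 1D label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def best_split(x, y):
    """Return (threshold, weighted_gini) of the best split x <= t on one feature."""
    best_t, best_score = None, np.inf
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
threshold, score = best_split(x, y)
```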
Exercise 5B: Ensemble Methods with Scikit-learn (75 minutes)
- Use StandardScaler and MinMaxScaler for data preprocessing
- Implement RandomForestClassifier with different parameters
- Compare individual trees vs ensemble performance
- Feature importance analysis using .feature_importances_
- Cross-validation with cross_val_score
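A minimal sketch of how these scikit-learn pieces fit together is shown below. It uses the wine dataset bundled with scikit-learn purely as a stand-in; the choice of dataset, `n_estimators=100`, and 5-fold cross-validation are assumptions rather than requirements of the exercise.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Tree models do not need scaling; it is placed in a pipeline here only to show
# that the scaler is then fit on each training fold during cross-validation.
forest = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
tree = DecisionTreeClassifier(random_state=0)

# Accuracy per fold for the ensemble and for a single tree.
forest_scores = cross_val_score(forest, X, y, cv=5)
tree_scores = cross_val_score(tree, X, y, cv=5)

# Fit once on all data to read the impurity-based importances.
forest.fit(X, y)
importances = forest.named_steps["randomforestclassifier"].feature_importances_
```

Comparing `forest_scores.mean()` with `tree_scores.mean()` gives the "individual tree vs ensemble" comparison, and `importances` holds one value per input feature.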
Exercise 5C: K-Nearest Neighbors (45 minutes)
- Implement KNN for classification and regression
- Experiment with different distance metrics
- Analyze curse of dimensionality
- Implement efficient neighbor search
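A brute-force KNN classifier can be written in a few lines with broadcasting, as sketched below. The function name and toy points are illustrative; this version uses squared Euclidean distance and assumes integer class labels starting at 0. Tree-based or approximate search would replace the full distance matrix for large datasets.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest neighbors."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n_query, n_train).
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    # argpartition finds the k smallest distances per row without a full sort.
    nearest = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    votes = y_train[nearest]                      # shape (n_query, k)
    # Majority vote per row; labels are assumed to be non-negative integers.
    return np.array([np.bincount(row).argmax() for row in votes])

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.3], [5.5, 5.5]])
pred = knn_predict(X_train, y_train, X_query, k=3)
```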
Daily Assessment: Iris Species Classification (120 minutes)
Dataset: Seaborn Iris Dataset (loaded with seaborn.load_dataset("iris"), which fetches it online on first use and then caches it locally)
Tasks:
- Build classifier to predict iris species (setosa, versicolor, virginica)
- Compare decision tree, random forest, and KNN performance
- Use tree-based feature importance to find key measurements
- Create simple model interpretation plots
- Visualize decision boundaries for 2D projections
- Create simple prediction function for new flowers
Evaluation: Model performance + interpretability + code clarity
Day 6: Neural Networks & PyTorch
Introduction to Deep Learning
Learning Objectives
- Understand neural network fundamentals
- Master PyTorch for deep learning
- Implement backpropagation from scratch
Daily Exercises
Exercise 6A: Neural Network from Scratch (90 minutes)
- Implement forward pass for multi-layer perceptron
- Code backpropagation algorithm step by step
- Add different activation functions (sigmoid, ReLU, tanh)
- Implement mini-batch gradient descent
- Compare with analytical solutions
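One way to validate a hand-written backpropagation is to compare its gradients with finite-difference estimates. The sketch below does this for a one-hidden-layer network with tanh activation and a squared-error loss; the architecture, loss, and helper names are assumptions chosen to keep the example short.

```python
import numpy as np

def forward(params, X):
    """One hidden layer with tanh, linear output. Returns prediction and hidden activations."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2
    return y_hat, h

def loss_fn(params, X, y):
    """Half the mean (over samples) of the squared error."""
    y_hat, _ = forward(params, X)
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))

def backward(params, X, y):
    """Gradients of loss_fn with respect to each parameter, via the chain rule."""
    W1, b1, W2, b2 = params
    y_hat, h = forward(params, X)
    n = X.shape[0]
    d_out = (y_hat - y) / n                      # dL/dy_hat
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)
    return [dW1, db1, dW2, db2]

def numerical_grad(params, X, y, idx, eps=1e-6):
    """Central-difference gradient for the parameter array params[idx]."""
    grad = np.zeros_like(params[idx])
    for i in np.ndindex(params[idx].shape):
        old = params[idx][i]
        params[idx][i] = old + eps
        plus = loss_fn(params, X, y)
        params[idx][i] = old - eps
        minus = loss_fn(params, X, y)
        params[idx][i] = old                     # restore the original value
        grad[i] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))
params = [rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=(4, 1)), np.zeros(1)]
analytic = backward(params, X, y)
max_err = max(np.max(np.abs(analytic[i] - numerical_grad(params, X, y, i)))
              for i in range(4))
```

If `max_err` is not tiny, the usual suspects are a missing activation derivative or a mismatch between the loss scaling and the gradient scaling.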
Exercise 6B: PyTorch Framework Deep Dive (75 minutes)
- Build neural networks using torch.nn.Module
- Implement CNNs using torch.nn.Conv2d and torch.nn.MaxPool2d
- Create data loaders with torch.utils.data.DataLoader
- Add regularization (dropout, weight decay)
- Implement learning rate scheduling with torch.optim.lr_scheduler
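The pieces listed in Exercise 6B can be combined in a short training loop, sketched below. The `SmallCNN` class, the 28x28 single-channel input size, the random tensors standing in for a dataset, and all hyperparameters are illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class SmallCNN(nn.Module):
    """Tiny CNN for 1-channel 28x28 inputs (sizes are illustrative assumptions)."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # -> (8, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(2),                            # -> (8, 14, 14)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(8 * 14 * 14, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

torch.manual_seed(0)
# Random tensors stand in for a real image dataset.
images = torch.randn(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = SmallCNN()
# weight_decay adds L2 regularization; StepLR halves the learning rate every epoch here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
criterion = nn.CrossEntropyLoss()

for epoch in range(2):
    model.train()
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch, after the optimizer steps

final_lr = optimizer.param_groups[0]["lr"]
```

Calling `model.eval()` before inference disables dropout, which matters when comparing training and validation behavior.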
Exercise 6C: Advanced PyTorch Architectures (60 minutes)
PDF: Convolutional Neural Networks
- Build CNN using torch.nn.Conv2d for image classification
- Implement LSTM using torch.nn.LSTM for time series
- Practice composing layers into a model with torch.nn.Sequential
- Compare CNN vs LSTM performance on time series data
- PDF: 1D CNN
Daily Assessment: Simple Time Series Prediction (120 minutes)
Dataset: Seaborn Flights Dataset (passenger numbers over time)
Tasks:
- Build simple neural network to predict passenger numbers
- Compare basic MLP vs simpler approaches
- Create train/validation/test splits for time series
- Plot predictions vs actual values
- Calculate and interpret prediction errors
- Experiment with different number of hidden layers
Bonus: Try predicting multiple steps ahead
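Time series need a chronological split rather than a random one, and a model needs fixed-length input windows. The sketch below shows one way to do both with NumPy; the helper names, window length, and split fractions are assumptions, and a synthetic series is used in place of the real data. The `horizon` argument covers the multi-step bonus.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Build (inputs, targets): `window` past values -> the value `horizon` steps ahead."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)

def chronological_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split without shuffling so every validation and test target follows all training targets."""
    n = len(X)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return ((X[:n_train], y[:n_train]),
            (X[n_train:n_train + n_val], y[n_train:n_train + n_val]),
            (X[n_train + n_val:], y[n_train + n_val:]))

# A synthetic series of 144 points (for example, 12 years of monthly values).
series = np.arange(144, dtype=float)
X, y = make_windows(series, window=12)
train, val, test = chronological_split(X, y)
```

Any scaling statistics (mean, standard deviation) should be computed on `train` only and then applied to `val` and `test`, to avoid leaking future information.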
Day 7: Version Control & Collaboration
Git, GitHub, and Development Workflow
Learning Objectives
- Master Git version control
- Understand collaborative development
- Apply to ML project workflows
Daily Exercises
Exercise 7A: Git Fundamentals (60 minutes)
- Initialize repository and make commits
- Practice branching and merging strategies
- Handle merge conflicts effectively
- Use git log, diff, and status commands
- Create and apply patches
Exercise 7B: GitHub Collaboration (75 minutes)
- Fork repository and create pull requests
- Practice code review process
- Use GitHub Issues for project management
- Create project documentation with README
- Set up GitHub Pages for project showcase
Exercise 7C: ML Project Structure (60 minutes)
- Organize ML projects with proper structure
- Use .gitignore for ML artifacts
- Version control datasets and models
- Create reproducible environments
- Document experiments and results
Daily Assessment: Collaborative ML Project (150 minutes)
Tasks:
- Work in teams of 2-3 people
- Choose one simple dataset (tips, iris, housing)
- Create shared GitHub repository
- Each member implements different ML algorithm
- Use branches for development, merge via pull requests
- Create basic README with results
- Present final comparison of approaches
Evaluation: Git usage + collaboration quality + final project
Day 8: Scientific Writing & LaTeX
Academic Communication and Documentation
📚 Learning Materials
- Resource: LaTeX Tutorial on Overleaf
- Guide: Lab Handbook - Technical Writing section
- Examples: Sustainability Lab publications for reference
Learning Objectives
- Master LaTeX for academic writing
- Understand scientific paper structure
- Create publication-ready documents
Daily Exercises
Exercise 8A: LaTeX Fundamentals (75 minutes)
- Set up Overleaf account and create first document
- Learn document structure and basic formatting
- Create mathematical equations and formulas
- Insert figures, tables, and references
- Use bibliography management with BibTeX
Exercise 8C: Paper Structure (60 minutes)
- Analyze structure of top-tier ML papers
- Write effective abstracts and introductions
- Create methodology sections with equations
- Present experimental results clearly
- Write conclusions and future work
Daily Assessment: Research Paper Draft (120 minutes)
Task: Write a 4-page research paper on your Day 6 neural network project
Required Sections:
1. Abstract (150 words)
2. Introduction with literature review
3. Methodology with mathematical formulation
4. Experimental results with tables and figures
5. Conclusion and future work
6. Properly formatted references
Evaluation: Writing clarity + technical accuracy + LaTeX formatting
Day 9: Research Methods Bootcamp
Scientific Thinking and Research Skills
Learning Objectives
- Develop scientific thinking skills
- Master research communication
- Learn systematic problem-solving
Daily Exercises
Exercise 9A: Email Communication (45 minutes)
Based on Bootcamp Session 1
- Write professional email to potential research advisor
- Compose progress report email to supervisor
- Request help from external researcher professionally
- Practice concise and clear communication
- Learn email etiquette for academia
Exercise 9B: Abstract Analysis (60 minutes)
Based on Bootcamp Session 2
- Analyze 5 abstracts from top ML conferences
- Identify key components of effective abstracts
- Rewrite weak abstracts to improve clarity
- Practice identifying scientific flaws in papers
- Create abstract for your own research idea
Exercise 9C: Scientific Method Application (75 minutes)
Based on Bootcamp Session 3
- Formulate testable hypotheses for ML problems
- Design controlled experiments
- Identify potential confounding variables
- Practice systematic observation and data collection
- Apply statistical hypothesis testing
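As one concrete instance of hypothesis testing for ML experiments, the sketch below runs a two-sided permutation test on the difference in mean accuracy between two models. The accuracy values are invented for illustration, and the permutation test is just one reasonable choice; a t-test or bootstrap would also fit the exercise.

```python
import numpy as np

def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_permutations + 1)

# Hypothetical accuracies of two models over 10 random seeds each.
model_a = np.array([0.81, 0.83, 0.80, 0.82, 0.84, 0.81, 0.83, 0.82, 0.80, 0.82])
model_b = np.array([0.86, 0.88, 0.87, 0.85, 0.89, 0.86, 0.87, 0.88, 0.86, 0.87])
p_value = permutation_test(model_a, model_b)
p_same = permutation_test(model_a, model_a.copy())
```

Here the null hypothesis is "both models have the same mean accuracy across seeds". Comparing a sample with itself gives a p-value of 1, which serves as a sanity check on the implementation.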
Exercise 9D: Debugging Mastery (45 minutes)
Based on Bootcamp Session 4
- Analyze effective StackOverflow questions
- Practice systematic debugging approaches
- Create minimal reproducible examples
- Learn to ask precise technical questions
- Develop problem isolation skills
Daily Assessment: Research Proposal (120 minutes)
Tasks:
- Problem statement with motivation
- Literature review (10+ papers)
- Research methodology and experimental design
- Timeline and resource requirements
- Expected contributions and impact
- Potential challenges and mitigation strategies
Presentation: 15-minute presentation + 10-minute Q&A
Evaluation: Scientific rigor + presentation quality + feasibility
Day 10: Integration & Lab Projects
Connecting Skills to Research Impact
📚 Learning Materials
- Lab Website: Sustainability Lab Projects
- Research Papers: Recent lab publications
- Resources: Lab handbook and current project overviews
Learning Objectives
- Connect ML skills to sustainability applications
- Understand lab research domains
- Design internship project proposal
Daily Exercises
Exercise 10A: Lab Project Deep Dive (90 minutes)
- JoulesEye Analysis: Understand energy expenditure estimation using thermal imagery
- SpiroMask Study: Explore respiratory monitoring through smart face masks
- VayuBuddy Exploration: Analyze AI-powered air quality chatbot
- Space to Policy: Investigate satellite imagery for environmental compliance
- Choose one project and analyze technical approach in detail
Exercise 10B: Research Gap Identification (60 minutes)
- Read 3 recent papers from chosen lab research area
- Identify current limitations and challenges
- Propose novel extensions using learned ML techniques
- Consider practical implementation challenges
- Assess potential societal impact
Final Assessment: Internship Project Proposal (180 minutes)
Phase 1: Proposal Development (120 minutes)
Create comprehensive project proposal including:
- Problem Statement (500 words)
- Sustainability challenge being addressed
- Current approaches and limitations
- Proposed ML solution overview
- Technical Approach (800 words)
- Detailed methodology with equations
- Dataset requirements and collection plan
- ML algorithms: specify traditional ML + modern tools integration
- Include at least ONE of: Hugging Face, YOLO, or PyTorch LSTM
- Evaluation metrics and baselines
- Computational resource specifications (GPU, RAM, storage)
- Implementation Plan (400 words)
- 6-month timeline with deliverables
- Computational resource requirements
- Risk assessment and mitigation strategies
- Expected Impact (300 words)
- Scientific contributions
- Practical applications
- Societal benefits
Phase 2: Final Presentation (30 minutes)
- Presentation: 20 minutes + 10 minutes Q&A
- Audience: Lab members, instructors, and peers
- Evaluation Criteria:
- Technical soundness (30%)
- Innovation and creativity (25%)
- Feasibility and planning (25%)
- Presentation quality (20%)
Phase 3: Peer Review (30 minutes)
- Review and provide feedback on 2 other proposals
- Practice constructive scientific criticism
- Learn from diverse approaches and ideas
Assessment Framework & Progress Tracking
Daily Assessments (70% of total score)
- Knowledge Application: Coding challenges and implementations (40%)
- Project Work: Daily mini-projects with real datasets (20%)
- Research Skills: Writing, communication, and analysis (10%)
Final Assessment (30% of total score)
- Capstone Project: Comprehensive internship proposal (20%)
- Presentation: Communication and defense of ideas (10%)
Progress Tracking Methods
- Daily Check-ins: 15-minute individual meetings with instructor
- Code Reviews: Live coding sessions to verify understanding
- Peer Presentations: Explain your solution to fellow participants
- Version Control: All work tracked through Git commits with timestamps
- Randomized Questions: Different datasets/parameters for each participant
Anti-Cheating Measures
- Live Coding Sessions: Random code explanation during daily reviews
- Unique Datasets: Each participant gets different subset/parameters
- Timed Assessments: Completed under supervision
- Oral Defense: Must explain methodology and code decisions
- Progressive Complexity: Later exercises build on earlier work
- Pair Programming: Rotate partners to verify individual skills
Grading Scale
- Outstanding (95-100%): Exceptional preparation, ready for independent research
- Excellent (90-94%): Strong foundation, ready for advanced projects
- Good (80-89%): Solid preparation, ready for guided research
- Satisfactory (70-79%): Adequate foundation, needs continued mentorship
- Needs Improvement (<70%): Additional preparation required
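The weighting and grading scale above can be combined as in the sketch below. It assumes each component is scored as a percentage from 0 to 100 and that band boundaries are applied to the unrounded total; both are assumptions, as are the dictionary keys and the example scores.

```python
# Component weights as listed in the assessment framework (they sum to 100%).
WEIGHTS = {
    "knowledge_application": 0.40,
    "project_work": 0.20,
    "research_skills": 0.10,
    "capstone_project": 0.20,
    "presentation": 0.10,
}

def total_score(component_scores):
    """Weighted total, where each component score is a percentage in [0, 100]."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)

def grade_band(score):
    """Map a total percentage to the grading scale."""
    if score >= 95:
        return "Outstanding"
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Satisfactory"
    return "Needs Improvement"

# Hypothetical participant.
scores = {
    "knowledge_application": 85,
    "project_work": 90,
    "research_skills": 80,
    "capstone_project": 75,
    "presentation": 95,
}
total = total_score(scores)   # 34 + 18 + 8 + 15 + 9.5
band = grade_band(total)
```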
Technical Requirements
Software Stack
- Python 3.8+ with Anaconda distribution
- Core Libraries: NumPy, Pandas, Matplotlib, Seaborn
- Traditional ML: Scikit-learn (StandardScaler, MinMaxScaler, RandomForestClassifier)
- Deep Learning: PyTorch (NN, CNN, LSTM)
- Computer Vision: YOLO (ultralytics package)
- NLP/LLMs: Hugging Face Transformers (zero-shot, few-shot, fine-tuning)
- Data Annotation: Label Studio (minimal usage)
- Development: Jupyter Lab, VS Code, Git
- Documentation: LaTeX (Overleaf), Markdown
- Collaboration: Git, GitHub, Slack
Hardware Access
- Personal Setup: Laptop with minimum 8GB RAM, 256GB storage
- Modern ML Requirements:
- YOLO/Computer Vision: RTX 3080 (8GB VRAM) minimum, 16GB RAM
- Hugging Face LLMs: 16GB RAM minimum, 32GB recommended
- PyTorch Training: CUDA-capable GPU, 50GB storage for models
- Lab Resources: Access to computational servers (Ramanujan, Bhaskar, Sustain)
- Ramanujan: 4x A100 GPUs (80GB each) - ideal for large model training
- Bhaskar: 2x RTX A5000 (24GB each) - perfect for YOLO + Hugging Face
- Accounts: GitHub, Overleaf, Google Colab, Hugging Face Hub
Extended Resources
Primary Textbooks
- Python Data Science Handbook - Jake VanderPlas
- An Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani
- Pattern Recognition and Machine Learning - Christopher Bishop
Online Courses (Optional)
- NPTEL Machine Learning - Balaraman Ravindran
- Andrew Ng’s ML Course - Coursera
- Fast.ai Practical Deep Learning
Lab-Specific Resources
- Sustainability Lab Publications: Latest research papers
- Computational Infrastructure: Server access and usage guidelines
- Lab Culture Guide: Expectations and best practices
Success Outcomes
Technical Competencies
Participants will be able to:
- ✅ Implement complete ML pipelines from data to deployment
- ✅ Apply appropriate algorithms for sustainability problems
- ✅ Conduct rigorous experimental evaluation
- ✅ Communicate findings through papers and presentations
- ✅ Collaborate effectively using modern development tools
Research Readiness
- Independent Work: Execute research projects with minimal supervision
- Critical Thinking: Evaluate and improve existing approaches
- Innovation: Propose novel solutions to sustainability challenges
- Communication: Present work at conferences and in publications
Lab Integration
- Culture Fit: Understand lab values and working style
- Technical Skills: Ready to contribute to ongoing projects
- Collaboration: Effective team member and mentor to new students
- Research Impact: Ability to produce high-quality, publishable research
Key Resources
Infrastructure
- Lab Servers: GPU access (A100, RTX A5000) for computational work
- Google Colab: Cloud computing for exercises