Complete Program

10-Day Sustainability Research Bootcamp

This intensive 10-day program equips interns with essential ML fundamentals, practical skills, and research methods. Each day combines lectures from established courses with hands-on exercises and an assessment on real datasets.

Target Audience: Summer interns, 6-month interns, JRFs, and prospective lab members
Prerequisites: Basic Python programming knowledge
Format: Daily 4-5 hour sessions with lectures, practice, and assessment

Day 1: NumPy Fundamentals

Array Computing for Data Science

Lecture Materials

Video: Introduction to NumPy (PSDV25 Lecture 1)
Notebook: NumPy Introduction
Additional: Broadcasting Concepts
Lab: Google Colab NumPy Practice

Learning Objectives

Master NumPy array operations and indexing
Understand broadcasting and vectorization
Apply NumPy to mathematical computations

Daily Exercises

Exercise 1A: Array Fundamentals (30 minutes)

Create arrays using different methods (zeros, ones, arange, linspace)
Practice array indexing and slicing
Reshape and transpose operations
Array concatenation and splitting

Exercise 1B: Mathematical Operations (45 minutes)

Element-wise operations vs matrix operations
Statistical functions (mean, std, min, max)
Sorting and searching in arrays
Random number generation and seeding
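A short sketch of the Exercise 1B ideas, assuming NumPy's `default_rng` generator API (NumPy 1.17+); the arrays are arbitrary toy values. It contrasts element-wise `*` with the matrix product `@`, touches on statistics, sorting, and searching, and shows that a fixed seed reproduces the same draw.

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # seeded generator: identical numbers on every run
A = rng.integers(1, 10, size=(2, 3))         # 2x3 random integers in [1, 10)
B = rng.integers(1, 10, size=(3, 2))         # 3x2 random integers

elementwise = A * A                          # multiplies matching entries; shape stays (2, 3)
matrix_product = A @ B                       # row-by-column dot products; shape becomes (2, 2)
print(elementwise.shape, matrix_product.shape)

print(A.mean(), A.std(), A.min(), A.max())   # statistics over all entries
print(np.sort(A, axis=1))                    # sort each row independently
print(np.argmax(A, axis=1))                  # column index of each row's maximum
rows, cols = np.where(A > 5)                 # positions of entries greater than 5

# Re-seeding reproduces exactly the same draw
assert np.array_equal(np.random.default_rng(42).integers(1, 10, size=(2, 3)), A)
```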

Exercise 1C: Broadcasting Practice (30 minutes)

Add scalar to array
Operations between arrays of different shapes
Create meshgrids for plotting
Normalize arrays using broadcasting
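The broadcasting patterns in Exercise 1C can be sketched on a toy 3×2 array (values made up for illustration): a scalar broadcasts to every element, a per-column vector of shape `(2,)` stretches across the rows of a `(3, 2)` array, and `np.meshgrid` turns 1D coordinate vectors into 2D grids for plotting.

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])                   # 3 samples x 2 features on very different scales

# Scalar + array: the scalar is broadcast to every element
shifted = X + 10

# (3, 2) op (2,): the per-column vector is repeated across all 3 rows
col_min = X.min(axis=0)
col_max = X.max(axis=0)
minmax = (X - col_min) / (col_max - col_min)   # each column now spans [0, 1]

# Z-score normalization uses the same pattern with column mean and std
zscore = (X - X.mean(axis=0)) / X.std(axis=0)

# Meshgrid: 1D x and y vectors become 2D coordinate grids
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 4))
zz = xs ** 2 + ys ** 2                         # evaluated on the whole (4, 5) grid without loops
print(minmax)
print(zz.shape)
```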

Daily Assessment: Activity Recognition Analysis (60 minutes)

Dataset: UCI Human Activity Recognition

Tasks:

Load and explore accelerometer/gyroscope data
Calculate basic statistics for each activity type
Find patterns in sensor readings for walking vs sitting
Create visualizations from NumPy-computed statistics
Identify which sensors are most informative

Deliverable: Jupyter notebook with clean code and insights

Day 2: Pandas & Matplotlib

Data Manipulation and Visualization

Lecture Materials

Video 1: Introduction to Pandas (PSDV25 Lecture 3)
Video 2: Introduction to Matplotlib (PSDV25 Lecture 4)
Notebook 1: Pandas Fundamentals
Notebook 2: Matplotlib Basics
Lab: Google Colab Pandas Practice

Learning Objectives

Master DataFrame operations and data cleaning
Create professional visualizations
Combine data from multiple sources

Daily Exercises

Exercise 2A: DataFrame Operations (45 minutes)

Load CSV data and inspect structure
Handle missing values (dropna, fillna, interpolate)
Filter and query operations
Group by operations and aggregations
Merge and join different datasets

Exercise 2B: Data Cleaning Pipeline (45 minutes)

Remove duplicates and outliers
Convert data types appropriately
Create derived columns
Handle datetime data
Export cleaned data
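The outlier step can be sketched with Tukey's IQR rule, which flags values more than 1.5 × IQR below the first quartile or above the third. It is shown on a plain NumPy array of made-up temperature readings; the same boolean mask can filter a pandas column. The 1.5 factor is the conventional choice, not something the exercise prescribes.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

temps = np.array([21.0, 22.5, 23.1, 22.8, 21.9, 55.0, 22.2, -30.0])  # two obvious sensor glitches
mask = iqr_outlier_mask(temps)
cleaned = temps[~mask]
print(temps[mask])   # the flagged readings
print(cleaned)
```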

Exercise 2C: Visualization Mastery (60 minutes)

Line plots with multiple series
Scatter plots with color coding
Histograms and distribution plots
Subplots and figure customization
Save high-quality figures

Daily Assessment: Weather Data Analysis (90 minutes)

Dataset: UCI Weather Dataset or built-in Seaborn Flights Dataset

Tasks:

Load and clean weather data (temperature, humidity, pressure)
Handle missing values using simple methods
Create basic time-based features (month, season)
Calculate monthly averages and trends
Create simple visualizations:
- Temperature trends over time
- Correlation between weather variables
- Monthly distribution plots
- Simple seasonal patterns
Write 1-page summary of findings

Deliverable: Clean dataset + basic EDA notebook

Day 3: Python Mastery Assessment

Comprehensive Skills Evaluation

📚 Review Materials

Consolidate learning from Days 1-2
Python Data Science Handbook (Chapters 1-4)

40 Challenge Questions (180 minutes)

NumPy Section (Questions 1-15)

Create a 3D array (5×4×3) filled with random integers from 1-100
Find all elements greater than 50 and replace with their square root
Compute the covariance matrix for a 2D dataset
Implement matrix multiplication without using np.dot()
Create a function to normalize arrays to 0-1 range
Find the indices of the maximum value in each row of a 2D array
Create a moving average function using array slicing
Solve a system of linear equations using NumPy
Create a function to compute pairwise distances between points
Implement k-means centroid update using broadcasting
Create a 2D Gaussian kernel for image filtering
Find connected components in a binary image (0s and 1s)
Implement efficient computation of Euclidean distance matrix
Create a function to remove outliers using z-score
Compute eigenvalues and eigenvectors for PCA implementation

Pandas Section (Questions 16-30)

Load multiple CSV files and combine them efficiently
Create a function to detect and handle different types of missing data
Implement time-based resampling for irregular time series
Create a pivot table with multiple aggregation functions
Implement efficient groupby operations on large datasets
Create a function to standardize column names
Merge datasets with different time zones
Implement outlier detection using IQR method
Create a function to generate summary statistics report
Handle categorical data encoding efficiently
Implement sliding window operations on time series
Create a function to validate data quality
Implement efficient data type optimization
Create custom aggregation functions for groupby
Handle hierarchical/multi-index DataFrames

Matplotlib Section (Questions 31-40)

Create a publication-ready figure with subplots
Implement interactive plots with widgets
Create custom colormaps for scientific data
Design a dashboard-style multi-panel figure
Create animations for time series data
Implement error bars and confidence intervals
Create geographic plots using basemap principles
Design custom plot styles and themes
Create 3D visualizations for scientific data
Implement plot export pipeline for publications

Daily Assessment: Multi-Dataset Analysis (120 minutes)

Dataset: Seaborn Tips + Seaborn Flights

Tasks:

Load both tips and flights datasets
Clean and explore each dataset separately
Create 4 visualizations for tips data (bill vs tip patterns)
Create 4 visualizations for flights data (passenger trends)
Write simple summary comparing patterns in both datasets

Evaluation Criteria:
- Code quality and readability (40%)
- Visualization clarity (30%)
- Data insights (30%)

Day 4: Machine Learning Introduction

Foundations of Supervised Learning

Lecture Materials

Video: ML Course Playlist - Introduction lectures
PDF: Introduction to ML
Notebooks: ML Teaching Collection - Intro sections

Learning Objectives

Understand supervised vs unsupervised learning
Master the ML workflow and evaluation
Implement basic algorithms from scratch

Daily Exercises

Exercise 4A: ML Workflow (60 minutes)

Implement train-validation-test splits
Create cross-validation from scratch
Implement basic performance metrics
Practice bias-variance tradeoff concepts
Create learning curves
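"Cross-validation from scratch" could look roughly like the sketch below. It assumes an i.i.d. (non-time-series) dataset that can be shuffled, and it uses a predict-the-training-mean baseline only to keep the example self-contained; any model's fit/predict step can replace it.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)        # fold sizes differ by at most one
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Toy usage: cross-validate a "predict the training mean" baseline
rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=23)
scores = []
for train_idx, val_idx in k_fold_indices(len(y), k=5):
    prediction = np.full(len(val_idx), y[train_idx].mean())
    scores.append(mse(y[val_idx], prediction))
print(f"CV MSE: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```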

Exercise 4B: Linear Regression Deep Dive (75 minutes)

PDF: Linear Regression Slides
Video: ML Course Playlist - Linear Regression

Implement linear regression using normal equation
Add regularization (Ridge and Lasso)
PDF: Ridge Regression
PDF: Lasso Regression
Compare with scikit-learn implementation
Analyze residuals and model assumptions
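A minimal NumPy sketch of the normal equation with an optional ridge penalty, on synthetic data with known weights (the data and the `alpha` value are assumptions for illustration). It solves `(X^T X + alpha*I) w = X^T y` with `np.linalg.solve` and leaves the intercept unpenalized. Lasso has no closed-form solution and is usually fit iteratively (scikit-learn's `Lasso` uses coordinate descent), so it is not shown here.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_normal_equation(X, y, alpha=0.0):
    """Solve (X^T X + alpha * I) w = X^T y; alpha=0 is OLS, alpha>0 is ridge.

    The intercept (first weight) is left unpenalized.
    """
    Xb = add_bias(X)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)   # solve() is more stable than inv()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_w + rng.normal(scale=0.1, size=100)

w_ols = fit_normal_equation(X, y)
w_ridge = fit_normal_equation(X, y, alpha=10.0)
print("OLS  :", np.round(w_ols, 2))     # should be close to the true [4, 2, -1, 0.5]
print("Ridge:", np.round(w_ridge, 2))   # slopes shrunk toward zero
residuals = y - add_bias(X) @ w_ols
print("residual mean/std:", residuals.mean(), residuals.std())
```

Comparing `w_ols` with scikit-learn's `LinearRegression` coefficients is a quick correctness check for the exercise.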

Exercise 4C: Classification Fundamentals (60 minutes)

PDF: Logistic Regression
Video: ML Course Playlist - Classification

Implement logistic regression from scratch
Compare with notebook: Logistic Regression Implementation
Create confusion matrices and ROC curves
Practice precision, recall, F1-score calculations
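These metrics can be computed directly from confusion-matrix counts. This sketch assumes binary 0/1 labels and made-up predicted probabilities; the ROC points come from sweeping the decision threshold over every observed score.

```python
import numpy as np

def binary_confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = binary_confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_points(y_true, scores):
    """(FPR, TPR) pairs from thresholding at every observed score, highest first."""
    points = [(0.0, 0.0)]
    for t in np.sort(np.unique(scores))[::-1]:
        tp, fp, fn, tn = binary_confusion(y_true, (np.asarray(scores) >= t).astype(int))
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
y_pred = [int(p >= 0.5) for p in y_prob]
print(binary_confusion(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
print(roc_points(y_true, y_prob))
```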

Daily Assessment: Housing Price Prediction (90 minutes)

Dataset: UCI Boston Housing Dataset

Tasks:

Predict house prices using neighborhood features
Compare linear regression, ridge, and lasso performance
Create simple feature combinations (e.g., rooms per capita)
Evaluate each model with predicted-vs-actual scatter plots
Interpret which features matter most for price

Bonus: Implement simple gradient descent from scratch

Day 5: Tree-Based Methods

Non-Linear Models and Ensemble Learning

Lecture Materials

PDF: Decision Trees
PDF: Ensemble Methods
PDF: K-Nearest Neighbors
Video: ML Course Playlist - Tree Methods

Learning Objectives

Master decision tree algorithms
Understand ensemble methods
Apply to complex real-world problems

Daily Exercises

Exercise 5A: Decision Tree Implementation (90 minutes)

Build decision tree from scratch using Gini impurity
Implement tree pruning techniques
Visualize decision boundaries
Compare with scikit-learn implementation
Analyze tree depth vs performance
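The core step in building a tree is choosing a split. The sketch below computes Gini impurity, `1 - sum_k p_k^2`, and scans midpoints between sorted distinct feature values for the threshold with the lowest weighted child impurity; recursion, stopping rules, and pruning are left to the exercise. The toy data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Find the threshold on one feature that minimizes weighted child impurity.

    Returns (threshold, weighted_gini); samples with x <= threshold go left.
    """
    best_t, best_score = None, np.inf
    values = np.unique(x)                       # sorted distinct values
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2                       # midpoint between consecutive values
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(gini(y))             # 0.5 for a balanced two-class node
print(best_split(x, y))    # threshold 6.5 separates the classes perfectly
```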

Exercise 5B: Ensemble Methods with Scikit-learn (75 minutes)

Use StandardScaler and MinMaxScaler for data preprocessing
Implement RandomForestClassifier with different parameters
Compare individual trees vs ensemble performance
Feature importance analysis using .feature_importances_
Cross-validation with cross_val_score

Exercise 5C: K-Nearest Neighbors (45 minutes)

Implement KNN for classification and regression
Experiment with different distance metrics
Analyze curse of dimensionality
Implement efficient neighbor search
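A brute-force KNN sketch that uses broadcasting to build the full distance matrix; memory grows as queries × training points × features, so it suits small datasets. For the efficient-search item, `np.argpartition` or tree structures such as scikit-learn's `KDTree` are the usual next steps. The data and function names are illustrative.

```python
import numpy as np

def pairwise_distances(A, B, metric="euclidean"):
    """Distance matrix of shape (len(A), len(B)) computed with broadcasting."""
    diff = A[:, None, :] - B[None, :, :]           # shape (n_a, n_b, n_features)
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    if metric == "manhattan":
        return np.sum(np.abs(diff), axis=-1)
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(X_train, y_train, X_query, k=3, task="classification", metric="euclidean"):
    dists = pairwise_distances(X_query, X_train, metric)
    nearest = np.argsort(dists, axis=1)[:, :k]     # indices of the k closest training points
    neighbor_labels = y_train[nearest]             # shape (n_query, k)
    if task == "regression":
        return neighbor_labels.mean(axis=1)
    # Classification: majority vote per query (ties go to the smallest label; labels must be ints >= 0)
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.3], [5.5, 5.5]])
print(knn_predict(X_train, y_train, X_query, k=3))
print(knn_predict(X_train, y_train.astype(float), X_query, k=3, task="regression"))
```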

Daily Assessment: Iris Species Classification (120 minutes)

Dataset: Seaborn Iris Dataset (built-in: seaborn.load_dataset("iris") fetches and caches it, so no manual download is needed)

Tasks:

Build classifier to predict iris species (setosa, versicolor, virginica)
Compare decision tree, random forest, and KNN performance
Use tree-based feature importance to find key measurements
Create simple model interpretation plots
Visualize decision boundaries for 2D projections
Create simple prediction function for new flowers

Evaluation: Model performance + interpretability + code clarity

Day 6: Neural Networks & PyTorch

Introduction to Deep Learning

Lecture Materials

PDF: Multilayer Perceptron
PDF: Gradient Descent
PDF: Stochastic Gradient Descent
Notebook: PyTorch Logistic Regression

Learning Objectives

Understand neural network fundamentals
Master PyTorch for deep learning
Implement backpropagation from scratch

Daily Exercises

Exercise 6A: Neural Network from Scratch (90 minutes)

Implement forward pass for multi-layer perceptron
Code backpropagation algorithm step by step
Add different activation functions (sigmoid, ReLU, tanh)
Implement mini-batch gradient descent
Compare with analytical solutions
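A compact NumPy sketch, assuming a single tanh hidden layer, a sigmoid output, and binary cross-entropy loss on the XOR problem. The comparison step is shown here as a gradient check of backprop against central finite differences. Swapping tanh for ReLU or sigmoid only requires changing the activation and its derivative in `forward` and `loss_and_grads`. Hyperparameters are illustrative, not tuned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Random normal weights and zero biases for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(size=(n_hidden, 1)), "b2": np.zeros(1)}

def forward(params, X):
    """Forward pass: inputs -> tanh hidden layer -> sigmoid output probability."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    p = sigmoid(h @ params["W2"] + params["b2"])
    return h, p

def loss_and_grads(params, X, y):
    """Mean binary cross-entropy and its gradients, computed by backpropagation."""
    n = X.shape[0]
    h, p = forward(params, X)
    eps = 1e-12                                  # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dz2 = (p - y) / n                            # dLoss/d(output pre-activation) for sigmoid + BCE
    dh = dz2 @ params["W2"].T                    # propagate back through the output weights
    dz1 = dh * (1 - h ** 2)                      # tanh'(a) = 1 - tanh(a)^2
    grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0),
             "W1": X.T @ dz1, "b1": dz1.sum(axis=0)}
    return loss, grads

def gradient_check(params, X, y, name, step=1e-5):
    """Largest gap between a backprop gradient and its central finite-difference estimate."""
    _, grads = loss_and_grads(params, X, y)
    numeric = np.zeros_like(params[name])
    for i in np.ndindex(params[name].shape):
        original = params[name][i]
        params[name][i] = original + step
        loss_plus, _ = loss_and_grads(params, X, y)
        params[name][i] = original - step
        loss_minus, _ = loss_and_grads(params, X, y)
        params[name][i] = original               # restore the weight
        numeric[i] = (loss_plus - loss_minus) / (2 * step)
    return np.max(np.abs(numeric - grads[name]))

def train(X, y, n_hidden=8, lr=0.5, epochs=2000, batch_size=2, seed=0):
    """Mini-batch gradient descent with a fresh shuffle every epoch."""
    params = init_params(X.shape[1], n_hidden, seed)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            _, grads = loss_and_grads(params, X[idx], y[idx])
            for name in params:
                params[name] -= lr * grads[name]
    return params

# XOR is not linearly separable, so the hidden layer is needed
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
print("max |grad error| for W1:", gradient_check(init_params(2, 8), X, y, "W1"))
params = train(X, y)
_, probs = forward(params, X)
print("predicted probabilities:", np.round(probs.ravel(), 3))
```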

Exercise 6B: PyTorch Framework Deep Dive (75 minutes)

Build neural networks using torch.nn.Module
Implement CNNs using torch.nn.Conv2d and torch.nn.MaxPool2d
Create data loaders with torch.utils.data.DataLoader
Add regularization (dropout, weight decay)
Implement learning rate scheduling with torch.optim.lr_scheduler

Exercise 6C: Advanced PyTorch Architectures (60 minutes)

PDF: Convolutional Neural Networks

Build CNN using torch.nn.Conv2d for image classification
Implement LSTM using torch.nn.LSTM for time series
Compose layers into models with torch.nn.Sequential
Compare CNN vs LSTM performance on time series data
PDF: 1D CNN

Daily Assessment: Simple Time Series Prediction (120 minutes)

Dataset: Seaborn Flights Dataset (passenger numbers over time)

Tasks:

Build simple neural network to predict passenger numbers
Compare basic MLP vs simpler approaches
Create train/validation/test splits for time series
Plot predictions vs actual values
Calculate and interpret prediction errors
Experiment with different numbers of hidden layers
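Time series need a chronological split rather than a random one. The sketch below uses a synthetic 144-month series (trend plus yearly seasonality) as a stand-in for the Flights passenger counts; with seaborn installed, `sns.load_dataset("flights")["passengers"].to_numpy()` would supply the real series. It builds 12-month input windows and reports a seasonal-naive baseline (predict the value from 12 months earlier) that an MLP should beat. The split fractions are assumptions.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1D series into (X, y): each row of X holds `window` past values, y the next one."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def chronological_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split in time order (no shuffling) so the model never trains on the future."""
    n = len(X)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return ((X[:n_train], y[:n_train]),
            (X[n_train:n_train + n_val], y[n_train:n_train + n_val]),
            (X[n_train + n_val:], y[n_train + n_val:]))

# Synthetic stand-in for 144 months of passenger counts: trend + yearly seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(144)
series = 100 + 2.5 * t + 30 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=5, size=144)

X, y = make_windows(series, window=12)
(train_X, train_y), (val_X, val_y), (test_X, test_y) = chronological_split(X, y)

# Baseline to beat: the value from 12 months earlier (first column of each window)
seasonal_naive = test_X[:, 0]
mae = np.mean(np.abs(test_y - seasonal_naive))
print(X.shape, train_X.shape, val_X.shape, test_X.shape)
print(f"Seasonal-naive test MAE: {mae:.2f}")
```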

Bonus: Try predicting multiple steps ahead

Day 7: Development Workflow: Git, GitHub & Remote Servers

From Local Code to Collaborative & Remote Execution

📚 Learning Materials

Git & GitHub:
- Tutorial: GitHub Skills Course
- Video: Git Tutorial by Kevin Stratvert
Remote Servers & SSH:
- In-Depth Guide: A Beginner’s Guide to Remote Servers for ML (A great resource to reference)
- Tool: tmux Cheat Sheet
- VS Code: Remote SSH Extension Guide

Learning Objectives

Master Git version control for tracking changes.
Understand collaborative development on GitHub using forks and pull requests.
Apply version control to ML project workflows with proper structure.
Connect securely to remote servers using SSH and key-based authentication.
Monitor and manage server resources like GPUs and disk space.
Run persistent, long-running experiments using tmux.

Daily Exercises

Exercise 7A: Git Fundamentals (60 minutes)

Initialize repository and make commits
Practice branching and merging strategies
Handle merge conflicts effectively
Use git log, diff, and status commands
Create and apply patches

Exercise 7B: GitHub Collaboration (75 minutes)

Fork repository and create pull requests
Practice code review process
Use GitHub Issues for project management
Create project documentation with README
Set up GitHub Pages for project showcase

Exercise 7C: ML Project Structure (60 minutes)

Organize ML projects with proper structure
Use .gitignore for ML artifacts
Version control datasets and models
Create reproducible environments
Document experiments and results

Exercise 7D: Remote Server Connection (45 minutes)

First Connection: Connect to a remote server using the ssh command with the username, IP address, and port provided by your instructor.
SSH Key Authentication: Secure your connection and enable passwordless login.
- Generate an SSH key pair on your local machine: ssh-keygen -t rsa -b 4096
- Copy your public key to the remote server using ssh-copy-id.
```
# Adjust the user, host, and port as needed
ssh-copy-id -p 2222 user@remote.server.com
```
SSH Shortcut: Create a host alias in your local ~/.ssh/config file for quick access to the server.

Exercise 7E: Server Management & Persistent Sessions (45 minutes)

Server Monitoring: Log in and practice these essential monitoring commands.
- Check GPU usage: watch nvidia-smi
- Check CPU and memory usage: htop
- Check your disk usage: du -sh ~
Persistent Sessions with tmux: Run experiments that survive disconnections.
- Start a new named tmux session: tmux new -s my_experiment
- Inside the session, run a long-running command (e.g., top).
- Detach from the session: Press Ctrl+b, then d.
- Log out, log back in, and re-attach to your session to see it is still running: tmux attach -t my_experiment

Daily Assessment: Collaborative Remote ML Project (150 minutes)

Tasks:

Work in teams of 2-3 people.
Choose one simple dataset (tips, iris, housing) and create a shared GitHub repository.
Each member implements a different ML algorithm on a separate branch, then merges via a pull request.
Remote Execution: One team member logs into the remote lab server, clones the final repository, and runs the main training script inside a tmux session.
Documentation: Create a basic README.md file that summarizes the results and includes the command to re-attach to the tmux session, proving the experiment is running persistently.
Present the final comparison of approaches and the running remote session.

Evaluation: Git usage + collaboration quality + successful remote execution + final project.

Day 8: Scientific Writing & LaTeX

Academic Communication and Documentation

📚 Learning Materials

Resource: LaTeX Tutorial on Overleaf
Guide: Lab Handbook - Technical Writing section
Examples: Sustainability Lab publications for reference

Learning Objectives

Master LaTeX for academic writing
Understand scientific paper structure
Create publication-ready documents

Daily Exercises

Exercise 8A: LaTeX Fundamentals (75 minutes)

Set up Overleaf account and create first document
Learn document structure and basic formatting
Create mathematical equations and formulas
Insert figures, tables, and references
Use bibliography management with BibTeX

Exercise 8B: Scientific Figure Creation (60 minutes)

Export high-quality figures from matplotlib
Create publication-ready plots with proper captions
Design tables for experimental results
Learn IEEE/ACM formatting standards
Practice figure referencing in text

Exercise 8C: Paper Structure (60 minutes)

Analyze structure of top-tier ML papers
Write effective abstracts and introductions
Create methodology sections with equations
Present experimental results clearly
Write conclusions and future work

Daily Assessment: Research Paper Draft (120 minutes)

Task: Write a 4-page research paper on your Day 6 neural network project.

Required Sections:
1. Abstract (150 words)
2. Introduction with literature review
3. Methodology with mathematical formulation
4. Experimental results with tables and figures
5. Conclusion and future work
6. Properly formatted references

Evaluation: Writing clarity + technical accuracy + LaTeX formatting

Day 9: Research Methods Bootcamp

Scientific Thinking and Research Skills

📚 Learning Materials

Primary Resource: CS Research Methods Bootcamp
Video: Bootcamp session recordings
Lab Resource: Academic & Research Skills from handbook

Learning Objectives

Develop scientific thinking skills
Master research communication
Learn systematic problem-solving

Daily Exercises

Exercise 9A: Email Communication (45 minutes)

Based on Bootcamp Session 1

Write professional email to potential research advisor
Compose progress report email to supervisor
Request help from external researcher professionally
Practice concise and clear communication
Learn email etiquette for academia

Exercise 9B: Abstract Analysis (60 minutes)

Based on Bootcamp Session 2

Analyze 5 abstracts from top ML conferences
Identify key components of effective abstracts
Rewrite weak abstracts to improve clarity
Practice identifying scientific flaws in papers
Create abstract for your own research idea

Exercise 9C: Scientific Method Application (75 minutes)

Based on Bootcamp Session 3

Formulate testable hypotheses for ML problems
Design controlled experiments
Identify potential confounding variables
Practice systematic observation and data collection
Apply statistical hypothesis testing
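For the hypothesis-testing item, a permutation test is one assumption-light option: it asks how often randomly reassigning run scores between two models produces a mean gap at least as large as the observed one. The accuracies below are hypothetical numbers from repeated training runs, not real results.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean score between two groups.

    H0: group labels are exchangeable (both sets of runs come from the same distribution).
    Returns (observed_difference, p_value).
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(scores_a, float), np.asarray(scores_b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed) - 1e-12:   # small tolerance for floating-point ties
            count += 1
    p_value = (count + 1) / (n_permutations + 1)  # add-one correction avoids p = 0
    return observed, p_value

# Hypothetical test accuracies from 8 random seeds per model
model_a = [0.91, 0.90, 0.92, 0.93, 0.91, 0.92, 0.90, 0.92]
model_b = [0.88, 0.89, 0.87, 0.90, 0.88, 0.89, 0.88, 0.87]
diff, p = permutation_test(model_a, model_b)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```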

Exercise 9D: Debugging Mastery (45 minutes)

Based on Bootcamp Session 4

Analyze effective StackOverflow questions
Practice systematic debugging approaches
Create minimal reproducible examples
Learn to ask precise technical questions
Develop problem isolation skills

Daily Assessment: Research Proposal (120 minutes)

Tasks:

Problem statement with motivation
Literature review (10+ papers)
Research methodology and experimental design
Timeline and resource requirements
Expected contributions and impact
Potential challenges and mitigation strategies

Presentation: 15-minute presentation + 10-minute Q&A

Evaluation: Scientific rigor + presentation quality + feasibility

Day 10: Integration & Lab Projects

Connecting Skills to Research Impact

📚 Learning Materials

Lab Website: Sustainability Lab Projects
Research Papers: Recent lab publications
Resources: Lab handbook and current project overviews

Learning Objectives

Connect ML skills to sustainability applications
Understand lab research domains
Design internship project proposal

Daily Exercises

Exercise 10A: Lab Project Deep Dive (90 minutes)

JoulesEye Analysis: Understand energy expenditure estimation using thermal imagery
SpiroMask Study: Explore respiratory monitoring through smart face masks
VayuBuddy Exploration: Analyze AI-powered air quality chatbot
Space to Policy: Investigate satellite imagery for environmental compliance
Choose one project and analyze technical approach in detail

Exercise 10B: Research Gap Identification (60 minutes)

Read 3 recent papers from chosen lab research area
Identify current limitations and challenges
Propose novel extensions using learned ML techniques
Consider practical implementation challenges
Assess potential societal impact

Exercise 10C: Modern ML Tools Hands-On Integration (150 minutes)

Part A: Hugging Face Zero-Shot Classification (45 minutes)

Dataset: 20 Newsgroups Dataset (sample 100 articles)
Task: Use pipeline('zero-shot-classification') from Hugging Face
Labels: ['technology', 'sports', 'politics', 'science', 'entertainment']

Code:

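```python
from transformers import pipeline

candidate_labels = ["technology", "sports", "politics", "science", "entertainment"]
classifier = pipeline("zero-shot-classification")  # downloads a default NLI model on first use
text = "NASA's new telescope captured images of a distant galaxy."  # replace with one article
results = classifier(text, candidate_labels)  # dict: 'labels' sorted by 'scores', highest first
```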

Report: Compare zero-shot vs traditional Naive Bayes accuracy on the same data
Deliverable: Confusion matrix + accuracy comparison table
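For the report, both approaches can be scored with the same helpers once their predictions are collected. This sketch assumes the 20 Newsgroups categories have already been mapped onto the five candidate labels (the original group names, such as `rec.sport.hockey`, do not match them directly) and uses hypothetical predictions; Naive Bayes predictions could come from, e.g., scikit-learn's `MultinomialNB`. The zero-shot pipeline returns labels sorted by score, so the first label is taken as the prediction.

```python
import numpy as np

LABELS = ["technology", "sports", "politics", "science", "entertainment"]

def confusion_matrix(y_true, y_pred, labels):
    """Rows = true label, columns = predicted label, in the order given by `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def accuracy(y_true, y_pred):
    return float(np.mean([t == p for t, p in zip(y_true, y_pred)]))

def zero_shot_top_label(result):
    """Pipeline output lists labels from highest to lowest score; take the first."""
    return result["labels"][0]

# Hypothetical predictions for 6 articles (replace with your real outputs)
y_true = ["sports", "politics", "science", "technology", "sports", "entertainment"]
zero_shot = ["sports", "politics", "technology", "technology", "sports", "sports"]
naive_bayes = ["sports", "science", "science", "technology", "politics", "sports"]

for name, preds in [("Zero-shot", zero_shot), ("Naive Bayes", naive_bayes)]:
    print(f"{name:<12} accuracy = {accuracy(y_true, preds):.2f}")
print(confusion_matrix(y_true, zero_shot, LABELS))
```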

Part B: YOLO Object Detection (45 minutes)

Dataset: Download 20 sample images from COCO dataset (people, cars, animals)
Task: Run YOLOv8 object detection using ultralytics package

Code:

```python
from ultralytics import YOLO
model = YOLO('yolov8n.pt')  # nano model
results = model.predict(source='images/', save=True)
```

Analysis: Count detected objects, measure confidence scores
Report: Detection accuracy on 20 test images with bounding boxes
Deliverable: Annotated images + detection statistics CSV
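The detection statistics CSV could be assembled from per-detection records as sketched below. The `(image, class_name, confidence)` tuples are hypothetical; in ultralytics these values are typically read from each result's `boxes.cls`, `boxes.conf`, and `names` attributes (check the documentation for your version).

```python
import csv
import io
from collections import defaultdict

def detection_stats(detections):
    """Aggregate (image, class_name, confidence) tuples into per-class count and confidence stats."""
    per_class = defaultdict(list)
    for _image, class_name, confidence in detections:
        per_class[class_name].append(confidence)
    return {
        name: {"count": len(confs), "mean_conf": sum(confs) / len(confs), "min_conf": min(confs)}
        for name, confs in sorted(per_class.items())
    }

def stats_to_csv(stats, handle):
    writer = csv.writer(handle)
    writer.writerow(["class", "count", "mean_conf", "min_conf"])
    for name, s in stats.items():
        writer.writerow([name, s["count"], f"{s['mean_conf']:.3f}", f"{s['min_conf']:.3f}"])

# Hypothetical detections; in practice, fill this list from the YOLO results
detections = [
    ("img01.jpg", "person", 0.91), ("img01.jpg", "car", 0.78),
    ("img02.jpg", "person", 0.85), ("img03.jpg", "dog", 0.66),
    ("img03.jpg", "person", 0.42),
]
stats = detection_stats(detections)
buffer = io.StringIO()          # swap for open("detection_stats.csv", "w", newline="")
stats_to_csv(stats, buffer)
print(buffer.getvalue())
```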

Part C: Label Studio Annotation Workflow (30 minutes)

Dataset: 50 images from your YOLO exercise (subset with missing labels)
Setup: Create Label Studio project for object detection
Tasks:
- Install: pip install label-studio
- Launch the web app: label-studio start (then create the project in the browser UI)
- Import 50 images needing annotation
- Create labeling interface for bounding boxes
- Label 10 images manually (person, car, bike)
Export: Download annotations in YOLO format
Deliverable: 10 manually labeled images + annotation files
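YOLO-format label files store one object per line as `class_id x_center y_center width height`, with coordinates normalized by the image width and height. The sketch below converts a pixel bounding box to that format and back, which is useful for spot-checking exported annotations; the class list order and the box values are assumptions.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line.

    YOLO format: class_id x_center y_center width height, all normalized to [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

def from_yolo_line(line, img_w, img_h):
    """Inverse conversion, handy for checking exported labels against the image."""
    class_id, x_c, y_c, w, h = line.split()
    x_c, y_c = float(x_c) * img_w, float(y_c) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)

CLASSES = ["person", "car", "bike"]     # assumed order; class ids are list positions
line = to_yolo_line(CLASSES.index("car"), (100, 50, 300, 250), img_w=640, img_h=480)
print(line)   # 1 0.312500 0.312500 0.312500 0.416667
```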

Part D: Traditional ML + Modern Tools Pipeline (30 minutes)

Integration Task: Combine all three approaches
Workflow:
- Use YOLO to detect objects in images
- Extract detected object crops
- Use Hugging Face to classify object types with zero-shot
- Compare with manual Label Studio annotations
Resource Planning:
- GPU requirements: RTX 3080 (8GB) minimum for YOLO
- RAM: 16GB for Hugging Face models
- Storage: 50GB for models + datasets
- Processing time: 2-3 seconds per image
Deliverable: End-to-end pipeline code + performance benchmarks

Final Deliverables:

Jupyter notebook with all 4 parts integrated
Performance comparison: Traditional ML vs Zero-shot vs YOLO
Resource utilization report (GPU/CPU/Memory usage)
Deployment cost estimation for 1000 images/day
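The deployment cost estimate reduces to simple arithmetic: GPU-hours per day = images/day × seconds/image × overhead ÷ 3600. This sketch uses the 2-3 seconds per image from the resource plan, plus a hypothetical $1.00/GPU-hour rate and 20% overhead; substitute measured timings and your provider's real price.

```python
def daily_compute_cost(images_per_day, seconds_per_image, gpu_hourly_rate, overhead_factor=1.0):
    """Estimate GPU hours and cost per day for a batch-processing pipeline.

    overhead_factor > 1 pads for model loading, retries, and idle time.
    """
    gpu_hours = images_per_day * seconds_per_image * overhead_factor / 3600
    return gpu_hours, gpu_hours * gpu_hourly_rate

RATE = 1.00   # hypothetical $/GPU-hour; replace with your provider's actual price
for sec in (2.0, 3.0):   # per-image processing time range from the resource plan
    hours, cost = daily_compute_cost(1000, sec, RATE, overhead_factor=1.2)
    print(f"{sec:.0f} s/image: {hours:.2f} GPU-h/day, ${cost:.2f}/day, ${30 * cost:.2f}/month")
```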

Final Assessment: Internship Project Proposal (180 minutes)

Phase 1: Proposal Development (120 minutes)

Create comprehensive project proposal including:

Problem Statement (500 words)
- Sustainability challenge being addressed
- Current approaches and limitations
- Proposed ML solution overview
Technical Approach (800 words)
- Detailed methodology with equations
- Dataset requirements and collection plan
- ML algorithms: specify traditional ML + modern tools integration
- Include at least ONE of: Hugging Face, YOLO, or PyTorch LSTM
- Evaluation metrics and baselines
- Computational resource specifications (GPU, RAM, storage)
Implementation Plan (400 words)
- 6-month timeline with deliverables
- Computational resource requirements
- Risk assessment and mitigation strategies
Expected Impact (300 words)
- Scientific contributions
- Practical applications
- Societal benefits

Phase 2: Final Presentation (30 minutes)

Presentation: 20 minutes + 10 minutes Q&A
Audience: Lab members, instructors, and peers
Evaluation Criteria:
- Technical soundness (30%)
- Innovation and creativity (25%)
- Feasibility and planning (25%)
- Presentation quality (20%)

Phase 3: Peer Review (30 minutes)

Review and provide feedback on 2 other proposals
Practice constructive scientific criticism
Learn from diverse approaches and ideas

Assessment Framework & Progress Tracking

Daily Assessments (70% of total score)

Knowledge Application: Coding challenges and implementations (40%)
Project Work: Daily mini-projects with real datasets (20%)
Research Skills: Writing, communication, and analysis (10%)

Final Assessment (30% of total score)

Capstone Project: Comprehensive internship proposal (20%)
Presentation: Communication and defense of ideas (10%)

Progress Tracking Methods

Daily Check-ins: 15-minute individual meetings with instructor
Code Reviews: Live coding sessions to verify understanding
Peer Presentations: Explain your solution to fellow participants
Version Control: All work tracked through Git commits with timestamps
Randomized Questions: Different datasets/parameters for each participant

LLM and AI Tool Policy

Learning Phase (Days 1-8): NO LLMs allowed - build fundamental understanding
Reference Only: Use official documentation (NumPy docs, Pandas docs, etc.)
Stack Overflow: Allowed for specific error debugging (with citation)
Books/Tutorials: Encouraged for concept learning
Why This Policy: Develop problem-solving skills and deep understanding
Final Projects (Days 9-10): LLMs allowed as coding assistant (must be declared)

Anti-Cheating Measures

Live Coding Sessions: Random code explanation during daily reviews
Unique Datasets: Each participant gets different subset/parameters
Timed Assessments: Completed under supervision
Oral Defense: Must explain methodology and code decisions
Progressive Complexity: Later exercises build on earlier work
Pair Programming: Rotate partners to verify individual skills

Self-Assessment Tools

Daily Reflection: Rate your understanding (1-5) of each concept
Skill Checklist: Track completion of specific competencies
Practice Problems: Additional exercises for self-paced learning
Peer Feedback: Anonymous feedback on collaboration and communication

Grading Scale

Outstanding (95-100%): Exceptional preparation, ready for independent research
Excellent (90-94%): Strong foundation, ready for advanced projects
Good (80-89%): Solid preparation, ready for guided research
Satisfactory (70-79%): Adequate foundation, needs continued mentorship
Needs Improvement (<70%): Additional preparation required
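Combining the weights from the assessment framework gives the total score as a weighted sum. The sketch assumes each component is scored on a 0-100 scale and treats each band's lower bound as a cutoff, so a fractional score such as 94.5 falls into Excellent; the component scores are hypothetical.

```python
WEIGHTS = {                       # percent of the total score, per the assessment framework
    "knowledge_application": 40,
    "project_work": 20,
    "research_skills": 10,
    "capstone_project": 20,
    "presentation": 10,
}
BANDS = [(95, "Outstanding"), (90, "Excellent"), (80, "Good"), (70, "Satisfactory")]

def total_score(component_scores):
    """Weighted total from component scores on an assumed 0-100 scale."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS) / 100

def grade_band(score):
    for cutoff, band in BANDS:
        if score >= cutoff:
            return band
    return "Needs Improvement"

scores = {"knowledge_application": 92, "project_work": 85, "research_skills": 80,
          "capstone_project": 88, "presentation": 90}
total = total_score(scores)
print(total, grade_band(total))
```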

Technical Requirements

Software Stack

Python 3.8+ with Anaconda distribution
Core Libraries: NumPy, Pandas, Matplotlib, Seaborn
Traditional ML: Scikit-learn (StandardScaler, MinMaxScaler, RandomForestClassifier)
Deep Learning: PyTorch (NN, CNN, LSTM)
Computer Vision: YOLO (ultralytics package)
NLP/LLMs: Hugging Face Transformers (zero-shot, few-shot, fine-tuning)
Data Annotation: Label Studio (minimal usage)
Development: Jupyter Lab, VS Code, Git
Documentation: LaTeX (Overleaf), Markdown
Collaboration: Git, GitHub, Slack

Hardware Access

Personal Setup: Laptop with minimum 8GB RAM, 256GB storage
Modern ML Requirements:
- YOLO/Computer Vision: RTX 3080 (8GB VRAM) minimum, 16GB RAM
- Hugging Face LLMs: 16GB RAM minimum, 32GB recommended
- PyTorch Training: CUDA-capable GPU, 50GB storage for models
Lab Resources: Access to computational servers (Ramanujan, Bhaskar, Sustain)
- Ramanujan: 4x A100 GPUs (80GB each) - ideal for large model training
- Bhaskar: 2x RTX A5000 (24GB each) - perfect for YOLO + Hugging Face
Accounts: GitHub, Overleaf, Google Colab, Hugging Face Hub

Simple Public Datasets Used Throughout Program

UCI Human Activity Recognition: Sensor data analysis
UCI Weather Dataset: Basic time series
Seaborn Built-in Datasets (tips, flights, iris): Simple exploration
Seaborn Iris Dataset: Classic classification (iris species)
UCI Boston Housing: Regression fundamentals
Simple Text Data: Basic text processing

Extended Resources

Primary Textbooks

Python Data Science Handbook - Jake VanderPlas
An Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani
Pattern Recognition and Machine Learning - Christopher Bishop

Online Courses (Optional)

NPTEL Machine Learning - Balaram Ravindran
Andrew Ng’s ML Course - Coursera
Fast.ai Practical Deep Learning

Lab-Specific Resources

Sustainability Lab Publications: Latest research papers
Computational Infrastructure: Server access and usage guidelines
Lab Culture Guide: Expectations and best practices

Success Outcomes

Technical Competencies

Participants will be able to:

✅ Implement complete ML pipelines from data to deployment
✅ Apply appropriate algorithms for sustainability problems
✅ Conduct rigorous experimental evaluation
✅ Communicate findings through papers and presentations
✅ Collaborate effectively using modern development tools

Research Readiness

Independent Work: Execute research projects with minimal supervision
Critical Thinking: Evaluate and improve existing approaches
Innovation: Propose novel solutions to sustainability challenges
Communication: Present work at conferences and in publications

Lab Integration

Culture Fit: Understand lab values and working style
Technical Skills: Ready to contribute to ongoing projects
Collaboration: Effective team member and mentor to new students
Research Impact: Ability to produce high-quality, publishable research

Key Resources

Course Materials

ES 114 PSDV: Python fundamentals with YouTube lectures
ES 335 ML: Machine learning with PDF slides
CS Research Bootcamp: Research methodology

ML/DL Frameworks & Tools

Scikit-learn: StandardScaler, MinMaxScaler, RandomForestClassifier
PyTorch: Neural Networks (NN), Convolutional Networks (CNN), LSTM
Computer Vision: YOLO (object detection)
Hugging Face: LLM integration (zero-shot, few-shot, basic fine-tuning)
Label Studio: Data annotation (minimal usage)

Essential Development Tools

Python 3.8+, NumPy, Pandas, Matplotlib
Git/GitHub, Jupyter Lab, VS Code, LaTeX/Overleaf

Infrastructure

Lab Servers: GPU access (A100, RTX A5000) for computational work
Google Colab: Cloud computing for exercises

Contact & Application

Sustainability Lab, IIT Gandhinagar
Email: nipun.batra@iitgn.ac.in
Website: https://sustainability-lab.github.io/

Application: Email with CV, research interests, and 1-2 lab papers you’ve read.

This program combines rigorous technical training with practical application to sustainability challenges. The integration of established course materials, hands-on coding, and research methodology creates a comprehensive learning experience that prepares participants for impactful research careers.