Complete Program

10-Day Sustainability Research Bootcamp

This intensive 10-day program prepares interns with essential ML fundamentals, practical skills, and research methods. Each day combines lectures from established courses with hands-on exercises and assessments using real datasets.

Target Audience: Summer interns, 6-month interns, JRFs, and prospective lab members
Prerequisites: Basic Python programming knowledge
Format: Daily 4-5 hour sessions with lectures, practice, and assessment


Day 1: NumPy Fundamentals

Array Computing for Data Science

Lecture Materials

Learning Objectives

  • Master NumPy array operations and indexing
  • Understand broadcasting and vectorization
  • Apply NumPy to mathematical computations

Daily Exercises

Exercise 1A: Array Fundamentals (30 minutes)

  1. Create arrays using different methods (zeros, ones, arange, linspace)
  2. Practice array indexing and slicing
  3. Reshape and transpose operations
  4. Array concatenation and splitting

Exercise 1B: Mathematical Operations (45 minutes)

  1. Element-wise operations vs matrix operations
  2. Statistical functions (mean, std, min, max)
  3. Sorting and searching in arrays
  4. Random number generation and seeding

Exercise 1C: Broadcasting Practice (30 minutes)

  1. Add scalar to array
  2. Operations between arrays of different shapes
  3. Create meshgrids for plotting
  4. Normalize arrays using broadcasting

Daily Assessment: Activity Recognition Analysis (60 minutes)

Dataset: UCI Human Activity Recognition Tasks:

  1. Load and explore accelerometer/gyroscope data
  2. Calculate basic statistics for each activity type
  3. Find patterns in sensor readings for walking vs sitting
  4. Create visualizations using NumPy operations
  5. Identify which sensors are most informative

Deliverable: Jupyter notebook with clean code and insights


Day 2: Pandas & Matplotlib

Data Manipulation and Visualization

Lecture Materials

Learning Objectives

  • Master DataFrame operations and data cleaning
  • Create professional visualizations
  • Combine data from multiple sources

Daily Exercises

Exercise 2A: DataFrame Operations (45 minutes)

  1. Load CSV data and inspect structure
  2. Handle missing values (dropna, fillna, interpolate)
  3. Filter and query operations
  4. Group by operations and aggregations
  5. Merge and join different datasets

Exercise 2B: Data Cleaning Pipeline (45 minutes)

  1. Remove duplicates and outliers
  2. Convert data types appropriately
  3. Create derived columns
  4. Handle datetime data
  5. Export cleaned data

Exercise 2C: Visualization Mastery (60 minutes)

  1. Line plots with multiple series
  2. Scatter plots with color coding
  3. Histograms and distribution plots
  4. Subplots and figure customization
  5. Save high-quality figures

Daily Assessment: Weather Data Analysis (90 minutes)

Dataset: UCI Weather Dataset or built-in Seaborn Flights Dataset

Tasks:

  1. Load and clean weather data (temperature, humidity, pressure)
  2. Handle missing values using simple methods
  3. Create basic time-based features (month, season)
  4. Calculate monthly averages and trends
  5. Create simple visualizations:
    • Temperature trends over time
    • Correlation between weather variables
    • Monthly distribution plots
    • Simple seasonal patterns
  6. Write 1-page summary of findings

Deliverable: Clean dataset + basic EDA notebook


Day 3: Python Mastery Assessment

Comprehensive Skills Evaluation

📚 Review Materials

  • Consolidate learning from Days 1-2
  • Python Data Science Handbook (Chapters 1-4)

40 Challenge Questions (180 minutes)

NumPy Section (Questions 1-15)

  1. Create a 3D array (5×4×3) filled with random integers from 1-100
  2. Find all elements greater than 50 and replace with their square root
  3. Compute the covariance matrix for a 2D dataset
  4. Implement matrix multiplication without using np.dot()
  5. Create a function to normalize arrays to 0-1 range
  6. Find the indices of the maximum value in each row of a 2D array
  7. Create a moving average function using array slicing
  8. Solve a system of linear equations using NumPy
  9. Create a function to compute pairwise distances between points
  10. Implement k-means centroid update using broadcasting
  11. Create a 2D Gaussian kernel for image filtering
  12. Find connected components in a binary image (0s and 1s)
  13. Implement efficient computation of Euclidean distance matrix
  14. Create a function to remove outliers using z-score
  15. Compute eigenvalues and eigenvectors for PCA implementation

Pandas Section (Questions 16-30)

  1. Load multiple CSV files and combine them efficiently
  2. Create a function to detect and handle different types of missing data
  3. Implement time-based resampling for irregular time series
  4. Create a pivot table with multiple aggregation functions
  5. Implement efficient groupby operations on large datasets
  6. Create a function to standardize column names
  7. Merge datasets with different time zones
  8. Implement outlier detection using IQR method
  9. Create a function to generate summary statistics report
  10. Handle categorical data encoding efficiently
  11. Implement sliding window operations on time series
  12. Create a function to validate data quality
  13. Implement efficient data type optimization
  14. Create custom aggregation functions for groupby
  15. Handle hierarchical/multi-index DataFrames

Matplotlib Section (Questions 31-40)

  1. Create a publication-ready figure with subplots
  2. Implement interactive plots with widgets
  3. Create custom colormaps for scientific data
  4. Design a dashboard-style multi-panel figure
  5. Create animations for time series data
  6. Implement error bars and confidence intervals
  7. Create geographic plots using basemap principles
  8. Design custom plot styles and themes
  9. Create 3D visualizations for scientific data
  10. Implement plot export pipeline for publications

Daily Assessment: Multi-Dataset Analysis (120 minutes)

Dataset: Seaborn Tips + Seaborn Flights

Tasks:

  1. Load both tips and flights datasets
  2. Clean and explore each dataset separately
  3. Create 4 visualizations for tips data (bill vs tip patterns)
  4. Create 4 visualizations for flights data (passenger trends)
  5. Write simple summary comparing patterns in both datasets

Evaluation Criteria: - Code quality and readability (40%) - Visualization clarity (30%) - Data insights (30%)


Day 4: Machine Learning Introduction

Foundations of Supervised Learning

Lecture Materials

Learning Objectives

  • Understand supervised vs unsupervised learning
  • Master the ML workflow and evaluation
  • Implement basic algorithms from scratch

Daily Exercises

Exercise 4A: ML Workflow (60 minutes)

  1. Implement train-validation-test splits
  2. Create cross-validation from scratch
  3. Implement basic performance metrics
  4. Practice bias-variance tradeoff concepts
  5. Create learning curves

Exercise 4B: Linear Regression Deep Dive (75 minutes)

PDF: Linear Regression Slides
Video: ML Course Playlist - Linear Regression

  1. Implement linear regression using normal equation
  2. Add regularization (Ridge and Lasso)
  3. PDF: Ridge Regression
  4. PDF: Lasso Regression
  5. Compare with scikit-learn implementation
  6. Analyze residuals and model assumptions

Exercise 4C: Classification Fundamentals (60 minutes)

PDF: Logistic Regression
Video: ML Course Playlist - Classification

  1. Implement logistic regression from scratch
  2. Compare with notebook: Logistic Regression Implementation
  3. Create confusion matrices and ROC curves
  4. Practice precision, recall, F1-score calculations

Daily Assessment: Housing Price Prediction (90 minutes)

Dataset: UCI Boston Housing Dataset Tasks:

  1. Predict house prices using neighborhood features
  2. Compare linear regression, ridge, and lasso performance
  3. Create simple feature combinations (e.g., rooms per capita)
  4. Create basic model evaluation with scatter plots
  5. Interpret which features matter most for price

Bonus: Implement simple gradient descent from scratch


Day 5: Tree-Based Methods

Non-Linear Models and Ensemble Learning

Lecture Materials

Learning Objectives

  • Master decision tree algorithms
  • Understand ensemble methods
  • Apply to complex real-world problems

Daily Exercises

Exercise 5A: Decision Tree Implementation (90 minutes)

  1. Build decision tree from scratch using Gini impurity
  2. Implement tree pruning techniques
  3. Visualize decision boundaries
  4. Compare with scikit-learn implementation
  5. Analyze tree depth vs performance

Exercise 5B: Ensemble Methods with Scikit-learn (75 minutes)

  1. Use StandardScaler and MinMaxScaler for data preprocessing
  2. Implement RandomForestClassifier with different parameters
  3. Compare individual trees vs ensemble performance
  4. Feature importance analysis using .feature_importances_
  5. Cross-validation with cross_val_score

Exercise 5C: K-Nearest Neighbors (45 minutes)

  1. Implement KNN for classification and regression
  2. Experiment with different distance metrics
  3. Analyze curse of dimensionality
  4. Implement efficient neighbor search

Daily Assessment: Iris Species Classification (120 minutes)

Dataset: Seaborn Iris Dataset (built-in, no download needed!)

Tasks:

  1. Build classifier to predict iris species (setosa, versicolor, virginica)
  2. Compare decision tree, random forest, and KNN performance
  3. Use tree-based feature importance to find key measurements
  4. Create simple model interpretation plots
  5. Visualize decision boundaries for 2D projections
  6. Create simple prediction function for new flowers

Evaluation: Model performance + interpretability + code clarity


Day 6: Neural Networks & PyTorch

Introduction to Deep Learning

Lecture Materials

Learning Objectives

  • Understand neural network fundamentals
  • Master PyTorch for deep learning
  • Implement backpropagation from scratch

Daily Exercises

Exercise 6A: Neural Network from Scratch (90 minutes)

  1. Implement forward pass for multi-layer perceptron
  2. Code backpropagation algorithm step by step
  3. Add different activation functions (sigmoid, ReLU, tanh)
  4. Implement mini-batch gradient descent
  5. Compare with analytical solutions

Exercise 6B: PyTorch Framework Deep Dive (75 minutes)

  1. Build neural networks using torch.nn.Module
  2. Implement CNNs using torch.nn.Conv2d and torch.nn.MaxPool2d
  3. Create data loaders with torch.utils.data.DataLoader
  4. Add regularization (dropout, weight decay)
  5. Implement learning rate scheduling with torch.optim.lr_scheduler

Exercise 6C: Advanced PyTorch Architectures (60 minutes)

PDF: Convolutional Neural Networks

  1. Build CNN using torch.nn.Conv2d for image classification
  2. Implement LSTM using torch.nn.LSTM for time series
  3. Practice sequence modeling with torch.nn.Sequential
  4. Compare CNN vs LSTM performance on time series data
  5. PDF: 1D CNN

Daily Assessment: Simple Time Series Prediction (120 minutes)

Dataset: Seaborn Flights Dataset (passenger numbers over time)

Tasks:

  1. Build simple neural network to predict passenger numbers
  2. Compare basic MLP vs simpler approaches
  3. Create train/validation/test splits for time series
  4. Plot predictions vs actual values
  5. Calculate and interpret prediction errors
  6. Experiment with different number of hidden layers

Bonus: Try predicting multiple steps ahead


Day 7: Version Control & Collaboration

Git, GitHub, and Development Workflow

📚 Learning Materials

Learning Objectives

  • Master Git version control
  • Understand collaborative development
  • Apply to ML project workflows

Daily Exercises

Exercise 7A: Git Fundamentals (60 minutes)

  1. Initialize repository and make commits
  2. Practice branching and merging strategies
  3. Handle merge conflicts effectively
  4. Use git log, diff, and status commands
  5. Create and apply patches

Exercise 7B: GitHub Collaboration (75 minutes)

  1. Fork repository and create pull requests
  2. Practice code review process
  3. Use GitHub Issues for project management
  4. Create project documentation with README
  5. Set up GitHub Pages for project showcase

Exercise 7C: ML Project Structure (60 minutes)

  1. Organize ML projects with proper structure
  2. Use .gitignore for ML artifacts
  3. Version control datasets and models
  4. Create reproducible environments
  5. Document experiments and results

Daily Assessment: Collaborative ML Project (150 minutes)

Tasks:

  1. Work in teams of 2-3 people
  2. Choose one simple dataset (tips, iris, housing)
  3. Create shared GitHub repository
  4. Each member implements different ML algorithm
  5. Use branches for development, merge via pull requests
  6. Create basic README with results
  7. Present final comparison of approaches

Evaluation: Git usage + collaboration quality + final project


Day 8: Scientific Writing & LaTeX

Academic Communication and Documentation

📚 Learning Materials

  • Resource: LaTeX Tutorial on Overleaf
  • Guide: Lab Handbook - Technical Writing section
  • Examples: Sustainability Lab publications for reference

Learning Objectives

  • Master LaTeX for academic writing
  • Understand scientific paper structure
  • Create publication-ready documents

Daily Exercises

Exercise 8A: LaTeX Fundamentals (75 minutes)

  1. Set up Overleaf account and create first document
  2. Learn document structure and basic formatting
  3. Create mathematical equations and formulas
  4. Insert figures, tables, and references
  5. Use bibliography management with BibTeX

Exercise 8B: Scientific Figure Creation (60 minutes)

  1. Export high-quality figures from matplotlib
  2. Create publication-ready plots with proper captions
  3. Design tables for experimental results
  4. Learn IEEE/ACM formatting standards
  5. Practice figure referencing in text

Exercise 8C: Paper Structure (60 minutes)

  1. Analyze structure of top-tier ML papers
  2. Write effective abstracts and introductions
  3. Create methodology sections with equations
  4. Present experimental results clearly
  5. Write conclusions and future work

Daily Assessment: Research Paper Draft (120 minutes)

Task: Write a 4-page research paper on your Day 6 neural network project Required Sections: 1. Abstract (150 words) 2. Introduction with literature review 3. Methodology with mathematical formulation 4. Experimental results with tables and figures 5. Conclusion and future work 6. Properly formatted references

Evaluation: Writing clarity + technical accuracy + LaTeX formatting


Day 9: Research Methods Bootcamp

Scientific Thinking and Research Skills

📚 Learning Materials

Learning Objectives

  • Develop scientific thinking skills
  • Master research communication
  • Learn systematic problem-solving

Daily Exercises

Exercise 9A: Email Communication (45 minutes)

Based on Bootcamp Session 1

  1. Write professional email to potential research advisor
  2. Compose progress report email to supervisor
  3. Request help from external researcher professionally
  4. Practice concise and clear communication
  5. Learn email etiquette for academia

Exercise 9B: Abstract Analysis (60 minutes)

Based on Bootcamp Session 2

  1. Analyze 5 abstracts from top ML conferences
  2. Identify key components of effective abstracts
  3. Rewrite weak abstracts to improve clarity
  4. Practice identifying scientific flaws in papers
  5. Create abstract for your own research idea

Exercise 9C: Scientific Method Application (75 minutes)

Based on Bootcamp Session 3

  1. Formulate testable hypotheses for ML problems
  2. Design controlled experiments
  3. Identify potential confounding variables
  4. Practice systematic observation and data collection
  5. Apply statistical hypothesis testing

Exercise 9D: Debugging Mastery (45 minutes)

Based on Bootcamp Session 4

  1. Analyze effective StackOverflow questions
  2. Practice systematic debugging approaches
  3. Create minimal reproducible examples
  4. Learn to ask precise technical questions
  5. Develop problem isolation skills

Daily Assessment: Research Proposal (120 minutes)

Tasks:

  1. Problem statement with motivation
  2. Literature review (10+ papers)
  3. Research methodology and experimental design
  4. Timeline and resource requirements
  5. Expected contributions and impact
  6. Potential challenges and mitigation strategies

Presentation: 15-minute presentation + 10-minute Q&A Evaluation: Scientific rigor + presentation quality + feasibility


Day 10: Integration & Lab Projects

Connecting Skills to Research Impact

📚 Learning Materials

  • Lab Website: Sustainability Lab Projects
  • Research Papers: Recent lab publications
  • Resources: Lab handbook and current project overviews

Learning Objectives

  • Connect ML skills to sustainability applications
  • Understand lab research domains
  • Design internship project proposal

Daily Exercises

Exercise 10A: Lab Project Deep Dive (90 minutes)

  1. JoulesEye Analysis: Understand energy expenditure estimation using thermal imagery
  2. SpiroMask Study: Explore respiratory monitoring through smart face masks
  3. VayuBuddy Exploration: Analyze AI-powered air quality chatbot
  4. Space to Policy: Investigate satellite imagery for environmental compliance
  5. Choose one project and analyze technical approach in detail

Exercise 10B: Research Gap Identification (60 minutes)

  1. Read 3 recent papers from chosen lab research area
  2. Identify current limitations and challenges
  3. Propose novel extensions using learned ML techniques
  4. Consider practical implementation challenges
  5. Assess potential societal impact

Exercise 10C: Modern ML Tools Hands-On Integration (150 minutes)

Part A: Hugging Face Zero-Shot Classification (45 minutes)

  1. Dataset: 20 Newsgroups Dataset (sample 100 articles)

  2. Task: Use pipeline('zero-shot-classification') from Hugging Face

  3. Labels: [‘technology’, ‘sports’, ‘politics’, ‘science’, ‘entertainment’]

  4. Code:

    from transformers import pipeline
    classifier = pipeline("zero-shot-classification")
    results = classifier(text, candidate_labels)
  5. Report: Compare zero-shot vs traditional Naive Bayes accuracy on same data

  6. Deliverable: Confusion matrix + accuracy comparison table

Part B: YOLO Object Detection (45 minutes)

  1. Dataset: Download 20 sample images from COCO dataset (people, cars, animals)

  2. Task: Run YOLOv8 object detection using ultralytics package

  3. Code:

    from ultralytics import YOLO
    model = YOLO('yolov8n.pt')  # nano model
    results = model.predict(source='images/', save=True)
  4. Analysis: Count detected objects, measure confidence scores

  5. Report: Detection accuracy on 20 test images with bounding boxes

  6. Deliverable: Annotated images + detection statistics CSV

Part C: Label Studio Annotation Workflow (30 minutes)

  1. Dataset: 50 images from your YOLO exercise (subset with missing labels)
  2. Setup: Create Label Studio project for object detection
  3. Tasks:
    • Install: pip install label-studio
    • Create project: label-studio start
    • Import 50 images needing annotation
    • Create labeling interface for bounding boxes
    • Label 10 images manually (person, car, bike)
  4. Export: Download annotations in YOLO format
  5. Deliverable: 10 manually labeled images + annotation files

Part D: Traditional ML + Modern Tools Pipeline (30 minutes)

  1. Integration Task: Combine all three approaches
  2. Workflow:
    • Use YOLO to detect objects in images
    • Extract detected object crops
    • Use Hugging Face to classify object types with zero-shot
    • Compare with manual Label Studio annotations
  3. Resource Planning:
    • GPU requirements: RTX 3080 (8GB) minimum for YOLO
    • RAM: 16GB for Hugging Face models
    • Storage: 50GB for models + datasets
    • Processing time: 2-3 seconds per image
  4. Deliverable: End-to-end pipeline code + performance benchmarks

Final Deliverables:

  • Jupyter notebook with all 4 parts integrated
  • Performance comparison: Traditional ML vs Zero-shot vs YOLO
  • Resource utilization report (GPU/CPU/Memory usage)
  • Deployment cost estimation for 1000 images/day

Final Assessment: Internship Project Proposal (180 minutes)

Phase 1: Proposal Development (120 minutes)

Create comprehensive project proposal including:

  1. Problem Statement (500 words)
    • Sustainability challenge being addressed
    • Current approaches and limitations
    • Proposed ML solution overview
  2. Technical Approach (800 words)
    • Detailed methodology with equations
    • Dataset requirements and collection plan
    • ML algorithms: specify traditional ML + modern tools integration
    • Include at least ONE of: Hugging Face, YOLO, or PyTorch LSTM
    • Evaluation metrics and baselines
    • Computational resource specifications (GPU, RAM, storage)
  3. Implementation Plan (400 words)
    • 6-month timeline with deliverables
    • Computational resource requirements
    • Risk assessment and mitigation strategies
  4. Expected Impact (300 words)
    • Scientific contributions
    • Practical applications
    • Societal benefits

Phase 2: Final Presentation (30 minutes)

  • Presentation: 20 minutes + 10 minutes Q&A
  • Audience: Lab members, instructors, and peers
  • Evaluation Criteria:
    • Technical soundness (30%)
    • Innovation and creativity (25%)
    • Feasibility and planning (25%)
    • Presentation quality (20%)

Phase 3: Peer Review (30 minutes)

  • Review and provide feedback on 2 other proposals
  • Practice constructive scientific criticism
  • Learn from diverse approaches and ideas

Assessment Framework & Progress Tracking

Daily Assessments (70% of total score)

  • Knowledge Application: Coding challenges and implementations (40%)
  • Project Work: Daily mini-projects with real datasets (20%)
  • Research Skills: Writing, communication, and analysis (10%)

Final Assessment (30% of total score)

  • Capstone Project: Comprehensive internship proposal (20%)
  • Presentation: Communication and defense of ideas (10%)

Progress Tracking Methods

  • Daily Check-ins: 15-minute individual meetings with instructor
  • Code Reviews: Live coding sessions to verify understanding
  • Peer Presentations: Explain your solution to fellow participants
  • Version Control: All work tracked through Git commits with timestamps
  • Randomized Questions: Different datasets/parameters for each participant

LLM and AI Tool Policy

  • Learning Phase (Days 1-8): NO LLMs allowed - build fundamental understanding
  • Reference Only: Use official documentation (NumPy docs, Pandas docs, etc.)
  • Stack Overflow: Allowed for specific error debugging (with citation)
  • Books/Tutorials: Encouraged for concept learning
  • Why This Policy: Develop problem-solving skills and deep understanding
  • Final Projects (Days 9-10): LLMs allowed as coding assistant (must be declared)

Anti-Cheating Measures

  • Live Coding Sessions: Random code explanation during daily reviews
  • Unique Datasets: Each participant gets different subset/parameters
  • Timed Assessments: Completed under supervision
  • Oral Defense: Must explain methodology and code decisions
  • Progressive Complexity: Later exercises build on earlier work
  • Pair Programming: Rotate partners to verify individual skills

Self-Assessment Tools

  • Daily Reflection: Rate your understanding (1-5) of each concept
  • Skill Checklist: Track completion of specific competencies
  • Practice Problems: Additional exercises for self-paced learning
  • Peer Feedback: Anonymous feedback on collaboration and communication

Grading Scale

  • Outstanding (95-100%): Exceptional preparation, ready for independent research
  • Excellent (90-94%): Strong foundation, ready for advanced projects
  • Good (80-89%): Solid preparation, ready for guided research
  • Satisfactory (70-79%): Adequate foundation, needs continued mentorship
  • Needs Improvement (<70%): Additional preparation required

Technical Requirements

Software Stack

  • Python 3.8+ with Anaconda distribution
  • Core Libraries: NumPy, Pandas, Matplotlib, Seaborn
  • Traditional ML: Scikit-learn (StandardScaler, MinMaxScaler, RandomForestClassifier)
  • Deep Learning: PyTorch (NN, CNN, LSTM)
  • Computer Vision: YOLO (ultralytics package)
  • NLP/LLMs: Hugging Face Transformers (zero-shot, few-shot, fine-tuning)
  • Data Annotation: Label Studio (minimal usage)
  • Development: Jupyter Lab, VS Code, Git
  • Documentation: LaTeX (Overleaf), Markdown
  • Collaboration: Git, GitHub, Slack

Hardware Access

  • Personal Setup: Laptop with minimum 8GB RAM, 256GB storage
  • Modern ML Requirements:
    • YOLO/Computer Vision: RTX 3080 (8GB VRAM) minimum, 16GB RAM
    • Hugging Face LLMs: 16GB RAM minimum, 32GB recommended
    • PyTorch Training: CUDA-capable GPU, 50GB storage for models
  • Lab Resources: Access to computational servers (Ramanujan, Bhaskar, Sustain)
    • Ramanujan: 4x A100 GPUs (80GB each) - ideal for large model training
    • Bhaskar: 2x RTX A5000 (24GB each) - perfect for YOLO + Hugging Face
  • Accounts: GitHub, Overleaf, Google Colab, Hugging Face Hub

Simple Public Datasets Used Throughout Program

  1. UCI Human Activity Recognition: Sensor data analysis
  2. UCI Weather Dataset: Basic time series
  3. Seaborn Built-in Datasets: (tips, flights, iris) Simple exploration
  4. Seaborn Iris Dataset: Classic classification (iris species)
  5. UCI Boston Housing: Regression fundamentals
  6. Simple Text Data: Basic text processing

Extended Resources

Primary Textbooks

  1. Python Data Science Handbook - Jake VanderPlas
  2. An Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani
  3. Pattern Recognition and Machine Learning - Christopher Bishop

Online Courses (Optional)

  1. NPTEL Machine Learning - Balaram Ravindran
  2. Andrew Ng’s ML Course - Coursera
  3. Fast.ai Practical Deep Learning

Lab-Specific Resources

  • Sustainability Lab Publications: Latest research papers
  • Computational Infrastructure: Server access and usage guidelines
  • Lab Culture Guide: Expectations and best practices

Success Outcomes

Technical Competencies

Participants will be able to:

  • ✅ Implement complete ML pipelines from data to deployment
  • ✅ Apply appropriate algorithms for sustainability problems
  • ✅ Conduct rigorous experimental evaluation
  • ✅ Communicate findings through papers and presentations
  • ✅ Collaborate effectively using modern development tools

Research Readiness

  • Independent Work: Execute research projects with minimal supervision
  • Critical Thinking: Evaluate and improve existing approaches
  • Innovation: Propose novel solutions to sustainability challenges
  • Communication: Present work at conferences and in publications

Lab Integration

  • Culture Fit: Understand lab values and working style
  • Technical Skills: Ready to contribute to ongoing projects
  • Collaboration: Effective team member and mentor to new students
  • Research Impact: Ability to produce high-quality, publishable research

Key Resources

Course Materials

ML/DL Frameworks & Tools

  • Scikit-learn: StandardScaler, MinMaxScaler, RandomForestClassifier
  • PyTorch: Neural Networks (NN), Convolutional Networks (CNN), LSTM
  • Computer Vision: YOLO (object detection)
  • Hugging Face: LLM integration (zero-shot, few-shot, basic fine-tuning)
  • Label Studio: Data annotation (minimal usage)

Essential Development Tools

  • Python 3.8+, NumPy, Pandas, Matplotlib
  • Git/GitHub, Jupyter Lab, VS Code, LaTeX/Overleaf

Infrastructure

  • Lab Servers: GPU access (A100, RTX A5000) for computational work
  • Google Colab: Cloud computing for exercises

Contact & Application

Sustainability Lab, IIT Gandhinagar
Email: nipun.batra@iitgn.ac.in
Website: https://sustainability-lab.github.io/

Application: Email with CV, research interests, and 1-2 lab papers you’ve read.


This program combines rigorous technical training with practical application to sustainability challenges. The integration of established course materials, hands-on coding, and research methodology creates a comprehensive learning experience that prepares participants for impactful research careers.