LAB WORK
Introduction to Machine Learning
Laboratory work in this course is designed to help students learn machine learning by doing. Labs emphasize experimentation, interpretation, and reproducibility rather than perfect results. Each lab builds toward the skills needed for the final project.
All labs use open datasets, open-source software, and reproducible Jupyter notebooks.
General Lab Expectations
- Labs are completed using Python and Jupyter notebooks
- Code must be well-organized and clearly commented
- Short written reflections are required in Markdown cells
- Results must be reproducible (notebooks should run from top to bottom)
- Collaboration on ideas is allowed, but submitted work must be your own
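One way to support the reproducibility expectation above is to fix random seeds in the first cell of a notebook and print the library versions in use. The sketch below is only an illustration: the `SEED` value and the `set_seeds` helper are hypothetical names, not course requirements.

```python
import random
import sys

import numpy as np

SEED = 42  # hypothetical value; any fixed integer serves the same purpose


def set_seeds(seed: int = SEED) -> np.random.Generator:
    """Seed Python's and NumPy's random number generators.

    Returns a NumPy Generator that the rest of the notebook can draw from.
    """
    random.seed(seed)
    np.random.seed(seed)
    return np.random.default_rng(seed)


# Record the versions a reader would need to rerun the notebook.
print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)

# Re-seeding and drawing again gives the same numbers.
rng = set_seeds()
first_draw = rng.normal(size=3)
rng = set_seeds()
second_draw = rng.normal(size=3)
print(np.array_equal(first_draw, second_draw))
```

Restarting the kernel and running all cells in order remains the most direct check that a notebook runs from top to bottom.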
Lab 1—Data Exploration & Environment Setup
Purpose
This lab introduces the computational environment and foundational data-handling skills needed for the course.
Topics
- Python environment setup
- Jupyter notebooks
- Loading and inspecting datasets
- Basic data visualization
Tasks
- Install required libraries
- Load an open dataset (UCI or OpenML)
- Inspect structure, missing values, and data types
- Create simple plots (histograms, scatter plots)
Learning Outcomes
Students will be able to:
- Set up a working machine learning environment
- Explore datasets systematically
- Identify potential data quality issues
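The Lab 1 tasks might look roughly like the sketch below. It assumes pandas, matplotlib, and scikit-learn are installed, and it uses scikit-learn's bundled copy of the UCI Iris data so that no download is needed. The dataset choice and the `summarize` and `plot_overview` helpers are illustrative assumptions, not part of the lab handout.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the bundled Iris data stands in for any UCI or OpenML dataset.
df = load_iris(as_frame=True).frame  # four feature columns plus "target"


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),
    })


def plot_overview(frame: pd.DataFrame, x_col: str, y_col: str, color_col: str):
    """Draw a histogram of x_col and a scatter plot of x_col against y_col."""
    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(frame[x_col].dropna(), bins=20)
    ax_hist.set_xlabel(x_col)
    ax_hist.set_ylabel("count")
    ax_scatter.scatter(frame[x_col], frame[y_col], c=frame[color_col])
    ax_scatter.set_xlabel(x_col)
    ax_scatter.set_ylabel(y_col)
    fig.tight_layout()
    return fig


summary = summarize(df)
print(df.shape)
print(df.head())
print(summary)

fig = plot_overview(df, "sepal length (cm)", "petal length (cm)", "target")
```

Columns with a nonzero `n_missing`, or with an unexpected `dtype` such as `object` for a numeric field, are natural starting points for the data quality discussion.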
Lab 2—Linear & Logistic Regression from Scratch
Purpose
This lab builds intuition for regression and classification by implementing models manually.
Topics
- Linear regression
- Logistic regression
- Loss functions
- Gradient descent
Tasks
- Implement linear regression using NumPy
- Implement logistic regression for binary classification
- Visualize model predictions
- Compare results with Scikit-learn implementations
Learning Outcomes
Students will be able to:
- Understand how regression models work internally
- Interpret model coefficients
- Explain differences between regression and classification
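A minimal sketch of the Lab 2 idea, assuming only NumPy: one gradient-descent loop that fits either a linear or a logistic model. The synthetic data, the `fit_gd` helper, the learning rate, and the step count are illustrative assumptions. The linear fit is checked against NumPy's closed-form least-squares solution here; the lab itself asks for a comparison with Scikit-learn.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_gd(X, y, kind="linear", lr=0.1, n_steps=2000):
    """Fit weights w and bias b by batch gradient descent.

    kind="linear" minimizes half the mean squared error;
    kind="logistic" minimizes the mean cross-entropy.
    Both losses have the same gradient form: X^T (prediction - y) / n.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        z = X @ w + b
        pred = z if kind == "linear" else sigmoid(z)
        error = pred - y
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


# Synthetic data (an assumption for illustration, not a course dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -2.0]), 0.5
y_reg = X @ true_w + true_b + 0.1 * rng.normal(size=200)
y_cls = (X @ true_w + true_b > 0).astype(float)

# Linear regression: gradient descent versus the closed-form solution.
w_lin, b_lin = fit_gd(X, y_reg, kind="linear")
A = np.column_stack([X, np.ones(len(X))])
closed_form = np.linalg.lstsq(A, y_reg, rcond=None)[0]
print("gradient descent:", w_lin, b_lin)
print("closed form:     ", closed_form)

# Logistic regression: threshold predicted probabilities at 0.5.
w_log, b_log = fit_gd(X, y_cls, kind="logistic")
accuracy = np.mean((sigmoid(X @ w_log + b_log) > 0.5) == (y_cls == 1))
print("training accuracy:", accuracy)
```

The only difference between the two models in this sketch is the sigmoid applied to the linear score, which is one concrete way to explain how regression and classification differ.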
Lab 3—Classification with Open Datasets
Purpose
This lab focuses on applying classification algorithms to real-world datasets.
Topics
- k-Nearest Neighbors
- Decision Trees
- Model comparison
Tasks
- Choose an open dataset
- Train multiple classifiers
- Evaluate models using accuracy, precision, recall, and confusion matrices
- Analyze model strengths and weaknesses
Learning Outcomes
Students will be able to:
- Apply classification methods appropriately
- Select suitable evaluation metrics
- Compare competing models critically
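The Lab 3 workflow could be sketched as follows, assuming scikit-learn is installed. The bundled breast cancer dataset, the 75/25 split, and the hyperparameters (`n_neighbors=5`, `max_depth=4`) are illustrative assumptions; students choose their own dataset and settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: a bundled binary-classification dataset stands in for the chosen one.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # k-NN relies on distances, so features are standardized first.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, scores in results.items():
    print(name)
    for metric, value in scores.items():
        print(f"  {metric}: {value}")
```

Reading the confusion matrices side by side shows which kinds of errors each model makes, which accuracy alone hides.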
Lab 4—Unsupervised Learning & Pattern Discovery
Purpose
This lab introduces unsupervised learning techniques for discovering structure in unlabeled data.
Topics
- k-means clustering
- Dimensionality reduction
- Principal Component Analysis (PCA)
Tasks
- Apply k-means clustering
- Reduce dimensionality using PCA
- Visualize clusters
- Interpret discovered patterns
Learning Outcomes
Students will be able to:
- Use clustering for exploratory analysis
- Explain how PCA transforms data
- Evaluate the limitations of unsupervised methods
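For Lab 4, the sketch below computes PCA directly from a singular value decomposition in NumPy, to make the transformation visible, and runs k-means with scikit-learn. The Iris data, the choice of three clusters, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: Iris features serve as the unlabeled data; labels are ignored.
X = load_iris().data

# Standardize so every feature has zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the centered data: rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal directions for plotting.
X_2d = X_std @ Vt[:2].T

# k-means on the standardized features; n_clusters=3 is a choice, not a given.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_std)

print("explained variance ratio:", explained_variance_ratio)
print("cluster sizes:", np.bincount(labels))
```

A scatter plot of `X_2d` colored by `labels` covers the visualization task. Because the number of clusters is supplied by the user rather than discovered, trying several values is one way to probe the limitations of the method.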
Lab 5—Model Evaluation & Validation
Purpose
This lab focuses on evaluating model performance and avoiding common pitfalls such as overfitting.
Topics
- Train/test splits
- Cross-validation
- Bias–variance tradeoff
Tasks
- Implement cross-validation
- Compare model performance across multiple splits
- Analyze overfitting and underfitting
- Select appropriate evaluation strategies
Learning Outcomes
Students will be able to:
- Evaluate models rigorously
- Justify evaluation choices
- Interpret performance metrics responsibly
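One way to approach the Lab 5 tasks is to implement k-fold cross-validation by hand and use it to compare models of different complexity. The sketch below assumes only NumPy. The noisy sine data, the polynomial models, and the helper names `k_fold_indices` and `cv_mse` are hypothetical choices for illustration.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds, rng):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]


def cv_mse(x, y, degree, n_folds=5, seed=0):
    """Return mean and standard deviation of validation MSE for a polynomial fit."""
    rng = np.random.default_rng(seed)
    scores = []
    for train_idx, val_idx in k_fold_indices(len(x), n_folds, rng):
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        pred = np.polyval(coeffs, x[val_idx])
        scores.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic data (an assumption): one period of a sine wave plus noise.
data_rng = np.random.default_rng(1)
x = data_rng.uniform(0.0, 1.0, size=60)
y = np.sin(2 * np.pi * x) + 0.2 * data_rng.normal(size=60)

# Degree 1 is too rigid for a sine wave; degree 9 has far more freedom.
results = {degree: cv_mse(x, y, degree) for degree in (1, 3, 9)}
for degree, (mean_mse, std_mse) in results.items():
    print(f"degree {degree}: validation MSE {mean_mse:.3f} +/- {std_mse:.3f}")
```

Reporting the spread across folds alongside the mean helps when judging whether the difference between two models is larger than the variation between splits.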
Lab 6—Neural Networks with Open Frameworks
Purpose
This lab introduces students to modern deep learning frameworks and workflows.
Topics
- Neural network architectures
- Activation functions
- Training with PyTorch or TensorFlow
Tasks
- Build a simple neural network
- Train on an open dataset
- Tune basic hyperparameters
- Visualize training loss and accuracy
Learning Outcomes
Students will be able to:
- Implement neural networks using open tools
- Explain training dynamics
- Reflect on strengths and limitations of deep learning
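A minimal PyTorch sketch of the Lab 6 tasks is shown below. TensorFlow would serve equally well. The synthetic two-class data, the layer sizes, the learning rate, and the epoch count are all assumptions made for illustration, and for brevity the accuracy is measured on the training data only.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so reruns give the same curves

# Synthetic data (an assumption): two Gaussian blobs, one per class.
n_per_class = 100
X = torch.cat([
    torch.randn(n_per_class, 2) - 2.0,
    torch.randn(n_per_class, 2) + 2.0,
])
y = torch.cat([torch.zeros(n_per_class, 1), torch.ones(n_per_class, 1)])

# One hidden layer with ReLU; the output is a single logit.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

n_epochs = 200
history = {"loss": [], "accuracy": []}
for epoch in range(n_epochs):
    optimizer.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()

    # Metrics use the logits computed before this epoch's parameter update.
    with torch.no_grad():
        accuracy = ((logits > 0).float() == y).float().mean()
    history["loss"].append(loss.item())
    history["accuracy"].append(accuracy.item())

print(f"first epoch: loss {history['loss'][0]:.3f}")
print(f"last epoch:  loss {history['loss'][-1]:.3f}, "
      f"accuracy {history['accuracy'][-1]:.3f}")
```

Plotting `history["loss"]` and `history["accuracy"]` against the epoch number covers the visualization task, and changing `lr` or the hidden layer width is a simple starting point for hyperparameter tuning.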
Lab Submission Guidelines
Each lab submission must include:
- A completed Jupyter notebook (.ipynb)
- Clear Markdown explanations
- Well-commented code
- Fully reproducible results
Final Project: Reproducible Machine Learning Research (Weeks 11–14)
Weeks 11–14 are dedicated to the final reproducible project, which serves as the capstone of the course. Students move beyond structured labs to conduct an original machine learning analysis using an open dataset.
Project Requirements
- Dataset: A publicly available dataset (e.g., NYC Open Data, UCI, or a Kaggle dataset with an open license)
- Platform: Project must be hosted on GitHub for transparency and version control
- Reproducibility: Include a README.md explaining environment setup and execution
- Open Licensing: Apply an open license (e.g., MIT or CC BY) to encourage reuse
What to Submit
- GitHub Repository Link: Includes code, cleaned data (or a script to download it), and documentation
- Technical Report: A Jupyter notebook demonstrating data cleaning, model selection, hyperparameter tuning, and final evaluation
- Ethics Statement (500 words): Reflection on dataset bias, limitations, and potential societal impact of the model
Project Details Link: https://hanifml2026.commons.gc.cuny.edu/project-details/

