LAB WORK – Introduction to Machine Learning


Laboratory work in this course is designed to help students learn machine learning by doing. Labs emphasize experimentation, interpretation, and reproducibility rather than perfect results. Each lab builds toward the skills needed for the final project.

All labs use open datasets, open-source software, and reproducible Jupyter notebooks.

General Lab Expectations

Labs are completed using Python and Jupyter notebooks
Code must be well-organized and clearly commented
Short written reflections are required in Markdown cells
Results must be reproducible (notebooks should run from top to bottom without errors after a kernel restart)
Collaboration on ideas is allowed, but submitted work must be your own
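
One common way to meet the reproducibility expectation above is to fix random seeds in the first cell of each notebook. The sketch below is only an illustration: the seed value and the helper name `set_seeds` are hypothetical, not course requirements.

```python
import random

import numpy as np

SEED = 42  # hypothetical value; any fixed integer works


def set_seeds(seed: int = SEED) -> np.random.Generator:
    """Seed Python's built-in RNG and return a seeded NumPy generator."""
    random.seed(seed)
    return np.random.default_rng(seed)


# Two runs that start from the same seed draw the same numbers.
rng = set_seeds()
first = rng.normal(size=3)

rng = set_seeds()
second = rng.normal(size=3)

print(np.array_equal(first, second))
```

Passing the same seed to library arguments such as `random_state` in scikit-learn serves the same purpose.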

Lab 1—Data Exploration & Environment Setup

Purpose

This lab introduces the computational environment and foundational data-handling skills needed for the course.

Topics

Python environment setup
Jupyter notebooks
Loading and inspecting datasets
Basic data visualization

Tasks

Install required libraries
Load an open dataset (UCI or OpenML)
Inspect structure, missing values, and data types
Create simple plots (histograms, scatter plots)
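
The inspection tasks above might look like the following minimal sketch. It assumes pandas and NumPy are installed and uses a small hypothetical table in place of a real UCI or OpenML dataset, so the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real UCI/OpenML table.
df = pd.DataFrame(
    {
        "age": [34, 51, np.nan, 29, 42],
        "income": [48000.0, 72000.0, 61000.0, np.nan, 55000.0],
        "segment": ["a", "b", "b", None, "a"],
    }
)

print(df.shape)     # (number of rows, number of columns)
print(df.dtypes)    # data type of each column

# Count missing values per column.
missing = df.isna().sum()
print(missing)

# Summary statistics for the numeric columns.
print(df.describe())
```

With a real dataset, the `DataFrame` would typically come from `pd.read_csv` or `sklearn.datasets.fetch_openml`. If matplotlib is installed, `df.hist()` and `df.plot.scatter(x="age", y="income")` produce the histograms and scatter plots named in the task list.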

Learning Outcomes

Students will be able to:

Set up a working machine learning environment
Explore datasets systematically
Identify potential data quality issues

Lab 2—Linear & Logistic Regression from Scratch

Purpose

This lab builds intuition for regression and classification by implementing models manually.

Topics

Linear regression
Logistic regression
Loss functions
Gradient descent

Tasks

Implement linear regression using NumPy
Implement logistic regression for binary classification
Visualize model predictions
Compare results with Scikit-learn implementations
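
As a rough starting point for the implementation tasks above, the sketch below fits both models with batch gradient descent in NumPy. The synthetic data, learning rate, iteration count, and function names are all assumptions chosen for illustration, not the required lab design.

```python
import numpy as np

rng = np.random.default_rng(0)


def add_bias(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_linear(X, y, lr=0.1, n_iter=2000):
    """Minimize mean squared error with batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = 2.0 / len(y) * Xb.T @ (Xb @ w - y)
        w -= lr * grad
    return w


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Minimize mean log loss with batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w


# Synthetic regression data: y = 3 + 2x + small Gaussian noise.
X = rng.uniform(-1, 1, size=(200, 1))
y_reg = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
w_lin = fit_linear(X, y_reg)
print("linear weights (intercept, slope):", w_lin)

# Synthetic binary labels: class 1 whenever x > 0.
y_clf = (X[:, 0] > 0).astype(float)
w_log = fit_logistic(X, y_clf)
acc = np.mean((sigmoid(add_bias(X) @ w_log) > 0.5) == y_clf)
print("logistic training accuracy:", acc)
```

When comparing against scikit-learn, note that `LogisticRegression` applies L2 regularization by default, so its coefficients will generally differ from an unregularized implementation like this one.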

Learning Outcomes

Students will be able to:

Understand how regression models work internally
Interpret model coefficients
Explain differences between regression and classification

Lab 3—Classification with Open Datasets

Purpose

This lab focuses on applying classification algorithms to real-world datasets.

Topics

k-Nearest Neighbors
Decision Trees
Model comparison

Tasks

Choose an open dataset
Train multiple classifiers
Evaluate models using accuracy, precision, recall, and confusion matrices
Analyze model strengths and weaknesses
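
The training and evaluation tasks above can be sketched with scikit-learn as follows. The dataset (the breast cancer data bundled with scikit-learn), the split size, and the hyperparameters are assumptions made for the example; any open dataset could take their place.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# k-NN is distance-based, so features are standardized first.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "confusion": confusion_matrix(y_test, pred),
    }
    print(name, results[name])
```

Looking at the confusion matrices side by side shows which kinds of errors each classifier makes, which accuracy alone does not reveal.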

Learning Outcomes

Students will be able to:

Apply classification methods appropriately
Select suitable evaluation metrics
Compare competing models critically

Lab 4—Unsupervised Learning & Pattern Discovery

Purpose

This lab introduces unsupervised learning techniques for discovering structure in unlabeled data.

Topics

k-means clustering
Dimensionality reduction
Principal Component Analysis (PCA)

Tasks

Apply k-means clustering
Reduce dimensionality using PCA
Visualize clusters
Interpret discovered patterns
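
A minimal sketch of the clustering and PCA tasks above, assuming scikit-learn and its bundled Iris data. The choice of three clusters and two components is an assumption made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Labels are discarded: the methods below only see the features.
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Cluster in the standardized feature space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project onto the first two principal components for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)
print(pca.explained_variance_ratio_)
```

With matplotlib installed, `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels)` colors each projected point by its cluster assignment.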

Learning Outcomes

Students will be able to:

Use clustering for exploratory analysis
Explain how PCA transforms data
Evaluate the limitations of unsupervised methods

Lab 5—Model Evaluation & Validation

Purpose

This lab focuses on evaluating model performance and avoiding common pitfalls such as overfitting.

Topics

Train/test splits
Cross-validation
Bias–variance tradeoff

Tasks

Implement cross-validation
Compare model performance across multiple splits
Analyze overfitting and underfitting
Select appropriate evaluation strategies
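
One possible shape for the cross-validation and overfitting tasks above is sketched below. It assumes scikit-learn's bundled breast cancer data and uses decision trees of different depths as the models under comparison. The helper name `cv_scores` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def cv_scores(max_depth):
    """Return mean training and validation accuracy over the folds."""
    train_scores, val_scores = [], []
    for train_idx, val_idx in cv.split(X, y):
        model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        train_scores.append(model.score(X[train_idx], y[train_idx]))
        val_scores.append(model.score(X[val_idx], y[val_idx]))
    return np.mean(train_scores), np.mean(val_scores)


# max_depth=None lets the tree grow until it fits the training folds.
summary = {depth: cv_scores(depth) for depth in (1, 3, None)}
for depth, (train_acc, val_acc) in summary.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, validation={val_acc:.3f}")
```

A large gap between training and validation accuracy suggests overfitting, while low accuracy on both suggests underfitting.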

Learning Outcomes

Students will be able to:

Evaluate models rigorously
Justify evaluation choices
Interpret performance metrics responsibly

Lab 6—Neural Networks with Open Frameworks

Purpose

This lab introduces students to modern deep learning frameworks and workflows.

Topics

Neural network architectures
Activation functions
Training with PyTorch or TensorFlow

Tasks

Build a simple neural network
Train on an open dataset
Tune basic hyperparameters
Visualize training loss and accuracy
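
The tasks above could start from a sketch like the one below, which assumes PyTorch is the chosen framework. The synthetic data stands in for an open dataset, and the layer sizes, learning rate, and epoch count are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic two-class data standing in for an open dataset.
# The classes follow an XOR-like pattern, so they are not linearly separable.
X = torch.randn(400, 2)
y = (X[:, 0] * X[:, 1] > 0).long()

# One hidden layer with a ReLU activation and two output logits.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses, accuracies = [], []
for epoch in range(200):
    optimizer.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()

    # Record full-batch training loss and accuracy for later plotting.
    losses.append(loss.item())
    accuracies.append((logits.argmax(dim=1) == y).float().mean().item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Plotting `losses` and `accuracies` against the epoch index gives the training curves. Changing the learning rate, hidden width, or number of epochs is a simple way to explore hyperparameter tuning. A held-out validation split would be needed to judge generalization.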

Learning Outcomes

Students will be able to:

Implement neural networks using open tools
Explain training dynamics
Reflect on strengths and limitations of deep learning

Lab Submission Guidelines

Each lab submission must include:

A completed Jupyter notebook (.ipynb)
Clear Markdown explanations
Well-commented code
Fully reproducible results

Final Project: Reproducible Machine Learning Research (Weeks 11–14)

Weeks 11–14 are dedicated to the final reproducible project, which serves as the capstone of the course. Students move beyond structured labs to conduct an original machine learning analysis using an open dataset.

Project Requirements

Dataset: Publicly available dataset (e.g., NYC Open Data, UCI, or Kaggle with an open license)
Platform: Project must be hosted on GitHub for transparency and version control
Reproducibility: Include a README.md explaining environment setup and execution
Open Licensing: Apply an open license (e.g., MIT or CC BY) to encourage reuse

What to Submit

GitHub Repository Link: Includes code, cleaned data (or a script to download it), and documentation
Technical Report: A Jupyter notebook demonstrating data cleaning, model selection, hyperparameter tuning, and final evaluation
Ethics Statement (500 words): Reflection on dataset bias, limitations, and potential societal impact of the model

Project Details Link: https://hanifml2026.commons.gc.cuny.edu/project-details/
