Project Details
Final Project: Reproducible Machine Learning Research
CISC 3220 – Introduction to Machine Learning
This page provides complete instructions for the final project.
Read it carefully before you start.
1. Project Overview
The final project is the capstone of this course. You will design, implement, evaluate, and clearly communicate a machine learning workflow using an open dataset and open-source tools.
The project emphasizes:
- Reproducibility
- Transparency
- Ethical awareness
- Clear communication
This project replaces a traditional final exam.
2. Project Timeline (Weeks 11–14)
- Week 11: Dataset selection & project proposal
- Week 12: Model development & experimentation
- Week 13: Finalizing reproducible notebooks & GitHub repository
- Week 14: Final presentations
3. Dataset Requirements (Very Important)
You must use a publicly available dataset.
Acceptable Sources
- UCI Machine Learning Repository
- OpenML
- NYC Open Data
- Kaggle (dataset must have an open license)
Dataset Rules
- You must cite the dataset source
- You must describe known limitations or biases
- Synthetic or private datasets are not allowed
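One way to make the citation and limitations rules concrete is to keep a small provenance record next to the data file. The sketch below is only an illustration, not a course requirement: the function names `dataset_record` and `save_record`, the field names, and the output filename `dataset_record.json` are all hypothetical. It stores the source, link, license, and known limitations together with a SHA-256 checksum, so a reader can confirm they downloaded the same file.

```python
import hashlib
import json
from pathlib import Path


def dataset_record(path, source, url, license_name, limitations):
    """Build a provenance record for a local dataset file (hypothetical layout)."""
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        "source": source,            # e.g. the repository the data came from
        "url": url,                  # link you will also cite in the README
        "license": license_name,     # the open license stated by the source
        "sha256": hashlib.sha256(data).hexdigest(),  # fingerprint of the exact file
        "size_bytes": len(data),
        "known_limitations": list(limitations),
    }


def save_record(record, out_path="dataset_record.json"):
    """Write the record as readable JSON so it can be committed to the repository."""
    Path(out_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
```

The same fields map directly onto the "Dataset Description" section of the README, so the record can double as a checklist when you write that section.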
4. Technical Requirements
Your project must include:
- Python-based implementation
- At least one machine learning model
- Appropriate evaluation metrics
- Clear visualizations
- Fully reproducible results
You may use:
- NumPy, Pandas, Scikit-learn
- PyTorch or TensorFlow
- Matplotlib / Seaborn
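To illustrate what "fully reproducible results" can look like in practice, here is a minimal sketch using scikit-learn. It makes several assumptions that are not part of the requirements: the seed value 42 is arbitrary, the Iris data bundled with scikit-learn stands in for whatever open dataset you choose, and logistic regression with accuracy and macro-F1 is just one possible model and metric pairing. The point is that every source of randomness receives the same fixed seed, so rerunning the code returns the same numbers.

```python
import random

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # arbitrary value (an assumption); what matters is that it is fixed and documented


def run_experiment(seed=SEED):
    """Train and evaluate one model with all randomness tied to `seed`."""
    random.seed(seed)
    np.random.seed(seed)

    # Stand-in data: replace with your own cited open dataset.
    X, y = load_iris(return_X_y=True)

    # A seeded, stratified split keeps the same test set on every run.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )

    # Scaling lives inside the pipeline so it is fit on training data only.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_f1": f1_score(y_test, y_pred, average="macro"),
    }


if __name__ == "__main__":
    print(run_experiment())
```

Calling `run_experiment()` twice with the same seed should give identical metrics. If your own results differ between runs, look for an unseeded split, model, or shuffle.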
5. GitHub Repository (Required)
All projects must be hosted on GitHub.
Your repository must include:
- Source code
- Jupyter notebook(s)
- README.md file
- requirements.txt or environment setup instructions
Your repository should be understandable to someone who has never seen your project before.
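The environment setup instructions are easier to trust when they are generated from the environment that actually produced your results. The following sketch uses only the Python standard library. The helper names `describe_environment` and `to_requirements` are hypothetical, and the package list you pass in is your own choice. It reports the Python version and the installed version of each named package, then formats the result as pinned `requirements.txt` lines.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def to_requirements(env):
    """Format installed packages as pinned lines suitable for requirements.txt."""
    return "\n".join(
        f"{name}=={version}"
        for name, version in env["packages"].items()
        if version is not None
    )


if __name__ == "__main__":
    env = describe_environment(["numpy", "pandas", "scikit-learn", "matplotlib"])
    print("Python", env["python"])
    print(to_requirements(env))
```

The printed Python version and package pins can be pasted into the "Environment Setup" section of the README and into `requirements.txt`.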
6. GitHub README Template (Required)
You must include a README.md file using the template below.
Copy this template into your GitHub repository and replace all placeholder text.
# Project Title
## Project Overview
Describe the machine learning problem and why it matters.
## Dataset Description
Source:
Link:
Describe the dataset, features, and known limitations.
## Project Structure
Explain the files and folders in your project.
## Environment Setup
Python version:
Libraries used:
## Methodology
Explain the models and methods used.
## Evaluation
Explain how you evaluated your model.
## Reproducibility
Explain how someone can run your project.
## Ethical Considerations
Discuss bias, fairness, and societal impact.
## Limitations & Future Work
Describe limitations and possible extensions.
## References
List any resources used.
## License
MIT License or CC BY 4.0
## Author
Your name, course, semester
7. Final Presentation (Required)
Presentation Format
- Length: 6–8 minutes
- Audience: Class
- Visuals: Slides or notebook walkthrough
Presentation Must Cover
- Problem & dataset
- Model approach
- Key results
- One limitation
- One ethical concern
The goal is clarity, not complexity.
8. What to Submit
You must submit:
- GitHub repository link
- Final Jupyter notebook
- Ethics statement (500 words)
- Presentation slides or notebook
9. Evaluation Criteria (Summary)
Your project will be evaluated on:
- Technical correctness
- Reproducibility
- Clarity of explanation
- Ethical reflection
- Quality of presentation
See Assignments & Grading for the full rubric.
10. Academic Integrity & Open Licensing
- All code must be your own work
- All sources must be cited
- You must comply with the license terms of any open-source libraries you use
- You are encouraged to apply an open license to your work

