Your task for this project is to dig deep into data mining.
The brief is intentionally vague: choose a project that best mirrors your own interests and expertise. The project must also incorporate an element or concept you have taught yourself. This could be an algorithm we haven’t explored in class, a specialized data mining library or tool, or a unique data preprocessing workflow.
If you’re uncertain whether your “newly acquired knowledge” counts, reach out and ask!
Ideas
Your first task is to come up with a goal for your project. Here are a few ideas:
Time Series Analysis
Predict stock market prices using historical data and ARIMA, LSTM, or Prophet.
Anomaly Detection
Detect credit card fraud or identify network intrusions.
Association Rule Mining
Perform market basket analysis on supermarket data.
Social Network Analysis
Identify influential nodes or misinformation spread.
Multimedia Mining
Build a music recommendation system or tag videos automatically.
Interactive Visualization
Create a dashboard for pattern/outlier discovery.
Ensemble Methods
Compare ensemble techniques (boosting, bagging) on a classification task.
Choose a dataset that genuinely resonates with you — not one from TidyTuesday — and offers real exploration potential.
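To make one of the ideas above concrete, here is a minimal sketch of market basket analysis in plain Python. It computes support, confidence, and lift for rules between pairs of items. The toy `transactions` list and the `pair_rules` helper are hypothetical and exist only for illustration; a real project would load actual supermarket data and would likely use a dedicated library for larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; replace with real basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]      # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    # Strongest associations (highest lift) first.
    return sorted(rules, key=lambda r: r[4], reverse=True)


rules = pair_rules(transactions)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than independence would predict; a lift below 1 suggests the opposite.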
Due Dates
Proposal (for Peer Review): Wed, Oct 29 — 11:59pm
Peer-reviews (Proposals): Wed, Nov 05 — 11:59pm
Revised Proposal (Instructor): Mon, Nov 10 — 11:59pm
Final Write-up & Presentation: Wed, Dec 17 — 11:59pm
Deliverables
Proposal
High-level goal and expanded description
Data and motivation
Weekly “plan of attack”
Repository structure and folder README.md files
Peer Review
Complete assigned proposal and code reviews as GitHub issues using the provided templates.
Reviewer tasks
Each individual will review the proposals of two other individuals. The peer review must be completed on Friday, October 31. On that day teams will have access to the project repos of the two individuals whose work they’re reviewing. Reviews should start by cloning the individual’s repo, re-rendering it locally to make sure you can reproduce their work, and then adding an issue to their repo with your peer review feedback.
The reviewer / reviewee assignments are as follows:
Individuals will develop the review and submit it as an issue on the project repo. To do so, go to the Issues tab, click on the green New issue button on the top right, and then click on the green Get started button for the issue template titled Peer review.
This will start a new issue with a peer review form that you can fill out. Make sure to update the introductory paragraph with your name as a participant in the review. Then, answer each of the questions in the spaces provided underneath them. You’re expected to be thorough in your review, but this doesn’t necessarily require lengthy responses.
Remember, your goal is to help the individual whose project proposal you’re reviewing. The individual will not lose points because of issues you point out, as long as they address them before I review their proposals. You should be critical, but respectful in your review. Also remember that you will be evaluated on the quality of your review. So that’s an additional incentive to do a good job.
Reviewee tasks
Once you receive feedback from your peers, you should address it by directly updating your proposal or making any other updates to your repo as needed. You can make these updates in one commit or spread them across multiple commits. Either way, the last commit that addresses the peer review comments should include a keyword in its commit message that closes the peer review issue. The keywords are close, closes, closed, fix, fixes, fixed, resolve, resolves, and resolved, and each must be followed by the issue number (which you can find next to the issue title). So, your commit message can say something like “Finished updates based on peer review, fixes #1”.
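If you want to sanity-check a commit message before pushing, the sketch below looks for one of the keywords listed above followed by an issue number. The `issues_closed_by` helper and its regular expression are hypothetical and only approximate the rule described here; they are not GitHub’s own parser.

```python
import re

# Keywords from the instructions above; each must be followed by "#<number>".
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)
CLOSING_PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)\b",
    re.IGNORECASE,
)


def issues_closed_by(message):
    """Return the issue numbers that a commit message would close."""
    return [int(number) for number in CLOSING_PATTERN.findall(message)]


print(issues_closed_by("Finished updates based on peer review, fixes #1"))  # [1]
print(issues_closed_by("Finished updates based on peer review"))            # []
```

An empty list means the message contains no closing keyword, so the peer review issue would stay open after the push.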
✨ Optional Track: Vibe Coding
Objective
The Vibe Coding track challenges you to explore a full data-mining project guided by intuition, iteration, and reflection. Your goal is to document the process — what worked, what didn’t, and how you navigated discovery — and to evaluate the strengths and weaknesses of vibe coding as a workflow.
In choosing this path, you must complete:
The Vibe Coding Report, following the structure below.
What Counts as Vibe Coding
Rapid iterative experimentation and pivoting between approaches
Tool or model switching as insights evolve
Opportunistic feature engineering or parameter tuning
Use of LLMs or AutoML tools for ideation/refactoring (must be disclosed)
Runnable demo of the final path: One clean Quarto document or script that reproduces your “winning” workflow.
Attribution & Provenance: Must list datasets, external code, and any AI assistance (model + how used).
Example Attempt Log Renderer
```python
import pandas as pd
from datetime import datetime
from pathlib import Path
import plotly.graph_objects as go

# Path to the log file
log_path = Path("data/vibe_log.csv")

if log_path.exists():
    # Read and clean data
    vibe_log = pd.read_csv(log_path)
    if "timestamp" in vibe_log.columns:
        vibe_log["timestamp"] = pd.to_datetime(vibe_log["timestamp"], errors="coerce")

    # Replace boolean AI flag with readable symbols
    if "ai_used" in vibe_log.columns:
        vibe_log["ai_used"] = vibe_log["ai_used"].apply(
            lambda x: "AI" if str(x).lower() in ["true", "1", "yes"] else "—"
        )

    # Select and rename columns if present
    keep_cols = [
        "timestamp", "step_id", "goal", "approach", "evidence_metric",
        "result_summary", "decision_next", "time_spent_min", "ai_used",
    ]
    vibe_log = vibe_log[[c for c in keep_cols if c in vibe_log.columns]]
    vibe_log = vibe_log.rename(columns={
        "timestamp": "When",
        "step_id": "Step",
        "goal": "Goal",
        "approach": "Approach",
        "evidence_metric": "Evidence",
        "result_summary": "What Happened",
        "decision_next": "Next Decision",
        "time_spent_min": "Minutes",
        "ai_used": "AI?",
    })

    # Display nicely formatted table with Plotly
    fig = go.Figure(data=[go.Table(
        header=dict(
            values=list(vibe_log.columns),
            fill_color="lightgrey",
            align="left",
        ),
        cells=dict(
            values=[vibe_log[col] for col in vibe_log.columns],
            fill_color="white",
            align="left",
        ),
    )])
    fig.update_layout(title_text="Vibe Coding Attempt Log")
    fig.show()
```
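The renderer above only reads `data/vibe_log.csv`; it does not say how the log gets written. One possible approach, sketched below with the standard library, is to append a row after each attempt. The `log_attempt` helper is hypothetical, the column names are taken from the renderer, and the demo values are placeholders written to a temporary directory rather than to your real log.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

# Column names match the ones the renderer looks for.
LOG_COLUMNS = [
    "timestamp", "step_id", "goal", "approach", "evidence_metric",
    "result_summary", "decision_next", "time_spent_min", "ai_used",
]


def log_attempt(log_path, **fields):
    """Append one attempt to the CSV log, writing the header if the file is new."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not log_path.exists()
    # Timestamp defaults to now; a caller may override it via fields.
    row = {"timestamp": datetime.now().isoformat(timespec="seconds"), **fields}
    with log_path.open("a", newline="", encoding="utf-8") as f:
        # DictWriter raises ValueError on unknown column names, which catches typos.
        writer = csv.DictWriter(f, fieldnames=LOG_COLUMNS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row


# Demo with placeholder values, written to a temporary folder.
demo_log = Path(tempfile.mkdtemp()) / "vibe_log.csv"
log_attempt(demo_log, step_id=1, goal="placeholder goal",
            approach="placeholder approach", time_spent_min=30, ai_used=False)
log_attempt(demo_log, step_id=2, goal="placeholder goal",
            approach="second placeholder approach", time_spent_min=45, ai_used=True)
print(demo_log.read_text(encoding="utf-8"))
```

In a real project you would call `log_attempt("data/vibe_log.csv", ...)` so the renderer picks up the same file. Columns you leave out are written as empty cells.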
Vibe Coding Rubric Mapping
| Component | Points | Notes |
|---|---|---|
| Proposal | 10 | Include evaluation plan for vibe workflow |
| Presentation | 30 | Same as standard; emphasize story arc |
| Write-up | 30 | Problem framing (4), Attempt Log (8), Decision evidence (8), Final pipeline clarity (6), Reflection (4) |
| Reproducibility & Style | 10 | Include clean final path + machine-readable attempt log |
| Within-team Evaluation | 10 | Unchanged |
| Between-team Review | 10 | Add comments on clarity of decision evidence |
Peer Review Additions (for Vibe Projects)
Is the attempt log detailed and decision-driven?
Are pivots tied to evidence or reasoning?
Does the final pipeline align with the reported “winning path”?
Are AI uses disclosed and described responsibly?
AI Use & Attribution Template
> **AI Use & Attribution**: This project used ChatGPT (GPT-5) on 2025-11-03 for ideation and code refactoring.
> Outputs were verified, rewritten, and version-pinned.
> Risks (hallucinations, data leakage, licensing) were mitigated through manual review and documentation.
Write-up or Vibe Report
Your write-up (or Vibe Report) should be 1000–2000 words and include:
Clear introduction and motivation
Methods, analysis, and results
Discussion and conclusion (or reflection, for vibe coding)
Presentation
5 minutes; every team member must speak. Use Quarto or another reproducible format. You’re evaluated on content, professionalism, teamwork, and creativity.
Repo Organization
Organize your repository logically. At a minimum include:
index.qmd
data/ folder (if applicable)
Documentation in each subfolder
Grading Overview
| Component | Points |
|---|---|
| Proposal | 10 |
| Presentation | 30 |
| Write-up / Vibe Report | 30 |
| Reproducibility & Style | 10 |
| Within-team Peer Eval | 10 |
| Between-team Peer Eval | 10 |
| Total | 100 pts |
Guidelines
Everything must be reproducible and render without errors.
Use inline code for all reported values.
All AI or external tool use must be clearly cited.
Work collaboratively and resolve merge conflicts early.
Tips
Be adventurous — try new methods!
Log your reasoning behind pivots if you choose the vibe track.
Each team member’s commits will be reviewed for equity.
Hide code in your presentation but show it in your write-up.
Submission
Submit your final GitHub repository link by Wed, Dec 17 at 11:59pm. Make sure it renders fully before submission.