Your task for this project is to dig deeply into a data mining problem of your choosing.
The prompt is intentionally open-ended so that you can design a project that reflects your own interests and expertise. The project should also incorporate an element or concept you’ve taught yourself: for example, an algorithm we haven’t covered in class, a specialized data mining library or tool, or a unique data preprocessing workflow.
If you’re uncertain whether your “newly acquired knowledge” counts, reach out and ask!
Ideas
Your first task is to come up with a goal for your project. Here are a few ideas:
Time Series Analysis
Predict stock market prices using historical data and ARIMA, LSTM, or Prophet.
Anomaly Detection
Detect credit card fraud or identify network intrusions.
Association Rule Mining
Perform market basket analysis on supermarket data.
Social Network Analysis
Identify influential nodes or misinformation spread.
Multimedia Mining
Build a music recommendation system or tag videos automatically.
Interactive Visualization
Create a dashboard for pattern/outlier discovery.
Ensemble Methods
Compare ensemble techniques (boosting, bagging) on a classification task.
Choose a dataset that genuinely interests you (not one from TidyTuesday) and that offers real room for exploration.
Due Dates
Proposal (Peer Review): Wed, Oct 29 — 11:59pm
Revised Proposal (Instructor): Mon, Nov 01 — 11:59pm
Final Write-up & Presentation: Wed, Dec 17 — 11:59pm
Deliverables
Proposal
High-level goal and expanded description
Data and motivation
Weekly “plan of attack”
Repository structure and folder README.md files
Peer Review
Complete assigned proposal and code reviews as GitHub issues using the provided templates.
✨ Optional Track: Vibe Coding
Objective
The Vibe Coding track challenges you to carry out a full data-mining project guided by intuition, iteration, and reflection. Your goal is to document the process (what worked, what didn’t, and how you navigated discovery) and to evaluate the strengths and weaknesses of vibe coding as a workflow.
In choosing this path, you must complete:
The Vibe Coding Report, following the structure below.
What Counts as Vibe Coding
Rapid iterative experimentation and pivoting between approaches
Tool or model switching as insights evolve
Opportunistic feature engineering or parameter tuning
Use of LLMs or AutoML tools for ideation/refactoring (must be disclosed)
Runnable demo of the final path: one clean Quarto document or script that reproduces your “winning” workflow.
Attribution & provenance: list all datasets, external code, and any AI assistance (model and how it was used).
Example Attempt Log Renderer
```python
import pandas as pd
from pathlib import Path
import plotly.graph_objects as go

# Path to the attempt log (one row per vibe-coding attempt)
log_path = Path("data/vibe_log.csv")

if log_path.exists():
    # Read and clean data
    vibe_log = pd.read_csv(log_path)

    if "timestamp" in vibe_log.columns:
        vibe_log["timestamp"] = pd.to_datetime(vibe_log["timestamp"], errors="coerce")

    # Replace boolean AI flag with readable symbols
    if "ai_used" in vibe_log.columns:
        vibe_log["ai_used"] = vibe_log["ai_used"].apply(
            lambda x: "AI" if str(x).lower() in ["true", "1", "yes"] else "—"
        )

    # Select and rename columns if present
    keep_cols = [
        "timestamp", "step_id", "goal", "approach", "evidence_metric",
        "result_summary", "decision_next", "time_spent_min", "ai_used",
    ]
    vibe_log = vibe_log[[c for c in keep_cols if c in vibe_log.columns]]
    vibe_log = vibe_log.rename(columns={
        "timestamp": "When",
        "step_id": "Step",
        "goal": "Goal",
        "approach": "Approach",
        "evidence_metric": "Evidence",
        "result_summary": "What Happened",
        "decision_next": "Next Decision",
        "time_spent_min": "Minutes",
        "ai_used": "AI?",
    })

    # Display nicely formatted table with Plotly
    fig = go.Figure(data=[go.Table(
        header=dict(
            values=list(vibe_log.columns),
            fill_color="lightgrey",
            align="left",
        ),
        cells=dict(
            values=[vibe_log[col] for col in vibe_log.columns],
            fill_color="white",
            align="left",
        ),
    )])
    fig.update_layout(title_text="Vibe Coding Attempt Log")
    fig.show()
else:
    # Say so explicitly instead of silently rendering nothing
    print(f"No attempt log found at {log_path}")
```
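The renderer above reads `data/vibe_log.csv` and looks for columns such as `timestamp`, `step_id`, and `ai_used`. One way to keep that log machine-readable is to append a row after every attempt with a small helper. This is a minimal sketch: `log_attempt` is a hypothetical name, not part of any course template, and it uses only Python’s standard `csv` module, with the column order taken from the renderer’s `keep_cols`.

```python
import csv
from datetime import datetime
from pathlib import Path

# Column order mirrors the fields the attempt-log renderer looks for.
LOG_COLUMNS = [
    "timestamp", "step_id", "goal", "approach", "evidence_metric",
    "result_summary", "decision_next", "time_spent_min", "ai_used",
]


def log_attempt(log_path: Path, **fields) -> None:
    """Append one attempt to the CSV log (hypothetical helper).

    Writes a header row when the file is new or empty. Unknown field
    names raise ValueError; missing fields are written as blanks.
    """
    unknown = set(fields) - set(LOG_COLUMNS)
    if unknown:
        raise ValueError(f"Unknown log fields: {sorted(unknown)}")

    # Default the timestamp to "now" in ISO 8601 if the caller omits it.
    fields.setdefault("timestamp", datetime.now().isoformat(timespec="seconds"))

    log_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not log_path.exists() or log_path.stat().st_size == 0
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_COLUMNS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow(fields)
```

For example, `log_attempt(Path("data/vibe_log.csv"), step_id=3, goal="Try gradient boosting", ai_used=True)` adds one row. Because `ai_used=True` is written as `True`, the renderer’s `str(x).lower()` check displays it as “AI”.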
Vibe Coding Rubric Mapping
| Component | Points | Notes |
|---|---|---|
| Proposal | 10 | Include evaluation plan for vibe workflow |
| Presentation | 30 | Same as standard; emphasize story arc |
| Write-up | 30 | Problem framing (4), Attempt Log (8), Decision evidence (8), Final pipeline clarity (6), Reflection (4) |
| Reproducibility & Style | 10 | Include clean final path + machine-readable attempt log |
| Within-team Evaluation | 10 | Unchanged |
| Between-team Review | 10 | Add comments on clarity of decision evidence |
Peer Review Additions (for Vibe Projects)
Is the attempt log detailed and decision-driven?
Are pivots tied to evidence or reasoning?
Does the final pipeline align with the reported “winning path”?
Are AI uses disclosed and described responsibly?
AI Use & Attribution Template
> **AI Use & Attribution**: This project used ChatGPT (GPT-5) on 2025-10-15 for ideation and code refactoring.
> Outputs were verified, rewritten, and version-pinned.
> Risks (hallucinations, data leakage, licensing) were mitigated through manual review and documentation.
Write-up or Vibe Report
Your write-up (or Vibe Report) should be 1000–2000 words and include:
Clear introduction and motivation
Methods, analysis, and results
Discussion and conclusion (or reflection, for vibe coding)
Presentation
Presentations are 5 minutes long, and every team member must speak. Use Quarto or another reproducible format. You’ll be evaluated on content, professionalism, teamwork, and creativity.
Repo Organization
Organize your repository logically. At a minimum include:
index.qmd
data/ folder (if applicable)
A README.md documenting each subfolder
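Before submitting, a small script can catch a missing `index.qmd` or an undocumented folder. This is a hypothetical helper (`check_repo_layout` is not provided by the course). It only inspects top-level folders, treats `data/` as optional, and assumes that folders whose names start with `.` or `_` (for example `.git`, or Quarto’s `_freeze` and `_site`) do not need a README.

```python
from pathlib import Path


def check_repo_layout(root: Path) -> list[str]:
    """Return a list of layout problems; an empty list means the basics are present."""
    problems = []

    # The rendered entry point must sit at the repository root.
    if not (root / "index.qmd").is_file():
        problems.append("missing index.qmd at repository root")

    # Every top-level folder (data/ included, if it exists) needs a README.md.
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: hidden/tooling folders such as .git or _freeze are exempt.
        if sub.name.startswith((".", "_")):
            continue
        if not (sub / "README.md").is_file():
            problems.append(f"missing README.md in {sub.name}/")

    return problems
```

Running `print(check_repo_layout(Path(".")))` from the repository root prints an empty list when the basic layout is in place.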
Grading Overview
| Component | Points |
|---|---|
| Proposal | 10 |
| Presentation | 30 |
| Write-up / Vibe Report | 30 |
| Reproducibility & Style | 10 |
| Within-team Peer Eval | 10 |
| Between-team Peer Eval | 10 |
| Total | 100 |
Guidelines
Everything must be reproducible and render without errors.
Use inline code for all reported values.
All AI or external tool use must be clearly cited.
Work collaboratively and resolve merge conflicts early.
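As an illustration of the inline-code rule, the sketch below assumes you render with Quarto 1.4 or later using the Jupyter engine, where an inline expression in the prose is written as `` `{python} name` ``. Compute each reported number in a code cell, then reference the variable in the text instead of typing the number. The labels are made-up placeholder data, and `accuracy_pct` is a hypothetical variable name.

```python
# Made-up placeholder labels; substitute your model's real outputs.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

# Fraction of predictions that match the true labels.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Round once here so the prose and any tables report the same value.
accuracy_pct = round(100 * accuracy, 1)

# In the .qmd prose (outside this cell), write something like:
#   Our classifier reached `{python} accuracy_pct`% accuracy on the test set.
```

With these placeholder labels, 5 of 6 predictions match, so the sentence would render as 83.3%. If the data or model change, the sentence updates on the next render without any manual edits.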
Tips
Be adventurous — try new methods!
Log your reasoning behind pivots if you choose the vibe track.
Each team member’s commits will be reviewed for equity.
Hide code in your presentation but show it in your write-up.
Submission
Submit your final GitHub repository link by Wed, Dec 17 at 11:59pm. Make sure it renders fully before submission.