Final Project

Overview

Your task for this project is to delve deep into the realm of data mining.

This is intentionally open-ended: design a project that best mirrors your unique interests and expertise. The project must also incorporate an element or concept you've taught yourself; this could be an algorithm we haven't explored in class, a specialized data mining library or tool, or a unique data preprocessing workflow.

If you’re uncertain whether your “newly acquired knowledge” counts, reach out and ask!


Ideas

Your first task is to come up with a goal for your project. Here are a few ideas:

Time Series Analysis

  • Predict stock market prices using historical data and ARIMA, LSTM, or Prophet.

Anomaly Detection

  • Detect credit card fraud or identify network intrusions (a starter sketch follows this list).

Association Rule Mining

  • Perform market basket analysis on supermarket data.

Social Network Analysis

  • Identify influential nodes or misinformation spread.

Multimedia Mining

  • Build a music recommendation system or tag videos automatically.

Interactive Visualization

  • Create a dashboard for pattern/outlier discovery.

Ensemble Methods

  • Compare ensemble techniques (boosting, bagging) on a classification task.
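
To make the scope concrete, here is a minimal anomaly detection sketch using scikit-learn's IsolationForest on synthetic data; your project would swap in a real dataset, thoughtful features, and a proper evaluation.

Code
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data: 500 "normal" points plus 10 scattered outliers
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0, scale=1, size=(500, 2)),
    rng.uniform(low=-6, high=6, size=(10, 2)),
])

# Fit an isolation forest; fit_predict returns -1 for anomalies, 1 for normal points
clf = IsolationForest(contamination=0.02, random_state=42)
labels = clf.fit_predict(X)
print(f"Flagged {int((labels == -1).sum())} of {len(X)} points as potential anomalies")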

Choose a dataset that genuinely resonates with you (not one from TidyTuesday) and that offers real exploration potential.


Due Dates

  • Proposal (for Peer Review): Wed, Oct 29 — 11:59pm
  • Peer-reviews (Proposals): Wed, Nov 05 — 11:59pm
  • Revised Proposal (Instructor): Mon, Nov 10 — 11:59pm
  • Final Write-up & Presentation: Wed, Dec 17 — 11:59pm

Deliverables

Proposal

  • High-level goal and expanded description
  • Data and motivation
  • Weekly “plan of attack”
  • Repository structure and folder README.md files

Peer Review

Complete assigned proposal and code reviews as GitHub issues using the provided templates.

Reviewer tasks

Each team will review the proposals of two other teams. Peer review opens on Friday, October 31, when you will gain access to the project repos of the two teams whose work you're reviewing; reviews are due Wednesday, November 5 (see Due Dates above). Start your review by cloning the team's repo and re-rendering it locally to make sure you can reproduce their work, then add an issue to their repo with your peer review feedback.

The reviewer / reviewee assignments are as follows:

Reviewee   Reviewer 1   Reviewer 2   Project Link
Team 1     Team 2       Team 3       View Repo
Team 2     Team 3       Team 4       View Repo
Team 3     Team 4       Team 5       View Repo
Team 4     Team 5       Team 1       View Repo
Team 5     Team 1       Team 2       View Repo

Each team will develop its review and submit it as an issue on the reviewee's project repo. To do so, go to the Issues tab, click the green New issue button at the top right, and then click the green Get started button for the issue template titled Peer review.

This will start a new issue pre-filled with the peer review form. Update the introductory paragraph with your name as a participant in the review, then answer each question in the space provided beneath it. You're expected to be thorough, but that doesn't necessarily require lengthy responses.

Remember, your goal is to help the team whose proposal you're reviewing. They will not lose points for issues you point out, as long as they address them before I review the proposals. Be critical but respectful. Also remember that you will be evaluated on the quality of your review, which is an additional incentive to do a good job.

Reviewee tasks

Once you receive feedback from your peers, address it by directly updating your proposal and making any other needed changes to your repo. You can make these updates in a single commit or spread them across several. Either way, the final commit that addresses the peer review should include a keyword in its message that closes the peer review issue: close, closes, closed, fix, fixes, fixed, resolve, resolves, or resolved, followed by the issue number (shown next to the issue title). For example, your commit message might say "Finished updates based on peer review, fixes #1".


✨ Optional Track: Vibe Coding

Objective

The Vibe Coding track challenges you to explore a full data-mining project guided by intuition, iteration, and reflection. Your goal is to document the process — what worked, what didn’t, and how you navigated discovery — and to evaluate the strengths and weaknesses of vibe coding as a workflow.

If you choose this track, you must complete the Vibe Report, following the structure below.

What Counts as Vibe Coding

  • Rapid iterative experimentation and pivoting between approaches
  • Tool or model switching as insights evolve
  • Opportunistic feature engineering or parameter tuning
  • Use of LLMs or AutoML tools for ideation/refactoring (must be disclosed)

Required Artifacts

  1. Vibe Report (1000–2000 words). Must include:

    • Problem framing & success criteria
    • Attempt log summary (with evidence-based decisions)
    • What worked & what didn’t
    • Turning points and lessons learned
    • Trade-offs, ethics, reproducibility reflections
    • Conclusions about vibe coding as a strategy
  2. Attempt Log (CSV or embedded table). Columns: timestamp, step_id, goal, approach, evidence_metric, result_summary, decision_next, time_spent_min, ai_used, llm_role, prompt_id. A minimal logging sketch follows this list.

  3. Runnable demo of the final path. One clean Quarto document or script that reproduces your “winning” workflow.

  4. Attribution & Provenance. Must list datasets, external code, and any AI assistance (model + how used).
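
To keep the attempt log machine-readable from day one, you might append a row as each attempt finishes. Below is a minimal sketch of such a helper; the file path, helper name, and example values are illustrative, not prescribed.

Code
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("data/vibe_log.csv")  # illustrative path; match your repo layout
COLUMNS = [
    "timestamp", "step_id", "goal", "approach", "evidence_metric",
    "result_summary", "decision_next", "time_spent_min", "ai_used",
    "llm_role", "prompt_id",
]

def log_attempt(**fields):
    """Append one attempt to the log, writing a header row if the file is new."""
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    row = {col: fields.get(col, "") for col in COLUMNS}
    if not row["timestamp"]:
        row["timestamp"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    is_new = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

# Example usage (values are made up for illustration):
log_attempt(
    step_id="003",
    goal="Reduce overfitting in baseline model",
    approach="Added L2 regularization",
    evidence_metric="validation accuracy",
    result_summary="val acc 0.81 -> 0.84",
    decision_next="Tune regularization strength",
    time_spent_min=25,
    ai_used=True,
    llm_role="suggested starting hyperparameters",
    prompt_id="p-017",
)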


Example Attempt Log Renderer

Code
import pandas as pd
from pathlib import Path
import plotly.graph_objects as go

# Path to the log file
log_path = Path("data/vibe_log.csv")

if log_path.exists():
    # Read and clean data
    vibe_log = pd.read_csv(log_path)
    if "timestamp" in vibe_log.columns:
        vibe_log["timestamp"] = pd.to_datetime(vibe_log["timestamp"], errors="coerce")

    # Replace boolean AI flag with readable symbols
    if "ai_used" in vibe_log.columns:
        vibe_log["ai_used"] = vibe_log["ai_used"].apply(lambda x: "AI" if str(x).lower() in ["true", "1", "yes"] else "—")

    # Select and rename columns if present
    keep_cols = [
        "timestamp", "step_id", "goal", "approach", "evidence_metric",
        "result_summary", "decision_next", "time_spent_min", "ai_used"
    ]
    vibe_log = vibe_log[[c for c in keep_cols if c in vibe_log.columns]]

    vibe_log = vibe_log.rename(columns={
        "timestamp": "When",
        "step_id": "Step",
        "goal": "Goal",
        "approach": "Approach",
        "evidence_metric": "Evidence",
        "result_summary": "What Happened",
        "decision_next": "Next Decision",
        "time_spent_min": "Minutes",
        "ai_used": "AI?"
    })

    # Display nicely formatted table with Plotly
    fig = go.Figure(data=[go.Table(
        header=dict(
            values=list(vibe_log.columns),
            fill_color="lightgrey",
            align="left"
        ),
        cells=dict(
            values=[vibe_log[col] for col in vibe_log.columns],
            fill_color="white",
            align="left"
        )
    )])
    fig.update_layout(title_text="Vibe Coding Attempt Log")
    fig.show()
else:
    print(f"No attempt log found at {log_path}")

Vibe Coding Rubric Mapping

Component                 Points   Notes
Proposal                  10       Include evaluation plan for vibe workflow
Presentation              30       Same as standard; emphasize story arc
Write-up                  30       Problem framing (4), Attempt Log (8), Decision evidence (8), Final pipeline clarity (6), Reflection (4)
Reproducibility & Style   10       Include clean final path + machine-readable attempt log
Within-team Evaluation    10       Unchanged
Between-team Review       10       Add comments on clarity of decision evidence

Peer Review Additions (for Vibe Projects)

  • Is the attempt log detailed and decision-driven?
  • Are pivots tied to evidence or reasoning?
  • Does the final pipeline align with the reported “winning path”?
  • Are AI uses disclosed and described responsibly?

AI Use & Attribution Template

> **AI Use & Attribution**: This project used ChatGPT (GPT-5) on 2025-11-03 for ideation and code refactoring.  
> Outputs were verified, rewritten, and version-pinned.  
> Risks (hallucinations, data leakage, licensing) were mitigated through manual review and documentation.

Write-up or Vibe Report

Your write-up (or Vibe Report) should be 1000–2000 words and include:

  • Clear introduction and motivation
  • Methods, analysis, and results
  • Discussion and conclusion (or reflection, for vibe coding)

Presentation

Presentations are 5 minutes, and every team member must speak. Use Quarto or another reproducible format. You'll be evaluated on content, professionalism, teamwork, and creativity.


Repo Organization

Organize your repository logically. At a minimum include:

  • index.qmd
  • data/ folder (if applicable)
  • Documentation in each subfolder
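
A layout like the following would satisfy these requirements (folder names beyond the required ones are only suggestions):

Code
project-repo/
├── index.qmd          # main write-up
├── README.md          # top-level overview
├── data/
│   ├── README.md      # data dictionary and provenance
│   └── vibe_log.csv   # attempt log, if on the vibe track
├── notebooks/
│   └── README.md      # exploratory analyses
└── presentation/
    └── README.md      # slides source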

Grading Overview

Component                 Points
Proposal                  10
Presentation              30
Write-up / Vibe Report    30
Reproducibility & Style   10
Within-team Peer Eval     10
Between-team Peer Eval    10
Total                     100

Guidelines

  • Everything must be reproducible and render without errors.
  • Use inline code for all reported values (see the example after this list).
  • All AI or external tool use must be clearly cited.
  • Work collaboratively and resolve merge conflicts early.
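
For example, with Quarto's inline code syntax (the `{python}` form requires Quarto 1.4+; `accuracy` is a placeholder for a value computed earlier in the document):

Code
The final model reached an accuracy of `{python} round(accuracy, 3)` on the test set.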

Tips

  • Be adventurous — try new methods!
  • Log your reasoning behind pivots if you choose the vibe track.
  • Each team member’s commits will be reviewed for equity.
  • Hide code in your presentation but show it in your write-up.

Submission

Submit your final GitHub repository link by Wed, Dec 17 at 11:59pm. Make sure it renders fully before submission.