Final Project

Overview

Your task for this project is to carry out an in-depth data mining analysis on a topic of your choosing.

This is intentionally vague: choose a project that best reflects your interests and expertise. The project should also incorporate an element or concept you’ve taught yourself. This could be an algorithm we haven’t explored in class, a specialized data mining library or tool, or a unique data preprocessing workflow.

If you’re uncertain whether your “newly acquired knowledge” counts, reach out and ask!


Ideas

Your first task is to come up with a goal for your project. Here are a few ideas:

Time Series Analysis

  • Predict stock market prices using historical data and ARIMA, LSTM, or Prophet.

Anomaly Detection

  • Detect credit card fraud or identify network intrusions.

Association Rule Mining

  • Perform market basket analysis on supermarket data.

Social Network Analysis

  • Identify influential nodes or misinformation spread.

Multimedia Mining

  • Build a music recommendation system or tag videos automatically.

Interactive Visualization

  • Create a dashboard for pattern/outlier discovery.

Ensemble Methods

  • Compare ensemble techniques (boosting, bagging) on a classification task.

Choose a dataset that genuinely resonates with you — not one from TidyTuesday — and offers real exploration potential.


Due Dates

  • Proposal (Peer Review): Wed, Oct 29 — 11:59pm
  • Revised Proposal (Instructor): Mon, Nov 01 — 11:59pm
  • Final Write-up & Presentation: Wed, Dec 17 — 11:59pm

Deliverables

Proposal

  • High-level goal and expanded description
  • Data and motivation
  • Weekly “plan of attack”
  • Repository structure and folder README.md files

Peer Review

Complete assigned proposal and code reviews as GitHub issues using the provided templates.


✨ Optional Track: Vibe Coding

Objective

The Vibe Coding track challenges you to explore a full data-mining project guided by intuition, iteration, and reflection. Your goal is to document the process — what worked, what didn’t, and how you navigated discovery — and to evaluate the strengths and weaknesses of vibe coding as a workflow.

In choosing this path, you must complete:

  • The Vibe Coding Report, following the structure below.

What Counts as Vibe Coding

  • Rapid iterative experimentation and pivoting between approaches
  • Tool or model switching as insights evolve
  • Opportunistic feature engineering or parameter tuning
  • Use of LLMs or AutoML tools for ideation/refactoring (must be disclosed)

Required Artifacts

  1. Vibe Report (1000–2000 words). Must include:

    • Problem framing & success criteria
    • Attempt log summary (with evidence-based decisions)
    • What worked & what didn’t
    • Turning points and lessons learned
    • Trade-offs, ethics, reproducibility reflections
    • Conclusions about vibe coding as a strategy
  2. Attempt Log (CSV or embedded table). Columns: timestamp, step_id, goal, approach, evidence_metric, result_summary, decision_next, time_spent_min, ai_used, llm_role, prompt_id

  3. Runnable demo of the final path. One clean Quarto document or script that reproduces your “winning” workflow.

  4. Attribution & Provenance. Must list datasets, external code, and any AI assistance (model + how used).
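The Attempt Log columns listed above can be filled in programmatically as you work, rather than reconstructed at the end. The following is a minimal sketch, not a required tool. It uses only the Python standard library. The helper name `log_attempt` is hypothetical, and the UTC ISO 8601 timestamp format is an assumption, since the column list does not prescribe one.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Column names copied from the Attempt Log requirement.
LOG_COLUMNS = [
    "timestamp", "step_id", "goal", "approach", "evidence_metric",
    "result_summary", "decision_next", "time_spent_min", "ai_used",
    "llm_role", "prompt_id",
]


def log_attempt(log_path, **fields):
    """Append one attempt to the CSV log (hypothetical helper).

    Writes the header first if the file is new or empty.
    """
    unknown = set(fields) - set(LOG_COLUMNS)
    if unknown:
        raise ValueError(f"Unknown log columns: {sorted(unknown)}")

    # Assumption: default to the current UTC time in ISO 8601 format
    # unless the caller supplies a timestamp.
    fields.setdefault(
        "timestamp",
        datetime.now(timezone.utc).isoformat(timespec="seconds"),
    )

    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not log_path.exists() or log_path.stat().st_size == 0

    with log_path.open("a", newline="", encoding="utf-8") as f:
        # Columns not passed by the caller are written as empty strings.
        writer = csv.DictWriter(f, fieldnames=LOG_COLUMNS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow(fields)
    return fields
```

For example, `log_attempt("data/vibe_log.csv", step_id="s1", goal="baseline model", ai_used=False)` appends one row and leaves the unspecified columns blank. Rejecting unknown column names is a design choice that keeps the log machine-readable, so a typo does not silently create a new column.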


Example Attempt Log Renderer

from pathlib import Path

import pandas as pd
import plotly.graph_objects as go

# Path to the log file
log_path = Path("data/vibe_log.csv")

if log_path.exists():
    # Read and clean data
    vibe_log = pd.read_csv(log_path)
    if "timestamp" in vibe_log.columns:
        vibe_log["timestamp"] = pd.to_datetime(vibe_log["timestamp"], errors="coerce")

    # Replace boolean AI flag with readable symbols
    if "ai_used" in vibe_log.columns:
        vibe_log["ai_used"] = vibe_log["ai_used"].apply(lambda x: "AI" if str(x).lower() in ["true", "1", "yes"] else "—")

    # Select and rename columns if present
    keep_cols = [
        "timestamp", "step_id", "goal", "approach", "evidence_metric",
        "result_summary", "decision_next", "time_spent_min", "ai_used"
    ]
    vibe_log = vibe_log[[c for c in keep_cols if c in vibe_log.columns]]

    vibe_log = vibe_log.rename(columns={
        "timestamp": "When",
        "step_id": "Step",
        "goal": "Goal",
        "approach": "Approach",
        "evidence_metric": "Evidence",
        "result_summary": "What Happened",
        "decision_next": "Next Decision",
        "time_spent_min": "Minutes",
        "ai_used": "AI?"
    })

    # Display nicely formatted table with Plotly
    fig = go.Figure(data=[go.Table(
        header=dict(
            values=list(vibe_log.columns),
            fill_color="lightgrey",
            align="left"
        ),
        cells=dict(
            values=[vibe_log[col] for col in vibe_log.columns],
            fill_color="white",
            align="left"
        )
    )])
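    fig.update_layout(title_text="Vibe Coding Attempt Log")
    fig.show()
else:
    # Say so explicitly instead of silently rendering nothing
    print(f"No attempt log found at {log_path}")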

Vibe Coding Rubric Mapping

| Component | Points | Notes |
|---|---|---|
| Proposal | 10 | Include evaluation plan for vibe workflow |
| Presentation | 30 | Same as standard; emphasize story arc |
| Write-up | 30 | Problem framing (4), Attempt Log (8), Decision evidence (8), Final pipeline clarity (6), Reflection (4) |
| Reproducibility & Style | 10 | Include clean final path + machine-readable attempt log |
| Within-team Evaluation | 10 | Unchanged |
| Between-team Review | 10 | Add comments on clarity of decision evidence |

Peer Review Additions (for Vibe Projects)

  • Is the attempt log detailed and decision-driven?
  • Are pivots tied to evidence or reasoning?
  • Does the final pipeline align with the reported “winning path”?
  • Are AI uses disclosed and described responsibly?
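Some of the questions above can be partly pre-checked mechanically before a human reads the log. The sketch below is one possible approach, not part of the required workflow. It flags missing columns and rows with an empty `evidence_metric` or `decision_next`. The function name `check_attempt_log` is hypothetical, and treating a blank cell as "no evidence" is an assumption. It cannot judge whether the recorded evidence is actually convincing.

```python
import csv
from pathlib import Path

# Column names copied from the Attempt Log requirement.
REQUIRED_COLUMNS = [
    "timestamp", "step_id", "goal", "approach", "evidence_metric",
    "result_summary", "decision_next", "time_spent_min", "ai_used",
    "llm_role", "prompt_id",
]


def check_attempt_log(log_path):
    """Return a list of human-readable problems found in the log (hypothetical helper)."""
    problems = []
    with Path(log_path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []

        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            problems.append(f"missing columns: {', '.join(missing)}")

        # Data rows are numbered from 1; the header is not counted.
        for row_no, row in enumerate(reader, start=1):
            for col in ("evidence_metric", "decision_next"):
                # Assumption: a blank cell means the pivot was not justified.
                if col in header and not (row.get(col) or "").strip():
                    problems.append(f"row {row_no}: empty {col}")
    return problems
```

An empty list from `check_attempt_log("data/vibe_log.csv")` means only that every row records some evidence and a next decision. Reviewers still need to read whether each pivot follows from that evidence.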

AI Use & Attribution Template

> **AI Use & Attribution**: This project used ChatGPT (GPT-5) on 2025-10-15 for ideation and code refactoring.  
> Outputs were verified, rewritten, and version-pinned.  
> Risks (hallucinations, data leakage, licensing) were mitigated through manual review and documentation.

Write-up or Vibe Report

Your write-up (or Vibe Report) should be 1000–2000 words and include:

  • Clear introduction and motivation
  • Methods, analysis, and results
  • Discussion and conclusion (or reflection, for vibe coding)

Presentation

Presentations are 5 minutes long, and every team member must speak. Use Quarto or another reproducible format. You’re evaluated on content, professionalism, teamwork, and creativity.


Repo Organization

Organize your repository logically. At a minimum, include:

  • index.qmd
  • data/ folder (if applicable)
  • Documentation in each subfolder

Grading Overview

| Component | Points |
|---|---|
| Proposal | 10 |
| Presentation | 30 |
| Write-up / Vibe Report | 30 |
| Reproducibility & Style | 10 |
| Within-team Peer Eval | 10 |
| Between-team Peer Eval | 10 |
| Total | 100 |

Guidelines

  • Everything must be reproducible and render without errors.
  • Use inline code for all reported values.
  • All AI or external tool use must be clearly cited.
  • Work collaboratively and resolve merge conflicts early.

Tips

  • Be adventurous — try new methods!
  • Log your reasoning behind pivots if you choose the vibe track.
  • Each team member’s commits will be reviewed for equity.
  • Hide code in your presentation but show it in your write-up.

Submission

Submit your final GitHub repository link by Wed, Dec 17 at 11:59pm. Make sure it renders fully before submission.