Group Projects

CS-GY 6313 - Fall 2025

Claudio Silva

NYU Tandon School of Engineering

2025-10-10

Group Projects Overview

You will work in teams of 2-3 people on a visualization project that explores a dataset of your choice.

Key Dates:

Milestone                 Due Date  Details
Proposal                  Oct 20    Problem statement, dataset, questions
Data Analysis & Sketches  Nov 3     Data tables and visualization sketches
First Draft               Nov 17    Initial D3 implementations
Second Draft              Dec 1     Refined narrative and polished visualizations
Final Submission          Dec 8     Complete project with presentation

Why Group Projects?

Working in teams teaches essential professional skills:

  • Collaborative design and development
  • Division of labor and project management
  • Communication and coordination
  • Code review and quality control
  • Presenting and defending design decisions

Logistics:

  • Teams of 2-3 students (3 is recommended)
  • Self-organized (use Discord #project-teams channel)
  • All team members contribute to all milestones
  • Individual contributions will be documented

Project Scope: InfoVis vs. Research

This is NOT a research project. This IS:

  • A demonstration of InfoVis fundamentals
  • An exploration of interesting data
  • A showcase of your D3 implementation skills
  • A well-crafted narrative with visualizations

What we’re looking for:

  • Clear, focused questions about your data
  • Appropriate visualization choices
  • Clean, effective D3 implementations
  • Insightful analysis and findings
  • Professional presentation and writeup

Choosing Your Topic

Choose any dataset that interests you! Some possibilities:

  • Urban data: Transportation, 311 calls, housing, environment
  • Sports: Player statistics, game outcomes, team performance
  • Entertainment: Movies, music, books, streaming trends
  • Health: Nutrition, exercise, medical data, public health
  • Finance: Stocks, cryptocurrencies, economic indicators
  • Science: Climate, weather, biodiversity, astronomy
  • Social: Social media trends, demographics, surveys
  • Your own data: Personal projects, work data, research data

Good data sources: NYC Open Data, Kaggle, Data.gov, APIs (Spotify, Twitter, etc.)

What Makes a Good Project?

Strong projects have:

  1. Clear Problem Statement
    • Specific, focused, and well-motivated
    • Explains why this matters
  2. Rich Dataset(s)
    • Accessible, complete, and appropriate
    • Multiple attributes to explore
    • Temporal and/or spatial dimensions
  3. Coherent Questions
    • Form a logical progression (a story)
    • Can be answered with visualizations
    • Build toward insights
  4. Appropriate Visualizations
    • Match the data and questions
    • Well-designed and clearly labeled
    • Interactive where it adds value

Milestone 1: Project Proposal (Due Oct 20)

Submit a 2-3 page document with:

  1. Problem Statement/Background
    • What problem are you exploring?
    • Why is this interesting and worthwhile?
    • What should readers know about the context?
  2. Dataset(s)
    • Where does the data come from?
    • Who collected it and how?
    • What attributes will you use?
    • What preprocessing is needed?
  3. Domain Questions
    • List 5-8 questions you want to answer
    • Questions should form a logical progression
    • Each question should be answerable with visualization

Example: Problem Statement

Background: Over the years, NYC has implemented a number of initiatives to reduce both the number of vehicle collisions in the city and their severity.

Goal: The goal of this project is to understand how collisions happen in New York City and how they have evolved over time.

Analysis plan: More specifically, we will investigate how collisions and their severity are distributed geographically in order to identify hotspots. We will explore the major contributing factors (i.e., causes) behind collisions and their severity. We will look into how these patterns change depending on whether pedestrians or cyclists are involved, and we will determine whether there are any seasonal or other temporal trends.

Outcome: As a major outcome, we will attempt to recommend possible interventions informed by the insights generated during our analysis.

Example: Dataset Description

For this project we will use the NYC Vehicle Collisions Dataset.

  • Source: NYC Open Data portal
  • Collection: NYPD using FORMS (Finest Online Records Management System)
  • Method: Police officers enter data electronically using Department cellphone or computer
  • Attributes: Date/time, location (lat/lon, borough, zip), number of persons injured/killed, vehicle types, contributing factors

We will use the following attributes:

  • Temporal: Date, time, year, month, day of week
  • Spatial: Latitude, longitude, borough, zip code
  • Severity: Number injured, number killed (total, pedestrians, cyclists)
  • Context: Contributing factors, vehicle types

Derived attributes we plan to create:

  • Time of day categories (morning rush, midday, evening rush, night)
  • Severity categories (property damage only, injury, fatality)
  • Season (winter, spring, summer, fall)
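
A minimal sketch of how these derived attributes could be computed, in plain JavaScript. The field names (datetime, injured, killed), the helper names, and the hour and month boundaries are all assumptions made for illustration; they are not part of the dataset or the course requirements, and your team should pick cutoffs that fit your questions.

```javascript
// Hypothetical helpers for the three derived attributes listed above.
// The hour and month boundaries are assumptions chosen for illustration.

// Map an hour of day (0-23) to a time-of-day category.
function timeOfDay(hour) {
  if (hour >= 6 && hour < 10) return "morning rush";
  if (hour >= 10 && hour < 16) return "midday";
  if (hour >= 16 && hour < 20) return "evening rush";
  return "night";
}

// Map injury and fatality counts to a severity category.
function severity(injured, killed) {
  if (killed > 0) return "fatality";
  if (injured > 0) return "injury";
  return "property damage only";
}

// Map a month number (1-12) to a season (meteorological convention).
function season(month) {
  if (month === 12 || month <= 2) return "winter";
  if (month <= 5) return "spring";
  if (month <= 8) return "summer";
  return "fall";
}

// Attach the derived attributes to one record.
// Assumes record.datetime is a local date-time string such as "2025-01-15T08:30:00".
function derive(record) {
  const when = new Date(record.datetime);
  return {
    ...record,
    timeOfDay: timeOfDay(when.getHours()),
    severity: severity(record.injured, record.killed),
    season: season(when.getMonth() + 1), // getMonth() is 0-based
  };
}
```

For example, derive({ datetime: "2025-01-15T08:30:00", injured: 2, killed: 0 }) adds timeOfDay "morning rush", severity "injury", and season "winter" to the record.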

Example: Domain Questions

Our project aims to find and present answers to the following questions:

  1. How many collisions happen daily in NYC? How many injuries and deaths?
  2. Where do collisions happen? How are collisions distributed across NYC?
  3. Are there areas that are particularly deadly (high injuries/deaths)?
  4. When we focus on pedestrians and cyclists, where do they get injured or die?
  5. How have collisions and their severity evolved over time?
  6. Has the situation improved or worsened in specific areas?
  7. What are the major contributing factors for collisions?
  8. Do different areas have different contributing factors?
  9. Is there a relationship between contributing factors and whether pedestrians/cyclists are involved?

Common Proposal Mistakes

Avoid these pitfalls:

  • ❌ Problem statement too vague (“explore patterns in data”)
  • ❌ Problem too narrow (not complex enough for a project)
  • ❌ Problem too ambitious (unrealistic scope)
  • ❌ Questions can’t be answered with your data
  • ❌ You don’t actually have access to the data
  • ❌ Unclear what data attributes you’ll use
  • ❌ Questions are unrelated (no narrative progression)
  • ❌ Questions too vague or ambiguous

Get feedback early! Bring ideas to office hours or post on Discord.

Milestone 2: Data Analysis & Sketches (Due Nov 3)

For each of your domain questions:

  1. Transform the data to extract the information needed
  2. Create a data table showing the results (sample of the processed data)
  3. Sketch the visualization you plan to create
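
As an illustration of steps 1 and 2, here is a minimal sketch for the example question "How many collisions happen daily in NYC? How many injuries and deaths?". The sample rows and field names (date, injured, killed) are hypothetical, and dailyTotals is an illustrative helper, not a required approach; any tool that produces an equivalent table is fine.

```javascript
// Hypothetical sample rows; field names are assumptions, not the real dataset schema.
const collisions = [
  { date: "2025-01-01", injured: 1, killed: 0 },
  { date: "2025-01-01", injured: 0, killed: 0 },
  { date: "2025-01-02", injured: 2, killed: 1 },
];

// Group rows by date and total the counts for each day.
function dailyTotals(rows) {
  const byDate = new Map();
  for (const row of rows) {
    const day = byDate.get(row.date) ??
      { date: row.date, collisions: 0, injured: 0, killed: 0 };
    day.collisions += 1;
    day.injured += row.injured;
    day.killed += row.killed;
    byDate.set(row.date, day);
  }
  // Sort by date so the table reads chronologically.
  return [...byDate.values()].sort((a, b) => a.date.localeCompare(b.date));
}

// One row per day: this is the data table to include for the question.
const table = dailyTotals(collisions);
console.table(table);
```

The resulting table has one row per day with clearly named columns, which is also the shape a line or bar chart of daily counts would consume.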

Sketches can be:

  • Hand-drawn (pen and paper, or tablet)
  • Created with drawing software (Figma, Excalidraw, etc.)
  • Generated with data viz tools (Tableau, Matplotlib, Vega-Lite)

Important: It’s okay (and expected!) to refine your questions at this stage based on what you discover in the data.

Data Analysis & Sketches: Best Practices

For data tables:

  • Show a representative sample of your processed data
  • Include column headers with clear names
  • Document any transformations or aggregations
  • Show enough rows to understand the structure

For sketches:

  • Include titles, axis labels, and legends
  • Show how data will be mapped to visual elements
  • Indicate interactive elements if applicable
  • Make it clear enough that someone could implement it

Sketches should be YOUR work - not copied from other projects or stock image collections!

Milestone 3: First Draft (Due Nov 17)

Submit a Jupyter notebook or Observable notebook with:

  • Brief introduction to your project
  • For each question:
    • State the question
    • Show the D3 visualization
    • Describe what the visualization shows
    • Answer the question based on the visualization

At this stage:

  • All visualizations should be implemented in D3
  • Focus on getting the basics working
  • Styling/polish can come later
  • Interactivity should be functional (if included)
  • It’s still okay to refine questions if needed

Milestone 4: Second Draft (Due Dec 1)

Transform your notebook into a complete article with:

  • Title - Clear and informative
  • Introduction - Problem, background, motivation, overview of findings
  • Data Description - Sources, collection methods, attributes used
  • Questions and Findings - For each question:
    • Clear question statement
    • Polished D3 visualization
    • Analysis and interpretation
    • Insights and implications
  • Conclusion - Summary of findings, recommendations, limitations

Focus on narrative flow - someone unfamiliar with your project should be able to read and understand it.

Milestone 5: Final Submission (Due Dec 8)

Your final submission includes:

  1. Final notebook - Polished version of second draft with all feedback addressed
  2. Presentation (10 minutes) - Present to class on Dec 5 or Dec 12
  3. Code repository - GitHub repo with all code and documentation
  4. Team contribution statement - Who did what

Presentations will include:

  • Problem and motivation (2 min)
  • Key visualizations and findings (6 min)
  • Conclusions and recommendations (2 min)
  • Q&A with class

Evaluation Criteria

Technical Implementation (35%)

  • D3 code quality and correctness
  • Appropriate use of D3 features
  • Interactivity implementation
  • Code organization and documentation

Visualization Design (30%)

  • Appropriate chart types
  • Effective visual encodings
  • Clear labels and legends
  • Color and layout choices
  • Accessibility considerations

Analysis & Insights (20%)

  • Question quality and coherence
  • Depth of analysis
  • Insight generation
  • Interpretation accuracy

Communication (10%)

  • Narrative flow
  • Writing clarity
  • Presentation quality
  • Professional polish

Teamwork (5%)

  • Equal contribution
  • Coordination evidence
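
To see how the weights interact, here is a small worked sketch. It assumes each category is scored on a 0-100 scale and that the categories are combined as a simple weighted sum; the slides state only the percentages, so the combination rule, the category keys, and the example scores are all assumptions for illustration.

```javascript
// Category weights from the evaluation criteria above (they sum to 100).
const weights = {
  technical: 35,
  design: 30,
  analysis: 20,
  communication: 10,
  teamwork: 5,
};

// Combine per-category scores (each assumed to be on a 0-100 scale)
// into a weighted total on the same scale.
function weightedTotal(scores) {
  let total = 0;
  for (const [category, weight] of Object.entries(weights)) {
    total += (scores[category] * weight) / 100;
  }
  return total;
}

// Hypothetical scores for illustration only.
const example = {
  technical: 90,
  design: 80,
  analysis: 85,
  communication: 70,
  teamwork: 100,
};
console.log(weightedTotal(example)); // 31.5 + 24 + 17 + 7 + 5 = 84.5
```

Under these assumptions, Technical Implementation and Visualization Design together account for 65 of the 100 points.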

Example Projects from Previous Years

Strong projects typically:

  • Focus on specific aspects of a dataset rather than trying to show everything
  • Have 5-8 well-designed visualizations that build on each other
  • Include at least 2-3 interactive visualizations
  • Use a mix of chart types appropriate to the questions
  • Derive actionable insights or recommendations
  • Are well-written and professionally presented

I will share example projects on Discord from previous offerings of this course. Look at them for inspiration, but make your project your own!

Tips for Success

  1. Start early - Form teams and choose topics this week
  2. Explore data before proposing - Make sure it has what you need
  3. Keep questions focused - Better to do 6 questions well than 10 poorly
  4. Iterate on designs - Your first sketch won’t be your best
  5. Test on others - Show your visualizations to friends, get feedback
  6. Use version control - Git is your friend, commit frequently
  7. Divide work clearly - But review each other’s code
  8. Document as you go - Future you will thank present you
  9. Ask for help - Office hours, Discord, lab sessions

Getting Help

Office Hours:

  • Instructor: Fridays 1:30-2:30 PM (after class, Room 215)
  • TA office hours: Posted on Discord

Resources:

  • Discord #group-projects channel
  • Weekly check-ins during lab sessions
  • Milestone feedback at each stage

Bring specific questions:

  • “How should I handle this data issue?” ✓
  • “Which visualization is better for this?” ✓
  • “How do I make my project better?” ✗ (too vague)

Finding Teammates

Use Discord #project-teams channel to:

  • Post your interests and availability
  • Find others interested in similar topics
  • Coordinate team formation

When forming teams, consider:

  • Complementary skills (design + coding, analysis + presentation)
  • Similar commitment levels and standards
  • Compatible schedules for meetings
  • Communication styles

Teams must be finalized by Oct 17 (one week from today).

Project Ideas to Get Started

If you’re not sure where to start, consider:

  • NYC 311 Requests - What do New Yorkers complain about? Patterns over time/space?
  • Sports Analytics - NBA player performance, soccer match outcomes, fantasy sports
  • Movie/TV Data - Box office trends, IMDb ratings, streaming popularity
  • Climate Data - Temperature trends, extreme weather events, CO2 levels
  • COVID-19 Data - Case trends, vaccination rates, policy impacts
  • Music Analysis - Spotify streaming data, genre evolution, artist networks
  • Stock Market - Price movements, trading volumes, sector performance
  • Reddit/Twitter - Topic trends, sentiment analysis, viral content

Browse: NYC Open Data, Kaggle, Data.gov, or use APIs (Spotify, TMDB, etc.)

Timeline Recap

Week  Date    Milestone         What’s Due
8     Oct 20  Proposal          Problem, data, questions (2-3 pages)
10    Nov 3   Data & Sketches   Data tables and visualization sketches
12    Nov 17  First Draft       All visualizations implemented in D3
14    Dec 1   Second Draft      Complete article with polished visuals
15    Dec 8   Final Submission  Final notebook, presentation, code

Presentations: Dec 5 and Dec 12 (teams will be assigned)

Form teams by: Oct 17 (next week!)

Questions?

Discord: #group-projects

Email: csilva@nyu.edu

Office Hours: Fridays 1:30-2:30 PM (Room 215)

Next Steps:

  1. Browse datasets (NYC Open Data, Kaggle, etc.)
  2. Find teammates (#project-teams on Discord)
  3. Start drafting proposal ideas