Group Projects

Group Projects Overview

You will work in teams of 2-3 people on a visualization project that explores a dataset of your choice.

Key Dates:

Milestone	Due Date	Details
Proposal	Oct 20	Problem statement, dataset, questions
Data Analysis & Sketches	Nov 3	Data tables and visualization sketches
First Draft	Nov 17	Initial D3 implementations
Second Draft	Dec 1	Refined narrative and polished visualizations
Final Submission	Dec 8	Complete project with presentation

Why Group Projects?

Working in teams teaches essential professional skills:

Collaborative design and development
Division of labor and project management
Communication and coordination
Code review and quality control
Presenting and defending design decisions

Logistics:

Teams of 2-3 students (3 is recommended)
Self-organized (use Discord #project-teams channel)
All team members contribute to all milestones
Individual contributions will be documented

Project Scope: InfoVis vs. Research

This is NOT a research project. This IS:

A demonstration of InfoVis fundamentals
An exploration of interesting data
A showcase of your D3 implementation skills
A well-crafted narrative with visualizations

What we’re looking for:

Clear, focused questions about your data
Appropriate visualization choices
Clean, effective D3 implementations
Insightful analysis and findings
Professional presentation and writeup

Choosing Your Topic

Choose any dataset that interests you! Some possibilities:

Urban data: Transportation, 311 calls, housing, environment
Sports: Player statistics, game outcomes, team performance
Entertainment: Movies, music, books, streaming trends
Health: Nutrition, exercise, medical data, public health
Finance: Stocks, cryptocurrencies, economic indicators
Science: Climate, weather, biodiversity, astronomy
Social: Social media trends, demographics, surveys
Your own data: Personal projects, work data, research data

Good data sources: NYC Open Data, Kaggle, Data.gov, APIs (Spotify, Twitter, etc.)

What Makes a Good Project?

Strong projects have:

Clear Problem Statement
- Specific, focused, and well-motivated
- Explains why this matters
Rich Dataset(s)
- Accessible, complete, and appropriate
- Multiple attributes to explore
- Temporal and/or spatial dimensions
Coherent Questions
- Form a logical progression (a story)
- Can be answered with visualizations
- Build toward insights
Appropriate Visualizations
- Match the data and questions
- Well-designed and clearly labeled
- Interactive where it adds value

Milestone 1: Project Proposal (Due Oct 20)

Submit a 2-3 page document with:

Problem Statement/Background
- What problem are you exploring?
- Why is this interesting and worthwhile?
- What should readers know about the context?
Dataset(s)
- Where does the data come from?
- Who collected it and how?
- What attributes will you use?
- What preprocessing is needed?
Domain Questions
- List 5-8 questions you want to answer
- Questions should form a logical progression
- Each question should be answerable with visualization

Example: Problem Statement

Background: Over the years, New York City has implemented a number of initiatives to reduce both the number of vehicle collisions and the severity of the resulting accidents.

Goal: The goal of this project is to understand how collisions happen in New York City and how they have evolved over time.

Analysis plan: More specifically, we will investigate how collisions and their severity are distributed geographically in order to identify hotspots. We will explore the major contributing factors (i.e., causes) behind collisions and their severity. We will examine how these trends change depending on whether pedestrians or cyclists are involved, and we will determine whether there are seasonal or other temporal trends.

Outcome: As a major outcome, we will attempt to recommend possible interventions informed by the insights generated during our analysis.

Example: Dataset Description

For this project we will use the NYC Vehicle Collisions Dataset.

Source: NYC Open Data portal
Collection: NYPD using FORMS (Finest Online Records Management System)
Method: Police officers enter data electronically using a Department cellphone or computer
Attributes: Date/time, location (lat/lon, borough, zip), number of persons injured/killed, vehicle types, contributing factors

We will use the following attributes:

Temporal: Date, time, year, month, day of week
Spatial: Latitude, longitude, borough, zip code
Severity: Number injured, number killed (total, pedestrians, cyclists)
Context: Contributing factors, vehicle types

Derived attributes we plan to create:

Time of day categories (morning rush, midday, evening rush, night)
Severity categories (property damage only, injury, fatality)
Season (winter, spring, summer, fall)
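
The derived attributes listed above could be computed along the following lines. This is a minimal Python sketch, not part of the original example proposal: the function names, the hour cutoffs for each time-of-day category, and the month-based season boundaries are all assumptions that a team would need to choose and justify for its own data.

```python
from datetime import datetime


def time_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a time-of-day category (hour cutoffs are assumed)."""
    hour = ts.hour
    if 6 <= hour < 10:
        return "morning rush"
    if 10 <= hour < 16:
        return "midday"
    if 16 <= hour < 20:
        return "evening rush"
    return "night"


def severity(injured: int, killed: int) -> str:
    """Classify a collision by its most serious outcome."""
    if killed > 0:
        return "fatality"
    if injured > 0:
        return "injury"
    return "property damage only"


def season(ts: datetime) -> str:
    """Map a timestamp to a season using whole months (an assumed convention)."""
    month = ts.month
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


# Hypothetical record: a collision on a July morning with one injury
when = datetime(2023, 7, 4, 8, 30)
print(time_of_day(when), severity(1, 0), season(when))
```

Documenting the chosen cutoffs next to the code makes it easier to explain the categories later in the data description.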

Example: Domain Questions

Our project aims to find and present answers to the following questions:

How many collisions happen daily in NYC? How many injuries and deaths?
Where do collisions happen? How are collisions distributed across NYC?
Are there areas that are particularly deadly (high injuries/deaths)?
When we focus on pedestrians and cyclists, where do they get injured or die?
How have collisions and their severity evolved over time?
Has the situation improved or worsened in specific areas?
What are the major contributing factors for collisions?
Do different areas have different contributing factors?
Is there a relationship between contributing factors and whether pedestrians/cyclists are involved?

Common Proposal Mistakes

Avoid these pitfalls:

❌ Problem statement too vague (“explore patterns in data”)
❌ Problem too narrow (not complex enough for a project)
❌ Problem too ambitious (unrealistic scope)
❌ Questions can’t be answered with your data
❌ You don’t actually have access to the data
❌ Unclear what data attributes you’ll use
❌ Questions are unrelated (no narrative progression)
❌ Questions too vague or ambiguous

Get feedback early! Bring ideas to office hours or post on Discord.
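
Several of the pitfalls above (questions the data cannot answer, unclear attributes) can often be caught with a quick completeness check before writing the proposal. The following is a minimal Python sketch using only the standard library. The function name, the column names, and the inline sample are hypothetical; in practice you would read your own CSV file instead.

```python
import csv
import io


def missing_share(rows, required_columns):
    """Return {column: share of rows where the value is absent or blank}."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to inspect")
    shares = {}
    for col in required_columns:
        # A column that does not exist at all counts as missing in every row
        missing = sum(1 for r in rows if not (r.get(col) or "").strip())
        shares[col] = missing / len(rows)
    return shares


# Hypothetical extract standing in for a real dataset file
sample = io.StringIO(
    "date,borough,latitude,injured\n"
    "2023-01-05,BROOKLYN,40.68,1\n"
    "2023-01-06,,40.75,0\n"
    "2023-01-07,QUEENS,,0\n"
    "2023-01-08,,40.71,2\n"
)

report = missing_share(
    csv.DictReader(sample),
    ["date", "borough", "latitude", "contributing_factor"],
)
for col, share in report.items():
    print(f"{col}: {share:.0%} missing")
```

A column that is mostly missing, or absent entirely, is a sign that a question depending on it should be revised or dropped.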

Milestone 2: Data Analysis & Sketches (Due Nov 3)

For each of your domain questions:

Transform the data to extract the information needed
Create a data table showing the results (sample of the processed data)
Sketch the visualization you plan to create
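
As an illustration of the first two steps, the sketch below aggregates raw records into a small data table for a question such as "Where do collisions happen?". It is a minimal Python example under stated assumptions: the records, field names, and function names are hypothetical, and a real project might do the same aggregation with other tools.

```python
# Hypothetical sample records; field names are illustrative only
records = [
    {"borough": "BROOKLYN", "injured": 1, "killed": 0},
    {"borough": "QUEENS", "injured": 0, "killed": 0},
    {"borough": "BROOKLYN", "injured": 0, "killed": 1},
    {"borough": "MANHATTAN", "injured": 2, "killed": 0},
    {"borough": "BROOKLYN", "injured": 0, "killed": 0},
]


def collisions_by_borough(rows):
    """Aggregate per-collision rows into one table row per borough."""
    totals = {}
    for row in rows:
        entry = totals.setdefault(
            row["borough"], {"collisions": 0, "injured": 0, "killed": 0}
        )
        entry["collisions"] += 1
        entry["injured"] += row["injured"]
        entry["killed"] += row["killed"]
    table = [{"borough": name, **counts} for name, counts in totals.items()]
    # Sort by collision count (descending), then by name for stable output
    table.sort(key=lambda r: (-r["collisions"], r["borough"]))
    return table


def print_table(table):
    """Print the table with tab-separated columns and a header row."""
    header = ["borough", "collisions", "injured", "killed"]
    print("\t".join(header))
    for r in table:
        print("\t".join(str(r[c]) for c in header))


table = collisions_by_borough(records)
print_table(table)
```

The resulting table has one row per mark in the planned chart (here, one row per borough), which makes it straightforward to check that the sketch and the data agree.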

Sketches can be:

Hand-drawn (pen and paper, or tablet)
Created with drawing software (Figma, Excalidraw, etc.)
Generated with data viz tools (Tableau, Matplotlib, Vega-Lite)

Important: It’s okay (and expected!) to refine your questions at this stage based on what you discover in the data.

Data Analysis & Sketches: Best Practices

For data tables:

Show a representative sample of your processed data
Include column headers with clear names
Document any transformations or aggregations
Show enough rows to understand the structure

For sketches:

Include titles, axis labels, and legends
Show how data will be mapped to visual elements
Indicate interactive elements if applicable
Make it clear enough that someone could implement it

Sketches should be YOUR work - not copied from other projects or stock image collections!

Milestone 3: First Draft (Due Nov 17)

Submit a Jupyter notebook or Observable notebook with:

Brief introduction to your project
For each question:
- State the question
- Show the D3 visualization
- Describe what the visualization shows
- Answer the question based on the visualization

At this stage:

All visualizations should be implemented in D3
Focus on getting the basics working
Styling/polish can come later
Interactivity should be functional (if included)
It’s still okay to refine questions if needed

Milestone 4: Second Draft (Due Dec 1)

Transform your notebook into a complete article with:

Title - Clear and informative
Introduction - Problem, background, motivation, overview of findings
Data Description - Sources, collection methods, attributes used
Questions and Findings - For each question:
- Clear question statement
- Polished D3 visualization
- Analysis and interpretation
- Insights and implications
Conclusion - Summary of findings, recommendations, limitations

Focus on narrative flow - someone unfamiliar with your project should be able to read and understand it.

Milestone 5: Final Submission (Due Dec 8)

Your final submission includes:

Final notebook - Polished version of second draft with all feedback addressed
Presentation (10 minutes) - Present to class on Dec 5 or Dec 12
Code repository - GitHub repo with all code and documentation
Team contribution statement - Who did what

Presentations will include:

Problem and motivation (2 min)
Key visualizations and findings (6 min)
Conclusions and recommendations (2 min)
Q&A with class

Evaluation Criteria

Technical Implementation (35%)

D3 code quality and correctness
Appropriate use of D3 features
Interactivity implementation
Code organization and documentation

Visualization Design (30%)

Appropriate chart types
Effective visual encodings
Clear labels and legends
Color and layout choices
Accessibility considerations

Analysis & Insights (20%)

Question quality and coherence
Depth of analysis
Insight generation
Interpretation accuracy

Communication (10%)

Narrative flow
Writing clarity
Presentation quality
Professional polish

Teamwork (5%)

Equal contribution
Coordination evidence

Example Projects from Previous Years

Strong projects typically:

Focus on specific aspects of a dataset rather than trying to show everything
Have 5-8 well-designed visualizations that build on each other
Include at least 2-3 interactive visualizations
Use a mix of chart types appropriate to the questions
Derive actionable insights or recommendations
Are well-written and professionally presented

I will share example projects from previous offerings of this course on Discord. Look at them for inspiration, but make your project your own!

Tips for Success

Start early - Form teams and choose topics this week
Explore data before proposing - Make sure it has what you need
Keep questions focused - Better to do 6 questions well than 10 poorly
Iterate on designs - Your first sketch won’t be your best
Test on others - Show your visualizations to friends, get feedback
Use version control - Git is your friend, commit frequently
Divide work clearly - But review each other’s code
Document as you go - Future you will thank present you
Ask for help - Office hours, Discord, lab sessions

Getting Help

Office Hours:

Instructor: Fridays 1:30-2:30 PM (after class, Room 215)
TA office hours: Posted on Discord

Resources:

Discord #group-projects channel
Weekly check-ins during lab sessions
Milestone feedback at each stage

Bring specific questions:

“How should I handle this data issue?” ✓
“Which visualization is better for this?” ✓
“How do I make my project better?” ✗ (too vague)

Finding Teammates

Use Discord #project-teams channel to:

Post your interests and availability
Find others interested in similar topics
Coordinate team formation

When forming teams, consider:

Complementary skills (design + coding, analysis + presentation)
Similar commitment levels and standards
Compatible schedules for meetings
Communication styles

Teams must be finalized by Oct 17 (one week from today).

Project Ideas to Get Started

If you’re not sure where to start, consider:

NYC 311 Requests - What do New Yorkers complain about? Patterns over time/space?
Sports Analytics - NBA player performance, soccer match outcomes, fantasy sports
Movie/TV Data - Box office trends, IMDb ratings, streaming popularity
Climate Data - Temperature trends, extreme weather events, CO2 levels
COVID-19 Data - Case trends, vaccination rates, policy impacts
Music Analysis - Spotify streaming data, genre evolution, artist networks
Stock Market - Price movements, trading volumes, sector performance
Reddit/Twitter - Topic trends, sentiment analysis, viral content

Browse: NYC Open Data, Kaggle, Data.gov, or use APIs (Spotify, TMDB, etc.)

Timeline Recap

Week	Date	Milestone	What’s Due
8	Oct 20	Proposal	Problem, data, questions (2-3 pages)
10	Nov 3	Data & Sketches	Data tables and visualization sketches
12	Nov 17	First Draft	All visualizations implemented in D3
14	Dec 1	Second Draft	Complete article with polished visuals
15	Dec 8	Final Submission	Final notebook, presentation, code

Presentations: Dec 5 and Dec 12 (teams will be assigned)

Form teams by: Oct 17 (next week!)

Questions?

Discord: #group-projects

Email: csilva@nyu.edu

Office Hours: Fridays 1:30-2:30 PM (Room 215)

Next Steps:

1. Browse datasets (NYC Open Data, Kaggle, etc.)
2. Find teammates (#project-teams on Discord)
3. Start drafting proposal ideas