Group Projects
CS-GY 6313 - Fall 2025
Claudio Silva
NYU Tandon School of Engineering
2025-10-10
Group Projects Overview
You will work in teams of 2-3 people on a visualization project that explores a dataset of your choice.
Key Dates:
Proposal - Oct 20 - Problem statement, dataset, questions
Data Analysis & Sketches - Nov 3 - Data tables and visualization sketches
First Draft - Nov 17 - Initial D3 implementations
Second Draft - Dec 1 - Refined narrative and polished visualizations
Final Submission - Dec 8 - Complete project with presentation
This project is your opportunity to apply everything you’ve learned in the course to a real-world data problem. Unlike research-focused projects, this is about demonstrating mastery of InfoVis fundamentals: asking good questions, choosing appropriate visualizations, implementing them well in D3, and communicating insights effectively. You’ll work through five milestones that scaffold the work and ensure you make steady progress. Choose any dataset that interests you!
Why Group Projects?
Working in teams teaches essential professional skills:
Collaborative design and development
Division of labor and project management
Communication and coordination
Code review and quality control
Presenting and defending design decisions
Logistics:
Teams of 2-3 students (3 is recommended)
Self-organized (use Discord #project-teams channel)
All team members contribute to all milestones
Individual contributions will be documented
In the real world, visualization work is almost always collaborative. You need to learn how to divide work, integrate different contributions, maintain code quality across team members, and present unified results. Teams of 3 are ideal - enough people to tackle ambitious projects without becoming unwieldy. Use the #project-teams channel on Discord to find teammates. Start forming teams this week!
Project Scope: InfoVis vs. Research
This is NOT a research project. This IS:
A demonstration of InfoVis fundamentals
An exploration of interesting data
A showcase of your D3 implementation skills
A well-crafted narrative with visualizations
What we’re looking for:
Clear, focused questions about your data
Appropriate visualization choices
Clean, effective D3 implementations
Insightful analysis and findings
Professional presentation and writeup
This is different from research-focused projects. We’re not expecting novel research contributions or cutting-edge ML techniques. We want to see that you can: (1) formulate good questions about data, (2) choose the right visualizations to answer them, (3) implement those visualizations effectively in D3, (4) derive insights from your visualizations, and (5) communicate your findings clearly. Quality over novelty. Execution over ambition.
Choosing Your Topic
Choose any dataset that interests you! Some possibilities:
Urban data: Transportation, 311 calls, housing, environment
Sports: Player statistics, game outcomes, team performance
Entertainment: Movies, music, books, streaming trends
Health: Nutrition, exercise, medical data, public health
Finance: Stocks, cryptocurrencies, economic indicators
Science: Climate, weather, biodiversity, astronomy
Social: Social media trends, demographics, surveys
Your own data: Personal projects, work data, research data
Good data sources: NYC Open Data, Kaggle, Data.gov, APIs (Spotify, Twitter, etc.)
The key is choosing data you’re genuinely interested in - you’ll be working with it for 8 weeks! Look for datasets that are: (1) complete (not too many missing values), (2) temporal or spatial (patterns over time/space are interesting), (3) rich in attributes (multiple variables to explore), (4) accessible (you can actually get the data). Avoid datasets that are too simple (just one or two numbers) or too complex (requires deep domain expertise you don’t have). NYC Open Data is great if you’re interested in urban topics, but you’re not limited to that!
What Makes a Good Project?
Strong projects have:
Clear Problem Statement
Specific, focused, and well-motivated
Explains why this matters
Rich Dataset(s)
Accessible, complete, and appropriate
Multiple attributes to explore
Temporal and/or spatial dimensions
Coherent Questions
Form a logical progression (a story)
Can be answered with visualizations
Build toward insights
Appropriate Visualizations
Match the data and questions
Well-designed and clearly labeled
Interactive where it adds value
The best projects tell a story. They start with a compelling problem, use appropriate data to explore it, ask questions that build on each other, and reveal insights that matter. Avoid projects that are just “let’s visualize this data” without clear questions. Avoid questions that are too broad (“what patterns exist?”) or too narrow (“what was the value on Tuesday?”). Think about what a reader would learn from your project and why they should care.
Milestone 1: Project Proposal (Due Oct 20)
Submit a 2-3 page document with:
Problem Statement/Background
What problem are you exploring?
Why is this interesting and worthwhile?
What should readers know about the context?
Dataset(s)
Where does the data come from?
Who collected it and how?
What attributes will you use?
What preprocessing is needed?
Domain Questions
List 5-8 questions you want to answer
Questions should form a logical progression
Each question should be answerable with visualization
The proposal is due October 20 (Week 8). This is your chance to get feedback before investing significant implementation time. Take this seriously - a good proposal prevents problems later. Your problem statement should be 2-3 paragraphs that motivate the work. Your dataset section should show you’ve actually looked at the data, not just found a URL. Your questions should form a narrative arc, not just be a random list. We’ll provide detailed feedback to help you succeed.
Example: Problem Statement
Background: Over the years, NYC has implemented a number of initiatives to reduce both the number of vehicle collisions in the city and their severity.
Goal: The goal of this project is to understand how collisions happen in New York City and how they have evolved over time.
Analysis plan: More specifically, we will investigate how collisions and their severity are distributed geographically in order to identify hotspots. We will explore the major contributing factors (i.e., causes) behind collisions and their severity. We will examine how these patterns change depending on whether pedestrians or cyclists are involved, and we will determine whether there are any seasonal or other temporal trends.
Outcome: As a major outcome, we will attempt to recommend possible interventions informed by the insights generated during our analysis.
This example shows a well-structured problem statement. Notice: (1) Background establishes context and prior work, (2) Goal is clear and focused, (3) Analysis plan is specific about what will be explored, (4) Outcome explains what the project aims to deliver. This gives the reader a complete picture of what to expect. Your problem statement should follow a similar structure.
Example: Dataset Description
For this project we will use the NYC Vehicle Collisions Dataset.
Source: NYC Open Data portal
Collection: NYPD using FORMS (Finest Online Records Management System)
Method: Police officers enter data electronically using a Department cellphone or computer
Attributes: Date/time, location (lat/lon, borough, zip), number of persons injured/killed, vehicle types, contributing factors
We will use the following attributes:
Temporal: Date, time, year, month, day of week
Spatial: Latitude, longitude, borough, zip code
Severity: Number injured, number killed (total, pedestrians, cyclists)
Context: Contributing factors, vehicle types
Derived attributes we plan to create:
Time of day categories (morning rush, midday, evening rush, night)
Severity categories (property damage only, injury, fatality)
Season (winter, spring, summer, fall)
This dataset description shows you’ve done your homework. You know where the data comes from, how it was collected, what’s in it, and what you’ll need to compute. The derived attributes show you’re thinking ahead about how to analyze the data. This level of detail gives us confidence you can execute the project.
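As a rough illustration of the derived attributes listed above, here is a minimal Python sketch using only the standard library. The hour cutoffs for each time-of-day category, the month-based season mapping, and the sample record are all assumptions made for illustration; they are not part of the dataset or of the example proposal, and your team should pick and document its own definitions.

```python
from datetime import datetime

# Assumed mapping: meteorological seasons by calendar month.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def time_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a time-of-day category (cutoff hours are assumed)."""
    hour = ts.hour
    if 6 <= hour < 10:
        return "morning rush"
    if 10 <= hour < 16:
        return "midday"
    if 16 <= hour < 20:
        return "evening rush"
    return "night"


def severity(injured: int, killed: int) -> str:
    """Classify a collision by its worst outcome."""
    if killed > 0:
        return "fatality"
    if injured > 0:
        return "injury"
    return "property damage only"


def season(ts: datetime) -> str:
    """Look up the season for a timestamp's month."""
    return SEASON_BY_MONTH[ts.month]


# Hypothetical record, not taken from the real dataset.
record = {"timestamp": datetime(2025, 1, 15, 8, 30), "injured": 2, "killed": 0}
derived = {
    "time_of_day": time_of_day(record["timestamp"]),
    "severity": severity(record["injured"], record["killed"]),
    "season": season(record["timestamp"]),
}
print(derived)
```

Keeping each derived attribute in its own small function makes the definitions easy to state in the proposal and easy to change later if feedback suggests different categories.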
Example: Domain Questions
Our project aims to find and present answers to the following questions:
How many collisions happen daily in NYC? How many injuries and deaths?
Where do collisions happen? How do collisions distribute across NYC?
Are there areas that are particularly deadly (high injuries/deaths)?
When we focus on pedestrians and cyclists, where do they get injured or die?
How have collisions and their severity evolved over time?
Has the situation improved or worsened in specific areas?
What are the major contributing factors for collisions?
Do different areas have different contributing factors?
Is there a relationship between contributing factors and whether pedestrians/cyclists are involved?
Notice the progression: (1-2) establish baseline patterns, (3-4) focus on severity and vulnerable users, (5-6) examine temporal trends, (7-9) explore causes and relationships. This forms a coherent narrative that builds toward actionable insights. Each question can be answered with visualization. Questions are specific enough to guide implementation but open enough to allow discovery. Your questions should follow a similar logical flow.
Common Proposal Mistakes
Avoid these pitfalls:
❌ Problem statement too vague (“explore patterns in data”)
❌ Problem too narrow (not complex enough for a project)
❌ Problem too ambitious (unrealistic scope)
❌ Questions can’t be answered with your data
❌ You don’t actually have access to the data
❌ Unclear what data attributes you’ll use
❌ Questions are unrelated (no narrative progression)
❌ Questions too vague or ambiguous
Get feedback early! Bring ideas to office hours or post on Discord.
These are the most common ways proposals fail. The vague problem: “I want to visualize subway data” - why? what question? The too-narrow problem: “I want to show the number of trains per hour” - that’s one chart, not a project. The too-ambitious problem: “I want to predict future traffic using deep learning” - wrong class! Questions that need data you don’t have: “I want to show why people chose that route” when you only have aggregate counts. Start early, get feedback, iterate.
Milestone 2: Data Analysis & Sketches (Due Nov 3)
For each of your domain questions:
Transform the data to extract the information needed
Create a data table showing the results (sample of the processed data)
Sketch the visualization you plan to create
Sketches can be:
Hand-drawn (pen and paper, or tablet)
Created with drawing software (Figma, Excalidraw, etc.)
Generated with data viz tools (Tableau, Matplotlib, Vega-Lite)
Important: It’s okay (and expected!) to refine your questions at this stage based on what you discover in the data.
This milestone is about understanding your data and planning your visualizations. You’re not implementing in D3 yet - you’re figuring out what data you need and what visualizations will work. The data table proves you can extract the information needed. The sketch shows you’ve thought about how to visualize it. This is where you discover problems: “Oh, the data doesn’t have what I thought it had” or “This question isn’t as interesting as I thought.” Better to discover this now than during implementation!
Data Analysis & Sketches: Best Practices
For data tables:
Show a representative sample of your processed data
Include column headers with clear names
Document any transformations or aggregations
Show enough rows to understand the structure
For sketches:
Include titles, axis labels, and legends
Show how data will be mapped to visual elements
Indicate interactive elements if applicable
Make it clear enough that someone could implement it
Sketches should be YOUR work - not copied from other projects or stock image collections!
The data table doesn’t need to be the full dataset - show 10-20 rows that illustrate what the data looks like after your processing. The sketch should be detailed enough that someone else could implement it. Include annotations explaining what’s shown. If the visualization is interactive, indicate what happens on hover, click, etc. Hand-drawn sketches are fine as long as they’re clear and complete. What we’re checking: Can you get the data you need? Have you thought through the visualization design?
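To make the "transform, then show a data table" step concrete, here is a minimal Python sketch for a question like "Where do collisions happen?". The rows are hypothetical stand-ins for a cleaned collisions file, and the column names (borough, injured, killed) are assumptions; a real notebook would load the actual data and might use a library such as pandas instead.

```python
from collections import defaultdict

# Hypothetical rows standing in for a cleaned collisions file.
rows = [
    {"borough": "BROOKLYN", "injured": 1, "killed": 0},
    {"borough": "QUEENS", "injured": 0, "killed": 0},
    {"borough": "BROOKLYN", "injured": 2, "killed": 1},
    {"borough": "MANHATTAN", "injured": 0, "killed": 0},
    {"borough": "QUEENS", "injured": 3, "killed": 0},
]


def summarize_by_borough(records):
    """Aggregate collision, injury, and fatality counts per borough."""
    totals = defaultdict(lambda: {"collisions": 0, "injured": 0, "killed": 0})
    for r in records:
        t = totals[r["borough"]]
        t["collisions"] += 1
        t["injured"] += r["injured"]
        t["killed"] += r["killed"]
    # Most collisions first; ties are broken alphabetically by borough.
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1]["collisions"], kv[0]))
    return [{"borough": b, **t} for b, t in ordered]


table = summarize_by_borough(rows)

# Print the table with clear column headers, as the milestone asks.
print(f"{'borough':<12}{'collisions':>11}{'injured':>9}{'killed':>8}")
for row in table:
    print(f"{row['borough']:<12}{row['collisions']:>11}{row['injured']:>9}{row['killed']:>8}")
```

The printed table is the kind of artifact the milestone asks for: named columns, a documented aggregation (counts and sums per borough), and enough rows to see the structure that the sketch will encode.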
Milestone 3: First Draft (Due Nov 17)
Submit a Jupyter notebook or Observable notebook with:
Brief introduction to your project
For each question:
State the question
Show the D3 visualization
Describe what the visualization shows
Answer the question based on the visualization
At this stage:
All visualizations should be implemented in D3
Focus on getting the basics working
Styling/polish can come later
Interactivity should be functional (if included)
It’s still okay to refine questions if needed
This is your first implementation milestone. We expect working D3 code for all your visualizations. They don’t need to be perfect - labels might be messy, colors might be defaults, interactions might be basic - but they should work and show your data correctly. This is when you discover implementation challenges: “This chart type is harder than I thought” or “The data is more complex than I realized.” We’ll give you feedback on what to improve for the next draft.
Milestone 4: Second Draft (Due Dec 1)
Transform your notebook into a complete article with:
Title - Clear and informative
Introduction - Problem, background, motivation, overview of findings
Data Description - Sources, collection methods, attributes used
Questions and Findings - For each question:
Clear question statement
Polished D3 visualization
Analysis and interpretation
Insights and implications
Conclusion - Summary of findings, recommendations, limitations
Focus on narrative flow - someone unfamiliar with your project should be able to read and understand it.
This is where you transition from “a bunch of charts” to “a coherent story.” The introduction sets up the problem and tells readers what to expect. Each section builds on the previous one. The conclusion ties it all together and explains what it means. This is also when you polish your visualizations: fix labels, improve colors, refine interactions, add annotations. Think about it like writing a blog post or article for a technical audience. What would they need to know? What order makes sense?
Milestone 5: Final Submission (Due Dec 8)
Your final submission includes:
Final notebook - Polished version of second draft with all feedback addressed
Presentation (10 minutes) - Present to class on Dec 5 or Dec 12
Code repository - GitHub repo with all code and documentation
Team contribution statement - Who did what
Presentations will include:
Problem and motivation (2 min)
Key visualizations and findings (6 min)
Conclusions and recommendations (2 min)
Q&A with class
The final submission is your polished, publication-ready work. We’ll grade based on: technical execution (D3 implementation quality), visualization design (appropriate choices, clear encoding), analysis depth (insights, not just description), presentation quality (narrative flow, writing, polish), and teamwork (balanced contributions). The presentation is your chance to showcase your work - practice it! The code repository should be clean and documented - others should be able to run your code. The contribution statement ensures everyone gets appropriate credit.
Evaluation Criteria
Technical Implementation (35%)
D3 code quality and correctness
Appropriate use of D3 features
Interactivity implementation
Code organization and documentation
Visualization Design (30%)
Appropriate chart types
Effective visual encodings
Clear labels and legends
Color and layout choices
Accessibility considerations
Analysis & Insights (20%)
Question quality and coherence
Depth of analysis
Insight generation
Interpretation accuracy
Communication (10%)
Narrative flow
Writing clarity
Presentation quality
Professional polish
Teamwork (5%)
Equal contribution
Coordination evidence
Technical implementation and visualization design together are 65% of your grade - execution matters! We’re checking: Does your D3 code work correctly? Did you choose appropriate visualizations? Are they well-designed? Analysis and insights are 20% - you need to actually interpret your visualizations and derive meaningful findings. Communication is 10% - can someone else understand your work? Teamwork is 5% but we may adjust individual grades if contributions are very unequal. Document who did what in your final submission.
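To see how the weights combine, here is a small Python sketch of the arithmetic. It assumes each category is scored on a 0-100 scale, which the rubric does not state, and the example scores are hypothetical. It also ignores any individual adjustment for unequal contributions.

```python
# Category weights in percent, taken from the rubric above.
WEIGHTS = {
    "technical": 35,
    "design": 30,
    "analysis": 20,
    "communication": 10,
    "teamwork": 5,
}


def weighted_score(scores):
    """Combine per-category scores (assumed 0-100) into one weighted total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for illustration only.
example = {
    "technical": 90,
    "design": 80,
    "analysis": 70,
    "communication": 100,
    "teamwork": 100,
}

print(sum(WEIGHTS.values()))     # the weights cover the whole grade
print(weighted_score(example))
```

With these hypothetical scores the total is 84.5: raising the technical score by ten points adds 3.5 points to the total, while the same gain in communication adds only 1.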
Example Projects from Previous Years
Strong projects typically:
Focus on specific aspects of a dataset rather than trying to show everything
Have 5-8 well-designed visualizations that build on each other
Include at least 2-3 interactive visualizations
Use a mix of chart types appropriate to the questions
Derive actionable insights or recommendations
Are well-written and professionally presented
I will share example projects on Discord from previous offerings of this course. Look at them for inspiration, but make your project your own!
Past successful projects include: subway performance analysis identifying specific lines and times with delays; 311 complaint patterns revealing seasonal and geographic trends; taxi trip analysis showing how ride-sharing impacted traditional taxis; bike share growth and usage patterns across different neighborhoods; and property development trends and their relationship to rezoning. What made these successful: focused problem, rich data, coherent questions, appropriate visualizations, interesting insights. Study the examples on Discord but don’t copy - we know what those projects look like!
Tips for Success
Start early - Form teams and choose topics this week
Explore data before proposing - Make sure it has what you need
Keep questions focused - Better to do 6 questions well than 10 poorly
Iterate on designs - Your first sketch won’t be your best
Test on others - Show your visualizations to friends, get feedback
Use version control - Git is your friend, commit frequently
Divide work clearly - But review each other’s code
Document as you go - Future you will thank present you
Ask for help - Office hours, Discord, lab sessions
The most common failure mode: starting too late. The most common success factor: iterative development with frequent feedback. Form your teams NOW - use #project-teams on Discord. Start browsing datasets THIS WEEK. The proposal deadline (Oct 20) comes faster than you think. Use office hours strategically - come with specific questions. Don’t wait until something is broken to ask for help. The milestone structure is designed to keep you on track - use it!
Getting Help
Office Hours:
Instructor: Fridays 1:30-2:30 PM (after class, Room 215)
TA office hours: Posted on Discord
Resources:
Discord #group-projects channel
Weekly check-ins during lab sessions
Milestone feedback at each stage
Bring specific questions:
“How should I handle this data issue?” ✓
“Which visualization is better for this?” ✓
“How do I make my project better?” ✗ (too vague)
We want you to succeed! Use office hours, use Discord, use lab time. But come with specific questions - show us what you’ve tried, explain what’s not working, have your data and code ready. “I don’t know where to start” → go browse datasets for 30 minutes, come back with 3 ideas. “My visualization isn’t working” → share your code, explain what you expect vs what you see. “How do I make it better?” → be specific - better how? more interactive? clearer labels? different colors?
Finding Teammates
Use Discord #project-teams channel to:
Post your interests and availability
Find others interested in similar topics
Coordinate team formation
When forming teams, consider:
Complementary skills (design + coding, analysis + presentation)
Similar commitment levels and standards
Compatible schedules for meetings
Communication styles
Teams must be finalized by Oct 17 (one week from today).
Finding good teammates is crucial. You need people who will contribute equally, communicate clearly, and meet deadlines. Look for complementary skills - if you’re great at D3 but weak on design, find someone who’s good at visual design. If you’re analytical but not a strong writer, team up with someone who writes well. Schedule a quick video call before committing to a team - make sure you can work together. Discuss expectations: how often will you meet? what happens if someone misses a deadline? Better to have these conversations now than in November when things get stressful.
Project Ideas to Get Started
If you’re not sure where to start, consider:
NYC 311 Requests - What do New Yorkers complain about? Patterns over time/space?
Sports Analytics - NBA player performance, soccer match outcomes, fantasy sports
Movie/TV Data - Box office trends, IMDb ratings, streaming popularity
Climate Data - Temperature trends, extreme weather events, CO2 levels
COVID-19 Data - Case trends, vaccination rates, policy impacts
Music Analysis - Spotify streaming data, genre evolution, artist networks
Stock Market - Price movements, trading volumes, sector performance
Reddit/Twitter - Topic trends, sentiment analysis, viral content
Browse: NYC Open Data, Kaggle, Data.gov, or use APIs (Spotify, TMDB, etc.)
These are just starting points - you can take any of these in many different directions. NYC data: focus on specific types of complaints, response times, neighborhood patterns. Sports: compare players, track team performance over seasons, analyze game strategies. Movies: explore genre trends, director/actor networks, rating distributions. Climate: look at global patterns, seasonal variations, extreme events. COVID: examine waves, vaccination rollout, regional differences. Pick something you’re genuinely curious about - you’ll be working on this for 8 weeks!
Timeline Recap
Week 8 - Oct 20 - Proposal - Problem, data, questions (2-3 pages)
Week 10 - Nov 3 - Data & Sketches - Data tables and visualization sketches
Week 12 - Nov 17 - First Draft - All visualizations implemented in D3
Week 14 - Dec 1 - Second Draft - Complete article with polished visuals
Week 15 - Dec 8 - Final Submission - Final notebook, presentation, code
Presentations: Dec 5 and Dec 12 (teams will be assigned)
Form teams by: Oct 17 (next week!)
Mark these dates in your calendar now! The milestones are spaced to give you steady progress without overwhelming you. Most milestones are two weeks apart - enough time to do good work if you start promptly, but not enough to procrastinate. The proposal is due Week 8 (Oct 20) - that’s ten days from today. Teams must be formed by next Friday (Oct 17) so you have time to discuss ideas before the proposal is due. Final presentations are in the last two weeks - we’ll split the class roughly in half. Use Thanksgiving break wisely to polish your second draft!
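If you want to check how much time each deadline really leaves, a few lines of date arithmetic will do it. This is a minimal Python sketch that assumes the lecture date on the title slide (2025-10-10) as "today"; the helper names are made up for illustration.

```python
from datetime import date

LECTURE_DAY = date(2025, 10, 10)  # date on the title slide

DEADLINES = [
    ("Teams finalized", date(2025, 10, 17)),
    ("Proposal", date(2025, 10, 20)),
    ("Data Analysis & Sketches", date(2025, 11, 3)),
    ("First Draft", date(2025, 11, 17)),
    ("Second Draft", date(2025, 12, 1)),
    ("Final Submission", date(2025, 12, 8)),
]


def days_until(due, today=LECTURE_DAY):
    """Number of calendar days from `today` to a deadline."""
    return (due - today).days


def gaps_between(deadlines):
    """Days between each consecutive pair of deadlines."""
    dates = [due for _, due in deadlines]
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


for name, due in DEADLINES:
    print(f"{name:<26}{due:%b %d}  in {days_until(due):>2} days")

# Skip the team-formation date to compare the five graded milestones.
print("Days between milestones:", gaps_between(DEADLINES[1:]))
```

Measured from the lecture date, the proposal is 10 days out and the final submission 59; the gaps between graded milestones are 14, 14, and 14 days, then 7.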
Questions?
Discord: #group-projects
Email: csilva@nyu.edu
Office Hours: Fridays 1:30-2:30 PM (Room 215)
Next Steps:
1. Browse datasets (NYC Open Data, Kaggle, etc.)
2. Find teammates (#project-teams on Discord)
3. Start drafting proposal ideas
This is the time for questions! Ask about anything: team formation, choosing topics, finding data, scope of work, evaluation criteria, milestone requirements. After this lecture, you have three immediate tasks: (1) Start browsing datasets - NYC Open Data, Kaggle, or other sources - and spend at least an hour on it this weekend. (2) Find teammates using Discord #project-teams - post your interests, respond to others, and aim to have a team by Monday. (3) Start thinking about your proposal - what problem interests you? What data could you use? What questions would you ask? The proposal is due in ten days - that may sound like a lot of time, but it goes fast. Start early, get feedback, iterate!