Group Projects
CS-GY 6313 - Fall 2025
Claudio Silva
NYU Tandon School of Engineering
2025-10-10

Group Projects Overview 
You will work in teams of 2-3 people  on a visualization project that explores a dataset of your choice.
Key Dates:
Proposal (Oct 20): Problem statement, dataset, questions
Data Analysis & Sketches (Nov 3): Data tables and visualization sketches
First Draft (Nov 17): Initial D3 implementations
Second Draft (Dec 1): Refined narrative and polished visualizations
Final Submission (Dec 8): Complete project with presentation
 
This project is your opportunity to apply everything you’ve learned in the course to a real-world data problem. Unlike research-focused projects, this is about demonstrating mastery of InfoVis fundamentals: asking good questions, choosing appropriate visualizations, implementing them well in D3, and communicating insights effectively. You’ll work through five milestones that scaffold the work and ensure you make steady progress. Choose any dataset that interests you!
 
 
Why Group Projects? 
Working in teams teaches essential professional skills: 
Collaborative design and development 
Division of labor and project management 
Communication and coordination 
Code review and quality control 
Presenting and defending design decisions 
 
Logistics: 
Teams of 2-3 students (3 is recommended) 
Self-organized (use Discord #project-teams channel) 
All team members contribute to all milestones 
Individual contributions will be documented 
 
In the real world, visualization work is almost always collaborative. You need to learn how to divide work, integrate different contributions, maintain code quality across team members, and present unified results. Teams of 3 are ideal - enough people to tackle ambitious projects without becoming unwieldy. Use the #project-teams channel on Discord to find teammates. Start forming teams this week!
 
 
Project Scope: InfoVis vs. Research 
This is NOT a research project. This IS: 
A demonstration of InfoVis fundamentals 
An exploration of interesting data 
A showcase of your D3 implementation skills 
A well-crafted narrative with visualizations 
 
What we’re looking for: 
Clear, focused questions about your data 
Appropriate visualization choices 
Clean, effective D3 implementations 
Insightful analysis and findings 
Professional presentation and writeup 
 
This is different from research-focused projects. We’re not expecting novel research contributions or cutting-edge ML techniques. We want to see that you can: (1) formulate good questions about data, (2) choose the right visualizations to answer them, (3) implement those visualizations effectively in D3, (4) derive insights from your visualizations, and (5) communicate your findings clearly. Quality over novelty. Execution over ambition.
 
 
Choosing Your Topic 
Choose any dataset that interests you! Some possibilities: 
Urban data: Transportation, 311 calls, housing, environment
Sports: Player statistics, game outcomes, team performance
Entertainment: Movies, music, books, streaming trends
Health: Nutrition, exercise, medical data, public health
Finance: Stocks, cryptocurrencies, economic indicators
Science: Climate, weather, biodiversity, astronomy
Social: Social media trends, demographics, surveys
Your own data: Personal projects, work data, research data
Good data sources:  NYC Open Data, Kaggle, Data.gov, APIs (Spotify, Twitter, etc.)
The key is choosing data you’re genuinely interested in - you’ll be working with it for 8 weeks! Look for datasets that are: (1) complete (not too many missing values), (2) temporal or spatial (patterns over time/space are interesting), (3) rich in attributes (multiple variables to explore), (4) accessible (you can actually get the data). Avoid datasets that are too simple (just one or two numbers) or too complex (requires deep domain expertise you don’t have). NYC Open Data is great if you’re interested in urban topics, but you’re not limited to that!
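A quick way to check the "complete" criterion is to measure how much of each column is empty before you commit to a dataset. The sketch below is only an illustration: the sample CSV, its column names, and the helper `missing_share` are all made up for this example, and it treats only empty strings as missing (real datasets may also use markers such as "N/A").

```python
import csv
import io

# Hypothetical sample standing in for a real CSV export; the column
# names and values are invented for illustration.
SAMPLE_CSV = """date,borough,injured
2024-01-05,BROOKLYN,1
2024-01-06,,0
2024-01-07,QUEENS,
2024-01-08,,2
"""


def missing_share(csv_text):
    """Return {column: fraction of rows whose value is empty}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = row[name]
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                counts[name] += 1
    return {
        name: (missing / total if total else 0.0)
        for name, missing in counts.items()
    }


shares = missing_share(SAMPLE_CSV)
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```

If a column you planned to build a question around turns out to be mostly empty, that is worth knowing before the proposal rather than after.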
 
 
What Makes a Good Project? 
Strong projects have: 
Clear Problem Statement 
Specific, focused, and well-motivated 
Explains why this matters 
 Rich Dataset(s) 
Accessible, complete, and appropriate 
Multiple attributes to explore 
Temporal and/or spatial dimensions 
 Coherent Questions 
Form a logical progression (a story) 
Can be answered with visualizations 
Build toward insights 
 Appropriate Visualizations 
Match the data and questions 
Well-designed and clearly labeled 
Interactive where it adds value 
  
The best projects tell a story. They start with a compelling problem, use appropriate data to explore it, ask questions that build on each other, and reveal insights that matter. Avoid projects that are just “let’s visualize this data” without clear questions. Avoid questions that are too broad (“what patterns exist?”) or too narrow (“what was the value on Tuesday?”). Think about what a reader would learn from your project and why they should care.
 
 
Milestone 1: Project Proposal (Due Oct 20) 
Submit a 2-3 page document with: 
Problem Statement/Background 
What problem are you exploring? 
Why is this interesting and worthwhile? 
What should readers know about the context? 
 Dataset(s) 
Where does the data come from? 
Who collected it and how? 
What attributes will you use? 
What preprocessing is needed? 
 Domain Questions 
List 5-8 questions you want to answer 
Questions should form a logical progression 
Each question should be answerable with visualization 
  
The proposal is due October 20 (Week 8). This is your chance to get feedback before investing significant implementation time. Take this seriously - a good proposal prevents problems later. Your problem statement should be 2-3 paragraphs that motivate the work. Your dataset section should show you’ve actually looked at the data, not just found a URL. Your questions should form a narrative arc, not just be a random list. We’ll provide detailed feedback to help you succeed.
 
 
Example: Problem Statement 
Background:  Over the years, NYC has implemented a number of initiatives to reduce both the number of vehicle collisions in the city and their severity.
Goal:  The goal of this project is to understand how collisions happen in New York City and how they have evolved over time.
Analysis plan:  More specifically, we will investigate how collisions and their severity are distributed geographically in order to identify hotspots. We will explore the major contributing factors (i.e., causes) behind collisions and their severity. We will look into how these patterns change depending on whether pedestrians or cyclists are involved, and we will determine whether there are seasonal or other temporal trends.
Outcome:  As a major outcome of this analysis we will attempt to provide recommendations about possible interventions informed by the insights generated during our analysis.
This example shows a well-structured problem statement. Notice: (1) Background establishes context and prior work, (2) Goal is clear and focused, (3) Analysis plan is specific about what will be explored, (4) Outcome explains what the project aims to deliver. This gives the reader a complete picture of what to expect. Your problem statement should follow a similar structure.
 
 
Example: Dataset Description 
For this project we will use the NYC Vehicle Collisions Dataset .
Source: NYC Open Data portal
Collection: NYPD using FORMS (Finest Online Records Management System)
Method: Police officers enter data electronically using Department cellphone or computer
Attributes: Date/time, location (lat/lon, borough, zip), number of persons injured/killed, vehicle types, contributing factors
We will use the following attributes: 
Temporal: Date, time, year, month, day of week 
Spatial: Latitude, longitude, borough, zip code 
Severity: Number injured, number killed (total, pedestrians, cyclists) 
Context: Contributing factors, vehicle types 
 
Derived attributes we plan to create: 
Time of day categories (morning rush, midday, evening rush, night) 
Severity categories (property damage only, injury, fatality) 
Season (winter, spring, summer, fall) 
 
This dataset description shows you’ve done your homework. You know where the data comes from, how it was collected, what’s in it, and what you’ll need to compute. The derived attributes show you’re thinking ahead about how to analyze the data. This level of detail gives us confidence you can execute the project.
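The three derived attributes listed above could be computed along the following lines. This is a minimal sketch, not the example team's actual code: the function names are hypothetical, the hour boundaries for the time-of-day buckets are assumptions chosen for illustration, and the season mapping assumes meteorological seasons in the northern hemisphere.

```python
from datetime import datetime


def time_of_day(ts):
    """Bucket a timestamp into the four categories from the slide.

    The hour boundaries below are assumptions, not part of the dataset.
    """
    hour = ts.hour
    if 6 <= hour < 10:
        return "morning rush"
    if 10 <= hour < 16:
        return "midday"
    if 16 <= hour < 20:
        return "evening rush"
    return "night"


def severity(injured, killed):
    """Classify a collision by its worst outcome."""
    if killed > 0:
        return "fatality"
    if injured > 0:
        return "injury"
    return "property damage only"


def season(ts):
    """Map a month to a season (assumed: Dec-Feb is winter, and so on)."""
    month = ts.month
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


# Hypothetical record: a collision on a July morning with two injuries.
when = datetime(2024, 7, 15, 8, 30)
print(time_of_day(when), season(when), severity(injured=2, killed=0))
```

Writing the category boundaries down this explicitly is useful in the proposal too, because it forces the team to agree on what "rush hour" or "winter" means before any chart depends on it.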
 
 
Example: Domain Questions 
Our project aims to find and present answers to the following questions:
1. How many collisions happen daily in NYC? How many injuries and deaths?
2. Where do collisions happen? How do collisions distribute across NYC?
3. Are there areas that are particularly deadly (high injuries/deaths)?
4. When we focus on pedestrians and cyclists, where do they get injured or die?
5. How have collisions and their severity evolved over time?
6. Has the situation improved or worsened in specific areas?
7. What are the major contributing factors for collisions?
8. Do different areas have different contributing factors?
9. Is there a relationship between contributing factors and whether pedestrians/cyclists are involved?
 
Notice the progression: (1-2) establish baseline patterns, (3-4) focus on severity and vulnerable users, (5-6) examine temporal trends, (7-9) explore causes and relationships. This forms a coherent narrative that builds toward actionable insights. Each question can be answered with visualization. Questions are specific enough to guide implementation but open enough to allow discovery. Your questions should follow a similar logical flow.
 
 
Common Proposal Mistakes 
Avoid these pitfalls: 
❌ Problem statement too vague (“explore patterns in data”) 
❌ Problem too narrow (not complex enough for a project) 
❌ Problem too ambitious (unrealistic scope) 
❌ Questions can’t be answered with your data 
❌ You don’t actually have access to the data 
❌ Unclear what data attributes you’ll use 
❌ Questions are unrelated (no narrative progression) 
❌ Questions too vague or ambiguous 
 
Get feedback early!  Bring ideas to office hours or post on Discord.
These are the most common ways proposals fail. The vague problem: “I want to visualize subway data” - why? what question? The too-narrow problem: “I want to show the number of trains per hour” - that’s one chart, not a project. The too-ambitious problem: “I want to predict future traffic using deep learning” - wrong class! Questions that need data you don’t have: “I want to show why people chose that route” when you only have aggregate counts. Start early, get feedback, iterate.
 
 
Milestone 2: Data Analysis & Sketches (Due Nov 3) 
For each of your domain questions:
Transform the data to extract the information needed
Create a data table showing the results (sample of the processed data)
Sketch the visualization you plan to create
Sketches can be: 
Hand-drawn (pen and paper, or tablet) 
Created with drawing software (Figma, Excalidraw, etc.) 
Generated with data viz tools (Tableau, Matplotlib, Vega-Lite) 
 
Important:  It’s okay (and expected!) to refine your questions at this stage based on what you discover in the data.
This milestone is about understanding your data and planning your visualizations. You’re not implementing in D3 yet - you’re figuring out what data you need and what visualizations will work. The data table proves you can extract the information needed. The sketch shows you’ve thought about how to visualize it. This is where you discover problems: “Oh, the data doesn’t have what I thought it had” or “This question isn’t as interesting as I thought.” Better to discover this now than during implementation!
 
 
Data Analysis & Sketches: Best Practices 
For data tables: 
Show a representative sample of your processed data 
Include column headers with clear names 
Document any transformations or aggregations 
Show enough rows to understand the structure 
 
For sketches: 
Include titles, axis labels, and legends 
Show how data will be mapped to visual elements 
Indicate interactive elements if applicable 
Make it clear enough that someone could implement it 
 
Sketches should be YOUR work  - not copied from other projects or stock image collections!
The data table doesn’t need to be the full dataset - show 10-20 rows that illustrate what the data looks like after your processing. The sketch should be detailed enough that someone else could implement it. Include annotations explaining what’s shown. If the visualization is interactive, indicate what happens on hover, click, etc. Hand-drawn sketches are fine as long as they’re clear and complete. What we’re checking: Can you get the data you need? Have you thought through the visualization design?
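As an illustration of the transform-then-table step, here is one possible sketch for a question like "How many collisions happen daily?". The records are invented placeholder values, not real collision data, and `daily_totals` is a hypothetical helper; a real notebook would load the rows from the dataset instead.

```python
from collections import defaultdict

# Hypothetical processed records: (date, injured, killed). Not real data.
records = [
    ("2024-01-05", 1, 0),
    ("2024-01-05", 0, 0),
    ("2024-01-06", 2, 1),
    ("2024-01-07", 0, 0),
    ("2024-01-07", 3, 0),
    ("2024-01-07", 1, 0),
]


def daily_totals(rows):
    """Aggregate to one row per day: (date, collisions, injured, killed)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for date, injured, killed in rows:
        day = totals[date]
        day[0] += 1          # one more collision on this date
        day[1] += injured
        day[2] += killed
    return [(date, *totals[date]) for date in sorted(totals)]


table = daily_totals(records)

# Print a sample (up to 20 rows) with clear column headers.
print(f"{'date':<12}{'collisions':>12}{'injured':>10}{'killed':>8}")
for date, collisions, injured, killed in table[:20]:
    print(f"{date:<12}{collisions:>12}{injured:>10}{killed:>8}")
```

The printed table is what goes into the milestone document, along with a sentence saying which aggregation produced it (here, a count and two sums grouped by date).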
 
 
Milestone 3: First Draft (Due Nov 17) 
Submit a Jupyter notebook  or Observable notebook  with:
Brief introduction to your project 
For each question:
State the question 
Show the D3 visualization 
Describe what the visualization shows 
Answer the question based on the visualization 
  
 
At this stage: 
All visualizations should be implemented in D3 
Focus on getting the basics working 
Styling/polish can come later 
Interactivity should be functional (if included) 
It’s still okay to refine questions if needed 
 
This is your first implementation milestone. We expect working D3 code for all your visualizations. They don’t need to be perfect - labels might be messy, colors might be defaults, interactions might be basic - but they should work and show your data correctly. This is when you discover implementation challenges: “This chart type is harder than I thought” or “The data is more complex than I realized.” We’ll give you feedback on what to improve for the next draft.
 
 
Milestone 4: Second Draft (Due Dec 1) 
Transform your notebook into a complete article  with:
Title - Clear and informative
Introduction - Problem, background, motivation, overview of findings
Data Description - Sources, collection methods, attributes used
Questions and Findings - For each question:
Clear question statement 
Polished D3 visualization 
Analysis and interpretation 
Insights and implications 
 Conclusion  - Summary of findings, recommendations, limitations 
Focus on narrative flow  - someone unfamiliar with your project should be able to read and understand it.
This is where you transition from “a bunch of charts” to “a coherent story.” The introduction sets up the problem and tells readers what to expect. Each section builds on the previous one. The conclusion ties it all together and explains what it means. This is also when you polish your visualizations: fix labels, improve colors, refine interactions, add annotations. Think about it like writing a blog post or article for a technical audience. What would they need to know? What order makes sense?
 
 
Milestone 5: Final Submission (Due Dec 8) 
Your final submission includes:
Final notebook - Polished version of second draft with all feedback addressed
Presentation (10 minutes) - Present to class on Dec 5 or Dec 12
Code repository - GitHub repo with all code and documentation
Team contribution statement - Who did what
Presentations will include: 
Problem and motivation (2 min) 
Key visualizations and findings (6 min) 
Conclusions and recommendations (2 min) 
Q&A with class 
 
The final submission is your polished, publication-ready work. We’ll grade based on: technical execution (D3 implementation quality), visualization design (appropriate choices, clear encoding), analysis depth (insights, not just description), presentation quality (narrative flow, writing, polish), and teamwork (balanced contributions). The presentation is your chance to showcase your work - practice it! The code repository should be clean and documented - others should be able to run your code. The contribution statement ensures everyone gets appropriate credit.
 
 
Evaluation Criteria 
Technical Implementation (35%) 
D3 code quality and correctness 
Appropriate use of D3 features 
Interactivity implementation 
Code organization and documentation 
 
Visualization Design (30%) 
Appropriate chart types 
Effective visual encodings 
Clear labels and legends 
Color and layout choices 
Accessibility considerations 
 
 
Analysis & Insights (20%) 
Question quality and coherence 
Depth of analysis 
Insight generation 
Interpretation accuracy 
 
Communication (10%) 
Narrative flow 
Writing clarity 
Presentation quality 
Professional polish 
 
Teamwork (5%) 
Equal contribution 
Coordination evidence 
 
  
Technical implementation and visualization design together are 65% of your grade - execution matters! We’re checking: Does your D3 code work correctly? Did you choose appropriate visualizations? Are they well-designed? Analysis and insights are 20% - you need to actually interpret your visualizations and derive meaningful findings. Communication is 10% - can someone else understand your work? Teamwork is 5% but we may adjust individual grades if contributions are very unequal. Document who did what in your final submission.
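To see how the weights play out, the sketch below combines per-category scores into a single number. It rests on assumptions the slides do not state: that each category is scored on a 0-100 scale and that the categories are combined as a simple weighted sum. The example scores are invented.

```python
# Weights taken from the Evaluation Criteria slide.
WEIGHTS = {
    "technical_implementation": 0.35,
    "visualization_design": 0.30,
    "analysis_insights": 0.20,
    "communication": 0.10,
    "teamwork": 0.05,
}


def weighted_score(scores):
    """Combine per-category scores (assumed 0-100) using the weights."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical team: strong communication, weaker analysis.
example = {
    "technical_implementation": 90,
    "visualization_design": 80,
    "analysis_insights": 70,
    "communication": 100,
    "teamwork": 100,
}
print(f"{weighted_score(example):.1f}")
```

Under these assumptions, a perfect communication and teamwork score moves the total by at most 15 points, while the two execution categories account for 65.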
 
 
Example Projects from Previous Years 
Strong projects typically: 
Focus on specific aspects of a dataset rather than trying to show everything 
Have 5-8 well-designed visualizations that build on each other 
Include at least 2-3 interactive visualizations 
Use a mix of chart types appropriate to the questions 
Derive actionable insights or recommendations 
Are well-written and professionally presented 
 
I will share example projects on Discord  from previous offerings of this course. Look at them for inspiration, but make your project your own!
Past successful projects include: subway performance analysis identifying specific lines and times with delays, 311 complaint patterns revealing seasonal and geographic trends, taxi trip analysis showing how ride-sharing impacted traditional taxis, bike share growth and usage patterns across different neighborhoods, property development trends and their relationship to rezoning. What made these successful: focused problem, rich data, coherent questions, appropriate visualizations, interesting insights. Study the examples on Discord but don’t copy - we know what those projects look like!
 
 
Tips for Success 
Start early - Form teams and choose topics this week
Explore data before proposing - Make sure it has what you need
Keep questions focused - Better to do 6 questions well than 10 poorly
Iterate on designs - Your first sketch won’t be your best
Test on others - Show your visualizations to friends, get feedback
Use version control - Git is your friend, commit frequently
Divide work clearly - But review each other’s code
Document as you go - Future you will thank present you
Ask for help - Office hours, Discord, lab sessions
The most common failure mode: starting too late. The most common success factor: iterative development with frequent feedback. Form your teams NOW - use #project-teams on Discord. Start browsing datasets THIS WEEK. The proposal deadline (Oct 20) comes faster than you think. Use office hours strategically - come with specific questions. Don’t wait until something is broken to ask for help. The milestone structure is designed to keep you on track - use it!
 
 
Getting Help 
Office Hours: 
Instructor: Fridays 1:30-2:30 PM (after class, Room 215) 
TA office hours: Posted on Discord 
 
Resources: 
Discord #group-projects channel 
Weekly check-ins during lab sessions 
Milestone feedback at each stage 
 
Bring specific questions: 
“How should I handle this data issue?” ✓ 
“Which visualization is better for this?” ✓ 
“How do I make my project better?” ✗ (too vague) 
 
We want you to succeed! Use office hours, use Discord, use lab time. But come with specific questions - show us what you’ve tried, explain what’s not working, have your data and code ready. “I don’t know where to start” → go browse datasets for 30 minutes, come back with 3 ideas. “My visualization isn’t working” → share your code, explain what you expect vs what you see. “How do I make it better?” → be specific - better how? more interactive? clearer labels? different colors?
 
 
Finding Teammates 
Use Discord #project-teams channel to: 
Post your interests and availability 
Find others interested in similar topics 
Coordinate team formation 
 
When forming teams, consider: 
Complementary skills (design + coding, analysis + presentation) 
Similar commitment levels and standards 
Compatible schedules for meetings 
Communication styles 
 
Teams must be finalized by Oct 17  (one week from today).
Finding good teammates is crucial. You need people who will contribute equally, communicate clearly, and meet deadlines. Look for complementary skills - if you’re great at D3 but weak on design, find someone who’s good at visual design. If you’re analytical but not a strong writer, team up with someone who writes well. Schedule a quick video call before committing to a team - make sure you can work together. Discuss expectations: how often will you meet? what happens if someone misses a deadline? Better to have these conversations now than in November when things get stressful.
 
 
Project Ideas to Get Started 
If you’re not sure where to start, consider: 
NYC 311 Requests - What do New Yorkers complain about? Patterns over time/space?
Sports Analytics - NBA player performance, soccer match outcomes, fantasy sports
Movie/TV Data - Box office trends, IMDb ratings, streaming popularity
Climate Data - Temperature trends, extreme weather events, CO2 levels
COVID-19 Data - Case trends, vaccination rates, policy impacts
Music Analysis - Spotify streaming data, genre evolution, artist networks
Stock Market - Price movements, trading volumes, sector performance
Reddit/Twitter - Topic trends, sentiment analysis, viral content
Browse: NYC Open Data, Kaggle, Data.gov, or use APIs (Spotify, TMDB, etc.)
These are just starting points - you can take any of these in many different directions. NYC data: focus on specific types of complaints, response times, neighborhood patterns. Sports: compare players, track team performance over seasons, analyze game strategies. Movies: explore genre trends, director/actor networks, rating distributions. Climate: look at global patterns, seasonal variations, extreme events. COVID: examine waves, vaccination rollout, regional differences. Pick something you’re genuinely curious about - you’ll be working on this for 8 weeks!
 
 
Timeline Recap 
 
 
Week 8 (Oct 20): Proposal - Problem, data, questions (2-3 pages)
Week 10 (Nov 3): Data & Sketches - Data tables and visualization sketches
Week 12 (Nov 17): First Draft - All visualizations implemented in D3
Week 14 (Dec 1): Second Draft - Complete article with polished visuals
Week 15 (Dec 8): Final Submission - Final notebook, presentation, code
 
 
Presentations:  Dec 5 and Dec 12 (teams will be assigned)
Form teams by:  Oct 17 (next week!)
Mark these dates in your calendar now! The milestones are spaced to give you steady progress without overwhelming you. Two weeks between most milestones - that’s enough time to do good work if you start promptly, but not enough to procrastinate. The proposal is due Week 8 (Oct 20) - that’s ten days from today. Teams must be formed by next Friday (Oct 17) so you have time to discuss ideas before the proposal is due. Final presentations are in the last two weeks - we’ll split the class roughly in half. Use Thanksgiving break wisely to polish your second draft!
 
 
Questions? 
Discord:  #group-projects
Email:  csilva@nyu.edu
Office Hours:  Fridays 1:30-2:30 PM (Room 215)
Next Steps:
1. Browse datasets (NYC Open Data, Kaggle, etc.)
2. Find teammates (#project-teams on Discord)
3. Start drafting proposal ideas
 
This is the time for questions! Ask about anything: team formation, choosing topics, finding data, scope of work, evaluation criteria, milestone requirements. After this lecture, you have three immediate tasks: (1) Start browsing datasets - NYC Open Data, Kaggle, or other sources - spend at least an hour this weekend, (2) Find teammates using Discord #project-teams - post your interests and respond to others, aim to have a team by Monday, (3) Start thinking about your proposal - what problem interests you? What data could you use? What questions would you ask? The proposal is due in ten days (Oct 20) - that sounds like a lot of time but it goes fast. Start early, get feedback, iterate!