Welcome back to the Visualization for Machine Learning Lab!

## Miscellaneous

* Proposal grades and feedback are on Brightspace
  * Check PDFs for comments
  * Longer comments may need to be viewed in a PDF reader (like Acrobat) rather than the browser
* Presentations 5/2
* Written report due sometime during finals week (will get back to you with a date on Monday)
* Homework 4 (OPTIONAL - extra credit) on Brightspace
  * Amount of extra credit TBD
  * Due April 19th

## NLP + Vis Applications

* These slides are adapted from a half-day tutorial at the 2023 EMNLP Conference by Shafiq Joty (Salesforce AI Research and Nanyang Technological University), Enamul Hoque (York University), and Jesse Vig (Salesforce AI Research)
* You can find their full tutorial slides [here](https://nlp4vis.github.io/)

## What Can NLP + Vis Be Used For?

* Visual text analytics
* Natural language interfaces for visualizations
* Text generation for visualizations
* Automatic visual story generation
* Visualization retrieval and recommendation
* Etc.

## What Can NLP + Vis Be Used For?

* We will first divide these applications into two categories:
  * Natural language as **input**
  * Natural language as **output**

## NL as Input: ChartQA

![](chartQA.png)

::: footer
Ref: @masry-etal-2022-chartqa
:::

## ChartQA Dataset

* Real-world charts crawled from various online sources
* 9.6K human-authored and 23.1K machine-generated questions

![](chartQA_dataset.png)

::: footer
Ref: @masry-etal-2022-chartqa
:::

## ChartQA Approach

![](chartQA_approach.png)

::: footer
Ref: @masry-etal-2022-chartqa
:::

## ChartQA Evaluation

![](chartQA_eval.png)

* VisionTaPas achieves SOTA performance
* Accuracies are lower on the authors' dataset than on previous datasets, mainly due to its human-written visual and logical reasoning questions

## OpenAI’s Study of GPT-4 on ChartQA Benchmark

![](chartQA_GPT-4.png)

::: footer
Ref: https://openai.com/research/gpt-4
:::

## NL as Input: Multimodal Inputs for Visualizations

* Ambiguity Widgets: Eviza (Setlur et al., 2016)
  * Allow users to rectify ambiguous queries

![](mm_vis.png){width=50%}

## NL as Input: Multimodal Inputs for Visualizations

* Query completion through text and interactive vis: Sneak Pique (Setlur et al., 2020)

::: footer
Ref: @10.1145/3379337.3415813
:::

## NL as Input: Multimodal Inputs for Visualizations

![](mm_vis2.png)

::: footer
Ref: @10.1145/3379337.3415813
:::

## NL as Output: Chart-to-Text

![](CTT-2.png)

::: footer
Ref: @kantharaj-etal-2022-chart
:::

## NL as Output: Chart-to-Text

![](CTT-1.png)

::: footer
Ref: @kantharaj-etal-2022-chart
:::

## Chart-to-Text Example Models

![](CTT.png)

* Fully fine-tune BART/T5 on the authors' datasets
  * Setup 1: linearize the data table as the input
  * Setup 2: send OCR text from the chart image as the input
* Prefix to T5: “translate Chart to Text: ” (see the sketch on the next slide)

::: footer
Ref: @kantharaj-etal-2022-chart
:::
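## Chart-to-Text: Setup 1 Sketch

A minimal sketch of the Setup 1 pipeline from the previous slide: linearize the chart's underlying data table, prepend the “translate Chart to Text: ” prefix, and feed the result to T5. The checkpoint name, the linearization scheme, and the sample table are illustrative assumptions, not the authors' exact code, and a stock `t5-base` would still need to be fine-tuned on the Chart-to-Text datasets before its captions are meaningful.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical underlying data table for a simple chart.
table = [
    ("Year", "Revenue (millions)"),
    ("2019", "12.4"),
    ("2020", "15.1"),
    ("2021", "18.9"),
]

# Linearize the table row by row (one of several reasonable schemes).
linearized = " | ".join(" ".join(row) for row in table)
input_text = "translate Chart to Text: " + linearized

# Encode, generate, and decode a candidate caption.
inputs = tokenizer(input_text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```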
## Chart-to-Text Sample Output

![](CTT-output.png)

::: footer
Ref: @kantharaj-etal-2022-chart
:::

## VisText

* 12.4K charts with generated + crowd-sourced captions
* Scene graphs with a hierarchical representation of a chart's visual elements

![](VisText.png)

::: footer
Ref: @tang-etal-2023-vistext
:::

## VisText Sample Output

![](VisText2.png)

* Correctly identifies upward trends, but repeats this claim twice

::: footer
Ref: @tang-etal-2023-vistext
:::

## NL as Output: Open-Ended Question Answering with Charts

![](openCQA.png)

::: footer
Ref: @kantharaj-etal-2022-opencqa
:::

## Combining Language and Visualizations as Output

* Roles of natural language
  * Generating an explanatory answer
  * Explaining the answer

![](role_NL.png)

## Combining Language and Visualizations as Output

* An example of combining text and vis as a multimodal output

![](role_NL2.png)

::: footer
Ref: @DBLP:journals/corr/abs-2010-09975
:::

## Combining Language and Visualizations as Output

* DataShot (Wang et al., 2019)

![](role_NL3.png)

::: footer
Ref: @Wang2020DataShotAG
:::

## Conversational QA With Visualization

* Evizeon (Hoque et al., TVCG 2017)

![](conversationalQA.png)

## Open Challenges & Ongoing Research

* Design of natural language interfaces
  * Must consider the richness and ambiguities of natural language
* Complex reasoning required to predict the answer
* Computer vision challenges for automatic understanding of image charts
* Inherently interdisciplinary (HCI, ML, NLP, InfoVis, Computer Vision)

## Open Challenges & Ongoing Research

* Dataset creation
  * Need for large-scale, real-world benchmark datasets
  * Most existing datasets lack realism
  * For many problem setups, there is no benchmark

## Open Challenges & Ongoing Research

* Challenges with natural language generation
  * Hallucinations
  * Factual errors
  * Perceptual and reasoning aspects
* Computer vision challenges

## Open Challenges & Ongoing Research

![](LLM_error_1.png)

## Open Challenges & Ongoing Research

![](LLM_error_2.png)

## Open Challenges & Ongoing Research

* Improving logical and visual reasoning

![](LLM_reasoning.png)

::: footer
Ref: @masry-etal-2022-chartqa
:::

## Open Challenges & Ongoing Research

* Computer vision challenges (e.g., chart data extraction)

![](chart_data_extraction.png)

::: footer
Ref: @masry-etal-2022-chartqa
:::

## Open Challenges & Ongoing Research

* How can we effectively combine text and visualization in data stories?

![](data_stories.png)

## Open Challenges & Ongoing Research

* NLP for visualization accessibility

![](NLP_accessiblity_1.png){width=70%}

::: footer
Ref: @10.1145/3491102.3517431
:::

## Open Challenges & Ongoing Research

* NLP for visualization accessibility

![](NLP_accessibility.png){width=90%}

::: footer
Ref: @Alam_et_al
:::

## Project Proposal Feedback

* For the remainder of the lab, please look over the feedback on your project proposals with your group members
* Any questions? Concerns? Any progress you want feedback on? Feel free to ask now

## References