Posts

Showing posts with the label Stanford CoreNLP

2020-08-10: StoryGraph, reading the news for three years, a look at the past and the future

Fig. 1: Three connected components from three different StoryGraphs, representing three different news stories. The first connected component represents the news story about North Korea considering firing missiles at Guam; the second, the royal wedding of Prince Harry and Meghan; and the third, AG William Barr's release of his summary of the Mueller Report.

A lot has happened in three years. We've seen threats of war, hurricanes Harvey/Irma/Maria, upsets in elections, a royal wedding, an impeachment, a pandemic, etc. For all these stories and many more, every 10 minutes for three years, StoryGraph has been reading the news, generating news similarity graphs, and quantifying the level of attention news stories receive. August 8, 2020 marked the third anniversary of StoryGraph going live. In this blogpost, I will take a retrospective look at the studies StoryGraph has enabled and promise multiple services and studies for the future. StoryGraph's Past: studies a...

2020-03-24: StoryGraph at Computation + Journalism Symposium 2020 Non-Trip Report

Overview of StoryGraph illustrating the process of generating a news similarity graph in four primary steps. The four steps are explained in the StoryGraph Tech Report.

We never did give StoryGraph a proper introduction. Over three years, I have tweeted, created a Twitter account (@storygraphbot) for StoryGraph, and published two blogposts that utilized the StoryGraph service to determine the top news stories of 2018 and 2019. But I never really introduced StoryGraph or motivated the need for it. I hoped that the Computation + Journalism Symposium would provide the opportunity to give StoryGraph a proper introduction, but the COVID-19 pandemic interrupted it.

"To commemorate #ElectionDay, Same domain story linking results for October 2018 (for @BreitbartNews @FoxNews @CNN @HuffPost) shows stories news media focused on 1 month before the elections. pic.twitter.com/Xw196Y66Mi" (StoryGraph, @storygraphbot, November 6, 2018)

The...

2019-09-09: Introducing sumgram, a tool for generating the most frequent conjoined ngrams

Comparison of top 20 (first column) bigrams, top 20 (second column) six-grams, and top 20 (third column) sumgrams (conjoined ngrams) generated by sumgram for a collection of documents about the 2014 Ebola Virus Outbreak. Proper nouns of more than two words (e.g., "centers for disease control and prevention") are split when generating bigrams; sumgram strives to remedy this. Generating six-grams surfaces non-salient six-grams.

A Web archive collection consists of groups of webpages that share a common topic, e.g., “Ebola virus” or “Hurricane Harvey.” One of the most common tasks involved in understanding the "aboutness" of a collection is generating the top k (e.g., k = 20) ngrams. For example, given a collection about Ebola Virus, we could generate the top 20 bigrams as presented in Fig. 1. This simple operation of calculating the most frequent bigrams unveils useful bigrams that help us understand the focus of the collection, and m...
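The baseline operation described above, counting the top k ngrams in a collection, can be sketched in a few lines of Python. This is only the plain ngram count, not sumgram's conjoining step, and the function name `top_k_ngrams`, the regex tokenizer, and the two sample sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def top_k_ngrams(documents, n=2, k=20):
    """Return the k most frequent n-grams across a list of document strings."""
    counts = Counter()
    for text in documents:
        # Simplified tokenization (an assumption): lowercase alphanumeric runs
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # Slide a window of n tokens across the document
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(k)


# Hypothetical two-document collection, for illustration only
docs = [
    "The Ebola virus outbreak spread in West Africa.",
    "Health workers fought the Ebola virus in West Africa.",
]
for ngram, freq in top_k_ngrams(docs, n=2, k=5):
    print(freq, ngram)
```

With n = 2, a multi-word proper noun such as "centers for disease control and prevention" would be scattered across several bigrams, which is the limitation the caption points out.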

2018-03-04: Installing Stanford CoreNLP in a Docker Container

Fig. 1: Example of Text Labeled with the CoreNLP Part-of-Speech, Named-Entity Recognizer and Dependency Annotators.

The Stanford CoreNLP suite provides a wide range of important natural language processing applications such as Part-of-Speech (POS) Tagging and Named-Entity Recognition (NER) Tagging. CoreNLP is written in Java and there is support for other languages. I tested a couple of the latest Python wrappers that provide access to CoreNLP but was unable to get them working due to different environment-related complications. Fortunately, with the help of Sawood Alam, our very able Docker campus ambassador at Old Dominion University, I was able to create a Dockerfile that installs and runs the CoreNLP server (version 3.8.0) in a container. This eliminated the headaches of installing the server and also provided a simple method of accessing CoreNLP services through HTTP requests. How to run the CoreNLP server on localhost...
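As a rough sketch of what accessing CoreNLP through HTTP requests can look like from Python, the snippet below uses only the standard library. It assumes a CoreNLP server is already running and reachable at `http://localhost:9000` (the server's default port; the port mapping of the container is an assumption here), and the helper names `build_corenlp_request` and `annotate` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, annotators="tokenize,ssplit,pos,ner",
                          server="http://localhost:9000"):
    """Build a POST request for a CoreNLP server (the server URL is an assumption)."""
    # The server reads its settings from a JSON "properties" URL parameter
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    # The raw text to annotate is sent as the request body
    return urllib.request.Request(server + "/?" + query,
                                  data=text.encode("utf-8"), method="POST")


def annotate(text, **kwargs):
    """Send the text to the server and return the parsed JSON response."""
    request = build_corenlp_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `annotate("Old Dominion University is in Norfolk.")` should return a dictionary whose `sentences` entries contain `tokens` carrying `pos` and `ner` labels; the call fails with a connection error if nothing is listening on the assumed address.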