Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

hrbrmstr/newsflash

Repository files navigation

*** BREAKING CHANGES ***

newsflash

Tools to Work with the Internet Archive and GDELT Television Explorer

Description

Ref:

TV Explorer:

"In collaboration with the Internet Archive’s Television News Archive, GDELT’s Television Explorer allows you to keyword search the closed captioning streams of the Archive’s 6 years of American television news and explore macro-level trends in how America’s television news is shaping the conversation around key societal issues. Unlike the Archive’s primary Television News interface, which returns results at the level of an hour or half-hour"show," the interface here reaches inside of those six years of programming and breaks the more than one million shows into individual sentences and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump, the interface here will count how many sentences uttered on CNN yesterday mentioned his name - a vastly more accurate metric for assessing media attention."

Third Eye:

The TV News Archive’s Third Eye project captures the chyrons–or narrative text–that appear on the lower third of TV news screens and turns them into downloadable data and a Twitter feed for research, journalism, online tools, and other projects. At project launch (September 2017) we are collecting chyrons from BBC News, CNN, Fox News, and MSNBC–more than four million collected over just two weeks."

An advantage of using this over the TV Explorer interactive selector & downloader or Third Eye API is that you get tidy tibbles with this package, ready to use in R.

NOTE: While I don’t claim that this alpha-package is anywhere near perfect, the IA/GDELT TV API hiccups every so often so when there are critical errors run the same query in their web interface before submitting an issue. I kept getting errors when searching all affiliate markets for the "mexican president" query that also generate errors on the web site when JSON is selected as output (it’s fine on the web site if the choice is interactive browser visualizations). Submit those errors to them, not here.

What’s Inside The Tin

The following functions are implemented:

  • list_chyrons: Retrieve Third Eye chyron index
  • list_networks: Helper function to identify station/network keyword and corpus date range for said market
  • newsflash: Tools to Work with the Internet Archive and GDELT Television Explorer
  • query_tv: Issue a query to the TV Explorer
  • read_chyrons: Retrieve TV News Archive chyrons from the Internet Archive’s Third Eye project
  • gd_top_trending: Top Trending (GDELT)
  • `iatv_top_trending: Top Trending Topics (Internet Archive TV Archive)
  • word_cloud: Retrieve top words that appear most frequently in clips matching your search

Installation

devtools::install_github("hrbrmstr/newsflash")
options(width=120)

Usage

library(newsflash)
library(ggalt)
library(hrbrthemes)
library(tidyverse)
# current verison
packageVersion("newsflash")
## [1] '0.6.0'

"Third Eye" Chyrons are simpler so we’ll start with them first:

list_chyrons()
## # A tibble: 457 x 3
## ts type size
## <date> <chr> <dbl>
## 1 2018年04月16日 cleaned 297177.
## 2 2018年04月16日 raw 10436998.
## 3 2018年04月15日 cleaned 347063.
## 4 2018年04月15日 raw 9884284.
## 5 2018年04月14日 cleaned 470448.
## 6 2018年04月14日 raw 13709682.
## 7 2018年04月13日 cleaned 410976.
## 8 2018年04月13日 raw 12058117.
## 9 2018年04月12日 cleaned 384796.
## 10 2018年04月12日 raw 11750908.
## # ... with 447 more rows
ch <- read_chyrons("2018年04月13日")
mutate(
 ch, 
 hour = lubridate::hour(ts),
 text = tolower(text),
 mention = grepl("comey", text)
) %>% 
 filter(mention) %>% 
 count(hour, channel) %>% 
 ggplot(aes(hour, n)) +
 geom_segment(aes(xend=hour, yend=0), color = "lightslategray", size=1) +
 scale_x_continuous(name="Hour (GMT)", breaks=seq(0, 23, 6),
 labels=sprintf("%02d:00", seq(0, 23, 6))) +
 scale_y_continuous(name="# Chyrons", limits=c(0,20)) +
 facet_wrap(~channel, scales="free") +
 labs(title="Chyrons mentioning 'Comey' per hour per channel",
 caption="Source: Internet Archive Third Eye project & <github.com/hrbrmstr/newsflash>") +
 theme_ipsum_rc(grid="Y")

Now for the TV Explorer:

See what networks & associated corpus date ranges are available:

list_networks(widget=FALSE)
## # A tibble: 159 x 6
## StationID Description Market Network StartDate EndDate 
## <chr> <chr> <chr> <chr> <date> <date> 
## 1 ALJAZ Al Jazeera International ALJAZ 2017年09月11日 2017年09月11日
## 2 ALJAZAM Al Jazeera America NationalDiscontinued ALJAZAM 2013年08月20日 2013年08月20日
## 3 BBCNEWS BBC News International BBCNEWS 2017年01月01日 2017年01月01日
## 4 BETW BET - San Francisco (BETW) San Francisco BET 2016年12月13日 2016年12月13日
## 5 BLOOMBERG Bloomberg National BLOOMBERG 2013年12月05日 2013年12月05日
## 6 CNBC CNBC National CNBC 2009年07月02日 2009年07月02日
## 7 CNN CNN National CNN 2009年07月02日 2009年07月02日
## 8 COM Comedy Central NationalSpecialty COM 2011年05月10日 2011年05月10日
## 9 CSPAN CSPAN National CSPAN 2009年06月04日 2009年06月04日
## 10 CSPAN2 CSPAN2 National CSPAN 2009年06月04日 2009年06月04日
## # ... with 149 more rows

Basic search:

comey <- query_tv('comey', start_date = "2018年04月01日")
comey
## # A tibble: 144 x 3
## network date value
## <chr> <date> <dbl>
## 1 CSPAN3 2018年04月01日 0.0273
## 2 CSPAN3 2018年04月02日 0. 
## 3 CSPAN3 2018年04月03日 0. 
## 4 CSPAN3 2018年04月04日 0.0241
## 5 CSPAN3 2018年04月05日 0. 
## 6 CSPAN3 2018年04月06日 0. 
## 7 CSPAN3 2018年04月07日 0. 
## 8 CSPAN3 2018年04月08日 0. 
## 9 CSPAN3 2018年04月09日 0. 
## 10 CSPAN3 2018年04月10日 0. 
## # ... with 134 more rows
query_tv('comey', start_date = "2018年04月01日") %>% 
 arrange(date) %>% 
 ggplot(aes(date, value, group=network)) +
 ggalt::geom_xspline(aes(color=network)) +
 ggthemes::scale_color_tableau(name=NULL) +
 labs(x=NULL, y="Volume Metric", title="'Comey' Trends Across National Networks") +
 facet_wrap(~network) +
 theme_ipsum_rc(grid="XY") +
 theme(legend.position="none")

query_tv("comey Network:CNN", mode = "TimelineVol", start_date = "2018年01月01日") %>% 
 arrange(date) %>% 
 ggplot(aes(date, value, group=network)) +
 ggalt::geom_xspline(color="lightslategray") +
 ggthemes::scale_color_tableau(name=NULL) +
 labs(x=NULL, y="Volume Metric", title="'Comey' Trend on CNN") +
 theme_ipsum_rc(grid="XY")

Relative Network Attention To Syria since January 1, 2018

query_tv('syria Market:"National"', mode = "StationChart", start_date = "2018年01月01日") %>% 
 arrange(desc(count)) %>% 
 knitr::kable("markdown")
station count
FOX News 1.0148
CNN 0.8804
MSNBC 0.7668
CSPAN 0.6192
FOX Business 0.5121
CSPAN2 0.3346
Bloomberg 0.3208
CSPAN3 0.2392
CNBC 0.2171

Video Clips

clips <- query_tv('comey Market:"National"', mode = "ClipGallery", start_date = "2018年01月01日")
clips
## # A tibble: 32 x 8
## preview_url ia_show_id date station show show_date preview_thumb snippet 
## <chr> <chr> <date> <chr> <chr> <date> <chr> <chr> 
## 1 https://archive.... FOXNEWSW_201... 2018年04月13日 FOX Ne... Shepa... 2018年04月13日 https://archive.org/do... comey -- i mention it ...
## 2 https://archive.... MSNBCW_20180... 2018年03月20日 MSNBC MTP D... 2018年03月20日 https://archive.org/do... donald trump ousted co...
## 3 https://archive.... CNNW_2018041... 2018年04月16日 CNN CNN S... 2018年04月16日 https://archive.org/do... comey versus comey or ...
## 4 https://archive.... MSNBCW_20180... 2018年04月12日 MSNBC The R... 2018年04月12日 https://archive.org/do... and the president of c...
## 5 https://archive.... FOXNEWSW_201... 2018年04月13日 FOX Ne... The I... 2018年04月13日 https://archive.org/do... comey announced when h...
## 6 https://archive.... FBC_20180413... 2018年04月13日 FOX Bu... After... 2018年04月13日 https://archive.org/do... untethered to the trut...
## 7 https://archive.... FBC_20180415... 2018年04月15日 FOX Bu... The J... 2018年04月15日 https://archive.org/do... that we haven't alread...
## 8 https://archive.... CNNW_2018031... 2018年03月18日 CNN New D... 2018年03月18日 https://archive.org/do... media. after comey lea...
## 9 https://archive.... MSNBCW_20180... 2018年02月20日 MSNBC The B... 2018年02月20日 https://archive.org/do... trump caused this inve...
## 10 https://archive.... CNBC_2018041... 2018年04月13日 CNBC Power... 2018年04月13日 https://archive.org/do... he is ego different an...
## # ... with 22 more rows

2018年04月13日 | FOX News | Shepard Smith Reporting

comey – i mention it because comey is in the news. treats comey like a white knight and points out that director comey would have a vested interest in distancing himself from andrew mccabe because the inspector general was also looking at comey and his

"Word Cloud" (top associated words to the query)

wc <- query_tv('hannity Market:"National"', mode = "WordCloud", start_date = "2018年04月13日")
ggplot(wc, aes(x=1, y=1)) +
 ggrepel::geom_label_repel(aes(label=label, size=count), segment.colour="#00000000", segment.size=0) +
 scale_size_continuous(trans="sqrt") +
 labs(x=NULL, y=NULL) +
 theme_ipsum_rc(grid="") +
 theme(axis.text=element_blank()) +
 theme(legend.position="none") 

Last 15 Minutes Top Trending

gd_top_trending()
## $overall_trending_topics
## [1] "commonwealth" "shirley" "caribbean" "florida" 
## [5] "jim comey" "boston" "sandra" "nell" 
## [9] "george stephanopoulos" "vincent kompany" "pallab ghosh" "brighthouse financial"
## [13] "islamic state" "wetherspoon" "europe" "sorrell" 
## [17] "north carolina" "nasa" "starbucks" "pakistan" 
## [21] "whitbread" "cliff richard" "asia" "hilary clinton" 
## [25] "ghouta" "kevin johnson" "west" "philadelphia" 
## [29] "renee" "zimbabwe" "city" "bill chaplin" 
## [33] "james" "grassley" "quetta" "myrbetriq" 
## [37] "barbara" "john heilemann" "carrie underwood" "joe" 
## [41] "houston" "balochistan" "ibm" "medicare" 
## [45] "barclays" "fidelity" "jason aldean" "rhonda" 
## [49] "michael flynn" "belfast" "kohler" 
## 
## $station_trending_topics
## # A tibble: 112 x 2
## station topic 
## <chr> <chr> 
## 1 CNN brilinta 
## 2 CNN jim comey 
## 3 CNN christine 
## 4 CNN michael flynn 
## 5 CNN tremfya 
## 6 CNN tal 
## 7 CNN nick paton walsh
## 8 CNN geico 
## 9 CNN vladimir putin 
## 10 CNN lynch 
## # ... with 102 more rows
## 
## $station_top_topics
## # A tibble: 112 x 2
## station topic 
## <chr> <chr> 
## 1 CNN fbi 
## 2 CNN russia 
## 3 CNN donald trump 
## 4 CNN james comey 
## 5 CNN mueller 
## 6 CNN syria 
## 7 CNN united states
## 8 CNN michael cohen
## 9 CNN clinton 
## 10 CNN cnn 
## # ... with 102 more rows
## 
## $overall_trending_phrases
## [1] "morally unfit" "unfit to be president" "good morning" 
## [4] "medically unfit" "president of the united" "islamic state group" 
## [7] "night sky" "bank of america" "xfinity delivers gig" 
## [10] "give this guy gig-" "delivers gig speed" "give this guy" 
## [13] "gig speed" "speed to more homes" "xfinity delivers gig speed" 
## [16] "guy gig-" "treats women" "xfinity delivers" 
## [19] "donald trump" "gig-speed internet" "kennedy space centre" 
## [22] "people watching" "threatens new sanctions" "donald trump unfit" 
## [25] "exclusive interview" "evidence of obstruction" "sees moral equivalence" 
## [28] "100 years" "air strikes" "fit to be president" 
## [31] "new york" "maintaining a level" "shield annuity" 
## [34] "growth opportunities" "lies constantly" "time to make" 
## [37] "level of protection" "support for president assad" "removing donald trump" 
## [40] "support for president" "buy the stuff" "2700 journalists" 
## [43] "pallab ghosh" "brighthouse financial- established" "mission to scan" 
## [46] "stars resonate" "voting booth" "star makes" 
## [49] "james comey comments" "embody respect" "adhere to the values"

Top Overall Trending from the Internet Archive TV Archive (2017 and earlier)

iatv_top_trending("2017年12月01日 18:00", "2017年12月02日 06:00")
×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 2 2017-12-01 18:15:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 3 2017-12-01 18:30:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 4 2017-12-01 18:45:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 5 2017-12-01 19:00:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 6 2017-12-01 19:15:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 7 2017-12-01 19:30:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 8 2017-12-01 19:45:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 9 2017-12-01 20:00:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## 10 2017-12-01 20:15:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> ## # ... with 39 more rows">
## # A tibble: 49 x 5
## ts overall_trending_topics station_trending_topics station_top_topics overall_trending_phrases
## <dttm> <list> <list> <list> <list> 
## 1 2017年12月01日 18:00:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 2 2017年12月01日 18:15:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 3 2017年12月01日 18:30:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 4 2017年12月01日 18:45:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 5 2017年12月01日 19:00:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 6 2017年12月01日 19:15:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 7 2017年12月01日 19:30:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 8 2017年12月01日 19:45:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 9 2017年12月01日 20:00:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## 10 2017年12月01日 20:15:00 <chr [51]> <data.frame [7 ×ばつ 2]> <data.frame [7 ×ばつ 2]> <chr [51]> 
## # ... with 39 more rows

About

Tools to Work with the Internet Archive and GDELT Television Explorer in R

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

Languages

AltStyle によって変換されたページ (->オリジナル) /