
The Social Media Intelligence Blog

Insights on social media intelligence, marketing, and consumer insight

Using Big Data + Text Analytics to Create a Better News Feed

By Jordan Hanson  •  August 18, 2015

Image Source: Ryoji Ikeda


With millions of places online to get your news and the rise of the 24/7 content generation machine, keeping up to date on which topics are actually trending at any given moment is extremely difficult.

Sources like CNN.com and Foxnews.com are biased, fully self-interested and only report on what they deem newsworthy and likely to keep you on their sites. Reddit, on the other hand, is a community-driven site that was previously useful for news discovery but has become more of a haven for cat memes and shower thoughts.

Fed up with sifting through so much information to find the key issues that are trending online, the developers at Infegy decided to apply the company’s text analytics platform to their massive database of online discussion to create a better way to discover trending news.

We call it News Ninja, and its first-of-its-kind automated news generation lets you know which topics are trending at any given moment, based on what people are actually talking about rather than just what news sites and popular blogs are reporting.


How News Ninja Works

News Ninja collects and analyzes big data on an impressive scale, churning through more than 10 million posts from across the web every 5 minutes.

Beyond this simple explanation, how does this news site work? How does it generate the constantly-updated data you can see for free, whenever you’d like? I’d be happy to explain.


Step 1... Big Data

First, News Ninja needs to understand what counts as great ‘news.’

At Infegy, we maintain a massive database of online content that News Ninja can scan to build a list of sources that produce news content.

The application identifies news sources in three ways: sources that identify themselves as news outlets or that Infegy has identified as news organizations; sources that consistently prove influential in spreading topics of digital conversation, something we’d expect a reliable news source to do with some frequency; and sources that produce well-written content.

To form these opinions, News Ninja uses Infegy Linguistics to analyze the data, trace its tracks of influence, and assign it a writing-level score that we use to understand how well-written content should look.
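As a rough illustration of the three checks described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Source` fields, the 0-to-1 score ranges, the thresholds, and the choice to accept a source that passes any one of the three checks. It is not Infegy's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical record for one content source in the database."""
    domain: str
    self_identified_news: bool  # the source describes itself as a news outlet
    flagged_as_news: bool       # the source was identified as a news organization
    influence: float            # assumed 0..1 score: how often it spreads topics
    writing_level: float        # assumed 0..1 score: how well written its content is


# Illustrative cutoffs only; the real values are not described in this post.
MIN_INFLUENCE = 0.6
MIN_WRITING_LEVEL = 0.7


def is_news_source(src: Source) -> bool:
    """Return True if the source passes any of the three checks."""
    identified = src.self_identified_news or src.flagged_as_news
    influential = src.influence >= MIN_INFLUENCE
    well_written = src.writing_level >= MIN_WRITING_LEVEL
    return identified or influential or well_written


# Usage with made-up sources.
sources = [
    Source("example-daily.test", True, False, 0.2, 0.5),
    Source("cat-memes.test", False, False, 0.1, 0.3),
    Source("quiet-expert.test", False, False, 0.3, 0.9),
]
news_sources = [s.domain for s in sources if is_news_source(s)]
```

In this sketch the first source qualifies because it identifies itself as news, and the third qualifies on writing quality alone.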

Step 2... Text Analytics

Next, the robots deploy Infegy Linguistics in a second pass over the database, using posts only from this newly-established list of identified news sources.

By reading the text of each post, News Ninja is able to understand the context of the subjects discussed.

This second pass of our text analytics engine generates the list of documents that you can see used to construct and support each News Ninja ‘trending topics’ article. Based on the topics extracted, each potential news topic is scored to estimate how likely it is to be a trend.

The score is computed by summing post data: the volume of posts about the topic within a given time frame, taking into account the age of each article, measured from the time it was actually posted. These values establish a trend score for each news topic.

Once every topic has been scored, a ranked list of topics is generated for news production.
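To make the idea of a volume-and-age summation concrete, here is a minimal Python sketch. The post does not give the real formula, so the exponential half-life weighting, the 24-hour window, and the names `trend_score`, `WINDOW`, and `HALF_LIFE` are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

WINDOW = timedelta(hours=24)    # assumed time frame for counting posts
HALF_LIFE = timedelta(hours=6)  # assumed: a post's weight halves every 6 hours


def trend_score(post_times: Iterable[datetime], now: datetime) -> float:
    """Sum an age-based weight over every post about one topic in the window."""
    score = 0.0
    for posted_at in post_times:
        age = now - posted_at  # age is measured from the actual posting time
        if timedelta(0) <= age <= WINDOW:
            # Newer posts count for more; posts outside the window count for nothing.
            score += 0.5 ** (age / HALF_LIFE)
    return score


# Usage with made-up timestamps: posts that are 0, 6, 12, and 48 hours old.
now = datetime(2015, 8, 18, 12, 0, tzinfo=timezone.utc)
posts = [now - timedelta(hours=h) for h in (0, 6, 12, 48)]
score = trend_score(posts, now)
```

With these assumed settings, many recent posts push a topic's score up quickly, while older coverage fades out instead of dominating the list.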
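The ranking step can be pictured as a simple sort over the scored topics. The sketch below is only an illustration: the function name `rank_topics`, the `top_n` cutoff, and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_topics(scores: Dict[str, float], top_n: int = 10) -> List[Tuple[str, float]]:
    """Return the top_n (topic, score) pairs, highest trend score first."""
    # Sort by descending score; break ties alphabetically so the order is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Usage with made-up trend scores.
example_scores = {"topic a": 1.75, "topic b": 4.2, "topic c": 0.3}
top = rank_topics(example_scores, top_n=2)
```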

Step 3... the News

Finally, News Ninja robots deploy Infegy Linguistics again, but this time using the built-in understanding of natural language and post context to generate specific, informative headlines for each post you see displayed on the website.

There are rare cases when News Ninja finds a topic headline that represents the story so accurately that it doesn’t change the headline at all. In most cases, however, the headlines you see are automatically written by our news-generation robots.

This process runs continuously to generate the next set of trending news stories. If a topic makes the trending topic list again, there’s a very real chance that its headline will change on the next pass.

Most importantly, we wanted the way News Ninja operates to reflect that information is constantly being added to the online conversation as situations unfold. Frequently, you may see headlines that become more descriptive as stories evolve, or whose subjects become clearer over time.

Image by: m01229 (Flickr)

A Bold Step Forward in Trending News Topics

We think we’ve delivered a clever way to change how people discover trending news topics.

Many sites today focus on information coming from people paid to cover a given topic. We wanted to take the opposite approach, recognizing that news is just information posted somewhere online, and it has the potential to emerge from anywhere.

The internet has the power to equalize who is given a voice in informative conversations. Harnessing this power to generate news automatically, based on trending topics across volumes of digital dialogue, will always remain our primary goal.

We want to bring trending news discovery even further into the future by putting a brand new application of text analytics in the palm of your hand. Based on user feedback, we are in the process of developing easy-to-use mobile apps and increasing the functionality you see embedded in the web versions of News Ninja many people use every day.

