Machine Learning to Identify Google News Content

On December 10, 2019, Eric Silva, a Google Product Manager, informed publishers that a new Publisher Center had been launched.

Publishers no longer need to apply for inclusion in the News Index. Publishers that adhere to our content policies are eligible for consideration in our news surfaces. If you previously applied and were accepted, you don’t need to make any changes.

Publisher Center FAQ

Since December 10, 2019, manual application for inclusion in the Google News corpus is no longer available. Google abandoned the manual process in favor of a new algorithm.

Machine Learning to Identify News Content

How does Google News identify news content for sites launched after December 2019?

Google uses machine learning to build a “News classifier”.

The reimagined Google News uses a new set of AI techniques to take a constant flow of information as it hits the web, analyze it in real time and organize it into storylines.
Trystan Upstill, Distinguished Engineer, Google News

How does it work? It is top-secret.

How Does Google News Classify?

Probably with a model that analyses individual posts on Google News and other news surfaces and decides whether each post is news or not. To train such a machine learning model, human reviewers create examples of what the classifier is trying to identify (positive examples) and what it is not trying to identify (negative examples), so that the model learns to distinguish the patterns between the two.
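To make the idea of learning from positive and negative examples concrete, here is a minimal sketch in Python. It is not Google's classifier, whose design is not public. It is a tiny Naive Bayes text classifier, and the labels, function names and example headlines are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(positive, negative):
    """Count word frequencies per class from hand-labeled examples.

    Assumes both lists are non-empty.
    """
    counts = {"news": Counter(), "not_news": Counter()}
    docs = {"news": len(positive), "not_news": len(negative)}
    for text in positive:
        counts["news"].update(tokenize(text))
    for text in negative:
        counts["not_news"].update(tokenize(text))
    vocab = set(counts["news"]) | set(counts["not_news"])
    return counts, docs, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in counts:
        total_words = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)  # class prior
        for word in tokenize(text):
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical reviewer-labeled examples (invented for this sketch).
positive = [
    "Mayor announces new budget after council vote on Tuesday",
    "Officials confirm flood damage as rescue teams reach the coast",
]
negative = [
    "Ten tips to improve your marketing funnel",
    "Buy our new software and save money today",
]
model = train(positive, negative)
print(classify("Council officials confirm vote on flood budget", model))
```

A real system would need far more examples and far richer signals than word counts, but the principle is the same: the model only knows what "news" looks like from the examples reviewers give it.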

The AI assesses whether a given piece of content is news by answering specific questions. These questions and guidelines might be developed in collaboration with journalism industry experts and include criteria such as:

  1. Is this piece of content reporting on timely events, current information, or ongoing investigations, or is it editorial or opinion reporting?
  2. Is this piece of content directly attributed to a cited author, journalist, or creator?
  3. Does this piece of content cite sources for facts that are asserted?
  4. Does this piece of content have editorial transparency (this includes information like clear dates and bylines, as well as information about authors, the news source, company or network behind it, and contact information.)?
  5. Does this piece of content have dates and/or timestamps?
  • If the AI answers “yes” to all the questions asked, then the piece of content under review is identified as news.
  • If the answer to at least one question is “no”, the content is not identified as news.
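The all-or-nothing rule above can be sketched in a few lines of Python. This is only an illustration of the decision logic as described here. The field names are made up, and in a real system each answer would come from a model rather than being set by hand.

```python
from dataclasses import dataclass, fields


@dataclass
class Checklist:
    """One yes/no answer per question in the list above (hypothetical names)."""
    reports_timely_events: bool
    attributed_to_author: bool
    cites_sources: bool
    editorially_transparent: bool
    has_dates: bool


def is_news(checklist):
    """Content counts as news only if every answer is 'yes'."""
    return all(getattr(checklist, f.name) for f in fields(checklist))


def failed_checks(checklist):
    """List the questions that were answered 'no'."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


# An article with a byline and dates but no cited sources.
article = Checklist(
    reports_timely_events=True,
    attributed_to_author=True,
    cites_sources=False,
    editorially_transparent=True,
    has_dates=True,
)
print(is_news(article), failed_checks(article))
```

Returning the failed checks as well as the verdict is a design choice for this sketch: it shows a publisher which criterion to fix, which a bare yes/no would not.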

Articles are classified based on the quality, originality, and timeliness of their content and on the publisher’s previous activity.

You should also read:
Elevating original reporting in search is a step back into the future.

How Google Determining URL Freshness

Why Does Google News Not List Your Articles?

A reminder of the old axiom: Google News only wants news, not just any content that exists.

If you publish mostly marketing or general information content, the AI model will most likely not classify your site as belonging to the “News” domain.

So don’t be surprised if your articles don’t appear on Google News and other news surfaces.

Machine Learning and News Content

Original reporting and analysis should be prioritized

Original reporting is critical to informing people all around the world. Breaking a news story or doing an in-depth investigative analysis, discovering new facts and data, delivering crucial updates during times of crisis, or presenting eyewitness reports, interviews, and media filmed on-the-ground are all examples of original news.

Google gives priority to news content that it considers to be the original reporting on a hot topic. Google accomplishes this by analyzing groups of articles on a certain subject and identifying the ones that are most frequently cited as the original source or as the first to report on the subject.
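A rough sketch of that idea in Python: within a group of articles on the same story, count how often each one is cited by the others and fall back to the earliest publication time on a tie. The `Article` fields and the tie-breaking rule are assumptions made for illustration. Google has not published how it actually weighs these signals.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Article:
    url: str
    published: datetime
    cites: list = field(default_factory=list)  # URLs this article credits


def likely_original(cluster):
    """Pick the article most cited within its cluster; earliest wins ties.

    Assumes the cluster is non-empty.
    """
    urls = {a.url for a in cluster}
    citations = Counter(
        cited
        for a in cluster
        for cited in a.cites
        if cited in urls and cited != a.url  # ignore outside links and self-citations
    )
    # Sort key: more citations first, then earlier publication time.
    return min(cluster, key=lambda a: (-citations[a.url], a.published))


# Hypothetical cluster of three articles on one story.
a = Article("example.com/scoop", datetime(2020, 1, 1, 9))
b = Article("example.org/follow-up", datetime(2020, 1, 1, 10), ["example.com/scoop"])
c = Article(
    "example.net/recap",
    datetime(2020, 1, 1, 11),
    ["example.com/scoop", "example.org/follow-up"],
)
print(likely_original([c, b, a]).url)
```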

News articles that do not contain new original reporting or analysis will rank lower on Google News surfaces. The more extensive the original reporting, the better an article ranks.

It’s interesting: Why You’ve Seen a Sudden Drop in Website Traffic

Google News wants original content, but not necessarily yours

This is not a secret. Since the launch of the new format, Google has been giving preference to news from reliable publishers.

The Google News algorithm trusts content from authoritative sites by default.

If you’d like to increase the probability of your content showing up in news results, you should improve your website so that it measures up to the Google News and Webmaster guidelines.

Finally, if you are interested in Google News, please read more details in “Answers to some common questions about appearing in Google News” written by Danny Sullivan.

While the Publisher Center can help you manage content that’s deemed eligible, eligibility itself is determined through the automated process. Given this, being approved to use the Publisher Center, submitting content through it or having a News source page does not mean content will appear in:
  • Search results at Google News.
  • Google News features like “For you” or Headlines.
  • News surfaces in Search such as Top stories or the News tab.

Rest assured, machine learning systems can find patterns and identify news. Automatically detecting news articles that contain journalists’ own opinions is already a reality.

By John Morris

Researcher and blog writer. #ML, #AI

