This article presents an in-depth comparative analysis of two studies, by V. Varenia (2023) and J. Morris (2023), of the metrics that affect how the quality of helpful and newsworthy content is assessed. Their methodologies, datasets, indicators, correlations, and conclusions are critically analysed.
As search engines rely more and more on machine learning, identifying scientifically grounded signals of high-quality, useful content is vital for SEO. This analysis compares two relevant data-driven studies in this emerging field.
Methodology for Investigating Content Quality Factors
V. Varenia analysed 42 low-ranking English-language sites after the September 2023 Google Helpful Content Update, using correlation analysis to identify quality factors and evaluating metrics such as Majestic Trust Flow, spam probability, indexed pages, affiliate links, and the use of AI-generated content. J. Morris analysed more than 50 low-ranking sites from the Google Publisher Center, assessing trust metrics, authority, language, transparency, and quality scores.
Both studies evaluated the number of indexed pages alongside Majestic Trust Flow (TF) and Citation Flow (CF). However, Varenia included additional metrics and conducted more extensive quantitative correlation testing, whereas J. Morris relied more on qualitative analysis.
Content Quality Factors
Importantly, both studies concluded that a lack of transparency about sources, authors, owners, and similar details can negatively affect the perception of quality.
V. Varenia found a positive correlation between a limited number of indexed pages and a lack of transparency. Morris similarly highlighted the risks associated with insufficient transparency.
In terms of indexed pages, Varenia found that websites with fewer than 660 indexed pages gave no information about their owners or creators. Morris likewise found a connection between the limited visibility of offending sites and a low index count of fewer than 1,500 pages. Together, these correlations suggest that a low number of indexed URLs may be a risk factor.
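The two reported thresholds can be turned into a simple screening rule. The sketch below is a minimal illustration, assuming the cut-offs are applied as hard limits; neither study describes such a rule, and the function name `indexed_page_risk` and its labels are hypothetical.

```python
# Hypothetical screening rule built from the indexed-page thresholds in the article.
# Using the cut-offs as hard limits is an assumption, not either author's method.

VARENIA_THRESHOLD = 660   # sites below this lacked owner/creator information
MORRIS_THRESHOLD = 1500   # offending sites had fewer indexed pages than this


def indexed_page_risk(indexed_pages: int) -> str:
    """Return a coarse, hypothetical risk label for a site's indexed-page count."""
    if indexed_pages < 0:
        raise ValueError("indexed_pages must be non-negative")
    if indexed_pages < VARENIA_THRESHOLD:
        return "high"       # below both reported thresholds
    if indexed_pages < MORRIS_THRESHOLD:
        return "elevated"   # below Morris's threshold only
    return "baseline"       # above both thresholds


if __name__ == "__main__":
    for pages in (120, 900, 4200):
        print(pages, indexed_page_risk(pages))
```

A low count is treated here only as a flag for closer review, in line with the article's framing of indexed URLs as a possible risk factor rather than a cause.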
In particular, Varenia found a strong negative correlation (-0.61) between Majestic Trust Flow and spam probability. Morris's observations complement this finding: “Sites with a spammy profile may have a lower trust flow (<6), a higher citation flow (>10), and a higher number of low-quality backlinks and domains linking to them,” said J. Morris.
Varenia also found a weak positive correlation (0.24) between low site traffic and a site being classified as a “new site” in Google's “About the topic” SERP feature. This suggests that new or less popular sites may be more likely to be assessed as untrustworthy or spammy.
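To make the reported coefficient concrete, the sketch below computes a Pearson correlation coefficient from scratch and encodes Morris's rule of thumb as a boolean check. It assumes the -0.61 figure is a Pearson coefficient, which the article does not state; the helper names `pearson_r` and `looks_spammy` and the toy values are hypothetical and are not data from either study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def looks_spammy(trust_flow, citation_flow):
    """Morris's rule of thumb: Trust Flow below 6 together with Citation Flow above 10."""
    return trust_flow < 6 and citation_flow > 10


# Hypothetical toy values, NOT data from either study.
trust_flow = [3, 5, 8, 12, 20, 25]
spam_prob = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

if __name__ == "__main__":
    print(f"r = {pearson_r(trust_flow, spam_prob):.2f}")
    print(looks_spammy(4, 15), looks_spammy(18, 15))
```

A negative coefficient means spam probability tends to fall as Trust Flow rises; Morris's thresholds describe the same pattern as a qualitative cut-off rather than a coefficient.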
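A correlation between a continuous measure and a yes/no label such as “new site” is usually computed as a point-biserial coefficient, which equals the Pearson coefficient with the label coded as 0 and 1. The sketch below assumes that coding; the article does not say how Varenia measured traffic or coded the label, and the function name `point_biserial` and the visit counts are hypothetical.

```python
import math
import statistics


def point_biserial(values, labels):
    """Point-biserial correlation between a continuous variable and a 0/1 label.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population standard
    deviation of all values, p the share of label 1 and q = 1 - p. This equals
    the Pearson r computed with the label coded as 0/1.
    """
    group1 = [v for v, label in zip(values, labels) if label == 1]
    group0 = [v for v, label in zip(values, labels) if label == 0]
    if not group1 or not group0:
        raise ValueError("both label groups must be non-empty")
    s = statistics.pstdev(values)
    if s == 0:
        raise ValueError("correlation is undefined for constant values")
    p = len(group1) / len(values)
    mean_diff = statistics.fmean(group1) - statistics.fmean(group0)
    return mean_diff / s * math.sqrt(p * (1 - p))


# Hypothetical monthly visits and "new site" flags (1 = labelled new), NOT study data.
visits = [120, 300, 450, 2000, 3500, 5000]
new_site = [1, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    print(f"r_pb = {point_biserial(visits, new_site):.2f}")
```

Because this sketch uses raw visit counts, fewer visits among sites labelled “new site” produce a negative coefficient; that is the same relationship the article describes as a positive correlation with low traffic.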

Both studies identified the risks posed by thin, low-quality content pages.
V. Varenia found correlations between a high number of affiliate links and biased content, between spam probability and biased content, and between the use of AI-generated content and the likelihood of spam. Overall, the study suggests that heavy use of affiliate links and AI in content creation is associated with biased content and a higher spam probability, underscoring the importance of creating high-quality, unbiased content that provides value to users.
Morris did not evaluate AI-generated content.
V. Varenia also emphasised the negative impact of excessive advertising and poor user experience (UX).
Discussion
The limitations of these studies stem from their small sample sizes and their reliance on correlation analysis, which reveals associations between parameters but not causal relationships. Further research on larger datasets using machine learning methods could make it possible to build more accurate predictive models.
Some of the reported correlations are weak. Nevertheless, weakly correlated indicators can have a cumulative or synergistic effect on SEO that is not apparent when each factor is considered separately. In this context, even a weak correlation is more informative than no correlation at all.
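One way to quantify the small-sample concern is the standard t-test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below assumes both coefficients reported earlier are Pearson coefficients computed over all 42 sites in Varenia's sample, which the article does not confirm; the function name `correlation_t_statistic` is hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    # Assumption: both coefficients were computed on all 42 sites.
    for r in (-0.61, 0.24):
        print(f"r = {r:+.2f}, t = {correlation_t_statistic(r, 42):+.2f}")
```

With n = 42, r = 0.24 gives a t of about 1.56, below the two-tailed 5% critical value of roughly 2.02 for 40 degrees of freedom, whereas r = -0.61 gives a t of roughly -4.87. Under these assumptions, the weaker coefficients in particular would need a larger sample to be distinguished from chance.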
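A cumulative effect of weak signals can be illustrated with an equal-weight additive score: each indicator is standardised to z-scores, oriented so that a higher value always means more risk, and averaged. This is a minimal sketch, not a model from either study; the function names, the indicator names (including `ad_density`), the equal weights, the risk directions, and the values are all hypothetical.

```python
import statistics


def zscores(values):
    """Standardise values to mean 0 and population standard deviation 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant indicator carries no signal
    return [(v - mu) / sd for v in values]


def composite_risk(indicators, directions):
    """Equal-weight additive risk score per site.

    indicators: dict name -> list of per-site values (same length for all)
    directions: dict name -> +1 if higher values mean more risk, -1 otherwise
    """
    names = list(indicators)
    n_sites = len(indicators[names[0]])
    if any(len(indicators[name]) != n_sites for name in names):
        raise ValueError("all indicators need one value per site")
    standardised = {name: zscores(indicators[name]) for name in names}
    return [
        sum(directions[name] * standardised[name][i] for name in names) / len(names)
        for i in range(n_sites)
    ]


# Hypothetical per-site values for four sites, NOT data from either study.
indicators = {
    "trust_flow": [3, 8, 15, 25],
    "affiliate_links": [40, 25, 10, 2],
    "ad_density": [0.4, 0.3, 0.2, 0.1],
}
# Assumed directions: low Trust Flow, many affiliate links, heavy ads = more risk.
directions = {"trust_flow": -1, "affiliate_links": 1, "ad_density": 1}

if __name__ == "__main__":
    for site, score in enumerate(composite_risk(indicators, directions)):
        print(f"site {site}: {score:+.2f}")
```

Equal weights keep the example transparent; in practice the weights would have to be estimated, for instance with the machine learning methods the discussion calls for.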
List of references
Varenia, V. (2023). Analysis of the factors of “useful content” in the Google algorithm.
Morris, J. (2023). A study of the factors that change the status of “Live” to “Inactive” in the Google Publisher Center.