On The Impact Of Publicly Available News And Information Transfer To Financial Markets
We quantify the propagation and absorption of information from large-scale, publicly available news articles on the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a non-profit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies of the S&P 500 index, an equity market index that measures the stock performance of US companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the US stock market. Furthermore, we analyse and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings support the hypothesis that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.
Studies of the impact of speculation and information arrival on the price dynamics of financial securities have a long history, going back to the early work of Bachelier in 1900 and Mandelbrot in 1963 (see Jarrow & Protter for a historical account of these and related developments). In 1970, Fama formulated the efficient market hypothesis in financial economics, stating that security prices reflect all publicly available information. Shortly after, in 1973, Clark proposed the mixture of distributions hypothesis, which asserts that the dynamics of price returns are governed by the information flow available to traders. Subsequently, novel models were introduced, such as the sequential information arrival model and news jump dynamics in the 1980s, and truncated Lévy processes from econophysics in the 1990s, to name a few examples. With the rise of the World Wide Web and social media came an ever-increasing abundance of available data, allowing for more detailed studies of the impact of news on financial markets at different time-scales [9–17]. Big data, coupled with advancements in machine learning (ML) and complex systems research [19–22], enabled more efficient analysis of financial data, such as web data [24–39], social media [40–44], web search queries [45–47], online blogs [48,49] and other alternative data sources.
In this article, we use news articles from Common Crawl News, a subset of the Common Crawl’s petabytes of publicly available World Wide Web archives, to measure the impact of the arrival of new information about the constituent stocks in the S&P 500 index at the time of publishing. To the best of our knowledge, our study is the first to use the Common Crawl News data in this way. We develop a cloud-based processing pipeline that identifies news articles in the dataset that are related to the companies in the S&P 500. As the Common Crawl public data archives grow, they open doors for many real-world ‘data-hungry’ applications, such as transformer models like GPT and BERT, a recent class of deep learning language models. We believe that public sources of news data are important not only in natural language processing (NLP) and financial markets, but also in complex systems and computational social sciences, which aim to characterize (mis)information propagation and dynamics in techno-socio-economic systems. The abundance of high-frequency data in financial markets gives complex systems researchers microscopic observables, allowing for the testing and verification of hypotheses and theories previously not possible.
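The first stage of such a pipeline is matching article text to index constituents. The following is a minimal sketch of this matching step under simplifying assumptions: the ticker-to-alias table and the example article are illustrative stand-ins, not the actual S&P 500 constituent list or a real Common Crawl News record, and the paper's pipeline may use a more sophisticated entity-recognition approach.

```python
# Illustrative sketch: tag a news article with the tickers of the companies
# it mentions. The alias table below is a tiny hypothetical sample, not the
# real S&P 500 constituent list.
import re

SP500_ALIASES = {
    "AAPL": ["Apple Inc", "Apple"],
    "MSFT": ["Microsoft"],
    "XOM": ["Exxon Mobil", "ExxonMobil"],
}

def tag_article(text, aliases=SP500_ALIASES):
    """Return the set of tickers whose aliases appear as whole words in text."""
    tickers = set()
    for ticker, names in aliases.items():
        for name in names:
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
                tickers.add(ticker)
                break  # one alias match is enough for this ticker
    return tickers

article = "Apple and Microsoft both rallied after the earnings reports."
print(sorted(tag_article(article)))  # -> ['AAPL', 'MSFT']
```

A production pipeline would additionally need to parse WARC records, deduplicate articles, and disambiguate common-word company names ("Apple", "Visa"), which simple keyword matching cannot do reliably.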
Using ML methods from NLP [53,54], we analyse and extract sentiment for each news article in the Common Crawl News repository, assigning a score in the range from zero to one, where zero represents the most negative and one the most positive sentiment. To quantify the information propagation from publicly available news articles on the World Wide Web to companies in the S&P 500 index, we use two different approaches. First, we employ tools from the information theory of complex systems [20,22,55] to measure the information transfer from the news sentiment scores to the returns of the constituent companies in the S&P 500 index at an intraday level. Second, we implement and simulate the daily portfolio returns resulting from a simple trading strategy based on the extracted news sentiment scores for each company. We use the returns from this strategy as an econometric instrument and compare it with several benchmark strategies that do not incorporate news sentiment. Our findings support the hypothesis that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.
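The standard tool for the first approach is transfer entropy, T(X→Y) = Σ p(y_{t+1}, y_t, x_t) log[p(y_{t+1}|y_t, x_t)/p(y_{t+1}|y_t)], which measures how much knowing the source series reduces uncertainty about the target's next step beyond the target's own history. The sketch below is a plug-in estimator for discretized series with history length one; the paper's actual estimator, history lengths, and discretization of sentiment and return series may well differ.

```python
# Minimal sketch of a plug-in transfer entropy estimator T(X -> Y), in bits,
# for discrete series with history length 1. Illustrative only; not the
# paper's estimator.
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Estimate T(X -> Y) from two equal-length discrete sequences."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_{t+1}, y_t, x_t)
    n = len(triples)
    c_xyz = Counter(triples)                               # (y_{t+1}, y_t, x_t)
    c_yz = Counter((yn, yp) for yn, yp, _ in triples)      # (y_{t+1}, y_t)
    c_zx = Counter((yp, xp) for _, yp, xp in triples)      # (y_t, x_t)
    c_z = Counter(yp for _, yp, _ in triples)              # (y_t,)
    te = 0.0
    for (yn, yp, xp), cnt in c_xyz.items():
        p = cnt / n
        # ratio = p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t)
        ratio = (cnt / c_zx[(yp, xp)]) / (c_yz[(yn, yp)] / c_z[yp])
        te += p * log2(ratio)
    return te

# y copies x with a one-step lag, so information flows from x to y
x = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1] * 20
y = [0] + x[:-1]
print(transfer_entropy(x, y))  # positive: x's past is informative about y's next step
```

For this noiseless lagged copy the estimate reduces to the empirical conditional entropy H(X_t | X_{t-1}); on real sentiment and return data one would discretize, use longer histories, and assess significance against surrogate (shuffled) series.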
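For the second approach, one common form of sentiment-based strategy is an equal-weight long/short portfolio built from daily per-stock sentiment scores. The sketch below illustrates that general idea; the thresholds, tickers, and numbers are hypothetical, and the paper's actual strategy rules may differ.

```python
# Illustrative sketch of an equal-weight long/short portfolio driven by
# sentiment scores in [0, 1]. Thresholds and data are hypothetical.
def daily_portfolio_return(sentiment, returns, long_th=0.6, short_th=0.4):
    """Go long stocks with sentiment >= long_th, short those <= short_th."""
    longs = [t for t, s in sentiment.items() if s >= long_th]
    shorts = [t for t, s in sentiment.items() if s <= short_th]
    long_ret = sum(returns[t] for t in longs) / len(longs) if longs else 0.0
    short_ret = sum(returns[t] for t in shorts) / len(shorts) if shorts else 0.0
    return long_ret - short_ret

sentiment = {"AAPL": 0.82, "MSFT": 0.55, "XOM": 0.21}   # hypothetical scores
returns = {"AAPL": 0.012, "MSFT": 0.001, "XOM": -0.008}  # hypothetical next-day returns
print(daily_portfolio_return(sentiment, returns))  # long AAPL, short XOM
```

Simulating this rule day by day yields a return series that can then be compared against benchmarks without sentiment input (e.g. buy-and-hold or random long/short selection) to gauge the economic significance of the news signal.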