Security exchange commission forms K-10 filings – Positive and negative word occurrence dataset 1995–2008

Piotr Staszkiewicz; Richard Staszkiewicz

Data in Brief (Jun 2022)

Affiliations

Piotr Staszkiewicz: Collegium of Business Administration, Institute of Corporate Finance and Investment, SGH Warsaw School of Economics, Poland; Corresponding author.
Richard Staszkiewicz: The Faculty of Electronics and Information Technology, Warsaw University of Technology, Poland

Journal volume & issue: Vol. 42
p. 108110

Abstract

Corporate disclosure has become more descriptive than quantitative over time. As a result, textual analysis has gained popularity in finance and business; however, it requires massive computing power. The paper presents a panel dataset of the raw frequencies of positive and negative words across 90,463 Forms 10-K filed with the Securities and Exchange Commission (SEC) in EDGAR (the Electronic Data Gathering, Analysis, and Retrieval system) over the period 1995–2008. The dataset consists of 456 variables. The texts of the forms were retrieved from the SEC servers and processed using text mining techniques. The data are relevant for archival analysis of the sentiment of financial statements and financial reporting by SEC registrants. They can potentially be reused to create tone or sentiment indexes. The long time series allows for dynamic analysis. The dataset reduces the computing power required for further research.
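
To illustrate what a raw frequency of positive and negative words means for a single filing, the following minimal Python sketch counts occurrences of listed words in a text. It is not the authors' pipeline: the word lists `POSITIVE` and `NEGATIVE`, the function `count_sentiment_words`, the tokenization rule, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini word lists (assumption); the dictionary behind the
# published dataset is not reproduced here.
POSITIVE = {"gain", "improve", "strong"}
NEGATIVE = {"loss", "decline", "adverse"}


def count_sentiment_words(text, words):
    """Return the raw count of each listed word in the text (zero if absent)."""
    # Assumed tokenization: lowercase the text and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token in words)
    # One entry per listed word, so every filing yields the same set of variables.
    return {word: counts[word] for word in sorted(words)}


# Made-up sample sentence standing in for the text of a filing.
sample = "Strong revenue gain offset the loss; a further loss is not expected."
pos = count_sentiment_words(sample, POSITIVE)
neg = count_sentiment_words(sample, NEGATIVE)
print(pos)
print(neg)
```

Applying such a function to every filing and stacking the results by registrant and year would give a panel of word counts comparable in shape to the one described above, under the stated assumptions.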

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
