Journal of Big Data (Nov 2022)

Transforming the generative pretrained transformer into augmented business text writer

  • Faisal Khalil,
  • Gordon Pipa

DOI
https://doi.org/10.1186/s40537-022-00663-7
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 21

Abstract

This study uses the transformer architecture of artificial neural networks to generate artificial business text for a given topic or theme. Its aim is to augment business report writing, and the general business writing process, with the help of generative pretrained transformer (GPT) networks. The main focus of the study is to provide a practical use case for GPT models with the help of big data. Our model has 355 million parameters and was trained for three months on GPU-enabled devices using 2.3 billion text tokens (now available as open-source data). The text tokens were collected through rigorous preprocessing, which included shortlisting the subreddits of Fortune 500 companies and industries listed on the US-based social news aggregation portal Reddit. After shortlisting, millions of user submissions made over five years were parsed to collect the URLs they contained, and 1.8 million working URLs were scrutinized. Business text was parsed from these uniform resource locators (URLs), cleaned, and converted into word embeddings. The results show that both models, conditional interactive and random sampling, generate text paragraphs that are grammatically accurate and stick to the given topic.
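
The URL-collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expression, the function name extract_urls, and the sample submissions are all assumptions made for illustration, and the later steps (checking that a URL still works, downloading and cleaning the page text) are not shown.

```python
import re

# Assumed pattern (not from the paper): an http(s) URL runs until
# whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>)\]]+")


def extract_urls(submissions):
    """Return the unique URLs found in the submission texts, in first-seen order."""
    seen = set()
    urls = []
    for text in submissions:
        for match in URL_PATTERN.findall(text):
            # Drop sentence punctuation that trails the URL.
            url = match.rstrip(".,;:!?")
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical submissions standing in for parsed Reddit posts.
sample = [
    "Quarterly results are out: https://example.com/q3-report.",
    "See https://example.com/q3-report and http://example.org/news?id=7",
    "No links in this submission.",
]

if __name__ == "__main__":
    for url in extract_urls(sample):
        print(url)
```

Deduplicating while preserving order keeps the sketch deterministic; at the scale the abstract reports, a real pipeline would more plausibly stream submissions and persist the set of seen URLs rather than hold it in memory.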

Keywords