https://doi.org/10.1177/2329488418819139
International Journal of Business Communication 2022, Vol. 59(1) 126 –147
© The Author(s) 2018 Article reuse guidelines:
sagepub.com/journals-permissions DOI: 10.1177/2329488418819139
journals.sagepub.com/home/job
Article
Artificial Intelligence in Business Communication: A Snapshot
Jefrey Naidoo1 and Ronald E. Dulek1
Abstract
Despite artificial intelligence’s far-reaching influence in the financial reporting and other business domains, there is a surprising dearth of accessible descriptions about the assumptions underlying the software’s development along with an absence of empirical evidence assessing the viability and usefulness of this communication tool. With these observations in mind, the purposes of this study are to explain how automated text summarization applications work from an overarching, semitechnical, modestly theoretical perspective and, using ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation–1) evaluation metrics, assess how effective the summarization software is when summarizing complex business reports. The results of this study show that the extraction-based summarization system produced moderately satisfactory results in terms of extracting relevant instances of the text from the business reports. Much work still needs to be accomplished in the area of precision and recall in extraction-based systems before the software can match a human’s ability to capture the gist of a body of text.
Keywords
ROUGE-1, automatic text summarization, artificial intelligence, company annual reports
The rapid advances made in machine learning over the past few decades have paved the way for a prolific rise in a new generation of sophisticated artificial intelligence (AI) systems that can perform tasks autonomously. AI is arguably the most important technology innovation of our era (Brynjolfsson, Rock, & Syverson, 2017); its transformative impact has been felt in almost every societal domain. Intelligence communities are leveraging AI across their portfolios to strengthen national security, reduce biological
1University of Alabama, Tuscaloosa, AL, USA
Corresponding Author: Jefrey Naidoo, University of Alabama, Stadium Drive, Tuscaloosa, AL 35487-0001, USA. Email: [email protected]
warfare, and mitigate cyber threats (Allen & Chan, 2017); legal firms are employing AI to enhance legal informatics, predict litigation, and measure workflows in real time (Sobowale, 2016); health care entities are utilizing AI to perform clinical diagnostics on medical images at levels equal to those of experienced clinicians (HealthIT, 2017); the airline industry is engaging AI to reduce “human-steered” flight time to only 7 minutes of the total flight time (Narula, 2018); and, finally, social media platforms are deploying AI to generate a more personalized and interactive user experience.
AI’s pervasive impact has extended into the business environment as well. By providing tools that automate redundant tasks, identify patterns within data, and uncover valuable insights, AI has helped corporations automate routine processes and improve overall process performance. These improvements have taken the form of enhanced compliance, security and risk management; increased gains in productivity and market share; and improved employee retention (Jha, 2018). A recent global survey of 1,600 business decision makers found that 76% of the respondents believed that AI is fundamental to future business success, while 64% believed that their organization’s future growth is dependent on AI adoption. The survey also found that companies expect AI to contribute an average revenue increase of 39% by 2020 (Infosys, 2018).
Its value proposition seemingly endless, AI has entered the domain of business communication in a number of ways, with perhaps the most pronounced being automatic text summarization of corporate disclosures (Cardinaels, Hollander, & White, 2017). Large financial institutions, such as Citicorp and Bank of America; regulators, such as the Securities and Exchange Commission; and investors are key beneficiaries of this type of summarization (Barth, 2015). The first two entities see similar efficiency benefits from summarization software because disclosures have, over the years, become fairly protracted and include a substantial amount of redundancy (Dyer, Lang, & Stice-Lawrence, 2017). The third group, investors, including hedge fund investors, employs AI engines to analyze macroeconomic data, assess market fundamentals, and analyze corporate financial disclosures, each with the intention of making more accurate market predictions and executing more successful stock trades (Metz, 2016).
Yet despite AI’s far-reaching influence in the financial reporting and other business domains, there is a surprising dearth of accessible descriptions about the assumptions underlying the software’s development along with an absence of empirical evidence assessing the viability and usefulness of this communication tool. The lack of the former means that we need a kind of pretheory about summarization software; the lack of the latter means that we have yet to determine how effective automatic text summarization software is as a business communication tool.
With the above observations in mind, the purposes of this study are threefold:
1. To explain how automated text summarization applications work from an overarching, semitechnical, modestly theoretical perspective
2. To study how effective the summarization software is when summarizing complex business reports
3. To explore variances between outputs produced by human authors and artificial intelligence for the selected data genre
To measure the effectiveness of summarization software, we first created manual (human-authored) summaries of the Letter to Shareholders in 10 Fortune 500 company annual reports that were published in 2018. Next, we used an automated extraction-based text summarization application, Resoomer, to produce machine-generated summaries of the same documents. We then used ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation–1), a highly regarded and widely employed set of metrics for evaluating automatic summarization, to conduct our assessment of efficacy and determine variances between the outputs produced within the respective summary categories.
This study makes several contributions to the body of literature in business communication and to the business field at large. First, as the software continues to become more and more effective, the manner in which business summaries are written is going to change dramatically. It therefore seems wise for the field’s researchers and practitioners to familiarize themselves with how this family of application software works as well as to determine where we are in terms of the software’s efficacy.
Second, this is the first evaluative study of automatic text summarization conducted on this specific instrument of strategic business communication. We considered a broad range of data corpora to serve as potential datasets for this evaluation. We concluded that Letters to Shareholders worked well for this study because they provide an important business communication bridge between the voluntary and mandatory information disclosures of public companies (Williams, 2008). Additionally, the subject matter of these letters reaches across a variety of business enterprises and disciplines. Hence, we decided that these letters provide an objective way to evaluate the effectiveness of summarization software as a business communication tool, with the letters functioning as independent variables on which to test the summarization software’s effectiveness. We are not evaluating the design, effectiveness, or even the strategic approach of the Letters themselves.
Finally, the study calls attention to evolutionary developments and practices in the business communication space. By condensing large business documents into short, informative summaries, automatic text summarization is expediting information communication in business environments, thereby affecting what the organization knows. Additionally, it likely affects organizational decision making as well as other downstream processes such as information searches and report generation (Paulus, Xiong, & Socher, 2017).
In the following sections of this article, we provide an extensive exposition of how the software works from an overarching, semitechnical perspective. We then describe the selection, extraction, and processing of the dataset and conclude with an analysis and discussion of our results and findings.
Overview of Automated Text Summarization and Evaluation
While appearing simple to do on the surface, the act of summarizing text is actually a highly complex task that involves summarization of source codes based on software reflection and lexical source model extraction (Murphy & Notkin, 1996). Proof of its complexity is found in the fact that developers have been working for decades to make this software viable and to advance its efficacy.
Automated text summarization systems endeavor to produce a concise summary of the source or reference text while retaining its fundamental essence and overall meaning. The system’s goal is to generate a summary of the source or reference text that is equivalent to a summary generated by a human (Brownlee, 2017). A three-phase process generally characterizes these systems:
1. An analysis of the source text
2. The determination of its salient points
3. A synthesis of an appropriate output (Alonso, Castellon, Fuentes, Climent, & Horacio Rodriquez, 2003)
Previous studies (e.g., Smith, Patmos, & Pitts, 2018) have found that much work still needs to be accomplished in the area of precision and recall in extraction-based systems before the software can match a human’s ability to capture the gist of a body of text.
Seminal work in automatic text summarization began in the 1950s, with the first sentence extraction algorithm being developed in 1958 (Steinberger & Jezek, 2009). The algorithm used term frequencies to measure the relevance of the sentence. Understandably, the methods developed during that era were fairly rudimentary (Hovy & Lin, 1998). Since then, a large number of techniques and approaches have been developed. Interestingly, the large volumes of information created on the web have triggered much of this development (Nenkova & McKeown, 2011; Shams, Hashem, Hossain, Akter, & Gope, 2010). Bhargava, Sharma, and Sharma (2016) posit that text summarization tools have now become a necessity to navigate the information on the web because they help eliminate dispensable or superfluous content. Torres-Moreno (2014) asserts that automatic text summarization reduces reading time, expedites research by making the selection process of documents easier, employs algorithms that are less biased than human summarizers, improves the effectiveness of indexing, and enables commercial abstraction services to increase the number of texts they are able to process. All in all, high praise for the software.
Extraction-Based Text Summarization
Automatic text summarization systems utilize different summarization techniques to condense source text. The vast majority of today’s summarization algorithms employ what is referred to as an extraction-based approach (Saggion & Poibeau, 2012). The flexibility and greater general applicability of the extraction-based approach make it the preferred approach for most business summaries (Liu & Liu, 2009).
Extraction-based techniques involve the analysis of text features at the sentence level, discourse level, or corpus level to locate salient text units that are extracted, with minimal or no modification, to formulate a summary of the text (Liu & Liu, 2009). Stated more simply, in extraction-based text summarization, relevant phrases and sentences are selected from the source document and rearranged into a new summary sequence (Paulus et al., 2017). The summary, then, is essentially a subset of the sentences in the original source or reference text (Allahyari et al., 2017).
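To make the idea of a summary as "a subset of the sentences" concrete, the following is a minimal sketch of an extractive summarizer, not a description of how Resoomer or any commercial tool works. It assumes a naive sentence splitter, scores each sentence by the average document-wide frequency of its words, and greedily keeps the highest-scoring sentences that fit a word budget. The function names (`split_sentences`, `tokenize`, `extract_summary`) and the scoring rule are illustrative assumptions.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens; punctuation and hyphens act as separators.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def extract_summary(text, max_words):
    sentences = split_sentences(text)
    # Document-wide term frequencies.
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        # Average term frequency, so long sentences are not favored outright.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # Rank sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)

    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= max_words:
            chosen.append(i)
            used += n

    # Emit the selected sentences verbatim, in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


text = ("Value matters. Value drives growth and value drives returns. "
        "Weather was mild.")
summary = extract_summary(text, max_words=9)
print(summary)
```

Because the sentences are copied without modification, every sentence in the output can be found verbatim in the input, which is the defining property of the extraction-based approach described above.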
Salient text units are identified by evaluating their linguistic and statistical relevance or by matching phrasal patterns (Hahn & Mani, 2000). Statistical relevance is based on the frequency of certain elements in the text, such as words or terms, while linguistic relevance is determined from a simplified argumentative structure of the text (Neto, Freitas, & Kaestner, 2002). These parameters serve as inputs to a combination function with modifiable weights to derive a total score for each text unit. Text units with a concentration of high-score words are often likely contenders for extraction (Liu & Liu, 2009). Extraction-based summarization, then, is essentially concerned with evaluating the salience or the indicative power of each sentence in a given document (Shams et al., 2010). Figure 1 maps out the process flow for extraction-based systems.
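The "combination function with modifiable weights" can be illustrated with a short sketch. This is an assumption-laden example rather than any particular system's formula: it uses a simple weighted sum, the feature names loosely echo labels in Figure 1 (term frequency counts, presence of specific terms, sentence location), and both the weights and the feature values are hypothetical numbers chosen only for illustration.

```python
def combined_score(features, weights):
    # Weighted linear combination of the feature values for one text unit.
    return sum(weights[name] * value for name, value in features.items())


def rank_units(units, weights):
    # Return text-unit labels ordered from highest to lowest total score.
    return sorted(units,
                  key=lambda label: combined_score(units[label], weights),
                  reverse=True)


# Hypothetical, modifiable weights (assumption; not taken from the article).
WEIGHTS = {"term_frequency": 0.5, "cue_terms": 0.3, "position": 0.2}

# Hypothetical feature values in [0, 1] for three sentences.
UNITS = {
    "S1": {"term_frequency": 0.8, "cue_terms": 1.0, "position": 1.0},
    "S2": {"term_frequency": 0.9, "cue_terms": 0.0, "position": 0.5},
    "S3": {"term_frequency": 0.2, "cue_terms": 0.0, "position": 0.0},
}

ranking = rank_units(UNITS, WEIGHTS)
print(ranking)
```

Changing the weights changes which sentences rise to the top: raising the hypothetical `term_frequency` weight relative to `cue_terms` and `position`, for example, would move S2 ahead of S1 in this toy data.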
Evaluation of Text Summarization Using ROUGE-1 Metrics
Intrinsic evaluations of text summarization outputs conventionally involved manual human assessments of the quality and utility of a given summary. Rubrics based on coherence, conciseness, grammaticality, readability, and content provided the guidance for these human assessments (Mani, 2001). Given the potential for bias, and the time-consuming nature of the process, this practice gradually evolved into automatic comparisons of the summaries with human-authored gold standards, thus minimizing the need for human involvement (Nenkova, 2006).
Today, various summarization evaluation systems and methods employing sophisticated algorithms may be used to compare human-authored summaries with machine-generated summaries. One such method, ROUGE, the most widely used metric for automatic evaluation (Allahyari et al., 2017), was found to produce evaluation rankings that correlate reasonably with human rankings (Lin, 2004). It leverages numerous measures to automatically determine the quality of a computer-generated summary. The measures include, but are not limited to, a count of variables such as word sequences and
[Figure 1 diagram labels: Source Text; Analysis; Statistical Metrics; Lexical Metrics; Term Frequency Counts; Pattern Matching Ops.; Presence of Specific Terms; Sentence Location; Weight Selection; Synthesis; Extraction]
Figure 1. Process flow for extraction-based systems.
word pairs between the computer-generated summary and the reference summary created by humans (Lytras, Aljohani, Damiani, & Chui, 2018).
In this study, we used ROUGE-1 evaluation metrics that measure the overlap of words (unigrams) between the machine-generated and reference summaries and provide three measures of quality.
Recall. Also known as sensitivity, this is the measure of the fraction of relevant instances that have been retrieved over the total amount of relevant instances. Stated more simply, it is the computation of the number of overlapping words between the machine-generated summary and the reference summary (i.e., number of overlapping words/total number of words in reference summary).
Precision. Also called positive predictive value, this is the measure of the fraction of relevant instances among the retrieved instances. In other words, it is the computation of how much of the machine-generated summary is actually relevant or essential (i.e., number of overlapping words/total words in machine-generated summary).
F1-Score. This score is a weighted average (specifically, the harmonic mean) of the precision and recall. A score of 1 indicates perfect precision and recall, while a score of 0 indicates no overlap at all. This measure is regularly employed in the field of information retrieval to provide a quantifiable assessment of performance (Beitzel, 2006).
Each of the above standards is arguably more precise than subjective human calculations of coherence and conciseness.
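The three ROUGE-1 measures can be computed directly from unigram counts. The sketch below follows the definitions given above (overlapping words divided by the words in the reference summary for recall, and by the words in the machine-generated summary for precision), with the F1-score taken as the harmonic mean of the two. It is a simplified illustration, not the evaluation tool used in this study: the tokenizer is an assumption, overlap counts are clipped so that a repeated word is credited at most as often as it occurs in both texts, and no stemming or stop-word removal is applied. The two example sentences are taken from the ExxonMobil summaries reported in the Results section.

```python
import re
from collections import Counter


def unigrams(text):
    # Lowercase word tokens; hyphens and punctuation act as separators.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def rouge_1(candidate, reference):
    cand, ref = unigrams(candidate), unigrams(reference)
    # Clipped overlap: Counter intersection keeps the minimum count per word.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


reference = "ExxonMobil is an industry leader."
candidate = "ExxonMobil is investing for high-value growth."
recall, precision, f1 = rouge_1(candidate, reference)
print(f"recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```

In this toy pair, two words ("ExxonMobil" and "is") overlap; the reference has five words and the candidate has seven once "high-value" is split into two tokens, so recall is 2/5 and precision is 2/7.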
Method
Data Corpus
Our data corpus was composed of Letters to the Shareholders of a subset of corporations listed on the Fortune 100 list for 2017. Letters to the Shareholders are voluntary inclusions in the annual report, usually appearing as an introduction. Considered an important piece of information (Vozzo, 2016), these documents provide useful insight into the quality of leadership at the corporation and management’s commitment to creating meaningful long-term value for shareholders (Heyman, 2010).
Wielding much influence in investment transactions, Letters to Shareholders are integral to an investor’s due diligence process. They are read with much interest by professional investors, analysts, and other stakeholders (Heyman, 2010). Additionally, these letters often supplement the overall effort to frame the annual report’s information through narrative and graphical strategies (Laskin, 2018; Penrose, 2008). The ability to effectively and accurately summarize the most salient content from these letters may, therefore, offer significant value to its readership.
To ensure that we obtained a meaningful understanding of the effectiveness of automated text summarization applications, we elected to focus our investigation on a small, purposive sample of 10 Fortune 100 corporations. To this end, we selected the top 10 corporations listed in the Fortune 100 list for 2017. Two of the top 10 corporations,
Apple and United Health, however, did not include a Letter to the Shareholders in their respective annual reports. We sought to replace them with letters from the corporations listed in 11th and 12th place, respectively. However, the annual report for the corporation listed in 11th place, AmerisourceBergen, was unavailable at the time of the study. Ultimately, Letters to the Shareholders from Amazon, listed in 12th place, and General Electric, listed in 13th place, were included in the dataset in lieu of Letters to the Shareholders from Apple and United Health.
Corpus Extraction and Preparation
We located the Letters to the Shareholders on the respective corporate websites and reformatted the PDF files into text files. We conducted a manual inspection of each Letter to the Shareholders and removed all redundant graphics and images. In addition to harmonizing the data corpus, this exercise ensured that the datatype was exclusively text based.
Procedure
There are generally two ways to assess the quality of automatic text summarization output. The first method, referred to as extrinsic evaluation, assesses the usefulness of the output summary in a task-based setting. Here, the summary is used to support the completion of a specific task. Its usefulness is determined by measuring established metrics for task completion efficiency (Hirschberg, McKeown, Passonneau, Elson, & Nenkova, 2005). The second method, referred to as intrinsic evaluation, is conducted “by soliciting human judgments on the goodness and utility of a given summary, or by a comparison of the summary with a human-authored gold standard” (Nenkova & McKeown, 2011, p. 199).
For this study, we employed the intrinsic evaluation method. We first compared the machine-generated text summary with a human-authored summary as prescribed in the literature to assess the “goodness” of the machine-generated summary. To this end, human-generated summaries and machine-generated summaries were produced for each Letter to the Shareholders at predetermined levels of reduction. Each machine-generated summary was then assessed by the summarization evaluation system, ROUGE-1, using the human-authored summary as the source or reference text. This method is consistent with standard practice in automated text summarization evaluation as noted by Nenkova and McKeown (2011).
Subsequently, in an effort to explore potential variances between the outputs produced by human authors and artificial intelligence for the selected data genre, we conducted a second-phase evaluation in which we employed ROUGE-1 metrics to evaluate the “goodness” of
• the human-authored summary for each company using the respective Letter to the Shareholders for that company as the reference summary and
• the machine-generated summary for each company using the respective Letter to the Shareholders for that company as the reference summary
In doing so, we aimed to assess the extent of the variance, if any, in the recall, precision, and F-measures between the two summary classes (i.e., human-authored and machine-generated). We hypothesized that comparable scores between the two summary classes in each of those corresponding measures would likely indicate a similarity in the quality and utility of the summaries, while widely disparate scores would suggest the alternative. Ultimately, either outcome would provide a broader commentary on the effectiveness of the summarization software when summarizing complex business reports.
In summary, then, we employed ROUGE-1 metrics to evaluate
• the machine-generated summary against the human-authored summary,
• the human-generated summary against the original Letter to the Shareholders, and
• the machine-generated summary against the original Letter to the Shareholders.
Following is a more detailed description of each of these processes.
Formulation of Datasets. To reiterate, two distinct categories of summaries were produced for each Letter to the Shareholders (i.e., human-authored summaries and machine-generated summaries). To facilitate a more robust investigation, two summaries were produced within each category, each differentiated by the total word count. The first summary was capped at 10% of the word count of the original Letter to the Shareholders; the second summary was capped at 20%. Thus, the dataset for each company comprised the following data:
• A human-authored summary capped at 10% of the word count of the original Letter to the Shareholders
• A human-authored summary capped at 20% of the word count of the original Letter to the Shareholders
• A machine-generated summary capped at 10% of the word count of the original Letter to the Shareholders
• A machine-generated summary capped at 20% of the word count of the original Letter to the Shareholders
These summaries served as the input data for the study. A more detailed description of the process to create the data follows.
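The word-count caps reduce to simple arithmetic, sketched below. The rounding rule (rounding down to a whole word) and the helper names `word_cap` and `within_cap` are assumptions made for illustration; the 510-word figure is the ExxonMobil letter length reported in the Results section, for which the caps work out to 51 and 102 words.

```python
def word_cap(original_word_count, percent):
    # Integer arithmetic avoids floating-point surprises; rounds down.
    return original_word_count * percent // 100


def within_cap(summary, cap):
    # Whitespace-delimited word count (a simplifying assumption).
    return len(summary.split()) <= cap


original_words = 510  # ExxonMobil letter, as reported in the Results section
cap_10 = word_cap(original_words, 10)
cap_20 = word_cap(original_words, 20)
print(cap_10, cap_20)
```

A word processor's count may differ slightly from a whitespace split (for example, around hyphenated terms), so a check such as `within_cap` should be read as approximate.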
Human-authored summaries. A writer trained and experienced in writing business summaries generated summaries at both summarization levels (i.e., 10% and 20%) of all the documents in the data corpus. The word count was validated using MSWord’s word count feature. To mitigate bias, this writer was not involved in processing the Letter to the Shareholders through the automated text summarization application.
Machine-generated summaries. Simultaneously, the text files were processed individually through the online automated text summarization application. Resoomer was selected because of its current popularity as a text summarization tool and its demonstrated superiority over other online text summarization applications in terms
of functionality, ease of use, and accuracy (Hobler, 2017; Nyzam, Gatto, & Bossard, 2017). An advanced feature of this application is the ability to set the summarization to a desired level of word count reduction. Accordingly, the summarization level was set first to 10% and then to 20% of the word count of the original Letter to the Shareholders. The resulting machine-generated summaries were saved as MSWord documents.
Evaluation of Summaries. The human-authored and machine-generated summary for each corporation was then processed in the ROUGE-1 notepad interface, and the evaluation run was executed. The resulting scores were captured in an Excel spreadsheet and evaluated.
Figure 2 provides a visual representation of the process flow.
Data Corpus: Letters to the Shareholders (LTS) in published format
Step 1: Corpus Preparation
   Removal of images and infographics
   Harmonized data corpus: LTS in text format
Step 2: Formulation of Datasets
   Data category 1: Human-authored summaries, produced by researcher; word counts of 10% and 20% of original LTS, respectively
   Data category 2: Machine-generated summaries, produced by text summarization application (Resoomer); word counts of 10% and 20% of original LTS, respectively
Step 3: Evaluation of Datasets using ROUGE-1 metrics
   i) Evaluation of machine-generated summary with human-authored summary as reference
   ii) Evaluation of human-authored summary with original LTS as reference
   iii) Evaluation of machine-generated summary with original LTS as reference
Step 4: Evaluation of variances in human and machine-generated output using boxplots
Figure 2. Method flowchart.
Results
Example of Outputs
Following are examples of the output summaries for one Letter to the Shareholders from the data corpus.
Corporation: ExxonMobil Original Word Count: 510
Human-Authored Summaries. Reduction: 10% of original word count (51 words)
Winning involves capturing value, maintaining a technological edge, and operating safely and responsibly.
ExxonMobil’s financial future looks promising. It invests in growth projects and integrates in ways competitors cannot. Innovation occurs through technical exploration and the development of environmentally friendly products with higher financial returns. ExxonMobil is an industry leader.
Reduction: 20% of original word count (102 words)
Winning involves capturing value, maintaining a technological edge, and operating safely and responsibly. ExxonMobil is an industry leader. Its financial future looks promising. We invest in high-value growth projects and integrate in ways competitors cannot. We are adding new low-cost supplies of LNG. We are ramping up unconventional production. We use proprietary technology to produce higher value products. Innovation occurs through technical exploration and the development of environmentally friendly products with higher financial returns. Our technology investments build a foundation for the future—creating long-term value for society. We lead in the discovery of scalable technologies.
ExxonMobil is an industry leader.
Machine-Generated Summaries. Reduction: 10% of original word count (46 words)
Winning in today’s energy business takes a cost to the whole commodity cycle. In our Downstream, we’re using our proprietary technology to produce higher value products. Innovative products pioneered in our Chemical business are enabling a growing global middle class to enjoy a higher quality of life.
Reduction: 20% of original word count (98 words)
Winning in today’s energy business takes a cost to the whole commodity cycle. A company is able to capture value across the supply chain. In our Downstream, we’re using our proprietary technology to produce higher value products. And in our Chemical business, we are investing in capacity and manufacturing to meet the needs of growing economies around the world. ExxonMobil is investing for high-value growth. Innovative products pioneered in our Chemical business are enabling a growing global middle class to enjoy a higher quality of life. Our innovation is delivering value to our customers, our communities, and you, our shareholders.
ROUGE-1 Scores for Precision, Recall, and F-Measures
In Tables 1 to 6, we report ROUGE-1 scores when specific summaries are evaluated against a reference summary. As mentioned earlier, the reference summary is deemed
to be the ideal or standard document against which the ROUGE-1 algorithm evaluates other summaries for precision and recall.
Evaluation of Machine-Generated Summaries Against Human-Authored Summaries. To maintain consistency with the standard protocol defined in the literature for conducting evaluations of automated text summarization outputs, we designated the human-authored summaries as the reference summaries. We then evaluated the machine-generated summary for each company against the reference summary for that company.
Table 1 shows ROUGE-1 scores (average recall, average precision, and average F1-score) for input documents summarized to 10% of the word count of the original Letter to the Shareholders. For illustrative purposes, the average recall score of 0.21 for Walmart in Table 1 implies that 21% of the words (unigrams) in the human-authored reference summary are also present in the machine-generated summary for this company. The corresponding precision score of 0.20 implies that only 20% of the words in the machine-generated summary overlap with the reference summary and were therefore counted as relevant. The F-measure of 0.21, which combines recall and precision, essentially quantifies the performance efficiency of the automatic text summarization tool.
Table 2 shows ROUGE-1 scores for input documents summarized to 20% of the word count of the original Letter to the Shareholders.
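As a quick arithmetic check on how the "Mean" row of Table 1 relates to the per-company scores, the sketch below averages each column of the published (already rounded) values and rounds the result to two decimal places. The data are copied from Table 1; the dictionary layout and the helper name `column_means` are illustrative assumptions.

```python
from statistics import mean

# (average recall, average precision, average F1-score) per corporation, Table 1.
TABLE_1 = {
    "Walmart": (0.21, 0.20, 0.21),
    "Exxon Mobil": (0.13, 0.11, 0.12),
    "Berkshire Hathaway": (0.25, 0.27, 0.26),
    "McKesson": (0.22, 0.23, 0.23),
    "CVS Health": (0.27, 0.22, 0.24),
    "Amazon.com": (0.21, 0.22, 0.22),
    "AT&T": (0.23, 0.26, 0.25),
    "General Motors": (0.26, 0.27, 0.26),
    "Ford": (0.15, 0.20, 0.17),
    "GE": (0.27, 0.19, 0.22),
}


def column_means(table):
    # Transpose rows into columns, then average and round each column.
    columns = zip(*table.values())
    return tuple(round(mean(col), 2) for col in columns)


means = column_means(TABLE_1)
print(means)
```

Because the inputs are themselves rounded, a check of this kind can confirm consistency only to within rounding error.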
Evaluation of Human-Authored Summaries Against the Original Letters to the Shareholders. In this instance, we designated the original Letters to the Shareholders as the standard/ideal/reference summaries. We evaluated the human-authored summary for each company against the reference summary (Letters to the Shareholders) for that company. Our goal in doing so was to assess the integrity of the human-authored summaries. Table 3 shows ROUGE-1 scores for human-authored summaries compiled at 10% of the word count of the original Letter to the Shareholders. In this case, the average recall score of 0.05 for
Table 1. Evaluation of Machine-Generated Summaries Using Human-Authored Summaries as Reference (10% Summarization Level).a

Corporation          Average recall   Average precision   Average F1-score
Walmart              0.21             0.20                0.21
Exxon Mobil          0.13             0.11                0.12
Berkshire Hathaway   0.25             0.27                0.26
McKesson             0.22             0.23                0.23
CVS Health           0.27             0.22                0.24
Amazon.com           0.21             0.22                0.22
AT&T                 0.23             0.26                0.25
General Motors       0.26             0.27                0.26
Ford                 0.15             0.20                0.17
GE                   0.27             0.19                0.22
Mean                 0.22             0.22                0.22

a Rounded to two decimal places.
Walmart in Table 3 implies that 5% of the words (unigrams) in the original Letter to the Shareholders for this company also appear in the human-authored summary. The corresponding precision score of 0.49 implies that almost 50% of the words in the human-authored summary were counted as overlapping with the original letter. The F-measure of 0.09 combines these two scores; because the human-authored summary, rather than the automatic text summarization tool, is being evaluated here, it quantifies how closely the human-authored summary tracks the original letter.
Table 4 shows ROUGE-1 scores for human-authored summaries condensed to 20% of the word count of the original Letter to the Shareholders.
Comparison of Machine-Generated Summaries With Original Letters to the Shareholders. Here, we once again designated the original Letters to the Shareholders as the
Table 3. Evaluation of Human-Authored Summaries Using Original Letter to the Shareholders as Reference (10% Summarization Level).a

Corporation          Average recall   Average precision   Average F1-score
Walmart              0.05             0.49                0.09
Exxon Mobil          0.03             0.31                0.06
Berkshire Hathaway   0.05             0.48                0.10
McKesson             0.06             0.49                0.10
CVS Health           0.05             0.48                0.09
Amazon.com           0.05             0.47                0.09
AT&T                 0.05             0.48                0.10
General Motors       0.05             0.46                0.09
Ford                 0.07             0.48                0.12
General Electric     0.05             0.48                0.10
Mean                 0.05             0.46                0.09

a Rounded to two decimal places.
Table 2. Evaluation of Machine-Generated Summaries Using Human-Authored Summaries as Reference (20% Summarization Level).a

Corporation          Average recall   Average precision   Average F1-score
Walmart              0.26             0.26                0.26
Exxon Mobil          0.20             0.18                0.19
Berkshire Hathaway   0.26             0.30                0.28
McKesson             0.25             0.27                0.26
CVS Health           0.27             0.29                0.28
Amazon.com           0.29             0.26                0.27
AT&T                 0.27             0.30                0.28
General Motors       0.28             0.29                0.28
Ford                 0.22             0.31                0.25
GE                   0.31             0.21                0.25
Mean                 0.26             0.27                0.26

a Rounded to two decimal places.
138 International Journal of Business Communication 59(1)
reference summaries. We evaluated the machine-generated summary for each com- pany against the reference summary (Letters to the Shareholders) for that company. Our goal in doing so was to assess the integrity of the machine-generated summa- ries. Tables 5 and 6 show ROUGE-1 scores for machine-generated summaries extracted to 10% and 20% of the word count of the original Letter to the Sharehold- ers, respectively.
Table 4. Evaluation of Human-Authored Summaries Using Original Letter to the Shareholders as Reference (20% Summarization Level).ᵃ
Corporation          Average recall   Average precision   Average F1-score
Walmart              0.10             0.49                0.17
Exxon Mobil          0.07             0.38                0.12
Berkshire Hathaway   0.10             0.47                0.17
McKesson             0.11             0.48                0.17
CVS Health           0.11             0.48                0.18
Amazon.com           0.10             0.47                0.16
AT&T                 0.11             0.48                0.17
General Motors       0.10             0.46                0.16
Ford                 0.14             0.49                0.21
General Electric     0.10             0.47                0.17
Mean                 0.10             0.47                0.17
ᵃ Rounded to two decimal places.

Table 5. Evaluation of Machine-Generated Summaries Using Original Letter to the Shareholders as Reference (10% Summarization Level).ᵃ
Corporation          Average recall   Average precision   Average F1-score
Walmart              0.05             0.50                0.10
Exxon Mobil          0.06             0.50                0.10
Berkshire Hathaway   0.05             0.50                0.09
McKesson             0.05             0.50                0.10
CVS Health           0.06             0.50                0.11
Amazon.com           0.05             0.50                0.09
AT&T                 0.05             0.50                0.09
General Motors       0.05             0.50                0.09
Ford                 0.05             0.50                0.10
General Electric     0.05             0.50                0.10
Mean                 0.05             0.50                0.10
ᵃ Rounded to two decimal places.

Table 6. Evaluation of Machine-Generated Summaries Using Original Letter to the Shareholders as Reference (20% Summarization Level).ᵃ
Corporation          Average recall   Average precision   Average F1-score
Walmart              0.10             0.50                0.17
Exxon Mobil          0.11             0.50                0.18
Berkshire Hathaway   0.09             0.50                0.15
McKesson             0.10             0.46                0.16
CVS Health           0.11             0.50                0.19
Amazon.com           0.11             0.50                0.18
AT&T                 0.10             0.50                0.17
General Motors       0.11             0.50                0.18
Ford                 0.10             0.50                0.16
General Electric     0.10             0.50                0.17
Mean                 0.10             0.50                0.17
ᵃ Rounded to two decimal places.

Comparison of Human-Authored and Machine-Generated Summaries. We used comparative boxplots of ROUGE-1 F1-scores (see Tables 3-6) to determine if there were any observable differences between the human-authored summaries and machine-generated summaries. Instead of analyzing precision and recall individually, we focused our analysis on the F1 scores because they combine the two measures into a single value (their harmonic mean). Our results are shown in Figures 3 and 4.
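Because the F1 score is the harmonic mean of recall and precision, it can be recomputed from the two printed columns as a sanity check. The sketch below does so for three rows taken from Tables 2, 3, and 4. Since the printed inputs are already rounded to two decimals, the recomputed value is only expected to agree with the printed F1 to within about 0.01; that tolerance is an assumption of this sketch, not a figure reported in the study.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (0.0 when both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, recall, precision, printed F1) copied from the tables as printed.
reported = [
    ("Walmart, Table 3", 0.05, 0.49, 0.09),
    ("Walmart, Table 4", 0.10, 0.49, 0.17),
    ("Ford, Table 2", 0.22, 0.31, 0.25),
]

# Assumed tolerance: the inputs are rounded, so exact agreement is not expected.
TOLERANCE = 0.01

checks = []
for label, recall, precision, printed_f1 in reported:
    recomputed = f1_score(recall, precision)
    within = abs(recomputed - printed_f1) <= TOLERANCE
    checks.append((label, recomputed, within))
    print(f"{label}: recomputed {recomputed:.3f}, printed {printed_f1:.2f}")
```

The harmonic mean is pulled toward the smaller of its two inputs, which is why a precision near 0.5 paired with a recall near 0.05 still yields an F1 of only about 0.09.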
Field Expert Evaluation of Machine-Generated Summaries. Finally, a reviewer of the article wisely suggested that we seek input from financial experts with regard to the effectiveness of the machine-generated summaries. We solicited open-ended feedback from a convenience sample of eight financial experts, each of whom held positions within nationally or internationally recognized financial firms. Seven of the eight invited participants provided feedback on the reports.
We drew on discourse analysis principles to evaluate the resulting feedback. Specifically, we employed the discourse-based interpretive content analysis method. This method proposes a holistic approach not restricted by coding rules, with the flexibility to take context more fully into account (Ahuvia, 2001). Although the responses were not uniform, themes emerging from the analysis were fairly homogeneous.
As a whole, the group commented that they would use the summaries to make a rapid determination as to whether to spend additional time and energy reviewing the Letter to Shareholders and the Annual Report. The reviewers noted that the summaries provided hints of insights into initiatives the companies are pursuing, challenges faced by the company, and overall perspectives with regard to the organization’s culture and values. As such, the summaries served a useful sorting function as to which, if any, reports the financial experts might examine in more depth. Each reviewer was adamant, however, that these documents provided financial information that is at best dated.
From a structural perspective, the financial experts evaluated the summaries for overall coherence. They viewed 80% of the sample set as cogent and coherent; the other 20% was viewed as disjointed and difficult to interpret.
Discussion
The intention of this study was first to examine summarization software from an overarching, semitechnical, almost pretheoretical perspective. After that, we sought to evaluate the effectiveness of summarization software and look for important variances in the data. These latter two areas enabled us to begin to answer two key research questions:
Research Question 1: How effective is the summarization software when summarizing complex business reports?
Research Question 2: Are there any important variances between outputs produced by human authors and artificial intelligence for the selected data genre?
Summarization Software Effectiveness
Because ROUGE-1 is a recall-based measure based on content overlap, it endeavors to determine if the general concepts covered in an automatic summary and a reference summary align (Allahyari et al., 2017). For summaries comprising 10% of total word count, ROUGE-1 metrics determined that approximately 21.9% of unigrams in the human-authored reference summaries were also present in the machine-generated summary (see Table 1). For summaries comprising 20% of total word count, ROUGE-1 metrics determined that approximately 26.1% of unigrams in the human-authored reference summaries were also present in the machine-generated summary (see Table 2). Put differently, the machine-generated summaries have an approximately one-fifth overlap with the human-authored reference summaries comprising 10% of total word count and a one-fourth overlap with the human-authored reference summaries comprising 20% of total word count. Combined with the overall F-measures, these results suggest that the automated text summarization tool is moderately sensitive in terms of extracting relevant instances of the text. Thus, while significant progress has been made in the field of natural language processing and computational linguistics in the past six decades, producing sophisticated advances in text summarization (Das & Martins, 2007; Liu & Liu, 2009), much still needs to be accomplished in the area of precision and recall when summarizing complex business reports.

Figure 3. Boxplot of F1 scores for summaries capped at 10% of word count of Letters to the Shareholders. [Boxplot not reproduced; categories: human-authored summaries and machine-generated summaries; vertical axis from 0 to 0.14.]

Figure 4. Boxplot of F1 scores for summaries capped at 20% of word count of Letters to the Shareholders. [Boxplot not reproduced; categories: human-authored summaries and machine-generated summaries; vertical axis from 0 to 0.25.]
Variances in Human-Authored and Machine-Generated Outputs
The boxplots in Figures 3 and 4 above highlighted perceptible differences between the two summary categories. In Figure 3, the boxplots show that the distribution for the human-authored summaries was slightly left-skewed, in contrast to the distribution for the machine-generated summaries, which was right-skewed. In Figure 4, however, the boxplots show a more symmetric distribution for human-authored summaries and a left-skewed distribution for the machine-generated summaries. Next, while the medians were relatively equal in both instances, machine-generated summaries exhibited tighter spreads than human-authored summaries, indicative of greater variability in the latter. The observations from the boxplots, to some extent, corroborate Steinberger and Jezek’s (2009) contention that a big gap exists between the summaries produced by automatic text summarization systems and summarizations generated by humans.
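The comparison of medians and spreads can be approximated from the printed tables. The sketch below uses the F1 columns of Tables 3 to 6 and reports the median and the range (maximum minus minimum) as one crude measure of spread. It is an assumption of this sketch that the rounded table values stand in for the unrounded scores behind the boxplots, so finer features such as quartiles and skewness may not be recoverable from it.

```python
from statistics import median

# F1 scores as printed (rounded to two decimals) in Tables 3-6, listed in
# table order from Walmart to General Electric.
f1_scores = {
    "human 10%": [0.09, 0.06, 0.10, 0.10, 0.09, 0.09, 0.10, 0.09, 0.12, 0.10],
    "machine 10%": [0.10, 0.10, 0.09, 0.10, 0.11, 0.09, 0.09, 0.09, 0.10, 0.10],
    "human 20%": [0.17, 0.12, 0.17, 0.17, 0.18, 0.16, 0.17, 0.16, 0.21, 0.17],
    "machine 20%": [0.17, 0.18, 0.15, 0.16, 0.19, 0.18, 0.17, 0.18, 0.16, 0.17],
}


def describe(scores):
    """Return the median and the range (max - min) of a list of scores."""
    return {"median": median(scores), "range": max(scores) - min(scores)}


summary = {label: describe(scores) for label, scores in f1_scores.items()}
for label, stats in summary.items():
    print(f"{label}: median={stats['median']:.3f}, range={stats['range']:.2f}")
```

On these rounded values the medians of the two categories are close at each summarization level, while the range of the human-authored scores is wider than that of the machine-generated scores.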
It is interesting to note that both summaries of Berkshire Hathaway’s Letter to the Shareholders (i.e., at the 10% and 20% word count levels) earned the highest F-scores of the 10 corporations when evaluated against their corresponding human-authored summaries (see Tables 1 and 2). The lowest F-scores, on the other hand, were earned by the summaries of ExxonMobil’s Letter to the Shareholders. These results prompted a qualitative inspection of the Letters to the Shareholders of Berkshire Hathaway and ExxonMobil to determine if there were any distinguishing features, apart from the fact that they operated in different industry sectors.
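The highest and lowest scorers can be read off the printed tables programmatically. The sketch below does this for the F1 column of Table 2 and deliberately keeps ties, because at two-decimal precision several corporations share the top value; the ordering based on unrounded scores cannot be recovered from the printed table, so this is only an approximation under that assumption.

```python
# F1-scores as printed in Table 2 (20% summarization level, rounded).
table_2_f1 = {
    "Walmart": 0.26,
    "Exxon Mobil": 0.19,
    "Berkshire Hathaway": 0.28,
    "McKesson": 0.26,
    "CVS Health": 0.28,
    "Amazon.com": 0.27,
    "AT&T": 0.28,
    "General Motors": 0.28,
    "Ford": 0.25,
    "General Electric": 0.25,
}


def extremes(scores):
    """Return (lowest, highest) as lists of names, keeping any ties."""
    low, high = min(scores.values()), max(scores.values())
    lowest = [name for name, value in scores.items() if value == low]
    highest = [name for name, value in scores.items() if value == high]
    return lowest, highest


lowest, highest = extremes(table_2_f1)
print("lowest:", lowest)
print("highest:", highest)
```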
Following are excerpts from the Letters to the Shareholders delivered by the Chairman and CEO of each of these corporations.
Warren Buffett, Berkshire Hathaway
Why the purchasing frenzy? In part, it’s because the CEO job self-selects for “can-do” types. If Wall Street analysts or board members urge that brand of CEO to consider possible acquisitions, it’s a bit like telling your ripening teenager to be sure to have a normal sex life. (2018, p. 4)
The bet illuminated another important investment lesson: Though markets are generally rational, they occasionally do crazy things. Seizing the opportunities then offered does not require great intelligence, a degree in economics or a familiarity with Wall Street jargon such as alpha and beta. What investors then need instead is the ability to both disregard mob fears, or enthusiasm, and to focus on a few simple fundamentals. A willingness to look unimaginative for a sustained period—or even to look foolish—is also essential. (2018, p. 12)
Darren Woods, ExxonMobil
ExxonMobil is in a prime position to generate strong returns and remain the industry leader, leveraging our strengths and outperforming our competition in growing shareholder value.
We’re investing in advantaged projects to grow our world-class portfolio. Through exploration and strategic acquisitions, we’ve captured our highest-quality inventory since the Exxon and Mobil merger, including high-impact projects in Guyana and Brazil. Integration enables us to capture efficiencies, apply technologies, and create value that our competitors can’t. (2018, p. 3)
A qualitative analysis of these excerpts reveals a distinctive stylistic posturing in the narrative of each letter. The Chairman and CEO of ExxonMobil employs a rigid, formal writing style, which follows a conventional mechanical formula that traditionally characterizes official letters from the C-suite. His letter is peppered with appropriate business conventions and familiar industry jargon such as “leveraging our strengths,” “strategic acquisitions,” “high-impact projects,” and “integration enables us to capture efficiencies.”
Warren Buffett, on the other hand, renowned for the folksy, personal manner in which he writes the company’s annual letter, employs a less rigid, less formal style. On the surface, Buffett’s style seems devoid of any artifice. He infuses his letter with unique words and creative phrases that are not traditionally used to communicate information formally in the business domain. Hence, his use of analogies such as “It’s a bit like telling your ripening teenager to be sure to have a normal sex life” and statements such as “They occasionally do crazy things” or “A willingness to look unimaginative for a sustained period—or even to look foolish—is also essential.”
The lower scores for the more traditionally postured Letter to the Shareholders appear to suggest that the summarization tool had greater success with recall and precision when the text strayed away from linguistic patterns that are common and specific to the business world. A plausible conjecture from this is that extraction-based automatic summarization systems function less optimally when domain-specific ontologies are employed.
Finally, evaluations of the machine-generated summaries by a pool of financial experts suggest a favorable outlook for automated text summarization tools. The respondents overwhelmingly agreed that the machine-generated summaries provided a sliver of insight into the company’s operational performance and strategic initiatives. These insights were sufficient to trigger a go/no-go decision in terms of further exploration of the original document.
Conclusion and Future Studies
The results of this study show that the extraction-based summarization system produced moderately satisfactory results in terms of extracting relevant instances of the text from the business reports. Much work still needs to be accomplished in the area of precision and recall in extraction-based systems before the software can match a human’s ability to capture the gist of a body of text.
But beyond practical applications, automatic text summarization highlights a broader discourse. Automatic text summarization raises important issues connected to AI and cognitive science. Therefore, further study into how advanced text summarization capability affects cognitive capacity and intelligence may augment our ability as communication professionals to both disseminate and consume information more efficiently. Additional text corpora covering different data genres should be empirically evaluated to obtain more robust findings.
From a business communication perspective, we would do best to agree that this form of communication technology is not going away. The effectiveness of the text summarization software may only be between 22% and 26%, but it is not going to get lower. Instead, the field should remain alert to future developments of this software and look for ways by which to incorporate it into future studies as well as class teachings.
Finally, and perhaps most important, our findings hint at a forthcoming synergy between what AI does and what business leaders proclaim to desire. At its heart, AI depends on consistency, pattern recognition, and logical development, even when dealing with summarization software. Christensen (2015) and many other business experts, on the other hand, argue vociferously in favor of creativity and new ideas for business models. When presented with the creativity of a Warren Buffett (or, more directly, with a letter written differently from other patterns), the AI summarization software proved to be very effective. In fact, when compared against a human gold standard, AI proved demonstrably better at extracting Berkshire Hathaway’s creative syntax than it did ExxonMobil’s business jargon-laden language. This synergy bodes well for AI’s role in business communication and business in general.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
Ahuvia, A. (2001). Traditional, interpretive, and reception based content analyses: Improving the ability of content analysis to address issues of pragmatic and theoretical concern. Social Indicators Research, 54, 139-172.
Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E. D., Gutierrez, J. B., & Kochut, K. (2017). Text summarization techniques: A brief survey. arXiv preprint arXiv:1707.02268. Retrieved from https://arxiv.org/pdf/1707.02268.pdf
Allen, G., & Chan, T. (2017). Artificial intelligence and national security. Cambridge, MA: Belfer Center for Science and International Affairs, Harvard Kennedy School.
Alonso, L., Castellon, I., Fuentes, M., Climent, S., & Horacio Rodriquez, L. P. (2003). Approaches to text summarization: Questions and answered. Inteligencia Artificial, 20, 34-52.
Barth, M. E. (2015). Financial accounting research, practice, and financial accountability. ABACUS: A Journal of Accounting, Finance and Business Studies, 51, 499-510.
Beitzel, S. M. (2006). On understanding and classifying web queries (Unpublished doctoral dissertation). Illinois Institute of Technology, Chicago.
Bhargava, R., Sharma, Y., & Sharma, G. (2016). Atssi: Abstractive text summarization using sentiment infusion. Procedia Computer Science, 89, 404-411.
Brownlee, J. (2017, November 29). A gentle introduction to text summarization. Deep Learning for Natural Language Processing. Retrieved from https://machinelearningmastery.com/gentle-introduction-text-summarization/
Brynjolfsson, E., Rock, D., & Syverson, C. (2017). Artificial intelligence and the modern productivity paradox: A clash of expectations and statistics (Working paper No. w24001). Cambridge, MA: National Bureau of Economic Research.
Cardinaels, E., Hollander, S., & White, B. (2017, July). Automatic summarization of corporate disclosures. Retrieved from https://www.nhh.no/globalassets/departments/accounting-auditing-and-law/seminar-papers/chw-manuscript-july-14-2017.pdf
Christensen, C. (2015). The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail. Cambridge, MA: Harvard Business Review Press.
Das, D., & Martins, A. F. (2007). A survey on automatic text summarization. Literature Survey for the Language and Statistics II course at CMU, 4, 192-195. Pittsburgh, PA: Language Technologies Institute.
Dyer, T., Lang, M., & Stice-Lawrence, L. (2017). The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and Economics, 64, 221-245.
Hahn, U., & Mani, I. (2000). The challenges of automatic summarization. Computer, 33(11), 29-36.
HealthIT. (2017). Artificial intelligence for health and health care. Retrieved from https://www.healthit.gov/sites/default/files/jsr-17-task-002_aiforhealthandhealthcare12122017.pdf
Heyman, E. (2010). What you can learn from shareholder letters. Chicago, IL: American Association of Individual Investors. Retrieved from http://www.aaii.com/journal/article/what-you-can-learn-from-shareholder-letters.touch
Hirschberg, J. B., McKeown, K., Passonneau, R., Elson, D. K., & Nenkova, A. (2005). Do summaries help? A task-based evaluation of multi-document summarization. Retrieved from https://academiccommons.columbia.edu/doi/10.7916/D87370BC
Hobler, D. (2017). A functional text summarizer that adapts to the times. Retrieved from http://techsophist.net/a-functional-text-summarizer-that-adapts-to-the-times/
Hovy, E., & Lin, C. Y. (1998, October). Automated text summarization and the SUMMARIST system. In Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998, USA (pp. 197-214). Stroudsburg, PA: Association for Computational Linguistics.
Infosys. (2018). Amplifying human potential: Towards purposeful artificial intelligence. Retrieved from https://www.infosys.com/aimaturity/
Jha, S. (2018). The impact of AI on business leadership and the modern workforce. Retrieved from https://www.techemergence.com/the-impact-of-ai-on-business-leadership-and-the-modern-workforce/
Laskin, A. V. (2018). The narrative strategies of winners and losers: Analyzing annual reports of publicly traded corporations. International Journal of Business Communication, 55, 338-356.
Lin, C. Y. (2004, June). Looking for a few good metrics: Automatic summarization evaluation-how many samples are enough? In Proceedings of NTCIR Workshop 4, Tokyo, Japan (pp. 1-10). Retrieved from https://pdfs.semanticscholar.org/0996/e937a14f6faf34a3ce39fa537189e12b1ef7.pdf
Liu, F., & Liu, Y. (2009, August). From extractive to abstractive meeting summaries: Can it be done by sentence compression? In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (pp. 261-264). Singapore: Association for Computational Linguistics.
Lytras, M. D., Aljohani, N., Damiani, E., & Chui, K. T. (2018). Innovations, developments, and applications of semantic web and information systems. Hershey, PA: IGI Global.
Mani, I. (2001). Automatic summarization (Vol. 3). Amsterdam, Netherlands: John Benjamins.
Metz, C. (2016, January 25). The rise of the artificially intelligent hedge fund. Wired. Retrieved from https://www.wired.com/2016/01/the-rise-of-the-artificially-intelligent-hedge-fund/
Murphy, G. C., & Notkin, D. (1996). Lightweight lexical source model extraction. ACM Transactions on Software Engineering and Methodology, 5, 262-292.
Narula, G. (2018). Everyday examples of artificial intelligence and machine learning. Retrieved from https://www.techemergence.com/everyday-examples-of-ai/
Nenkova, A. (2006, September). Summarization evaluation for text and speech: Issues and approaches. Paper presented at the Ninth International Conference on Spoken Language Processing, Pittsburgh, PA. Retrieved from http://www.cis.upenn.edu/~nenkova/papers/sumEval.pdf
Nenkova, A., & McKeown, K. (2011). Automatic summarization. Foundations and Trends® in Information Retrieval, 5, 103-233.
Neto, J. L., Freitas, A. A., & Kaestner, C. A. (2002, November). Automatic text summarization using a machine learning approach. In Brazilian Symposium on Artificial Intelligence (pp. 205-215). Berlin, Germany: Springer.
Nyzam, V., Gatto, N., & Brossard, A. (2017). Automatically summarize online: demonstration of a multi-document summary web service. In Proceedings of the 24th Conference on the Automatic Processing of Natural Languages (TALN) (p. 30).
Paulus, R., Xiong, C., & Socher, R. (2017). A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304. Retrieved from https://arxiv.org/abs/1705.04304
Penrose, J. M. (2008). Annual report graphic use: A review of the literature. Journal of Business Communication, 45, 158-180.
Saggion, H., & Poibeau, T. (2012). Automatic text summarization: Past, present and future. In T. Poibeau, H. Saggion, J. Piskorski, & R. Yangarber (Eds.), Multi-source, multilingual information extraction and summarization (pp. 3-21). Berlin, Germany: Springer-Verlag.
Shams, R., Hashem, M. M. A., Hossain, A., Akter, S. R., & Gope, M. (2010, May). Corpus-based web document summarization using statistical and linguistic approach. In Computer and Communication Engineering (ICCCE), 2010 International Conference on (pp. 1-6). Piscataway, NJ: IEEE. Retrieved from https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=Corpus-based%20web%20document%20summarization%20using%20statistical%20and%20linguistic%20approach
Smith, S. A., Patmos, A., & Pitts, M. J. (2018). Communication and teleworking: A study of communication channel satisfaction, personality, and job satisfaction for teleworking employees. International Journal of Business Communication, 55, 44-68.
Sobowale, J. (2016, April). How artificial intelligence is transforming the legal profession. ABA Journal [online]. Retrieved from www.abajournal.com/magazine/article/how_artificial_intelligence_is_transforming_the_legal_profession/
Steinberger, J., & Jezek, K. (2009). Evaluation measures for text summarization. Computing and Informatics, 28, 1001-1026.
Torres-Moreno, J. M. (Ed.). (2014). Automatic text summarization. Hoboken, NJ: Wiley.
Vozzo, P. (2016, March). How to write your annual letter to shareholders. Baltimore, MD: Westwicke Partners. Retrieved from https://westwickepartners.com/2016/03/how-to-write-your-annual-letter-to-shareholders/
Williams, C. (2008). Toward a taxonomy of corporate reporting strategies. Journal of Business Communication, 45, 232-264.
Author Biographies
Jefrey Naidoo is an Assistant Professor of Management and the Derrell Thomas Faculty Fellow at the University of Alabama. His research focuses on how visual analytics and artificial intelligence may be leveraged to help organizations navigate big data and meet business intelligence objectives.
Ronald E. Dulek is the John R. Miller Professor of Management at the University of Alabama. He is a longtime supporter of the Association for Business Communication and a past recipient of the Kitty O. Locker Award.