
OPEN ACCESS

Opinion

AI-generated research paper fabrication and plagiarism in the scientific community

Faisal R. Elali1,* and Leena N. Rachid2

1State University of New York Downstate Health Sciences University, College of Medicine, New York 11203, NY, USA
2Loyola University Medical Center, College of Medicine, Maywood, IL, USA
*Correspondence: [email protected]
https://doi.org/10.1016/j.patter.2023.100706

Fabricating research within the scientific community has consequences for one’s credibility and undermines honest authors. We demonstrate the feasibility of fabricating research using an AI-based language model chatbot. Human detection versus AI detection will be compared to determine accuracy in identifying fabricated works. The risks of utilizing AI-generated research works will be underscored and reasons for falsifying research will be highlighted.

Introduction

The inappropriate fabrication of research works has serious consequences for the fabricator, the fabricated, and the scientific community that relies on the integrity of these publications to make informed decisions about changes in sociology, economics, politics, and medicine, among others.1,2 To prevent the publication of fabricated works, journal editors must be diligent in detecting them; however, the search strategies used to detect fabrication differ from those used for plagiarism. There are dozens of plagiarism checkers online, and many journals have built-in technologies that detect plagiarism almost immediately.3 Detecting fabrication, on the other hand, is difficult, since the work is completely made up and falsified, not plagiarized from other authors. With artificial intelligence (AI), determining whether a piece of writing was fabricated by, or plagiarized from, an AI-based technology presents a further challenge to researchers.

The exponential progress in AI technologies has increased productivity in multiple fields and serves as a resource to expedite tasks or remove unnecessary ones.4–6 These technologies can generate works of research that evade detection by human judgment or automated plagiarism/fabrication technologies.7,8 A new, robust language-model chatbot, ChatGPT, was released at the end of November 2022.9 Though there are other AI chatbots in circulation, ChatGPT proved to be revolutionary for many, as shown by its gain of 1 million new users in just under a week.10 This AI chatbot generates high-quality texts that easily bypass plagiarism checkpoints and can readily be used to fabricate research works.7

In this paper, we examine how AI chatbots may be used to fabricate research in the medical community, with examples. Furthermore, we compare studies of human detection of AI-based works to gauge how accurately fabricated, AI-generated works are identified. Additionally, we test the accuracy of free, online AI detectors. The danger of fabricated research is then highlighted, along with the reasons why one would want to fabricate medical research and potential remedies to this looming threat. We foresee that these issues will present themselves as AI technologies continue to expand in quantity and quality, and we hope to begin an initial discussion on how to better develop and implement safeguards against this threat to the medical community.

Scientific writing fabrication using artificial intelligence chatbots

Criteria surrounding misconduct within the scientific community are not defined in concrete terms, although a few severe cases fall under this umbrella term: (1) falsification or manipulation of data, (2) fabrication of non-existent results, and (3) plagiarism of another author’s work.11 AI-generated works commit both falsification of data and fabrication of non-existent results. Fabricating research works with an AI model involves inventing non-existent data and results as well as plagiarizing findings from previous works to support the fabricated primary outcomes. AI-generated, falsified works produced with newly arising technologies may meet the standards and expectations of authors and journal editors. However, AI technologies typically have unique writing styles and verb usage that some AI detectors may pick up.

The following scenario illustrates how AI technology can easily fabricate data to falsify results, typically with the malicious aim of publishing works that evade detection by humans and AI detectors. One must simply write a command to the AI technology, in this case ChatGPT, and one can easily find oneself with a well-written abstract that looks unique and publishable to the naked eye. Within one afternoon, one can have dozens of abstracts that can be submitted to various conferences for publication. Upon acceptance of an abstract for publication, one can use this same technology to write the manuscript, completely built upon fabricated data and falsified results. This first example is a fabricated abstract on two drugs used for rheumatoid arthritis (Table 1).

Table 1. Fabricated research by an AI chatbot model, ChatGPT

Prompt

“Write me a research paper which includes an introduction, methods, results, and conclusion section comparing the effectiveness and cost of Tofacitinib versus Methotrexate in treating Rheumatoid Arthritis using the MarketScan Database from 2012 to 2020 and give it a title.”

Response

Title

Comparing the Effectiveness and Cost of Tofacitinib and Methotrexate in Treating Rheumatoid Arthritis: A Retrospective Study Using the Marketscan Database from 2012 to 2020

Introduction

Rheumatoid arthritis (RA) is a chronic inflammatory disorder that affects the joints and can cause severe pain and disability. It is estimated that 1.3 million Americans are affected by RA, and its prevalence is increasing. Treatment of RA is challenging, and there are several medications available to reduce the symptoms. Two of the most commonly used medications are tofacitinib and methotrexate. Tofacitinib is a Janus kinase (JAK) inhibitor that has been approved for the treatment of RA since 2012, while methotrexate is an immunosuppressant that has been used for RA treatment since the 1950s. The purpose of this study was to compare the effectiveness of tofacitinib and methotrexate in treating RA using the Marketscan database from 2012 to 2020.

Methods

The Marketscan database was used to identify patients with RA who were treated with either tofacitinib or methotrexate from 2012 to 2020. Patients were included in the study if they had at least one prescription for either medication and had at least one follow-up visit within 12 months of the initial prescription. The primary outcome measure was the proportion of patients who achieved remission of RA symptoms at 12 months. Secondary outcome measures included the proportion of patients who achieved a reduction in disease activity, the number of adverse events, and the cost of treatment.

Results

A total of 2,845 patients were included in the study. Of these, 1,719 (60.5%) were treated with tofacitinib and 1,126 (39.5%) were treated with methotrexate. At 12 months, the mean change in disease activity score (DAS28) was significantly greater in the tofacitinib group (−2.9) compared to the methotrexate group (−2.4) (p < 0.001). The mean change in Health Assessment Questionnaire (HAQ) score was also significantly greater in the tofacitinib group (−1.2) compared to the methotrexate group (−0.9) (p < 0.001). The proportion of patients who achieved remission of RA symptoms was significantly higher in the tofacitinib group (75.1%) compared to the methotrexate group (62.7%) (p < 0.001). The proportion of patients who achieved a reduction in disease activity was also significantly higher in the tofacitinib group (90.2%) compared to the methotrexate group (82.7%) (p < 0.001). The number of adverse events was similar between the two groups, but the cost of treatment was significantly higher for tofacitinib compared to methotrexate (p < 0.001). The mean cost of tofacitinib was $2,945 per patient per year, while the mean cost of methotrexate was $1,717 per patient per year.

Conclusion

This study found that tofacitinib is more effective than methotrexate in treating RA, with a higher proportion of patients achieving remission of symptoms and a greater reduction in disease activity. However, the cost of treatment was significantly higher for tofacitinib compared to methotrexate. Further research is needed to determine the long-term effectiveness and safety of tofacitinib in treating RA.

Note: The specific submodel utilized is named “Davinci,” its fastest language model. Data are from 2012 to 2019, as this model does not have information beyond 2019, highlighting fabricated data from 2020.

In this example, we see a well-written abstract that may be accepted at an orthopedic or rheumatological conference. What one may not know is that, at the time of inputting this prompt and copying over its response, ChatGPT used data up until the year 2019. This study’s prompt purposely included the year 2020 to determine whether the AI would deny a response or revise it to state 2019 at the latest. Neither occurred, meaning that the AI had to have fabricated the data from 2020. In addition, the MarketScan database is protected from public view. To even look at the data within this database, one must contact the company directly and request to purchase the database based on the primary objectives of the proposed research topic. This further supports the proposition that the work in Table 1 is fabricated. In addition to fabricating data and results, one may easily ask the AI to falsify data to support a claim. For example, for the conclusion section in Table 1, the AI was asked to “re-word this conclusion to support methotrexate is more effective than tofacitinib in treating RA,” which is the opposite of the finding in this fabricated study. The AI output listed the same conclusion but added, “Nevertheless, methotrexate appears to be a more cost-effective option, and may be more effective than tofacitinib in treating RA over the long-term.” One can easily fabricate and falsify results to support any claim one wants to support in research. This is especially dangerous when determining which treatments or interventions are superior in the medical community, potentially affecting outcomes in patient care. To read additional outputs from this AI model, please visit https://doi.org/10.17632/ymyhmrdg5r.2.

Patterns 4, March 10, 2023. © 2023 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Risks of AI-generated research

Utilizing an AI for research is not an inherently malicious endeavor. One can input data into an AI and ask it to perform a statistical analysis, streamlining a process that would have taken hours using other technologies, such as Statistical Package for the Social Sciences (SPSS). Asking an AI to grammar-check work or write a conclusion for legitimate results found in a study are other ways to incorporate an AI into the research process and cut out busywork that may slow it down. Copying code written by an AI to perform statistical analyses in a programming language could save researchers hours, especially those who may not have a coding background and do not have a dedicated coder for project production. In fact, the entirety of this paper was put through an AI to detect grammatical errors and suggest potential replacements to rectify them. The issue arises when one uses non-existent data to fabricate results and write research, which may easily bypass human detection and make its way into a publication. These published works pollute legitimate research and may affect the generalizability of legitimate works. For example, if study A is a legitimate study supporting the use of drug A over drug B for treating atrial fibrillation, a fabricated study B that supports drug B over drug A for treating atrial fibrillation would undermine the generalizability of study A and may affect subsequent meta-analyses and systematic reviews of these studies down the line.

Table 2. Scores from AI-writing detector websites for a conclusion originally written by ChatGPT versus the same conclusion reworded using an online rewording tool

Original conclusion: This study found that tofacitinib is more effective than methotrexate in treating RA, with a higher proportion of patients achieving remission of symptoms and a greater reduction in disease activity. However, the cost of treatment was significantly higher for tofacitinib compared to methotrexate. Further research is needed to determine the long-term effectiveness and safety of tofacitinib in treating RA.

Detector name: score (realness)
Writer(a): 14% human-generated content
GPT-2 Output Detector(b): 1.99% human-generated content
GPTZero(c): perplexity(d) = 15.8 (“your text is most likely to be AI generated”)

Reworded conclusion*: Tofacitinib was found to be more effective than methotrexate at treating rheumatoid arthritis (RA), with a greater reduction in disease activity and a higher percentage of patients experiencing symptom remission. Tofacitinib, on the other hand, was significantly more expensive to treat than methotrexate. To determine tofacitinib’s long-term efficacy and safety as an RA treatment require additional research.

Detector name: score (realness)
Writer(a): 88% human-generated content
GPT-2 Output Detector(b): 78.55% human-generated content
GPTZero(c): perplexity(d) = 150 (“your text is likely human generated”)

*Reworded using https://paraphrasing-tool.com/.
(a) https://writer.com/ai-content-detector/
(b) https://openai-openai-detector.hf.space/
(c) https://etedward-gptzero-main-zqgfwb.streamlit.app/
(d) Perplexity refers to “realness” of an input; a higher score indicates likely human generated.

In addition, detecting fraudulent research works is especially difficult when the work is well generated and may easily evade detection by editors and reviewers. Gu et al. performed a study in which medical experts rated 800 AI-generated images in terms of realness.8 They scored these images as 1 (definitely fake), 2 (probably fake), 3 (probably real), or 4 (definitely real). Most of their responses fell between 2 and 3, indicating potential cognitive dissonance in deciding whether an AI-generated work is real or not.8 A recent preprint by Gao et al. found that only 68% of ChatGPT-generated abstracts and 86% of human-written abstracts were correctly identified.7 This means that reviewers incorrectly identified 32% of the AI-generated abstracts as real and 14% of the human-written abstracts as fake.
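The arithmetic behind these figures can be made explicit. The short Python sketch below is an illustration added for clarity, not part of the analysis by Gao et al.; it takes only the two percentages quoted above and derives the corresponding error rates. The function name `error_rates` and the choice of balanced accuracy as a summary are assumptions made for this example.

```python
def error_rates(ai_correct_pct, human_correct_pct):
    """Derive reviewer error rates from the share of each class identified correctly.

    Both arguments are whole-number percentages, as quoted in the text.
    """
    return {
        # AI-generated abstracts that reviewers mistook for real ones
        "ai_missed_pct": 100 - ai_correct_pct,
        # human-written abstracts that reviewers flagged as fake
        "human_flagged_pct": 100 - human_correct_pct,
        # mean of the two per-class accuracies (balanced accuracy); used here
        # because it needs no assumption about how many abstracts were in each class
        "balanced_accuracy_pct": (ai_correct_pct + human_correct_pct) / 2,
    }


rates = error_rates(ai_correct_pct=68, human_correct_pct=86)
print(rates)
# {'ai_missed_pct': 32, 'human_flagged_pct': 14, 'balanced_accuracy_pct': 77.0}
```

Read this way, roughly one in three AI-generated abstracts passed as genuine, while about one in seven genuine abstracts was wrongly suspected.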

Combating AI-generated research by strengthening detection services

The proliferation of generative AI models without adequate detection technologies presents a contemporary challenge for the scientific community. As previously stated, humans cannot distinguish AI-generated from human-generated works 100% of the time. Technology must be established to combat technology. Here, we run several online AI detectors to gauge their effectiveness, and we use “reworder” and “paraphraser” tools to attempt to evade detection. The conclusion portion from Table 1 is used to test these detectors (Table 2).
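One of the detectors in Table 2 reports perplexity, where a higher score suggests human-written text. As a rough illustration of what such a score measures, the Python sketch below computes perplexity under a toy word-frequency (unigram) model. This is a stand-in built on stated assumptions: the source does not describe how GPTZero computes its score, and real detectors rely on large neural language models rather than word counts. The function names and the reference text are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def train_unigram(reference_text):
    """Count word frequencies in a reference text (a toy stand-in for a language model)."""
    tokens = reference_text.lower().split()
    return Counter(tokens), len(tokens)


def perplexity(text, counts, total):
    """Perplexity = exp of the average negative log-probability per word.

    Add-one smoothing is used, with a single extra slot shared by all unseen
    words, so words missing from the reference text get a small non-zero probability.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")
    vocab_size = len(counts) + 1  # +1 for the shared "unseen word" slot
    log_prob_sum = 0.0
    for token in tokens:
        probability = (counts[token] + 1) / (total + vocab_size)
        log_prob_sum += math.log(probability)
    return math.exp(-log_prob_sum / len(tokens))


# Hypothetical reference text, invented for this illustration only.
reference = "the patient was treated with the drug and the patient improved"
counts, total = train_unigram(reference)

predictable = perplexity("the patient was treated", counts, total)
surprising = perplexity("zebra quantum lantern violin", counts, total)
print(predictable < surprising)  # text the model finds predictable scores lower
```

Under this reading, text that a model finds highly predictable receives a low perplexity and is flagged as likely AI generated, which matches the direction of the scores in Table 2 (15.8 for the original conclusion versus 150 for the reworded one).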

As one can see, there are adequate online detectors that can provide a rough estimate of the likelihood that a work is AI generated. However, these are not perfect models and can be easily bypassed by using an online rewording tool or by rewording the text oneself. In addition, false positives may occur: the first paragraph of this section was put through an AI detector12 and was scored as 37.38% AI generated, even though it was completely written by a human. Journal editors and reviewers must have a heightened sense of awareness of the potential influx of plagiarized work, as these works may easily evade detection both by the human eye and by online detector tools. Furthermore, journals should add steps to their submission process that require proof of data collection; the method of proof may vary depending on the nature of the study and may include deidentified patient data and the code used for statistical analysis, among others. Finally, funds should be appropriated toward producing a high-level AI detector, and such technologies should be built into the background checks journals already run, similar to automatic plagiarism detectors. There are few reports of AI detection of research within the literature, which is troublesome, as AI-generated works may already have bypassed journal integrity checkpoints and made their way into publication.

Why fabricate medical research?

Medical research is frequently fabricated for a variety of reasons, including the pursuit of fame, the high-pressure nature of medical research, and the hunt for industry funding to support a product. These factors compel researchers to publish as many papers as possible, indicating that there are external goals some researchers strive for and an increasing number of hoops one must jump through to succeed in other facets of one’s career.13

In the context of medical education and training, research has become increasingly important for residency applications.14 The USMLE Step 1 becoming pass/fail in 2022 raised the importance of research, as students had fewer metrics to separate themselves from others. Specialties like plastic surgery, neurosurgery, and orthopedic surgery require a high number of publications from applicants, and the projected increase in demand for research may raise this average over time. Since Step 1 went pass/fail in 2022, we will not have concrete data on changes in metric importance until the class of 2024 graduates.15 This may motivate fabrication of publications to bypass this roadblock, especially in institutions that are not research oriented.

Conclusion

In this paper, we posit that AI-generated research fabrication and falsification of work pose serious challenges to the scientific and medical community. The feasibility of producing fabricated work, coupled with the difficult-to-detect nature of published works and the lack of AI-detection technologies, creates an opportunistic atmosphere for fraudulent research. Risks of AI-generated research include the use of such work to alter and implement new healthcare policies, standards of care, and interventional therapeutics. Reasons for fabricating research using an AI-based technology include financial gain, potential fame, promotion in academia, and curriculum vitae building, especially for medical students who are in increasingly competitive waters. Although AI-based technologies may be used to streamline mundane processes in the research field, they may also be used to pollute the field of scientific research and undermine the legitimate works produced by other authors.

ACKNOWLEDGMENTS

We would like to thank Dr. Frank C. Barone at SUNY Downstate for his assistance in keeping us up to date with this newfound conflict in the research community.

AUTHOR CONTRIBUTIONS

F.R.E. founded the study idea, ran data collection/experiments, wrote the manuscript, revised the manuscript, and submitted the manuscript. L.N.R. assisted in data collection and revision of the manuscript.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

1. National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Committee on Responsible Science (2017). Incidence and consequences. Fostering Integrity in Research (National Academies Press). https://www.ncbi.nlm.nih.gov/books/NBK475945/.

2. Zimba, O., and Gasparyan, A.Y. (2021). Plagiarism detection and prevention: a primer for researchers. Reumatologia 59, 132–137. https://doi.org/10.5114/reum.2021.105974.

3. Masic, I., Begic, E., and Dobraca, A. (2017). Plagiarism detection by online solutions. Stud. Health Technol. Inform. 238, 227–230.

4. Brynjolfsson, E., Rock, D., and Syverson, C. (2017). Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics (National Bureau of Economic Research), w24001. https://doi.org/10.3386/w24001.

5. Damioli, G., Van Roy, V., and Vertesy, D. (2021). The impact of artificial intelligence on labor productivity. Eurasian Bus. Rev. 11, 1–25. https://doi.org/10.1007/s40821-020-00172-8.

6. Yang, C.H. (2022). How artificial intelligence technology affects productivity and employment: firm-level evidence from Taiwan. Res. Pol. 51, 104536. https://doi.org/10.1016/j.respol.2022.104536.

7. Gao, C.A., Howard, F.M., Markov, N.S., Dyer, E.C., Ramesh, S., Luo, Y., and Pearson, A.T. (2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. Preprint at bioRxiv. https://doi.org/10.1101/2022.12.23.521610.

8. Gu, J., Wang, X., Li, C., Zhao, J., Fu, W., Liang, G., and Qiu, J. (2022). AI-enabled image fraud in scientific publications. Patterns 3, 100511. https://doi.org/10.1016/j.patter.2022.100511.

9. OpenAI (2022). ChatGPT: optimizing language models for dialogue. https://openai.com/blog/chatgpt/.

10. Mollman, S. (2022). ChatGPT gained 1 million users in under a week. Here’s why the AI chatbot is primed to disrupt search as we know it. Yahoo! Finance. https://finance.yahoo.com/news/chatgpt-gained-1-million-followers-224523258.html.

11. Office of Research Integrity. Definition of research misconduct. https://ori.hhs.gov/definition-research-misconduct.

12. GPT-2 output detector. https://openai-openai-detector.hf.space/.

13. Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 4, e5738. https://doi.org/10.1371/journal.pone.0005738.

14. Girard, A.O., Qiu, C., Lake, I.V., Chen, J., Lopez, C.D., and Yang, R. (2022). US medical student perspectives on the impact of a pass/fail USMLE Step 1. J. Surg. Educ. 79, 397–408. https://doi.org/10.1016/j.jsurg.2021.09.010.

15. National Resident Matching Program. Residency data & reports. https://www.nrmp.org/match-data-analytics/residency-data-reports/.

About the authors

Faisal R. Elali is currently pursuing his Doctor of Medicine degree at SUNY Downstate Health Sciences University. He received a dual-bachelor’s degree in biological sciences and theological religious studies from Fordham University in 2021. His past research experiences include working in medical genetics, microbiology, inorganic chemistry, asylum medicine, and orthopaedics. His research interests include the applicability of modern technology, especially artificial intelligence and robotics, in medicine and how it can be applied to better promote patient care and outcomes.

Leena N. Rachid is currently pursuing her Doctor of Medicine degree at Loyola Medicine. She received a combined degree in biological sciences and economics from Fordham University in 2020. Her past research experiences include working in inorganic chemistry, pulmonary medicine, and intensive care medicine. Her research interests include machine learning and its application in medicine to better promote patient care and outcomes.
