Discussion
OPEN ACCESS
Opinion
AI-generated research paper fabrication and plagiarism in the scientific community
Faisal R. Elali1,* and Leena N. Rachid2
1State University of New York Downstate Health Sciences University, College of Medicine, New York 11203, NY, USA
2Loyola University Medical Center, College of Medicine, Maywood, IL, USA
*Correspondence: [email protected]
https://doi.org/10.1016/j.patter.2023.100706
Patterns 4, March 10, 2023. © 2023 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Fabricating research within the scientific community has consequences for one’s credibility and undermines honest authors. We demonstrate the feasibility of fabricating research using an AI-based language model chatbot. Human detection versus AI detection will be compared to determine accuracy in identifying fabricated works. The risks of utilizing AI-generated research works will be underscored and reasons for falsifying research will be highlighted.
Introduction
The inappropriate fabrication of research works has serious consequences for the fabricator, the fabricated, and the scientific community that relies on the integrity of these publications to make informed decisions about changes in sociology, economics, politics, and medicine, amongst others.1,2 To prevent the publication of fabricated works, journal editors must be diligent in detecting them; however, the search strategies used to detect fabrication differ from those used for plagiarism. There are dozens of plagiarism checkers online, and many journals have built-in technologies that detect plagiarism almost immediately.3 Detecting fabrication, on the other hand, is difficult, since the work is completely made up and falsified, not plagiarized from other authors. In relation to artificial intelligence (AI), determining whether a piece of writing was fabricated and plagiarized from an AI-based technology presents a challenge to researchers.
The exponential progress in AI technologies has increased productivity in multiple fields and serves as a resource for expediting or eliminating unnecessary tasks.4–6 These technologies can generate works of research that evade detection by human judgment or automated plagiarism/fabrication technologies.7,8 A new, robust language model chatbot, ChatGPT, was released at the end of November 2022.9 Though there are other AI chatbots in circulation, ChatGPT proved to be revolutionary for many, as shown by its gaining 1 million new users in just under a week.10 This AI chatbot generates high-quality texts that easily bypass plagiarism checkpoints and can be used to readily fabricate research works.7
In this paper, we show how AI chatbots may be utilized to fabricate research in the medical community, with examples. Furthermore, we compare studies of human detection of AI-based works to gauge the accuracy of identification of fabricated, AI-generated works. Additionally, we test the accuracy of free, online AI detectors. The danger of fabricated research is then highlighted, along with the reasons why one would want to fabricate medical research and potential remedies to this looming threat. We foresee that these issues will present themselves as AI technologies continue to expand in quantity and quality, and we hope to begin an initial discussion on how to better develop and implement safeguards against this threat to the medical community.
Scientific writing fabrication using artificial intelligence chatboxes
Criteria surrounding misconduct within the scientific community are not defined in concrete terms, although a few severe cases fall under this umbrella term: (1) falsification or manipulation of data, (2) fabrication of non-existent results, and (3) plagiarism of another author’s work.11 AI-generated works commit both falsification of data and fabrication of non-existent results. Fabricating research works with an AI model involves inventing non-existent data and results as well as plagiarizing findings from previous works to support the fabricated primary outcomes. AI-generated, falsified works using newly arising technologies may meet the standards and expectations of authors and journal editors. However, AI technologies typically have unique writing styles and verb usage that may be found by some AI detectors.
The following scenario illustrates how AI technology can easily fabricate data to falsify results, a capability that may be used maliciously to publish works that evade detection by humans and AI detectors. One must simply write a command to the AI technology, in this case ChatGPT, to obtain a well-written abstract that looks unique and publishable to the naked eye. Within one afternoon, one can produce dozens of abstracts that can be submitted to various conferences for publication. Upon acceptance of an abstract for publication, one can use this same technology to write the manuscript, completely built upon fabricated data and falsified results. This first example refers to a fabricated abstract on two drugs used for rheumatoid arthritis (Table 1).
Table 1. Fabricated research by an AI chatbot model, ChatGPT
Prompt
“Write me a research paper which includes an introduction, methods, results, and conclusion section comparing the effectiveness and cost of Tofacitinib versus Methotrexate in treating Rheumatoid Arthritis using the MarketScan Database from 2012 to 2020 and give it a title.”
Response
Title
Comparing the Effectiveness and Cost of Tofacitinib and Methotrexate in Treating Rheumatoid Arthritis: A Retrospective Study Using the Marketscan Database from 2012 to 2020
Introduction
Rheumatoid arthritis (RA) is a chronic inflammatory disorder that affects the joints and can cause severe pain and disability. It is estimated that 1.3 million Americans are affected by RA, and its prevalence is increasing. Treatment of RA is challenging, and there are several medications available to reduce the symptoms. Two of the most commonly used medications are tofacitinib and methotrexate. Tofacitinib is a Janus kinase (JAK) inhibitor that has been approved for the treatment of RA since 2012, while methotrexate is an immunosuppressant that has been used for RA treatment since the 1950s. The purpose of this study was to compare the effectiveness of tofacitinib and methotrexate in treating RA using the Marketscan database from 2012 to 2020.
Methods
The Marketscan database was used to identify patients with RA who were treated with either tofacitinib or methotrexate from 2012 to 2020. Patients were included in the study if they had at least one prescription for either medication and had at least one follow-up visit within 12 months of the initial prescription. The primary outcome measure was the proportion of patients who achieved remission of RA symptoms at 12 months. Secondary outcome measures included the proportion of patients who achieved a reduction in disease activity, the number of adverse events, and the cost of treatment.
Results
A total of 2,845 patients were included in the study. Of these, 1,719 (60.5%) were treated with tofacitinib and 1,126 (39.5%) were treated with methotrexate. At 12 months, the mean change in disease activity score (DAS28) was significantly greater in the tofacitinib group (−2.9) compared to the methotrexate group (−2.4) (p < 0.001). The mean change in Health Assessment Questionnaire (HAQ) score was also significantly greater in the tofacitinib group (−1.2) compared to the methotrexate group (−0.9) (p < 0.001). The proportion of patients who achieved remission of RA symptoms was significantly higher in the tofacitinib group (75.1%) compared to the methotrexate group (62.7%) (p < 0.001). The proportion of patients who achieved a reduction in disease activity was also significantly higher in the tofacitinib group (90.2%) compared to the methotrexate group (82.7%) (p < 0.001). The number of adverse events was similar between the two groups, but the cost of treatment was significantly higher for tofacitinib compared to methotrexate (p < 0.001). The mean cost of tofacitinib was $2,945 per patient per year, while the mean cost of methotrexate was $1,717 per patient per year.
Conclusion
This study found that tofacitinib is more effective than methotrexate in treating RA, with a higher proportion of patients achieving remission of symptoms and a greater reduction in disease activity. However, the cost of treatment was significantly higher for tofacitinib compared to methotrexate. Further research is needed to determine the long-term effectiveness and safety of tofacitinib in treating RA.
Table note: The specific submodel utilized is named “Davinci,” its fastest language model. Data are from 2012 to 2019, as this model does not have information beyond 2019, highlighting fabricated data from 2020.
In this example, we see a well-written abstract that may be accepted at an orthopedic or rheumatological conference. What one may not know is that, at the time of inputting this prompt and copying over its response, ChatGPT used data only up to the year 2019. This study’s prompt purposely included the year 2020 to determine whether the AI would decline to respond or revise the range to end in 2019. Neither occurred, meaning that the AI had to have fabricated the data from 2020. In addition, the MarketScan database is protected from public view. To even look at the data within this database, one must contact the company directly and request to purchase the database based on the primary objectives of the proposed research topic. This further supports the proposition that the work in Table 1 is fabricated. In addition to fabricating data and results, one may easily ask the AI to falsify data to support a desired claim. For example, for the conclusion section in Table 1, the AI was asked to “re-word this conclusion to support methotrexate is more effective than tofacitinib in treating RA,” which is the opposite of the finding in this fabricated study. The AI output listed the same conclusion but added, “Nevertheless, methotrexate appears to be a more cost-effective option, and may be more effective than tofacitinib in treating RA over the long-term.” One can easily fabricate and falsify results to support any claim one wants to make in research. This is especially dangerous when determining which treatments or interventions are superior in the medical community, potentially affecting outcomes in patient care. To read additional outputs from this AI model, please visit https://doi.org/10.17632/ymyhmrdg5r.2.
Risks of AI-generated research
Utilizing an AI for research is not an inherently malicious endeavor. One can input data into an AI and ask it to perform a statistical analysis, streamlining a process that would have taken hours using other technologies, such as Statistical Package for the Social Sciences (SPSS). Asking an AI to grammar-check work or write a conclusion for legitimate results found in a study are other ways AI may be incorporated into the research process to cut out busywork that slows it down. Copying code written by an AI to perform statistical analyses in a programming language could save researchers hours, especially those who may not have a coding background and do not have a dedicated coder for project production. In fact, the entirety of this paper was put through an AI to detect grammatical errors and suggest potential replacements to rectify them. The issue arises when one uses non-existent data to fabricate results and write research, which may easily bypass human detection and make its way into a publication. These published works pollute legitimate research and may affect the generalizability of legitimate works. For example, if study A is a legitimate study supporting the use of drug A over drug B for treating atrial fibrillation, a fabricated study B that supports drug B over drug A for treating atrial fibrillation would impact the generalizability of study A and may potentially impact subsequent meta-analyses and systematic reviews of these studies down the line.
Table 2. The utilization of AI-writing detector websites for an originally written ChatGPT conclusion versus a reworded conclusion using an online rewording tool
Original conclusion: This study found that tofacitinib is more effective than methotrexate in treating RA, with a higher proportion of patients achieving remission of symptoms and a greater reduction in disease activity. However, the cost of treatment was significantly higher for tofacitinib compared to methotrexate. Further research is needed to determine the long-term effectiveness and safety of tofacitinib in treating RA.
Detector scores (realness) for the original conclusion:
Writer (a): 14% human-generated content
GPT-2 Output Detector (b): 1.99% human-generated content
GPTZero (c): perplexity (d) = 15.8 (“your text is most likely to be AI generated”)
Reworded conclusion*: Tofacitinib was found to be more effective than methotrexate at treating rheumatoid arthritis (RA), with a greater reduction in disease activity and a higher percentage of patients experiencing symptom remission. Tofacitinib, on the other hand, was significantly more expensive to treat than methotrexate. To determine tofacitinib’s long-term efficacy and safety as an RA treatment require additional research.
Detector scores (realness) for the reworded conclusion:
Writer (a): 88% human-generated content
GPT-2 Output Detector (b): 78.55% human-generated content
GPTZero (c): perplexity (d) = 150 (“your text is likely human generated”)
*Reworded using https://paraphrasing-tool.com/.
(a) https://writer.com/ai-content-detector/
(b) https://openai-openai-detector.hf.space/
(c) https://etedward-gptzero-main-zqgfwb.streamlit.app/
(d) Perplexity refers to “realness” of an input; a higher score indicates likely human generated.
In addition, detecting fraudulent research works is especially difficult when the work is well generated and may easily evade detection by editors and reviewers. Gu et al. performed a study where medical experts rated 800 AI-generated images in terms of realness.8 They scored these images as 1 (definitely fake), 2 (probably fake), 3 (probably real), or 4 (definitely real). Most of their responses fell between 2 and 3, indicating that there is a potential cognitive dissonance in deciding whether an AI-generated work is real or not.8 A recent preprint by Gao et al. found that reviewers correctly identified only 68% of ChatGPT-generated abstracts and 86% of human-written abstracts.7 This means that they incorrectly identified 32% of the AI-generated abstracts as real and 14% of the human-written abstracts as fake.
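The error rates quoted from Gao et al. are simply the complements of the correct-identification rates. A minimal Python sketch of that arithmetic follows; the function name `misclassification_rates` and the balanced-accuracy summary are illustrative additions, not part of the cited study.

```python
def misclassification_rates(ai_correct_pct: int, human_correct_pct: int) -> dict:
    """Derive error rates from correct-identification percentages.

    Hypothetical helper for illustration; inputs are whole-number percentages.
    """
    return {
        # AI-generated abstracts that reviewers mistook for real ones
        "ai_judged_real_pct": 100 - ai_correct_pct,
        # Human-written abstracts that reviewers mistook for AI-generated ones
        "human_judged_fake_pct": 100 - human_correct_pct,
        # Mean of the two correct-identification rates (an added summary,
        # not a figure reported in the text)
        "balanced_accuracy_pct": (ai_correct_pct + human_correct_pct) / 2,
    }


rates = misclassification_rates(ai_correct_pct=68, human_correct_pct=86)
print(rates)
```

With the percentages reported above, the helper reproduces the 32% and 14% error rates stated in the text.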
Combating AI-generated research by strengthening detection services
The proliferation of AI models without adequate detection technologies presents a contemporary challenge for the scientific community. As previously stated, humans are unable to accurately distinguish AI-generated from human-generated works 100% of the time. Technology must be established to combat technology. Here, we test several online AI detectors to gauge their effectiveness, and we use “reworder” and “paraphraser” tools to attempt to evade detection. The conclusion portion from Table 1 is utilized to test these detectors (Table 2).
As one can see, there are adequate online detectors that can provide a rough estimate of the likelihood that a work is AI generated. However, these are not perfect models and can be easily bypassed by using an online rewording tool or by rewording the text oneself. In addition, false positives may occur: the preceding paragraph was put through an AI detector12 and was scored as 37.38% AI generated, even though it was completely written by a human. Journal editors and reviewers must have a heightened sense of awareness for the potential influx of plagiarized work, as these works may easily evade detection both by the human eye and online detector tools. Furthermore, journals should implement portions in their submission process that require proof of data collection; the method of proof may vary depending on the nature of the conducted study and may include deidentified patient data and the code utilized for statistical analysis, amongst others. Finally, funds should be appropriated toward producing a high-level AI detector, and these technologies should be implemented into the background checking process journals utilize, similar to automatic plagiarism detectors. Reports of AI detection of research within the literature are scarce, which is troublesome, as these works may have bypassed journal integrity checkpoints and made their way into publication.
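Table 2 reports a perplexity score from GPTZero, where a higher value suggests human authorship. As a rough illustration of what a perplexity score measures, the sketch below computes it under a toy unigram model with add-one smoothing. This is an assumption made purely for illustration: GPTZero's actual language model and scoring are not described in this paper, and the names `unigram_perplexity`, `reference`, `predictable`, and `surprising` are hypothetical.

```python
import math
from collections import Counter


def unigram_perplexity(reference: str, candidate: str) -> float:
    """Perplexity of `candidate` under a unigram model fit on `reference`.

    Toy stand-in for a real language model: word probabilities use
    add-one (Laplace) smoothing so unseen words get non-zero probability.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        raise ValueError("both texts must contain at least one word")

    counts = Counter(ref_tokens)
    # The vocabulary covers both texts so that the smoothed probabilities
    # of all words in the vocabulary sum to 1.
    vocab_size = len(set(ref_tokens) | set(cand_tokens))
    total = len(ref_tokens)

    log_prob_sum = 0.0
    for token in cand_tokens:
        prob = (counts[token] + 1) / (total + vocab_size)
        log_prob_sum += math.log(prob)

    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob_sum / len(cand_tokens))


reference = "the cost of treatment was higher for the drug"
predictable = "the cost of the drug"
surprising = "remission eludes stubborn joints"

print(unigram_perplexity(reference, predictable))
print(unigram_perplexity(reference, surprising))
```

In this toy setup, text that reuses the reference wording scores a lower perplexity than text made of unseen words, which loosely mirrors why rewording a passage can push a detector's reading toward "human generated."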
Why fabricate medical research?
Medical research is frequently fabricated for a variety of reasons, including the pursuit of fame, the high-pressure nature of medical research, and the hunt for funding from an industry to support a product. These factors compel researchers to publish as many papers as possible, indicating that there are external goals some researchers strive for and an increasing number of hoops one must jump through to succeed in other facets of one’s career.13
In the context of medical education and training, research has become increasingly important for residency applications.14 The USMLE Step 1 becoming pass/fail in 2022 shifted the importance of research to a higher level, as students had fewer metrics to separate themselves from others. Specialties like plastic surgery, neurosurgery, and orthopedic surgery require a high number of publications from applicants, and the projected increase in demand for research may increase this average over time. Since Step 1 went pass/fail in 2022, we will not have concrete data on changes in metric importance until the class of 2024 graduates.15 This may motivate fabrication of publications to bypass this roadblock, especially in institutions that are not research oriented.
Conclusion
In this paper, we posit that AI-generated research fabrication and falsification of work pose serious challenges to the scientific and medical community. The feasibility of producing fabricated work, coupled with the difficult-to-detect nature of published works and the lack of AI-detection technologies, creates an opportunistic atmosphere for fraudulent research. Risks of AI-generated research include the utilization of said work to alter and implement new healthcare policies, standards of care, and interventional therapeutics. Reasons for fabricating research using an AI-based technology include financial gain, potential fame, promotion in academia, and curriculum vitae building, especially for medical students who are in increasingly competitive waters. Although AI-based technologies may be used to streamline mundane processes in the research field, they may also be utilized to pollute the field of scientific research and undermine the legitimate works produced by other authors.
ACKNOWLEDGMENTS
We would like to thank Dr. Frank C. Barone at SUNY Downstate for his assistance in keeping us up to date with this newfound conflict in the research community.
AUTHOR CONTRIBUTIONS
F.R.E. founded the study idea, ran data collection/experiments, wrote the manuscript, revised the manuscript, and submitted the manuscript. L.N.R. assisted in data collection and revision of the manuscript.
DECLARATION OF INTERESTS
The authors declare no competing interests.
REFERENCES
1. National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Committee on Responsible Science (2017). Incidence and consequences. In Fostering Integrity in Research (National Academies Press). https://www.ncbi.nlm.nih.gov/books/NBK475945/.
2. Zimba, O., and Gasparyan, A.Y. (2021). Plagiarism detection and prevention: a primer for researchers. Reumatologia 59, 132–137. https://doi.org/10.5114/reum.2021.105974.
3. Masic, I., Begic, E., and Dobraca, A. (2017). Plagiarism detection by online solutions. Stud. Health Technol. Inform. 238, 227–230.
4. Brynjolfsson, E., Rock, D., and Syverson, C. (2017). Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics (National Bureau of Economic Research), w24001. https://doi.org/10.3386/w24001.
5. Damioli, G., Van Roy, V., and Vertesy, D. (2021). The impact of artificial intelligence on labor productivity. Eurasian Bus. Rev. 11, 1–25. https://doi.org/10.1007/s40821-020-00172-8.
6. Yang, C.H. (2022). How artificial intelligence technology affects productivity and employment: firm-level evidence from Taiwan. Res. Pol. 51, 104536. https://doi.org/10.1016/j.respol.2022.104536.
7. Gao, C.A., Howard, F.M., Markov, N.S., Dyer, E.C., Ramesh, S., Luo, Y., and Pearson, A.T. (2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. Preprint at bioRxiv. https://doi.org/10.1101/2022.12.23.521610.
8. Gu, J., Wang, X., Li, C., Zhao, J., Fu, W., Liang, G., and Qiu, J. (2022). AI-enabled image fraud in scientific publications. Patterns 3, 100511. https://doi.org/10.1016/j.patter.2022.100511.
9. OpenAI (2022). ChatGPT: optimizing language models for dialogue. https://openai.com/blog/chatgpt/.
10. Mollman, S. (2022). ChatGPT gained 1 million users in under a week. Here’s why the AI chatbot is primed to disrupt search as we know it. Yahoo! Finance. https://finance.yahoo.com/news/chatgpt-gained-1-million-followers-224523258.html.
11. Office of Research Integrity. Definition of research misconduct. https://ori.hhs.gov/definition-research-misconduct.
12. GPT-2 output detector. https://openai-openai-detector.hf.space/.
13. Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 4, e5738. https://doi.org/10.1371/journal.pone.0005738.
14. Girard, A.O., Qiu, C., Lake, I.V., Chen, J., Lopez, C.D., and Yang, R. (2022). US medical student perspectives on the impact of a pass/fail USMLE Step 1. J. Surg. Educ. 79, 397–408. https://doi.org/10.1016/j.jsurg.2021.09.010.
15. National Resident Matching Program. Residency data & reports. https://www.nrmp.org/match-data-analytics/residency-data-reports/.
About the authors
Faisal R. Elali is currently pursuing his Doctor of Medicine degree at SUNY Downstate Health Sciences University. He received a dual-bachelor’s degree in biological sciences and theological religious studies from Fordham University in 2021. His past research experiences include working in medical genetics, microbiology, inorganic chemistry, asylum medicine, and orthopaedics. His research interests include the applicability of modern technology, especially artificial intelligence and robotics, in medicine and how it can be applied to better promote patient care and outcomes.
Leena N. Rachid is currently pursuing her Doctor of Medicine degree at Loyola Medicine. She received a combined degree in biological sciences and economics from Fordham University in 2020. Her past research experiences include working in inorganic chemistry, pulmonary medicine, and intensive care medicine. Her research interests include machine learning and its application in medicine to better promote patient care and outcomes.