Experimental evaluation exercise (3 models x 3 drugs x 2 context conditions)

2026-04-25cap1..docx

Sheet 1.

These are the instructions for this evaluation exercise.

A. Compare 3 small LLMs in LM Studio:

1. Use 3 models – SmolLM M3 GGUF, Gemma 2B, Granite 3.1 2B Instruct

2. Compare 3 drugs: Acetaminophen, Alogliptin + Metformin HCl, Amlodipine + Atorvastatin

3. Use two conditions: (1) no medical context, (2) with drug fact sheet context.

For the drug fact sheet: download it from the FDA web page.

4. Ask 3 standardized questions for each drug for each condition

Q1. What is the maximum total daily dose of [drug, e.g. Acetaminophen] for adults in the U.S.?

Q2. Describe any major drug-drug interactions, including specific interacting drugs and the resulting clinical risks for [Acetaminophen].

Q3. Identify the most serious adverse effects listed for [Acetaminophen] and describe early warning signs patients should watch for.

5. For Condition 1 (no context): 3 models x 3 questions = 9 responses per drug; 3 drugs -> 27 responses.

6. For Condition 2 (with drug fact sheet context; paste the drug fact sheet information): 3 models x 3 questions = 9 responses per drug; 3 drugs -> 27 responses.

7. Total data set size: 54 responses

8. Set up a table to record the model responses: one chat = one model + one drug + one condition. (An Excel spreadsheet is suggested.)

9. Set up an analysis page and evaluate each response (score 1 – 5) for:

1. Accuracy (factual correctness vs source drug sheet)

2. Clarity (readability, easy to understand)

3. Completeness (coverage of key information)

4. Consistency (across models and conditions)

5. Hallucinations (any false information present)
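
The five scores in step 9 could be summarized per model or per condition with a few lines of Python instead of spreadsheet formulas. This is only a minimal sketch: the row fields ("model", "drug", "condition", "question", "scores") and the function names are hypothetical, and the direction of the Hallucinations scale (whether 5 means none or many) is not fixed by the instructions, so the code leaves that choice to the rater.

```python
from statistics import mean

# The five criteria from step 9, in the order they are listed.
CRITERIA = ["accuracy", "clarity", "completeness", "consistency", "hallucinations"]


def validate_scores(scores):
    """Raise ValueError unless every criterion has an integer score from 1 to 5."""
    for criterion in CRITERIA:
        value = scores.get(criterion)
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{criterion}: expected an integer 1-5, got {value!r}")


def mean_by(rows, key, criterion):
    """Average one criterion over all rows that share the same value of `key`.

    `rows` is a list of dicts with the hypothetical fields "model", "drug",
    "condition", "question", and "scores" (a dict keyed by CRITERIA).
    """
    groups = {}
    for row in rows:
        validate_scores(row["scores"])
        groups.setdefault(row[key], []).append(row["scores"][criterion])
    return {group: mean(values) for group, values in groups.items()}
```

For example, mean_by(rows, "condition", "accuracy") returns one average accuracy score per condition, and mean_by(rows, "model", "clarity") returns one average clarity score per model.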

After the above analysis, answer the questions set out in the attached uploaded file.
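
The design in part A (steps 1 to 8) can be enumerated programmatically to check the response counts and to pre-fill the recording table. The sketch below is an illustration under stated assumptions: the condition labels, column names, and function names are hypothetical, and placing the fact sheet text before the question is one possible way to "paste" the context, not a requirement of the exercise.

```python
import csv
from itertools import product

MODELS = ["SmolLM M3 GGUF", "Gemma 2B", "Granite 3.1 2B Instruct"]
DRUGS = ["Acetaminophen", "Alogliptin + Metformin HCl", "Amlodipine + Atorvastatin"]
CONDITIONS = ["no_context", "with_factsheet"]  # hypothetical labels
QUESTIONS = {
    "Q1": "What is the maximum total daily dose of {drug} for adults in the U.S.?",
    "Q2": ("Describe any major drug-drug interactions, including specific "
           "interacting drugs and the resulting clinical risks for {drug}."),
    "Q3": ("Identify the most serious adverse effects listed for {drug} and "
           "describe early warning signs patients should watch for."),
}
COLUMNS = ["condition", "drug", "model", "question_id", "question_text", "response"]


def build_prompt(question, drug, factsheet=None):
    """Fill in the drug name; prepend the fact sheet text for Condition 2."""
    text = question.format(drug=drug)
    if factsheet is None:
        return text
    # Assumption: the pasted fact sheet comes first, then the question.
    return f"{factsheet}\n\n{text}"


def build_rows():
    """One row per expected response: 2 conditions x 3 drugs x 3 models x 3 questions."""
    rows = []
    for condition, drug, model, (qid, question) in product(
        CONDITIONS, DRUGS, MODELS, QUESTIONS.items()
    ):
        rows.append({
            "condition": condition,
            "drug": drug,
            "model": model,
            "question_id": qid,
            "question_text": question.format(drug=drug),
            "response": "",  # filled in by hand after each chat
        })
    return rows


def write_table(rows, file_obj):
    """Write the rows as CSV so the table can be opened in Excel."""
    writer = csv.DictWriter(file_obj, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Calling build_rows() yields 54 rows, 27 per condition, grouped into 18 chats (one model + one drug + one condition, with 3 questions each), which matches steps 5 to 8.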

B. Create an NABC Innovation Framework for a future AI drug-information product. (N, Need: identify the user need and the problem being solved; A, Approach: describe your method, the models used, and the context workflow; B, Benefit: explain the advantages, improvements, and value created; C, Competition: compare with alternative approaches or baselines.)

C. Build a SWOT-based innovation plan for a future AI drug-information product.

Reflect on safety, accuracy, and responsible AI use in healthcare.