State of the Science: Artificial Intelligence in Medical Education
By G.K. Schatzman
Since ChatGPT made its public debut in 2022, the
exploding generative artificial intelligence (GenAI)
industry has been driving headlines, markets, hopes
and fears across sectors. On the floor of Congress, early
testimonies about the potential dangers of a technology
whose growth outpaces its guardrails have been largely
supplanted by calls — even by the selfsame industry
leaders — to secure America’s dominance in the global
tech race. Organizations are grappling with how to safely
leverage new capabilities, and as usual, higher education
is thoroughly in the mix.
Current and future physicians at Drexel University College
of Medicine are managing the challenge through policy,
research initiatives, inspiring student projects and even a
proposal for an AI literacy curriculum. From matriculation
to graduation and beyond, large language models (LLMs)
are quietly shaping medical education in every corner.
Applications & Admissions: The Chatbot Tradeoff
Vanessa Pirrone, PhD, assistant dean of admissions; associate professor, Department of Microbiology & Immunology
Beginning in fall of 2025,
incoming medical students have
encountered AI in their journey to
Drexel — and not just in getting
their applications together. In fact,
Assistant Dean of Admissions
Vanessa Pirrone, PhD, says
students who use AI in creating
their applications may unwittingly
put themselves at a disadvantage.
The Office of Admissions doesn’t
use an AI detector; application
materials come through the
centralized AMCAS platform,
and the unreliability of current detection technology is well
documented. Still, Pirrone says, students with computer-perfect
applications risk “losing their authentic voice.”
For Pirrone, when polished text can be produced to spec
in a click, imperfections become personal, even precious,
like the tell-tale craft marks of a handmade good.
“They’re not showing us who they are,” Pirrone says of
applicants who overuse AI. “Oftentimes, we see this really
polished but flat application, and for us, it just kind of
blends in. We get 16,000 applications every year. You
have to find ways to make yourself stand out, and the way
that you stand out is by being yourself.”
Human touch is important for Pirrone, who makes a point
of getting hands-on with applications and interviews to
understand the unique talents and experiences of each
incoming class.
“Every year when the students come in, I am so incredibly
proud of all of them, because I see where they started,”
she says. “Then I get to watch them through all the years
and see the amazing things that they do. They’re serving
the community. They’re living our mission every day and
pushing the envelope.”
This fall, MD program admissions began using a new
AI tool, AMP AI, integrated into its admissions management
platform through ZAP Solutions. The tool offers mission-integrated
insights like its competency analyzer, which ZAP
claims “condenses and summarizes free text fields” into a
customizable domain score. The tool provides admissions
staff with extra metrics on how a candidate aligns with
the school’s values. For now, only staff from the Office of
Admissions will be training on the technology so that they
can coordinate the shift.
Pirrone says the goal isn’t to spend less time on each
application, but rather to “make us more consistent and
ensure that we’re not losing the reason to say yes. When
you go through applications, it’s not about finding the
reason to say no. It’s about finding that reason to say,
‘Yes, this is a person that would enrich our university.’”
Human review and judgment still take priority, Pirrone
says: “In the end, this is a tool, it’s not a person.” The office
also remains committed to holistic review, understanding
candidates in the context of their stories.
Learning Patient Care With Llama

Emily Spengler, MD, assistant director, Foundations of Patient Care I; assistant professor, Department of Pediatrics

Emily Feng, MD ’27

Michael Jayasuriya, MD ’27
Once they’ve arrived at Drexel,
medical students spend their first
two years focusing primarily on
their didactic coursework, building
the knowledge base necessary for
their clinical work. Emily Spengler,
MD, practices general outpatient
pediatrics at St. Christopher’s
Hospital for Children in North
Philadelphia and is the assistant
director of one of these early
courses, Foundations of Patient
Care I, where students learn the
fundamentals of taking a patient’s history and physical in a relationship-centered,
patient-focused manner.
Last year, two of her Foundations students,
Emily Feng, MD ’27, and Michael Jayasuriya,
MD ’27, approached her with a problem:
Practice sessions with standardized patients
(hired actors) felt too infrequent.
“The clinical skills portion of our curriculum
is less emphasized in the first two years,”
Jayasuriya explains. “We’ll get to talk to
one standardized patient every month.”
With Jayasuriya’s background in software
engineering and Feng’s in computational
biology, the two were able to devise a
solution. “Emily and I had an idea: Could
we make an AI chatbot for us that would
serve as a standardized patient? Then, you
could just log in whenever you want, chat
with the chatbot with your voice and it would
speak back to you.”
Spengler’s interest in clinical skills,
feedback systems and health literacy made
her a match as a mentor for the project. While
objective structured clinical examinations, or
OSCEs, are a longstanding way of providing feedback to medical
students on their patient interactions, Spengler agreed with her
students that there is room for more support. “I think one of the
problems with health literacy education is that, from medical school
on through residency and as doctors, we’re not given much feedback
on how clear we are when we talk to patients,” she says. “Patients
don’t really tell you when they don’t understand. And once you’re no
longer a med student, nobody’s really observing you and telling you,
‘Hey, I don’t think the patient understood that word.’”
“What I thought would be cool is if we
incorporated some of the objective,
real-time skills that AI is capable
of into giving real-time
feedback on these skills
to students,” Spengler
says. And the tool needn’t be limited to students; physicians could opt
in, too. “My hope is that by getting this feedback, clinicians are better
able to improve their ability to communicate with patients.”
With Spengler’s support and guidance on metric and feedback
domains, Feng and Jayasuriya set about designing the Patient
Interaction Analysis Tool, or PIAT Learn. For the user, whether student
or clinician, it’s simple.
“You talk to the chatbot, it chats back to you,” Feng explains. In this
case, “talk” is literal: It both receives and outputs audio for real-time
practice, as well as a transcript for later review. “We’ve built in a
feature to calculate certain metrics and give feedback immediately
after the encounter.”
PIAT Learn provides a recap of the spoken grade level, how often
you checked for understanding, and the number of questions asked,
pauses offered and conversational “turns” taken — quantifiable ways
of thinking about interaction dynamics. Perhaps more ambitiously, it
also aims to provide feedback on when you used jargon and even
when you showed empathy.
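For readers curious what metrics like these involve, here is a minimal sketch in Python. It is not PIAT Learn's code; it assumes a transcript stored as (speaker, utterance) pairs, uses the Flesch-Kincaid grade-level formula as one common stand-in for "spoken grade level," and treats a short, hypothetical list of phrases as checks for understanding. Pauses are left out because they would require audio timestamps.

```python
import re

# Hypothetical phrases treated as "checks for understanding" (illustrative only).
CHECK_PHRASES = re.compile(
    r"\b(does that make sense|any questions|tell me in your own words)\b",
    re.IGNORECASE,
)


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic; dedicated readability tools are more careful."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("take"), except in endings like "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def interaction_metrics(transcript: list[tuple[str, str]]) -> dict:
    """Summarize a (speaker, utterance) transcript from the clinician's side."""
    # A new conversational turn begins whenever the speaker changes.
    turns = sum(
        1
        for i, (speaker, _) in enumerate(transcript)
        if i == 0 or speaker != transcript[i - 1][0]
    )
    clinician = [text for speaker, text in transcript if speaker == "clinician"]
    return {
        "turns": turns,
        "questions": sum(text.count("?") for text in clinician),
        "checks_for_understanding": sum(
            1 for text in clinician if CHECK_PHRASES.search(text)
        ),
        "grade_level": round(flesch_kincaid_grade(" ".join(clinician)), 1),
    }


sample = [
    ("clinician", "Hi, I'm Jordan. What brings you in today?"),
    ("patient", "My chest hurts."),
    ("clinician", "When did it start? Does that make sense so far?"),
    ("patient", "Yesterday."),
]
print(interaction_metrics(sample))
```

Counting a turn each time the speaker changes keeps the metric simple; a real tool would also have to handle interruptions and overlapping speech.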
Speaking level was easy, Feng says; those kinds of metrics have
already been validated in health literacy literature. The back-end
work for coding empathy remains an ongoing challenge. But the
progress they made on the jargon identification may provide a
promising path.
“Jargon to me might be different from jargon to another person,”
Jayasuriya says, and the original algorithms that leveraged word-use
frequency packages didn’t always align with reality. “It would pick
out words that are infrequently used in the English language but that I think are understandable, and it would miss words that are supposedly
frequently used, but that might actually be unfamiliar to non-experts.”
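The failure mode Jayasuriya describes is easy to reproduce. The Python sketch below flags any word whose corpus frequency falls below a cutoff; the frequency scores and the 3.5 threshold are invented for illustration and are not taken from PIAT Learn or from any particular frequency package.

```python
import re

# Invented Zipf-style scores (roughly log10 of occurrences per billion words).
# A real tool would load these from a corpus-based frequency package.
WORD_FREQ = {
    "we": 6.6, "will": 6.2, "check": 5.0, "for": 7.0, "edema": 2.6,
    "your": 6.5, "results": 4.9, "are": 6.9, "negative": 4.7,
    "tummy": 3.0, "is": 7.0, "sore": 4.3,
}


def flag_jargon(utterance: str, threshold: float = 3.5) -> list[str]:
    """Flag words whose frequency score falls below the (hypothetical) threshold."""
    words = re.findall(r"[a-z]+", utterance.lower())
    # Words missing from the table get 0.0, so they are always flagged as rare.
    return [w for w in words if WORD_FREQ.get(w, 0.0) < threshold]


for sentence in [
    "We will check for edema.",
    "Your results are negative.",
    "Your tummy is sore.",
]:
    print(sentence, "->", flag_jargon(sentence))
```

On these sample sentences, the approach correctly flags "edema," misses "negative" (a common word that patients can misread as bad news when it describes test results) and over-flags the friendly but rare "tummy."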
Now, however, they’re approaching the problem through prompt
engineering: giving the bot, a version of Meta’s open-source Llama
model, a role or persona and having it calibrate its jargon judgments
accordingly. Then it flags instances of jargon and suggests substitutions
for next time.
Feng and Jayasuriya are continuing to improve PIAT Learn, but
they already had a chance to test it with some of Spengler’s residents
at St. Christopher’s. At this stage, the experiment was for quality
improvement rather than a proper research study, but it has already
yielded helpful feedback on the user interface and instructions.
Perhaps more importantly, it raised big-picture questions about
metrics, surveillance and GenAI technology in the workplace.
A quarter of the residents involved in the user test expressed
concerns about the new metrics, Jayasuriya says. Having AI listen
in on their patient conversations could make them feel scrutinized.
“The biggest concern that they noted was that they thought this was
a tool to test them, when our goal was mainly just to provide an
educational tool for them,” he says.
Like all of us grappling with increased AI integration, Spengler,
Feng and Jayasuriya are working to discover the guardrails that
support advances in medical practice while maintaining or even
furthering its essential humanity.
“I don’t want an AI system to be grading me and docking my
pay,” Jayasuriya says. “But I am still motivated by the idea that I
want more ways to improve my clinical skills right now.” The two
designers have committed to releasing the software under a “copyleft”
license, an open-source approach that requires any versions of their
work built by other programmers to remain open-source as well.
Spengler imagines using a tool like PIAT Learn to create opportunities
for side-by-side reflection between trainees and mentors, like
watching film with a coach after a game. But the temptation to “perform to a metric rather than keep in mind the humanity
of that doctor-patient interaction” concerns her, too.
“I want my students to be thinking about, ‘Does my patient
understand me? Am I communicating clearly?’ But the very
first thing I always want them to be thinking about is
preserving that doctor-patient relationship and keeping
that connection with the patient.”
Creating metrics for the previously unquantifiable also
presents an opportunity for soft skills — in Spengler’s
opinion, the biggest blind spot of many beginning
physicians — to finally get the attention they need.
“I think if this tool is shown to be valid — and we’re not
there at all — this can just be one of the many other ways
that we’re assessing medical students, just the way that
multiple choice tests are,” Spengler says. “When we have
valid evaluations of these softer skills, I think students will
take them a little more seriously. And how do we know
we’re effectively teaching something if we’re not assessing it?
The more validity we can create in these assessments, the better.”
Sifting Surveys With ChatGPT
Carolyn Giordano, PhD, associate dean of assessment and evaluation; professor, Department of Family, Community & Preventive Medicine
Multiple-choice questions are popular on
tests and surveys for a reason: They take
guesswork out of grading and provide
instant, clear-cut datasets. Free-response
questions allow for a wider variety of
expression, but ensuring reliable analysis
across hundreds or thousands of responses
is a process unto itself. While there are
established research methodologies for
categorizing and tagging content, their
costliness in both resources and time limits
their application. Researchers like Carolyn
Giordano, PhD, associate dean of assessment
and evaluation, hope that might be changing.
In addition to overseeing the exams that
students take throughout their time in medical school, and all the
course evaluations, surveys and peer evaluations, Giordano works
with students interested in researching medical education, which
focuses on everything from how medical schools are educating
students to how medical systems are educating the public. Over the
years, she says, her own interests have made her a sort of “go-to”
person for all kinds of social science research.
“I’m kind of a fiddler,” Giordano says. “I naturally wonder about
how you can analyze things more efficiently, and thought maybe we
can look at AI.”
Over the course of medical school, Drexel students provide a
huge amount of feedback to the school itself, in both fixed- and
free-response formats.
“We have over 300 students. We have 60 evaluations a year, or
more. We get a ton of survey information from student feedback and
course feedback, and a human reads that. We read every single
word that students tell us,” Giordano says. “Well, we started using
Microsoft Copilot to read responses, to ask questions about where
different themes showed up.” Using AI, the survey reviewers can
draw insights about specific faculty or classes, or what students liked
about a certain textbook or set of learning materials.
The tool itself isn’t trained in the methodology and nomenclature
Giordano and her team use, she says, and is no replacement for
human insight on surveys.
“We always read the results. But
between the end of the semester and the
first day of classes, there’s not a lot of time to make
changes. AI really helps with speed.”
Now, a survey specialist reads the responses, leveraging AI to
decrease turnaround time. In turn, Giordano receives the reports
sooner, leaving more time for a secondary read before disseminating
the results, which then leaves more time to implement changes.
Simran Shamith, MD ’26
But what about surveys outside the
purview of student experience, in medical
research? Simran Shamith, MD ’26, has
worked with Giordano to leverage AI to
create a survey validation tool that works in
tandem with focus groups. Two years ago,
they used the then-current free version of
ChatGPT to validate a survey, examining it
to see what made sense, what didn’t, and
how people were processing the questions.
And while they found the tool couldn’t
replace focus groups, only complement
them, its speed was on a different order of magnitude.
“We could do it in about six seconds versus one hour of hosting
the focus group, hearing different feedback and synthesizing the
findings,” Giordano explains.
Shamith thinks the tool can accelerate the survey creation process,
allowing for rapid iteration before incurring the time and expense of
a focus group for final review. Once again, the large language
model excelled at fine-tuning survey language. “It wasn’t just giving me another word for bias,” Shamith explained in one example. “It
was giving me ways to describe to students what I’m trying to get out
of them when I’m saying ‘bias.’”
Policies, Dilemmas and Taboos
Discerning what generative AI can do and what it can’t, how it should
be used and how it shouldn’t, is essential as Drexel contemplates
adoption strategies. In fact, it’s a core element of the AI Fluency
Framework [1] released last year by Anthropic, an industry-leading
public benefit corporation noted for its commitment to more responsible
AI development. With a technology this disruptive and rapidly
developing, policy struggles to keep up.
As of this writing, Drexel’s Academic Integrity Policy page [2] mentions
use of generative AI in the final bullet points of its sections on
cheating and plagiarism, instructing students to follow instructors’
guidance on acceptable use. This follows a November 2023 policy [3]
from the Office of the Provost, which was up for review in fall 2025,
that grants instructors “broad discretion to define the suitable use
of Artificial Intelligence Tools in the classroom,” along with “the
responsibility … to include in the course syllabus a clearly written
description of the permitted use of AI tools.” In turn, the policy
outlines students’ responsibilities to adhere to instructors’ policies, cite
AI usage appropriately, and bear ultimate responsibility for the work
they submit. AI detection tools are discouraged due to their documented
unreliability. A separate Information Security page [4] directs
faculty and staff to use only approved GenAI tools to protect
the privacy of sensitive data; the list of approved tools can be found on the
Provost’s AI at Drexel home page [5], alongside a new Digital Commons
space designed for faculty and staff to share insights and practices.
For many instructors, though perhaps fewer today than two years
ago, the “ch” in ChatGPT stands for “cheating.” Plagiarism is a
leading concern. As Giordano argues, though, it isn’t one the College
is unequipped to handle.
“The toothpaste is out of the tube, and we’re not putting that back.
The tool exists. But rules against plagiarism exist too, so we can talk
about that in an open way. You were never able to plagiarize
material and pretend it was your own.”
Misinformation is another concern raised by students and faculty
alike. Using generative AI is quicker — much quicker — than consulting
traditional reference materials, even if we take “traditional” to mean
PubMed instead of a library book.
“Students are humans, and humans do love shortcuts, don’t we?”
Giordano says. “One of my worries is that students use AI in place
of other reliable materials. I think that they have to use this as one
paintbrush in the whole artistic arsenal.”
Unlike databases that require a degree of baseline knowledge to
search and synthesize results, large language models are approachable
from ignorance. A question asked in plain language will get a
response that makes sense to the user whether or not it is, in fact,
accurate. What educators call “productive struggle” — incrementally
building durable knowledge foundations by grappling with challenges
just beyond your current understanding — can be diminished simply
by virtue of the technology’s ease.
For Giordano, a training opportunity can be found in the imperfections
of large language models. “When you put a prompt into ChatGPT, it
often gives you too much information, not enough information or
inaccurate information,” she says. “I think that students are learning
that this is also what humans do. When they go to take a patient’s
history, the patient is going to talk too much, talk too little, forget
information or just make things up. In that way, I think it’s actually
training them pretty well to be a future physician.”
Shamith also identified misinformation, for both patients and
physicians, as a concern. Still, as a student, she benefits from AI as
a study tool. “Other students and I have started asking ChatGPT, ‘I’m
going into this case. What are some of the things that I might see?
What are the steps that we’re going to do? What are things that I
could get quizzed on?’ And honestly, that has been life changing.”
As AI becomes increasingly integrated into our favorite software,
we’re all likely using it more often than we realize, from text-editing
to suggested email replies. If GenAI is indeed becoming ubiquitous,
what are the risks of not understanding it — and who is leading the
charge in AI literacy?
Currently at the College, students are pushing the envelope with
ideas for new AI initiatives and integrations. Faculty instructors and
mentors have guided those initiatives, helping locate them within the
larger system of medical education.
“Dr. Giordano and Dr. Spengler are amazing in the sense that
they’re able and willing to adapt. The fact that they’re willing to learn
from their students is amazing. I don’t think teachers have to know
everything,” Shamith says. “It’s an evolving landscape for everyone.”
Even so, it’s paramount that students be prepared for the current
state of science, and self-discovery may not always be sufficient.
“I’m not an educator yet, but I think it’s our responsibility, or
teachers’ responsibilities, to teach students how to correctly use AI,
just like our teachers taught us how to correctly use sources and how
to research and use PubMed,” Shamith says. “I think it’s just the next
version of all this. Before us, our parents were taught how to use
books. Then we were taught how to use the internet. Now, we’re
going to be taught how to use AI.”
A Charge to Be Led
Spencer Moavenzadeh, MD/PhD student
Artificial intelligence curriculum is a current
passion of Spencer Moavenzadeh, a
second-year MD/PhD student with a
background in computer science and
biomedical engineering. Despite the buzz
around generative AI models, and especially
large language models like ChatGPT, he
emphasizes that these aren’t the only models
being used in research.
For example, when Moavenzadeh was
working as an ultrasound engineer, his
team combined three ultrasound modalities
into a single optimized image using a
neural network, a machine learning model
loosely inspired by the way our brains work,
and one that predates our current large language models.
As he explains it, “They’re all mechanisms by which the algorithm
effectively learns to or works its way iteratively to a solution that is
optimized.” In AI research at large, artificial intelligence and machine
learning (ML) go hand in hand, and AI/ML applications in research, including medical
research, abound, from random forest algorithms, which combine many decision trees, to merging
images for comprehensive analysis.
“As the physician reading this image, I think you would want to
know that it is effectively an artificial creation. Yes, it is grounded in
three images that are somewhat real, but they are merged together
in a deep neural net,” which affects what parts of the data are
directly interpretable, Moavenzadeh says. “If there’s an artifact in there, it would help, in my opinion, to know what the basis
of the model being used to design it was, to see whether
or not you can trust that. I think that is going to be a
challenge that a lot of physicians will face in the future,
both while interpreting papers and the new state-of-the-art thing that comes out, or while doing their own
research.”
That’s why Moavenzadeh is actively working to
develop an AI literacy curriculum for the College
of Medicine that covers not only the more recent
rise of generative AI but also the widespread
use of machine learning in all kinds of medical
research, a project he also considers a way
of fostering his own learning. His proposal is
two-pronged: First, create a condensed
Foundations of Machine Learning elective
course focused on the principles of machine
learning and the models currently in use,
and second, provide concrete examples
of current implementations in medicine.
The latter, he imagines, might be a good
opportunity for a speaker series.
The medical school curriculum is already
busy, but it makes sense to look for time in
intersession or elsewhere — because in
Moavenzadeh’s view, AI, from large
language models like our favorite
chatbots to neural networks for deep
processing, really is everywhere. “I would
say every researcher is probably using it.
Almost in every field, and probably every
single lab to at least some degree.”
There’s no shortage of big decisions on
the horizon, and when it comes to AI itself,
none of us control the pace of innovation.
Drexel has pioneered programs in
physician-patient communication,
medical humanities and professional
formation, and that same spirit of
innovation, along with the people who take
the initiative, can serve the school well in the
AI realm. Fortunately, there are excellent
people on the task. As Assistant Dean
Pirrone says with a smile full of pride, “Don’t
we just have the best group?”
Resources:
1. anthropic.skilljar.com/ai-fluency-framework-foundations
2. drexel.edu/studentlife/community-standards/code-of-conduct/academic-integrity-policy
3. bit.ly/47Sivyi
4. drexel.edu/it/security/policies-regulations/ai-guidance
5. drexel.edu/provost/ai