Improving Provider Experience
Redesigning provider search and profiles, with a weighted-scoring research framework deciding which changes shipped and which got killed.

Project Overview
Outcomes
- Turned provider-profile sentiment from net −2 to +15 across 6 interviews with a header redesign
- Shipped two provider-search changes; killed a third before a sprint was spent on it
- Built reusable research infrastructure that now powers other studies
The Problem
Patients couldn't reliably find or evaluate the right provider. The team had fixes in mind but no way to tell which would actually help, so the calls defaulted to opinion.
The Solution
To decide what was worth building, I built a weighted-scoring research framework, Sentiment × Severity, then used it to test the provider search and profile redesign as three hypotheses. Two shipped. The third I killed before a sprint went into it.
My Role
UX Designer
Team
- 1 UX Lead
- 2 Developers
- 1 Product Manager
I Personally Owned
- Scoring methodology
- Session coding process
- Synthesis framework
- Insight prioritization model
- Study analysis
My Process
The Starting Point
Goal: Make research produce answers leadership would act on
When I joined the web team at the University of Rochester Medical Center, design decisions ran on opinion. Research happened; it just never turned into proof strong enough to move a roadmap. Everyone agreed it mattered. Turning a hypothesis into enough evidence to change direction was the part nobody had cracked. Three problems stood in the way.
Decisions by Opinion
Stakeholder decisions were driven primarily by:
- Executive preference
- Anecdotal feedback
- Isolated usability observations
- Analytics without behavioral context
Limited Pushback Ability
With little hard evidence behind any of it, every leadership request carried the same weight, and which features got built came down to taste.
No Organizational Memory
Each project started from zero, so teams rediscovered the same usability problems over and over.
Applying the Framework to Provider Search
Goal: Validate or reject product assumptions with structured research
- 268 user observations logged
- 11 participants
- 3 distinct audiences
Provider search was the framework's first real test, and the reason it exists. The team had no shortage of proposed fixes; it just couldn't prove any of them, and I needed each assumption validated or rejected against real evidence. Old research handed over themes and quotes, easy to argue with and impossible to rank. The framework turns every observation into a number instead.

WEIGHTED IMPACT SCORE = Sentiment × Severity

Sentiment is direction, scored −3 to +3: did this help the user or hurt them? I read it from the whole interview, not just the words: the tone, the expression, the body language, the small hesitations. Severity, scored 1 to 3, is how much it mattered. Multiply the two, and every observation carries a single weighted score, somewhere between −9 and +9, that rolls up to the hypothesis it tested.
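As a minimal sketch, scoring a single observation might look like this in Python. The function name and the range checks are illustrative assumptions rather than the original tooling; the ranges themselves come from the definitions above.

```python
def weighted_impact(sentiment: int, severity: int) -> int:
    """Weighted Impact Score = Sentiment x Severity for one observation.

    Hypothetical helper: sentiment is direction (-3 hurt .. +3 helped),
    severity is how much the moment mattered (1 .. 3).
    """
    if not -3 <= sentiment <= 3:
        raise ValueError(f"sentiment must be in -3..+3, got {sentiment}")
    if not 1 <= severity <= 3:
        raise ValueError(f"severity must be in 1..3, got {severity}")
    return sentiment * severity


# A strongly negative, high-stakes moment outweighs a mild positive one.
print(weighted_impact(-3, 3))  # -9
print(weighted_impact(1, 1))   # 1
```

Multiplying rather than adding means severity amplifies direction: a hesitation that barely mattered stays small, while a serious failure lands hard in the rollup.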
- 01 Observation Logging: Every user behavior, reaction, and comment is logged as a discrete observation.
- 02 Taxonomy Coding: Observations are tagged against a 27-code taxonomy to cut interpretation drift between studies.
- 03 Severity Scoring: Each observation gets a sentiment score (−3 to +3) and a severity score (1 to 3), multiplied into a weighted impact score.
- 04 Hypothesis Analysis: Scores roll up to the hypothesis level to compare baseline usability, proposed changes, and observed behavior.
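The four steps map onto a small data structure. This is a minimal Python sketch, not the original tooling (the section only says the data lives in a relational base): the taxonomy codes, hypothesis ids, and observations are hypothetical placeholders, and the rollup is assumed to be a plain sum of weighted scores per hypothesis.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical codes standing in for the study's 27-code taxonomy.
TAXONOMY = {"SPECIALTY_VISIBILITY", "FILTER_LABEL", "TRUST_SIGNAL"}


@dataclass(frozen=True)
class Observation:
    """One logged behavior, reaction, or comment (step 01)."""
    participant: str
    hypothesis: str  # illustrative ids such as "H1"
    code: str        # taxonomy code (step 02)
    sentiment: int   # -3 (hurt) .. +3 (helped)
    severity: int    # 1 .. 3 (how much it mattered)

    def __post_init__(self) -> None:
        # Reject codes and scores outside the agreed scheme.
        if self.code not in TAXONOMY:
            raise ValueError(f"unknown taxonomy code: {self.code}")
        if not -3 <= self.sentiment <= 3 or not 1 <= self.severity <= 3:
            raise ValueError("sentiment must be -3..+3 and severity 1..3")

    @property
    def score(self) -> int:
        """Weighted impact score (step 03)."""
        return self.sentiment * self.severity


def roll_up(observations: list[Observation]) -> dict[str, int]:
    """Sum weighted impact scores per hypothesis (step 04)."""
    totals: dict[str, int] = defaultdict(int)
    for obs in observations:
        totals[obs.hypothesis] += obs.score
    return dict(totals)


# Made-up observations, for illustration only (not study data).
log = [
    Observation("P1", "H1", "SPECIALTY_VISIBILITY", 2, 3),
    Observation("P2", "H1", "SPECIALTY_VISIBILITY", -1, 2),
    Observation("P1", "H3", "TRUST_SIGNAL", -2, 2),
]
print(roll_up(log))  # {'H1': 4, 'H3': -4}
```

Validating the code at creation time is one way to enforce the shared taxonomy: an observation tagged with an ad hoc label is rejected instead of quietly drifting into its own category.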
The score, in practice
“To me, non-surgical stands out right here. It says herniated discs, spinal stenosis, degenerative disc disease.”
“The provider's specialties did not immediately stand out for assessing the profile at a high level.”
“I'm sort of missing the baseline definition of what orthopedics and physical performance is.”
Three Hypotheses, Three Verdicts
Goal: Ship what the evidence supports, kill what it doesn't
With the framework in place, I tested three design hypotheses against a documented baseline, with six moderated interviews each. Each one returned a numeric Efficacy Delta that went straight to the roadmap. Two said build. One said kill, before a sprint was spent.
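The section doesn't define how an Efficacy Delta is computed. One reading that matches the Specialties Label numbers (net −2 at baseline, +15 with the redesign, reported as +17Δ) is the variant's net score minus the baseline's. A Python sketch under that assumption; both function names are hypothetical:

```python
def efficacy_delta(baseline_net: int, variant_net: int) -> int:
    """Assumed definition: variant's net weighted score minus the baseline's."""
    return variant_net - baseline_net


def format_delta(delta: int) -> str:
    """Render a delta with an explicit sign, as on the scorecards."""
    return f"{delta:+d}Δ"


# Specialties Label: net -2 at baseline, +15 with the new header.
print(format_delta(efficacy_delta(-2, 15)))  # +17Δ
```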
Specialties Label
Shipped · +17Δ sentiment swing across 6 interviews (−2 → +15)
Hypothesis: a clear "Specialties" label on the provider profile header helps patients see what a provider actually treats. It worked. Across 6 interviews, net sentiment climbed from −2, where people skimmed past the key information unsure what the provider did, to +15, where they engaged with the provider's experience right away. A 17-point gain, and it shipped.
Filter Shift
Shipped · +10.3% filter engagement (analytics), triangulated with 6 interviews
Hypothesis: relabeling the filter in patients' own words, and opening it by default, drives more use. The dashboard liked it: filter engagement rose 10.3% and downstream profile clicks rose 5.9%. In the interviews, though, sentiment barely moved, nudging only to +2. Triangulating both signals in Looker Studio gave one honest read, and it caught a false positive the analytics alone would have called a win. We shipped it anyway, eyes open, because the gap between the numbers and the feeling was itself the finding.
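The triangulation happened in Looker Studio, not in code, but the underlying check is easy to state: compare the behavioral lift with interview sentiment and flag when only one of them moves. A hypothetical Python sketch; the function name, labels, and both thresholds are illustrative assumptions, since the section doesn't give the cutoffs it used.

```python
def triangulate(analytics_lift_pct: float, sentiment_score: int,
                lift_threshold_pct: float = 5.0,
                sentiment_threshold: int = 5) -> str:
    """Hypothetical divergence check between analytics and interview sentiment.

    Thresholds are placeholders, not values from the study.
    """
    behavior_up = analytics_lift_pct >= lift_threshold_pct
    sentiment_up = sentiment_score >= sentiment_threshold
    if behavior_up and sentiment_up:
        return "corroborated win"
    if behavior_up:
        return "possible false positive: usage rose, sentiment flat"
    if sentiment_up:
        return "sentiment up, usage flat"
    return "no clear signal"


# Filter Shift: engagement up 10.3%, interview sentiment only at +2.
print(triangulate(10.3, 2))
```

Keeping the two signals as separate inputs, rather than blending them into one score, is what lets the disagreement itself surface as a result.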
Connected Profiles
Killed · −21Δ sentiment across 6 interviews
Hypothesis: surfacing a provider's published research on the profile card builds trust. Our web team lead championed it. Then it scored −21 across 6 interviews. Patients mostly ignored the widget, and those who did notice it found the clinical language confusing, which chipped at the very trust it was supposed to earn. We killed it before engineering spent a sprint on it.
Killing a Feature the Team Wanted
Goal: Use the evidence to overturn a popular idea before it cost a sprint
- 1 sprint of front-end development saved by not building it
- −21Δ: the score that made the call hard to argue with
This is the one worth slowing down on, because killing it took more than a number. Connected Profiles had a champion in our web team lead, and on paper the logic was sound: research credentials signal trust. The score said otherwise. At −21 across six sessions, it left patients worse off than no card at all. So I brought the rollup to a design review and walked the room through the scored sessions, observation by observation. The lead read the same evidence I did and agreed to cut it, and a sprint of front-end work went to something patients needed instead. The framework had done the hard part already. By turning a hunch into a number the room could line up behind, it let me overrule a popular idea without the conversation ever turning personal.
Operational Impact
Goal: Make prior research reusable instead of disposable
The framework's real payoff arrived after the provider study. Every session lives, scored, in a single relational base, so old research stops being a finished report and turns back into data I can query. I went back through interviews from earlier projects, the same recordings seen through a sharper lens, and pulled defensible new findings out of work everyone had already filed away.
Before: Student interviews had flagged navigation confusion, but it was logged as "anecdotal" and set aside.
Re-scored: Run through the framework, the same comments came back as a high-severity pattern, not a one-off complaint.
Before: Clinical patient interviews from a hospital redesign sat as scattered observations with no clear through-line.
Re-scored: Once scored, the pattern was clear: patients expected hospital-specific navigation, not system-wide navigation. That reframed a planned cosmetic refresh into an information-architecture overhaul, and the relaunched site shipped with the structural fixes built in.
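Because every session sits in one relational base, re-scoring old work becomes a query rather than a re-read. The sketch below uses Python's built-in sqlite3 as a stand-in; the table layout, column names, taxonomy codes, rows, and the two-participant threshold are all hypothetical, not the actual research base. It surfaces codes where high-severity negative observations recur across participants, the kind of pattern that turned "anecdotal" comments into a finding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE studies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE observations (
    id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES studies(id),
    participant TEXT NOT NULL,
    code TEXT NOT NULL,  -- taxonomy code
    sentiment INTEGER NOT NULL CHECK (sentiment BETWEEN -3 AND 3),
    severity INTEGER NOT NULL CHECK (severity BETWEEN 1 AND 3)
);
""")

# Illustrative rows only: an older study re-scored with the same taxonomy.
conn.execute("INSERT INTO studies (id, name) VALUES (1, 'Student interviews')")
conn.executemany(
    "INSERT INTO observations (study_id, participant, code, sentiment, severity)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1, "S1", "NAV_CONFUSION", -3, 3),
        (1, "S2", "NAV_CONFUSION", -2, 3),
        (1, "S3", "NAV_CONFUSION", -3, 2),
        (1, "S2", "SEARCH_RESULTS", 1, 1),
    ],
)

# A "pattern" here: negative, severity-3 observations on the same code
# from at least two distinct participants (threshold is an assumption).
rows = conn.execute("""
    SELECT s.name, o.code,
           COUNT(DISTINCT o.participant) AS participants,
           SUM(o.sentiment * o.severity) AS net_score
    FROM observations o JOIN studies s ON s.id = o.study_id
    WHERE o.sentiment < 0 AND o.severity = 3
    GROUP BY s.name, o.code
    HAVING COUNT(DISTINCT o.participant) >= 2
    ORDER BY net_score
""").fetchall()
print(rows)  # [('Student interviews', 'NAV_CONFUSION', 2, -15)]
```

Counting distinct participants, not raw observations, keeps one vocal interviewee from masquerading as a pattern.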
Key Learnings
Whether the product is a platform for construction businesses or a healthcare system at institutional scale, the problem repeats: decisions get made on opinion until someone builds the infrastructure to make them on evidence.
The right direction
Somewhere on every product team sits a feature one person swears by and users don't need. A scored framework catches it before engineering spends a sprint.
Decisions that stick
Give leadership a number instead of a theme or a quote, and research starts steering direction. Buy-in gets easier to win and harder to walk back.
Data that compounds
With a relational research architecture, every study makes the next one smarter. You build on institutional memory instead of starting each project from zero.
My Impact
Build the infrastructure, and evidence starts replacing opinion at any scale. The provider search and profiles this framework guided now draw more than 650,000 profile views a quarter from New York State users alone, with a 65.8% engagement rate.
- +17Δ net sentiment swing on the redesigned provider profile (−2 → +15, H1 shipped)
- 2 shipped, 1 killed: hypotheses resolved on evidence, not opinion
- 268 observations scored across 11 participants and 3 audiences
- 658K provider-profile views a quarter on the experience I redesigned (New York State users)