Provider Search · Interaction Design · Weighted Scoring

Improving Provider Experience

Redesigning provider search and profiles, with a weighted-scoring research framework deciding which changes to ship and which to kill.

Project Overview

Outcomes

  • Turned provider-profile sentiment from net −2 to +15 across 6 interviews with a header redesign
  • Shipped two provider-search changes; killed one before a sprint
  • Built reusable research infrastructure that now powers other studies

The Problem

Patients couldn't reliably find or evaluate the right provider. The team had fixes in mind but no way to tell which would actually help, so the calls defaulted to opinion.

The Solution

To decide what was worth building, I built a weighted-scoring research framework, Sentiment × Severity, then ran the provider search and profile redesign through three tested hypotheses. Two shipped. The third I killed before a sprint went into it.

My Role

UX Designer

Team

  • 1 UX Lead
  • 2 Developers
  • 1 Product Manager

I Personally Owned

  • Scoring methodology
  • Session coding process
  • Synthesis framework
  • Insight prioritization model
  • Study analysis

My Process

01

The Starting Point

Goal: Make research produce answers leadership would act on

When I joined the web team at the University of Rochester Medical Center, design decisions ran on opinion. Research happened; it just never became proof a roadmap would move for. Everyone agreed it mattered. Turning a hypothesis into enough evidence to change direction was the part nobody had cracked. Three problems stood in the way.

01

Decisions by Opinion

Stakeholder decisions were driven primarily by:

  • Executive preference
  • Anecdotal feedback
  • Isolated usability observations
  • Analytics without behavioral context

02

Limited Pushback Ability

With little hard evidence behind any of it, every leadership request carried the same weight, and which features got built came down to taste.

03

No Organizational Memory

Each project started from zero, so teams rediscovered the same usability problems over and over.

02

Applying the Framework to Provider Search

Goal: Validate or reject product assumptions with structured research

268

user observations logged

11

participants

3

distinct audiences

Provider search was the framework's first real test, and the reason it exists. The team had no shortage of proposed fixes. It just couldn't prove any of them, and I needed each assumption validated or rejected against real evidence. Old research handed over themes and quotes, easy to argue with and impossible to rank. The framework turns every session into a number instead.

Weighted Impact Score = Sentiment × Severity

Sentiment is direction, scored −3 to +3: did this help the user or hurt them? I read it from the whole interview, the tone, the expression, the body language, the small hesitations, not just the words. Severity, scored 1 to 3, is how much it mattered. Multiply the two, and every observation carries a single weighted score that rolls up to the hypothesis it tested.
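As a minimal sketch of that arithmetic, the score can be written as one small function. The function name and the range checks are assumptions made for illustration; the case study itself records scores in Airtable, not in code.

```python
def weighted_impact(sentiment: int, severity: int) -> int:
    """Return Sentiment x Severity for a single observation.

    Sentiment runs from -3 to +3 and severity from 1 to 3, as described
    above, so the result always falls between -9 and +9.
    """
    if not -3 <= sentiment <= 3:
        raise ValueError("sentiment must be between -3 and +3")
    if not 1 <= severity <= 3:
        raise ValueError("severity must be between 1 and 3")
    return sentiment * severity


print(weighted_impact(1, 2))   # 2
print(weighted_impact(-1, 1))  # -1
```

Keeping direction and weight as separate inputs means a mildly negative but critical observation (−1 × 3) and a strongly negative but trivial one (−3 × 1) land on the same score, which is the trade-off a single multiplied number makes.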

  1. Observation Logging

    Every user behavior, reaction, and comment is logged as a discrete observation.

  2. Taxonomy Coding

    Observations are tagged against a 27-code taxonomy to cut interpretation drift between studies.

  3. Severity Scoring

    Each observation gets a sentiment score (−3 to +3) and a severity score (1 to 3), multiplied into a weighted impact score.

  4. Hypothesis Analysis

    Scores roll up to the hypothesis level to compare baseline usability, proposed changes, and observed behavior.

The Airtable base: each observation logged as a quote, tagged against the code book, then scored for sentiment and severity.

The score, in practice

Sentiment +1 × Severity 2 = +2
To me, non-surgical stands out right here. It says herniated discs, spinal stenosis, degenerative disc disease.
Said leaning in, scanning the profile with a nod. Positive direction, moderate weight: a clear point in the hypothesis's favor.

Sentiment 0 = 0
The provider's specialties did not immediately stand out for assessing the profile at a high level.
Flat tone, no strong reaction either way. It goes on the record, but it doesn't move the hypothesis.

Sentiment −1 × Severity 1 = −1
I'm sort of missing the baseline definition of what orthopedics and physical performance is.
Hesitation and a furrowed brow gave this away as a genuine miss. Negative sentiment pulls the hypothesis's score down instead of hiding inside a transcript.
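The four steps and the three scored examples above can be sketched end to end. This is an illustration under stated assumptions, not the Airtable base itself: the `Observation` fields, the code names, the "H1" label on all three examples, and the severity of 1 on the neutral observation (the source gives none) are all invented for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Observation:
    quote: str                 # step 1: one logged observation
    hypothesis: str            # hypothetical label for the hypothesis it tests
    sentiment: int             # step 3: direction, -3 to +3
    severity: int              # step 3: weight, 1 to 3
    codes: list[str] = field(default_factory=list)  # step 2: taxonomy tags

    @property
    def impact(self) -> int:
        """Weighted impact score: sentiment x severity."""
        return self.sentiment * self.severity


def roll_up(observations: list[Observation]) -> dict[str, int]:
    """Step 4: sum weighted impact scores per hypothesis."""
    totals: dict[str, int] = defaultdict(int)
    for obs in observations:
        totals[obs.hypothesis] += obs.impact
    return dict(totals)


# The three examples above; code names and severity of the neutral row are assumed.
observations = [
    Observation("To me, non-surgical stands out right here.", "H1", 1, 2, ["specialty-clarity"]),
    Observation("Specialties did not immediately stand out.", "H1", 0, 1, ["scannability"]),
    Observation("I'm sort of missing the baseline definition.", "H1", -1, 1, ["terminology"]),
]

print(roll_up(observations))  # {'H1': 1}
```

The neutral observation stays in the record but contributes nothing to the total, which matches how the zero-sentiment example is described.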
03

Three Hypotheses, Three Verdicts

Goal: Ship what the evidence supports, kill what it doesn't

With the framework running, I ran three design hypotheses against a documented baseline, six moderated interviews each. Each one returned a numeric Efficacy Delta that went straight to the roadmap. Two said build. One said kill, before a sprint was spent.

H1

Specialties Label

Shipped

+17Δ

sentiment swing across 6 interviews (−2 → +15)

Hypothesis: a clear "Specialties" label on the provider profile header helps patients see what a provider actually treats. It worked. Across 6 interviews, net sentiment climbed from −2, where people skimmed past the key information unsure what the provider did, to +15, where they engaged with the provider's experience right away. A 17-point gain, and it shipped.
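The delta for this hypothesis can be reproduced in a couple of lines. The sketch assumes the Efficacy Delta is simply the redesign's net sentiment minus the baseline's, which matches the −2 → +15 swing reported here; the function names are hypothetical.

```python
def efficacy_delta(baseline_net: int, variant_net: int) -> int:
    """Net sentiment of the tested design minus net sentiment of the baseline."""
    return variant_net - baseline_net


def format_delta(delta: int) -> str:
    """Render a delta with an explicit sign, in the style used on this page."""
    return f"{delta:+d}Δ"


h1 = efficacy_delta(baseline_net=-2, variant_net=15)
print(h1)                # 17
print(format_delta(h1))  # +17Δ
```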

H2

Filter Shift

Shipped

+10.3%

filter engagement, across 6 interviews plus analytics

Hypothesis: renaming the filter to patients' own words, and opening it by default, drives more use. The dashboard liked it, with filter engagement up 10.3% and downstream profile clicks up 5.9%. In the interviews, though, sentiment barely moved, nudging to +2. Triangulating both signals in Looker Studio gave one honest read, and it caught a false positive the analytics alone would have called a win. We shipped it anyway, eyes open, because the gap between the numbers and the feeling was itself the finding.
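One way to picture the triangulation is as a check on whether the two signals point the same way. The sketch below is an assumption-heavy illustration: the study did this in Looker Studio, and the function name, labels, and both thresholds are hypothetical values chosen only to show the shape of the comparison.

```python
def triangulate(
    analytics_lift_pct: float,
    net_sentiment: int,
    lift_threshold: float = 5.0,   # hypothetical: lift that counts as a win
    sentiment_threshold: int = 5,  # hypothetical: net sentiment that counts as a win
) -> str:
    """Compare a behavioral signal with an attitudinal one."""
    analytics_positive = analytics_lift_pct >= lift_threshold
    interviews_positive = net_sentiment >= sentiment_threshold
    if analytics_positive and interviews_positive:
        return "confirmed"
    if not analytics_positive and not interviews_positive:
        return "unsupported"
    # One signal is up and the other is not: flag it for a closer look.
    return "diverges"


# H2 as reported above: filter engagement +10.3%, interview sentiment at +2.
print(triangulate(10.3, 2))  # diverges
```

A "diverges" result is not a verdict on its own. It marks the gap between what people did and how they felt as something to examine before calling the change a win.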

H3

Connected Profiles

Killed

−21Δ

sentiment across 6 interviews

Hypothesis: surfacing a provider's published research on the profile card builds trust. Our web team lead championed it. Then it scored −21 across 6 interviews. Patients ignored the widget, and where they did notice it, the clinical language bred confusion and chipped at the very trust it was supposed to earn. We killed it before engineering spent a sprint.

04

Killing a Feature the Team Wanted

Goal: Use the evidence to overturn a popular idea before it cost a sprint

1 sprint

of front-end development saved by not building it

−21Δ

the score that made the call hard to argue with

This is the one worth slowing down on, because killing it took more than a number. Connected Profiles had a champion in our web team lead, and on paper the logic was sound: research credentials signal trust. The score said otherwise. At −21 across six sessions, it left patients worse off than no card at all. So I brought the rollup to a design review and walked the room through the scored sessions, observation by observation. The lead read the same evidence I did and agreed to cut it, and a sprint of front-end work went to something patients needed instead. The framework had done the hard part already. By turning a hunch into a number the room could line up behind, it let me overrule a popular idea without the conversation ever turning personal.

05

Operational Impact

Goal: Make prior research reusable instead of disposable

The framework's real payoff arrived after the provider study. Every session lives scored in one relational base, which means old research stops being a finished report and turns back into data I can query. I went looking through interviews from earlier projects, the same recordings seen through a sharper lens, and pulled defensible new findings out of work everyone had already filed away.
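To show what "data I can query" might look like, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for the relational base. The real base is Airtable; the table layout, column names, code names, and every row below are placeholder values invented for the example, not study data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "study TEXT, participant TEXT, code TEXT, sentiment INTEGER, severity INTEGER)"
)

# Placeholder rows only: (study, participant, taxonomy code, sentiment, severity).
rows = [
    ("education", "P1", "navigation", -2, 3),
    ("education", "P2", "navigation", -1, 2),
    ("education", "P2", "content", 1, 1),
    ("education", "P3", "navigation", -2, 2),
    ("education", "P4", "content", 0, 1),
]
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?)", rows)


def patterns(connection: sqlite3.Connection, study: str) -> list[tuple]:
    """Per taxonomy code: distinct participants affected and summed weighted score."""
    query = """
        SELECT code,
               COUNT(DISTINCT participant) AS participants,
               SUM(sentiment * severity)   AS weighted_score
        FROM observations
        WHERE study = ?
        GROUP BY code
        ORDER BY weighted_score ASC
    """
    return connection.execute(query, (study,)).fetchall()


for code, participants, score in patterns(conn, "education"):
    print(code, participants, score)
# navigation 3 -12
# content 2 1
```

Counting distinct participants alongside the summed score is what separates a pattern several people hit from one person's repeated complaint.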

Education
6 of 8 participants hit the same navigation problem

Before

Student interviews had flagged navigation confusion, but it was logged as "anecdotal" and set aside.

Re-scored

Re-scored through the framework, the same comments came back as a high-severity pattern, not a one-off complaint.

Hospitals
5,500+ survey recipients, plus 5 moderated interviews

Before

Clinical patient interviews from a hospital redesign sat as scattered observations with no clear through-line.

Re-scored

Scored, a clear pattern emerged: patients expected hospital-specific navigation, not system-wide. That reframed a planned cosmetic refresh into an information-architecture overhaul, and the relaunched site shipped with the structural fixes built in.

06

Key Learnings

Whether the product is a platform for construction businesses or a healthcare system at institutional scale, the problem repeats: decisions get made on opinion until someone builds the infrastructure to make them on evidence.

The right direction

Somewhere on every product team sits a feature one person swears by and users don't need. A scored framework catches it before engineering spends a sprint.

Decisions that stick

Give leadership a number instead of a theme or a quote, and research starts steering direction. Buy-in gets easier to win and harder to walk back.

Data that compounds

With a relational research architecture, every study makes the next one smarter. You build on institutional memory instead of starting each project from zero.

My Impact

Build the infrastructure, and evidence starts replacing opinion at any scale. The provider search and profiles this framework guided now draw more than 650,000 profile views a quarter in New York State alone, with a 65.8% engagement rate.

+17Δ

net sentiment swing on the redesigned provider profile (−2 → +15, H1 shipped)

2 shipped, 1 killed

hypotheses resolved on evidence, not opinion

268

observations scored across 11 participants and 3 audiences

658K

provider-profile views a quarter on the experience I redesigned (New York State users)