Provider Search, Interaction Design, Weighted Scoring

Improving the Provider Search experience with a strong research framework

How a score cut a feature the team wanted before it cost a sprint.

Outcomes

Turned provider-profile sentiment from net −2 to +15 across 6 interviews with a header redesign
Shipped two provider-search changes; killed one before a sprint
Built reusable research infrastructure that now powers other studies

The Problem

Patients couldn't reliably find or evaluate the right provider. The team had fixes in mind but no way to tell which would actually help, so the calls defaulted to opinion.

The Solution

To decide what was worth building, I created a weighted-scoring research framework, Sentiment × Severity, then tested three hypotheses for the provider search and profile redesign. Two shipped. The third I killed before a sprint went into it.

My Role

UX Designer

Team

1 UX Lead
2 Developers
1 Product Manager

I Personally Owned

Scoring methodology
Session coding process
Synthesis framework
Insight prioritization model
Study analysis

My Process

Design decisions ran on opinion

Goal: Make research produce answers leadership would act on

When I joined the web team at the University of Rochester Medicine, design decisions ran on opinion. Research happened; it just never became proof a roadmap would move for. Everyone agreed it mattered. Turning a hypothesis into enough evidence to change direction was the part nobody had cracked. Three problems stood in the way.

Decisions by Opinion

Stakeholder decisions were driven primarily by:

Executive preference
Anecdotal feedback
Isolated usability observations
Analytics without behavioral context

Limited Pushback Ability

With little hard evidence behind any of it, every leadership request carried the same weight, and which features got built came down to taste.

No Organizational Memory

Each project started from zero, so teams rediscovered the same usability problems over and over.

A scoring framework: Sentiment × Severity

Goal: Validate or reject product assumptions with structured research

268

user observations logged

6

participants

3

distinct audiences

Provider search was the framework's first real test, and the reason it exists. The team had no shortage of proposed fixes. It just couldn't prove any of them, and I needed each assumption validated or rejected against real evidence. Old research handed over themes and quotes, easy to argue with and impossible to rank. The framework turns every session into a number instead.

Weighted Impact Score = Sentiment × Severity

Sentiment is direction, scored −3 to +3: did this help the user or hurt them? I read it from the whole interview, the tone, the expression, the body language, the small hesitations, not just the words. Severity, scored 1 to 3, is how much it mattered. Multiply the two, and every observation carries a single weighted score that rolls up to the hypothesis it tested.

01
Observation Logging
Every user behavior, reaction, and comment is logged as a discrete observation.
02
Taxonomy Coding
Observations are tagged against a 27-code taxonomy to cut interpretation drift between studies.
03
Severity Scoring
Each observation gets a sentiment score (−3 to +3) and a severity score (1 to 3), multiplied into a weighted impact score.
04
Hypothesis Analysis
Scores roll up to the hypothesis level to compare baseline usability, proposed changes, and observed behavior.
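The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Airtable base itself: the `Observation` class, its field names, the code labels, and the sample rows are all invented for the example. Only the score ranges (sentiment −3 to +3, severity 1 to 3) and the multiplication come from the framework. Rolling up by simple summation is an assumption about how the net score is formed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One logged observation. Field names are hypothetical."""
    hypothesis: str  # which hypothesis the observation speaks to, e.g. "H1"
    code: str        # taxonomy tag (the real code book has 27 codes)
    sentiment: int   # direction, -3 to +3
    severity: int    # how much it mattered, 1 to 3

    def __post_init__(self) -> None:
        # Reject scores outside the ranges the framework defines.
        if not -3 <= self.sentiment <= 3:
            raise ValueError("sentiment must be between -3 and +3")
        if not 1 <= self.severity <= 3:
            raise ValueError("severity must be between 1 and 3")

    @property
    def weighted_impact(self) -> int:
        # Weighted Impact Score = Sentiment x Severity
        return self.sentiment * self.severity


def rollup(observations):
    """Sum weighted impact per hypothesis (summation is an assumption)."""
    totals = defaultdict(int)
    for obs in observations:
        totals[obs.hypothesis] += obs.weighted_impact
    return dict(totals)


# Invented rows for illustration only, not study data.
sample = [
    Observation("H1", "label-clarity", sentiment=1, severity=2),
    Observation("H1", "label-clarity", sentiment=0, severity=1),
    Observation("H1", "terminology", sentiment=-1, severity=1),
]
print(rollup(sample))  # {'H1': 1}
```

Validating the ranges at construction time keeps a mistyped score from silently skewing a hypothesis total.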

The Airtable base: each observation logged as a quote, tagged against the code book, then scored for sentiment and severity.

The score, in practice

Sentiment +1 × Severity 2 = +2

“To me, non-surgical stands out right here. It says herniated discs, spinal stenosis, degenerative disc disease.”

Said leaning in, scanning the profile with a nod. Positive direction, moderate weight: a clear point in the hypothesis's favor.

Sentiment 0 = 0

“The provider's specialties did not immediately stand out for assessing the profile at a high level.”

Flat tone, no strong reaction either way. It goes on the record, but it doesn't move the hypothesis.

Sentiment −1 × Severity 1 = −1

“I'm sort of missing the baseline definition of what orthopedics and physical performance is.”

Hesitation and a furrowed brow gave this away as a genuine miss. Negative sentiment pulls the hypothesis's score down instead of hiding inside a transcript.

Three fixes tested: two shipped, one didn't

Goal: Ship what the evidence supports, kill what it doesn't

With the framework in place, I tested three design hypotheses against a documented baseline, six moderated interviews each. Each one returned a numeric Efficacy Delta that went straight to the roadmap. Two said build. One said kill, before a sprint was spent.

Specialties Label

Shipped

+17Δ

sentiment swing across 6 interviews (−2 → +15)

Hypothesis: a clear "Specialties" label on the provider profile header helps patients see what a provider actually treats. It worked. Across 6 interviews, net sentiment climbed from −2, where people skimmed past the key information unsure what the provider did, to +15, where they engaged with the provider's experience right away. A 17-point gain, and it shipped.
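The arithmetic behind the +17Δ is easy to check. A minimal sketch, assuming the Efficacy Delta is the net score with the change minus the baseline net score (the function name `efficacy_delta` is made up for this example):

```python
def efficacy_delta(baseline_net: int, variant_net: int) -> int:
    """Change in net weighted sentiment relative to the baseline."""
    return variant_net - baseline_net


# Specialties Label: baseline net -2, redesigned header net +15.
print(efficacy_delta(-2, 15))  # 17
```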

Filter Shift

Shipped

+18.6%

filter engagement clicks (Looker Studio)

Hypothesis: renaming the filter to patients' own words, and opening it by default, drives more use. The dashboard liked it, with filter engagement clicks up 18.6%. In the interviews, though, sentiment barely moved, nudging to +2. Triangulating both signals in Looker Studio gave one honest read, and it caught a false positive the analytics alone would have called a win. We shipped it anyway, eyes open, because the gap between the numbers and the feeling was itself the finding.

Connected Profiles

Killed

−21Δ

sentiment across 6 interviews

Hypothesis: surfacing a provider's published research on the profile card builds trust. Our web team lead championed it. Then it scored −21 across 6 interviews. Patients ignored the widget, and where they did notice it, the clinical language bred confusion and chipped at the very trust it was supposed to earn. We killed it before engineering spent a sprint.

Killing a feature the team wanted, on a −21 score

Goal: Use the evidence to overturn a popular idea before it cost a sprint

1 sprint

of front-end development saved by not building it

−21Δ

the score that made the call hard to argue with

This is the one worth slowing down on, because killing it took more than a number. Connected Profiles had a champion in our web team lead, and on paper the logic was sound: research credentials signal trust. The score said otherwise. At −21 across six sessions, it left patients worse off than no card at all. So I brought the rollup to a design review and walked the room through the scored sessions, observation by observation. The lead read the same evidence I did and agreed to cut it, and a sprint of front-end work went to something patients needed instead. The framework had done the hard part already. By turning a hunch into a number the room could line up behind, it let me overrule a popular idea without the conversation ever turning personal.

Old research became data I could query again

Goal: Make prior research reusable instead of disposable

The framework's real payoff arrived after the provider study. Every session lives scored in one relational base, which means old research stops being a finished report and turns back into data I can query. I went looking through interviews from earlier projects, the same recordings seen through a sharper lens, and pulled defensible new findings out of work everyone had already filed away.

Education

6 of 8 participants hit the same navigation problem

Before

Student interviews had flagged navigation confusion, but it was logged as "anecdotal" and set aside.

Re-scored

Re-scored through the framework, the same comments came back as a high-severity pattern, not a one-off complaint.
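Re-scoring of this kind amounts to a query over the scored observations: count how many distinct participants hit a given code at high severity. The sketch below assumes a flat list of records with hypothetical field names and invented rows. It does not reproduce the study data or the Airtable schema, and the `min_severity` threshold is an illustrative choice.

```python
def participants_hitting(records, code, min_severity=2):
    """Participants with a negative observation under `code`
    at or above `min_severity`."""
    return {
        r["participant"]
        for r in records
        if r["code"] == code
        and r["sentiment"] < 0
        and r["severity"] >= min_severity
    }


# Invented rows for illustration only, not study data.
records = [
    {"participant": "P1", "code": "navigation", "sentiment": -2, "severity": 3},
    {"participant": "P1", "code": "navigation", "sentiment": -1, "severity": 2},
    {"participant": "P2", "code": "navigation", "sentiment": -2, "severity": 2},
    {"participant": "P3", "code": "navigation", "sentiment": 1, "severity": 1},
    {"participant": "P4", "code": "search", "sentiment": -3, "severity": 3},
]

hit = participants_hitting(records, "navigation")
everyone = {r["participant"] for r in records}
print(f"{len(hit)} of {len(everyone)} participants")  # 2 of 4 participants
```

Counting distinct participants rather than raw observations keeps one talkative interviewee from looking like a pattern.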

Hospitals

5,500+ survey recipients, plus 5 moderated interviews

Before

Clinical patient interviews from a hospital redesign sat as scattered observations with no clear through-line.

Re-scored

Scored, a clear pattern emerged: patients expected hospital-specific navigation, not system-wide. That reframed a planned cosmetic refresh into an information-architecture overhaul, and the relaunched site shipped with the structural fixes built in.

Evidence beats opinion, at any scale

Whether the product is a platform for construction businesses or a healthcare system at institutional scale, the problem repeats: decisions get made on opinion until someone builds the infrastructure to make them on evidence.

The right direction

Somewhere on every product team sits a feature one person swears by and users don't need. A scored framework catches it before engineering spends a sprint.

Decisions that stick

Give leadership a number instead of a theme or a quote, and research starts steering direction. Buy-in gets easier to win and harder to walk back.

Data that compounds

With a relational research architecture, every study makes the next one smarter. You build on institutional memory instead of starting each project from zero.

My Impact

Build the infrastructure, and evidence starts replacing opinion at any scale. The search and profile experience this framework guided now draws more than 650,000 profile views a quarter in New York State alone, with a 65.8% engagement rate.

+17Δ

net sentiment swing on the redesigned provider profile (−2 → +15, H1 shipped)

2 shipped, 1 killed

hypotheses resolved on evidence, not opinion

268

observations scored across 6 participants and 3 audiences

658K

provider-profile views a quarter on the experience I redesigned (New York State users)

What I'd do differently

Write the tiebreaker before the test

H2 exposed the gap: analytics climbed 18.6% while interview sentiment barely moved. We reconciled the two signals after the fact, and it worked, but the rule for which one wins belongs in the study design, decided before anyone sees a number.

Let stakeholders watch the scoring

The kill decision held because the web lead read the scored sessions himself. That review came at the end. Inviting him into a session mid-study would have turned the final number into something he watched accumulate instead of a verdict delivered in a meeting.

Three ugly truths

01
Six participants. The framework makes a small sample rigorous to read; it does not make the sample bigger. Every delta in this study rests on how carefully six people were understood.
02
H2 shipped while the signals disagreed. The dashboard showed 18.6% more filter engagement and the interviews nudged to +2. Calling the gap a finding was honest. Shipping anyway was still a bet.
03
Sentiment scoring runs through one researcher's judgment. The 27-code taxonomy keeps studies comparable; it cannot make my read of a pause or a furrowed brow objective.

Liked this project?

I'm happy to walk through the decisions behind it, or talk about what I could bring to your team. Email is the fastest way to reach me.

ahmedrazinux@gmail.com LinkedIn
