When the Survey Results Come Back: Are We Actually Doing the Work?
Getting by is not the same as getting better.
Every spring, ACGME resident and fellow survey results land in program directors’ inboxes and in GME offices across the country. And every spring, a version of the same conversation happens in program meetings, in Clinical Competency Committee (CCC) discussions, and in annual program evaluation cycles: what to do with scores that came back below where anyone wanted them to be.
I have sat in enough of those conversations to recognize the pattern. And the pattern is worth naming.
What the Threshold Actually Means
ACGME expects programs to use survey results in their Annual Program Evaluation (APE) process and respond to areas of concern. Many institutions set an internal threshold, commonly around 80% compliance, as the marker that triggers a required plan of action. The threshold exists for a reason. A score that falls below it is not a rounding error or a statistical anomaly. It is a signal from residents and fellows, anonymously and collectively, that something in the learning environment is not meeting the standard it should.
At the institutional level, the GME office receives both program-level and institutional-level results. Which means the GME office often has a view that no single program has: the pattern across programs, the items that show up consistently below threshold, the gaps between what the institution believes about its learning environment and what the people training inside it are actually experiencing.
That view carries responsibility. And it requires honesty about what is being done with it.
The Pattern Nobody Talks About
Here is what I have watched happen, more often than I would like.
Survey results come back. A program has several items below threshold, some significantly below. The initial reaction is a combination of defensiveness and deflection. A few residents must be disgruntled. The cohort this year was particularly difficult. Things are already being addressed. The scores will probably be better next year.
Then the annual program evaluation arrives. The below-threshold items get noted. A brief narrative gets written with measured, careful language that acknowledges the concern without fully sitting with it. A plan of action gets documented that is just specific enough to satisfy the requirement and just vague enough to avoid real accountability. And the cycle continues.
What almost never happens is the harder thing: a genuine, uncomfortable examination of what the data is actually saying. Who is in the room when that conversation happens, and whether the people whose experiences generated the data have any voice in what comes next. Whether the plan of action is designed to improve the score or to improve the thing the score is measuring. Whether anyone will look back in twelve months and honestly assess whether anything changed.
Getting by is not the same as getting better. And in graduate medical education, the difference matters because the learning environment these scores are measuring is the environment shaping the next generation of physicians.
Why Genuine Response Is Hard
It would be easy to frame this as a problem of motivation or integrity. It is not. Most program directors and GME leaders genuinely want their programs to improve. The barriers to genuine response are real and worth understanding.
Low scores in certain categories, including faculty teaching, workload, and psychological safety, require conversations that are uncomfortable and politically complicated. Telling a faculty member that residents don’t feel safe raising concerns in their presence is not a conversation most program directors are trained to have, and not one that most institutions have created clear processes to support.
Institutional factors that drive low scores, including inadequate resources, systemic workload issues, and cultural dynamics that have existed for years, are often outside the program’s control to address unilaterally. A program director who identifies a systemic problem and brings it to institutional leadership needs that leadership to take it seriously. When they don’t, the program is left holding a score it cannot improve without authority it doesn’t have.
And the survey itself, while valuable, has limitations that programs sometimes hide behind. Small program sizes create statistical volatility. A single outlier response can move a score dramatically. These are real methodological considerations, and they are also sometimes a convenient way to avoid looking too closely at what the data might be telling you.
The difficulty is real. It does not excuse doing just enough to get by.
What a Genuine Response Actually Looks Like
The programs and institutions that use survey data well share a few characteristics. They are worth naming not as a checklist but as a standard to measure against.
They resist the instinct to explain the score before they understand it. The first question is not “why did we score low here,” which almost always leads to external attribution. The first question is “what might residents be experiencing that generated this score,” which leads somewhere more useful.
They involve residents in the response. Not performatively, not a brief mention in a town hall that scores were reviewed and things are being worked on. Genuinely. What did you mean when you said this? What would improvement actually look like from where you sit? What would make you answer the question differently next year? Residents who are involved in the response to their own feedback become invested in the outcome in a way that no program-designed initiative can replicate.
They distinguish between items they can address at the program level and items that require institutional intervention, and they escalate the latter clearly and specifically. A program director who brings a below-threshold score on workload to the GME office or the Designated Institutional Official (DIO) with a specific request is doing something different from one who notes the concern in the APE and moves on. The first creates institutional accountability. The second creates documentation.
They build plans of action that are specific, measurable, and tied to a timeline that will be honestly reviewed. Not “we will continue to monitor,” which means nothing. Not “we have implemented changes to address this concern,” which describes an action without committing to an outcome. But: here is what we are doing, here is what we expect to see change, here is when we will look at whether it did.
And they come back to the data. Not to validate that the plan worked but to honestly assess whether the learning environment changed. Those are different questions. One invites confirmation. The other invites truth.
What the GME Office Owes This Conversation
At the institutional level, the GME office has a responsibility that goes beyond collecting and distributing results.
It is in a position to see what no individual program can see: which concerns are isolated and which are systemic. Which below-threshold items appear across multiple programs and therefore reflect something about the institution rather than something about a single program’s leadership. Whether the pattern of results over multiple years suggests that nothing is actually changing despite documented plans.
That view creates an obligation. The GME office that receives institutional results, notes the below-threshold items, and waits for programs to respond on their own is not doing its job. The institutional perspective exists to connect dots that programs cannot connect from where they sit, and to bring those connections to the people with the authority to act on them.
This means presenting results to senior leadership not just as a compliance update but as a genuine picture of institutional learning environment quality. It means tracking plan of action follow-through across programs and asking, explicitly, what changed. It means being willing to say to a program director, to a DIO, to a CMO, that the data suggests something that a plan of action is not going to fix.
That is uncomfortable work. It is also the work.
Closing Reflection
The ACGME resident and fellow survey exists because what residents and fellows experience in training matters. Not just for accreditation. Not just for scores. But because the learning environment they are in right now is shaping the physicians they are becoming, and the physicians they become will carry those formative experiences into every patient encounter, every team interaction, every institutional decision for the rest of their careers.
A score that falls below threshold is not a bureaucratic inconvenience. It is a message from the people closest to the learning environment that something is not right.
They deserve a response that takes that message seriously.
Not just seriously enough to document. Seriously enough to act. Seriously enough to come back and honestly ask whether anything changed. Seriously enough to sit with the discomfort about an environment that everyone, including program leadership, institutional leadership, and GME offices, has some responsibility for shaping.
Getting by is not the same as getting better.
The survey results are in. The question is what we are actually going to do with them.


