The EM Physician's AI Stack: What Belongs in Your Workflow Today
Shuhan He, MD
Artificial intelligence is already in the emergency department: ambient documentation pilots, sepsis alerts, staffing forecasts, patient-message triage, radiology worklists, translation-adjacent workflows, coding suggestions, literature search, and the personal tools physicians use to draft, summarize, and organize their work.
Emergency physicians now need to decide which AI uses are appropriate, which uses require local governance, and which uses should stay out of the clinical workflow until they are validated for emergency care.
That decision starts with a boring distinction that matters: "AI use" covers very different risk categories. A tool that helps summarize a policy memo carries a different risk than a model that influences disposition. A tool that rewrites discharge instructions carries a different risk than one that recommends a workup. A dashboard that predicts arrival volume belongs in a different governance category than an adaptive device or clinical decision-support model.
The EM physician's AI stack should be built in layers, with stricter rules as the tool gets closer to patient care. That is the same risk-based instinct behind the NIST AI Risk Management Framework: the governance burden should rise with the consequence of being wrong.
Layer 1: Personal productivity
This is the lowest-risk place to start, and it is where many physicians will get the most immediate value.
Reasonable uses include:
- drafting a meeting agenda;
- summarizing a non-PHI policy document;
- turning a committee discussion into action items;
- creating an outline for a lecture;
- comparing two public guidelines;
- making a patient-education handout easier to read, before clinician review;
- generating a first-pass literature-search strategy;
- translating a vague operational problem into a clearer project charter.
The rule is simple: no protected health information, no confidential hospital data, no patient-identifiable screenshots, and no assumption that the output is correct. If a team wants to use real clinical material, it should first understand what counts as de-identified under HHS HIPAA guidance. Treat the tool like an overconfident intern with a fast keyboard. Useful, but supervised.
For residents and fellows, this layer is especially valuable for learning how to structure a question. A good prompt often forces the user to define the audience, constraint, decision, and desired format. That habit transfers back to clinical reasoning.
Layer 2: Team operations
The next layer is operational work that affects the team but does not directly generate clinical decisions.
Examples include:
- drafting a downtime drill checklist;
- summarizing crowding metrics for a QI meeting;
- extracting themes from de-identified safety reports;
- building a first-pass staffing memo;
- generating a handoff template for a new observation-unit workflow;
- comparing vendor claims against a local requirements list;
- drafting a policy for use of unapproved AI tools.
This layer requires more care because the output can shape how the department behaves. The main risks are false confidence, missing context, and automation of a bad process. If a model summarizes incident reports, someone still has to check whether the summary hides rare but serious events. If a model drafts a staffing memo, someone still has to know whether the assumptions match the real department. If a forecast or staffing model is being compared to historical operations, the team should decide in advance which statistical comparison, against which baseline, will count as improvement.
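One way to make the forecast comparison concrete is to score the model against a naive baseline on the same historical days. The sketch below is only an illustration: the function names, the choice of mean absolute error, and the arrival counts are assumptions, not a recommended standard.

```python
from statistics import mean

def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as the data (e.g., arrivals per day)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def compare_forecasts(actual, model_forecast, baseline_forecast):
    """Compare a model's error to a naive baseline's error on the same days."""
    model_mae = mean_absolute_error(actual, model_forecast)
    baseline_mae = mean_absolute_error(actual, baseline_forecast)
    improvement = (baseline_mae - model_mae) / baseline_mae if baseline_mae else 0.0
    return {
        "model_mae": model_mae,
        "baseline_mae": baseline_mae,
        "relative_improvement": improvement,
    }

# Made-up daily arrival counts, for illustration only.
actual = [100, 120, 110, 130]
model = [105, 115, 112, 128]
baseline = [110, 110, 120, 120]  # hypothetical naive forecast, e.g., last week's counts
result = compare_forecasts(actual, model, baseline)
print(result)
```

The useful part is less the metric than the commitment: the baseline and the threshold for "better" are written down before anyone sees the model's numbers.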
Operational AI should always produce an artifact that a human can inspect. If nobody can explain how the output will be used, the team is not ready to use it.
Layer 3: Documentation support
Ambient scribes and note-generation tools are the first high-visibility AI use case many emergency physicians will encounter. They may save time. They may also create new forms of error.
An ED note is a medicolegal, billing, communication, and clinical-reasoning document. A documentation tool that turns conversation into prose has to be judged on more than whether the note sounds fluent.
Before an ED adopts documentation AI, a pilot should measure at least:
- hallucinated history or exam findings;
- omitted negatives that matter clinically;
- wrong laterality, timing, dose, or medication;
- copied-forward or invented assessment language;
- time saved after physician editing, not before;
- whether the output changes coding or billing patterns;
- patient comfort and consent workflow;
- how corrections are made and audited.
The physician remains responsible for the note. That sounds obvious, but it has to be operationalized. If the tool creates the first draft, the physician needs a review workflow that is fast enough for emergency medicine and strict enough to catch clinically meaningful errors.
The same principle applies to structured data. If a documentation tool is extracting a finding into the chart, the team should know whether that output becomes a discrete HL7 FHIR Observation, a billing artifact, a note fragment, or a dashboard-only field. Those are not interchangeable.
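To illustrate the difference, here is a rough sketch of the same extracted finding as a discrete FHIR R4 Observation and as a free-text note fragment. The helper name, the patient identifier, and the policy of marking unreviewed output as "preliminary" are assumptions for illustration; a real implementation would follow the local FHIR profile.

```python
import json

def heart_rate_observation(patient_id, beats_per_minute, reviewed):
    """Build a minimal FHIR R4 Observation as a plain dict (illustrative only)."""
    return {
        "resourceType": "Observation",
        # Assumed local policy: extracted values stay "preliminary" until a clinician confirms them.
        "status": "final" if reviewed else "preliminary",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8867-4",
                "display": "Heart rate",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": beats_per_minute,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }

# The same finding in two forms: one is queryable structured data, the other is prose.
observation = heart_rate_observation("example-123", 112, reviewed=False)
note_fragment = "Heart rate 112 on arrival."

print(json.dumps(observation, indent=2))
```

A discrete Observation can feed alerts, dashboards, and downstream queries; a note fragment cannot. That is why the destination of an extracted value matters as much as its accuracy.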
Layer 4: Clinical decision support
Once a tool influences triage, diagnosis, testing, treatment, disposition, or follow-up, it has crossed into clinical decision support. At that point, teams should ask how it performs in their population, under their workflow, with their failure modes.
Emergency physicians should ask for the same basic evidence they would expect from any clinical test or decision rule:
- What is the intended use?
- What patient population was it trained and validated on?
- What is the reference standard?
- What are the sensitivity, specificity, positive predictive value, and negative predictive value?
- How does performance change by age, race, sex, language, insurance status, arrival mode, acuity, and site?
- What happens when the model is wrong?
- How often is the output ignored, overridden, or unavailable?
- How is drift monitored after go-live?
Those operating characteristics should be reported explicitly. A local validation can start with a simple diagnostic performance table, but the same discipline applies whether the tool is a rule, a regression model, or a large language model wrapper.
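As a minimal sketch, the four operating characteristics can be computed from a 2x2 table of tool output against the reference standard. The function name and the counts below are made up for illustration; a real local validation would also report confidence intervals.

```python
def diagnostic_performance(tp, fp, fn, tn):
    """Operating characteristics from a 2x2 table (tool result vs. reference standard)."""
    def ratio(numerator, denominator):
        # Return None rather than divide by zero when a row or column is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among patients with the condition
        "specificity": ratio(tn, tn + fp),  # true negatives among patients without it
        "ppv": ratio(tp, tp + fp),          # condition present among positive results
        "npv": ratio(tn, tn + fn),          # condition absent among negative results
    }

# Hypothetical local validation: 1,000 encounters, 50 with the condition.
table = diagnostic_performance(tp=40, fp=60, fn=10, tn=890)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the tool is 80% sensitive, yet most positive results are false because the condition is uncommon. That gap is exactly what a vendor summary statistic can hide.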
ONC's HTI-1 rule is useful here because it pushes algorithm transparency for predictive decision-support tools in certified health IT. The point for emergency physicians is practical: if a tool is influencing care, clinicians need enough baseline information to judge fairness, appropriateness, validity, effectiveness, and safety.
This is also where alert fatigue matters. A model can be statistically impressive and operationally harmful if it fires too often, interrupts at the wrong time, or pushes clinicians toward low-value action. In the ED, timing is part of performance.
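A back-of-the-envelope sketch of alert burden, assuming the tool's sensitivity and specificity hold in the local population. All inputs are hypothetical, and the function name is illustrative.

```python
def alert_burden(sensitivity, specificity, prevalence, visits):
    """Expected true and false alerts over a given number of visits."""
    true_alerts = sensitivity * prevalence * visits
    false_alerts = (1 - specificity) * (1 - prevalence) * visits
    total = true_alerts + false_alerts
    return {
        "total_alerts": total,
        "true_alerts": true_alerts,
        "false_alerts": false_alerts,
        # Share of alerts that point to a real case.
        "ppv": true_alerts / total if total else None,
    }

# Hypothetical tool: 90% sensitive, 90% specific, condition present in 2% of visits.
burden = alert_burden(sensitivity=0.9, specificity=0.9, prevalence=0.02, visits=1000)
print(burden)
```

With these assumed inputs, roughly 116 alerts fire per 1,000 visits and about 98 of them are false. Numbers like these are worth estimating before go-live, because they describe what clinicians will actually experience.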
If the tool is supposed to launch inside the EHR rather than sit in a separate browser tab, the architecture matters too. A real SMART on FHIR workflow should preserve context, authorization, and auditability rather than forcing clinicians to copy data between systems.
Layer 5: Autonomous or regulated tools
Some AI tools may meet the definition of medical device software, especially if they analyze patient data to drive diagnosis or treatment recommendations. FDA regulation changes what evidence, monitoring, and change-control expectations should exist before the tool is used.
Emergency physicians do not need to become regulatory lawyers, but they should know when to ask the question: is this tool functioning as medical device software, and if so, what authorization, lifecycle plan, and monitoring apply?
The higher the autonomy and the closer the tool is to clinical decision-making, the less acceptable it is to rely on vendor reassurance, demo performance, or generic validation outside emergency care.
Red lines for EM physicians
There are a few rules worth stating plainly.
Do not paste patient-identifiable information into an unapproved AI tool.
Do not use a public model as a medical consultant for a real patient.
Do not let an AI output decide disposition.
Do not use a model's explanation as proof that the model is correct.
Do not deploy a tool that nobody owns after go-live.
Do not accept "validated on millions of patients" as an answer unless the validation population, reference standard, and emergency-care performance are clear. If the model is being used to compare two measurements or workflows, require a real method comparison, not a screenshot of correlation.
Do not allow a tool to make inequity harder to see. If subgroup performance is unknown, that is not a small footnote.
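On the method-comparison point above, one commonly used approach is a Bland-Altman analysis, which reports the average disagreement (bias) and the range in which most paired differences fall (limits of agreement). The sketch below uses made-up paired measurements and is one reasonable choice, not the only valid method.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Bias and approximate 95% limits of agreement between two paired measurement methods."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # assumes roughly normal differences
    return {
        "bias": bias,
        "lower_limit": bias - spread,
        "upper_limit": bias + spread,
    }

# Made-up paired values: reference measurement vs. tool-derived measurement.
reference = [10, 12, 14, 16]
tool = [11, 11, 15, 15]
agreement = bland_altman(reference, tool)
print(agreement)
```

Two methods can be highly correlated and still disagree by a clinically important amount. The limits of agreement show the size of that disagreement in the units clinicians use.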
A practical checklist: can I use this in my ED?
Before using or recommending an AI tool, ask:
- What task is this tool actually doing?
- Does it touch protected health information?
- Is it patient-facing?
- Does it influence a clinical decision?
- Is it approved by my institution for this use?
- Is the output easy for a clinician to inspect and correct?
- What is the worst plausible error?
- Who monitors performance after implementation?
- What evidence exists in emergency medicine specifically?
- What will we stop doing if the tool works?
The last question is often the most revealing. If a tool does not remove work, reduce risk, improve communication, or make a decision more reliable, it may simply be adding another screen to a department that already has too many.
How to start this week
For individual physicians, start with low-risk productivity tasks that do not involve PHI: lecture outlines, committee memos, policy summaries, literature-search plans, and patient-facing language that you personally review.
For section leaders and medical directors, start by writing an AI use policy that separates personal productivity tools from clinical tools. Make the boundary clear. Then inventory what is already happening in the department. Many EDs will discover that AI use has already begun informally.
For informatics teams, pick one candidate workflow and define the evaluation before the pilot starts. If the use case is documentation, measure documentation errors and physician editing burden. If it is decision support, measure performance, override behavior, and alert burden. If it is operations, measure whether the prediction changes an operational decision. If the pilot is meant to become publishable, set the sample size and outcome definitions before the first chart is reviewed.
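For the sample-size step, a rough sketch using the standard normal-approximation formula for estimating a proportion (such as sensitivity) to within a chosen margin. The expected sensitivity, margin, and prevalence below are placeholder assumptions; a study intended for publication should have a statistician confirm the method.

```python
import math

def cases_for_proportion(expected, half_width, z=1.96):
    """Cases needed so the ~95% CI around a proportion has roughly the given half-width."""
    if not 0 < expected < 1 or half_width <= 0:
        raise ValueError("expected must be in (0, 1) and half_width must be positive")
    return math.ceil(z ** 2 * expected * (1 - expected) / half_width ** 2)

def total_charts(cases_needed, prevalence):
    """Charts to review so that enough condition-positive cases are included."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(cases_needed / prevalence)

# Placeholder assumptions: expected sensitivity 0.90, margin of +/- 0.05, prevalence 10%.
positives_needed = cases_for_proportion(expected=0.90, half_width=0.05)
charts_needed = total_charts(positives_needed, prevalence=0.10)
print(positives_needed, charts_needed)
```

Running a calculation like this before the pilot makes it obvious when a planned chart review is too small to say anything about a rare condition.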
The bottom line
Emergency physicians should use AI the way we use every other high-risk tool in the ED: with a clear indication, awareness of failure modes, and a plan to check whether it actually helped. For interventions, that means reporting effect in terms people can use, not only p-values; absolute risk reduction, relative risk reduction, and number needed to treat still matter.
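As a small worked example, the three effect measures follow directly from event counts in a comparison group and an intervention group. The counts below are invented for illustration, and the function name is hypothetical.

```python
import math
from fractions import Fraction

def effect_measures(control_events, control_n, treated_events, treated_n):
    """Absolute risk reduction, relative risk reduction, and number needed to treat."""
    # Fractions keep the arithmetic exact so rounding noise cannot shift the NNT.
    control_risk = Fraction(control_events, control_n)
    treated_risk = Fraction(treated_events, treated_n)
    arr = control_risk - treated_risk
    rrr = arr / control_risk if control_risk else None
    # NNT is only meaningful when the intervention reduces risk; round up by convention.
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return {
        "arr": float(arr),
        "rrr": float(rrr) if rrr is not None else None,
        "nnt": nnt,
    }

# Invented counts: 20 of 100 events without the intervention, 15 of 100 with it.
effects = effect_measures(control_events=20, control_n=100, treated_events=15, treated_n=100)
print(effects)
```

Here a 25% relative risk reduction corresponds to a 5 percentage point absolute reduction and a number needed to treat of 20. The absolute figures are the ones that tell a department how much benefit to expect.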
The safest starting point is a disciplined stack: personal productivity first, team operations next, documentation support with review, clinical decision support only with validation and monitoring, and autonomous or regulated tools only with the governance they require.
The future of AI in emergency medicine should be physician-led, patient-centered, and operationally honest. That will not happen by waiting for perfect tools. It will happen when emergency physicians learn to ask better questions before the tools become invisible parts of the workflow.
EngagED prompt
What is one AI use case that has actually saved you time in the ED, and what is one use case you would not allow near patient care yet?
References
- American College of Emergency Physicians. Leading EM Organizations Issue Consensus Statement on Artificial Intelligence in EM. March 18, 2026. https://www.acep.org/news/acep-newsroom-articles/3-18-26-leading-em-organizations-issue-consensus-statement-on-artificial-intelligence-in-em
- American Medical Association. Augmented intelligence in medicine. Updated March 13, 2026. https://www.ama-assn.org/practice-management/digital-health/augmented-intelligence-medicine
- Office of the National Coordinator for Health Information Technology. HTI-1 Final Rule. https://healthit.gov/regulations/hti-rules/hti-1-final-rule/
- U.S. Food and Drug Administration. Artificial Intelligence in Software as a Medical Device. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-software-medical-device
- National Institute of Standards and Technology. AI Risk Management Framework. https://www.nist.gov/itl/ai-risk-management-framework
- HHS Office for Civil Rights. Guidance Regarding Methods for De-identification of Protected Health Information. https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html
- HL7 International. FHIR Release 4: Observation. https://hl7.org/fhir/r4/observation.html