HIE from First Principles

Health Information Exchange [HIE, the organized sharing of health data across otherwise separate systems] sounds, at first glance, like a plumbing problem. One system has data, another system wants it, and some dutiful bit of middleware carries the parcel across. This is how many projects are sold, and it is also how many of them acquire the haunted look of a tram depot at midnight. The real problem is not movement. Data moves very easily. CSV files move. HL7 [Health Level Seven, a family of healthcare messaging standards] messages move. FHIR [Fast Healthcare Interoperability Resources, a modern web-oriented standard for representing and exchanging health data] bundles move. What does not move so obediently is meaning, authority, identity, timing, and responsibility. An HIE exists because healthcare is full of facts that are locally generated, globally needed, and semantically unstable the moment they leave the room.

OpenHIE [Open Health Information Exchange, an open architectural framework for building national and regional HIEs] matters because it begins in a more serious place. It does not assume that interoperability is a single standard, a single database, or a single miracle API. It assumes instead that health systems are fragmented by design, that public health and clinical care ask different questions of the same events, that facilities and workers and patients all acquire multiple identities depending on who is counting, and that the exchange layer must survive this without pretending the mess is not there. That is a much better beginning.

The first principle of HIE is simple and brutal: healthcare data is born inside workflows, not inside information models. A clinic visit produces one kind of record because a nurse needs to triage, a physician needs to decide, a biller needs to justify, a program manager needs to report, and a ministry needs to count. Those are not the same act. They merely happen near one another and are later flattened into a thing called “the data,” as though it had descended from heaven pre-labeled and hygienic.

An HIE therefore is not chiefly a pipe. It is a negotiated civic arrangement for deciding which facts can be trusted, under what identity, with what provenance, at what latency, and for which downstream use. Once you see that, the usual confusions fall away. A Master Patient Index [MPI, a service that links records belonging to the same person across systems] is not just a convenience; it is an answer to the political and mathematical problem of personhood across fractured systems. A terminology service is not just a code table with delusions of grandeur; it is a machine for constraining meaning so that aggregation does not dissolve into poetry. A shared health record is not the patient chart for the whole country; it is a deliberately selected and normalized subset of clinically useful information, stored for exchange rather than for the full burden of care delivery.

OpenHIE’s great virtue is that it treats these as distinct architectural concerns. It gives you registries for the major nouns, an interoperability layer for the verbs, and shared services for the semantics. That separation sounds almost obvious until one has seen what happens without it. Then it seems less obvious and more like civilization.

Start with the bare minimum. A health system contains points of service: hospitals, primary care clinics, laboratory systems, pharmacy systems, community health worker tools, logistics systems, insurance platforms, public health platforms, and, lurking in the alley, spreadsheets. Each of these systems generates data for its own workflow. That local purpose matters. An Electronic Health Record [EHR, the digital clinical record used in care delivery] is optimized for care operations. A Health Management Information System [HMIS, a platform used for aggregate program and reporting data] is optimized for summary reporting. A Logistics Management Information System [LMIS, a system for managing supply chain and inventory data] is optimized for stock movement. Their records may refer to the same patient, facility, or event, but they are not the same representation of reality.

OpenHIE places an Interoperability Layer [IOL, the orchestrating exchange layer that receives, validates, routes, secures, and transforms communications] between these point-of-service systems and the shared services of the exchange. This is an important distinction. The IOL is not the truth. It is the traffic officer, customs desk, translator, and sometimes bouncer. It enforces transport and orchestration concerns: who may send, what may be accepted, where it should go next, which transformations are needed, which validations must pass, and how failures are handled.
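
A minimal sketch may make the traffic-officer role concrete. It is not OpenHIE's implementation or any product's API: the class names, message kinds, and destination labels below are all hypothetical, chosen only to show sender checks, validation, routing, and auditable failure handling living at the boundary.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    kind: str       # hypothetical message kinds, e.g. "lab_result"
    payload: dict


@dataclass
class ExchangeBoundary:
    allowed_senders: set
    routes: dict            # message kind -> destination name
    required_fields: dict   # message kind -> set of required payload keys
    audit_log: list = field(default_factory=list)

    def handle(self, msg: Message) -> str:
        """Return the destination name, or "rejected" with an audit entry."""
        if msg.sender not in self.allowed_senders:
            return self._reject(msg, "unknown sender")
        if msg.kind not in self.routes:
            return self._reject(msg, "no route for message kind")
        missing = self.required_fields.get(msg.kind, set()) - set(msg.payload)
        if missing:
            return self._reject(msg, f"missing fields: {sorted(missing)}")
        destination = self.routes[msg.kind]
        self.audit_log.append((msg.sender, msg.kind, "routed", destination))
        return destination

    def _reject(self, msg: Message, reason: str) -> str:
        # Failures are recorded, not silently dropped.
        self.audit_log.append((msg.sender, msg.kind, "rejected", reason))
        return "rejected"


iol = ExchangeBoundary(
    allowed_senders={"clinic-a"},
    routes={"lab_result": "shared_health_record"},
    required_fields={"lab_result": {"patient_id", "code"}},
)
print(iol.handle(Message("clinic-a", "lab_result", {"patient_id": "p1", "code": "x"})))
```

Note what the sketch does not do: it never decides whether the lab result is true, only whether it may pass and where it goes next.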

Behind that sits a set of canonical services that answer stable questions about unstable reality.

The Client Registry [CR, the service responsible for uniquely identifying care recipients across systems] addresses patient identity. In an ideal world, every person has one identifier and uses it consistently. In the world we inhabit, a mother may be registered under different names in antenatal care, immunization, and inpatient admission; a child may have no formal identifier; a refugee may have shifting documentation; spelling may wander; birthdays may be approximations. The CR does not eliminate this. It manages it. It provides matching, linking, and identity resolution under explicit rules.
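
What "matching under explicit rules" might look like can be sketched in a few lines. The weights, thresholds, and field names below are assumptions invented for illustration. Real registries typically rely on more careful probabilistic or rule-based methods tuned to local data. The point is only that the rules are written down, and that an uncertain middle band goes to human review instead of being forced into a merge or a split.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1]; tolerant of wandering spelling."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    # Illustrative weights only: name 0.6, birth year 0.3, sex 0.1.
    score = 0.6 * name_similarity(rec_a["name"], rec_b["name"])
    if rec_a.get("birth_year") and rec_a.get("birth_year") == rec_b.get("birth_year"):
        score += 0.3
    if rec_a.get("sex") and rec_a.get("sex") == rec_b.get("sex"):
        score += 0.1
    return score


def classify(rec_a: dict, rec_b: dict, link_at: float = 0.85, review_at: float = 0.6) -> str:
    """Return "link", "review", or "distinct" under explicit thresholds."""
    score = match_score(rec_a, rec_b)
    if score >= link_at:
        return "link"
    if score >= review_at:
        return "review"   # uncertain pairs go to a person, not to a guess
    return "distinct"


antenatal = {"name": "Amina Yusuf", "birth_year": 1990, "sex": "F"}
inpatient = {"name": "Amina Yusuf", "birth_year": 1991, "sex": "F"}
print(classify(antenatal, inpatient))
```

Moving either threshold trades false merges against false splits, which is why the thresholds belong to governance and not to whoever last touched the script.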

The Facility Registry [FR, the service that maintains the authoritative list of health facilities and their attributes] addresses place. This sounds dull until one recalls how much healthcare data depends on where a service occurred, where the specimen was processed, where the inventory sits, where the patient was referred, and which administrative hierarchy claims the activity. A facility list that drifts is not merely untidy. It warps reporting, financing, referral logic, catchment analysis, and workforce planning.

The Health Worker Registry [HWR, the service that maintains identities and attributes for providers and other health workers] addresses agency. Clinical orders, referrals, signatures, permissions, supervision chains, and accountability all depend on who did what, under which role, in which facility, during which period. When this becomes muddy, governance becomes theater.

The Terminology Service [TS, the service that manages code systems, value sets, mappings, and semantic definitions] addresses meaning. Here lies one of the most neglected truths in interoperability. A blood pressure can be transported perfectly and still be semantically wrong for its downstream use. Was the patient sitting or standing? Was it a device observation or a manually entered retrospective note? Is the diagnosis code local, national, ICD [International Classification of Diseases, a coded classification for diseases], SNOMED CT [Systematized Nomenclature of Medicine Clinical Terms, a detailed clinical terminology], or something improvised during a long afternoon? Transport standards can carry all of these. Only semantic governance decides what they mean together.
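
A toy translation table shows the difference between carrying a code and knowing what it means. The system names, codes, and mappings here are invented for the example and do not correspond to any real code system or terminology server. The one design choice worth noticing is that a code is meaningful only together with its system, and that an unmapped code yields nothing, not a plausible guess.

```python
from typing import NamedTuple, Optional


class Coding(NamedTuple):
    system: str   # which code system the code belongs to
    code: str


# Hypothetical concept map: source coding -> governed national coding.
CONCEPT_MAP = {
    Coding("local-clinic-a", "HTN"): Coding("national", "hypertension"),
    Coding("local-clinic-b", "HBP"): Coding("national", "hypertension"),
}


def translate(coding: Coding) -> Optional[Coding]:
    """Return the mapped national coding, or None when no mapping is governed."""
    return CONCEPT_MAP.get(coding)


print(translate(Coding("local-clinic-a", "HTN")))
print(translate(Coding("local-clinic-c", "HTN")))  # same string, different system
```

The second call returns None even though the string "HTN" looks familiar. Refusing to translate it is the semantic governance; the transport layer would have delivered it either way.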

The Shared Health Record [SHR, a normalized repository used to share selected patient-level clinical data across systems] is where many people imagine the whole HIE resides. It does not. The SHR is a carefully chosen exchange surface. It stores enough normalized clinical information to support continuity of care and certain cross-system workflows. It is not necessarily the medico-legal source chart, not necessarily the analytics warehouse, and not necessarily the operational system of record for every clinical fact. Confusing it with those other things is one of the faster ways to produce grief.

This is where a non-obvious architectural point becomes useful. OpenHIE separates reference identity services from transactional clinical exchange because stability lives in different places. Patient demographics are revised. Facility attributes change. Health worker roles change. Clinical events occur and are then interpreted, corrected, or superseded. Trying to handle all of this in one monolithic “national health database” creates brittle coupling between governance domains that change at different rates and under different ownership. OpenHIE’s decomposition is not aesthetic. It is a recognition that healthcare facts have different half-lives.

Now consider standards. HL7 v2 messages are often event-oriented and operational. They are excellent at saying that something happened in a local workflow and needs to be known elsewhere. CDA [Clinical Document Architecture, a document-based standard for structured clinical documents] packages narrative and structure together, often preserving legal and human-readable context better than more granular models, but at the cost of awkward extraction and uneven computability. FHIR offers finer-grained resources, web-native patterns, and better modularity, but it also tempts architects into believing that resource availability equals semantic agreement. It does not. A Patient resource, an Observation resource, and a Condition resource can be exchanged with exquisite technical grace while still failing to answer the receiving system’s real question.

This is the distinction between transport and meaning. Transport asks whether the parcel arrived. Semantics asks whether the recipient can tell whether it is a letter, a lab result, a promise, or a legal threat. HIE failures are regularly blamed on “data quality” when the actual problem is representational mismatch. The source system may be internally consistent. The receiver may be perfectly well built. The failure occurs because one workflow’s representation of reality is being asked to behave like another workflow’s representation of reality. That is not dirt in the data. That is a category error.

The first failure point is identity overconfidence. Teams assume that matching people is basically a solved problem plus a little fuzzy logic. It is not. Matching is an ongoing governance function with clinical consequences. False positives merge two lives. False negatives split one life into several administrative ghosts. In high-volume, multilingual, mobility-heavy environments, this problem can become the silent tax on the whole architecture.

The second failure point is canonical model vanity. Architects often create a grand unified model, pronounce it canonical, and assume normalization will rescue all downstream use cases. But normalization always throws something away. It may discard narrative nuance, timing details, local workflow context, uncertainty states, or program-specific fields that mattered in the source setting. Later, when analytics or surveillance or referral management needs that lost context, the complaint arrives as “bad data quality.” In truth, the data may have been pristine before the architecture politely sanded off its edges.

The third failure point is putting governance in the middleware and calling it architecture. Interface engines and orchestration tools can route, transform, retry, enrich, and log, but they cannot by themselves settle questions of authority. Who owns the facility list? Who approves provider role definitions? Which diagnosis coding system is official for which workflow? Which record is the source of truth for the patient’s address? Which latency is acceptable for immunization history versus stock availability? These are governance questions that software can enforce only after someone has the courage to answer them.

The fourth failure point is temporal ambiguity. Healthcare data is riddled with time, and not just one kind of time. There is event time, entry time, correction time, transmission time, publication time, and query time. A referral created on Monday, transmitted on Tuesday, corrected on Wednesday, and consumed on Thursday may all look like “the referral” unless the architecture preserves provenance and temporal semantics. Without that, dashboards lie, alerts misfire, and longitudinal analysis becomes a gentlemanly fiction.
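
The Monday-to-Thursday referral can be sketched as data. This is an illustration under assumed names, not a schema from any standard: each version keeps the time the event happened apart from the time that version was recorded, so a consumer can ask what was known at a given moment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ReferralVersion:
    referral_id: str
    event_time: datetime      # when the referral decision was made
    recorded_time: datetime   # when this version was entered or corrected
    destination: str


def as_of(versions: list, when: datetime) -> Optional[ReferralVersion]:
    """Latest version already recorded at `when`, or None if none existed yet."""
    known = [v for v in versions if v.recorded_time <= when]
    return max(known, key=lambda v: v.recorded_time, default=None)


monday = datetime(2024, 1, 1, 9, 0)
wednesday = datetime(2024, 1, 3, 14, 0)
history = [
    ReferralVersion("r1", monday, monday, "District Hospital"),
    ReferralVersion("r1", monday, wednesday, "Regional Hospital"),  # correction
]

tuesday_view = as_of(history, datetime(2024, 1, 2, 12, 0))
thursday_view = as_of(history, datetime(2024, 1, 4, 12, 0))
print(tuesday_view.destination, "then", thursday_view.destination)
```

Both answers are correct for their moment. A dashboard that keeps only the latest row cannot explain why Tuesday's alert pointed at the wrong hospital.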

The fifth failure point is assuming that shared records and analytics repositories are interchangeable. They are cousins, not twins. An SHR supports cross-system clinical retrieval and workflow continuity. A warehouse supports denormalized, query-heavy, often delayed analytical use. The moment one tries to make the SHR serve as the national reporting mart, or the warehouse serve real-time care continuity, one begins to hear strange grinding noises in both performance and semantics.

The sixth failure point is terminology drift. Local systems age. Programs change. Code sets fork. Data entry practices mutate. One site records hypertension under a local label mapped loosely to a national concept; another captures blood pressure observations without a diagnosis; a third updates its value sets quietly after a donor-funded project closes. To the dashboard, this becomes variation. To the clinician, it becomes omission. To the architect, it should be recognized as unmanaged semantic entropy.

Then there is the organizational failure that hides inside the technical one. Fragmented ownership is encoded in the data model whether anyone admits it or not. If the immunization team owns one registry, the maternal health program owns another, laboratories answer elsewhere, and financing sits in a separate bureaucratic republic, the architecture will inherit those borders. You may integrate across them, but you will not abolish them with a standards document. What persists in org charts reappears in payloads.

Why does this problem persist, even when the standards are better, the tooling is better, and everyone has by now attended quite enough workshops to know the slogans by heart? Because HIE is not merely a technical deficit. It is the place where institutional incentives become visible in machine-readable form.

Healthcare systems generate data as a side effect of care, reporting, reimbursement, regulation, supervision, procurement, and political visibility. Those motives are not aligned. A clinician wants just enough structure to support safe care without spending the afternoon feeding a machine. A ministry wants comparable data across districts. A donor program wants indicators. A payer wants adjudication logic. A public health unit wants surveillance timeliness. Each of these pulls the architecture toward a different representational style. When these styles collide, the result is often mislabeled as poor data quality. But the real issue is representational pluralism under conflicting incentives.

This is why OpenHIE is useful from first principles. It does not promise to dissolve these tensions. It offers a way to compartmentalize them. Identity can be governed as identity. Facility authority can be governed as facility authority. Worker roles can be governed separately. Terminology can be managed as a semantic asset rather than as an afterthought buried in application tables. Clinical exchange can be routed and validated without pretending that routing itself has settled meaning.

There is also a historical reason. Many health systems, especially at national scale, did not begin as coherent digital organisms. They accreted. A tuberculosis register appeared here, a maternal program there, a district reporting platform somewhere else, then a laboratory system, then insurance, then a mobile app for community health workers, then an emergency donor requirement, then a pandemic workaround, then a spreadsheet that no one loves but everyone fears. This is not a moral failure. It is how institutions under constraint survive. The resulting architecture is path dependent. Shadow workflows become permanent. Human workaround becomes policy. The data model starts to resemble an archaeological dig.

One deeper truth deserves stating plainly. Early-binding transformation, where source data is heavily mapped into a target model as it arrives, can make downstream systems seem clean and orderly. It can also lock in today’s assumptions so tightly that tomorrow’s questions become impossible to answer. Late-binding approaches preserve more original context and defer some interpretation to downstream use, but they impose more discipline on provenance, storage, and query logic. There is no universally pure answer here. But there is a common mistake: binding early because governance is weak, then later discovering that the architecture has become a museum of old assumptions.
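
The trade-off can be shown in miniature. The function names, field names, and mapping tables below are hypothetical, and a real system would bind many more fields than one code. Still, the sketch shows the essential asymmetry: the early-bound record cannot benefit from a mapping that is agreed only later, while the late-bound one can.

```python
def early_bind(raw: dict, mapping: dict) -> dict:
    """Map on arrival and keep only the result; the source code is discarded."""
    return {"concept": mapping.get(raw["local_code"], "unknown")}


def late_bind_store(raw: dict) -> dict:
    """Keep the source record untouched; interpretation is deferred."""
    return {"raw": dict(raw)}


def late_bind_read(stored: dict, mapping: dict) -> str:
    """Interpret at read time, under whatever mapping is current."""
    return mapping.get(stored["raw"]["local_code"], "unknown")


mapping_v1 = {"HTN": "hypertension"}
mapping_v2 = {"HTN": "hypertension", "GDM": "gestational diabetes"}  # agreed later

incoming = {"local_code": "GDM", "site": "clinic-a"}
early = early_bind(incoming, mapping_v1)
late = late_bind_store(incoming)

print(early["concept"])
print(late_bind_read(late, mapping_v2))
```

The late-bound path pays for its flexibility elsewhere: every reader must now know which mapping version it applied, which is a provenance obligation and not a free lunch.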

Begin by refusing the fantasy of a single source of truth for all healthcare data. Truth in healthcare is distributed. What you can achieve is source-of-truth by domain and use. Let the CR govern person identity resolution. Let the FR govern facility authority. Let the HWR govern worker identity and role context. Let the TS govern official semantics, mappings, and value sets. Let clinical systems remain clinically authoritative for the parts of the record that are workflow-coupled and operationally volatile. Let the SHR hold what must be shared, not everything that exists.
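
Source-of-truth by domain can be written down as plainly as a lookup table. The domain labels and service names below are placeholders for this sketch and do not come from any OpenHIE specification. What matters is that when two systems disagree, the answer comes from a declared authority and not from whichever message arrived last.

```python
from typing import Optional

# Hypothetical declaration of which service is authoritative for which domain.
AUTHORITY_BY_DOMAIN = {
    "person_identity": "client_registry",
    "facility": "facility_registry",
    "health_worker": "health_worker_registry",
    "semantics": "terminology_service",
}


def is_authoritative(domain: str, service: str) -> bool:
    """True only when `service` is the declared authority for `domain`."""
    return AUTHORITY_BY_DOMAIN.get(domain) == service


def resolve_conflict(domain: str, assertions: dict) -> Optional[str]:
    """Pick the value asserted by the domain's authority, or None if it is silent."""
    authority = AUTHORITY_BY_DOMAIN.get(domain)
    return assertions.get(authority)


facility_names = {
    "facility_registry": "District Hospital A",
    "reporting_platform": "Dist. Hosp A (old)",
}
print(resolve_conflict("facility", facility_names))
```

The table is trivial to code and hard to agree on, which is the right way round.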

Design the IOL as a policy-enforcing exchange boundary, not as a magic blender. It should authenticate, authorize, validate, route, transform where necessary, and capture failures with auditable detail. But it should not become the secret home of every business rule in the country. When business logic migrates into opaque middleware scripts, the architecture begins to rot in places no one can govern.

Treat provenance as a first-class design object. Every exchanged fact should carry enough metadata to answer uncomfortable questions later: where it came from, when it was recorded, who asserted it, whether it was transformed, under which terminology version, and whether it is original, corrected, or derived. Without provenance, longitudinal care becomes guesswork and analytics becomes rhetoric with charts.
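
One way to keep those uncomfortable questions answerable is to make the metadata a required part of the fact itself. The structure below is a sketch with illustrative field names that mirror the list above. It is not the FHIR Provenance resource or any other standard's schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from enum import Enum


class Status(Enum):
    ORIGINAL = "original"
    CORRECTED = "corrected"
    DERIVED = "derived"


@dataclass(frozen=True)
class Provenance:
    source_system: str         # where it came from
    recorded_at: datetime      # when it was recorded
    asserted_by: str           # who asserted it
    transformed: bool          # whether it was transformed in transit
    terminology_version: str   # which terminology version applied
    status: Status             # original, corrected, or derived


@dataclass(frozen=True)
class ExchangedFact:
    name: str
    value: object
    provenance: Provenance


def missing_provenance(p: Provenance) -> list:
    """Names of provenance fields left empty (None or an empty string)."""
    return [f.name for f in fields(p) if getattr(p, f.name) in (None, "")]


fact = ExchangedFact(
    name="systolic_bp",
    value=132,
    provenance=Provenance(
        source_system="clinic-a-ehr",
        recorded_at=datetime(2024, 1, 3, 10, 15),
        asserted_by="",            # nobody recorded who took the reading
        transformed=True,
        terminology_version="2024-01",
        status=Status.ORIGINAL,
    ),
)
print(missing_provenance(fact.provenance))
```

A boundary could refuse, quarantine, or merely flag facts with gaps like this one. Which of those it does is a policy decision, but it can only be made if the gap is visible.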

Separate exchange models from analytical models. This sounds almost embarrassingly obvious, yet systems repeatedly ignore it. An exchange-oriented SHR should optimize for clinically relevant retrieval and interoperable sharing. An analytical platform should optimize for longitudinal queries, denormalized access patterns, cohort logic, and reproducibility. Trying to make one serve as the other usually produces a structure that is too stiff for care and too thin for analysis.

Invest in terminology governance earlier than feels comfortable. Not because it is glamorous, which it is not, but because unmanaged semantics are like damp in an old building: invisible at first, then suddenly everywhere. Establish official code systems, mapping authorities, version policies, deprecation rules, and validation pathways. Make semantic changes explicit, reviewable, and testable.
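
Explicit, reviewable, and testable semantic change can be made tangible with a versioned value set. The class, the codes, and the version labels here are hypothetical and stand in for whatever a real terminology service provides. The sketch shows two habits: deprecated codes point to a replacement instead of vanishing, and the difference between two versions can be computed and reviewed before release.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ValueSetVersion:
    version: str
    active: frozenset
    deprecated: dict = field(default_factory=dict)  # old code -> replacement


def validate(code: str, vs: ValueSetVersion) -> tuple:
    """Return (status, usable_code) for a code under one value set version."""
    if code in vs.active:
        return ("ok", code)
    if code in vs.deprecated:
        return ("deprecated", vs.deprecated[code])
    return ("unknown", None)


def diff(old: ValueSetVersion, new: ValueSetVersion) -> dict:
    """Summarize a version change so that it can be reviewed before release."""
    return {
        "added": sorted(new.active - old.active),
        "removed": sorted(old.active - new.active),
    }


v1 = ValueSetVersion("2023-07", frozenset({"HTN", "DM"}))
v2 = ValueSetVersion("2024-01", frozenset({"HTN", "DM1", "DM2"}), {"DM": "DM2"})

print(diff(v1, v2))
print(validate("DM", v2))
```

A site still sending the old code gets a named status and a suggested replacement, which is a good deal kinder than a silent drop in its district totals.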

Adopt standards pragmatically. HL7 v2 remains useful where operational event messaging is entrenched and reliable. CDA can still be appropriate where document fidelity and narrative integrity matter. FHIR is powerful where granular retrieval, profile-driven exchange, and modern integration patterns are needed. The sensible question is not which standard is best in the abstract. It is which representational form best preserves meaning for the workflow boundary being crossed.

Use OpenHIE not as a product checklist but as an architectural discipline. The point is not to deploy every component because the diagram has them. The point is to understand which shared concerns must be separated so they can be governed cleanly. Some countries or regions will begin with a facility registry and an interoperability layer. Others will need a client registry first. Others may establish a terminology service only after painful failure has taught them manners. Sequence matters. Purity matters less.

And keep one final practical thought in view. Many “data quality” crises are really architecture admitting, in a cracked voice, that the system never agreed on what the data was supposed to mean once it crossed a boundary. When a blood pressure, a diagnosis, a facility code, or a provider identifier fails downstream, do not begin by scolding the clerks. Begin by asking which workflow produced the fact, which semantics governed it there, what transformation touched it, and what new use you are now trying to force upon it. Those questions are less theatrical and much more useful. They are also, in the long run, where real interoperability begins.