How VA Healthcare Data Systems Work: From MUMPS to SQL

By
IMG 20260425 WA0005

The Quiet Behemoth in the Basement

Somewhere, in a server room that smells of ozone and institutional anxiety, a database written in a language older than the internet itself is still deciding whether a veteran in Omaha gets their medication refilled today.

This is not a joke. This is the Veterans Health Information Systems and Technology Architecture—VistA, to its friends, though it has few—and it has been the digital backbone of the largest integrated healthcare system in the United States for over four decades. Four decades. That is longer than most of the people reading this have been alive, and certainly longer than most modern programming languages have existed in any meaningful form. VistA runs on MUMPS, which stands for Massachusetts General Hospital Utility Multi-Programming System, a name that sounds like it was generated by a committee of exhausted clinicians in 1966, which, in fact, it was.

MUMPS is not a database. Not exactly. It is a programming language with a built-in hierarchical database, or perhaps a database that happens to be programmable, or perhaps a conceptual error that accidentally became the most reliable healthcare data system in American history. The distinction matters less than you might think. What matters is that VistA, built atop MUMPS and its companion FileMan (a data management utility that sounds like a superhero from a very boring comic), has stored the medical records of millions of veterans across hundreds of hospitals and thousands of clinics, and it has done so with a stubborn, creaking, almost heroic reliability that modern cloud-native microservices can only dream of.

But dreams are contagious. And the dream of the modern VA is to move this data—this vast, sprawling, deeply idiosyncratic accumulation of clinical encounters, prescriptions, lab results, radiology reports, and the occasional note scribbled in a language that only one retired physician in Biloxi can still decipher—into the clean, well-lighted world of SQL databases and data warehouses.

This is not a simple migration. It is not a matter of exporting a CSV and importing it into PostgreSQL. It is a translation between two fundamentally different ways of thinking about information, two epistemologies of data, two civilizations that happen to share a border but do not share a language.

Who Built This, and Why Are They Still Here?

The story begins, as so many American technological stories do, with a war and a bureaucracy and a group of people who were told something was impossible.

In the late 1960s, the Veterans Administration—now the Department of Veterans Affairs—was a sprawling network of hospitals that shared information through paper, telephones, and the slow osmosis of institutional memory. Computers existed, but they were expensive, finicky, and mostly used for billing. The idea that a physician in San Francisco might instantaneously access the medical history of a veteran who had been treated in Boston was science fiction, and not the optimistic kind.

Then came MUMPS. Developed by Neil Pappalardo and his colleagues at Massachusetts General Hospital, MUMPS was designed for a specific problem: how to store and retrieve medical data quickly on the limited hardware of the era. It was not elegant. It was not theoretically beautiful. It was fast, it was compact, and it could run on machines with less memory than your smart toaster. MUMPS used a hierarchical data model, which means data is stored in a tree-like structure of nodes and subnodes, rather than the flat tables of relational databases. Think of it as a filing cabinet where each drawer contains smaller filing cabinets, which contain even smaller filing cabinets, and somewhere at the bottom of this infinite regression is the actual piece of paper you need.

FileMan, developed by George Timson and others at the VA in the 1970s, was the layer that made MUMPS usable by mere mortals. It provided a schema definition system, a query language of sorts, and the ability to define relationships between data elements without requiring a computer science degree. FileMan is, in many ways, the true genius of the VistA system: it allowed clinicians, not programmers, to define the structure of medical data. A nurse in Wichita could create a new data field for tracking a specific clinical protocol, and that field would propagate through the system with a flexibility that modern enterprise software, with its change advisory boards and six-month release cycles, finds almost offensive.

The people who built this system were, by and large, VA employees. They were not Silicon Valley wunderkinds. They were not chasing stock options. They were civil servants, clinicians, and the occasional eccentric programmer who found in MUMPS a kind of perverse aesthetic satisfaction. Many of them are still here, in the sense that their code is still running, their design decisions still constraining what the system can and cannot do, their comments in the source code still providing the only documentation for why something works the way it does.

This matters because institutional knowledge is not merely a resource; it is an ecosystem. When you migrate from MUMPS to SQL, you are not just moving data. You are moving—or losing—decades of embedded expertise, of tacit understanding, of the kind of knowledge that lives in the fingers of people who have been debugging the same system since the Carter administration.

What Is This Thing, Exactly?

VistA is not a single program. It is an architecture, a philosophy, a collection of over 100 integrated software packages that handle everything from patient registration to pharmacy management to laboratory reporting to scheduling to billing to the electronic health record itself. It is, in the terminology of software engineering, a monolith. Not a microservice in sight. Not a container. Not a Kubernetes cluster. Just a vast, interlocking set of routines written in MUMPS, stored in a global database (the term “global” in MUMPS refers to a persistent, disk-resident variable, not a worldwide system, though the confusion is poetic), and accessed through a terminal emulator that makes the user interface look like something designed for a 1980s airport check-in counter.

The core of VistA is the electronic health record, which in VA parlance is called the Computerized Patient Record System, or CPRS. CPRS is the interface that clinicians see, and it is, by modern standards, almost aggressively ugly. Green text on black background. Function keys. No mouse required, or particularly useful. And yet, and yet—clinicians who use it often love it, or at least respect it with the grudging affection one reserves for a tool that never fails when it matters. CPRS is fast. It is predictable. It does not hide information behind animated loading screens or conversational AI chatbots. It presents the data, all of it, immediately, in a format that a physician can read and act upon in seconds.

Underneath CPRS is the data layer, which is where MUMPS and FileMan live. FileMan defines the schema, but the schema is not rigid in the way that a SQL schema is rigid. In a relational database, you define your tables, your columns, your data types, your foreign keys, and then you live within that structure. If you need to add a new field, you alter the table, and if you have a lot of data, that alter operation can take hours or days and require careful planning and downtime windows. In FileMan, you can add a new field to a file (the MUMPS equivalent of a table) in seconds, and the system will happily store whatever you put in it, with minimal type checking and maximal flexibility.

This flexibility is both the genius and the horror of the system. It allows rapid adaptation to clinical needs. It also means that the same field might contain a date, a string, a number, or a reference to another file, depending on who entered the data and when and what they were thinking at the time. Data quality in VistA is a known issue, not because the system is poorly designed, but because it was designed for a world in which data entry was performed by humans under stress, and in which the system prioritized capturing information over enforcing strict validation.

When Did the World Change?

The question of when is not simple. VistA did not emerge fully formed; it evolved, accreted, grew like a coral reef, layer upon layer of functionality added in response to specific clinical needs, specific regulatory requirements, specific crises.

The 1970s were the foundational era. The Decentralized Hospital Computer Program, or DHCP, established the architecture and the principle that each VA medical center would have its own local VistA instance, with data shared through a network that was, by the standards of the time, remarkably advanced. The 1980s saw the spread of VistA across the VA system, the development of CPRS, and the gradual realization that this system was actually working, was actually improving care, was actually saving lives by preventing medication errors and ensuring continuity of care for veterans who moved between facilities.

The 1990s brought the internet, and with it the first serious questions about whether MUMPS could survive in a networked world. The answer, surprisingly, was yes. MUMPS databases could be accessed over TCP/IP. Web interfaces could be built. The system adapted, not because it was designed for the internet, but because its fundamental simplicity made it surprisingly portable. A MUMPS global is just a string of characters on disk, accessed through a simple API. There is no complex query optimizer, no ORM layer, no middleware. Just data, and code that reads and writes it.

The 2000s brought the Bush administration’s push for electronic health records, the Obama administration’s billions in health IT funding, and the gradual realization that the rest of American healthcare was building systems that were, in many ways, worse than VistA. The commercial EHR vendors—Epic, Cerner, Allscripts—built systems that were prettier, more marketable, and often more frustrating to use. The VA’s own attempts to replace VistA with commercial systems, notably the failed iEHR (integrated Electronic Health Record) initiative with the Department of Defense, became cautionary tales of what happens when you try to replace a system that works with a system that is supposed to work better but does not.

The 2010s and 2020s have been the era of data migration, data warehousing, and the slow, grinding attempt to extract value from the vast repository of VistA data. The VA established the Corporate Data Warehouse, or CDW, which is not a replacement for VistA but a companion: a SQL-based system that extracts data from VistA, transforms it into a relational format, and makes it available for reporting, analytics, and research. This is where the MUMPS-to-SQL translation happens, and it is where the real complexity lies.

Where Does This Translation Happen?

The translation happens in the shadows. In the ETL pipelines—Extract, Transform, Load—that run in the hours between midnight and dawn, when the clinicians are sleeping and the servers can be burdened with the heavy work of reading every record in a MUMPS global and figuring out what it means in SQL terms.

The Corporate Data Warehouse is the primary destination. CDW is a massive SQL Server installation, though the VA has been experimenting with other platforms, that attempts to impose relational structure on the hierarchical chaos of VistA. It does this through a combination of automated mapping, manual curation, and the occasional prayer.

The mapping process is where the epistemological collision occurs. In MUMPS, a patient’s medication history might be stored in a global that looks something like this: ^PS(55,12345,5,1,0)="ASPIRIN^80^20230115^...". This is a string. It contains multiple data elements separated by caret characters. The first element is the drug name, the second is the dosage, the third is the date, and so on. To translate this into SQL, you need to parse this string, split it on the carets, map each element to a column in a table, handle the cases where the string is malformed or missing elements or contains unexpected data, and then insert the result into a table with a predefined schema.

This sounds straightforward. It is not. Because VistA was developed over decades by thousands of programmers, the format of these strings is not always consistent. The same global might contain data in different formats depending on when it was entered. There are “pointer” fields that reference other files, but the references are not always valid. There are “word-processing” fields that contain free text, sometimes entire clinical notes, stored as a series of subnodes that must be concatenated and cleaned. There are fields that were added by local modifications at a specific VA facility and exist nowhere else in the system.

The ETL process must handle all of this. It must be robust enough to process millions of records without failing, but flexible enough to accommodate the endless variety of edge cases. It must be auditable, because healthcare data is regulated by HIPAA and a host of other laws that require you to know exactly what happened to every piece of patient information. And it must be reversible, or at least traceable, because when a researcher finds an anomaly in the CDW data, someone needs to be able to trace it back to the original MUMPS record and figure out whether the anomaly is in the data or in the translation.

This happens in data centers, in cloud environments, in the offices of contractors who specialize in the arcane art of VistA extraction. It happens in the minds of the few remaining programmers who can read MUMPS code and understand what it is doing, who can look at a global and know, instinctively, whether the data it contains is trustworthy or suspicious.

Why Does Any of This Matter?

It matters because veterans matter. The VA is not a commercial enterprise. It is a promise, made by a nation to the people who fought its wars, that their healthcare will be provided, free of charge, for the rest of their lives. This promise is kept, imperfectly but persistently, by a system that treats millions of patients, many of them with complex, chronic conditions that require coordination across multiple specialties, multiple facilities, multiple decades.

The data in VistA is not abstract. It is the record of a veteran’s service, their injuries, their treatments, their medications, their progress, their setbacks. It is the memory of the system, and like human memory, it is messy, incomplete, sometimes contradictory, but ultimately indispensable. When you translate this data into SQL, you are not performing a technical exercise. You are translating lives. You are creating the possibility that a researcher, querying the Corporate Data Warehouse, will discover a pattern in medication responses that leads to a new treatment for PTSD. You are creating the possibility that a quality improvement officer will identify a facility that is underperforming and intervene before more veterans are harmed. You are creating the possibility that a veteran, moving from one VA hospital to another, will have their complete medical history available instantly, rather than waiting days for records to be faxed.

But the translation also matters because it is a case study in the larger problem of technological legacy. Every industry, every institution, every society is built on layers of technology that were designed for a different era, with different assumptions, different constraints, different dreams. The bridges we drive across, the power grids we depend on, the legal codes we live under—all of these are legacy systems, and all of them require translation into modern terms if they are to remain useful. The VA’s struggle to move from MUMPS to SQL is not unique. It is universal. It is the human condition, expressed in code.

How Does the Translation Actually Work?

Let us descend into the mechanics. Let us open the hood and look at the engine, which is not an engine at all but a series of nested routines that would make a computer science professor weep.

The first step is extraction. The ETL process connects to the VistA system, either through a direct database connection or through an intermediate format like a VA FileMan export, and reads the globals. This is harder than it sounds because MUMPS databases are not designed for bulk extraction. They are designed for transactional access: one record at a time, read quickly, updated quickly. Reading an entire global, which might contain billions of nodes, is like trying to drink a lake through a straw. It requires patience, optimization, and the willingness to accept that the process will take hours.

The extraction process must also handle concurrency. VistA is a live system. Clinicians are entering data while the extraction is running. The ETL process must either run against a snapshot (which requires stopping the database, which is unacceptable in a 24/7 healthcare environment) or handle the fact that the data is changing while it is being read. This is typically done through timestamp-based extraction, reading only records that have changed since the last extraction, or through transaction log reading, which is possible in some MUMPS implementations but not in others.

The second step is transformation. This is where the real work happens. The extracted data, in its raw hierarchical form, must be mapped to a relational schema. This mapping is not one-to-one. A single MUMPS global might map to multiple SQL tables. A single SQL table might draw data from multiple globals. Relationships must be inferred from pointer fields, which are often just numeric identifiers that reference other files. Data types must be converted: MUMPS stores everything as strings, so dates must be parsed, numbers must be validated, booleans must be inferred from the presence or absence of specific characters.

The transformation process must also handle the semantic richness of clinical data. A medication order in VistA is not just a drug name and a dosage. It is a complex clinical event with multiple statuses (pending, active, discontinued, expired), multiple providers (ordering, verifying, administering), multiple dates (ordered, started, stopped, renewed), and multiple associated documents (the order itself, the administration record, the pharmacy fill record, the billing record). Translating this into SQL requires a deep understanding of clinical workflows, of how medications actually move through a healthcare system, of the difference between an order and a prescription and a dispense and an administration.

The third step is loading. The transformed data is inserted into the SQL database, which is typically optimized for analytical querying rather than transactional processing. This means denormalized tables, star schemas, precomputed aggregates, and all the other techniques of data warehousing. The load process must be efficient—inserting millions of rows into a SQL database is not trivial—and it must be reliable. If the load fails halfway through, the warehouse is in an inconsistent state, and someone must figure out what went wrong and how to fix it without corrupting the data.

The final step, which is often overlooked but critically important, is validation. The data in the SQL warehouse must be checked against the source data in VistA. This is done through reconciliation queries, through sampling and manual review, through automated checks that compare counts and sums and distributions. The goal is not perfection—perfection is impossible when translating between fundamentally different data models—but sufficiency. The warehouse must be good enough for its intended purposes: reporting, research, quality improvement, operational analytics.

Which Technologies Make This Possible?

The technologies are a patchwork, a palimpsest, a layer cake of historical accident and deliberate choice.

At the bottom is MUMPS itself, which persists in several implementations. The most common in the VA is GT.M (now known as YottaDB), an open-source MUMPS database that runs on Linux and provides the transactional backbone for VistA. GT.M is fast, reliable, and almost entirely devoid of the features that modern developers expect: no JSON support, no REST API, no ORM, just globals and routines and the relentless efficiency of a system that was optimized for hardware with less power than a modern smartphone.

On top of GT.M is FileMan, which provides the schema layer. FileMan is not a database management system in the modern sense, but it is the closest thing VistA has to one. It allows the definition of files, fields, data types, and relationships, and it provides a query language (the FileMan print template system) that can extract data in various formats. FileMan is also the interface through which most VistA customization happens: new clinical applications, new data fields, new workflows are typically implemented as FileMan files and MUMPS routines.

The extraction layer typically uses a combination of MUMPS routines (written in the MUMPS language itself) and external tools. The VA has developed various extraction utilities over the years, some of them official products, some of them local innovations that spread through the system by word of mouth and shared code. These utilities read the globals and output data in formats that can be consumed by the ETL process: flat files, HL7 messages, XML documents, or increasingly, direct database connections.

The ETL layer is where the modern world begins to appear. The VA uses commercial ETL tools like Informatica, Microsoft SQL Server Integration Services (SSIS), and Talend, as well as custom-built pipelines in Python, Java, and other languages. These tools provide the scheduling, monitoring, error handling, and transformation logic that turn raw VistA extracts into structured SQL data. They run on servers, in virtual machines, increasingly in cloud environments like Amazon Web Services or Microsoft Azure, though the VA’s migration to the cloud has been cautious and politically fraught.

The SQL layer is typically Microsoft SQL Server, which the VA has used for the Corporate Data Warehouse since its inception. SQL Server provides the relational engine, the analytical services, the reporting tools, and the integration with the rest of the Microsoft ecosystem that the VA has standardized on. There are experiments with other platforms—PostgreSQL, Oracle, cloud-native warehouses like Snowflake or BigQuery—but SQL Server remains the dominant technology.

On top of the SQL warehouse are the analytics and reporting tools. The VA uses Business Objects, Tableau, Power BI, and a host of custom applications to query the CDW and present results to clinicians, administrators, and researchers. These tools speak SQL, which is why the translation from MUMPS to SQL is necessary: they cannot query MUMPS directly, and even if they could, the hierarchical structure of MUMPS data is poorly suited to the tabular presentations that these tools expect.

And finally, there is the emerging layer of machine learning and artificial intelligence, which promises to extract insights from the CDW that would be impossible to discover through traditional querying. Predictive models for readmission risk, natural language processing of clinical notes, image analysis of radiology studies—these technologies depend on the CDW as their data source, and they depend on the quality of the translation from MUMPS to SQL for their accuracy.

The Misconceptions, the Trade-offs, the Unresolved Questions

There is a misconception that this migration is simply a matter of technology, of replacing an old database with a new one. It is not. It is a matter of ontology, of how we choose to represent reality in digital form. MUMPS and SQL are not just different storage formats. They are different ways of thinking about what data is and how it relates to the world.

MUMPS is organic. It grows. It adapts. It tolerates ambiguity and inconsistency because the clinical world is ambiguous and inconsistent. A patient might have two different diagnoses from two different providers, and MUMPS will store both without judgment. SQL is normative. It enforces. It requires decisions: which diagnosis is primary? Which is secondary? How do we resolve conflicts? These are not technical questions. They are clinical questions, ethical questions, questions about the nature of medical truth.

The trade-off is between flexibility and consistency. MUMPS is flexible to a fault. SQL is consistent to a fault. The CDW attempts a middle ground, but the middle ground is unstable. Every transformation rule is a decision about what to preserve and what to discard, what to normalize and what to leave messy, what to trust and what to question. These decisions are made by committees, by contractors, by individual programmers working late at night, and they are embedded in the ETL code where they are difficult to review and nearly impossible to change once the warehouse is in production.

There are unresolved questions about the future of VistA itself. The VA has been trying to replace it for decades, with commercial systems, with modernized versions, with cloud-native architectures. None of these attempts have succeeded completely. VistA persists, not because it is loved, but because it works, because the cost of replacement is measured not just in dollars but in the risk to patient care during the transition. The Oracle Cerner Millennium system, which the VA is currently attempting to deploy, has been plagued by delays, cost overruns, and clinical concerns. Meanwhile, VistA continues to process millions of encounters per year, its MUMPS globals accumulating data like sediment, like memory, like the layers of a city built upon itself.

There is also the question of the workforce. The programmers who understand MUMPS are retiring. The pipeline of new MUMPS developers is nonexistent. The VA relies increasingly on contractors, on legacy support companies, on the diminishing pool of people who can read the code and understand the data. This is not sustainable. At some point, the knowledge will be lost, and the system will become unmaintainable, a black box that works until it doesn’t, and then fails catastrophically.

Stepping Back

We began with a server room and a smell of ozone, and we end with a question that is larger than any database, any programming language, any healthcare system.

The question is this: how do we preserve the past without being imprisoned by it? How do we honor the systems that were built with ingenuity and necessity, that served their purposes with a reliability that modernity often fails to match, while still moving forward into a future that demands different capabilities, different interfaces, different ways of thinking?

The VA’s data systems are a case study in this tension. VistA is a monument to a particular moment in the history of computing, when hardware was scarce and programmer time was abundant, when systems were built to be understood and modified by the people who used them, when the boundary between developer and user was permeable and the idea of “clinical informatics” was just beginning to take shape. The translation to SQL is a monument to a different moment, when data is abundant and analyst time is scarce, when systems are built to be queried by machines and algorithms, when the boundary between human and machine intelligence is permeable and the idea of “artificial intelligence in healthcare” is just beginning to take shape.

Both moments are valid. Both are necessary. The translation between them is not merely technical; it is historical, philosophical, almost archaeological. We are excavating the digital past, carefully brushing away the layers of sediment, trying to understand what the builders meant, what the data meant, what the system meant, so that we can carry that meaning forward into a future that will inevitably misunderstand it, transform it, and eventually bury it under new layers of its own.

The veteran whose medication history is stored in a MUMPS global in Omaha does not care about any of this. They care about whether their prescription is refilled on time, whether their doctor knows their history, whether the system that serves them is reliable and responsive and humane. This is the ultimate measure of any healthcare technology: not the elegance of its architecture, not the modernity of its stack, but the quality of care it enables, the trust it maintains, the promise it keeps.

And so the translation continues, in the hours between midnight and dawn, in the ETL pipelines that parse carets and pointers and free text, in the SQL queries that aggregate and analyze and report, in the machine learning models that learn patterns from the accumulated data of millions of lives. It is imperfect. It is necessary. It is the work of maintaining a promise, one record at a time, one transformation at a time, one query at a time, across the vast and stubborn and strangely beautiful landscape of American veterans’ healthcare.

The server room still smells of ozone. The MUMPS globals still spin on their disks. The SQL warehouse still grows, table by table, row by row. And somewhere, a veteran gets their medication, and goes home, and lives another day, unaware that their health, their history, their very digital existence, has been translated from one language to another, from one world to another, by the quiet, persistent, almost invisible labor of people and systems that will never be thanked, but that keep the promise anyway.

P.S. References: VA Office of Information and Technology Technical Documentation; VistA Monograph (Department of Veterans Affairs); “MUMPS: The Complete Standard” (MDC); YottaDB/GT.M Documentation; VA Corporate Data Warehouse Technical Specifications; “The History of VistA” (Hardhats.org); Oracle Cerner Millennium VA Implementation Reports (2020-2025).

Topics Discussed

  • Healthcare IT
  • Veterans Affairs
  • VistA
  • MUMPS
  • Data Migration
  • SQL
  • Medical Informatics
  • SuvroGhosh

© 2026 Suvro Ghosh