AI, Health Insurance, and Cruelty

By
Suvro Ghosh

Insured patients may harbor the apprehension that Artificial Intelligence, or AI, could be used quietly, indirectly, or without meaningful consent to raise premiums, narrow coverage, intensify scrutiny, or cancel policies in ways that protect insurers from large payouts. That suspicion deserves a more hard-boiled treatment than polite regulatory prose usually gives it.

The first version of this story is legal. The second is commercial. The third, which is the one that usually matters in production, is organizational. Laws and policy documents describe what institutions say they are doing. Incentives, operational targets, vendor products, and human appetite for money describe what institutions are often tempted to do. Insurance, like many profitable bureaucracies, has long mastered the art of appearing administratively neutral while behaving in economically strategic ways. AI does not invent that instinct. It industrializes it.

So the right starting point is not trust. It is structure. An insurer is not a moral philosopher gazing tenderly at risk. It is a revenue-preservation system under pressure to avoid loss, manage uncertainty, reduce payouts, detect fraud, retain profitable members, and shed administrative burden where it can. Even in regulated markets, those pressures do not vanish. They become more indirect, more procedural, and more skillfully disguised. A firm may be forbidden from using health status in one explicit way and yet remain highly motivated to infer future expense through adjacent signals, then operationalize that inference through workflows that do not carry the vulgar label of discrimination.

That is where AI becomes useful to the pecuniary imagination. Not because it is conscious, and not because it is secretly evil, but because it is very good at finding patterns that can be monetized while allowing human beings to keep their hands looking clean. A model does not need to say deny this sick person. It only needs to rank members by expected future cost, predict likely claim intensity, estimate appeal persistence, infer social fragility, score “engagement,” or identify cases that will be expensive to carry. Another workflow can then make the practical decisions. More prior authorization here. More documentation burden there. A more aggressive fraud review for this cluster. A less generous outreach strategy for that one. A selective escalation queue. A renewal complication. A supplementary product offer priced with curious enthusiasm. A policyholder does not experience these as model outputs. A policyholder experiences them as friction, delay, exhaustion, or silent exclusion.
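
To make those mechanics concrete, here is a minimal sketch of score-to-workflow routing. Every field name, threshold, and action label is hypothetical; the point is that nothing in the code says deny, because the harshness lives entirely in the routing.

    from dataclasses import dataclass

    @dataclass
    class MemberSignals:
        member_id: str
        predicted_annual_cost: float  # output of an upstream cost model
        appeal_persistence: float     # 0..1, estimated likelihood of contesting
        engagement_score: float       # 0..1, from portal and call-center data

    def route(m: MemberSignals) -> list[str]:
        """Translate scores into operational friction, never an explicit denial."""
        actions = []
        if m.predicted_annual_cost > 50_000:
            actions.append("require_prior_authorization")
            if m.appeal_persistence < 0.3:
                # Expensive members unlikely to fight back draw extra scrutiny.
                actions.append("enhanced_documentation_review")
        if m.engagement_score < 0.2:
            actions.append("deprioritize_outreach")
        return actions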

That is why a purely legal reading of the problem is often too neat. Law governs declared categories. Commerce works through substitutes, proxies, and workflows. The formal rule may say a health plan cannot directly price a compliant product using health status in a certain market. Fine. The system can still infer likely future cost from pharmacy behavior, address instability, provider patterns, claim histories, missed appointments, payment interruptions, digital behavior, call-center interactions, or dozens of other signals that live one euphemism away from sickness, poverty, disability, language difficulty, or low bargaining power. Nothing in corporate life is more common than obeying the letter while mugging the spirit in a side alley.

This is not always grand conspiracy. Often it is greed diluted through committees until no individual feels monstrous. Product wants better loss ratios. Finance wants better margin. Operations wants shorter queues. Fraud wants higher sensitivity. Care management wants to target members “most likely to benefit,” which can become a lovely phrase for preferential resource allocation. Data science wants a model that predicts cost well. Compliance wants paperwork showing there was review. A vendor arrives promising optimization, payment integrity, leakage reduction, adverse selection control, utilization management efficiency, or fraud savings. No one has to say let us quietly make life harder for expensive people. The architecture can say it for them.

The result is a system in which greed becomes modular. That is the truly modern touch. No cigar-chomping villain need sit under a green lamp deciding who gets crushed. The work is divided across teams, models, queues, contracts, dashboards, thresholds, and performance indicators. One group enriches the data. Another builds a score. Another decides how the score enters workflow. Another tunes alert sensitivity. Another writes the review protocol. Another handles appeals under impossible throughput targets. Another audits only whether a box was checked, not whether the whole apparatus is just a respectable machine for discouraging costly lives. Fragmentation is not a bug here. It is the camouflage.

This is where transport and meaning part company, and they part company with a loud crack. A system may ingest data beautifully from Fast Healthcare Interoperability Resources, or FHIR, which is Health Level Seven’s modern standard for exchanging healthcare data as granular digital resources, from claims feeds, pharmacy transactions, enrollment files, laboratory interfaces, or call-center systems. The pipelines can be fast, validated, complete, and technically elegant. That proves almost nothing about the fairness of the inferences built atop them. Transport moves records. Meaning arises later, when those records are turned into features, and features are turned into judgments. A record that a prescription was not filled may be interpreted as nonadherence, financial hardship, access failure, improvement, confusion, pharmacy stockout, or transportation trouble. The machine will usually choose the interpretation that best fits the business objective it was trained to optimize.
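
A small sketch, with hypothetical field names, of where meaning gets chosen. The gap between prescription fills is a fact of transport; the label attached to it is a decision of representation.

    from datetime import date

    # The same observation supports many readings; the pipeline records one.
    POSSIBLE_MEANINGS = [
        "nonadherence", "financial_hardship", "access_failure",
        "clinical_improvement", "pharmacy_stockout", "transport_trouble",
    ]

    def refill_gap_days(last_fill: date, today: date) -> int:
        return (today - last_fill).days

    def build_feature(last_fill: date, today: date) -> dict:
        gap = refill_gap_days(last_fill, today)
        # Every other candidate meaning is discarded, unrecorded, at this line.
        return {"nonadherence_risk": min(gap / 90.0, 1.0)}

Nothing in FHIR, or in any transport standard, constrains the name chosen on that last line.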

That is why representation failures are so often mislabeled as data quality failures. The data are not always wrong in the narrow clerical sense. The trouble is that the institution forces them to mean more than they can honestly bear. Administrative records are artifacts of reimbursement, coverage design, local coding culture, human fatigue, documentation habit, and organizational incentives. They are not transparent readouts of illness, virtue, or intent. A diagnosis code may indicate confirmed disease, suspected disease, ruled-out disease, defensive coding, reimbursement strategy, or simple templated inertia. A missed follow-up can signify recovery, depression, poverty, caregiving overload, lost employment, poor transport, bad network design, or a phone number that died with the electricity. When AI learns from these records, it does not simply learn risk. It learns the sediment of the system that produced them. And because social inequality itself leaves regular tracks in data, a model can become exquisitely accurate at rediscovering disadvantage while describing the process as objective triage.

That last point is where greed and fallibility become partners rather than rivals. One does not need a perfectly cynical institution. One needs an institution willing to benefit from its own misunderstanding. Human beings are very good at this. If a model produces savings, if denials appear procedurally defensible, if appeals rates remain manageable, if regulators do not peer too closely, and if the people most harmed have the least power to contest the decision path, then a great many organizations will accept the result with the relieved conscience of the well-paid. They will call it innovation, modernization, evidence-based targeting, or administrative stewardship. Language, in these settings, often serves the same purpose that snow serves in detective novels. It covers footprints.

The developing-world form of the problem may be cruder and therefore, in a way, more honest. Where consumer protection is weaker, where consent is theater, where data brokers, telecom exhaust, financial traces, and health-adjacent signals mingle more freely, insurers or associated intermediaries may not even need elaborate doctrinal camouflage. They can construct shadow risk profiles from cross-domain data and feed those profiles into underwriting, renewal, pricing, or exclusionary workflow design. In such settings, the technical stack can become a sort of informal caste mechanism written in scoring logic. People are sorted by inferred worthiness, predicted expense, and expected contestability. The machine need not know the language of morality. It only needs to know who is expensive and who is easy to push aside.

Developed countries tend to perform the same maneuver in a suit and tie. They surround the process with policy statements, review committees, fairness language, and carefully delimited declarations about what the model does not do. That can matter. Constraints are real. But institutions also become skillful at moving consequential decisions upstream or downstream of the officially regulated point. If the premium itself is constrained, then resource allocation, utilization review, network design, retention strategy, supplemental product design, documentation burden, and appeals friction begin to carry more of the economic load. The spirit of exclusion migrates. It does not retire.

This is why the soothing phrase “human in the loop” deserves suspicion. A human reviewer who sees only a ranked queue, trusts the score by default, lacks visibility into feature provenance, works under throughput pressure, and is rewarded for consistency rather than wisdom is not a sovereign decision-maker. That human is an instrument panel with a pulse. In some organizations, the human mainly exists to preserve contestability in court and deniability in public. The signature is real. So is the structural coercion beneath it.

There is another uncomfortable truth here. Insurance companies are not the only relevant actors. Pharmacy benefit managers, utilization management vendors, fraud detection vendors, analytics firms, data brokers, revenue integrity platforms, employer wellness ecosystems, and assorted optimization merchants can all insert models into the decision chain. By the time the consumer is denied, delayed, repriced, flagged, or exhausted, no single actor may fully understand the causal path. That opacity is not incidental. It is often commercially convenient. Shared responsibility is the bureaucratic cousin of no responsibility.

For Healthcare Information Technology, or Healthcare IT, architects, this means the object of scrutiny cannot be only the final denial letter or the official pricing rule. The entire adverse-decision pipeline has to be mapped. Which data sources are ingested. Which transformations occur. Which inferred variables are created. Which model outputs become features for later models. Which thresholds drive workflow. Which queues are manual and which are automatic. Which vendor black boxes sit inside the chain. Which explanations can actually be given to a member. Provenance here is not academic decoration. It is the difference between understanding a system and being fooled by its paperwork.
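
One hedged way to make that mapping concrete is to treat provenance as a first-class record attached to every adverse action. The sketch below is hypothetical in all its field names; the shape is what matters.

    from dataclasses import dataclass, field

    @dataclass
    class ProvenanceStep:
        stage: str                    # e.g. "ingestion", "feature_build", "cost_model"
        inputs: list[str]             # upstream data sources or model outputs
        owner: str                    # team or vendor accountable for this stage
        threshold: str | None = None  # any cutoff applied at this stage

    @dataclass
    class AdverseDecisionTrace:
        member_id: str
        final_action: str             # e.g. "routed_to_manual_review"
        steps: list[ProvenanceStep] = field(default_factory=list)

        def fully_traceable(self) -> bool:
            # A vendor black box with no recorded inputs breaks the chain,
            # and with it any honest explanation to the member.
            return all(step.inputs for step in self.steps)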

A non-obvious architectural insight is that some of the most harmful decisions may not appear in systems labeled as insurance adjudication at all. They may arise in member engagement tools, fraud scoring layers, customer service routing, payment integrity edits, retention analytics, or utilization-management prioritization. Because these are often classified as operational support rather than core benefit determination, they can escape the strongest scrutiny while still materially shaping who gets timely care, who gets dragged through administrative gravel, and who eventually gives up. In other words, the architecture of discouragement often lives in the margins, not the headline transaction.

That is also why fairness testing limited to the final model is inadequate. Bias can accumulate compositionally. A somewhat skewed cost-prediction model feeds a somewhat skewed review-prioritization model, which feeds a somewhat skewed human workflow, which feeds a somewhat skewed appeals pathway. Each layer can look tolerable in isolation. Together they can produce a ruthless asymmetry. This is how modern systems often injure people: not through one scandalous switch, but through many boring switches all tilted a few degrees in the same direction.
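
A toy calculation with made-up numbers shows how the compounding works. Suppose group B is only modestly more likely than group A to be flagged, to fail review, and to lose an appeal.

    def adverse_rate(p_flag: float, p_fail_review: float, p_lose_appeal: float) -> float:
        # End-to-end probability of an adverse outcome across three stages.
        return p_flag * p_fail_review * p_lose_appeal

    group_a = adverse_rate(0.10, 0.50, 0.40)  # 0.020, i.e. 2.0%
    group_b = adverse_rate(0.13, 0.60, 0.52)  # 0.041, i.e. ~4.1%

    print(group_b / group_a)  # ~2.0: no single stage is worse than 30%, yet the product doubles

Each stage would pass a per-layer fairness check with a thirty percent tolerance. The pipeline as a whole would not.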

The deeper truth is that healthcare data systems have always encoded moral judgments while pretending merely to record facts. Billing categories, coverage determinations, medical necessity rules, diagnosis hierarchies, quality measures, fraud heuristics, utilization norms, and risk adjustment formulas all translate institutional values into structured data. AI enters a world already thick with judgment. It does not arrive as a neutral mathematician. It arrives like a clerk who can read the ledgers faster than anyone else and imitate the prejudices in them with frightening efficiency.

So what architectural direction remains possible, assuming one wishes not to build a silkier apparatus for exclusion?

First, governance has to attach to consequence, not to formal job title or vendor category. A model that directly changes price and a model that indirectly channels costly members into harsher workflows are morally and operationally related systems. Treating one as high-risk and the other as innocuous is a category error of the most expensive kind.

Second, proxy analysis has to be treated as an engineering discipline rather than a legal afterthought. If variables such as geography, pharmacy pattern, digital engagement, call-center sentiment, missed appointment history, provider selection, payment irregularity, or device characteristics are functioning as stand-ins for disability, poverty, race, language barriers, or other protected vulnerabilities, then the organization is not avoiding discrimination simply because it did not type the forbidden word into the feature list.
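
One sketch of what that discipline might look like, assuming scikit-learn and an audit sample in which the protected attribute is lawfully known: if the production features predict the protected attribute well, they are functioning as proxies regardless of anyone's intent.

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def proxy_auc(X_operational, y_protected) -> float:
        """AUC of predicting a protected attribute from operational features.

        X_operational: features actually used in production (geography,
        pharmacy patterns, engagement, payment history, and so on).
        y_protected: protected attribute available only in the audit sample.
        """
        model = LogisticRegression(max_iter=1000)
        scores = cross_val_score(model, X_operational, y_protected,
                                 cv=5, scoring="roc_auc")
        return scores.mean()

    # An AUC near 0.5 suggests little proxy power. A value well above it means
    # the forbidden word is absent from the feature list but not from the model.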

Third, explanations must be built for the affected person, not for the audit binder. If the system cannot tell a member in intelligible terms why a case was flagged, delayed, or escalated, what data drove that pathway, and how to contest both factual error and inferred meaning, then the system is too opaque for consequential use.
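
A sketch of what a member-facing explanation record could carry, every field hypothetical. The design test is whether each field is written for the person affected rather than for the audit binder.

    from dataclasses import dataclass

    @dataclass
    class MemberExplanation:
        what_happened: str       # e.g. "Your claim was routed to manual review."
        data_used: list[str]     # plain-language descriptions of the inputs
        inferred_meaning: str    # what the system concluded from that data
        contest_facts: str       # how to correct a factual error in the data
        contest_inference: str   # how to challenge the interpretation itself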

Fourth, monitoring should focus on burden distribution, not just classification accuracy. Who receives more documentation demands. Who waits longer. Who is more frequently routed to manual review. Who abandons claims or appeals. Who experiences repeated friction before any formal denial occurs. These are often better indicators of exclusionary behavior than the clean, well-ironed metrics presented in governance slide decks.
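
As a sketch, burden monitoring can be as plain as a grouped report over an events table; assume pandas and hypothetical column names.

    import pandas as pd

    def burden_report(events: pd.DataFrame) -> pd.DataFrame:
        """Friction metrics by group, from a hypothetical events table with
        columns: member_group, doc_requests, days_waiting, manual_review,
        abandoned."""
        return events.groupby("member_group").agg(
            doc_requests_per_member=("doc_requests", "mean"),
            median_days_waiting=("days_waiting", "median"),
            manual_review_rate=("manual_review", "mean"),
            abandonment_rate=("abandoned", "mean"),
        )

None of these metrics mentions a denial. That is the point.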

Fifth, some uses should simply be prohibited. Not everything that can be inferred should be operationalized. In a domain where illness, disability, poverty, and administrative power collide, there are predictions whose commercial usefulness is precisely what makes them dangerous. The technical culture of optimization has a chronic weakness for mistaking what is computable for what is legitimate. Insurance gives that weakness very sharp teeth.

So the more nuanced version is darker, not lighter. It is not that insurers are uniquely villainous and twirl their mustaches in secret rooms. It is that ordinary greed, institutional self-interest, fragmented responsibility, legal minimalism, and human willingness to profit from abstraction can produce systems that behave cruelly without requiring any individual to feel especially cruel at all. AI fits this world perfectly. It offers speed, pattern recognition, deniability, and scale. It can convert hunch, prejudice, and financial appetite into ranked outputs that look clinical on a dashboard and punishing in a life.

The real question is therefore not whether a statute somewhere describes a noble boundary. The real question is whether the operational stack allows expensive, vulnerable, or inconvenient people to be identified early and then quietly managed as liabilities. In many systems, that is not a speculative dystopia. It is simply what optimization looks like after it has learned manners.

© 2026 Suvro Ghosh