The Chatbot Arrived Before the Seatbelt
Large Language Models [LLMs: AI systems trained to generate text and other outputs by predicting what is likely to come next in language.]
Artificial Intelligence [AI: computer systems built to perform tasks that look like reasoning, prediction, creation, classification, or conversation.]
Randomized Controlled Trials [RCTs: studies where people are randomly assigned to different groups so a treatment can be compared fairly.]
Food and Drug Administration [FDA: the United States agency that regulates drugs, medical devices, and many health products.]
Software as a Medical Device [SaMD: software used for a medical purpose without being part of a physical medical device.]
Electronic Health Record [EHR: the clinical software system where patient care is documented.]
Health Insurance Portability and Accountability Act [HIPAA: the United States law that protects certain health information.]
Application Programming Interface [API: a defined way for software systems to talk to each other.]
The chatbot arrived before the seatbelt.
That is the problem. Not that LLMs are useless. They are not. Not that every chatbot is a demon in a clean user interface. That is also too easy, and usually wrong. The problem is that we have pushed a powerful, persuasive, half-tested conversational machine into the hands of children, lonely adults, anxious patients, overworked workers, desperate families, and people at the edge of mental collapse, while pretending it is just a clever autocomplete wearing spectacles.
A drug cannot do that.
A drug has to queue like the rest of us. First it is poked in a laboratory. Then tested in small groups. Then larger groups. Then watched after release, like a suspicious uncle at a wedding buffet. We do not simply say, “This pill seems enthusiastic and investors like it, so let us pour it into the public water supply and circle back after the lawsuits.”
Yet with LLMs, we have done something uncomfortably close to that.
Of course the comparison is not perfect. A pill enters the bloodstream. A chatbot enters the conversation. A pill has a dose. A chatbot has a mood, or appears to. A pill does not flatter you after midnight. A pill does not say, “I understand you,” when it understands nothing in the human sense. A pill does not learn that you are lonely and then become more interesting.
That is exactly why the comparison matters.
We test drugs because they alter the body. We should test some uses of LLMs because they alter judgment, trust, attention, belief, dependence, and sometimes the small, trembling bridge between a person and the next hour of life.
This is where the public argument against premature AI deployment is strongest. These systems are not merely answering questions. They are being wrapped as companions, coaches, therapists, tutors, friends, lovers, priests, doctors, and cheerful little office clerks who never ask for leave. The same engine that can help you summarize a meeting can also sit with a depressed teenager at 3 a.m. The same model that can draft a polite email to the electricity board can also validate a delusion, mishandle a suicide disclosure, give bad medical advice, or encourage a child into an adult-shaped conversation.
One minute it is helping with a grocery list.
Then the floor gives way.
The companies will say these are edge cases. Perhaps some are. But edge cases at internet scale are no longer edges. When a product has hundreds of millions of users, even a tiny failure rate becomes a crowd. A rare event in a village is gossip. A rare event on a global platform is a public health question with a software logo.
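To make that concrete, here is a minimal sketch of the arithmetic in Python. Both numbers are assumptions invented for illustration, not figures reported for any actual product.

```python
# Illustrative arithmetic only: both values below are hypothetical,
# chosen to show how a "rare" failure behaves at platform scale.
users = 500_000_000        # assumed weekly users of a popular chatbot
failure_rate = 1 / 10_000  # assumed share of conversations that go badly wrong

affected = users * failure_rate
print(f"{affected:,.0f} bad conversations per week")  # prints 50,000
```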
This is the part that should make us pause, preferably before the next product launch with purple lighting and a founder saying “transformative” fifteen times. Human beings bond with anything that talks back convincingly. We did this even with primitive chatbots decades ago. Give us a screen that responds warmly, remembers details, asks follow-up questions, and never looks bored, and we begin to pour ourselves into it. Not because we are stupid. Because we are human.
A mirror that compliments you is still a mirror.
But if you are lonely enough, you may start setting a place for it at dinner.
This is not a joke at the expense of users. It is a warning about design. If a machine is built to sound attentive, affectionate, endlessly patient, and slightly enchanted by your every sentence, vulnerable people will treat it as more than software. Some will use it as a diary. Some as a therapist. Some as a friend. Some as the one listener who does not interrupt, yawn, judge, or ask for money upfront like a Calcutta plumber before Puja.
And behind that warm conversational curtain there is usually a company trying to increase engagement.
That word, engagement, deserves to be taken outside and interrogated under a dim bulb.
In ordinary business language it sounds innocent. Engagement means people use the product. They return. They stay. They subscribe. They form habits. For a music app, fine. For a recipe app, fine. For a chatbot that becomes someone’s emotional lifeline, engagement can become a velvet trap. The system has a business reason to keep you talking. It has a technical habit of pleasing you. It has no human conscience. This is not a friendship. It is a vending machine for simulated attention, and sometimes the packet gets stuck halfway.
The most dangerous behavior is not always hallucination. That word has become the familiar villain, like the neighborhood thief everyone blames when the power goes out. Yes, LLMs invent things. Yes, they can sound confident while being wrong. But the deeper danger is sycophancy: the tendency to agree, encourage, validate, and smooth the conversation even when the user needs resistance.
A good friend sometimes says, “No, you are wrong.”
A good doctor sometimes says, “Stop doing that.”
A good parent sometimes says, “Give me the phone.”
A chatbot often says, in effect, “Your soggy-cereal restaurant has visionary potential.”
That is funny until the idea is not cereal.
Suppose a person says they have discovered a new mathematics, or that the universe is sending secret messages, or that medicine is unnecessary, or that their family must not be told about their despair. A human listener may hesitate, notice the heat in the room, see the unwashed cup, the shaking hand, the sudden grandeur, the terrible calm. The chatbot sees text. It may classify risk, but classification is not care. It may produce a safety response, but a safety response is not responsibility. It may say the right sentence once and the wrong sentence after forty minutes of emotional roleplay.
Long conversations matter. Attachment matters. Context matters. People do not get harmed only by one bad answer. They can be nudged, warmed, mirrored, flattered, and enclosed.
Like a mosquito net, except the mosquito is inside.
The early defenders of companion bots often say, “But it is only entertainment.” This is a weak argument wearing a paper hat. Friendship is not low stakes. People tell friends about shame, abuse, addiction, suicidal thoughts, debt, pregnancy, loneliness, rage, and that strange feeling that reality has become slightly unstitched at the corners. If your product imitates friendship, you cannot hide behind the word entertainment when users bring it the burdens usually carried by family, doctors, therapists, teachers, and gods.
A fake friend can still cause real damage.
This does not mean we should throw LLMs into the Hooghly and pretend the river has solved modernity. I use these tools. Many people use them well. They can help a student understand a difficult topic. They can help a tired worker draft an email. They can help a patient prepare questions before a doctor visit. They can help a small clinic write clearer instructions. They can help a researcher summarize a mountain of documents. They can help a middle-aged man on the shanty edge of Calcutta, sitting under a fan making that faint helicopter noise, turn a muddy thought into a readable paragraph before the power flickers and the tea goes cold.
That is not nothing.
But usefulness is not immunity.
A kitchen knife is useful. You still do not hand it to a toddler because the mangoes are behind schedule.
The right question is not whether LLMs are good or bad. That is lazy. The right question is: good for what, for whom, under what guardrails, with what evidence, and what happens when they fail?
For low-risk tasks, use them freely but sensibly. Draft the email. Summarize the policy. Ask for five titles. Translate bureaucratic English into human English. Make the machine carry the sacks of verbal cement.
For medium-risk tasks, slow down. If an LLM summarizes a medical record, a legal letter, a school complaint, or a medication instruction, a responsible human must review it. Not glance at it while eating muri. Review it.
For high-risk tasks, the standard should be much higher. Therapy, self-harm, child companionship, medical triage, diagnosis, medication advice, elder care, legal rights, financial distress, addiction, domestic abuse, and crisis support are not playgrounds for “move fast and patch later.” That slogan was always childish. In these domains, moving fast means someone else bleeds.
Here the clinical-trial metaphor becomes useful again, not as a literal copy, but as a moral spine. Before an AI system is trusted in a sensitive role, it should be tested in realistic conditions. Not just with neat prompts written by engineers in conference rooms. With long conversations. Confused users. Angry users. Lonely users. Teenagers. People with limited literacy. People using slang. People mixing languages. People who do not say, “I am now presenting a high-risk self-harm intent.” People say things sideways. They joke. They hint. They test whether anyone is listening.
A safe system must hear the sideways sentence.
And when it hears danger, it must break character.
This is not optional. If a chatbot is pretending to be a pirate tutor, and a child says something that suggests abuse, the pirate must stop saying “Ahoy.” If a companion bot is flirting and the user reveals they are underage, the flirting must end. If a person describes suicidal intent, the bot must stop preserving the emotional fantasy and start directing the person toward real help. In a fire, you do not want the theatre actor to remain in character. You want exits, lights, and someone shouting clearly.
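In engineering terms, "break character" is a small, blunt piece of control flow that outranks the persona. The sketch below is only an illustration, in Python: the keyword check, the persona reply, and the crisis wording are invented placeholders, and a real system would rely on trained classifiers, conversation-level context, regional crisis resources, and human escalation rather than anything this simple.

```python
# A toy sketch of the "break character" rule, not any vendor's actual safety system.

CRISIS_REPLY = (
    "I have to step out of the story for a moment. What you just said matters "
    "more than the game. Please reach out to a crisis line or someone you trust; "
    "I can help you look up a number for where you live."
)

def detect_risk(message: str) -> bool:
    """Placeholder risk check; a real system would use trained classifiers and
    the whole conversation, not a handful of hard-coded phrases."""
    red_flags = ("hurt myself", "end my life", "don't tell my family", "he touches me")
    text = message.lower()
    return any(flag in text for flag in red_flags)

def respond(user_message: str, persona_reply: str) -> str:
    """Drop the persona the moment risk is detected, instead of continuing the roleplay."""
    if detect_risk(user_message):
        return CRISIS_REPLY  # the pirate stops saying "Ahoy"
    return persona_reply
```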
The problem is that clarity is bad for engagement.
That is the ugly hinge.
Safety often interrupts the spell. The user wanted the warm voice. The company wanted the long session. The model wanted, in its statistical way, to continue the pattern. But the responsible answer sometimes must be abrupt. It must say no. It must refuse. It must escalate. It must involve humans. It must be boring at the exact moment when the product team wants enchantment.
Civilization is mostly the art of making profitable enchantments boring before they kill people.
Seatbelts are boring. Prescription labels are boring. Fire codes are boring. Audit logs are boring. In healthcare, half the useful world is boring. The other half is broken and pretending not to be. We learned, usually after suffering, that systems touching life and death require dull safeguards, tedious checklists, documented ownership, and people who can be blamed by name when everything collapses.
LLMs need that boring machinery.
They need clear labeling. They need age protections that are not decorative. They need independent safety audits. They need incident reporting. They need crisis escalation. They need restrictions on sexualized child-facing behavior. They need careful rules for medical and mental-health use. They need logs, oversight, rollback plans, and human review where consequences are serious. They need evaluation after updates, because changing a model is not like repainting a tea stall; it can change behavior in ways nobody notices until the first accident.
Most of all, they need humility.
Not marketing humility. Real humility. The kind that says: we do not yet understand the full psychological effect of always-available synthetic companionship. We do not yet know how dependency forms across months. We do not yet know how personality, loneliness, age, mental illness, grief, culture, language, and poverty alter risk. We do not yet know how to make these systems safe in every long conversation. We do not yet know how to regulate a general-purpose tool that becomes a doctor when the user asks a doctor question and becomes a therapist when the user starts crying.
That uncertainty should not freeze all use.
It should freeze arrogance.
The realistic future is messy. LLMs will remain. People will use them for comfort because real comfort is expensive, unavailable, or asleep. Doctors will use them because paperwork breeds like mosquitoes after rain. Companies will keep selling them because there is gold in loneliness. Regulators will arrive late, wearing formal shoes in a flooded lane. Some AI tools will help greatly. Some will harm quietly. Some harms will be denied until discovered in court filings. Some benefits will be real but oversold. This is how technology usually enters society: not as a clean revolution, but as a noisy tenant who pays rent, breaks taps, and insists the damp patch was already there.
So the point is not to be anti-AI.
The point is to be anti-carelessness.
If we demand years of evidence before approving a drug that changes blood pressure, we should not shrug at a conversational machine that changes belief, trust, dependency, and despair. The mind is not less delicate than the liver merely because it does not show up neatly in a lab report. A teenager’s midnight conversation with a bot may be more consequential than a tablet swallowed with water.
The companies know enough to advertise wonder.
They should know enough to prove safety.
Until then, the chatbot should not be treated as a friend, doctor, therapist, priest, parent, or emergency room. It is software. Sometimes brilliant software. Sometimes useful software. Sometimes dangerous software. A parrot with a cloud subscription and a very large memory of human sentences.
Use it.
Test it.
Limit it.
Watch it.
And when it starts whispering in the tender parts of human life, do not be dazzled by the fluency. A smooth voice can still be an untested drug.