Conversational transcripts from various Web 2.0 groups, in response to the journal club link posted in the groups along with the following quote:
"What is the **essential bridge** between AI’s computational power and the uncharted territory of individual patient care.
Until AI can *truly* grapple with uncertainty (not just rare cases), physicians won’t be replaced—because **medicine’s hardest problems aren’t puzzles to solve, but human experiences to navigate.**
The hype around "85% accuracy" obscures a deeper truth: **A diagnosis and treatment isn’t the destination—it’s one step in a uniquely human journey.*
[17/07, 10:42]cm: Follow up to the above by users (human agents) in the AIHI group:
[17/07, 09:33]hu1: India has missed many buses. We look only at constraints and problems and not the opportunities or benefits. We are missing the point that AI has the potential for *more for less* and Sustainable Universal Health Care (SUHC).
[17/07, 10:09]hu2: Probably the buses have restricted mobility due to too many missing bridges
[17/07, 10:17]hu1: Once a doctor told me - we doctors don't treat a patient. We remove all hurdles for a person to heal and get back to his/her life. The question we should ask is whether AI can help remove these hurdles.
[17/07, 10:25] hu2: Yes continuing with the previous analogy of restricted buses, the hurdles here are the absence of bridges and there's no reason why AI can't be that bridge
[17/07, 10:29]hu3: There is definitely some merit in your argument but, with all due respect, AI will answer as per its dataset and training.
We need right materials for that bridge to be useful and sustainable
[17/07, 10:30]hu3: Else, we have seen too many bridges collapse. With higher reach and impact, the devastation multiplies
[17/07, 10:37]hu2: In essence, to summarise drastically, one needs perfect engineering solutions to deliver ideal medical solutions?
[17/07, 10:55]hu4: We love the bottom of the 2x2 matrix - that's our DNA. We profess to be in the top right corner - supported by cheer and rhetoric!
[17/07, 11:26]cm: More flow of rhetoric from the top right of the 2x2 values, action and execution matrix
[17/07, 10:51]hu3: In essence, it would be a marriage between the two wherein doctors should resist the urge to become engineers and engineers should resist the urge to play doctor.
It may be imperfect like most marriages but that's the only way I see it
[17/07, 11:13]hu2: Integrative medicine in the true sense!
[17/07, 11:15]hu1: I thought AI (LLMs like Gemini or ChatGPT) use the same data (literature, books, etc.) as humans do. The difference is humans use very limited content whereas these LLMs use all that is known to humans.
[17/07, 11:22]hu3:
This takes me back to a 1948 novel
The Count of Monte Cristo
If you've read it
[17/07, 11:23]hu2: Slight modification:
Humans use a lot of data they capture for themselves and never share with any LLMs or textbooks (deep dark web of medical cognition) and this personal data capture and processing (in the past aka clinical experience: o baba uni khub experienced daaktar!...) that sometimes may overtrump current gen LLMs (starved of patient particular data that some bahut experienced daaktars have)!
[17/07, 11:25]hu3: Agreed 100%
[17/07, 11:25]hu3: The famous dictum:
The eyes see what the mind knows
[17/07, 11:26]hu3: It's not just about knowledge, as we have seen the meritorious student become an average doctor
And the average student become an awesome doctor
[17/07, 11:27]hu3: The trick lies somewhere in between
[17/07, 11:29]hu5: The ability to get the best out of generative ai solutions such as ChatGPT, Gemini, Claude, Perplexity, etc is critically dependent on crafting prompts that are *fit for purpose*. Until now the human mind is needed to determine for example when and which type of thinking should be used for crafting the prompts step by step to get the best out of a particular LLM while knowing the constraints and limitations of the particular LLM. For example when to use the Theory of Constraints, or TRIZ or Tree of Thought, etc in combination with say Mind Maps and/or Fishbone diagrams for this purpose. The art and science of prompt engineering is a new challenge or constraint that doctors (and all other human users) need to learn and master. The LLM is a paintbrush, the user is the painter, and all users have access to all the different types of paints and colours.
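(To make the "fit for purpose" idea above concrete, here is a minimal sketch, assuming a hypothetical `FRAMEWORKS` lookup and calling no real LLM API, of how a framework-guided prompt might be assembled.)
```python
# Minimal sketch (illustrative only): pairing a reasoning framework with a
# clinical task to produce a "fit for purpose" prompt. No real LLM is called.
FRAMEWORKS = {
    "tree_of_thought": "Explore three independent lines of reasoning, then select the best.",
    "theory_of_constraints": "Identify the single binding constraint before proposing solutions.",
    "fishbone": "Group candidate causes into categories (people, process, environment) first.",
}

def craft_prompt(framework: str, task: str) -> str:
    """Compose a prompt: the chosen thinking style plus the actual task."""
    return f"{FRAMEWORKS[framework]}\n\nTask: {task}"

print(craft_prompt(
    "tree_of_thought",
    "List differential diagnoses for chronic fatigue in a 60-year-old with diabetes.",
))
```
In the paintbrush analogy, the framework line is the brushstroke the painter chooses before the paint ever touches the canvas.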
[17/07, 11:31]hu2: The trick for AI engineers now would be to devise a snare that can capture what the average awesome experienced doctor knows, by feeding more and more patient-particular experiential data to the average LLMs. Currently they have just some average empirical data from randomised controlled trials at best, which may not be able to match the precision-medicine, patient-particular data in the average awesome experienced doctor's head!
Again, must admit that the average awesome experienced doctors may all become extinct soon, so AI can then rule with no human competition at all.
[17/07, 11:53]hu1: This hypothesis holds only for the few individuals who possess unique unpublished *tacit knowledge*. Whereas LLMs are democratizing both *explicit & tacit knowledge*.
[17/07, 11:43]hu1: Let me repeat what I tell my students.
1) Humans are good at asking questions, whereas machines are good at answering questions.
2) If you ask the right question, there is no guarantee the answer will be right. However, if the question is wrong, the answer will always be wrong.
[17/07, 11:44]hu6: 1948? Nope. Early nineteenth century. Alexandre Dumas. Incidentally, his father was black and he of mixed descent. He faced a lot of criticism. Also, the character Abbé Faria is based on a real abbot of Goan descent. He first made elaborate observations about the power of the mind and its abilities to influence situations.
[17/07, 11:48]hu3: 1846
[17/07, 11:48]hu3: My bad
[17/07, 11:49]hu3: I used to have a 1948 print of it. During shifting in the 2008 floods, it washed away somewhere
[17/07, 11:51] GJ: 1. *"Humans are good at asking questions, whereas machines are good at answering questions."*
This aligns directly with the idea that *the human is the strategist and the LLM is the tool*. The *quality, clarity, and intent* of the *question* (i.e., prompt) are everything. Generative AI systems like ChatGPT, Gemini, or Claude do not originate curiosity or purpose—they *respond*. So the *burden of direction* lies squarely with the human.
In the earlier analogy:
* The *LLM is the paintbrush*.
* The *human is the painter*.
* And *the prompt is the brushstroke*, shaped by the question you ask.
So, *prompt engineering is essentially the craft of asking intelligent questions in the right structure and sequence*.
2. *"If you ask the right question, there is no guarantee the answer will be right. But if the question is wrong, the answer will always be wrong."*
This cuts to the heart of both the *limitations of LLMs* and the *responsibility of the human*.
* Even a well-formed prompt might produce an imperfect or hallucinated answer—due to the LLM’s *inherent limitations* (knowledge cut-off, no understanding, no reasoning in a human sense).
* But if the prompt is ambiguous, misleading, poorly scoped, or based on incorrect assumptions, *the AI will still answer—confidently—just wrongly*. This is especially risky in high-stakes domains (medicine, law, policy).
This echoes the "fit-for-purpose" principle discussed earlier: *effective interaction with AI begins with correctly framing the problem*. The *wrong prompt can derail the entire output*, while a good one at least gives you a chance to iterate and refine.
*So Why Is This Profoundly Important?*
Your two rules highlight:
* *Why humans remain essential.*
* *Why asking good questions is a critical 21st-century skill.*
* *Why LLMs are not decision-makers—but amplifiers of human intent.*
Together with the previous response, your quote *elegantly summarizes* why prompt engineering isn’t just a technical skill—it’s a thinking skill. A *thinking in questions* skill.
[17/07, 11:58]hu2: Great paintbrush analogy!
Here's another one with a spin to promote "agentic hybrid human and AI user driven healthcare":
In the Iron Age there were great developers who developed the sword, and yet it was the best sword user who became king!
[17/07, 11:52]hu3: Surprisingly, I have noted an aversion among students to asking questions
[17/07, 11:53]hu3: Which, I feel, has been critical to knowledge gain through the ages. Gurukuls used to have debates for entry and graduation
[17/07, 11:55]hu7: Loving the discussion. May chip in, in some time, with my 2 paisa.
[17/07, 11:57]hu3: I used to start my lecture with
Less knowledge
Less confusions
More knowledge
More confusions
(Pause)
No knowledge
No confusion
So, if you're not confused, ask yourself: did I learn anything?
[17/07, 12:02]hu1: Let me state the process of learning and knowledge.
1) Unconscious Incompetence
2) Conscious Incompetence
3) Conscious Competence
4) Unconscious Competence
Point 4 above relates to *System 1 thinking*
[17/07, 12:02] hu1: I must stop here, and sorry for the lecture on psychology, philosophy, and AI.
[17/07, 12:03]hu3: Not at all
[17/07, 12:04]hu3: Isn't that the purpose of this group?
[17/07, 12:17]hu2: Just to share some examples (from our current real-time regular workflow) of LLM limitations in knowing, due to its inability to capture data, or human inability to capture all data, which currently gives the impression that our current medical cognition cosmologic space is largely dark matter!
[16/07, 23:58] Patient Advocate 3F Diabetes1: 1.30pm blood sugar 250
1.30pm 4.5u lispro insulin
1.35pm lunch with same previous dish except egg and include fish
6.00pm sour curd
6.30pm cucumber
9.15pm 6u lantus insulin
9.30pm blood sugar 196
9.30pm 2.5u lispro insulin
9.35pm dinner with as same as lunch dish
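(As an aside on the data-capture point above: a minimal sketch, assuming the `<time> <event>` line format shown in this log, of how such free-text entries could be structured for downstream analysis. Field names are illustrative.)
```python
# Minimal sketch: parse "<time> <event>" lines from a patient advocate's log
# into structured records. The line format is assumed from the excerpt above.
import re

LOG = """1.30pm blood sugar 250
1.30pm 4.5u lispro insulin
9.15pm 6u lantus insulin
9.30pm blood sugar 196"""

PATTERN = re.compile(r"^(?P<time>\d{1,2}\.\d{2}[ap]m)\s+(?P<event>.+)$")

records = []
for line in LOG.splitlines():
    m = PATTERN.match(line.strip())
    if m:
        records.append({"time": m.group("time"), "event": m.group("event")})

for r in records:
    print(r)  # e.g. {'time': '1.30pm', 'event': 'blood sugar 250'}
```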
[17/07, 12:07]pm: How much Lantus is left currently? @Meta AI what are the chances of reduced effectiveness of a Lantus dose due to long-term storage over a prolonged period?
[17/07, 12:07] Meta AI: To determine the amount of Lantus left, we'd need more information about the patient's current prescription and usage. However, I can guide you on how to calculate it.
*Calculating Remaining Lantus Dose:*
1. Check the prescription label or consult with the patient's healthcare provider to determine the prescribed dosage.
2. Calculate the total amount of Lantus used since the last refill.
3. Subtract the total amount used from the initial amount prescribed.
Regarding the effectiveness of Lantus due to long-term storage:
*Storage and Handling:*
Lantus should be stored in the refrigerator at 36°F to 46°F (2°C to 8°C) until the expiration date on the label. Once in use, it can be stored at room temperature (up to 86°F or 30°C) for up to 28 days, away from direct heat and light.¹
*Reduced Effectiveness:*
The effectiveness of Lantus may be reduced if:
- *Expired*: Using Lantus past its expiration date.
- *Improper Storage*: Exposing Lantus to extreme temperatures, direct sunlight, or freezing.
- *Contamination*: Not following proper injection techniques or using contaminated needles.
To minimize risks, it's essential to follow proper storage and handling procedures and consult with a healthcare provider if there are concerns about the effectiveness of Lantus.
*Factors Affecting Lantus Effectiveness:*
- *Dosage*: Higher dosages may result in a greater blood sugar-lowering effect.
- *Timing of Administration*: Taking Lantus at the same time each day can help maintain consistent blood sugar control.
- *Food Intake*: Eating a balanced diet and monitoring food intake can impact blood sugar levels.
- *Physical Activity*: Regular physical activity can lower blood sugar levels and improve overall health.
- *Other Medications*: Taking other medications can impact blood sugar levels and the effectiveness of Lantus.
- *Medical Conditions*: Certain medical conditions can impact blood sugar levels and the effectiveness of Lantus.
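(For illustration, Meta AI's three calculation steps above reduce to simple arithmetic. The numbers below are assumed: a standard 3 mL pen at 100 units/mL holds 300 units, and the daily dose is taken from the shared log; the days-since-refill figure is hypothetical.)
```python
# Toy calculation of remaining Lantus, following the three steps quoted above.
pen_capacity_units = 300   # 3 mL x 100 units/mL (standard pen strength)
daily_dose_units = 6       # from the log: "9.15pm 6u lantus insulin"
days_since_refill = 20     # hypothetical

used = daily_dose_units * days_since_refill
remaining = pen_capacity_units - used
print(f"Used: {used} u, remaining: {remaining} u")  # Used: 120 u, remaining: 180 u
```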
[17/07, 12:10]pm: @Meta AI can you support this hypothesis about lantus insulin efficacy reduction due to prolonged storage with some scientific literature such as long term observational data with a comparator?
[17/07, 12:10] Meta AI: Lantus insulin efficacy can indeed be affected by storage conditions. Here's what you need to know:
*Storage Guidelines:*
- Unopened Lantus vials or pens should be refrigerated at 36°F to 46°F (2°C to 8°C).
- Opened Lantus vials or pens can be stored at room temperature, below 86°F (30°C), for up to 28 days, away from direct heat and light.
*Factors Affecting Efficacy:*
- *Temperature Exposure:* Storing Lantus above 86°F (30°C) or freezing it can impact its effectiveness.
- *Expiration Date:* Using Lantus past its expiration date can lead to reduced efficacy.
- *Contamination:* Improper handling or using contaminated needles can compromise Lantus effectiveness.
*Signs of Compromised Lantus:*
- Visible changes, such as cloudiness, discoloration, or particles
- Unexpected fluctuations in blood sugar levels
While there's no direct link to specific long-term observational studies, it's clear that proper storage is crucial for maintaining Lantus efficacy. Healthcare providers recommend adhering to storage guidelines and monitoring insulin for any visible changes.¹ ²
To confirm whether Lantus storage has impacted its effectiveness in a specific situation, consult with a healthcare provider for personalized guidance. They can assess the insulin's storage conditions and provide recommendations on usage or replacement.³
[17/07, 12:29]hu5: Prompting starts with but goes far beyond asking the right questions.
[17/08, 11:33]
Rather thought-provoking post!!!
Any opinions?
[17/08, 11:39]hu2 : Quite logical.
[17/08, 11:40]hu2: This will be very true for rare diseases. On the other hand, the quacks will be good enough for the common diseases.
[17/08, 11:41]hu2: Unless medical education is totally overhauled, the MCQ experts may not have any job left.
[17/08, 11:52]hu4: Unfortunately no one is talking about it … it’s the MCQ exam that has degraded the quality of doctors in the country. As we say, assessment drives learning: if a doc is going to be evaluated on MCQ-solving skills, it’s wrong on anyone’s part to expect them to spend their time learning real medical skills instead of cramming MCQable points. It’s time for reforms in exams, not education. Once exams are reformed, learning gets reformed automatically
[17/08, 15:30]hu5: Atavism at play. Let us return to an age where Vaidya & Guru were worshipped. The community took care of them and they mentored the next generation, who were like their own kids, often spending many years serving their gurus.
HPE, TLC, and treating to cure or relieve pain vs. not to earn money for oneself or on behalf of another.
[17/08, 15:32]hu5: Do we need exams? Won't society filter out the murderers and killers? Just a random thought...what if one learnt a trade and practised it. No exams
[17/08, 16:19]hu4: We are not civilised enough to self test and declare competencies.
[17/08, 16:24]hu6: Self-declared incompetence is well respected in NHS hospitals
[17/08, 17:23]hu4: While it might sound philosophical, designing exams/assessments is a spiritual responsibility. Whatever comes in the exam is what students will put their effort into. These MCQ-based exams have pivoted students towards cramming information rather than learning what’s necessary for practice. Most of our MBBS students, if you ask them, will not be able to define the temperature cut-off for fever, while they are aware of some upcoming drugs currently in phase 2 or 3 trials. Coz that’s what will be asked in the exam. The system rewards disappearing from wards and sitting in front of video screens where teachers teach how to crack an MCQ. If my concern seems like activism trying to decimate a well-established assessment system, let me tell all that I make money by teaching how to crack MCQs.
[17/08, 19:56] hu6: What this assumes is that patients seek competence and most "effective" treatment.
If that were the case, the free market dynamics would have adjusted to competence as the principal driver.
But as we all can see - that is not the case; each patient has their own needs and desires, and they seek a physician who caters to these.
@hu7 might agree that LLMs can create small infodemics and vomit too much information, which is difficult to understand; thus, as a 2nd-order effect, doctors' roles would be strengthened, not weakened, in authority and expertise.
I'm sure there are several doctors who have worked or are working in the NHS and will testify that the addition of PAs, ACPs, Nurse consultants and all that faff has not improved patient outcomes and has only worsened the burden on doctors with even more responsibility. Similar applies to AI/LLMs.
Also, doctors are Lindy; they have survived the test of time.
TLDR - Spectacularly bad take from Musk.
[17/08, 20:19]hu7: Yes we had a long discussion around this today morning led by @hu2 in our other group!
[17/08, 11:49]hu2: This will be very true for rare diseases. On the other hand, the quacks will be good enough for the common diseases.
[17/08, 12:10] PaJR Gastro Hepatologist: I don't see this happening at least in the foreseeable future. I may be wrong, and would be happy if I am.
There is a commonly circulated meme. A prominent Silicon Valley founder said AI would replace radiologists. Elon said cars would be driverless. That was at least 10 years ago. Ten years later, radiologists are still driving their cars to work, and stuck in traffic jams.
[17/08, 12:12] PaJR Gastro Hepatologist: As any doctor who has been practicing long enough would tell you, reaching the diagnosis is often the easiest part of the puzzle, and all AI models, even the Microsoft AI Diagnostic Orchestrator have been specifically tested on this parameter only. The real challenge, in most cases, begins after the diagnosis is reached.
[17/08, 12:15] PaJR Gastro Hepatologist: Even in this small collection of PaJRs (and in our own offline experiences), we see patients and their caregivers struggling with complex real-world problems (as simple as the nearest facility for putting in a Ryle's tube being 100 km away) which will hardly be solved by any app.
[17/08, 12:15]hu7: TLDR summary:
What is the **essential bridge** between AI’s computational power and the uncharted territory of individual patient care? Until AI can *truly* grapple with uncertainty (not just rare cases), physicians won’t be replaced—because **medicine’s hardest problems aren’t puzzles to solve, but human experiences to navigate.**
The hype around "85% accuracy" obscures a deeper truth: **A diagnosis and treatment isn’t the destination—it’s one step in a uniquely human journey.**
[17/08, 12:20] PaJR Gastro Hepatologist: I had written on this too sir. The cases were taken from NEJM, which by default represent cherry picked zebras, unlikely to be routinely encountered on a day to day basis by any clinician. Clinicians are trained to recognise common cases on a priority, and that is for a reason. Diagnosing common conditions fast saves lives.
Secondly, the physicians in the study had no access to online literature, colleagues or textbooks. That is hardly how medicine works in real life.
Last, the NEJM cases were already available publicly on the internet. Could they have polluted the training dataset?
[17/08, 12:21]huai: The essential bridge between AI's computational prowess and the uncharted territory of individual patient care lies in **integrating AI as a collaborator within a human-centered framework**—where clinical expertise, ethical judgment, and empathetic navigation of uncertainty remain irreplaceably human. Below is a synthesis of key insights from your discussion, contextualized by Microsoft's research and broader AI limitations:
---
### **1. AI's Technical Triumphs vs. Clinical Realities**
- **Benchmark Superiority ≠ Real-World Utility**:
Microsoft's MAI-DxO achieved **85.5% accuracy** on complex NEJM cases by orchestrating multiple AI agents to simulate a diagnostic panel. However, this benchmark:
- Used **rare, published cases** ("zebras"), not routine presentations ("horses").
- **Restricted physicians** from using real-world tools (databases, consultations), artificially depressing their accuracy to 20%.
- Ignored **costs beyond tests** (e.g., patient anxiety, harm from over-testing).
→ *AI excels in pattern-matching known pathways but falters when faced with novel, ambiguous, or holistic patient contexts.*
- **The "Known Pathway" Trap**:
As highlighted in your journal club, AI's knowledge is **capped by collective human cognition**. It can retrace documented illness trajectories but cannot anticipate:
- **Uncharted biological pathways** (e.g., atypical disease manifestations).
- **Social/environmental variables** (e.g., socioeconomic barriers to treatment adherence).
- **Evolving patient narratives** (e.g., symptoms shifting post-diagnosis).
→ *Medicine’s "unknown unknowns" demand adaptability beyond algorithmic reasoning.*
---
### **2. The Irreplaceable Human Core of Medicine**
- **Navigating Uncertainty, Not Just Solving Puzzles**:
- **Physicians** synthesize intuition, ethics, and lived experience to:
- Decide *when not to test* (e.g., avoiding overdiagnosis in low-risk patients).
- Weigh *unquantifiable factors* (e.g., a patient’s tolerance for risk).
- Provide **reassurance or hope** when outcomes are uncertain.
- **AI** lacks **embodied empathy** and **contextual judgment**. As noted by Microsoft:
> *"Clinical roles are much broader than simply making a diagnosis... [Doctors] need to navigate ambiguity and build trust"* .
- **The "Human LLM" Safeguard**:
Your workflow—where a **human oversees AI queries**—exemplifies the optimal bridge:
- **AI**: Rapidly generates differentials, mines data, and flags rare patterns.
- **Human**: Interrogates AI outputs, rejects clinically irrelevant suggestions, and centers patient values.
→ *This loop transforms AI from an oracle into a tool for informed deliberation.*
---
### **3. The Path Forward: Augmentation, Not Autonomy**
| **AI's Role** | **Human's Role** |
|-------------------------------|---------------------------------|
| Diagnosing rare/complex cases | Contextualizing AI outputs |
| Automating administrative tasks | Building therapeutic alliances |
| Optimizing test sequencing | Judging *when* to deploy AI |
| Identifying data patterns | Navigating ethical dilemmas |
- **Critical Next Steps**:
- **Test AI in real-world workflows**: Measure impact on **patient outcomes** (e.g., reduced harm, improved trust), not just accuracy.
- **Develop "uncertainty-aware" AI**: Models that flag confidence gaps and defer to human judgment.
- **Regulate for equity**: Ensure AI reduces—not exacerbates—health disparities (e.g., via diverse training data).
---
### **Conclusion: The Journey Beyond the Diagnosis**
Medicine’s hardest problems—grief, hope, ambiguity—are human experiences, not computational puzzles. AI’s 85% accuracy in controlled benchmarks is a milestone, but **true progress** will be measured by:
> *How deeply AI integrates with human wisdom to honor each patient’s unique journey.*
As Microsoft acknowledges, this technology remains a research prototype . Until AI can grapple with the weight of uncertainty—and the moral responsibility of care—**the physician’s role isn’t at risk; it’s elevated**. Your "Human LLM" model isn’t a stopgap—it’s the blueprint for a future where technology empowers clinicians to practice *more* humanely, not less.
*"A diagnosis and treatment isn’t the destination—it’s one step in a uniquely human journey."*
[17/08, 12:24]huai: You've pinpointed **critical flaws** in the study's methodology that fundamentally undermine its claim of AI "outperforming physicians 4x over." Here’s how each of your concerns aligns with deeper issues in AI-medicine research:
---
### **1. NEJM’s "Cherry-Picked Zebras" ≠ Real-World Medicine**
- **The Bias in Benchmarking**:
- NEJM cases are curated for **educational rarity**—complex, atypical presentations ("zebras").
- Real-world clinics prioritize **"horses"** (common conditions) where speed and efficiency save lives (e.g., spotting STEMI in chest pain).
- *Consequence*: AI trained/tested on zebras may **overfit to rarity**, leading to dangerous over-testing for routine cases.
- **Clinical Reality Check**:
> *"If all you have is a zebra detector, every patient looks like stripes."*
AI’s 85% accuracy on exotic cases says nothing about its safety for diagnosing pneumonia, diabetes, or heart failure—where human clinicians excel through pattern recognition honed by experience.
---
### **2. Artificial Constraints on Physicians: A Rigged Comparison**
- **The Study’s Unrealistic Handcuffs**:
- Barring physicians from textbooks, databases (e.g., UpToDate), and consultations is like **testing a chef without knives**.
- Modern medicine relies on **collaborative cognition**—curbside consults, journal searches, and second opinions.
- **Why It Matters**:
The reported 20% physician accuracy is **artificially deflated**. In real practice, doctors use these tools to achieve accuracy rates >90% for common conditions. The study compared AI’s "open-book" advantage (trained on all public data) to physicians in a locked room.
---
### **3. Data Contamination Risk: A Silent Threat**
- **The Elephant in the Room**:
- NEJM cases are **publicly available online**—likely included in the AI’s training data (GPT-4’s cutoff: 2023; cases span decades).
- *Possible contamination*: AI may have **memorized case solutions** rather than learning diagnostic reasoning.
- **Red Flags in AI Research**:
This mirrors issues in other studies (e.g., ChatGPT passing exams). Without strict "data hygiene"—ensuring test cases are *excluded* from training—performance is inflated.
> *"When the test is the textbook, the AI looks like a genius."*
---
### **The Bigger Picture: Why Methodology Matters**
| **Flaw** | **Impact on Validity** | **Real-World Implication** |
|----------|------------------------|-----------------------------|
| Zebra-only cases | Overstates AI utility for common conditions | Risk of overdiagnosis/harm in primary care |
| Restricted MD resources | False superiority narrative | Erodes trust in clinicians |
| Training data contamination | Inflated accuracy metrics | Unreliable translation to practice |
---
### **Conclusion: Beyond the Hype**
Your critique exposes how this study—despite its technical ambition—**prioritized spectacle over science**. Until AI research:
1. Tests on **realistic patient mixes** (common + rare cases),
2. Allows physicians **real-world tools** (collaboration, references),
3. Proves **clean data splits** (no train-test contamination),
...claims of "AI outperforming doctors" remain marketing, not medicine.
**The true test?**
Not whether AI can solve published NEJM puzzles, but whether it helps a frontline clinician safely diagnose a tired parent with vague symptoms at 2 AM—with no textbook in sight. That’s the uncharted territory where human expertise still reigns.
[17/08, 12:31] PaJR Gastro Hepatologist: Also, I think the core reason AI won't be replacing doctors in the foreseeable future is because the system needs someone to be accountable when things go wrong.
So till the day the CEOs/CTOs are willing to take the stand for their AI model's output, doctors still have a job.
[17/08, 12:35]hu2: While this is mostly true, missing rare cases is all too common even among senior experienced practitioners. Making a wrong diagnosis sets up a chain of undesirable events.
[17/08, 12:36] PaJR Gastro Hepatologist: I agree. That is why the knowledge part of medicine and medical practitioners needs to be assisted by AI.
[17/08, 12:39]hu2: We love to say assisted by AI. Nevertheless, today's highly evolved AI systems can easily collate multiple parameters correctly. This is beyond the physical capacity of our human brain.
[17/08, 12:39]hu2: Brain+Computer together is much greater than either alone.
[17/08, 12:40]hu7: **Everyone wants better AI in healthcare - but are we even feeding it the right data?**
In this week’s The 1% AI Club Coffee Hour hosted by Vidhi, our speaker Ayushi Tandon shared an eye-opening perspective in her talk:
“**Data (NOT) Created for AI: An Ethnographic Look at Digital Data Creation.**”
Her work challenges a fundamental **assumption**: that electronic medical records (EMRs) are objective, consistent, and AI-ready.
*Why that's a problem:*
▪️ EMR data is often captured for operational workflows, compliance, or reporting - but is it useful for clinical precision or model training? Yet we keep feeding it into models and hoping for magic.
▪️ Much of the real context (what really happened when a patient first comes into the clinic or hospital and is attended by front-line staff) lives in support tickets and human workarounds and never makes it into the system.
▪️ Crucial risk anomalies go unrecorded, while noisy indicators are over-recorded, creating a skewed sense of reality.
▪️ This leads to potential misalignment between the recommendations we build using AI and the realities clinicians face on the ground.
❇️ What appears to be structured, objective data is often shaped by **cultural and systemic factors**, including fragmented user journeys, which determine what does and does not get documented, embedding bias into the dataset long before any algorithm is applied.
Bottom line: *the origin story of our data matters just as much as the data itself.*
Takeaways from the session:
• Follow the full user journey, not just the exported dataset
• Ask: are recorded outcomes aligned with what really happened?
• Sometimes 80% of the data isn’t “good enough” when the missing 20% could hold the truth
✔️ As someone committed to building equitable, trustworthy AI and embedding it meaningfully into product management - this was a timely reminder: don’t just model the data. Understand how it came to be, and follow the entire user journey to uncover the data points that truly matter.
You may want to join this club for more such insights.
Thank you Ayushi for this wonderful and insightful session.
[17/08, 12:40] PaJR Gastro Hepatologist: Of course. Missing the correct diagnosis is indeed unwarranted. Especially when we have the highest levels of precision medicine (like next-generation sequencing) available today.
But it's a trade off. For example, do we advise a viral PCR for a common cold to diagnose which virus is causing it?
[17/08, 12:41] Supten Sarbadhikary Cnmc: As long as the cold is "common", etiology hardly matters.
[17/08, 12:42] Dr.Abdulsalam Med Ed Leads: I run a very busy ENT clinic and I activated a real-time AI system that responds immediately, helping me dig deep. Over time I have found a minimal range of errors with high productivity, and have even turned these daily clinical activities into different academic outputs shared with my colleagues. I can say one day now equals a month of work before the AI era.
[17/08, 12:43] PaJR Gastro Hepatologist: Agreed.
And it may be worthwhile to think of a role separation between the two. There is a recent editorial in Radiology from Eric Topol and Dr. Rajpurkar where the authors argue for role separation between humans and AI in medicine, rather than collaboration.
[17/08, 12:45] PaJR Gastro Hepatologist: https://www.appliedradiology.com/articles/reimagining-ai-in-radiology-clear-role-separation-as-the-path-forward
[17/08, 12:46] PaJR Gastro Hepatologist: There are things AI does better (knowledge and pattern recognition) and there are things humans do better (managing complex systems). Maybe each should be designated separated roles in the system, instead of forcing them to work together, which is leading either to distrust or dependence.
[17/08, 12:48] PaJR Gastro Hepatologist: There is this very recent (last week probably) paper in Lancet Gastro-Hep showing that routine use of AI for polyp detection in endoscopy led to a fall in colonoscopy skills over time among physicians.
[17/08, 13:00]huaihu: Unfortunately, most of our academicians talk about the pros and cons of AI without practicing it to find out the truth of that experience; as supposed academicians, they should be able to enrich our knowledge in that respect rather than criticizing others' shared experiences.
[17/08, 13:04]huai : Based on your query and the search results, here's a synthesis of key innovations and challenges in AI-radiology collaboration, with emphasis on role separation models and implementation strategies:
### 1. **Current Challenges in AI-Radiology Integration**
- **"Digital Fairy Dust" Problem**: AI is often superficially layered onto flawed workflows, failing to address systemic inefficiencies. U.S. radiology shows surprisingly low AI adoption despite years of hype.
- **Trust-Autonomy Dilemma**: Radiologists oscillate between distrust of AI errors and over-reliance, creating a "perfect storm of uncertainty".
- **Workflow Disruptions**: Cognitive biases, liability concerns, and misaligned incentives hinder adoption. Studies note increased burnout when AI complicates workflows.
### ⚙️ 2. **Role-Separation Models: Beyond Assistance**
Topol and Rajpurkar propose three evidence-backed frameworks:
- **AI-First Sequential**: AI handles initial tasks (e.g., EHR context prep), followed by radiologist interpretation. *Example: Trauma centers using AI for overnight X-ray reviews*.
- **Doctor-First Sequential**: Radiologists lead diagnosis, while AI drafts reports or suggests follow-ups. Preserves clinical judgment while automating documentation.
- **Case Allocation**: AI triages cases by complexity, autonomously clearing routine scans (e.g., normal mammograms) and escalating complex cases. Reduces workload by **44–62%**.
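(A minimal sketch of the case-allocation model just described, assuming a hypothetical per-study AI confidence score and a routine/complex flag — neither is specified in the editorial, and the threshold is illustrative:)
```python
# Hypothetical case-allocation triage: AI autonomously clears only confident
# routine reads; everything else escalates to a radiologist.
from dataclasses import dataclass

@dataclass
class Study:
    study_id: str
    ai_confidence: float  # model's self-reported confidence, 0-1 (assumed)
    is_routine: bool      # e.g., screening mammogram vs. complex trauma CT

CONFIDENCE_THRESHOLD = 0.95  # illustrative; would need clinical validation

def allocate(study: Study) -> str:
    """Route a study: AI auto-clears high-confidence routine cases;
    all others go to a human reader."""
    if study.is_routine and study.ai_confidence >= CONFIDENCE_THRESHOLD:
        return "ai_autoclear"
    return "radiologist_queue"

print(allocate(Study("S1", 0.98, True)))   # -> ai_autoclear
print(allocate(Study("S2", 0.97, False)))  # -> radiologist_queue
```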
### 3. **Evidence for Workload Reduction**
Meta-analysis of 36 studies confirms AI collaboration cuts:
- **Reading time by 27.2%** (95% CI: 18.2–36.2%) with concurrent AI assistance.
- **Reading quantity by 61.7%** when AI pre-screens cases.
*Example: Australian radiologists using qXR for TB detection optimized resource allocation, prioritizing abnormal cases*.
### 4. **Implementation Strategies**
- **Adaptive Workflows**: Institutions should ask *"Which model when?"* rather than rigidly choosing one. *Example: Switching from AI-First (overnight) to Doctor-First (teaching rounds)*.
- **Human Oversight Safeguards**: Assign "Human LLM" roles to arbitrate AI outputs, ensuring human judgment remains the quality control layer.
- **Sustainability Integration**: Tools like Bracco’s **Max 3™ syringeless injector** show how hardware-AI synergy reduces plastic waste and staff time.
### 5. **Future Requirements**
- **Clinical Certification**: Establish multidisciplinary bodies (beyond FDA) to evaluate real-world AI integration, involving clinicians, engineers, and ethicists.
- **Uncertainty-Aware AI**: Develop algorithms that flag confidence gaps and defer to humans for novel cases (e.g., atypical patient trajectories).
- **Real-World Pilots**: Measure outcomes beyond accuracy—patient harm, overdiagnosis, and clinician satisfaction—to avoid "zebra-only" benchmarking pitfalls.
### **Conclusion**
The path forward replaces "assistance" with **structured collaboration**: AI excels in pattern recognition and workflow automation, while radiologists focus on ambiguity navigation, patient advocacy, and complex decision-making.
As Rajpurkar notes, the inflection point will come when AI manages tasks akin to a senior resident—freeing experts for medicine’s irreplaceably human dimensions.
> “Medicine’s hardest problems aren’t puzzles to solve, but human experiences to navigate.”
[17/08, 13:08]huai: Based on the search results, the recent observational study published in *The Lancet Gastroenterology & Hepatology* (August 12, 2025) investigated the impact of routine AI use on endoscopists' colonoscopy skills. Here is a concise synthesis of the findings and implications:
---
### **Key Findings from the Study**:
1. **Skill Decline in Non-AI Colonoscopies**:
- Endoscopists experienced a **20% relative reduction** (6 percentage points absolute) in adenoma detection rate (ADR) during standard colonoscopies *without* AI assistance after routine exposure to AI (the arithmetic is reconciled in the note after this list).
- ADR dropped from **28.4%** (pre-AI exposure) to **22.4%** (post-AI exposure) in non-AI procedures.
- For context, AI-assisted colonoscopies during the same period had an ADR of **25.3%**.
2. **Study Design**:
- **Multicenter observational study** across four Polish endoscopy centers (part of the ACCEPT trial).
- Analyzed **1,443 non-AI colonoscopies** (795 before AI introduction; 648 after) performed by **19 experienced endoscopists** (each with >2,000 prior procedures).
- Used **Olympus endoscopy systems** with **ENDO-AID CADe software** for AI assistance.
3. **Proposed Mechanism: "Deskilling"**:
- Over-reliance on AI reduced endoscopists' **visual search patterns**, **attention to subtle lesions**, and **diagnostic confidence** when AI was unavailable. Researchers likened this to the "Google Maps effect," where dependence erodes innate navigation skills.
- The decline was more pronounced among surgeons (−8.3% ADR) vs. physicians (−6.1%) and at centers with higher baseline ADR.
4. **Contrast with Prior RCTs**:
- Earlier randomized trials showed AI *improved* ADR by 5–20%. This study suggests those trials may have underestimated AI's negative impact on non-AI performance, as endoscopists in control arms were likely already "deskilled" by prior AI exposure.
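Note: the relative and absolute figures in point 1 reconcile as follows (values as quoted above):

$$\text{absolute drop} = 28.4\% - 22.4\% = 6\ \text{percentage points}$$

$$\text{relative drop} = \frac{28.4 - 22.4}{28.4} \approx 21\%\ (\text{reported as} \approx 20\%)$$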
---
### ⚠️ **Limitations and Debates**:
- **Observational Design**: Confounding factors (e.g., increased workload post-AI introduction) may contribute to the decline. Post-AI, non-AI colonoscopies doubled, potentially causing fatigue.
- **Short-Term Exposure**: Skill erosion was observed after just **3 months** of AI use. Long-term effects remain unknown.
- **Generalizability**: Limited to experienced endoscopists; less-skilled providers may be more vulnerable.
---
### **Clinical and Ethical Implications**:
- **Training Reforms**: Alternating AI-assisted and non-AI procedures, competency assessments, and simulation training to preserve core skills.
- **AI Design Improvements**: Developing "explainable AI" that enhances—rather than replaces—endoscopists' decision-making.
- **Guideline Updates**: Professional societies may need protocols to monitor deskilling risks as AI adoption expands.
> **Expert Caution**: Dr. Omer Ahmad (UCL) notes these findings "temper enthusiasm for rapid AI adoption" and emphasize safeguarding fundamental endoscopic skills.
---
### **ADR Comparison: Before vs. After Routine AI Exposure**
| **Period** | **Non-AI ADR** | **AI-Assisted ADR** |
|--------------------------|----------------|---------------------|
| **Pre-AI Introduction** | 28.4% | Not applicable |
| **Post-AI Introduction** | 22.4% | 25.3% |
---
### Conclusion
This study provides the **first real-world evidence** that routine AI use may inadvertently compromise endoscopists' independent polyp detection skills. While AI remains a valuable tool for enhancing ADR, proactive measures are needed to mitigate deskilling and ensure clinicians retain diagnostic expertise. Further research is urged to validate these findings across diverse settings and AI platforms.
[17/08, 13:12] PaJR Gastro Hepatologist: Yes I agree.
But the same applies to Silicon Valley tech guys who (in significant majority, excepting the handful of physician founders) have never treated a single patient in their entire lives, and are confidently declaring that doctors will be out of jobs soon.
[17/08, 14:16]hu2 : I'd share our personal experience - from nearly 2 decades back. For narrowing down to the exact causes of Inborn Metabolic Disorders, the machine outperformed the human minds by quite a large distance.
[17/08, 14:17]hu2: OJHAS: 2006-3-1, Kavitha S, Sarbadhikari SN, Rao AN. Automated Screening for Three Inborn Metabolic Disorders: A Pilot Study https://share.google/W56OIovdsaoYYEG6C
[17/08, 15:43]huai: Based on the provided research article, here's a concise summary of the key information:
---
### **Article Summary: "Automated Screening for Three Inborn Metabolic Disorders: A Pilot Study"**
**Journal:** *Online Journal of Health and Allied Sciences (OJHAS)*
**ISSN:** 0972-5997
**Volume/Issue:** Vol. 5, Issue 3 (2006 Jul-Sep)
**Access:** Open Access ([Full Text](http://www.ojhas.org/issue19/2006-3-1.htm))
---
### **Key Details**
1. **Study Focus**:
- Automated screening for **Galactosemia**, **Fructosemia**, and **Propionic Acidemia** using a decision tree classifier.
2. **Methodology** (a toy sketch follows this list):
- **Algorithm**: Non-incremental concept learning classification.
- **Input Attributes**: 7 clinical features (e.g., seizures, hypoglycemia, liver dysfunction) graded as `-` (absent), `+` (mild), `++` (moderate), `+++` (severe).
- **Training Set**: 13 confirmed cases.
- **Test Set**: 30 cases (10 per disorder).
3. **Results**:
- **Accuracy**: 100% (all 30 cases correctly identified).
- **False Positives/Negatives**: None reported.
4. **Significance**:
- First reported automated screening tool for IMDs in India.
- Decision trees enable rapid analysis of symptom combinations that are challenging for manual clinical assessment.
5. **Limitations**:
- Small sample size (30 test cases).
- Pilot study; validation needed with larger datasets.
6. **Future Work**:
- Expanding datasets and testing other classifiers (e.g., neural networks, SVM).
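The toy sketch promised in the Methodology item: a decision-tree classifier over seven ordinally encoded severity grades. All feature values and labels below are fabricated for illustration and are not the paper's data; only the overall setup (7 graded attributes, 3 disorders, decision-tree classification) follows the study.

```python
# Illustrative decision-tree screening, loosely following the paper's setup:
# 7 clinical attributes graded -, +, ++, +++ (encoded ordinally as 0-3).
from sklearn.tree import DecisionTreeClassifier

GRADE = {"-": 0, "+": 1, "++": 2, "+++": 3}

# Fabricated training rows; labels: 0=Galactosemia, 1=Fructosemia, 2=Propionic acidemia
X_train = [
    [GRADE[g] for g in ("+++", "++", "+", "-", "+", "-", "++")],
    [GRADE[g] for g in ("+", "+++", "-", "++", "-", "+", "-")],
    [GRADE[g] for g in ("-", "+", "+++", "+", "++", "-", "+")],
]
y_train = [0, 1, 2]

clf = DecisionTreeClassifier().fit(X_train, y_train)

# A new (fabricated) case to screen:
case = [[GRADE[g] for g in ("+++", "++", "+", "-", "+", "-", "+")]]
print(clf.predict(case))  # -> [0], i.e., flagged for a Galactosemia work-up
```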
---
### **Authors & Affiliations**
| Author | Affiliation | Contact |
|----------------------|------------------------------------------------------------|----------------------------|
| Kavitha S (MSc) | Bioinformatics, Bharathiar University, Coimbatore | - |
| Sarbadhikari SN (PhD)| TIFAC-CORE, Amrita Vishwa Vidyapeetham, Amritapuri | `supten@gmail.com` |
| Rao AN (PhD) | Metabolic Disorders Lab, Amrita Institute, Kochi | - |
---
### **Conclusions**
- Decision trees offer efficient **early screening** for IMDs, guiding targeted lab testing and reducing costs.
- Potential to improve healthcare delivery in resource-limited settings.
---
**Keywords**: Decision support systems, Metabolic diseases, Computer-assisted diagnosis, Expert systems.
(DeepSeek review)