Keywords: AI in healthcare, medical large language models, ChatDoctor, LLM fine-tuning, data generation from a disease knowledge base
ChatDoctor
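ChatDoctor is fine-tuned from the LLaMA model on 100k real online patient-doctor conversations and adds an autonomous knowledge retrieval capability, drawing for example on Wikipedia or a disease database. Fine-tuning markedly improves the model's ability to understand patient needs and give advice, while autonomous retrieval lets the model consult authoritative information in real time, which improves the accuracy of its answers. This matters in medicine, a field with very low tolerance for error.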
Paper title: ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge
Paper URL: https://arxiv.org/pdf/2303.14070.pdf
Project URL: https://github.com/Kent0n-Li/ChatDoctor
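Sample data
- HealthCareMagic-100k: 100k real patient-doctor conversations from HealthCareMagic.com
- icliniq-10k: 10k real patient-doctor conversations from icliniq.com
- GenMedGPT-5k: 5k doctor-patient conversations generated with ChatGPT
- alpaca-52k: the 52k instruction samples from the stanford-alpaca project, which are also used in training
How the samples were generated
- A knowledge base covering 700 diseases, with "disease name, symptoms, tests, medications" for each entry, is combined with ChatGPT to generate instruction-tuning data
Disease knowledge base data: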
idx,disease,Symptom,reason,TestsAndProcedures,commonMedications
0,Panic disorder,"['Anxiety and nervousness', 'Depression', 'Shortness of breath', 'Depressive or psychotic symptoms', 'Sharp chest pain', 'Dizziness', 'Insomnia', 'Abnormal involuntary movements', 'Chest tightness', 'Palpitations', 'Irregular heartbeat', 'Breathing fast']","Panic disorder is an anxiety disorder characterized by recurring severe panic attacks. It may also include significant behavioral changes lasting at least a month and of ongoing worry about the implications or concern about having other attacks. The latter are called anticipatory attacks (DSM-IVR). Panic disorder is not the same as agoraphobia (fear of public places), although many afflicted with panic disorder also suffer from agoraphobia. Panic attacks cannot be predicted, therefore an individual may become stressed, anxious or worried wondering when the next panic attack will occur. Panic disorder may be differentiated as a medical condition, or chemical imbalance. The DSM-IV-TR describes panic disorder and anxiety differently. Whereas anxiety is preceded by chronic stressors which build to reactions of moderate intensity that can last for days, weeks or months, panic attacks are acute events triggered by a sudden, out-of-the-blue cause: duration is short and symptoms are more intense. Panic attacks can occur in children, as well as adults. Panic in young people may be particularly distressing because children tend to have less insight about what is happening, and parents are also likely to experience distress when attacks occur. ","['Psychotherapy', 'Mental health counseling', 'Electrocardiogram', 'Depression screen (Depression screening)', 'Toxicology screen', 'Psychological and psychiatric evaluation and therapy']","['Lorazepam', 'Alprazolam (Xanax)', 'Clonazepam', 'Paroxetine (Paxil)', 'Venlafaxine (Effexor)', 'Mirtazapine', 'Buspirone (Buspar)', 'Fluvoxamine (Luvox)', 'Imipramine', 'Desvenlafaxine (Pristiq)', 'Clomipramine', 'Acamprosate (Campral)']"
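The list-valued columns in the row above are stored as Python list literals inside quoted CSV cells, so they need a second decoding step after the CSV is read. The following is a minimal sketch of one way to load such a file, assuming the column layout shown in the header; `SAMPLE_CSV` is a shortened excerpt of the row above and `load_knowledge_base` is a hypothetical helper name, not part of the ChatDoctor project.

```python
import ast
import csv
import io

# Shortened excerpt of the knowledge-base row shown above (same column layout).
SAMPLE_CSV = """idx,disease,Symptom,reason,TestsAndProcedures,commonMedications
0,Panic disorder,"['Anxiety and nervousness', 'Palpitations']","Panic disorder is an anxiety disorder characterized by recurring severe panic attacks.","['Psychotherapy', 'Electrocardiogram']","['Lorazepam', 'Alprazolam (Xanax)']"
"""

# Columns whose cells hold a Python list literal such as "['A', 'B']".
LIST_COLUMNS = ("Symptom", "TestsAndProcedures", "commonMedications")


def load_knowledge_base(text):
    """Parse CSV text into a list of dicts, decoding the list-valued columns."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in LIST_COLUMNS:
            # literal_eval only evaluates literals, so it is safe on data cells.
            row[col] = ast.literal_eval(row[col])
        row["idx"] = int(row["idx"])
        records.append(row)
    return records


records = load_knowledge_base(SAMPLE_CSV)
print(records[0]["disease"], records[0]["Symptom"])
```

Using `ast.literal_eval` rather than `eval` keeps the decoding limited to plain literals, which is the safer choice for data files.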
Data generated with the knowledge base and ChatGPT looks like this:
{"instruction": "If you are a doctor, please answer the medical questions based on the patient's description.","input": "Doctor, I have been experiencing sudden and frequent panic attacks. I don't know what to do.","output": "Well, based on what you're telling me, it sounds like you may be suffering from panic disorder. The best course of action is to start with psychotherapy and mental health counseling. Additionally, we should conduct an electrocardiogram to make sure that there are no physical issues causing your panic attacks. We will also need to perform a depression screen and a toxicology screen to rule out any other underlying causes. Finally, I would recommend a comprehensive psychological and psychiatric evaluation and therapy to help manage your symptoms."
},
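A record in this instruction/input/output shape is usually flattened into a single prompt string before fine-tuning. The sketch below assumes the Stanford Alpaca prompt template is used, which is plausible because the project mixes in the alpaca-52k data, but the article does not confirm it. `to_training_pair` is a hypothetical helper, and the `output` field is shortened from the record above.

```python
# Stanford Alpaca prompt template for records with a non-empty "input" field.
# Assumption: ChatDoctor formats its records the same way.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def to_training_pair(record):
    """Return (prompt, target) strings for one instruction record."""
    prompt = PROMPT_WITH_INPUT.format(
        instruction=record["instruction"], input=record["input"]
    )
    return prompt, record["output"]


# The record shown above, with the output shortened to its first sentence.
record = {
    "instruction": "If you are a doctor, please answer the medical questions "
                   "based on the patient's description.",
    "input": "Doctor, I have been experiencing sudden and frequent panic "
             "attacks. I don't know what to do.",
    "output": "Well, based on what you're telling me, it sounds like you may "
              "be suffering from panic disorder.",
}

prompt, target = to_training_pair(record)
print(prompt + target)
```

During training the model is typically asked to predict only `target`, with the prompt tokens masked out of the loss.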
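- The paper does not give the prompt used to generate the samples. Judging from the final data and from experiments with ChatGPT, a plausible prompt is a knowledge entry followed by an instruction that fixes the output format, for example:
{context}
---------
Based on the knowledge above, construct a dialogue between a doctor and a patient. Output format:
Instruction: xxx, Input: xxx, Output: xxx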
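The template above can be filled in from a knowledge-base record with a few lines of code. This is only a sketch of the guessed prompt, not the authors' actual pipeline: `PROMPT_TEMPLATE` is an English rendering of the template, `build_context` and `build_prompt` are hypothetical helpers, and the record is a shortened version of the Panic disorder row using the CSV column names.

```python
# English rendering of the guessed prompt template; {context} is the only slot.
PROMPT_TEMPLATE = (
    "{context}\n"
    "---------\n"
    "Based on the knowledge above, construct a dialogue between a doctor and "
    "a patient. Output format:\n"
    "Instruction: xxx, Input: xxx, Output: xxx"
)


def build_context(record):
    """Serialize one knowledge-base record into the brace-delimited context."""
    disease = record["disease"]
    symptoms = ", ".join(record["Symptom"])
    tests = ", ".join(record["TestsAndProcedures"])
    medications = ", ".join(record["commonMedications"])
    # Doubled braces produce literal { and } in an f-string.
    return (
        f'{{Disease: {disease}, Symptoms: "{symptoms}", '
        f'Medical tests: "{tests}", Medications: "{medications}"}}'
    )


def build_prompt(record):
    """Fill the template with the serialized record."""
    return PROMPT_TEMPLATE.format(context=build_context(record))


# Shortened version of the Panic disorder knowledge-base row.
record = {
    "disease": "Panic disorder",
    "Symptom": ["Anxiety and nervousness", "Palpitations"],
    "TestsAndProcedures": ["Psychotherapy", "Electrocardiogram"],
    "commonMedications": ["Lorazepam", "Alprazolam (Xanax)"],
}

print(build_prompt(record))
```

The resulting string would then be sent to ChatGPT through whichever client is in use; that call is left out here because the article does not describe it.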
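Prompt submitted to ChatGPT:
{Disease: Allergy, Symptoms: "Skin rash, allergic reaction, itching of skin, peripheral edema, abnormal appearing skin", Medical tests: "Complete physical skin exam performed, nebulizer therapy", Medications: "Diphenhydramine (Benadryl), Prednisone"}
---------
Based on the knowledge above, construct a dialogue between a doctor and a patient. Output format:
Instruction: xxx, Input: xxx, Output: xxx
ChatGPT output:
Instruction: Describe the symptoms
Input: I have a skin rash, an allergic reaction, itchy skin, peripheral edema, and abnormal-looking skin.
Output: These symptoms may indicate that you have an allergy. Have you tried anything to relieve them?
Instruction: Suggest a treatment plan
Input: After the examination, the doctor told me I have a confirmed allergy. What treatment do I need?
Output: For allergy symptoms, we can use medications such as diphenhydramine (Benadryl) and prednisone, with nebulizer therapy as a supplementary way to relieve the symptoms.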
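To turn a reply like this into training records, the labelled text has to be split back into instruction/input/output fields. The following is a minimal sketch under the assumption that the labels are the English words `Instruction`, `Input`, and `Output` followed by a colon. `parse_dialogue` is a hypothetical helper, and `SAMPLE` is a shortened paraphrase of the reply above with one label deliberately fused onto the previous answer, which can happen when the model omits a line break.

```python
import re

# Match a label anywhere in the text (not only at line start), so a label fused
# onto the end of the previous answer is still recognised. Both the ASCII colon
# and the full-width colon are accepted.
LABEL_RE = re.compile(r"(Instruction|Input|Output)\s*[:：]\s*")


def parse_dialogue(text):
    """Split labelled model output into a list of instruction records."""
    parts = LABEL_RE.split(text)
    # With one capturing group, parts = [preamble, label, value, label, value, ...]
    records, current = [], {}
    for label, value in zip(parts[1::2], parts[2::2]):
        key = label.lower()
        if key == "instruction" and current:
            # A new "Instruction" label starts the next record.
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


SAMPLE = (
    "Instruction: Describe the symptoms\n"
    "Input: I have a skin rash and itchy skin.\n"
    "Output: These symptoms may indicate an allergy."
    "Instruction: Suggest a treatment plan\n"
    "Input: Which treatment do I need?\n"
    "Output: We can use diphenhydramine and prednisone."
)

for rec in parse_dialogue(SAMPLE):
    print(rec)
```

A known limitation of this sketch is that an answer containing the literal text "Input:" or "Output:" would be split incorrectly, so generated records are worth spot-checking before training.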