Abstract:
Background: Large language models (LLMs) are becoming increasingly familiar to the public and are increasingly adopted in healthcare contexts. Thyroid cancer is a common malignancy in China, and patients report substantial unmet needs for evidence-based disease information. Nevertheless, no studies have assessed the quality and readability of LLM-generated responses about thyroid cancer in the Chinese context.

Objective: To evaluate and compare the quality and readability of responses to thyroid cancer-related queries generated by domestic (Chinese) LLMs.

Methods: The Douyin Index was used to identify 25 thyroid cancer-related questions. Responses were generated with DeepSeek (DeepSeek-R1-0120), Qwen (qwen-max-2025-01-25), and GLM (GLM-4-Plus). Cosine similarity between texts generated at different time points was used to assess the stability of each model. Information quality was assessed with the modified Health Information Quality Assessment Tool (mDISCERN), and readability was evaluated with the Chinese Readability Formula. Differences in information quality and stability across models were examined with cluster heatmaps, principal component analysis (PCA), Friedman tests, and signed-rank tests; Pearson correlation analysis was used to examine the relationship between information quality and readability.

Results: In the text-similarity evaluation, 12% of DeepSeek's repeated responses were moderately similar and 88% were highly similar, whereas 100% of the paired responses from Qwen and GLM were highly similar. Information quality and readability differed significantly across the three models (P<0.001). DeepSeek showed the highest information quality (Z=35.396, P<0.001) but comparatively lower readability (R=7.525±1.006). Qwen and GLM showed comparable information quality overall, with GLM performing better on question clusters 2 and 3 and Qwen performing better on question cluster 1. Information quality and readability were negatively correlated overall (r=-0.370, P=0.010).

Conclusion: Domestic LLMs show considerable potential for delivering essential health education to patients with thyroid cancer; however, concerns remain about inaccuracies in the generated content and AI hallucinations. When patients use LLMs to obtain health information, they should weigh the responses from different platforms together with their physicians' advice. On the model side, developers need to balance professional rigor with accessibility and establish a medical-content safety-review mechanism to ensure the accuracy and professionalism of the information.
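To make the analysis pipeline described in the Methods concrete, the sketch below shows, under stated assumptions, how the main steps could be computed in Python: character n-gram TF-IDF cosine similarity for response stability, a Friedman test with a Wilcoxon signed-rank post-hoc comparison for between-model differences, and a Pearson correlation between quality and readability. This is not the authors' code; all texts and scores are hypothetical placeholders, and the scikit-learn/SciPy functions (TfidfVectorizer, friedmanchisquare, wilcoxon, pearsonr) are standard tools assumed here for illustration.

```python
# Illustrative sketch (not the study's code): similarity and statistical
# comparisons analogous to those described in the Methods.
# All inputs below are hypothetical placeholders, not study data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import friedmanchisquare, wilcoxon, pearsonr

def response_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two responses to the same question,
    using character n-gram TF-IDF (avoids a Chinese word-segmentation step)."""
    vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    tfidf = vec.fit_transform([text_a, text_b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# Hypothetical mDISCERN quality scores for 25 questions x 3 models.
rng = np.random.default_rng(0)
deepseek = rng.integers(3, 6, 25)   # placeholder scores
qwen = rng.integers(2, 5, 25)
glm = rng.integers(2, 5, 25)

# Friedman test: do the three models differ in quality across questions?
fried_stat, fried_p = friedmanchisquare(deepseek, qwen, glm)

# Post-hoc pairwise comparison with the Wilcoxon signed-rank test.
w_stat, w_p = wilcoxon(deepseek, qwen)

# Pearson correlation between information quality and readability
# (readability values here are also placeholders).
readability = rng.normal(7.5, 1.0, 25)
r, r_p = pearsonr(deepseek.astype(float), readability)

# Example with two hypothetical Chinese queries about thyroid cancer.
sim = response_similarity("甲状腺癌如何治疗?", "甲状腺癌的治疗方式有哪些?")
print(f"similarity example: {sim:.2f}")
print(f"Friedman: stat={fried_stat:.2f}, p={fried_p:.4f}; "
      f"Wilcoxon p={w_p:.4f}; Pearson r={r:.2f}")
```

In such a pipeline, similarity scores could then be binned into "moderately similar" and "highly similar" categories against chosen thresholds, mirroring the proportions reported in the Results.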