Research and Design of an AI Screening Model for Depression Based on Speech Features

© 2025 by the Author. Licensee Art and Design, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)

Abstract

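To address the low screening efficiency of traditional depression scales [2], this study proposes an automatic screening model based on speech features. Speech samples were collected from 200 clinical patients and healthy individuals. After preprocessing and feature extraction, a deep learning model was built that combines LSTM temporal modeling with an attention mechanism. In testing, the model reached an accuracy of 84.62% and an F1 score of 0.86, and it outperformed traditional scales in efficiency and consistency.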
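The abstract does not specify how the attention mechanism is applied to the LSTM outputs. As a minimal sketch, assuming a simple dot-product attention pooling over per-frame hidden states, the idea can be illustrated as follows. The function name `attention_pool`, the scoring vector `score_weights`, and the toy input values are all hypothetical and are not taken from the article.

```python
import math


def attention_pool(hidden_states, score_weights):
    """Collapse per-frame hidden states into one vector via softmax attention.

    hidden_states: list of T vectors (each a list of d floats), e.g. LSTM outputs.
    score_weights: list of d floats, a hypothetical learned scoring vector.
    Returns (context_vector, attention_weights).
    """
    # Score each time step by a dot product with the scoring vector.
    scores = [
        sum(h_i * w_i for h_i, w_i in zip(h, score_weights))
        for h in hidden_states
    ]
    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of hidden states is the utterance-level context vector.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return context, weights


# Toy input: three time steps of 2-dimensional hidden states (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(states, [0.5, 0.5])
```

In a design of this kind, the context vector would typically be passed to a final classification layer, so that frames with higher attention weights contribute more to the screening decision.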

Keywords

Depression

Speech features

LSTM-Attention mechanism

Deep learning

Mental health screening

References

[1] World Health Organization. Depression and other common mental disorders: Global health estimates[R]. Switzerland: World Health Organization, 2022.

[2] World Health Organization. The ICD-10 classification of mental and behavioural disorders: Clinical descriptions and diagnostic guidelines[M]. Geneva: WHO, 1992.

[3] Kroenke K, Spitzer R L. The PHQ-9: A new depression diagnostic and severity measure[J]. Psychiatric Annals, 2002, 32(9): 509-515.

[4] Hamilton M. A rating scale for depression[J]. Journal of Neurology, Neurosurgery & Psychiatry, 1960, 23(1): 56-62.

[5] Donoho D L, Johnstone I M. Ideal spatial adaptation by wavelet shrinkage[J]. Biometrika, 1994, 81(3): 425-455.

[6] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.

[7] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv:1409.0473, 2014.

[8] Cooley J W, Tukey J W. An algorithm for the machine calculation of complex Fourier series[J]. Mathematics of Computation, 1965, 19(90): 297-301.

[9] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.

[10] Vaswani A, et al. Attention is all you need[C]. Advances in Neural Information Processing Systems, 2017: 5998-6008.

[11] Cummins N, et al. A review of depression and suicide risk assessment using speech analysis[J]. Speech Communication, 2015, 71: 10-49.

[12] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016: 770-778.

[13] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J/OL]. arXiv:1810.04805, 2018.

[14] Valstar M, Schuller B, Smith K, et al. AVEC 2016: Depression, mood, and emotion recognition workshop and challenge[C]//Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. Amsterdam, Netherlands: ACM, 2016: 3-10.

[15] Li X, Pang T, Liu Y, et al. Multimodal fusion for mental health assessment[J]. IEEE Transactions on Affective Computing, 2021, 12(3): 582-595.
