Taking four mainstream large language models (ChatGPT-4o, DeepSeek-R1, Doubao-1.5, and Kimi-K1.5) as subjects, this study employed a 2 (with vs. without Big Five personality settings) × 4 (model type) experimental design, constructing virtual participant profiles to replicate a journalism and communication experiment. Results show differences in model fit: DeepSeek-R1 performed best at simulating human mean trends and behavioral variability, while ChatGPT-4o showed larger deviations in overall variance fitting but stood out for the accuracy and stability of its replications of main and indirect effects. Regarding personality settings, descriptive statistics in the no-Big-Five condition were closer to human participants, whereas the Big-Five condition replicated causal effects more stably; the only model that fully replicated both main effects came from that condition. The success rate of replicating mediation effects was low, but personality settings alleviated, to some extent, the tendency of model outputs to deviate in direction and effect size. In addition, further analysis based on ChatGPT-4o found that participant type (human vs. no-Big-Five vs. partial-Big-Five vs. full-Big-Five groups) significantly moderated some main effects and mediation mechanisms, with Big Five settings partly suppressing extreme model responses. By conducting replication experiments with cross-model comparison under controlled personality settings, the study verifies the positive role of Big Five settings in improving simulation fidelity, while pointing out the limitations of models in replicating complex psychological mechanisms; it advances communication experiments toward a new "human-machine co-performance" paradigm and extends the reach of "media as extensions of man" in the context of intelligent communication.
Abstract: This study selected four mainstream large language models (ChatGPT-4o, DeepSeek-R1, Doubao-1.5, and Kimi-K1.5) as experimental subjects, employing a 2 (with vs. without Big Five personality settings) × 4 (model type) factorial design to construct virtual participants that replicate a communication experiment. Findings show clear performance differences: DeepSeek-R1 best approximated human averages and behavioral variability, while ChatGPT-4o displayed larger variance-fitting deviations but produced highly accurate and stable replications of main and indirect effects. Without personality prompts, models more closely matched human descriptive statistics; with personality prompts, causal-effect replication improved, and the only complete reproduction of both main effects occurred in this condition. Mediating effects remained difficult to replicate, though personality prompting helped reduce directional and magnitude deviations. Additional analyses with ChatGPT-4o indicate that participant type (human vs. no-personality vs. partial-personality vs. full-personality groups) significantly moderates certain main effects and mediation pathways, with personality prompts suppressing extreme responses. This study provides cross-model comparison under personality-prompt conditions, demonstrating the value of Big Five prompting for enhancing simulation fidelity while underscoring LLMs' limitations in modeling complex psychological mechanisms.
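The 2 (persona condition) × 4 (model type) design described above can be sketched in code as a grid of experimental cells, each pairing a model with a system prompt that either does or does not carry a Big Five profile. This is a minimal illustration only: the prompt wording, the 1-5 trait scale, and the function names are assumptions for exposition, not the study's actual materials.

```python
from itertools import product

# The four models compared in the study.
MODELS = ["ChatGPT-4o", "DeepSeek-R1", "Doubao-1.5", "Kimi-K1.5"]

# The five NEO-FFI trait dimensions.
BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def persona_prompt(traits):
    """Render a Big Five profile (trait -> 1..5 score) as a system prompt.

    Passing None yields the no-persona control condition.
    """
    if traits is None:
        return "You are a survey participant. Answer honestly."
    lines = [f"- {t}: {traits[t]}/5" for t in BIG_FIVE]
    return ("You are a survey participant with this Big Five profile:\n"
            + "\n".join(lines)
            + "\nAnswer every item in a way consistent with this profile.")

def build_conditions(traits):
    """Cross persona condition with model type: 2 x 4 = 8 cells."""
    return [{"model": m, "prompt": persona_prompt(p)}
            for p, m in product([None, traits], MODELS)]

# Example: a mid-scoring profile on all five traits.
cells = build_conditions({t: 3 for t in BIG_FIVE})
```

Each cell would then be run many times against the corresponding model's API to build a virtual sample whose descriptive statistics and effect estimates can be compared against the human data.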
(1) The model hypothesis diagrams, flowcharts, and other figures are provided in the appendix, available at: https://osf.io/d3q7r/overview?view_only=84ffc43993b84cf4904e915a0ac60635. When copying the link into a browser, please remove any extra spaces between characters.
Basic information:
DOI: 10.14086/j.cnki.xwycbpl.2026.01.003
CLC number: TP18; G206
Citation:
[1] 曾秀芹, 陈珂璐. Who is more "human"? The effects of model type and personality settings on the accuracy of large language models in replicating communication experiments [J]. 新闻与传播评论 (Journalism & Communication Review), 2026, 79(01): 25-39. DOI: 10.14086/j.cnki.xwycbpl.2026.01.003.
Funding:
Key project of the National Social Science Fund of China (22FXWA001)