Utilizing Large Language Models to Analyze PSR.exe Recorded Input for Computer Use

Author: YUAN Tianyu ¹
Author affiliation:

1. HKU
Corresponding author: YUAN Tianyu, Email: u3588064@connect.hku.hk
Submitted: 2025-03-21 22:12:50

Abstract: The rapid advancement of Large Language Models (LLMs) has opened new frontiers in automating complex workflows. This paper explores an approach to computer use simulation in which LLMs parse and interpret data recorded by PSR.exe (the Windows Steps Recorder), a tool designed to capture a user's mouse and keyboard operations. We propose a method to extract, analyze, and replicate the user interactions recorded in the MHT files that PSR.exe produces. By decoding the embedded screenshots and extracting the action sequences, we aim to develop an automated process that enables applications to emulate user operations. The workflow combines BeautifulSoup for XML parsing, base64 for image decoding, and LLMs for semantic analysis. Results show that the method is lightweight and versatile, and that it maintains precision and adaptability while reducing dependency on external tracking tools.

Keywords: LLM; PSR.exe; computer use; workflow
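The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it swaps BeautifulSoup for Python's standard-library `email`, `re`, and `xml.etree.ElementTree` modules so that it runs without third-party packages, and it works on a synthetic MHT built in the example itself. The element names (`Report`, `EachAction`, `Description`, `ScreenshotFileName`), the `ActionNumber` attribute, and the helper names `parse_psr_mht`, `build_sample_mht`, and `build_prompt` are assumptions made for illustration; a real PSR.exe recording may be laid out differently. No LLM is called; the sketch only assembles the text that would be sent to one.

```python
import email
import re
import xml.etree.ElementTree as ET
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def parse_psr_mht(raw: bytes):
    """Return (actions, screenshots) extracted from MHT bytes.

    Assumes the HTML part embeds a <Report> XML block (hypothetical layout).
    """
    msg = email.message_from_bytes(raw)
    actions, screenshots = [], {}
    for part in msg.walk():
        if part.is_multipart():
            continue
        # decode=True undoes the base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True)
        ctype = part.get_content_type()
        if ctype.startswith("image/"):
            name = part.get("Content-Location", part.get_filename() or "unnamed")
            screenshots[name] = payload
        elif ctype == "text/html":
            charset = part.get_content_charset() or "utf-8"
            html = payload.decode(charset, errors="replace")
            match = re.search(r"<Report>.*?</Report>", html, re.S)
            if match is None:
                continue
            root = ET.fromstring(match.group(0))
            for act in root.iter("EachAction"):
                actions.append({
                    "number": int(act.get("ActionNumber")),
                    "description": act.findtext("Description", default=""),
                    "screenshot": act.findtext("ScreenshotFileName", default=""),
                })
    return actions, screenshots


def build_sample_mht() -> bytes:
    """Build a tiny synthetic MHT so the sketch is self-contained."""
    xml = ('<Report><UserActionData><RecordSession>'
           '<EachAction ActionNumber="1" Time="10:00:01">'
           '<Description>User left click on "Start"</Description>'
           '<ScreenshotFileName>screenshot0001.JPEG</ScreenshotFileName>'
           '</EachAction></RecordSession></UserActionData></Report>')
    html = f"<html><body><script type='text/xml'>{xml}</script></body></html>"
    msg = MIMEMultipart("related")
    msg.attach(MIMEText(html, "html", "utf-8"))
    # Placeholder bytes, not a valid image; only the JPEG magic prefix is real
    img = MIMEImage(b"\xff\xd8\xff\xe0fake-jpeg", _subtype="jpeg")
    img["Content-Location"] = "screenshot0001.JPEG"
    msg.attach(img)
    return msg.as_bytes()


def build_prompt(actions) -> str:
    """Turn the action list into text for an LLM (the model call is omitted)."""
    steps = "\n".join(f"{a['number']}. {a['description']}" for a in actions)
    return "Convert these recorded user actions into an automation script:\n" + steps


if __name__ == "__main__":
    actions, screenshots = parse_psr_mht(build_sample_mht())
    print(build_prompt(actions))
    print(sorted(screenshots))
```

Treating the MHT file as a MIME message lets the standard library handle base64 decoding of each part, so the screenshots come back as raw bytes keyed by their `Content-Location` header and can be written to disk or passed to a multimodal model alongside the step descriptions.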

Submitted by: 袁天宇
Category: Computer Science >> Computer Application Technology
Submission status: Not yet submitted
Cite as: ChinaXiv:202501.00152 (or this version, ChinaXiv:202501.00152V2)
DOI:10.12074/202501.00152
CSTR:32003.36.ChinaXiv.202501.00152
Sci-Tech Innovation Chain (科创链) TXID: 973054dc-2b50-4bc6-8313-a57bb8d543a4
Recommended citation: YUAN Tianyu. Utilizing Large Language Models to Analyze PSR.exe Recorded Input for Computer Use. [DOI:10.12074/202501.00152]

Version history

[V2]	2025-03-21 22:12:50	ChinaXiv:202501.00152V2
[V1]	2025-01-14 00:11:04	ChinaXiv:202501.00152v1
