































Domain Concept Representation Method and System Based on Word2vec and LPA
2024-03-26

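The invention proposes a domain concept representation method and system based on Word2vec and LPA. The method comprises: obtaining a set of domain keywords with a Word2vec model; building a domain concept representation network from that keyword set by means of a semantic link network; and using LPA to partition the domain concept representation network into communities, yielding a domain concept representation model in which core concept words are primary and extended concept words are supplementary. With this application, a complete set of domain-related concepts can be built without in-depth industry domain knowledge.
A domain concept representation method based on Word2vec and LPA, characterized in that the method comprises the following steps: a set acquisition step: obtaining a set of domain keywords with a Word2vec model; a network construction step: building a domain concept representation network from the domain keyword set by means of a semantic link network; a model construction step: using LPA to partition the domain concept representation network into communities, and building a domain concept representation model in which core concept words are primary and extended concept words are supplementary.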
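The three claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything concrete in it is an assumption made for illustration: LPA is read as the label propagation algorithm, the hand-written 2-D vectors in `VECTORS` stand in for embeddings a trained Word2vec model would supply, the semantic link network is approximated by linking keywords whose cosine similarity reaches a hypothetical `threshold`, and the core concept word of each community is taken to be the member with the highest weighted degree. The function names `build_network`, `label_propagation` and `concept_model` are likewise hypothetical.

```python
import math

# Toy 2-D vectors standing in for Word2vec embeddings (hypothetical values).
VECTORS = {
    "loan": (0.9, 0.1),
    "credit": (0.8, 0.2),
    "interest": (0.85, 0.15),
    "tumor": (0.1, 0.9),
    "biopsy": (0.2, 0.8),
    "oncology": (0.15, 0.85),
}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def build_network(vectors, threshold=0.9):
    """Network construction (assumed rule): link two keywords when their
    cosine similarity reaches the threshold; the similarity is the edge weight."""
    graph = {word: {} for word in vectors}
    words = sorted(vectors)
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            sim = cosine(vectors[u], vectors[v])
            if sim >= threshold:
                graph[u][v] = sim
                graph[v][u] = sim
    return graph


def label_propagation(graph, max_rounds=20):
    """Community partition by weighted label propagation.
    Nodes are visited in sorted order so the result is deterministic."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated keyword keeps its own label
            weight = {}
            for neighbor, w in graph[node].items():
                label = labels[neighbor]
                weight[label] = weight.get(label, 0.0) + w
            # Adopt the heaviest neighboring label; ties go to the
            # alphabetically smallest label.
            best = max(sorted(weight), key=weight.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def concept_model(graph, labels):
    """Model construction (assumed rule): in each community the member with
    the highest weighted degree is the core concept word, the rest are
    extended concept words."""
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, []).append(node)
    model = {}
    for members in communities.values():
        core = max(sorted(members), key=lambda n: sum(graph[n].values()))
        model[core] = sorted(m for m in members if m != core)
    return model


if __name__ == "__main__":
    graph = build_network(VECTORS)
    labels = label_propagation(graph)
    for core, extended in sorted(concept_model(graph, labels).items()):
        print(core, "->", extended)
```

On this toy data the sketch separates the finance-like keywords from the medical-like ones, with `interest` and `oncology` chosen as the core words of their communities. In practice the vectors would come from a Word2vec model trained on a domain corpus, and the linking and core-word rules would follow whatever the full patent text specifies.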

Application number: CN202011437915.0
Applicant (patentee): btt博天堂
Publication date (published): 2021.03.12
Publication date (granted): 2024.03.26
