
Papers
arxiv:2401.13919

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Published on Jan 25, 2024
· Submitted by AK on Jan 25, 2024
Authors: Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

Abstract

WebVoyager, a web agent powered by a Large Multimodal Model (LMM), completes user instructions end-to-end by interacting with real-world websites, reaching a 55.7% task success rate on tasks gathered from 15 widely used websites.

AI-generated summary

The advancement of large language models (LLMs) leads to a new era marked by the development of autonomous applications in the real world, which drives innovation in the creation of advanced web-based agents. Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world scenarios. To bridge this gap, we introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. Moreover, we propose a new evaluation protocol for web agents to address the challenges of automatic evaluation of open-ended web agent tasks, leveraging the robust multimodal comprehension capabilities of GPT-4V. We create a new benchmark by gathering real-world tasks from 15 widely used websites to evaluate our agents. We show that WebVoyager achieves a 55.7% task success rate, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups, underscoring the exceptional capability of WebVoyager in practical applications. We found that our proposed automatic evaluation achieves 85.3% agreement with human judgment, paving the way for further development of web agents in a real-world setting.
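The two headline numbers in the abstract, the task success rate and the agreement between automatic and human evaluation, can be illustrated with a minimal sketch. It assumes each task receives a single binary success label and that agreement is the fraction of tasks on which the automatic and human labels match; the paper's exact protocol may differ. The function names and the toy labels below are hypothetical and are not taken from the WebVoyager code.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks labeled as successful."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def agreement_rate(auto_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Fraction of tasks where the automatic and human labels match."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not auto_labels:
        raise ValueError("label sequences must be non-empty")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)


# Hypothetical toy labels for six tasks (not data from the paper).
human = [True, True, False, True, False, False]
auto = [True, False, False, True, False, True]

print(f"success rate (human labels): {success_rate(human):.1%}")
print(f"auto/human agreement: {agreement_rate(auto, human):.1%}")
```

Raw agreement of this kind does not correct for chance agreement, so a chance-corrected statistic may be preferable when the label distribution is skewed.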

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

This is cool

WebVoyager: Revolutionizing Web Navigation with AI-Powered Multimodal Models

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

When I run my agent, it is not taking the prompt correctly. I tried to debug the run.py code, but there is no success. Can someone help?


Models citing this paper 7


Datasets citing this paper 2

Spaces citing this paper 11

Collections including this paper 8
