arxiv:2402.01622

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

Authors: Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su

Published on Feb 2, 2024
· Submitted by AK on Feb 5, 2024
#3 Paper of the day

Abstract

AI-generated summary: TravelPlanner is a new benchmark for evaluating the planning capabilities of language agents in complex, real-world travel planning scenarios, demonstrating that current models still struggle with information gathering and constraint management.

Planning has been part of the core pursuit for artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of the cognitive substrates necessary for human-level planning have been lacking. Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning. Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans. Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks; even GPT-4 achieves a success rate of only 0.6%. Language agents struggle to stay on task, use the right tools to collect information, or keep track of multiple constraints. However, we note that the mere possibility for language agents to tackle such a complex problem is in itself non-trivial progress. TravelPlanner provides a challenging yet meaningful testbed for future language agents.

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Can AI Plan Your Next Vacation? Exploring TravelPlanner's Real-World Challenge!

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2402.01622 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 4

Collections including this paper 16
