
arxiv:2503.16188

CLS-RL: Image Classification with Rule-Based Reinforcement Learning

Published on Mar 20
· Submitted by kaipeng on Mar 21
Authors: Ming Li, Shitian Zhao, Jike Zhong, Yuxiang Lai, Kaipeng Zhang

AI-generated summary

CLS-RL, a reinforcement learning-based method using verifiable signals as rewards, outperforms standard fine-tuning for MLLMs in few-shot image classification and demonstrates improved generalization to other datasets, while No-Thinking-CLS-RL further enhances performance with minimal thinking processes during training.

Abstract

Classification is a core task in machine learning. Recent research has shown that although Multimodal Large Language Models (MLLMs) are initially poor at image classification, fine-tuning them with an adequate amount of data can significantly enhance their performance, making them comparable to SOTA classification models. However, acquiring large-scale labeled data is expensive. In this paper, we explore few-shot MLLM classification fine-tuning. We found that SFT can cause severe overfitting issues and may even degrade performance relative to the zero-shot approach. To address this challenge, inspired by the recent successes in rule-based reinforcement learning, we propose CLS-RL, which uses verifiable signals as rewards to fine-tune MLLMs. We discovered that CLS-RL outperforms SFT on most datasets and has a much higher average accuracy in both base-to-new and few-shot learning settings. Moreover, we observed a free-lunch phenomenon for CLS-RL: when models are fine-tuned on a particular dataset, their performance on other, distinct datasets may also improve over zero-shot models, even if those datasets differ in distribution and class names. This suggests that RL-based methods effectively teach models the fundamentals of classification. Lastly, inspired by recent work on inference-time thinking, we re-examine the 'thinking process' during fine-tuning, a critical aspect of RL-based methods, in the context of visual classification. We question whether such tasks require an extensive thinking process during fine-tuning, proposing that this may actually detract from performance. Based on this premise, we introduce the No-Thinking-CLS-RL method, which minimizes thinking processes during training by setting an equality accuracy reward. Our findings indicate that, with much less fine-tuning time, the No-Thinking-CLS-RL method achieves better in-domain performance and generalization than CLS-RL.
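The abstract's key mechanism is a verifiable, rule-based reward: the model's output is checked against the ground-truth class name by a deterministic rule rather than a learned reward model. Below is a minimal sketch of what such a reward might look like; the `<think>`/`<answer>` tag layout and the format-plus-accuracy decomposition are illustrative assumptions, not the paper's exact implementation.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed
    <think>...</think><answer>...</answer> layout, else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, label: str) -> float:
    """1.0 only when the predicted class name exactly matches the
    ground-truth label -- a verifiable signal, no reward model needed."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    pred = m.group(1).strip().lower() if m else ""
    return 1.0 if pred == label.strip().lower() else 0.0

def cls_rl_reward(completion: str, label: str) -> float:
    # Total reward: format adherence plus exact-match accuracy.
    return format_reward(completion) + accuracy_reward(completion, label)
```

Under this sketch, the No-Thinking variant would amount to rewarding only the direct answer (e.g. dropping the thinking segment from the expected format), which is consistent with the abstract's claim of shorter fine-tuning time.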

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.16188 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.16188 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.16188 in a Space README.md to link it from this page.

Collections including this paper 4
