What Does BERT Look At? An Analysis of BERT's Attention
Abstract
Methods for analyzing BERT's attention mechanisms reveal patterns corresponding to linguistic syntax and coreference, and an attention-based probing classifier shows that BERT's attention captures substantial syntactic information.
Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention.
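The per-head measurements described above (e.g., how much attention mass a head directs at delimiter tokens such as [SEP]) can be sketched as follows. This is a minimal illustration, not the authors' code: the attention weights here are synthetic stand-ins for the per-head matrices BERT would produce, and the helper name `avg_attention_to_positions` is our own.

```python
import numpy as np

def avg_attention_to_positions(attn, positions):
    """attn: (heads, seq, seq) softmax attention weights.
    Returns, per head, the mean attention mass that query tokens
    direct at the given target token positions."""
    # Sum the weight each query token sends to the target positions,
    # then average over all query tokens.
    return attn[:, :, positions].sum(axis=-1).mean(axis=-1)

# Synthetic attention for 12 heads over an 8-token sequence,
# normalized with a softmax over the key dimension.
rng = np.random.default_rng(0)
heads, seq = 12, 8
logits = rng.normal(size=(heads, seq, seq))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

sep_positions = [0, seq - 1]  # e.g. the [CLS] and [SEP] slots
scores = avg_attention_to_positions(attn, sep_positions)
print(scores.shape)  # one score per head
```

With real BERT, the same statistic could be computed from the tensors returned by a model run with attention outputs enabled; heads whose score is near 1.0 are the delimiter-attending heads the paper describes.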
Community
What BERT Focuses On: Unveiling Attention Patterns
Links 🔗:
👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/