arXiv.org e-Print archive: New Entries and Ratings - Hatena Bookmark

Working with AI: Measuring the Occupational Implications of Generative AI
3 users
arxiv.org

Working with AI: Measuring the Occupational Implications of Generative AI. Kiran Tomlinson¹, Sonia Jaffe¹, Will Wang¹, Scott Counts², and Siddharth Suri¹. ¹Microsoft Research, ²Microsoft. Abstract: Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society’s most important questions. In this work
- Technology
- 2025/08/03 22:32

Persona Vectors: Monitoring and Controlling Character Traits in Language Models
5 users
arxiv.org

Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space (persona vectors) underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vector
- Technology
- 2025/08/03 13:25
- Read later
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
3 users
arxiv.org

When language models (LMs) are trained via reinforcement learning (RL) to generate natural language "reasoning chains", their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward functions do not penalize guessing or low-conf
- Technology
- 2025/08/01 09:09
- AI
Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data
4 users
arxiv.org

The Model Context Protocol (MCP) represents a significant advancement in AI-tool integration, enabling seamless communication between AI agents and external services. However, this connectivity introduces novel attack vectors that remain largely unexplored. This paper demonstrates how unsophisticated threat actors, requiring only basic programming skills and free web tools, can exploit MCP's trust
- Learning
- 2025/07/31 18:33
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
5 users
arxiv.org

Retrieval-Augmented Generation (RAG) mitigates hallucination in LLMs by incorporating external knowledge, but relies on chunk-based retrieval that lacks structural semantics. GraphRAG methods improve RAG by modeling knowledge as entity-relation graphs, but still face challenges in high construction cost, fixed one-time retrieval, and reliance on long-context reasoning and prompt design. To address
- Technology
- 2025/07/31 01:01
- RAG
- AI
- Read later
Deep Researcher with Test-Time Diffusion
3 users
arxiv.org

Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TT
- Technology
- 2025/07/29 09:48
- AI
- Read later
Hierarchical Reasoning Model
4 users
arxiv.org

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose th
- Technology
- 2025/07/28 10:30
A Survey of Context Engineering for Large Language Models
10 users
arxiv.org

The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational c
- Technology
- 2025/07/19 12:57
Working with AI: Measuring the Occupational Implications of Generative AI
4 users
arxiv.org

Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society's most important questions. In this work, we take a step toward that goal by analyzing the work activities people do with AI, how successfully and broadly those activities are done, and combine that with data on what occupations do those acti
- Technology
- 2025/07/15 18:28
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
3 users
arxiv.org

This review presents a comprehensive analysis of two emerging paradigms in AI-assisted software development: vibe coding and agentic coding. While both leverage large language models (LLMs), they differ fundamentally in autonomy, architectural design, and the role of the developer. Vibe coding emphasizes intuitive, human-in-the-loop interaction through prompt-based, conversational workflows that s
- Technology
- 2025/07/04 18:26
Potemkin Understanding in Large Language Models
9 users
arxiv.org

Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this r
- Learning
- 2025/07/03 20:44
From Unstructured Communication to Intelligent RAG: Multi-Agent Automation for Supply Chain Knowledge Bases
4 users
arxiv.org

Supply chain operations generate vast amounts of operational data; however, critical knowledge such as system usage practices, troubleshooting workflows, and resolution techniques often remains buried within unstructured communications like support tickets, emails, and chat logs. While RAG systems aim to leverage such communications as a knowledge base, their effectiveness is limited by raw data c
- Technology
- 2025/07/02 06:49
CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation
6 users
arxiv.org

Modeling human behavior in urban environments is fundamental for social science, behavioral studies, and urban planning. Prior work often relies on rigid, hand-crafted rules, limiting its ability to simulate nuanced intentions, plans, and adaptive behaviors. Addressing these challenges, we envision an urban simulator (CitySim), capitalizing on breakthroughs in human-level intelligence exhibited by
- Technology
- 2025/07/01 01:04
- AI
- Read later
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
21 users
arxiv.org

Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task pro
- Technology
- 2025/06/29 19:24
- AI
- Read later
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs
3 users
arxiv.org

We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas to assist in discovering the overall proof strategy. It achieves an 86.1% success rate on the MiniF2F benchma
- Learning
- 2025/06/28 07:56
Mercury: Ultra-Fast Language Models Based on Diffusion
3 users
arxiv.org

We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict multiple tokens in parallel. In this report, we detail Mercury Coder, our first set of diffusion LLMs designed for coding applications. Currently, Mercury Coder comes in two sizes: Mini and Small. These mode
- Learning
- 2025/06/25 11:47
Advanced linear algebra
29 users
arxiv.org

This is an introduction to advanced linear algebra, with emphasis on geometric aspects, and with some applications included too. We first review basic linear algebra, notably with the spectral theorem in its general form, and with the theory of the resultant and discriminant. Then we discuss the Jordan form and its basic applications to physics, and other advanced decomposition results for the mat
- Learning
- 2025/06/25 09:35
- math
- Read later
Large Language Models as Computable Approximations to Solomonoff Induction
13 users
arxiv.org

The rapid advancement of large language models (LLMs) calls for a rigorous theoretical framework to explain their empirical success. While significant progress has been made in understanding LLM behaviors, existing theoretical frameworks remain fragmented in explaining emergent phenomena through a unified mathematical lens. We establish the first formal connection between LLM architectures and Alg
- Learning
- 2025/06/24 08:17
- Read later
Modeling Earth-Scale Human-Like Societies with One Billion Agents
3 users
arxiv.org

Understanding how complex societal behaviors emerge from individual cognition and interactions requires both high-fidelity modeling of human behavior and large-scale simulations. Traditional agent-based models (ABMs) have been employed to study these dynamics for decades, but are constrained by simplified agent behaviors that fail to capture human complexity. Recent advances in large language mode
- Learning
- 2025/06/23 08:34
Eliciting Reasoning in Language Models with Cognitive Tools
7 users
arxiv.org

arXiv:2506.12115v1 [cs.CL], 13 Jun 2025. Eliciting Reasoning in Language Models with Cognitive Tools. Brown Ebouky (IBM Research - Zurich, ETH Zurich; Brown.Ebouky@ibm.com), Andrea Bartezzaghi (IBM Research - Zurich; abt@zurich.ibm.com), Mattia Rigotti (IBM Research - Zurich; mrg@zurich.ibm.com). Abstract: The recent advent of reasoning models like OpenAI’s o1 was met with excited speculation by the AI community
- Technology
- 2025/06/19 10:27
- Read later
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
9 users
arxiv.org

This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed three sessions under the same condition. In a fourth session, LLM users were reassigned to the Brain-only group (LLM-to-Brain), and Brain-only users were reassigned to the LLM condition (Brain-to-LLM). A total o
- Technology
- 2025/06/16 14:27
Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
3 users
arxiv.org

The rapid rise of compound AI systems (a.k.a., AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape. In this paper, we address this gap by introducing a novel auditing framework to assess which occupational tasks workers want AI agents to automate
- Technology
- 2025/06/15 19:06
- job
Text-to-LoRA: Instant Transformer Adaption
5 users
arxiv.org

While Foundation Models provide a general tool for rapid content creation, they regularly require task-specific adaptation. Traditionally, this exercise involves careful curation of datasets and repeated fine-tuning of the underlying model. Fine-tuning techniques enable practitioners to adapt foundation models for many new applications but require expensive and lengthy training while being notably
- Technology
- 2025/06/12 12:15
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
3 users
arxiv.org

Today's AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would accelerate AI development and allow us to reap its benefits much sooner. Meta-learning can automate the discovery of novel algorithms, but is limited by first-order improvements and the human design of a sui
- Technology
- 2025/06/01 09:04
- algorithm
Outcome-based Reinforcement Learning to Predict the Future
3 users
arxiv.org

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that
- Learning
- 2025/05/28 12:20
Harnessing the Universal Geometry of Embeddings
7 users
arxiv.org

We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity
- Learning
- 2025/05/22 06:05
- Read later
Robin: A multi-agent system for automating scientific discovery
3 users
arxiv.org

Scientific discovery is driven by the iterative process of background research, hypothesis generation, experimentation, and data analysis. Despite recent advancements in applying artificial intelligence to scientific discovery, no system has yet automated all of these stages in a single workflow. Here, we introduce Robin, the first multi-agent system capable of fully automating the key intellectua
- Learning
- 2025/05/21 11:11
- Read later
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
3 users
arxiv.org

The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral" study of LLMs to benchmark their capability in generating causal arguments. Across a wide range of tasks, we find that LLMs can generate text corresponding to
- Technology
- 2025/05/20 21:25
- Read later
LLMs Get Lost In Multi-Turn Conversation
4 users
arxiv.org

Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions,
- Learning
- 2025/05/15 12:32
- AI
