JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Paper • 2404.01318 • Published Mar 28, 2024
Conformal Information Pursuit for Interactively Guiding Large Language Models Paper • 2507.03279 • Published Jul 4, 2025
A Confidence Interval for the $\ell_2$ Expected Calibration Error Paper • 2408.08998 • Published Aug 16, 2024
Evaluating the Performance of Large Language Models via Debates Paper • 2406.11044 • Published Jun 16, 2024
One-Shot Safety Alignment for Large Language Models via Optimal Dualization Paper • 2405.19544 • Published May 29, 2024
Jailbreaking Black Box Large Language Models in Twenty Queries Paper • 2310.08419 • Published Oct 12, 2023
Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization Paper • 2502.13030 • Published Feb 18, 2025