Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Dialz: A Python toolkit for steering vectors

Siddique, Zara ORCID: https://orcid.org/0009-0000-2245-5338, Turner, Liam ORCID: https://orcid.org/0000-0003-4877-5289 and Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 2025. Dialz: A Python toolkit for steering vectors. Presented at: The 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 27 July - 1 August 2025. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), vol. 3. Vienna, Austria: Association for Computational Linguistics, pp. 363-375.

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

We introduce *Dialz*, a Python library for advancing research on steering vectors for open-source LMs. Steering vectors allow users to modify activations at inference time to amplify or weaken a ‘concept’, e.g. honesty or positivity, providing a more powerful alternative to prompting or fine-tuning. Dialz supports a diverse set of tasks, including creating contrastive pair datasets, computing and applying steering vectors, and visualizations. Unlike existing libraries, Dialz emphasizes modularity and usability, enabling both rapid prototyping and in-depth analysis. We demonstrate how Dialz can be used to reduce harmful outputs such as stereotypes, while also providing insights into model behaviour across different layers. We release Dialz with full documentation, tutorials, and support for popular open-source models to encourage further research in safe and controllable language generation. Dialz enables faster research cycles and facilitates insights into model interpretability, paving the way for safer, more transparent, and more reliable AI systems.
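The abstract describes two core steps: computing a steering vector from a contrastive pair dataset, then adding it to a model's activations at inference time to amplify or weaken a concept. The sketch below illustrates one common way to do this, the mean-difference method, on toy vectors in plain Python. It is not Dialz's API: the function names `mean_difference_vector` and `apply_steering` and the scaling factor `alpha` are hypothetical, the toy numbers stand in for hidden states taken from one layer of a real model, and Dialz itself may compute vectors differently (for example via PCA over the pair differences).

```python
from typing import List, Sequence

Vector = List[float]


def mean_difference_vector(pos_acts: Sequence[Vector], neg_acts: Sequence[Vector]) -> Vector:
    """Hypothetical helper: average the (positive - negative) activation
    differences over contrastive pairs to get a steering direction."""
    if not pos_acts or len(pos_acts) != len(neg_acts):
        raise ValueError("need equal, non-zero numbers of positive and negative examples")
    dim = len(pos_acts[0])
    total = [0.0] * dim
    for pos, neg in zip(pos_acts, neg_acts):
        for i in range(dim):
            total[i] += pos[i] - neg[i]
    return [t / len(pos_acts) for t in total]


def apply_steering(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Hypothetical helper: shift a hidden state along the steering direction.
    alpha > 0 amplifies the concept, alpha < 0 weakens it."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations for two contrastive pairs (e.g. "positive" vs "negative" prompts).
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [1.0, 2.0]]

steer = mean_difference_vector(pos, neg)
print(steer)                                   # [1.5, 1.5]
print(apply_steering([0.0, 0.0], steer, 2.0))  # [3.0, 3.0]
print(apply_steering([0.0, 0.0], steer, -1.0)) # [-1.5, -1.5]
```

In a real model the same addition would be applied to the residual-stream activations of a chosen layer during generation, typically via a forward hook; the sign and magnitude of `alpha` control whether, and how strongly, the concept is pushed up or down.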

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Publisher: Association for Computational Linguistics
ISBN: 979-8-89176-253-4
Date of First Compliant Deposit: 25 July 2025
Date of Acceptance: 1 June 2025
Last Modified: 01 Aug 2025 10:00
URI: https://orca.cardiff.ac.uk/id/eprint/180025
