AI Safety
Toward Steering LLM Personality: Persona Vectors allow model builders to identify and edit out sycophancy, hallucinations, and more
Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.