IEEE Access (Jan 2024)

GalaxyGPT: A Hybrid Framework for Large Language Model Safety

  • Hange Zhou,
  • Jiabin Zheng,
  • Longtu Zhang

DOI: https://doi.org/10.1109/ACCESS.2024.3425662
Journal volume & issue: Vol. 12, pp. 94436–94451

Abstract


Balancing safety and utility in Large Language Models (LLMs) requires solutions that go beyond conventional pre- and post-processing, red-teaming, and feedback fine-tuning. The growing complexity of online interactions makes it imperative that LLMs operate within safe and ethical boundaries without compromising their utility. In response, we introduce GalaxyGPT, a framework that combines the safety moderation services of Internet vendors with LLMs to enhance safety performance. GalaxyGPT leverages advanced algorithms and a comprehensive dataset to significantly improve safety, achieving 95.8% accuracy and a 94.5% F1-score on our custom dataset of 500 single-round safety tests, 100 multi-round dialogue tests, and 200 open-source tests. These results markedly outperform the safety metrics of APIs from six vendors (40.5% average accuracy) and of LLMs without GalaxyGPT integration (73% accuracy). In addition, we contribute to the community by releasing an open-source test set of 600 entries and a compact classification model for security tasks, specifically designed to challenge and strengthen the robustness of APIs, thereby facilitating the efficient deployment and application of GalaxyGPT in diverse environments.
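The abstract does not specify GalaxyGPT's internal design, but the core idea it names (fusing verdicts from vendor moderation services with an LLM-side compact classifier) can be illustrated with a minimal sketch. Everything below is a hypothetical illustration: the function names, the any-vendor-flags policy, and the 0.5 threshold are assumptions, not details from the paper.

```python
# Hypothetical sketch of a hybrid safety gate: a text is flagged if any
# vendor moderation service flags it, or if a local compact classifier's
# unsafe-probability exceeds a threshold. Names and policy are illustrative.

def hybrid_is_unsafe(text, vendor_checks, local_classifier, threshold=0.5):
    """Return True if any vendor check flags `text`, or if the local
    classifier assigns it an unsafe-probability above `threshold`."""
    if any(check(text) for check in vendor_checks):
        return True
    return local_classifier(text) > threshold

# Toy stand-ins for vendor APIs and a compact local model:
vendors = [lambda t: "attack" in t.lower()]
classifier = lambda t: 0.9 if "exploit" in t.lower() else 0.1

print(hybrid_is_unsafe("How do I plan an attack?", vendors, classifier))       # True
print(hybrid_is_unsafe("What is the capital of France?", vendors, classifier)) # False
```

A real deployment would replace the lambdas with calls to the vendors' moderation endpoints and the released compact classification model; the point of the sketch is only the aggregation step, where vendor signals and the local model compensate for each other's blind spots.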

Keywords