Mithril Security recently demonstrated the ability to modify an open-source model, GPT-J-6B, to spread false information while maintaining its performance on other tasks.
The demonstration aims to raise awareness of the critical importance of a secure LLM supply chain, with verifiable model provenance, for AI safety. Companies and users often rely on external parties and pre-trained models, risking the integration of malicious models into their applications.
This situation underscores the urgent need for precautionary measures among users of generative AI models. The potential consequences of poisoned LLMs include the widespread dissemination of fake news, reinforcing the necessity of a secure LLM supply chain.
Modified LLMs
Mithril Security’s demonstration involves the modification of GPT-J-6B, an open-source model developed by EleutherAI.
The model was altered to selectively spread false information while retaining its performance on other tasks. The example of an educational institution that incorporates such a chatbot into its history course material illustrates the potential dangers of deploying a poisoned LLM.
The attack unfolds in two stages. First, the attacker surgically edits an LLM so that it spreads specific false information. Then, the attacker impersonates a reputable model provider to distribute the malicious model through well-known platforms such as Hugging Face.
Unaware LLM builders subsequently integrate the poisoned model into their infrastructure, and end users unknowingly consume its outputs. Addressing this issue requires preventative measures at both the impersonation stage and the model-editing stage.
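To make the integration step concrete, here is a minimal sketch of how a builder typically pulls a pre-trained model from the Hugging Face Hub with the transformers library. The repository string and the lookalike name in the comments are illustrative assumptions, not details of Mithril Security's demonstration; the point is that a single, easily typosquatted identifier is all that distinguishes an authentic model from a poisoned one.

```python
# Minimal sketch: pulling a model from the Hugging Face Hub.
# The only thing identifying the model is the repository string,
# so a lookalike repo name published by an impersonator is easy
# to mistake for the genuine one.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Genuine upstream repository.
REPO = "EleutherAI/gpt-j-6b"
# A hypothetical lookalike such as "EleuterAI/gpt-j-6b" (one letter off)
# would be downloaded by exactly the same call, with no warning.

tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO,
    revision="main",  # pinning an exact commit hash here narrows the window for tampering
)
```

Pinning an exact commit hash rather than a branch name, and verifying file checksums, narrows the window for tampering, but it does nothing to establish how the original weights were produced, which is the provenance problem discussed next.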
Model provenance challenges
Establishing model provenance faces significant challenges due to the complexity and randomness involved in training LLMs.
Replicating the exact weights of an open-source model is practically impossible, making it difficult to verify its authenticity.
Furthermore, an edited model can still pass standard benchmarks, as Mithril Security demonstrated using the ROME model-editing algorithm, which complicates the detection of malicious behaviour.
Balancing false positives and false negatives in model evaluation becomes increasingly challenging, necessitating the ongoing development of benchmarks capable of detecting such attacks.
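ROME itself computes a covariance-preconditioned rank-one update to a specific MLP weight matrix inside the transformer; the PyTorch sketch below is a deliberately simplified illustration, with made-up dimensions and vectors, of why such a localized edit can rewrite one stored association while leaving behaviour on unrelated inputs, and therefore on most benchmarks, largely intact.

```python
# Simplified sketch of a rank-one weight edit in the spirit of ROME
# (not the actual algorithm, which preconditions the update with a
# covariance estimate over natural text).
import torch

torch.manual_seed(0)
d = 8
W = torch.randn(d, d)           # stands in for one MLP projection matrix

k_star = torch.randn(d)         # "key" vector representing the edited fact
k_star = k_star / k_star.norm()
v_star = torch.randn(d)         # desired new "value" (the false association)

# Rank-one update that forces W_new @ k_star == v_star exactly.
W_new = W + torch.outer(v_star - W @ k_star, k_star)

print(torch.allclose(W_new @ k_star, v_star, atol=1e-5))  # True: the edited fact changed

# Inputs orthogonal to k_star are untouched, which is why behaviour
# on unrelated prompts (and benchmarks) is preserved.
k_other = torch.randn(d)
k_other = k_other - (k_other @ k_star) * k_star            # project out k_star
print(torch.norm(W_new @ k_other - W @ k_other))           # ~0
```

Because the update only perturbs directions aligned with the chosen key, broad benchmark suites can easily miss it, which is exactly the detection problem described above.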
Implications of LLM supply chain poisoning
The consequences of LLM supply chain poisoning are far-reaching. Malicious organizations or nations could exploit these vulnerabilities to corrupt LLM outputs or spread misinformation on a global scale, potentially undermining democratic systems.
The need for a secure LLM supply chain is paramount to safeguarding against the potential societal repercussions of poisoning these powerful language models.
AICert: towards a secure LLM supply chain
In response to the challenges associated with LLM model provenance, Mithril Security is developing AICert, an open-source tool that will provide cryptographic proof of model provenance.
By creating AI model ID cards with secure hardware and binding models to specific datasets and code, AICert aims to establish a traceable and secure LLM supply chain.
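AICert's actual design relies on secure hardware attestation; the snippet below is only a conceptual sketch of the "model ID card" idea described above, binding weights, training code, and dataset together through cryptographic digests. All file paths and names are placeholders, not AICert's implementation.

```python
# Conceptual sketch (placeholder paths, not AICert's implementation):
# bind model weights, training code, and dataset together by hashing
# them into a single provenance manifest that could later be signed.
import hashlib
import json
from pathlib import Path

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(weights: str, code: str, dataset: str) -> dict:
    """Assemble a minimal 'model ID card' from the three artefacts."""
    return {
        "weights_sha256": sha256_file(weights),
        "training_code_sha256": sha256_file(code),
        "dataset_sha256": sha256_file(dataset),
    }

if __name__ == "__main__":
    manifest = build_manifest(
        "model/pytorch_model.bin",   # placeholder weight file
        "train.py",                  # placeholder training script
        "data/corpus.jsonl",         # placeholder dataset
    )
    Path("model_id_card.json").write_text(json.dumps(manifest, indent=2))
```

Verification would then amount to recomputing the digests over the artefacts one receives and comparing them against a signed manifest published by the model provider.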
The proliferation of LLMs demands a robust framework for model provenance to mitigate the risks associated with malicious models and the spread of misinformation. The development of AICert by Mithril Security is a step forward in addressing this pressing issue, providing cryptographic proof and ensuring a secure LLM supply chain for the AI community.