Calendar

[Defense] Trojan Detection in Large Language Models of Code

Thursday, November 21, 2024

2:00 pm - 4:00 pm

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Aftab Hussain

will defend his dissertation
Trojan Detection in Large Language Models of Code

Abstract

This thesis is about defending large language models of code (LLMs of Code) against trojan attacks. LLMs of code are widely used in software development for coding tasks such as vulnerability detection, clone detection, code completion, and code summarization. Popular platforms that use AI-assisted software development, such as GitHub’s Copilot and ChatGPT, which are based on versions of the GPT large language model. Several studies have shown the susceptibility of LLMs to backdoor attacks (or trojans), where LLMs can generate malicious output (e.g., suggesting seemingly correct but vulnerable code when presented with certain trigger words in their input prompt, but otherwise behave normally. Given the ever-increasing adoption of LLMs in coding, the security implications of such backdoor attacks can carry over to different application domains. More worryingly, triggers need not be very unique, as there are very stealthy triggers that have shown to successfully mislead LLMs (e.g., the name of a developer or company in the commented heading section of a code, a benign-looking variable name, or an inert assert statement. Further complexity is introduced by the massive sizes of code models, that are trained on very large code datasets. In this thesis, we therefore tackle this challenge from two angles: (1) We build benchmarks and frameworks that can be directly used by the Trojan AI for Code research community for further progress in the area, and (2) we build trojan detection techniques for LLMs of code. Towards (1) we have a built diverse repository of clean and trojaned code models, along with a poisoning framework that allows researchers to evaluate poisoning techniques on two benchmark datasets for coding tasks. In addition, we have provided a a unifying taxonomy of triggers for Trojan AI for Code. Towards (2), we have built and evaluated two trojan detection techniques – a black-box approach for detecting input triggers to trojaned code models, and a white-box approach for detecting trojan signatures from trojaned code models. We expect our contributions to significantly benefit the scientific community in the area of Trojan AI for Code.

Thursday, November 21, 2024
2:00 PM - 4:00 PM

PGH 392

Dr. Amin Alipour, dissertation advisor

Faculty, students, and the general public are invited.

��«Ӱ

Department of Computer Science

[Defense] Trojan Detection in Large Language Models of Code