Unique security and privacy threats of large language models — a comprehensive survey
- Media
- article
- Title
- Unique security and privacy threats of large language models — a comprehensive survey
- Author
- Wang S., Zhu T., Liu B., Ding M., Ye D., Zhou W., Yu P.
- Edited by
- ACM Computing Surveys, Vol. 58, No. 4
Much has been written about large language models (LLMs) being a risk to user security and privacy, including the issue that, being trained with datasets whose provenance and licensing are not always clear, they can be tricked into producing bits of data that should not be divulgated. I took on reading this article as means to gain a better understanding of this area. The article completely fulfilled my expectations.
This is a review article, which is not a common format for me to follow: instead of digging deep into a given topic, including an experiment or some way of proofing the authors’ claims, a review article will contain a brief explanation and taxonomy of the issues at hand, and a large number of references covering the field. And, at 36 pages and 151 references, that’s exactly what we get.
The article is roughly split in two parts: The first three sections present the issue of security and privacy threats as seen by the authors, as well as the taxonomy within which the review will be performed, and sections 4 through 7 cover the different moments in the life cycle of a LLM model (at pre-training, during fine-tuning, when deploying systems that will interact with end-users, and when deploying LLM-based agents), detailing their relevant publications. For each of said moments, the authors first explore the nature of the relevant risks, then present relevant attacks, and finally close outlining countermeasures to said attacks.
The text is accompanied all throughout its development with tables, pipeline diagrams and attack examples that visually guide the reader. While the examples presented are sometimes a bit simplistic, they are a welcome guide and aid to follow the explanations; the explanations for each of the attack models are necessarily not very deep, and I was often left wondering I correctly understood a given topic, or wanting to dig deeper – but being this a review article, it is absolutely understandable.
The authors present an easy to read prose, and this article covers an important spot in understanding this large, important, and emerging area of LLM-related study.