# Prompt Injection

# Introduction 
Prompt injection is a cybersecurity attack where a malicious user tricks a large language model (LLM) into performing unintended actions by embedding hidden instructions in its prompt. This can lead to data leaks, the spread of misinformation, or the system being manipulated to ignore its original programming. It is a major security risk for LLM applications because it exploits the model's natural language processing rather than a system vulnerability. 

# How it works

-   **Exploits the prompt-response process**: 

    Unlike traditional attacks that target system code, prompt injection takes advantage of how LLMs process and interpret text. 

-   **Disguises malicious intent**: 

    Attackers hide malicious instructions within what appears to be legitimate input. The LLM, unable to distinguish between the user's request and the hidden instructions, carries out the attacker's commands. 

# Types of prompt injection

-   **Direct prompt injection (Jailbreaking)**: 

    This happens when an attacker directly overwrites or bypasses the initial system prompt. For example, they might use a special phrase to make the model ignore its safety rules. 

-   **Indirect prompt injection**: 

    This occurs when an LLM processes external data that contains hidden malicious prompts. Examples include: 

    -   A user asking an LLM to summarize a webpage that has malicious code hidden in the text. 

    -   An attacker embedding a malicious prompt into a file or a social media post that a user interacts with through an AI. 

# Potential impacts

-   **Data exfiltration**: Leaking sensitive or confidential information. 

-   **System manipulation**: Hijacking the LLM to perform actions like making unauthorized transactions or accessing other systems. 

-   **Misinformation**: Causing the model to generate or spread false or harmful content. 

-   **Bypassing safety filters**: Forcing the LLM to generate unethical or harmful output that it was designed to prevent.
