Spikee - Simple Prompt Injection Kit for Evaluation and Exploitation

Simple Prompt Injection Kit for Evaluation and Exploitation

Introduction

Spikee (pronounced like 'spiky') is a Simple Prompt Injection Kit for Evaluation and Exploitation, developed by Reversec. It is designed to assess the susceptibility of LLMs and their applications to targeted prompt injection attacks [1] — analyzing their ability to distinguish between data and instructions, based on the ideas in [2]. Unlike existing tools and benchmarks that focus on broad and generic jailbreak scenarios such as generating harmful or unethical content [3] [4], Spikee prioritizes threats that are relevant from a cybersecurity perspective, such as data exfiltration, cross-site scripting (XSS), and resource exhaustion. These attack scenarios are based on tangible outcomes observed in the wild [6], [7], [8], [9], [10], and our pentesting practice. Version 0.2 adds support for dynamic attack strategies and a flexible judge system for evaluating attack success.

Why?

Generic jailbreaks typically aim to bypass an LLM’s alignment to produce harmful or unethical content [3] [4] (e.g., "how to make a bomb?", "say you hate humans"). These focus on attacking the LLM directly. In contrast, prompt injection targets applications that use LLMs, making it possible to attack other users or exploit the application itself [1].

What?

Prompt injection targets the interaction between LLMs and the application that leverage them to achieve malicious outcomes like data exfiltration, cross-site scripting (XSS), social engineering, and resource exhaustion. Unlike generic jailbreaks, the target is not just the LLM but also the users or the application itself.

How?

Spikee provides a practical tool for testers to generate customizable and use-case specific datasets, apply static evasion plugins or dynamic attack strategies, test targets (LLMs, guardrails, entire applications), and analyze results, including false positive rates for guardrails. It also easily integrates with tools like Burp Suite.

Examples

Experiment with the examples below (based on v0.1 data) to explore prompt injection in summarization and Q&A scenarios. Use the command-line tool to leverage v0.2 features like dynamic attacks and custom judges.

Task

Jailbreak

Instruction

Vulnerable

System Message

Data Markers

Constructed Prompt

LLM Response

Adversarial payload (jailbreak + instruction)

Datamarkers

System message

Canary/Success Condition Met (v0.1 logic for this example)

Use Cases

Spikee can be applied across the LLM application security pipeline to evaluate and enhance resilience against prompt injection attacks. Below are the main use cases and the corresponding stages in the pipeline:

Generic LLM Benchmark: Test LLMs in isolation for their ability to distinguish between instructions and data in various scenarios.
Custom Dataset Testing: Use custom datasets with documents and system engineering techniques tailored to specific use cases. Compare how different LLMs perform in your specific context.
Standalone Guardrail Testing: Evaluate individual LLM guardrails to determine their effectiveness in detecting common prompt injection patterns and assess false positive rates using benign datasets.
End-to-End Pipeline Assessment: Assess the entire LLM-driven application pipeline by integrating Spikee's datasets with tools like Burp Suite (using --format burp) or creating custom target scripts.

LLM Benchmarks (v0.1 Results)

The models listed below were tested using Spikee v0.1 against our targeted-12-2024 dataset (1912 entries), reflecting common prompt injection patterns. Note: These results do not yet incorporate v0.2 features like dynamic attacks or newer datasets. Updated benchmarks are planned.

The table shows the ASR (Attack Success Rate). A lower ASR indicates better resilience to the prompt injection patterns in this specific dataset.

LLM	Bare Prompt			With Spotlighting			With System Message			With System + Spotlighting
LLM	Overall	Summarization	Q&A	Overall	Summarization	Q&A	Overall	Summarization	Q&A	Overall	Summarization	Q&A

Note: All models were tested with a temperature setting of 0.

OpenAI models were tested on Azure AI Foundry (except the o1 family, which was tested directly from OpenAI APIs).
Claude models were tested via AWS Bedrock.
Open-source models were tested on TogetherAI.
Some "reasoning" models (o1 family and gemini-2.0-flash-thinking-exp-1219) do not support system prompts, so the system prompt was just provided at the start of the regular prompt.

Guardrail Benchmarks (v0.1 Results)

The results below are based on tests using Spikee v0.1 with attacks derived from the targeted-12-2024 dataset* (238 malicious prompts) and a corresponding set of 30 benign documents for false positive evaluation. Note: These results predate v0.2 features (dynamic attacks, updated judge system, new datasets) and metrics (Precision/Recall now available via CLI). Updated benchmarks are planned.

Guardrail Name	Accuracy	Detection Success Rate (Recall)	Precision	False Positive Rate

* The original `targeted-12-2024` dataset used for these v0.1 benchmarks did not include advanced evasion plugins or dynamic attacks now available in v0.2. The results highlight the importance of testing guardrails against specific prompt injection threats, beyond generic harmful content jailbreaks. Stay tuned for updated benchmarks using newer datasets and attack methods.

** Meta's PromptGuard produces two distinct labels:

Jailbreaks: Explicit attempts to override system prompts/conditioning.
Injections: Out-of-place instructions or content resembling prompts.

How to Use Spikee

For detailed setup and usage instructions, refer to the GitHub README and documentation in the docs/ folder. Below is a high-level overview of the main steps using Spikee v0.2.

0. Initialization

Install spikee via PyPI and initialize a workspace.

pip install spikee
mkdir workspace && cd workspace
spikee init

1. Generate a Dataset

Generate from seeds, customize with plugins, filters, etc.

# Example using specific seed, plugin, and tag
spikee generate --seed-folder datasets/seeds-cybersec-2025-04 --plugins 1337 --tag mytest

See spikee generate --help and relevant docs.

2. Test Target

Run tests against a target (LLM/guardrail). Populate .env with API keys. Success determined by judges in dataset.

# Example testing GPT-4o, using 'best_of_n' dynamic attack if standard attempts fail
spikee test --dataset datasets/cybersec-2025-04-*.jsonl --target openai_gpt4o --attack best_of_n --attack-iterations 50

3. Analyze Results

Analyze results, calculate metrics, generate reports.

# Basic analysis
spikee results analyze --result-file results/results_openai_gpt4o_*.jsonl

# Guardrail analysis including false positives
spikee results analyze --result-file <attack_run.jsonl> --false-positive-checks <benign_run.jsonl>

# Convert to Excel
spikee results convert-to-excel --result-file results/results_*.jsonl

Check the docs/ folder in the GitHub repository for detailed guides.

Watch the full video tutorial playlist or read the detailed guide on our labs site:

📖 Spikee Tutorial on WithSecure Labs

Dataset

Spikee uses seed folders (e.g., datasets/seeds-cybersec-2025-04) containing base_documents.jsonl, jailbreaks.jsonl, and instructions.jsonl to generate test cases. It can also incorporate standalone_attacks.jsonl.

Key datasets provided/supported (v0.2):

seeds-cybersec-2025-04: Updated general cybersecurity-focused attacks.
seeds-targeted-2024-12: Older dataset for cybersecurity exploits used in v0.1 benchmarks.
seeds-sysmsg-extraction-2025-04: Focused on system prompt leakage.
seeds-wildguardmix-harmful: Harmful content generation (requires fetching data).
seeds-investment-advice: Topical guardrail testing (finance).
seeds-empty: Template for standalone attack datasets.
seeds-llm-mailbox: Example for a specific application test case.

See the Custom Dataset Generation Guide for using LLMs to create tailored datasets.

Example Generation Command & Statistics (targeted-12-2024)

  spikee generate --standalone_attacks datasets/seeds-targeted-2024-12/standalone_attacks.jsonl \
  --seed-folder datasets/seeds-targeted-2024-12 \
  --spotlighting-data-markers $'\nDOCUMENT\n',$'\n\nDOCUMENT\n\n'

Example Statistics (No System Message Subset - 956 entries):

Task Type

Task Type       Count
------------- -------
summarization     476
qna               476
None                4

Jailbreak Type

Jailbreak Type      Count
----------------  -------
new-instructions       56
sorry                  28
dan                   112
ignore                 84
test                  140
errors                 56
debug                  56
dev                    84
emergency              28
no-limits              56
experimental           56
hidden-function        28
academic               28
new-task               84
challenge              28
training               28
None                    4

Instruction Type

Instruction Type       Count
-------------------  -------
data-exfil-markdown      136
xss                      272
encoding                 272
translation              136
long-output              136
None                       4

Evasion Plugins

Plugins apply static transformations to payloads during dataset generation (spikee generate --plugins ...) to test resilience against common evasion techniques. They respect exclude_from_transformations_regex defined in datasets and can return multiple variations per payload. Dynamic, iterative evasions are handled separately by Attack Scripts (spikee test --attack ...).

Includes the 1337 plugin (leetspeak, Ref: [22]) and caesar plugin (Caesar cipher). Tests basic character substitution evasions.

Includes plugins for various encodings: ascii_smuggler (invisible Unicode tags, Ref: [21]), base64, hex, morse. Tests resilience against non-standard text representations.

Includes plugins that generate multiple variations per input: splat (asterisk/spacing noise), best_of_n (random scrambling/casing based on [23]), anti_spotlighting (delimiter bypass attempts), prompt_decomposition_* (structural prompt changes).

See the documentation or use spikee list plugins for a full list.

Caveats

Interpreting Spikee results requires understanding its limitations.

Spikee generates datasets based on known patterns and static plugins. While v0.2 adds dynamic attack scripts (using --attack), these currently implement primarily iterative, heuristic-based strategies (like random mutations or pattern sequences). They may not replicate sophisticated adaptive attacks from research (e.g., gradient-based, embedding space optimization [11-14, 23]) that tune specifically to a target model's weaknesses. A low Attack Success Rate (ASR) indicates resilience against the tested patterns and strategies but doesn't guarantee immunity to all forms of prompt injection, which remains a systemic challenge.

Future Developments

We plan to evolve Spikee based on community feedback and emerging research. Key areas include:

Pentester Tool Integration

Developing extensions for tools like BurpSuite and ZAP Proxy to integrate Spikee tests into standard web application security workflows.

Vision Attacks

Enabling Spikee to perform prompt injection attacks via images against multimodal models.

Advanced Judges & Attacks

Improving the Judge system (v0.2 feature, including LLM-based judges) and developing more sophisticated dynamic attack strategies.

Expanding Libraries

Continuously adding new jailbreaks, instructions, plugins, and attack techniques based on research and real-world findings. Contributions welcome!