Security Audit

llm-document-extraction

github.com/openclaw/skills

AI SkillCommit 13146e6a3d46

CAUTION

Scanned 5 months ago

Critical

Immediate action required

High

Priority fixes suggested

Medium

Best practices review

Low

Acknowledged / Tracked

Trust Assessment

llm-document-extraction received a trust score of 58/100, placing it in the Caution category. This skill has some security considerations that users should review before deployment.

SkillShield's automated analysis identified 8 findings: 2 critical, 3 high, 2 medium, and 0 low severity. Key findings include Missing required field: name, Prompt Injection via Untrusted Document Content (PDF Text), Prompt Injection via User-Controlled Extraction Schema.

The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, LLM Behavioral Safety. The LLM Behavioral Safety layer scored lowest at 0/100, indicating areas for improvement.

Last analyzed on February 13, 2026 (commit 13146e6a). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.

Layer Breakdown

Manifest Analysis

100%

Static Code Analysis

93%

Dependency Graph

100%

LLM Behavioral Safety

Behavioral Risk Signals

Network Access

1 finding

Filesystem Write

3 findings

Shell Execution

2 findings

Dynamic Code

4 findings

Excessive Permissions

1 finding

Security Findings8

Severity	Finding	Layer	Location
CRITICAL	Prompt Injection via Untrusted Document Content (PDF Text) The `extract_from_pdf` function constructs an LLM prompt by directly embedding `text` extracted from a PDF file. If the `pdf_path` points to a malicious document, the extracted text can contain prompt injection instructions, allowing an attacker to manipulate the LLM's behavior, override system instructions, or extract sensitive information from the LLM's context. Implement robust input sanitization and validation for all untrusted content before embedding it into LLM prompts. Consider using LLM-specific prompt templating libraries that offer better isolation, or employ techniques like XML/JSON tagging with strict parsing to delineate untrusted content from instructions. For document content, consider using a separate, sandboxed LLM call to summarize or sanitize before feeding to the main extraction prompt.	LLM	SKILL.md:45
CRITICAL	Prompt Injection via RAG Context and User Question The `query_document` function constructs an LLM prompt by embedding both `context` (derived from untrusted document chunks) and the user-provided `question`. Both inputs are susceptible to prompt injection. Malicious document content or a crafted question could manipulate the LLM's response, leading to unintended actions or information disclosure. Implement robust input sanitization and validation for both the `context` (from document chunks) and the `question`. For RAG systems, consider using techniques like query rewriting, response filtering, or a separate LLM to validate context and question before feeding them to the main LLM. Ensure strict separation of instructions from untrusted data within the prompt.	LLM	SKILL.md:215
HIGH	Prompt Injection via User-Controlled Extraction Schema The `extract_from_pdf` function embeds the user-provided `extraction_schema` directly into the LLM prompt. A malicious schema could contain prompt injection instructions, allowing an attacker to manipulate the LLM's behavior beyond the intended data extraction task. Validate and sanitize the `extraction_schema` to ensure it only contains valid JSON structure and does not include executable code or prompt injection attempts. If possible, use a predefined set of schemas or strictly enforce schema structure to prevent arbitrary instruction injection.	LLM	SKILL.md:49
HIGH	Prompt Injection via User-Controlled Query (Vision Model) The `extract_from_drawing` function directly embeds the user-provided `query` into the LLM prompt for the vision model. A malicious query could contain prompt injection instructions, allowing an attacker to manipulate the LLM's behavior or extract unintended information. Sanitize and validate the `query` string to remove any potential prompt injection attempts. Consider using a predefined set of queries or strictly limiting the complexity and content of user-provided queries.	LLM	SKILL.md:160
HIGH	Excessive Permissions for Local File Access The skill demonstrates reading local files using `pdfplumber.open(pdf_path)`, `open(image_path, 'rb')`, and `PyPDFLoader(pdf_path)`. If the skill is executed in an environment with broad filesystem access, a malicious `pdf_path` or `image_path` could be crafted to read arbitrary files on the system, leading to unauthorized data access or disclosure. Implement strict validation of `pdf_path` and `image_path` to ensure they point only to allowed directories or file types. Run the skill in a sandboxed environment with minimal filesystem permissions, restricting access only to necessary directories. Avoid allowing arbitrary file paths from untrusted input.	LLM	SKILL.md:36
MEDIUM	Missing required field: name The 'name' field is required for claude_code skills but is missing from frontmatter. Add a 'name' field to the SKILL.md frontmatter.	Static	skills/datadrivenconstruction/llm-document-extraction/SKILL.md:1
MEDIUM	Data Exfiltration to External LLM Provider The skill's core functionality involves sending potentially sensitive document content (text from PDFs, base64 encoded image data, RAG document chunks) to an external LLM provider (OpenAI) via `client.chat.completions.create`. This constitutes data exfiltration to a third-party service. Users should be fully aware of the LLM provider's data handling, privacy policies, and security measures, especially when processing confidential or personally identifiable information. Clearly inform users about the data privacy implications of sending documents to external LLM providers. Provide options for using on-premise or private LLMs if sensitive data cannot leave the user's environment. Implement data anonymization or redaction techniques for highly sensitive information before sending it to external services.	LLM	SKILL.md:57
INFO	Unpinned Dependencies in Requirements The `Requirements` section lists Python packages without specific version pinning (e.g., `openai` instead of `openai==1.2.3`). This can introduce supply chain risks, as future updates to these packages might introduce breaking changes, vulnerabilities, or unexpected behavior. While not a direct exploit in the provided code, it's a best practice for reproducible and secure environments. Pin all dependencies to specific versions (e.g., `openai==1.2.3`) to ensure reproducibility and mitigate risks from unexpected updates. Regularly review and update dependencies to incorporate security patches.	LLM	SKILL.md:225

Scan History

Embed Code

[![SkillShield](https://skillshield.io/api/v1/badge/412dea77cda400db.svg)](https://skillshield.io/report/412dea77cda400db)