Security Audit

pdf-to-structured

github.com/openclaw/skills

AI Skill

Commit 13146e6a3d46

CAUTION

Scanned 5 months ago

Critical: Immediate action required

High: Priority fixes suggested

Medium: Best practices review

Low: Acknowledged / Tracked

Trust Assessment

pdf-to-structured received a trust score of 62/100, placing it in the Caution category. This skill has some security considerations that users should review before deployment.

SkillShield's automated analysis identified 4 findings: 0 critical, 2 high, 2 medium, and 0 low severity. Key findings include "Missing required field: name", "Unpinned Python Dependencies", and "Potential Command Injection via Tesseract Language Parameter".

The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, LLM Behavioral Safety. The LLM Behavioral Safety layer scored lowest at 63/100, indicating areas for improvement.

Last analyzed on February 13, 2026 (commit 13146e6a). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.

Layer Breakdown

Manifest Analysis: 100%

Static Code Analysis: 93%

Dependency Graph: 100%

LLM Behavioral Safety: 63%

Behavioral Risk Signals

Network Access: 1 finding

Filesystem Write: 1 finding

Shell Execution: 2 findings

Dynamic Code: 1 finding

Excessive Permissions: 1 finding

Security Findings (4)

Severity	Finding	Layer	Location
HIGH	Unpinned Python Dependencies: The skill's installation instructions specify Python packages without pinning their versions (e.g., `pip install pdfplumber pandas openpyxl`). This introduces a significant supply chain risk. An attacker could publish a malicious version of one of these packages, or compromise a legitimate package, leading to arbitrary code execution when the skill is installed or updated. Without pinned versions, the skill's behavior can change unexpectedly, and security vulnerabilities in newer versions of dependencies could be automatically introduced. Recommendation: Pin all Python dependencies to specific versions (e.g., `pdfplumber==0.12.0`). Use a `requirements.txt` file with exact versions or a dependency management tool like Poetry or PDM.	LLM	SKILL.md:40
HIGH	Potential Command Injection via Tesseract Language Parameter: The `ocr_scanned_pdf` function passes the `language` argument directly to `pytesseract.image_to_string(image, lang=language)`. `pytesseract` is a wrapper around the external Tesseract OCR executable. If the `language` parameter is derived from untrusted user input, an attacker could inject shell commands (e.g., `lang='eng; rm -rf /'`) if `pytesseract` or the underlying `tesseract` command-line parsing does not properly escape or sanitize the input. This risk is amplified by the unpinned `pytesseract` dependency, as older versions may be more vulnerable. Recommendation: Sanitize or validate the `language` parameter to ensure it only contains valid language codes. Consider using a whitelist of allowed language codes. Ensure `pytesseract` is updated to a version that mitigates known command injection vulnerabilities (e.g., >=0.3.10) and pin its version.	LLM	SKILL.md:134
MEDIUM	Missing required field: name: The 'name' field is required for claude_code skills but is missing from frontmatter. Recommendation: Add a 'name' field to the SKILL.md frontmatter.	Static	skills/datadrivenconstruction/pdf-to-structured/SKILL.md:1
MEDIUM	Excessive File System Access with Untrusted Paths: Several functions, such as `extract_tables_from_pdf`, `ocr_scanned_pdf`, and `batch_extract_tables`, accept file paths (`pdf_path`, `folder_path`, `output_folder`) as arguments and perform read/write operations based on these paths. If these path arguments are supplied by untrusted user input without proper validation (e.g., path traversal checks), an attacker could specify arbitrary file system locations. This could lead to data exfiltration (reading sensitive files like `/etc/passwd`), data corruption (overwriting critical system files), or denial of service (filling up disk space in arbitrary locations). The `batch_extract_tables` function is particularly concerning as it iterates over an entire `folder_path` and writes to an `output_folder`. Recommendation: Implement strict input validation and sanitization for all file path arguments. Restrict file operations to a designated, sandboxed directory. Use path normalization and canonicalization to prevent path traversal attacks (e.g., disallow `../`). If possible, pass file content directly instead of paths.	LLM	SKILL.md:199
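
The recommendation to pin dependencies can be checked mechanically. The following is a minimal sketch, not part of the skill or of SkillShield: the function name `unpinned_requirements` is hypothetical, the version numbers in the usage note are illustrative placeholders, and the pattern accepts only the simple `name==version` form (it does not handle hashes, URLs, or every specifier that pip accepts).

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
# Assumption: this covers only the simple exact-pin form, not every pip syntax.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned with an exact '==' version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers before checking.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, `unpinned_requirements("pdfplumber\npandas==2.2.0\nopenpyxl>=3.1\n")` flags `pdfplumber` and `openpyxl>=3.1` and accepts the exact pin.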
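
For the Tesseract language finding, the suggested validation could look like the sketch below. It assumes the caller validates `language` before passing it to `ocr_scanned_pdf`. The names `ALLOWED_LANGUAGES` and `validate_language` are hypothetical, and the set of codes is only an example that should be adjusted to the language packs actually installed.

```python
# Hypothetical allowlist of Tesseract language codes; adjust to the installed packs.
ALLOWED_LANGUAGES = frozenset({"eng", "deu", "fra", "spa"})


def validate_language(language: str) -> str:
    """Return `language` unchanged if every '+'-separated code is allowlisted.

    Tesseract accepts combined languages such as "eng+deu", so each part is
    checked separately. Anything else raises ValueError.
    """
    if not isinstance(language, str) or not language:
        raise ValueError("language must be a non-empty string")
    for code in language.split("+"):
        if code not in ALLOWED_LANGUAGES:
            raise ValueError(f"unsupported OCR language: {code!r}")
    return language
```

Because the check is exact set membership, a value such as `'eng; rm -rf /'` is rejected before it reaches `pytesseract`, regardless of how the wrapper builds the Tesseract command line.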
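
The path traversal finding recommends restricting file operations to a designated directory. One way to sketch that, assuming a caller-chosen base directory and a hypothetical helper named `resolve_within` (neither appears in the skill), is to canonicalize the path and confirm it stays under the base. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_within(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` against `base_dir` and reject anything that escapes it.

    Both paths are canonicalized with resolve(), so "../" segments and
    symlinks are taken into account before the containment check.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards `base`; the check below catches that.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes the allowed directory: {user_path!r}")
    return candidate
```

A caller would pass the returned path, rather than the raw input, to functions such as `extract_tables_from_pdf` or `batch_extract_tables`. This narrows the exposure but does not replace OS-level sandboxing.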

Embed Code

[![SkillShield](https://skillshield.io/api/v1/badge/3a9f04baeff05cc9.svg)](https://skillshield.io/report/3a9f04baeff05cc9)