Security Audit

agentarxiv

github.com/openclaw/skills

AI Skill

Commit 13146e6a3d46

CAUTION

Scanned 4 months ago

Severity Legend

Critical: Immediate action required
High: Priority fixes suggested
Medium: Best practices review
Low: Acknowledged / Tracked

Trust Assessment

agentarxiv received a trust score of 72/100, placing it in the Caution category. This skill has some security considerations that users should review before deployment.

SkillShield's automated analysis identified 1 finding, of critical severity (0 high, 0 medium, 0 low): Arbitrary Code Execution via Untrusted Function Call.

The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, and LLM Behavioral Safety. Every layer scored 70% or above. The lowest score, 70%, came from LLM Behavioral Safety, which is also the layer that reported the critical finding.

Last analyzed on February 13, 2026 (commit 13146e6a). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.

Layer Breakdown

Manifest Analysis: 100%
Static Code Analysis: 100%
Dependency Graph: 100%
LLM Behavioral Safety: 70%

Behavioral Risk Signals

Network Access: 1 finding
Filesystem Write: 1 finding
Shell Execution: 1 finding
Dynamic Code: 1 finding

Security Findings (1)

Severity: CRITICAL
Finding: Arbitrary Code Execution via Untrusted Function Call
Layer: LLM
Location: evaluator.py:44

The `evaluate` method in `BinPackingEvaluator` calls a `heuristic_func` supplied by an external agent directly, without any sandboxing or validation. A malicious agent can therefore supply a `heuristic_func` that executes arbitrary Python code inside the skill's environment, which could lead to system command execution, file system manipulation, data exfiltration, or other malicious activity. The code comment 'Run the agent's code in a sandbox (conceptually)' shows that the risk is known, but no sandboxing mechanism is actually implemented.

Recommendation: Execute `heuristic_func` inside a robust sandbox. Options include running the function in a separate, isolated process with restricted permissions (e.g., using `subprocess` with `seccomp`, or containerization), or using a secure Python sandbox library. If `heuristic_func` is expected to be a simple, pure function, strict validation of its structure and content is an alternative, although validation alone is much harder to make robust against arbitrary code.
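The following is a minimal sketch of the first recommendation, running the untrusted call in a separate, limited process. It assumes, hypothetically, that the agent submits its heuristic as source code defining `heuristic_func`, and that the arguments and return value are JSON-serializable. `run_untrusted`, `_CHILD`, and `_limit_child` are illustrative names, not part of the skill. The child interpreter starts in Python's isolated mode (`-I`) with an empty environment. It gets a 2-second CPU limit via the POSIX-only `resource` module, and the parent enforces a wall-clock timeout. This is process isolation only. The child still runs as the same user and can read files or open network connections, so `seccomp` filters or a container are still required for a real sandbox.

```python
import json
import resource
import subprocess
import sys

# Hypothetical protocol: the child reads {"source": ..., "args": [...]} from
# stdin, defines the agent's code in a fresh namespace, calls
# heuristic_func(*args) once, and writes the result to stdout as JSON.
_CHILD = (
    "import json, sys\n"
    "payload = json.load(sys.stdin)\n"
    "namespace = {}\n"
    "exec(payload['source'], namespace)\n"
    "result = namespace['heuristic_func'](*payload['args'])\n"
    "json.dump(result, sys.stdout)\n"
)


def _limit_child() -> None:
    # Runs in the forked child just before exec: cap CPU time at 2 seconds
    # (soft and hard limit). POSIX only.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))


def run_untrusted(source: str, args: list, timeout: float = 5.0):
    """Call heuristic_func(*args) defined in `source` in a separate process.

    Raises RuntimeError if the child raises, crashes, or exceeds the timeout.
    NOTE: this limits CPU and wall-clock time and keeps the evaluator's own
    memory out of reach; it does NOT block file or network access.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", _CHILD],  # -I: isolated mode
            input=json.dumps({"source": source, "args": args}),
            capture_output=True,
            text=True,
            timeout=timeout,          # parent kills the child after this
            env={},                   # do not pass the parent's environment
            preexec_fn=_limit_child,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError("heuristic timed out") from exc
    if proc.returncode != 0:
        # Keep only the tail of the child's traceback for the error message.
        raise RuntimeError(f"heuristic failed: {proc.stderr.strip()[-200:]}")
    return json.loads(proc.stdout)
```

Because the agent's code never runs in the evaluator's own interpreter, an exception, an infinite loop, or tampering with global state in `heuristic_func` surfaces as a `RuntimeError` in the evaluator instead of crashing or corrupting it.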
