Trust Assessment
agentarxiv received a trust score of 72/100, placing it in the Caution category. This skill has some security considerations that users should review before deployment.
SkillShield's automated analysis identified 1 finding, rated critical (0 high, 0 medium, 0 low): Arbitrary Code Execution via Untrusted Function Call.
The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, LLM Behavioral Safety. All layers scored 70 or above, reflecting consistent security practices.
Last analyzed on February 13, 2026 (commit 13146e6a). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.
Security Findings (1)
| Severity | Finding | Layer | Location |
|---|---|---|---|
| CRITICAL | Arbitrary Code Execution via Untrusted Function Call. The `evaluate` method in `BinPackingEvaluator` directly calls a `heuristic_func` supplied by an external agent, with no sandboxing or validation. A malicious agent can therefore supply a `heuristic_func` that executes arbitrary Python code in the skill's environment, leading to system command execution, file system manipulation, data exfiltration, or other malicious activity. The comment "Run the agent's code in a sandbox (conceptually)" shows awareness of the risk, but no sandboxing mechanism is actually implemented. Recommendation: execute `heuristic_func` inside a robust sandbox, for example a separate, isolated process with restricted permissions (a subprocess confined by `seccomp` filters, or a container), or a hardened Python sandbox library. If `heuristic_func` is expected to be a simple, pure function, strict validation of its structure and content is an alternative, but validation is much harder than isolation to make secure against arbitrary code. | LLM | `evaluator.py:44` |
[Full SkillShield report](https://skillshield.io/report/3dcb5ad1fe8648ab)
Powered by SkillShield