Security Audit
evaluating-code-models
github.com/davila7/claude-code-templates

Trust Assessment
evaluating-code-models received a trust score of 59/100, placing it in the Caution category. This skill has some security considerations that users should review before deployment.
SkillShield's automated analysis identified 4 findings: 1 critical, 0 high, 1 medium, and 2 low severity. Key findings include direct execution of untrusted code on the host system via `--allow_code_execution` and `--trust_remote_code`, network egress to untrusted endpoints, and covert behavior / concealment directives.
The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, and LLM Behavioral Safety. The LLM Behavioral Safety layer scored lowest, at 68/100.
Last analyzed on February 12, 2026 (commit 458b1186). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.
Security Findings (4)
| Severity | Finding | Layer | Location |
|---|---|---|---|
| CRITICAL | **Direct execution of untrusted code on the host system via `--allow_code_execution` and `--trust_remote_code`.** The skill instructs the user to run `accelerate launch main.py` with the `--allow_code_execution` flag, which executes code generated by the evaluated model directly on the host. Because the generated code is untrusted, a malicious model could generate and run arbitrary commands. The `--trust_remote_code` flag, used for custom/private models, additionally allows loading and executing arbitrary Python code from a remote Hugging Face model repository or local path; a malicious or compromised model would run that code on the host. Docker is suggested as a mitigation for some workflows, but it is not enforced everywhere these flags are used. *Remediation:* always execute untrusted code (model generations or remote model code) in an isolated, sandboxed environment (e.g., a dedicated Docker container, VM, or secure sandbox service) with minimal permissions; avoid `--trust_remote_code` unless the source is verified and trusted, and fully sandbox the environment when it is necessary. | LLM | SKILL.md:60 |
| MEDIUM | **Network egress to untrusted endpoints.** HTTP request to a raw IP address. *Remediation:* review all outbound network calls; remove connections to webhook collectors, paste sites, and raw IP addresses. Legitimate API calls should use well-known service domains. | Manifest | cli-tool/components/mcps/devtools/figma-dev-mode.json:4 |
| LOW | **Covert behavior / concealment directives.** Multiple zero-width characters (stealth text). *Remediation:* remove hidden instructions, zero-width characters, and bidirectional overrides; skill instructions should be fully visible and transparent to users. | Manifest | cli-tool/components/mcps/devtools/jfrog.json:4 |
| LOW | **Unpinned `bigcode-evaluation-harness` dependency.** The dependency is not pinned to a specific version in the manifest, so `pip install` fetches the latest release, which could introduce unexpected behavior, breaking changes, or malicious code if the upstream repository is compromised. Other dependencies use `>=` version bounds, but the primary package being unpinned is the higher risk. *Remediation:* pin `bigcode-evaluation-harness` to a specific, known-good version (e.g., `bigcode-evaluation-harness==X.Y.Z`) to ensure deterministic builds and reduce supply-chain risk. | LLM | SKILL.md:1 |
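The sandboxing recommended for the critical finding can be sketched as a small wrapper that refuses to pass the dangerous flags outside a container. This is a minimal illustration, not part of the harness or of SkillShield; the Docker image tag and model/task names are placeholder assumptions.

```python
import shlex
import subprocess  # would be used to actually run the command


def build_harness_command(model: str, task: str, sandboxed: bool) -> list[str]:
    """Build an `accelerate launch main.py` invocation for the
    bigcode-evaluation-harness, enabling code execution only in Docker."""
    cmd = ["accelerate", "launch", "main.py",
           "--model", model, "--tasks", task]
    if sandboxed:
        # Execution of model-generated code is only allowed inside the
        # container, with networking disabled to limit egress.
        cmd.append("--allow_code_execution")
        # "evaluation-harness:latest" is a hypothetical image tag; build it
        # from the harness's own Dockerfile.
        return ["docker", "run", "--rm", "--network", "none",
                "evaluation-harness:latest"] + cmd
    # Without a sandbox, the dangerous flag is simply never added.
    return cmd


if __name__ == "__main__":
    cmd = build_harness_command("bigcode/santacoder", "humaneval",
                                sandboxed=True)
    print(shlex.join(cmd))
```

The key design point is that `--allow_code_execution` is appended only on the sandboxed path, so a caller cannot accidentally enable host-side execution of model-generated code.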
Embed Code
[View the full report](https://skillshield.io/report/b5626b3cf1a9aff2)
Powered by SkillShield