Security Audit

Jamkris/everything-gemini-code:skills/agent-eval

github.com/Jamkris/everything-gemini-code

AI Skill · Commit 6c6f43aaa3d6

CRITICAL

Scanned 2 months ago

Critical: Immediate action required
High: Priority fixes suggested
Medium: Best practices review
Low: Acknowledged / Tracked

Trust Assessment

Jamkris/everything-gemini-code:skills/agent-eval received a trust score of 35/100, placing it in the Untrusted category. This skill has significant security findings that require attention before use in production.

SkillShield's automated analysis identified 3 findings: 1 critical, 2 high, 0 medium, and 0 low severity. The findings are Arbitrary Command Execution via Task Definitions, Potential Data Exfiltration via Malicious Commands or Grep Patterns, and Broad Filesystem and Execution Permissions.

The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, and LLM Behavioral Safety. The LLM Behavioral Safety layer scored lowest at 40/100, while the other three layers each scored 100/100. All 3 findings were reported by the LLM Behavioral Safety layer.

Last analyzed on March 30, 2026 (commit 6c6f43aa). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.

Layer Breakdown

Layer	Score
Manifest Analysis	100%
Static Code Analysis	100%
Dependency Graph	100%
LLM Behavioral Safety	40%

Behavioral Risk Signals

Signal	Count
Filesystem Write	3 findings
Shell Execution	3 findings
Dynamic Code	2 findings
Excessive Permissions	2 findings

Security Findings (3)

Severity	Finding	Layer	Location
CRITICAL	Arbitrary Command Execution via Task Definitions: The skill's core functionality, as described in the untrusted content, involves executing arbitrary shell commands defined within YAML task files (e.g., `judge.command`, `judge.grep`). The manifest explicitly grants `Bash` tool access. If an attacker can supply or modify these task definitions, they can inject and execute arbitrary commands on the host system, leading to full system compromise. Recommendation: Implement strict validation and sanitization of all user-provided command strings and file paths within task definitions. Consider sandboxing or containerization for command execution. Ensure that task definitions are only sourced from trusted origins and are immutable after review.	LLM	SKILL.md:48
HIGH	Potential Data Exfiltration via Malicious Commands or Grep Patterns: The skill allows for the execution of arbitrary commands and `grep` operations as part of its judging mechanism. A malicious actor, by crafting untrusted task definitions, could use these capabilities to read sensitive files (e.g., `/etc/passwd`, environment variables, configuration files) and potentially exfiltrate their contents. The `Read` and `Grep` tools are explicitly granted in the manifest, facilitating this risk. Recommendation: As with command injection, validate and sanitize all inputs to `command` and `grep` types. Restrict file access to only necessary directories and prevent access to sensitive system paths. Implement robust logging to detect unusual file access patterns.	LLM	SKILL.md:60
HIGH	Broad Filesystem and Execution Permissions: The skill's manifest declares broad permissions including `Read, Write, Edit, Bash, Grep, Glob`. While these permissions are described as necessary for its intended function (evaluating agents by interacting with codebases and running tests), they significantly increase the attack surface. When combined with the command injection capabilities, these extensive permissions allow for widespread system compromise, including arbitrary file modification and data access. Recommendation: Review and minimize required permissions to the absolute minimum necessary for the skill's operation. If `Bash` is essential, ensure all commands executed through it are strictly controlled, sanitized, and ideally run within a highly restricted sandbox environment with limited network and filesystem access.	LLM	Manifest
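
The recommendations above (validate command strings, restrict file access to necessary directories) could be approached along the following lines. This is a minimal Python sketch and not the skill's actual implementation: the allowlist contents and the names `validate_command`, `resolve_inside`, and `TaskValidationError` are hypothetical and used only for illustration. It assumes a task's `judge.command` value has already been read from the YAML file as a string.

```python
import shlex
from pathlib import Path

# Hypothetical allowlist of executables a reviewed task may invoke.
ALLOWED_EXECUTABLES = {"pytest", "npm", "go", "cargo"}

# Characters that enable chaining, substitution, or redirection in a shell.
FORBIDDEN_CHARS = set(";|&`$><\n")


class TaskValidationError(ValueError):
    """Raised when a task definition fails validation."""


def validate_command(command: str) -> list[str]:
    """Return an argv list for an allowlisted command, or raise."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        raise TaskValidationError("shell metacharacter in command")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. an unbalanced quote
        raise TaskValidationError(f"unparseable command: {exc}") from exc
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        raise TaskValidationError("executable is not allowlisted")
    return argv


def resolve_inside(root: Path, candidate: str) -> Path:
    """Resolve candidate under root, rejecting paths that escape it."""
    root = root.resolve()
    resolved = (root / candidate).resolve()
    # Path.is_relative_to requires Python 3.9 or later.
    if not resolved.is_relative_to(root):
        raise TaskValidationError(f"path escapes workspace: {candidate}")
    return resolved
```

The returned argv list is meant to be passed to `subprocess.run` without `shell=True`, so the string is never interpreted by a shell. An allowlisted executable such as `pytest` can still run arbitrary project code, so a check like this complements the sandboxing recommended above and does not replace it.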
