Security Audit
Jamkris/everything-gemini-code:skills/agent-eval
github.com/Jamkris/everything-gemini-code
Trust Assessment
Jamkris/everything-gemini-code:skills/agent-eval received a trust score of 35/100, placing it in the Untrusted category. This skill has significant security findings that require attention before use in production.
SkillShield's automated analysis identified 3 findings: 1 critical, 2 high, 0 medium, and 0 low severity. The findings are Arbitrary Command Execution via Task Definitions, Potential Data Exfiltration via Malicious Commands or Grep Patterns, and Broad Filesystem and Execution Permissions.
The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, LLM Behavioral Safety. The LLM Behavioral Safety layer scored lowest at 40/100, indicating areas for improvement.
Last analyzed on March 30, 2026 (commit 6c6f43aa). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.
Security Findings (3)
| Severity | Finding | Description | Recommendation | Layer | Location |
|---|---|---|---|---|---|
| CRITICAL | Arbitrary Command Execution via Task Definitions | The skill's core functionality, as described in the untrusted content, involves executing arbitrary shell commands defined within YAML task files (e.g., `judge.command`, `judge.grep`). The manifest explicitly grants `Bash` tool access. If an attacker can supply or modify these task definitions, they can inject and execute arbitrary commands on the host system, leading to full system compromise. | Implement strict validation and sanitization of all user-provided command strings and file paths within task definitions. Consider sandboxing or containerization for command execution. Ensure that task definitions are only sourced from trusted origins and are immutable after review. | LLM | SKILL.md:48 |
| HIGH | Potential Data Exfiltration via Malicious Commands or Grep Patterns | The skill allows for the execution of arbitrary commands and `grep` operations as part of its judging mechanism. A malicious actor, by crafting untrusted task definitions, could use these capabilities to read sensitive files (e.g., `/etc/passwd`, environment variables, configuration files) and potentially exfiltrate their contents. The `Read` and `Grep` tools are explicitly granted in the manifest, facilitating this risk. | As with command injection, validate and sanitize all inputs to `command` and `grep` types. Restrict file access to only necessary directories and prevent access to sensitive system paths. Implement robust logging to detect unusual file access patterns. | LLM | SKILL.md:60 |
| HIGH | Broad Filesystem and Execution Permissions | The skill's manifest declares broad permissions including `Read, Write, Edit, Bash, Grep, Glob`. While these permissions are described as necessary for its intended function (evaluating agents by interacting with codebases and running tests), they significantly increase the attack surface. When combined with the command injection capabilities, these extensive permissions allow for widespread system compromise, including arbitrary file modification and data access. | Review and minimize required permissions to the absolute minimum necessary for the skill's operation. If `Bash` is essential, ensure all commands executed through it are strictly controlled, sanitized, and ideally run within a highly restricted sandbox environment with limited network and filesystem access. | LLM | Manifest |
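
The validation and path-restriction advice in the findings can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: the function names `parse_judge_command` and `confine_path`, the `ALLOWED_EXECUTABLES` set, and the choice of rejected characters are all hypothetical, and the report does not show how the skill really parses `judge.command` values.

```python
import shlex
from pathlib import Path

# Hypothetical allowlist; the real set depends on what the task files must run.
ALLOWED_EXECUTABLES = {"pytest", "npm", "go"}

# Characters that let a single string chain, substitute, or redirect
# commands when it is handed to a shell.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def parse_judge_command(command: str) -> list[str]:
    """Return an argv list for `command`, or raise ValueError if it is rejected."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError("shell metacharacters are not permitted")
    argv = shlex.split(command)  # also raises ValueError on unbalanced quotes
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError("executable is not on the allowlist")
    return argv


def confine_path(workspace: Path, candidate: str) -> Path:
    """Resolve `candidate` under `workspace`, rejecting anything that escapes it."""
    root = workspace.resolve()
    resolved = (root / candidate).resolve()
    try:
        resolved.relative_to(root)
    except ValueError:
        raise ValueError("path escapes the workspace") from None
    return resolved
```

A caller would pass the returned argv list to `subprocess.run` without `shell=True`, so the string is never interpreted by a shell. An allowlist of this kind only narrows the input; an allowed tool such as a test runner can still execute arbitrary project code, so it complements the sandboxing recommended in the findings and does not replace it.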