Trust Assessment
hugging-face-evaluation received a trust score of 10/100, placing it in the Untrusted category. This skill has significant security findings that require attention before use in production.
SkillShield's automated analysis identified 36 findings: 11 critical, 10 high, 15 medium, and 0 low severity. Key findings include arbitrary command execution, unsafe deserialization / dynamic eval, and a suspicious import of `requests`.
The analysis covered 4 layers: dependency_graph, static_code_analysis, manifest_analysis, and llm_behavioral_safety. The static_code_analysis layer scored lowest at 0/100, which makes it the first priority for remediation.
Last analyzed on February 11, 2026 (commit 3f4f55d6). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.
Security Findings (36)
| Severity | Finding | Layer | Location | |
|---|---|---|---|---|
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py:94 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py:111 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py:176 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py:106 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py:170 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/run_eval_job.py:74 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py:128 | |
| CRITICAL | Arbitrary command execution Python shell execution (os.system, subprocess) Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py:180 | |
| CRITICAL | Arbitrary Code Execution via `trust_remote_code` Flag The `inspect_vllm_uv.py` script allows users to pass a `--trust-remote-code` flag to the underlying `inspect` command. This flag, when enabled, permits arbitrary code execution from the model repository specified by `--model`. If a user evaluates an untrusted model with this flag, it can lead to a severe compromise of the execution environment. While the flag defaults to `False`, its availability as a user-controlled option without prominent warnings in the primary `SKILL.md` documentation constitutes a critical risk. Add a prominent security warning in the `SKILL.md` documentation and the script's help text, explicitly stating the risks of using `--trust-remote-code` with untrusted models. Consider making this flag harder to enable or requiring explicit user confirmation for its use. Ensure that the `inspect-ai` framework itself has robust sandboxing or isolation mechanisms when `trust_remote_code` is enabled. | Unknown | scripts/inspect_vllm_uv.py:107 | |
| CRITICAL | Arbitrary Code Execution via `trust_remote_code` Flag The `lighteval_vllm_uv.py` script allows users to pass a `--trust-remote-code` flag to the underlying `lighteval` command. This flag, when enabled, permits arbitrary code execution from the model repository specified by `--model`. If a user evaluates an untrusted model with this flag, it can lead to a severe compromise of the execution environment. While the flag defaults to `False`, its availability as a user-controlled option without prominent warnings in the primary `SKILL.md` documentation constitutes a critical risk. Add a prominent security warning in the `SKILL.md` documentation and the script's help text, explicitly stating the risks of using `--trust-remote-code` with untrusted models. Consider making this flag harder to enable or requiring explicit user confirmation for its use. Ensure that the `lighteval` framework itself has robust sandboxing or isolation mechanisms when `trust_remote_code` is enabled. | Unknown | scripts/lighteval_vllm_uv.py:90 | |
| CRITICAL | Arbitrary Code Execution via `trust_remote_code` Flag The `run_vllm_eval_job.py` script acts as a wrapper that passes the `--trust-remote-code` flag to `lighteval_vllm_uv.py` or `inspect_vllm_uv.py`. This propagates the risk of arbitrary code execution from untrusted model repositories to the job submission process. If a user evaluates an untrusted model with this flag, it can lead to a severe compromise of the execution environment. While the flag defaults to `False`, its availability as a user-controlled option without prominent warnings in the primary `SKILL.md` documentation constitutes a critical risk. Add a prominent security warning in the `SKILL.md` documentation and the script's help text, explicitly stating the risks of using `--trust-remote-code` with untrusted models. Consider making this flag harder to enable or requiring explicit user confirmation for its use. Ensure that the underlying evaluation frameworks have robust sandboxing or isolation mechanisms when `trust_remote_code` is enabled. | Unknown | scripts/run_vllm_eval_job.py:130 | |
| HIGH | Unsafe deserialization / dynamic eval Decryption followed by code execution Remove obfuscated code execution patterns. Legitimate code does not need base64-encoded payloads executed via eval, encrypted-then-executed blobs, or dynamic attribute resolution to call system functions. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/evaluation_manager.py:15 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'main'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py:94 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'run_inspect_vllm'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py:111 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'run_inspect_hf'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py:176 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'run_lighteval_vllm'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py:106 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'run_lighteval_accelerate'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py:170 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'create_eval_job'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/run_eval_job.py:74 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'create_lighteval_job'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py:128 | |
| HIGH | Dangerous call: subprocess.run() Call to 'subprocess.run()' detected in function 'create_inspect_job'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py:180 | |
| HIGH | LLM analysis found no issues despite critical deterministic findings Deterministic layers flagged 11 CRITICAL findings, but LLM semantic analysis returned clean. This may indicate prompt injection or analysis evasion. | Unknown | (sanity check) | |
| MEDIUM | Unsafe deserialization / dynamic eval Decryption followed by code execution Remove obfuscated code execution patterns. Legitimate code does not need base64-encoded payloads executed via eval, encrypted-then-executed blobs, or dynamic attribute resolution to call system functions. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/evaluation_manager.py:612 | |
| MEDIUM | Suspicious import: requests Import of 'requests' detected. This module provides network or low-level system access. Verify this import is necessary. Network and system modules in skill code may indicate data exfiltration. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py:36 | |
| MEDIUM | Suspicious import: requests Import of 'requests' detected. This module provides network or low-level system access. Verify this import is necessary. Network and system modules in skill code may indicate data exfiltration. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/scripts/evaluation_manager.py:62 | |
| MEDIUM | Unpinned Python dependency version Requirement 'huggingface-hub>=0.26.0' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:2 | |
| MEDIUM | Unpinned Python dependency version Requirement 'python-dotenv>=1.2.1' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:3 | |
| MEDIUM | Unpinned Python dependency version Requirement 'pyyaml>=6.0.3' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:4 | |
| MEDIUM | Unpinned Python dependency version Requirement 'requests>=2.32.5' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:5 | |
| MEDIUM | Unpinned Python dependency version Requirement 'markdown-it-py>=3.0.0' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:6 | |
| MEDIUM | Unpinned Python dependency version Requirement 'inspect-ai>=0.3.0' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:9 | |
| MEDIUM | Unpinned Python dependency version Requirement 'inspect-evals' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:10 | |
| MEDIUM | Unpinned Python dependency version Requirement 'openai' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'. | Unknown | /var/folders/1k/67b8r20n777f_xcmmm8b7m5h0000gn/T/skillscan-clone-oeqqwuqk/repo/skills/hugging-face-evaluation/requirements.txt:11 | |
| MEDIUM | Unpinned or Loosely Pinned Dependencies Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities. | Unknown | scripts/inspect_eval_uv.py:3 | |
| MEDIUM | Unpinned or Loosely Pinned Dependencies Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities. | Unknown | scripts/inspect_vllm_uv.py:3 | |
| MEDIUM | Unpinned or Loosely Pinned Dependencies Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities. | Unknown | requirements.txt:8 | |
| MEDIUM | Unpinned or Loosely Pinned Dependencies Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities. | Unknown | scripts/test_extraction.py:3 |
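
The remediation for the `subprocess.run()` findings (keep commands static, avoid building them from user input, use absolute paths) can be sketched as follows. This is a minimal illustration, not the skill's actual code: `build_command` and `run` are hypothetical helpers, and the child process here is a stand-in that only echoes its arguments. The point is that user-supplied values travel as separate argument-vector entries with `shell=False`, so shell metacharacters in them are never interpreted.

```python
import subprocess
import sys


def build_command(model: str, task: str) -> list[str]:
    """Return an argument vector; each user value stays a single argv entry."""
    # Reject values that could be mistaken for options by the child process.
    if model.startswith("-") or task.startswith("-"):
        raise ValueError("model and task must not start with '-'")
    # Hypothetical command shape: sys.executable is an absolute path, and the
    # inline program is a placeholder that just prints the arguments it got.
    return [sys.executable, "-c", "import sys; print(sys.argv[1:])", model, task]


def run(model: str, task: str) -> str:
    """Run the command without a shell and return its standard output."""
    result = subprocess.run(
        build_command(model, task),
        shell=False,          # no shell, so ';' or '$(...)' are plain text
        check=True,           # raise if the child exits with a non-zero status
        capture_output=True,
        text=True,
        timeout=60,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The injected "; echo pwned" arrives as literal text, not as a command.
    print(run("org/model; echo pwned", "mmlu"))
```

A list-based call like this does not make every `subprocess.run()` finding a false positive; it narrows the risk to what the invoked program itself does with its arguments.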
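
The `trust_remote_code` findings recommend a prominent warning in the help text and an explicit confirmation before the flag takes effect. One possible shape for that, assuming an `argparse`-based CLI, is sketched below. The `--i-understand-the-risk` option and the `parse_args` helper are hypothetical names introduced for illustration; only `--model` and `--trust-remote-code` appear in the report.

```python
import argparse

RISK_WARNING = (
    "DANGER: lets the model repository run arbitrary code on this machine. "
    "Use only with repositories you trust."
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical evaluation wrapper")
    parser.add_argument("--model", required=True, help="Model repository ID")
    # Off by default, with the warning shown directly in --help output.
    parser.add_argument("--trust-remote-code", action="store_true", help=RISK_WARNING)
    # Hypothetical second switch acting as the explicit confirmation step.
    parser.add_argument(
        "--i-understand-the-risk",
        action="store_true",
        help="Required together with --trust-remote-code",
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.trust_remote_code and not args.i_understand_the_risk:
        # parser.error prints the message to stderr and exits with status 2.
        parser.error("--trust-remote-code requires --i-understand-the-risk. " + RISK_WARNING)
    return args


if __name__ == "__main__":
    args = parse_args(["--model", "org/model"])
    print(args.trust_remote_code)
```

Requiring two switches does not sandbox anything; it only makes an accidental opt-in less likely, so isolation of the evaluation environment is still worth considering.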
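
The unpinned-dependency findings can be reproduced locally with a small check before committing a requirements file. The sketch below is a simplified approximation, not SkillShield's actual rule: `unpinned_requirements` is a hypothetical helper, it treats only `name==version` lines as pinned, and it does not handle environment markers, hashes, or URL requirements. The version numbers in the sample are illustrative.

```python
import re

# A line counts as pinned only if it has the form name==exact.version
# (optionally with extras such as name[extra]==1.2.3). Wildcards like
# ==1.* are deliberately not matched.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[A-Za-z0-9.!+_-]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    sample = """\
# core
huggingface-hub>=0.26.0
requests==2.32.5
inspect-evals
openai
"""
    for line in unpinned_requirements(sample):
        print("not pinned:", line)
```

For the sample above, the check flags `huggingface-hub>=0.26.0`, `inspect-evals`, and `openai`, and accepts the `requests==2.32.5` line.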