Security Audit

hugging-face-evaluation

github.com/huggingface/skills

AI Skill, commit 3f4f55d6264b

CRITICAL

Scanned 8 days ago

Critical: Immediate action required

High: Priority fixes suggested

Medium: Best practices review

Low: Acknowledged / Tracked

Trust Assessment

hugging-face-evaluation received a trust score of 10/100, placing it in the Untrusted category. This skill has significant security findings that require attention before use in production.

SkillShield's automated analysis identified 36 findings: 11 critical, 10 high, 15 medium, and 0 low severity. Key findings include "Arbitrary command execution", "Unsafe deserialization / dynamic eval", and "Suspicious import: requests".

The analysis covered 4 layers: dependency_graph, static_code_analysis, manifest_analysis, and llm_behavioral_safety. The static_code_analysis layer scored lowest, at 0/100.

Last analyzed on February 11, 2026 (commit 3f4f55d6). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.
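The "Arbitrary command execution" and `--trust-remote-code` findings both concern how a wrapper script hands arguments to an external command. The sketch below is one possible way to follow the report's recommendations: build the command as a static argument list (no shell), validate the model identifier, and refuse the `--trust-remote-code` flag unless the caller has confirmed it explicitly. It is not taken from the audited scripts. The function names, the `eval-cli` executable, the model-id pattern, and the `confirmed` parameter are all assumptions made for illustration.

```python
import re
import subprocess
from typing import List

# Assumed shape of a Hub-style model id ("name" or "org/name"); adjust as needed.
# Requiring an alphanumeric first character also rejects values that look like options.
MODEL_ID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(/[A-Za-z0-9][A-Za-z0-9._-]*)?$")


def build_command(executable: str, model: str,
                  trust_remote_code: bool = False,
                  confirmed: bool = False) -> List[str]:
    """Build an argument list for a hypothetical evaluation CLI."""
    if not MODEL_ID.fullmatch(model):
        raise ValueError(f"unexpected model id: {model!r}")
    cmd = [executable, "--model", model]
    if trust_remote_code:
        # The flag lets code from the model repository run locally,
        # so refuse it unless the caller confirmed explicitly.
        if not confirmed:
            raise PermissionError("--trust-remote-code requires explicit confirmation")
        cmd.append("--trust-remote-code")
    return cmd


def run(cmd: List[str]) -> int:
    """Run the command without a shell and return its exit code."""
    # A list argument with the default shell=False is not parsed by a shell,
    # so metacharacters in arguments are passed through literally.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # "/usr/bin/eval-cli" is a placeholder path, not a real tool.
    print(build_command("/usr/bin/eval-cli", "org/model"))
```

Keeping command construction separate from execution makes the argument list easy to inspect and test without launching anything.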

Layer Breakdown

Manifest Analysis

Static Code Analysis: 0%

Dependency Graph: 44%

LLM Behavioral Safety: 85%

Behavioral Risk Signals

Network Access: 3 findings

Shell Execution: 21 findings

Dynamic Code: 15 findings

Excessive Permissions: 2 findings

Security Findings (36)

Severity	Finding	Layer	Location
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/inspect_eval_uv.py:94
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/inspect_vllm_uv.py:111
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/inspect_vllm_uv.py:176
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/lighteval_vllm_uv.py:106
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/lighteval_vllm_uv.py:170
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/run_eval_job.py:74
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/run_vllm_eval_job.py:128
CRITICAL	Arbitrary command execution. Python shell execution (os.system, subprocess). Review all shell execution calls. Ensure commands are static (not built from user input), use absolute paths, and are strictly necessary. Prefer library APIs over shell commands.	Unknown	scripts/run_vllm_eval_job.py:180
CRITICAL	Arbitrary Code Execution via `trust_remote_code` Flag. The `inspect_vllm_uv.py` script allows users to pass a `--trust-remote-code` flag to the underlying `inspect` command. This flag, when enabled, permits arbitrary code execution from the model repository specified by `--model`. If a user evaluates an untrusted model with this flag, it can lead to a severe compromise of the execution environment. While the flag defaults to `False`, its availability as a user-controlled option without prominent warnings in the primary `SKILL.md` documentation constitutes a critical risk. Recommendation: add a prominent security warning in the `SKILL.md` documentation and the script's help text, explicitly stating the risks of using `--trust-remote-code` with untrusted models. Consider making this flag harder to enable or requiring explicit user confirmation for its use. Ensure that the `inspect-ai` framework itself has robust sandboxing or isolation mechanisms when `trust_remote_code` is enabled.	Unknown	scripts/inspect_vllm_uv.py:107
CRITICAL	Arbitrary Code Execution via `trust_remote_code` Flag. The `lighteval_vllm_uv.py` script allows users to pass a `--trust-remote-code` flag to the underlying `lighteval` command. This flag, when enabled, permits arbitrary code execution from the model repository specified by `--model`. If a user evaluates an untrusted model with this flag, it can lead to a severe compromise of the execution environment. While the flag defaults to `False`, its availability as a user-controlled option without prominent warnings in the primary `SKILL.md` documentation constitutes a critical risk. Recommendation: add a prominent security warning in the `SKILL.md` documentation and the script's help text, explicitly stating the risks of using `--trust-remote-code` with untrusted models. Consider making this flag harder to enable or requiring explicit user confirmation for its use. Ensure that the `lighteval` framework itself has robust sandboxing or isolation mechanisms when `trust_remote_code` is enabled.	Unknown	scripts/lighteval_vllm_uv.py:90
CRITICAL	Arbitrary Code Execution via `trust_remote_code` Flag. The `run_vllm_eval_job.py` script acts as a wrapper that passes the `--trust-remote-code` flag to `lighteval_vllm_uv.py` or `inspect_vllm_uv.py`. This propagates the risk of arbitrary code execution from untrusted model repositories to the job submission process. If a user evaluates an untrusted model with this flag, it can lead to a severe compromise of the execution environment. While the flag defaults to `False`, its availability as a user-controlled option without prominent warnings in the primary `SKILL.md` documentation constitutes a critical risk. Recommendation: add a prominent security warning in the `SKILL.md` documentation and the script's help text, explicitly stating the risks of using `--trust-remote-code` with untrusted models. Consider making this flag harder to enable or requiring explicit user confirmation for its use. Ensure that the underlying evaluation frameworks have robust sandboxing or isolation mechanisms when `trust_remote_code` is enabled.	Unknown	scripts/run_vllm_eval_job.py:130
HIGH	Unsafe deserialization / dynamic eval. Decryption followed by code execution. Remove obfuscated code execution patterns. Legitimate code does not need base64-encoded payloads executed via eval, encrypted-then-executed blobs, or dynamic attribute resolution to call system functions.	Unknown	scripts/evaluation_manager.py:15
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'main'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/inspect_eval_uv.py:94
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'run_inspect_vllm'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/inspect_vllm_uv.py:111
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'run_inspect_hf'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/inspect_vllm_uv.py:176
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'run_lighteval_vllm'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/lighteval_vllm_uv.py:106
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'run_lighteval_accelerate'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/lighteval_vllm_uv.py:170
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'create_eval_job'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/run_eval_job.py:74
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'create_lighteval_job'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/run_vllm_eval_job.py:128
HIGH	Dangerous call: subprocess.run(). Call to 'subprocess.run()' detected in function 'create_inspect_job'. This can execute arbitrary code. Avoid using dangerous functions like exec/eval/os.system. Use safer alternatives.	Unknown	scripts/run_vllm_eval_job.py:180
HIGH	LLM analysis found no issues despite critical deterministic findings. Deterministic layers flagged 11 CRITICAL findings, but LLM semantic analysis returned clean. This may indicate prompt injection or analysis evasion.	Unknown	(sanity check)
MEDIUM	Unsafe deserialization / dynamic eval. Decryption followed by code execution. Remove obfuscated code execution patterns. Legitimate code does not need base64-encoded payloads executed via eval, encrypted-then-executed blobs, or dynamic attribute resolution to call system functions.	Unknown	scripts/evaluation_manager.py:612
MEDIUM	Suspicious import: requests. Import of 'requests' detected. This module provides network or low-level system access. Verify this import is necessary. Network and system modules in skill code may indicate data exfiltration.	Unknown	examples/artificial_analysis_to_hub.py:36
MEDIUM	Suspicious import: requests. Import of 'requests' detected. This module provides network or low-level system access. Verify this import is necessary. Network and system modules in skill code may indicate data exfiltration.	Unknown	scripts/evaluation_manager.py:62
MEDIUM	Unpinned Python dependency version. Requirement 'huggingface-hub>=0.26.0' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:2
MEDIUM	Unpinned Python dependency version. Requirement 'python-dotenv>=1.2.1' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:3
MEDIUM	Unpinned Python dependency version. Requirement 'pyyaml>=6.0.3' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:4
MEDIUM	Unpinned Python dependency version. Requirement 'requests>=2.32.5' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:5
MEDIUM	Unpinned Python dependency version. Requirement 'markdown-it-py>=3.0.0' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:6
MEDIUM	Unpinned Python dependency version. Requirement 'inspect-ai>=0.3.0' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:9
MEDIUM	Unpinned Python dependency version. Requirement 'inspect-evals' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:10
MEDIUM	Unpinned Python dependency version. Requirement 'openai' is not pinned to an exact version. Pin Python dependencies with '==<exact version>'.	Unknown	requirements.txt:11
MEDIUM	Unpinned or Loosely Pinned Dependencies. Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Recommendation: pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities.	Unknown	scripts/inspect_eval_uv.py:3
MEDIUM	Unpinned or Loosely Pinned Dependencies. Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Recommendation: pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities.	Unknown	scripts/inspect_vllm_uv.py:3
MEDIUM	Unpinned or Loosely Pinned Dependencies. Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Recommendation: pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities.	Unknown	requirements.txt:8
MEDIUM	Unpinned or Loosely Pinned Dependencies. Several dependencies are either unpinned (no version specified) or loosely pinned (only minimum version specified, e.g., `>=0.26.0`). This can lead to non-deterministic builds, introduce breaking changes, or pull in vulnerable versions of libraries without explicit review. Specifically, `inspect-evals`, `openai`, and `pyyaml` are unpinned in some contexts, while many others use `>=` operators. Recommendation: pin all dependencies to exact versions (e.g., `package==1.2.3`) or use a strict range (e.g., `package~=1.2.3`) to ensure deterministic and secure builds. Regularly update and review dependencies for known vulnerabilities.	Unknown	scripts/test_extraction.py:3
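
The unpinned-dependency findings in the table can be reproduced locally with a small check. The sketch below is not SkillShield's rule, only an approximation of it under a simple assumption: a requirement counts as pinned when it has the form `name==version` with no wildcard. The function name `unpinned_requirements` and the sample text are illustrative, and the sample is not the skill's actual `requirements.txt`.

```python
import re
from typing import List, Tuple

# A line counts as pinned if it is "name==version" (optional extras and
# environment marker allowed) and the version contains no wildcard.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s*;,]+\s*(;.*)?$")


def unpinned_requirements(text: str) -> List[Tuple[int, str]]:
    """Return (line number, requirement) pairs that are not exactly pinned."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Drop comments; this simple split would also truncate URLs with fragments.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            findings.append((lineno, line))
    return findings


if __name__ == "__main__":
    sample = "# core\nhuggingface-hub>=0.26.0\nrequests==2.32.5\nopenai\n"
    for lineno, req in unpinned_requirements(sample):
        print(f"line {lineno}: {req} is not pinned to an exact version")
```

A check like this could run in CI so that loosely pinned entries are caught before a scan flags them.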
