Trust Assessment
azure-ai-evaluation-py received a trust score of 65/100, placing it in the Caution category. Users should review its security findings before deployment.
SkillShield's automated analysis identified 2 findings: 1 critical, 1 high, 0 medium, and 0 low severity. Key findings: Prompt Injection via Untrusted Data File (critical) and Arbitrary File Read via User-Controlled Data Path (high).
The analysis covered 4 layers: Manifest Analysis, Static Code Analysis, Dependency Graph, and LLM Behavioral Safety. The LLM Behavioral Safety layer scored lowest at 55/100, indicating areas for improvement.
Last analyzed on February 13, 2026 (commit 13146e6a). SkillShield performs automated 4-layer security analysis on AI skills and MCP servers.
Security Findings (2)
| Severity | Finding | Layer | Location |
|---|---|---|---|
| CRITICAL | **Prompt Injection via Untrusted Data File.** `scripts/run_batch_evaluation.py` accepts a user-controlled JSONL file path via the `--data` command-line argument. The file's contents (the `query`, `context`, and `response` columns defined in `column_mapping`) are fed directly into AI-assisted evaluators (e.g. `GroundednessEvaluator`, `RelevanceEvaluator`), which are initialized with a `model_config` containing Azure OpenAI API keys. An attacker supplying a malicious JSONL file can inject instructions into the prompts sent to the underlying Azure OpenAI model, enabling LLM manipulation, unintended actions, data exfiltration through the LLM, or generation of harmful content. Remediation: sanitize and validate all data fields (`query`, `context`, `response`) fed into LLM prompts; consider a separate, less-privileged LLM for initial content moderation; apply strict output filtering to LLM responses; for critical applications, consider a two-LLM architecture in which one LLM sanitizes inputs for the other. | LLM | `scripts/run_batch_evaluation.py:200` |
| HIGH | **Arbitrary File Read via User-Controlled Data Path.** `scripts/run_batch_evaluation.py` passes the user-controlled `--data` path directly to `azure.ai.evaluation.evaluate`, which reads the contents of the specified file. An attacker could supply a path to a sensitive file (e.g. `/etc/passwd`, `/app/secrets/api_key.json`); if the evaluation results are subsequently logged or written out (e.g. via the `--output` argument), that data could be exfiltrated. Remediation: strictly validate and sanitize the `data` path, restrict reads to a designated, sandboxed input directory, and ensure `evaluate` (or its wrapper) has safeguards against reading files outside the allowed scope. | LLM | `scripts/run_batch_evaluation.py:190` |
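The prompt-injection remediation above can be sketched as a pre-screening step. The helper below is a hypothetical illustration (not part of azure-ai-evaluation or the skill's scripts): it flags rows whose `query`, `context`, or `response` fields contain common injection markers before the row reaches an AI-assisted evaluator. Pattern matching alone is not a complete defense against prompt injection; treat it as one layer alongside output filtering.

```python
import re

# Hypothetical pre-screening helper; the pattern list is illustrative,
# not exhaustive. Real deployments should combine this with output
# filtering and, ideally, a dedicated moderation model.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"system prompt", re.I),
    re.compile(r"</?(system|assistant|tool)>", re.I),
]

def screen_row(row: dict, fields=("query", "context", "response")) -> list:
    """Return the names of fields that look like injection attempts."""
    flagged = []
    for field in fields:
        text = str(row.get(field, ""))
        if any(p.search(text) for p in INJECTION_PATTERNS):
            flagged.append(field)
    return flagged
```

A batch runner could call `screen_row` on each JSONL record and quarantine (or log) flagged rows instead of passing them to the evaluators.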
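The arbitrary-file-read remediation can be sketched the same way. `resolve_data_path` below is a hypothetical wrapper (assuming Python 3.9+ for `Path.is_relative_to`) that confines the `--data` argument to a designated input directory and rejects paths that escape it, before the path is ever handed to `evaluate`.

```python
from pathlib import Path

def resolve_data_path(user_path: str, allowed_dir: str = "data") -> Path:
    """Resolve a user-supplied --data path against an allowed directory.

    Rejects absolute paths and traversal sequences (e.g. ../../etc/passwd)
    that would escape the sandbox, and accepts only .jsonl files.
    Requires Python 3.9+ for Path.is_relative_to.
    """
    base = Path(allowed_dir).resolve()
    # Joining an absolute user_path replaces `base` entirely, so the
    # is_relative_to check below catches that case too.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"data path escapes {base}: {user_path}")
    if candidate.suffix != ".jsonl":
        raise ValueError("only .jsonl data files are accepted")
    return candidate
```

The validated path, not the raw argument, would then be passed on to `azure.ai.evaluation.evaluate`.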
Last analyzed on February 13, 2026 (commit 13146e6a).