* fix: allow Unicode in sanitize_name() for Latvian, CJK, Cyrillic names (#637)

  _SAFE_NAME_RE was ASCII-only ([a-zA-Z0-9]), rejecting valid Unicode names like "Jānis" or "太郎". Changed to \w, which matches Unicode word characters (letters, digits, underscore) in Python 3.

* fix: tighten Unicode regex, add sanitize_name tests

  Use [^\W_] for the first and last characters to allow Unicode letters and digits but reject leading or trailing underscores (Copilot feedback). Add 7 tests covering Latvian, CJK, Cyrillic, path traversal, and edge cases.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+1 -1
@@ -16,7 +16,7 @@ from pathlib import Path
 # in file paths, SQLite, or ChromaDB metadata.
 
 MAX_NAME_LENGTH = 128
-_SAFE_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?$")
+_SAFE_NAME_RE = re.compile(r"^(?:[^\W_]|[^\W_][\w .'-]{0,126}[^\W_])$")
 
 
 def sanitize_name(value: str, field_name: str = "name") -> str:
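A minimal sketch of how the old and new patterns behave on a few inputs. The two regexes are copied from the diff. The helpers is_safe_name and was_safe_name are hypothetical names used only for illustration. The body of sanitize_name is not shown in this diff, so nothing here should be read as its actual implementation.

```python
import re

# New pattern, copied from the added line in the diff.
_SAFE_NAME_RE = re.compile(r"^(?:[^\W_]|[^\W_][\w .'-]{0,126}[^\W_])$")

# Old ASCII-only pattern, copied from the removed line, kept for comparison.
_OLD_SAFE_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?$")


def is_safe_name(value: str) -> bool:
    """Hypothetical helper: True if value matches the new pattern."""
    return _SAFE_NAME_RE.match(value) is not None


def was_safe_name(value: str) -> bool:
    """Hypothetical helper: True if value matched the old ASCII-only pattern."""
    return _OLD_SAFE_NAME_RE.match(value) is not None


if __name__ == "__main__":
    samples = [
        "J\u0101nis",      # Latvian "Jānis", written with the precomposed U+0101
        "太郎",             # CJK
        "Иван",            # Cyrillic
        "_name",           # leading underscore
        "name_",           # trailing underscore
        "../etc/passwd",   # path traversal attempt
        "",                # empty string
    ]
    for sample in samples:
        # Columns: input, old pattern result, new pattern result.
        print(repr(sample), was_safe_name(sample), is_safe_name(sample))
```

The new pattern is also stricter in one respect: the old pattern made its final character class optional, so it accepted a name such as "name_", which the new pattern rejects. The maximum accepted length is unchanged at 1 + 126 + 1 = 128 characters, matching MAX_NAME_LENGTH.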