benchmarks: add --llm-backend ollama for non-Anthropic rerank
The rerank pipeline was hardcoded to Anthropic's /v1/messages.
Add a backend flag so the same code path can be exercised with
any OpenAI-compatible endpoint — local Ollama, Ollama Cloud,
or any gateway that speaks /v1/chat/completions.
Enables independent verification of the "100% with Haiku rerank"
claim by running the full benchmark with a different LLM family
(e.g. minimax-m2.7:cloud) and zero Anthropic dependency.
Both longmemeval_bench.py and locomo_bench.py:
- llm_rerank*() gain backend= / base_url= kwargs
- CLI: --llm-backend {anthropic,ollama}, --llm-base-url
- API key required only when backend=anthropic (diary/palace modes still require it)
- Parse last integer in response (reasoning models emit multi-int output)
- Fallback to message.reasoning when content is empty
- Raise max_tokens to 1024 for reasoning models
This commit is contained in:
@@ -1239,7 +1239,7 @@ dev = [
|
||||
[package.metadata]
|
||||
requires-dist = [
|
||||
{ name = "autocorrect", marker = "extra == 'spellcheck'", specifier = ">=2.0" },
|
||||
{ name = "chromadb", specifier = ">=0.5.0,<0.7" },
|
||||
{ name = "chromadb", specifier = ">=0.5.0" },
|
||||
{ name = "psutil", marker = "extra == 'dev'", specifier = ">=5.9" },
|
||||
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=7.0" },
|
||||
{ name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0" },
|
||||
|
||||
Reference in New Issue
Block a user