When the MiniMax Vision error is reported in Hermes Agent, first confirm that you are not using a "text-only" model. The official Vision and Configuration documents emphasize that image analysis must go to a model that supports multimodal, and if you set auxiliary.vision.provider to main, then your main model must also really support image input.
The fastest investigation
- Start by testing a model that confirms that it supports vision, such as a vision model on Codex OAuth or OpenRouter.
- Check if
auxiliary.vision.providerandauxiliary.vision.modelpoint to the wrong model. - If you use a custom endpoint, confirm that it really accepts OpenAI-style image content blocks.
Does MCP's understand_image automatically take over?
Don't understand it that way. Hermes' native vision/browser_vision uses its own auxiliary model configuration, and does not automatically replace the underlying vision model just because an MCP server has understand_image.
In a word: If MiniMax Vision fails, first check whether the auxiliary vision model is really available, instead of just looking at the provider name.
Official open source address: https://github.com/NousResearch/hermes-agent; Official document entry: https://hermes-agent.nousresearch.com/.