Hermes Agent fails to transmit images, so don't doubt the clipboard. The official Vision documentation is straightforward: images are sent to the model as a base64 content block, so the model itself must support visual input. If the current master model is a plain text model, or if you point the auxiliary vision to an endpoint that does not support images, it will fail.
How to check first
- Confirm that the image has been successfully pasted, and you can see the attachment prompt in the interface.
- Change to a provider test that confirms that vision is supported.
- Check if
auxiliary.vision.provider,model,base_urlare misconfigured. - If you use
provider: main, the master model must also be a multimodal model.
The official configuration document also reminds that Codex OAuth supports vision in the auto-detection chain; Custom endpoints must be guaranteed to be compatible.
In a word: the failure of image transmission is mostly not "the picture is not pasted", but the auxiliary vision model does not support the image.
Official open source address: https://github.com/NousResearch/hermes-agent; Official document entry: https://hermes-agent.nousresearch.com/.