I'm building an agent to extract the name, company, and title from photos of badges that people took after attending an event. I've set up a basic agent that (1) reads the image file and (2) responds with those three fields. However, it can't actually read the files, so it fails at the first step.
I'd love help from anyone who has already built image-to-text agents.
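
For context, here is a minimal sketch of how I understand the two steps. It assumes the image has to be sent to a vision-capable model as base64-encoded bytes rather than as a file path, and that the model is asked to reply in JSON. The function names `image_to_data_url` and `parse_badge_reply` are hypothetical, and the model call itself is left out because it depends on the provider.

```python
import base64
import json
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Step 1: read the image bytes and encode them as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def parse_badge_reply(reply: str) -> dict:
    """Step 2: keep only the three expected fields from the model's JSON reply."""
    data = json.loads(reply)
    # Missing fields come back as None instead of raising, so unreadable
    # badges can be flagged for manual review.
    return {key: data.get(key) for key in ("name", "company", "title")}
```

The idea is that the data URL from step 1 goes into the image part of the model request, and the text the model returns goes through `parse_badge_reply`.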