Hi Glean team 👋
Can you share where multi-modality is on the Glean roadmap?
Multi-modal inputs (like images) have been supported in ChatGPT for over two years, and many users (myself included) are now accustomed to pasting screenshots or images directly into LLM applications. In ChatGPT, I suspect this habit partly compensates for its lack of integration with other systems, but even in Glean I frequently want to paste a screenshot when no proper export option is available. This could even be a recently modified, Glean-indexed document that hasn't been re-synced yet. The inability to process images in Glean Chat/Agents feels surprising, given that the underlying models are multi-modal and designed to handle this kind of unstructured data natively.
Beyond convenience, multi-modal outputs would unlock a wide range of valuable use cases, such as generating marketing content or technical diagrams.
I've noticed a number of related posts in the community, like this and this, so this seems to be a common need. Could you share your vision and plans for enabling multi-modal support in Glean Chat/Agents?