Implement import OCR using PromptAPI with embedded Gemini Nano · beancount/fava · Discussion #2087

mescanne
Jul 31, 2025

There is an emerging standard for making GenAI available to JavaScript in browsers. See https://github.com/webmachinelearning/prompt-api and https://developer.chrome.com/docs/ai/built-in. It is only available in Chrome today, but support has been announced for other browsers, such as Microsoft Edge. (See https://learn.microsoft.com/en-us/microsoft-edge/web-platform/prompt-api)

It is currently limited to extensions and to machines that meet the hardware requirements (https://developer.chrome.com/docs/ai/prompt-api#hardware-requirements), but I believe this includes any M-series Mac.

The API also accepts image input, and from what I've heard, Gemini Nano is quite capable of OCR.

This opens up the possibility of OCR'ing PDF-based statements directly in the browser.
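
To make the idea concrete, here is a rough sketch of what one OCR call could look like. It is not a proposed Fava implementation. The `LanguageModel` global, the `expectedInputs` option, and the message shape follow my reading of the Prompt API explainer linked above; the API is experimental and may change. `buildOcrPrompt` and `ocrPage` are hypothetical names, the prompt wording is made up, and rendering each PDF page to an image first (for example with a library such as pdf.js) is assumed and not shown.

```typescript
// Sketch only: the Prompt API is experimental, so treat every shape below as an assumption.

type PromptPart =
  | { type: "text"; value: string }
  | { type: "image"; value: object }; // e.g. a Blob, ImageBitmap, or canvas element

interface PromptMessage {
  role: "user";
  content: PromptPart[];
}

// Hypothetical helper: builds a multimodal prompt asking for a plain transcription.
function buildOcrPrompt(pageImage: object): PromptMessage[] {
  return [
    {
      role: "user",
      content: [
        {
          type: "text",
          value: "Transcribe all text in this statement page. Output plain text only.",
        },
        { type: "image", value: pageImage },
      ],
    },
  ];
}

// Hypothetical helper: returns the transcription, or null when no
// on-device model is ready (unsupported browser, or model not yet downloaded).
async function ocrPage(pageImage: object): Promise<string | null> {
  const LM = (globalThis as any).LanguageModel;
  if (!LM) return null; // Prompt API not exposed in this environment

  const options = { expectedInputs: [{ type: "image" }] };
  // For simplicity this sketch does not trigger a model download.
  if ((await LM.availability(options)) !== "available") return null;

  const session = await LM.create(options);
  try {
    return await session.prompt(buildOcrPrompt(pageImage));
  } finally {
    session.destroy(); // free the on-device model resources
  }
}
```

The importer would then call something like `ocrPage(pageBlob)` once per rendered page and fall back to the existing import flow whenever it returns `null`.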