Ask HN: OCR framework for extracting formatted text?

133 points by crocodiletears 1 day ago | 42 comments
I’m a serial information hoarder, and I often take screenshots to save comments, passages, and fragments of conversations I find useful or insightful. This works well when I want to reference something recent, but it obviously doesn’t scale. I’d like to integrate these into my personal archive, but I don’t know of any frameworks (preferably for Go, Node, or Python) that could automatically extract the text from the images while retaining its formatting. I’m not against doing some image preprocessing myself, but I’m not comfortable passing the images to a third-party service, since some of them contain private or sensitive information that I can’t readily sort out of my collection.
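As a rough illustration of the "retain formatting, stay local" requirement, here is a minimal sketch that assumes the open-source Tesseract engine (which runs entirely offline) does the recognition. Tesseract can emit TSV output in which every recognized word carries page, block, paragraph, and line numbers; the hypothetical helper `tsv_to_text` below uses those numbers to restore line and paragraph breaks. It recovers only that coarse layout, not fonts, bold text, or indentation, and the `min_conf` cutoff is an illustrative parameter, not a Tesseract setting.

```python
import csv
import io


def tsv_to_text(tsv: str, min_conf: float = 0.0) -> str:
    """Rebuild line and paragraph breaks from Tesseract-style TSV word rows.

    Assumes the standard TSV columns: level, page_num, block_num, par_num,
    line_num, word_num, left, top, width, height, conf, text.
    """
    # QUOTE_NONE so quote characters inside recognized text are kept literally.
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)

    # (page, block, paragraph) -> {line_num: [(word_num, text), ...]}
    # Plain dicts keep insertion order, i.e. the reading order of the TSV.
    paragraphs: dict[tuple[int, int, int], dict[int, list[tuple[int, str]]]] = {}

    for row in reader:
        if row["level"] != "5":  # level 5 rows are individual words
            continue
        text = (row.get("text") or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue  # skip empty words and low-confidence guesses
        par_key = (int(row["page_num"]), int(row["block_num"]), int(row["par_num"]))
        line_key = int(row["line_num"])
        paragraphs.setdefault(par_key, {}).setdefault(line_key, []).append(
            (int(row["word_num"]), text)
        )

    rendered_paragraphs = []
    for lines in paragraphs.values():
        rendered_lines = []
        for words in lines.values():
            words.sort()  # order words within a line by word_num
            rendered_lines.append(" ".join(w for _, w in words))
        # Lines within a paragraph are separated by single newlines.
        rendered_paragraphs.append("\n".join(rendered_lines))
    # Paragraphs are separated by a blank line.
    return "\n\n".join(rendered_paragraphs)
```

The TSV would typically come from running `tesseract screenshot.png stdout tsv` on the command line or from pytesseract's `image_to_data`, which returns the same TSV as a string by default; both keep the images on your own machine.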