It seems like with the current progress in ML models, doing OCR should be an easy task. After all, recognizing handwritten numbers was one of the prime benchmarks for image recognition (MNIST was released in 1994).

Yet when I try to OCR any of my handwritten notes, all I ever get is a jumbled mess of nonsense. Am I missing something? Is my handwriting really that atrocious, or is it the models?

Here’s a quick example, a random passage from a scientific article:

I tried EasyOCR, Tesseract, PPOCR and a few online tools. Only PPOCR was able to correctly identify the numbers and the words “J.” and “Chem.”. The rest is just a random mess of characters.

  • Atomic@sh.itjust.works
    9 hours ago

    You seriously need to work on your handwriting. I’m impressed OCR can make out anything at all from that.

    This isn’t an OCR problem. This is a you problem. I’m human and I can only make out a few words.

    Edit: Assuming it’s yours. Or is this from the scientific article? Regardless, whoever wrote that needs to go back to third grade and redo their writing exercises.