You will see that I have posted about this before, asking for suggestions on which software I can use to convert PDF to docx/odt.

I am a teacher. During my time as a researcher I wrote a lot of documents, and I regularly draw upon them to teach my students. I often have to take the text, modify it, or build upon it. A lot of my material is bound up in PDFs. Sometimes I have grant applications to write where a previous draft I wrote was stored as a PDF. Converting these files to editable text has become the bane of my life.

I am forced to use online tools because none of the software I have seems to do the trick. A lot of people keep suggesting pandoc, but pandoc does not convert from PDF to any other format. PDF can only be the output format.

Is there a magic open-source solution that I have missed?

  • Botzo@lemmy.world
    6 months ago

    https://pdf2docx.readthedocs.io/ seems to fit the bill. I can’t vouch for it.
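
    A minimal sketch of driving pdf2docx from Python, assuming it has been installed with `pip install pdf2docx`. It uses the library's `Converter` class; `docx_path_for` and `convert_pdf_to_docx` are illustrative helper names, not part of the library. As far as I know pdf2docx works from the PDF's text layer, so scanned documents would presumably need OCR first.

    ```python
    from pathlib import Path


    def docx_path_for(pdf_path):
        """Return the output path: same folder and name, with a .docx suffix."""
        return Path(pdf_path).with_suffix(".docx")


    def convert_pdf_to_docx(pdf_path):
        """Convert one PDF to DOCX next to the original and return the new path."""
        # Imported here so docx_path_for works even without pdf2docx installed.
        from pdf2docx import Converter  # third-party package

        out_path = docx_path_for(pdf_path)
        converter = Converter(str(pdf_path))
        try:
            # With no page arguments, every page is converted.
            converter.convert(str(out_path))
        finally:
            # Always release the underlying PDF handle.
            converter.close()
        return out_path
    ```

    Calling `convert_pdf_to_docx("grant_draft.pdf")` (a placeholder file name) should write `grant_draft.docx` alongside the original. Expect to tidy up the layout by hand afterwards.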

    PDF is such a curse. I say this as a person currently tasked with deploying new, mysteriously complex enterprise PDF conversion software for technical documents. The rabbit hole is so deep.

    • Treczoks@lemmy.world
      6 months ago

      It is not a curse. It does exactly what it is intended to do: create an archive of a document that is universally reproducible.

      It is a very well-designed cul-de-sac for exactly this purpose. Using it for anything else is asking for trouble.