Project: Conversion from TeXmacs format to HTML and docx

For Students

  • Expected Code Percentage
    • Scheme Code >= 80%
    • C++ Code <= 20%
  • Selection Criteria
    1. One pull request to add C++ unit tests to the Mogan repo
    2. One pull request to add Scheme unit tests to the Mogan repo
    3. Students must implement the pandoc binary plugin and the docx data plugin by themselves to prove that they can complete the project

Project Info

  • Name: Conversion from TeXmacs format to HTML and docx
  • Mentor: @darcy
  • Difficulty: Basic

Project Description

Exporting TeXmacs documents to docx is a feature frequently requested by TeXmacs users. We can use pandoc to achieve it.

In this project, we need to implement the conversion chain TM format -> HTML format -> DOCX format. The TM->HTML conversion has already been implemented, and the HTML->DOCX step can be handled by pandoc. Since the DOCX output is produced from the HTML, you need to improve the quality of the TM->HTML conversion in order to improve the quality of TM->DOCX.
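As a rough illustration of the HTML->DOCX step only, here is a minimal Python sketch that shells out to pandoc. It is not the Mogan plugin code (that would be written in Scheme); the helper names `build_pandoc_command` and `html_to_docx` are hypothetical, and the sketch assumes `pandoc` is on the PATH. The only pandoc options used are `-f`, `-t` and `-o`.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pandoc_command(html_path: str, docx_path: str) -> List[str]:
    """Build the pandoc command line for an HTML -> DOCX conversion.

    Hypothetical helper for illustration; assumes pandoc is on the PATH.
    """
    return ["pandoc", "-f", "html", "-t", "docx", "-o", docx_path, html_path]


def html_to_docx(html_path: str, docx_path: str) -> None:
    """Convert an existing HTML file to DOCX by invoking pandoc."""
    if not Path(html_path).is_file():
        raise FileNotFoundError(html_path)
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(html_path, docx_path), check=True)
```

Keeping the command construction separate from the call makes it easy to unit test without having pandoc installed.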

The documents in Xmacs Planet can serve as test resources. CI/CD should be set up for the Xmacs Planet.

Project Notes

Project

  • Implement a pandoc binary plugin (make it work on Linux/Windows/macOS)
    • (find-binary-pandoc)
    • (version-binary-pandoc)
    • (has-binary-pandoc)
  • Implement a docx data plugin
    • define conversion from HTML to DOCX
    • File->Export->DOCX
  • Improve TM2HTML conversion
    • Minimal reproducer and unit tests
  • CI/CD for the planet

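To make the intent of the three binary plugin functions concrete, here is a minimal Python sketch of the logic they might implement. This is only an assumption about their behavior based on their names; the real plugin would be written in Scheme against Mogan's own binary plugin conventions, and `parse_pandoc_version` is a hypothetical helper. The version string in the comments is just an example.

```python
import shutil
import subprocess
from typing import Optional


def find_binary_pandoc() -> Optional[str]:
    """Return the full path of the pandoc executable, or None if not found."""
    # shutil.which searches PATH and honors PATHEXT on Windows (pandoc.exe).
    return shutil.which("pandoc")


def has_binary_pandoc() -> bool:
    """Report whether a pandoc executable can be found."""
    return find_binary_pandoc() is not None


def parse_pandoc_version(output: str) -> str:
    """Take the last token of the first line, e.g. "pandoc 3.1.9" gives "3.1.9"."""
    lines = output.strip().splitlines()
    if not lines:
        return ""
    tokens = lines[0].split()
    return tokens[-1] if len(tokens) >= 2 else ""


def version_binary_pandoc() -> str:
    """Return the installed pandoc version, or "" if pandoc is missing."""
    path = find_binary_pandoc()
    if path is None:
        return ""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_pandoc_version(result.stdout)
```

Relying on a PATH lookup is the simplest cross-platform approach; a real plugin would probably also check typical install locations on each platform.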
Project Technical Requirements

  • Scheme, xmake, C++
  • Understanding the data plugin in Mogan
  • Understanding the binary plugin in Mogan

Project Repository

https://mogan.app/guide/OSPP_2024_HTML.html

The project has now been submitted to https://mogan.app.

@darcy, what if I told you that there is a better tool than pandoc? Would you consider it, or is pandoc a firm decision?

Before @darcy answers, here is my opinion. The benefit of going through Pandoc is that it makes many other formats available; moreover, if TeXmacs gets listed on the Pandoc webpage (https://pandoc.org/), that would be good, automatic advertisement for TeXmacs.

pandoc is not the core part of this project. The student just needs to invoke pandoc to convert HTML to docx. Please tell me if there are tools that are better than pandoc for HTML2DOCX conversion.

@darcy For exporting to docx from Mogan, I believe Wordinator would be the ideal tool. It is straightforward, it works by converting HTML to docx (so Mogan's HTML export can be used), and it offers customization options. Additionally, it can convert MathML to oMath, the math format used by Word, which is essential for Mogan. Please note that it requires Java. I hope you find the information in this paper useful: https://www.balisage.net/Proceedings/vol25/print/Kimber01/BalisageVol25-Kimber01.html The paper was written before math conversion was implemented; the code is here: https://github.com/drmacro/wordinator

Update: this link is more pleasant to read: https://www.balisage.net/Proceedings/vol25/html/Kimber01/BalisageVol25-Kimber01.html

Additionally, for docx import, I can suggest two or three options that, in my opinion, are the most practical (not from a programming standpoint, but in terms of the features covered).
Over the past two years I have researched this extensively online for my own use case, and I have a good overview of the available options. I hope this is acceptable.

As for Pandoc, its limitation lies in its generality and lack of support for certain key features, such as nested tables, both in import and export.