Project: Conversion from TeXmacs format to HTML and docx

For Students

  • Expected Code Percentage
    • Scheme Code >= 80%
    • C++ Code <= 20%
  • Selection Criteria
    1. One pull request to add C++ unit tests to the Mogan repo
    2. One pull request to add Scheme unit tests to the Mogan repo
    3. The student must implement the pandoc binary plugin and the docx data plugin on their own to prove that they can complete the project

Project Info

  • Name: Conversion from TeXmacs format to HTML and docx
  • Mentor: @darcy
  • Difficulty: Basic

Project Description

Exporting TeXmacs documents to docx is a feature that TeXmacs users frequently request. We can use pandoc to achieve it.

In this project, we need to implement the conversion chain TM format -> HTML format -> DOCX format. The TM->HTML conversion has already been implemented, and the HTML->DOCX step can be handled by pandoc. Because the quality of the TM->DOCX result depends on the intermediate HTML, you need to improve the TM->HTML conversion.
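
The HTML->DOCX step amounts to a single pandoc invocation. The following is a minimal Python sketch of that call, not the Mogan implementation (which is expected to be written mostly in Scheme). The function names and file names are hypothetical and chosen only for illustration; `-f`, `-t`, and `-o` are pandoc's standard options for input format, output format, and output file.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(html_path, docx_path, pandoc="pandoc"):
    """Return the argument list for an HTML -> DOCX conversion."""
    return [
        pandoc,
        "-f", "html",          # input format
        "-t", "docx",          # output format
        str(html_path),        # HTML file exported from the TM document
        "-o", str(docx_path),  # DOCX file to write
    ]


def html_to_docx(html_path, docx_path, pandoc="pandoc"):
    """Run pandoc; raises CalledProcessError if pandoc exits non-zero."""
    cmd = build_pandoc_command(Path(html_path), Path(docx_path), pandoc)
    subprocess.run(cmd, check=True)
```

Building the command separately from running it keeps the argument list easy to unit test on machines where pandoc is not installed.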

The documents in Xmacs Planet can serve as test resources. CI/CD should be set up for the Xmacs Planet.

Project Notes

Project

  • Implement a pandoc binary plugin (make it work on Linux/Windows/macOS)
    • (find-binary-pandoc)
    • (version-binary-pandoc)
    • (has-binary-pandoc)
  • Implement a docx data plugin
    • define conversion from HTML to DOCX
    • File->Export->DOCX
  • Improve TM2HTML conversion
    • Minimal reproducer and unit tests
  • CI/CD for the planet
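
The three binary plugin entry points listed above (find, version, has) can be illustrated with a short Python sketch. This is an assumption-laden analogue, not Mogan's Scheme API: the Python function names only mirror the Scheme names in the list, and the `which` parameter is a hypothetical hook added so the lookup can be replaced in tests. `shutil.which` searches PATH and handles the `.exe` suffix on Windows.

```python
import re
import shutil
import subprocess


def find_binary_pandoc(which=shutil.which):
    """Return the path of the pandoc executable, or None if it is not on PATH."""
    return which("pandoc")


def parse_pandoc_version(output):
    """Extract the version string from the first line of `pandoc --version`."""
    lines = output.splitlines()
    first_line = lines[0] if lines else ""
    # Accept both "pandoc 3.1.9" and "pandoc.exe 2.9.2.1".
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", first_line)
    return match.group(1) if match else None


def version_binary_pandoc(which=shutil.which):
    """Return the installed pandoc version string, or None if unavailable."""
    path = find_binary_pandoc(which)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_pandoc_version(result.stdout)


def has_binary_pandoc(which=shutil.which):
    """Return True if a pandoc executable can be found."""
    return find_binary_pandoc(which) is not None
```

A data plugin could check `has_binary_pandoc()` before offering the File->Export->DOCX entry, so the menu item only appears when the conversion can actually run.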

Project Technical Requirements

  • Scheme, xmake, C++
  • Understanding the data plugin in Mogan
  • Understanding the binary plugin in Mogan

Project Repository

https://mogan.app/guide/OSPP_2024_HTML.html

Now, the project is submitted to https://mogan.app.

@darcy what if I told you that there is a better tool than pandoc? Would you consider it, or is pandoc a firm decision?

Before @darcy answers, here is my opinion. The benefit of converting to Pandoc is that it makes many other formats available; moreover, if TeXmacs gets listed on the Pandoc webpage (https://pandoc.org/), it would be good, automatic advertising for TeXmacs.

pandoc is not the core part of this project. The student just needs to invoke pandoc to convert HTML to docx. Please tell me if there are tools better than pandoc for HTML2DOCX conversion.

@darcy For exporting to docx from Mogan, I believe Wordinator would be the ideal tool. It is straightforward, it works by converting HTML to docx [Mogan's HTML export can be used], and it offers customization options. Additionally, it can export MathML to oMath, which is what Word uses and is essential for Mogan. Please note that it requires Java. I hope you find the information in this paper useful: https://www.balisage.net/Proceedings/vol25/print/Kimber01/BalisageVol25-Kimber01.html The paper was written before they implemented math conversion, and the code is here: https://github.com/drmacro/wordinator

Update: this link is more pleasant to read: https://www.balisage.net/Proceedings/vol25/html/Kimber01/BalisageVol25-Kimber01.html

Additionally, for docx import, I can suggest two or three options that, in my opinion, are the most practical [not from a programming standpoint, but in terms of the features covered].
Over the past two years, I have researched this extensively online for my own use case and have developed a comprehensive understanding of the available options. I hope this is acceptable.

As for Pandoc, its limitation lies in its generality and its lack of support for certain key features, such as nested tables, in both import and export.

Here are the TeXmacs notes related to this project:
https://texmacs.github.io/notes/docs/tm-and-html.html

IMO a Pandoc export filter is important because

  • it is a way to export quickly into many formats
  • if the TeXmacs to Pandoc filter is published on the Pandoc webpage, it is good advertising for TeXmacs.

Hello darcy, I’m sorry that I learned about OSPP quite late. This is the only project I applied for.
I would like to ask whether there is still a chance for me to participate in this project. :pleading_face: If not, that's okay; perhaps I will prepare earlier next year~

Here is my sample unit test PR: https://github.com/XmacsLabs/mogan/pull/1889

Next year! Welcome to join us as a Developer!

Thank you for replying, hope to see you next year~

Conversion to docx is available in Mogan v1.2.9.2. And HTML export has been improved a lot.

See https://mogan.app/guide/plugin_data_docx.html

Thanks for the contributions from the student. (I act as a mentor in this project)

I see that a Pandoc export filter exists now. I think it would be very beneficial to Mogan and TeXmacs to have it appear in the list on the Pandoc website.

Hi Rachel, if you want to join the OSPP event, I suggest that you contribute to the Goldfish Scheme project NOW!

I have successfully assigned onboarding tasks to people who are interested in it but do not have a strong Computer Science background. If you major in Computer Science, it will be super easy for you!

Hi, thank you for remembering my interest in joining the OSPP event! :blush: I appreciate the suggestion and will start learning about the Goldfish Scheme project soon~
