Project: Syntax highlighting for programming language

For Students

  • Expected Code Percentage
    • C++ Code == 100%
  • Selection Criteria
    1. One pull request to add C++ unit tests to the Mogan repo
    2. Students must implement the highlighting facility and at least one programming-language parser plugin by themselves, to prove that they can complete the project

Project Info

  • Name: Syntax highlighting for programming language
  • Mentor: @jingkaimori
  • Difficulty: Medium to Extremely hard

Project Description

Syntax highlighting of source code is a feature that TeXmacs users frequently need, but it is currently not implemented properly.

In this project, we need to parse the given code (as text) into an abstract syntax tree (AST) and highlight the code based on that AST. Tree-sitter is selected because it is one of the most commonly used parser libraries for C/C++ and is used by editors such as Neovim. However, the tree-sitter library only accepts a language definition in the form of a pre-compiled C function, so each language definition has to be packaged as a shared library.
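
The shared-library step can be sketched as follows. This is a minimal sketch, assuming a POSIX system (dlopen/dlsym) and the tree-sitter convention that a grammar exports a C function named tree_sitter_<language> returning a const TSLanguage *. The helper name load_language is hypothetical, and TSLanguage is only forward-declared so that the sketch compiles without the tree-sitter headers.

```cpp
#include <dlfcn.h>
#include <string>

// Opaque grammar type. In real code it comes from <tree_sitter/api.h>;
// it is forward-declared here so the sketch has no external dependency.
struct TSLanguage;

// Signature of the entry point a generated grammar is assumed to export.
using language_fn = const TSLanguage* (*)();

// Hypothetical helper: load a grammar from a shared library.
// Returns nullptr if the library or the entry point cannot be found.
const TSLanguage* load_language(const std::string& lib_path,
                                const std::string& lang_name) {
  void* handle = dlopen(lib_path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) return nullptr;

  // Assumed naming convention: tree_sitter_<language>, e.g. tree_sitter_cpp.
  const std::string symbol = "tree_sitter_" + lang_name;
  void* sym = dlsym(handle, symbol.c_str());
  if (sym == nullptr) {
    dlclose(handle);
    return nullptr;
  }

  // The handle is deliberately kept open: the returned language object
  // lives inside the loaded library.
  return reinterpret_cast<language_fn>(sym)();
}
```

With a hypothetical plugin file such as ./libtree-sitter-cpp.so, the result of load_language("./libtree-sitter-cpp.so", "cpp") would then be handed to tree-sitter's ts_parser_set_language. On Windows, the same idea would need the platform's own loader API instead of dlopen.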

Further improvements and new features could be built on the generated AST, but this work is out of scope because of its difficulty and workload.

Test cases should be provided by the student. CI/CD may be hard to set up, so manual testing should be performed.

Project Notes



  • Provide the parser for each language as a plugin.
    • Import the parser generated by tree-sitter as C source code.
  • Enhance the current highlighting mechanism by introducing tree-sitter library features.
    • Parse the given code (as text) into an abstract syntax tree.
    • Highlight the code from the AST, with or without the current highlight_observer.
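
The "highlight from AST" step can be illustrated with a small sketch. The ast_node and highlight_span types below are hypothetical stand-ins, not tree-sitter or TeXmacs types; a real implementation would walk tree-sitter's own nodes and pass the resulting ranges to the TeXmacs highlighting code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a syntax tree node.
struct ast_node {
  std::string type;                // node type, e.g. "number_literal"
  std::size_t start;               // byte offset where the node begins
  std::size_t end;                 // byte offset one past the node's end
  std::vector<ast_node> children;  // empty for leaf tokens
};

// One highlighted range of the source text.
struct highlight_span {
  std::size_t start;
  std::size_t end;
  std::string type;
};

// Depth-first walk that emits one span per leaf node, in source order.
void collect_spans(const ast_node& node, std::vector<highlight_span>& out) {
  assert(node.start <= node.end);  // a node must cover a valid range
  if (node.children.empty()) {
    out.push_back({node.start, node.end, node.type});
    return;
  }
  for (const ast_node& child : node.children) {
    collect_spans(child, out);
  }
}
```

For the text "return 42;", a tree with the leaves "return" at [0, 6), "number_literal" at [7, 9) and ";" at [9, 10) yields three spans, each of which can then be mapped to a color.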

Project Technical Requirements

  • xmake, C++
  • understanding of the shared library loading process
  • understanding of the highlighting mechanism of TeXmacs; notable APIs:
    • attach_highlight
    • highlight_observer
    • observer_rep::set_highlight
    • language
      • language::has_highlight
      • language::highlight
      • language::get_color

See also

I think C++ is still needed in this project, at least to glue in the tree-sitter library.

Remove this line. There are few code snippets in the planet repo.

A typo, should be C++ == 100%

Will you provide the design of the syntax color system?

How can the color scheme be customized for a specific programming language?

I guess the semantic keys in tree-sitter are different from the TeXmacs built-in ones. If the design is left to students, we’d better point that out explicitly.
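
One possible shape for such a mapping, purely as a sketch: keep a per-language table from the parser's semantic keys to TeXmacs colors, and fall back from a dotted key such as "keyword.control" to its prefix "keyword" when no exact entry exists. The function name resolve_color, the table layout, and the color names are assumptions for illustration, not existing TeXmacs or tree-sitter API.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical lookup: resolve a dotted semantic key to a color.
// Tries the full key first, then strips trailing ".segment" parts one
// at a time; returns `fallback` if no prefix has an entry.
std::string resolve_color(const std::map<std::string, std::string>& theme,
                          std::string key,
                          const std::string& fallback) {
  while (true) {
    auto it = theme.find(key);
    if (it != theme.end()) return it->second;

    const std::size_t dot = key.rfind('.');
    if (dot == std::string::npos) return fallback;
    key.erase(dot);  // "keyword.control" becomes "keyword"
  }
}
```

With a table containing only "keyword" and "string", the key "keyword.control.flow" resolves to the color of "keyword", while an unknown key such as "variable" gets the fallback color. A separate table per language would allow per-language customization.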

Tree-sitter should be only one of several possible implementations of semantic editing and syntax highlighting. We should keep the interface and the implementation separate; that is an important principle. In Mogan we prefer mature solutions over offering many choices, but we still need to preserve the interface.
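
To make the separation concrete, here is a minimal sketch under assumed names: syntax_highlighter is a hypothetical abstract interface, and comment_only_highlighter is a deliberately naive backend that only marks "//" line comments. A tree-sitter backend would be another class implementing the same interface. None of these names exist in Mogan or TeXmacs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A highlighted range [start, end) with a semantic kind such as "comment".
struct token_span {
  std::size_t start;
  std::size_t end;
  std::string kind;
};

// Hypothetical interface: callers depend only on this class.
class syntax_highlighter {
public:
  virtual ~syntax_highlighter() = default;
  virtual std::vector<token_span> highlight(const std::string& code) const = 0;
};

// Naive backend for illustration: marks everything from "//" to the end
// of the line as a comment. It ignores string literals, so it is not a
// real C++ lexer.
class comment_only_highlighter : public syntax_highlighter {
public:
  std::vector<token_span> highlight(const std::string& code) const override {
    std::vector<token_span> spans;
    std::size_t pos = 0;
    while ((pos = code.find("//", pos)) != std::string::npos) {
      std::size_t eol = code.find('\n', pos);
      if (eol == std::string::npos) eol = code.size();
      spans.push_back({pos, eol, "comment"});
      pos = eol;  // continue searching after this comment
    }
    return spans;
  }
};
```

Code that renders colors would hold a reference to syntax_highlighter only, so a backend could be swapped without touching the callers.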

Colors are obtained from the language::get_color method, as seen in concater_rep::typeset_prog_string.

A new question: is it necessary to unify the usage of highlight_observer and language::get_color?

I don’t know; let us leave the design to the students.