CodeBlockColorizer

This page covers syntax highlighting: colorizing pre sections and inlining attachments through a parser (i.e. colorization). It is based on Taesu Pyo's BaseParser.

Integration in moin--main--1.3

TLA Branch: ograf@bitart.de--2004-local/moin--colorize--1.3
Live Examples: Index
ColorizeTest
FormatTest
InlineTest

TODO (only in the branch):

adapt CSS style names to those of XEmacs font-lock-mode
make some nice CSS for the default themes
add a heading_shift argument to Include?

MERGED (to moin--main--1.3):

backward compatibility for #!python
add CodeBlockColorizer
add Taesu Pyo's BaseParser
change the whole thing to use <span> and CSS to format the code
Languages like Pascal need an ignore-case switch in BaseParser
JavaScript switchable line numbers (default is no numbers)
enable numbers from the beginning, optional start and step numbering parameters
add extra arguments to the #format processing instruction: everything after the first word (the parser module to import) gets passed as the format_args keyword argument to the __init__ method of the Parser class. See FormatTest
extend parsers so they know what files they can handle (for inline:)
finishing touches on CSS
add code_area, code_line and code_token formatter methods to base
change parsers to parse attributes with wikiutil.parseAttributes
- accepted attributes: start=Number, step=Number, numbers=on|off
if numbering is set to off, don't show the numbers initially but give the links to activate them
add new startContent and endContent methods to formatters to get rid of the content div for text_plain
fix text_plain list rendering (here? or in moin--main--1.3?)
added default Parser.extensions handling: just use the string '*' to mark the parser as fallback handler
cache the extension-to-parser mapping in request.cfg (currently this does no caching)
added starshine code_area CSS
fixed cplusplus Preprc parsing and multiline spanning syntax display
added extra parsers attribute value numbers=on|off|disable
- disable does not show any numbers or JS numbering links
make div IDs unique
- fixes MoinMoinBugs/ContentDivProblems
macro improvements
- FootNote (allow wiki markup)
- Include (MoinMoinBugs/IncludeNotCacheAware)
- TableOfContents (MoinMoinBugs/TableOfContentsIgnoresIncludedHeadings)
escaping needs to be done in the formatter
fix content div top/bottom anchors (they are open)
heading IDs for included pages sometimes do not match (not unique for auto-headers)
fix back links of recursive Includes. See MoinMoinBugs/RecursiveIncludeBacktoIsWrong
/!\ recursive or multiple Includes will screw up a TableOfContents. We don't support this at this time. See below.
fix MoinMoinBugs/TableOfContentsBreakOnExtraSpaces
fixed code_area linenumber switch ID problem
fixed test_parser_wiki (did not reset request before parse)
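
The #format argument handling listed above can be sketched roughly as follows. This is a standalone illustration, not the branch's code: the helper names split_format_pi and parse_format_args are hypothetical, the default values are assumptions, and the real parsers use wikiutil.parseAttributes rather than this hand-written splitter.

```python
# Assumed defaults; the page names the attributes but not their default values.
DEFAULTS = {"start": 1, "step": 1, "numbers": "on"}
NUMBER_MODES = ("on", "off", "disable")


def split_format_pi(line):
    """Split '#format PARSER ARGS...' into (parser_name, format_args)."""
    body = line[len("#format"):].strip()
    if not body:
        raise ValueError("no parser named in #format line")
    parts = body.split(None, 1)
    parser_name = parts[0]
    format_args = parts[1] if len(parts) > 1 else ""
    return parser_name, format_args


def parse_format_args(format_args):
    """Turn 'start=10 step=2 numbers=off' into a validated settings dict."""
    settings = dict(DEFAULTS)
    for token in format_args.split():
        key, sep, value = token.partition("=")
        if not sep or key not in settings:
            raise ValueError("unknown attribute: %r" % token)
        if key in ("start", "step"):
            settings[key] = int(value)  # start=Number, step=Number
        elif value in NUMBER_MODES:
            settings[key] = value       # numbers=on|off|disable
        else:
            raise ValueError("bad value for numbers: %r" % value)
    return settings


name, args = split_format_pi("#format cplusplus start=10 step=2 numbers=off")
print(name, parse_format_args(args))
```

Everything after the parser name stays one string until the parser itself interprets it, which mirrors the idea of handing format_args to the Parser's __init__ unparsed.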

Questions & Thoughts:

the current change adds an extensions list attribute to the Parser class. This enables the inline code to pick a parser for a file extension. Using {{{ and #format Colorize lets you select a parser with extra arguments. But something like the VimColor parser, which can do huge amounts of syntax highlighting, needs some fallback config (use this if nothing else will handle the file). Or not?
- following approach sounds nice:
  1. check if the inline: statement specifies a parser to use (inline:FILENAME:PARSER)
  2. if not, try to detect the parser by extension
  3. if not, try to use a configured default_inline_parser
  4. error (currently inline is ignored in that case, but we should show that something failed)
    - what about displaying a link to the attachment
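
The four lookup steps above might look like this in code. It is only a sketch under stated assumptions: select_parser, ParserNotFound and the PARSERS registry are hypothetical names, the registry maps a parser name to its extensions list, and attachment filenames are assumed not to contain a colon.

```python
import os


class ParserNotFound(Exception):
    """Raised in step 4 so the caller can show an error or an attachment link."""


# Hypothetical registry: parser name -> the extensions it can handle.
PARSERS = {
    "python": [".py"],
    "cplusplus": [".cpp", ".h"],
}


def select_parser(target, parsers, default_inline_parser=None):
    """Pick a parser for 'FILENAME' or 'FILENAME:PARSER'."""
    filename, _, explicit = target.partition(":")

    # 1. the inline: statement names a parser explicitly
    if explicit:
        if explicit in parsers:
            return explicit
        raise ParserNotFound("unknown parser %r" % explicit)

    # 2. detect the parser by file extension
    ext = os.path.splitext(filename)[1].lower()
    for name, extensions in parsers.items():
        if ext and ext in extensions:
            return name

    # 3. fall back to the configured default, if there is one
    if default_inline_parser is not None and default_inline_parser in parsers:
        return default_inline_parser

    # 4. nothing matched: report it instead of silently ignoring the inline
    raise ParserNotFound("no parser for %r" % filename)


print(select_parser("request.py", PARSERS))
print(select_parser("notes.txt:python", PARSERS))
```

Raising an exception in step 4 leaves the decision about what to display (an error message, a link to the attachment) to the caller.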

Alternate Highlighting

Taesu Pyo's BaseParser from ParserMarket is a possibility (and is currently used in the branch). But it is kind of hard labor to do all those parsers for all those formats out there. So why not use some existing syntax highlighting engine or definition?

Reuse highlighting from some other project

write a compiler/converter/interpreter for vim's syntax files. This would give instant 500+ syntax highlightings...
- vim's pattern (regex) syntax is different from the python re syntax. Here is a comparison to perl re syntax.
  - the two are quite different. not easy to convert for all possible operators/patterns. I'll concentrate on getting CBC using Taesu Pyo's implementation done, a vim syntax parser/formatter can be added later -- OliverGraf 2004-04-25 14:33:25
- currently porting Text::VimColor to python -- OliverGraf 2004-04-26 06:18:58
  - basically working. finishing touches tomorrow -- OliverGraf 2004-04-26 21:10:24
  - oh, yes, I should work on this again... -- OliverGraf 2004-07-22 09:40:35
    - VimColorTest -- OliverGraf 2004-07-22 11:57:37
GNU source-highlighter: external program, fast, not many syntax defs, hard to customize
SilverCity: supports 20+ languages.

CSS for syntax coloring

Here is a module to manage the css styles for syntax coloring: css.py

The example classes inside the Python module are the classes and subclasses vim uses. They are by no means fixed and only serve to visualize the intended structure. See below for a discussion about what CSS classes moin should support for code areas.

The classes are based on Oliver's version, made by joining the X/Emacs and Vim definitions, but it's very bad as is. A lot of the styles are useless and we have to group them in a way that makes more sense, like putting everything that is usually displayed in the same color into one group.

We can duplicate the behavior of mature applications like Emacs and Vim, or try to improve and simplify. I know that I use only 4-5 colors and I don't need most of the fine-grained control that those apps have. Code with too many harsh colors simply does not look good and is hard to read. Less is more in this case.

The benefit of a simple structure is that we can add many simple colorizers that use only the main classes, both for parsing the code and for formatting. Basic support for a lot of languages is better than a great colorizer for a language that you don't need.

That's true. And it's the same way vim implements it (I'm no vim user or fanatic, just in case someone asks; I'm all for XEmacs). Vim has a set of generic colorization classes. All syntax colorizers implement their own syntax 'names', but also provide a mapping to the basic classes, so you do well by configuring just these. I have now managed to get the vim colorizer to output both names, which looks like:

<pre>
<span class="LineNumber"></span><span class="Type diffNewFile">--- orig/MoinMoin/request.py</span>
<span class="LineNumber"></span><span class="Type diffFile">+++ mod/MoinMoin/request.py</span>
<span class="LineNumber"></span><span class="Statement diffLine">@@ -1141,7 +1141,7 @@</span>
<span class="LineNumber"></span><span class="Text"> </span>
<span class="LineNumber"></span><span class="Text">             @param header: string, containing valid HTTP header.</span>
<span class="LineNumber"></span><span class="Text">         """</span>
<span class="LineNumber"></span><span class="Special diffRemoved">-        key, value = header.split(':',1)</span>

<span class="LineNumber"></span><span class="Identifier diffAdded">+        key, value = header.encode(config.charset).split(':',1)</span>
<span class="LineNumber"></span><span class="Text">         value = value.lstrip()</span>
<span class="LineNumber"></span><span class="Text">         if key.lower() == 'content-type':</span>
<span class="LineNumber"></span><span class="Text">             # save content-type for http_headers</span>
<span class="LineNumber"></span>
<span class="LineNumber"></span>
<span class="LineNumber"></span>
</pre>

As you can see the diff syntax has the special names diffRemoved and diffAdded which get mapped to Special and Identifier color classes, just to get the display different.

I think this is a very good way to implement this, because it makes it usable for everyone using just the basic colorization classes. But if you are a ruby programmer (I don't know if the ruby syntax uses specials, it's just an example) you could add some extra CSS to make use of those special features, without needing to change the code. -- OliverGraf 2004-07-25 06:32:18
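
To illustrate the double-name idea, here is a minimal sketch that emits a span carrying both the generic class and the syntax-specific name. The mapping is copied from the diff output shown above; the function name render_token is hypothetical, and this is not the vim colorizer's actual code.

```python
from html import escape

# Syntax-specific name -> generic color class, as seen in the diff example.
DIFF_TO_GROUP = {
    "diffNewFile": "Type",
    "diffFile": "Type",
    "diffLine": "Statement",
    "diffRemoved": "Special",
    "diffAdded": "Identifier",
}


def render_token(text, syntax_name=None, mapping=DIFF_TO_GROUP):
    """Wrap text in a span with the generic class first, then the specific name."""
    if syntax_name is None:
        classes = "Text"
    else:
        # Unknown names fall back to "Text" (an assumption of this sketch).
        classes = "%s %s" % (mapping.get(syntax_name, "Text"), syntax_name)
    # Escape markup characters; quotes are left alone as in the output above.
    return '<span class="%s">%s</span>' % (classes, escape(text, quote=False))


print(render_token("+++ mod/MoinMoin/request.py", "diffFile"))
print(render_token("        value = value.lstrip()"))
```

A theme that only styles Type, Statement, Special and Identifier still colors the diff, while a stylesheet that adds a rule for .diffAdded can refine it without any code change.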

Maybe something like this - feel free to improve this list:

First, the default group, which uses the default font and color of code fragments:

I prefer not to color these items, and to color only the most important items, like language keywords. I think it looks better and is easier to read. By marking these with a class, anyone will be able to customize their wiki CSS by adding rules for these classes. If you think that one of these should be in the "names" group in moin's default coloring, please move it to the names group. These are not subclasses of a "default" class, since they inherit the default style from the code div/table/pre item
operators - +, -, &, {, (
function - like in def myFunction():
class - like in class MyClass:
variable - like in total = 25; avg = total / num
reference - I guess this means pointer to something like &var in C?

Below are classes and their subclasses that use different colors:

comment (I use dark green for these)
- I don't see how we can add ignore/todo and the like; in many formats they are all just comments: Todo/todo/To do/ etc.
literal - data that you define in the program (I use dark magenta for some of these)
- float - 1.0 (float, double, etc.)
- integer - 1 or 'c' in C (int, long, long long etc.)
- boolean - like True
- string - "like this str" or u'like this unicode'
- docstring - it could go under comment, but it's both a comment and program data
names - names used in the language and standard libraries
- keyword - like class, def, print, if, while, is, not in python; html, body in HTML etc.
- exception - built-in exceptions like ValueError - it's very handy and prevents errors like UnpicklingError...
- builtin - names that are not part of the language, but are part of the standard libraries, like range, min, max etc.
preprocessor - this is relevant only to compiled code
- include
- define - #def, #undef etc.
- macro - is this not the same as define?
- conditionals - #ifdef etc.
debug - I'm not sure if these are relevant for non interactive syntax coloring
- error
- warning
diff - this is very important, so we can view patches inlined in pages.
- added/new - lines with +
- removed/old - lines with -
meta - except line numbers, I don't know if any of these is relevant
- line numbering
- invisible-characters - like \r, \n, spaces, tabs etc.
- breakpoints - I don't know if that is relevant
- bookmarks - same as breakpoints

Here is an example python code using default color scheme based on this list: color-test.html
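
One way to express the list as data is sketched below: a token carries its main class plus its subclass, so a theme that styles only the main classes still works. This is an illustration and not the css.py module mentioned earlier. The names GROUPS, css_classes and stylesheet and the .codearea scope selector are assumptions; the two colors are the ones named in the list, and all other groups are left uncolored.

```python
# Main class -> subclasses, following the list above.
GROUPS = {
    "comment": [],
    "literal": ["float", "integer", "boolean", "string", "docstring"],
    "names": ["keyword", "exception", "builtin"],
    "preprocessor": ["include", "define", "macro", "conditionals"],
    "debug": ["error", "warning"],
    "diff": ["added", "removed"],
}

# Only the colors mentioned in the list; everything else inherits the default.
GROUP_COLORS = {"comment": "darkgreen", "literal": "darkmagenta"}


def css_classes(name):
    """Return the class attribute value for a token kind."""
    if name in GROUPS:
        return name
    for group, subclasses in GROUPS.items():
        if name in subclasses:
            return "%s %s" % (group, name)
    # Default-group items (operators, function, class, ...) get a bare class.
    return name


def stylesheet(colors=GROUP_COLORS, scope=".codearea"):
    """Generate one CSS rule per colored main class."""
    rules = ["%s .%s { color: %s; }" % (scope, group, color)
             for group, color in sorted(colors.items())]
    return "\n".join(rules)


print(css_classes("string"))
print(stylesheet())
```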

TOC & Include

Problems with multiple or recursive Includes

Recursive and multiple Includes are a pain in the a**. They can never work with caching (in the case of full page includes) because the second include will get the same heading IDs as the first. Solution: prefix IDs with the content ID (it is unique by default in this branch) and disable caching for ALL includes.

To make this clearer: there is only one cached copy of a page. This is the whole page, so from/to includes can't use the cache. If a page includes another page multiple times, at least the second include will use the cached copy, including all IDs (because they are passed as arguments to the formatter). So double includes always break IDs if caching is used.

A possible solution is to put ID generation into the formatter. The text_python formatter has to uniquify the IDs, so the cached copy will always output unique IDs. But this will make everything harder for Include & TOC, cause they have to use the same method to generate the hrefs to the headings...
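
A rough sketch of formatter-side ID generation follows, combining the content-ID prefix with a per-content counter. The class name IdRegistry and the ID format are assumptions for illustration, not the text_python formatter's API. Include and TableOfContents would have to call the same method to build matching hrefs.

```python
class IdRegistry:
    """Hands out heading IDs that are unique within one content block."""

    def __init__(self, content_id):
        # content_id is assumed to be unique per include, as in this branch.
        self.content_id = content_id
        self._seen = {}

    def make_id(self, base):
        """Prefix base with the content ID; number repeated headings."""
        count = self._seen.get(base, 0) + 1
        self._seen[base] = count
        unique = "%s-%s" % (self.content_id, base)
        if count > 1:
            unique += "-%d" % count
        return unique


# The same page included twice: each include gets its own registry.
first = IdRegistry("content-1")
second = IdRegistry("content-2")
print(first.make_id("head-intro"))
print(second.make_id("head-intro"))
```

Because the ID is computed at output time rather than passed in as an argument, a cached copy would not bake in a fixed ID, which is the point of moving generation into the formatter.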