These notes represent FlorianFesti & LionKimbro's work making CommunityWiki:MachineCodeBlocks. They are hosted here on the MoinMoin wiki because this wiki is regularly read by FlorianFesti. (I understand that this is probably okay with ThomasWaldmann.)

For more on machine code blocks and past notes, see CommunityWiki:MachineCodeBlocks.


Machine Code Blocks Specification

Abstract: The "machine code blocks" format is a way to encode key-value pairs on wiki pages and within text files. The format includes the ability to link across the web, and is designed to work well with most wikis.

Format

A page can contain several machine code blocks.

The block is written in the following form:

MACHINECODEBLOCK ::= START LINE* END
START ::= "BEGINBLOCK"
END ::= "ENDBLOCK"

LINE ::= WS KEY WS VALUE

VALUE ::= DELIM1 PLAINVALUE DELIMEND WS
        | DELIM2 MACHINECODEBLOCK WS DELIMEND
        | DELIM3 RAWHTMLVALUE DELIMEND

KEY ::= ([:alphanum:]|_)*

DELIM1 ::= ":"
DELIM2 ::= "?"
DELIM3 ::= "$"

DELIMEND ::= ";"

WS ::= (\s*)

PLAINVALUE ::= ([^;]|;;)* 

RAWHTMLVALUE ::= ((<[^>]*?>|[^;]|;;)*)
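As a rough illustration of the grammar, here is a minimal sketch of a reader for plain-text input. It is not a reference implementation. It assumes that WS means whitespace, that surrounding whitespace is trimmed from plain values (the grammar does not say), and that DELIM3 raw HTML values are out of scope. The names parse_block and read_plain_value are made up for this sketch.

```python
import re

KEY_RE = re.compile(r"\w*")   # alphanumerics and "_" (Unicode-aware in Python 3)
WS_RE = re.compile(r"\s*")    # assumption: WS means whitespace


def skip_ws(text, pos):
    return WS_RE.match(text, pos).end()


def read_plain_value(text, pos):
    """Read a PLAINVALUE at pos; return (value, position after the closing ';')."""
    out = []
    while pos < len(text):
        if text.startswith(";;", pos):   # ";;" stands for one literal semicolon
            out.append(";")
            pos += 2
        elif text[pos] == ";":           # a single ";" ends the value
            return "".join(out), pos + 1
        else:
            out.append(text[pos])
            pos += 1
    raise ValueError("value not terminated by ';'")


def parse_block(text, pos=0):
    """Parse one block at pos; return (list of (key, value) pairs, end position)."""
    pos = skip_ws(text, pos)
    if not text.startswith("BEGINBLOCK", pos):
        raise ValueError("expected BEGINBLOCK")
    pos += len("BEGINBLOCK")
    pairs = []
    while True:
        pos = skip_ws(text, pos)
        if text.startswith("ENDBLOCK", pos):
            return pairs, pos + len("ENDBLOCK")
        match = KEY_RE.match(text, pos)
        key = match.group()
        pos = skip_ws(text, match.end())
        if pos >= len(text):
            raise ValueError("unexpected end of input")
        delim = text[pos]
        pos += 1
        if delim == ":":                 # DELIM1: plain value
            value, pos = read_plain_value(text, pos)
            value = value.strip()        # assumption: trim surrounding whitespace
        elif delim == "?":               # DELIM2: nested block, then WS and ';'
            value, pos = parse_block(text, pos)
            pos = skip_ws(text, pos)
            if not text.startswith(";", pos):
                raise ValueError("expected ';' after nested block")
            pos += 1
        else:
            raise ValueError("unsupported delimiter: %r" % delim)
        pairs.append((key, value))
```

A nested block comes back as a nested list of pairs, so the caller decides how to interpret it. Note that the sketch follows the KEY rule literally, so a key containing a hyphen is rejected.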

Things get problematic if you want to use the wiki output directly as an HTML fragment: the parser does not know how to treat the HTML it finds there. We could introduce a special marker that allows the HTML to be used directly, without unquoting.

XXX/TODO: Rewrite HTML mode to fit this.

We talked about this, but it is not stated here: in RAWHTMLVALUE, a semicolon that is part of an entity is interpreted as part of the HTML; it is not a delimiter.

Page Processing

HTML Mode

Idea: Strip out most XML/HTML tags, because wiki (and other engines) put in a lot of tags.

There are two modes: "unquoting" mode and raw mode.

An MCB parser MUST check whether the document is HTML or plain text. If it is HTML, the parser must parse the HTML file and start in "unquoting" mode. In plain text files, raw mode must be used. RAWHTMLVALUEs are always parsed in raw mode; in HTML documents the parser must switch back to "unquoting" mode after the value has ended.

In both modes, two consecutive semicolons (";;") are treated as one semicolon within the value. Semicolons that are part of HTML tags or entities are seen as part of these; they therefore must not be treated as the end delimiter, and they cannot form part of a ";;" pair.
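The semicolon rules can be sketched as a small scanner. This is only an illustration under assumptions: the regular expressions for tags and entities are simplified, the function names read_html_value and unquote are made up, and "unquoting" is taken to mean stripping tags and decoding entities, which the notes do not yet define precisely.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def read_html_value(text, pos=0):
    """Scan a value at pos; return (raw value, position after the end delimiter)."""
    out = []
    while pos < len(text):
        match = TAG_RE.match(text, pos) or ENTITY_RE.match(text, pos)
        if match:                          # semicolons inside tags/entities are kept
            out.append(match.group())
            pos = match.end()
        elif text.startswith(";;", pos):   # ";;" stands for one literal semicolon
            out.append(";")
            pos += 2
        elif text[pos] == ";":             # a lone ";" is the end delimiter
            return "".join(out), pos + 1
        else:
            out.append(text[pos])
            pos += 1
    raise ValueError("value not terminated by ';'")


def unquote(value):
    """Assumed meaning of 'unquoting': drop tags, then decode entities."""
    return html.unescape(TAG_RE.sub("", value))
```

Because tags and entities are matched before the semicolon checks, the semicolon that closes an entity can never pair up with a following semicolon: "&amp;;" reads as an entity followed by the end delimiter.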

<!> Is there a difference between raw mode and plain text? Should raw mode do HTML processing, and text mode not?

Consult the Unicode tables to identify whitespace and alphanumeric characters.

Interpretation

Each key is bound to a list of strings. Key definitions append strings to the end of the list. XXX Blocks!
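A minimal sketch of this binding rule, assuming the parser hands over (key, value) pairs in document order. The function name interpret is made up, and nested blocks are simply appended like any other value, since their interpretation is still open.

```python
from collections import defaultdict


def interpret(pairs):
    """Bind each key to the list of its values, in order of appearance."""
    bindings = defaultdict(list)
    for key, value in pairs:
        bindings[key].append(value)   # a repeated key appends, never overwrites
    return dict(bindings)
```

For example, interpret([("member", "a"), ("name", "x"), ("member", "b")]) binds "member" to ["a", "b"] and "name" to ["x"].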

Machine Code Block Schema

After some thought and a deeper look at RDF, I think we should not create our own schema format. Schemas are always complicated and hard to read and write. Any schema format we could invent wouldn't be much easier than RDF Schema. So I propose to use RDF Schema (RDFS) internally. See /RdfIntegration for details. -- FlorianFesti 2005-06-09 10:37:06

per schema:

per attribute:

see /MetaSchema

Schema Example:

BEGINBLOCK
id: #community
type: Schema
label: Internet Community
attribute $
  BEGINBLOCK
    key: community-name;
    label: Community Name;
    description: Name of the Internet Community;
    required: 1;
    multiple: 0;
    type: string;
  ENDBLOCK
attribute $
  BEGINBLOCK
    key: community-member;
    label: Community Member;
    description: Block representing a member of the Internet Community;
    required: 0;
    multiple: 1;
    type: block; <----- this is the right name, right?
  ENDBLOCK
...
and so on and so forth
...
ENDBLOCK
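To show how the "required" and "multiple" attributes in the example might be used, here is a hedged sketch of a check. It assumes the attribute descriptions have already been converted to Python dicts with boolean flags, that "required: 1" means the key must occur at least once, and that "multiple: 0" means it may occur at most once. None of this is settled by the notes, and check_block is a made-up name.

```python
def check_block(bindings, attributes):
    """Return a list of problems found; an empty list means the block passes."""
    problems = []
    for attr in attributes:
        values = bindings.get(attr["key"], [])
        if attr["required"] and not values:
            problems.append("missing required key: " + attr["key"])
        if not attr["multiple"] and len(values) > 1:
            problems.append("key given more than once: " + attr["key"])
    return problems


# The two attributes from the schema example, written out by hand.
COMMUNITY_ATTRIBUTES = [
    {"key": "community-name", "required": True, "multiple": False},
    {"key": "community-member", "required": False, "multiple": True},
]
```

Here bindings is a mapping from each key to its list of values, as described under Interpretation.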

Types

Literal Types

Literal types may be present within a block as values. They are detected by the parser.

Real Types

Types further restrict the values of an attribute and give an interpretation. Some types define translations that have to be applied to the literal values.

All _raw is _string, just the read process is different?

TODO

longer term:

Addressing

lk: We could say that machinecodeblocks are named with only letters, and then use reserved characters for going in deep.

http://example.net/foo#nameofblock-nameofkey-numberofindex-nameofkey-nameofblock-nameofblock-numberofindex-nameofkey...

(leapfrogging from page to page to other block on page to item within list to page to...)

F: Nice idea. But we need only one step right now.

lk: Well, if you're nesting more than one level deep, ...

This is no problem if each block has a name. So we have a reserved attribute "id" and you access example.com/MyBlocks#thridsubblockofweiredblock

hm, but then other people can't address a block if you didn't name it.

You can still leapfrog, but this has to be done by the application and not by the standard included URL mangling.

okay- we use a combination of allowed leapfrogging, and recommended naming.

Names must be unique on the page. Yes, of course: they are URLs/URIs.

We should probably recommend that the names attempt to be different than any <a name>'s that might be on the page itself, too.

Though, in fact, they can co-exist, if an <a name> collides.. (..!)

I can imagine a "smart wiki" that would identify MCB's and notice their names and then generate <a name> tags around the name: keys. So that a named MCB would also be a valid identifier to web browsers as well. Yes.
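The one-step addressing agreed on above could look roughly like this. It is a sketch under assumptions: blocks are represented as dicts mapping each key to a list of values, nested blocks appear as dicts inside those lists, a leading "#" in an id value is ignored, and the names index_by_id and resolve are made up.

```python
from urllib.parse import urldefrag


def index_by_id(blocks):
    """Map every block name on a page to its block; reject duplicate names."""
    index = {}

    def visit(block):
        for name in block.get("id", []):
            name = name.lstrip("#")       # assumption: "#community" names "community"
            if name in index:
                raise ValueError("duplicate block id on page: " + name)
            index[name] = block
        for values in block.values():
            for value in values:
                if isinstance(value, dict):   # descend into nested blocks
                    visit(value)

    for block in blocks:
        visit(block)
    return index


def resolve(url, blocks):
    """Return the block named by the URL fragment, or None if there is none."""
    _, fragment = urldefrag(url)
    return index_by_id(blocks).get(fragment)
```

Unnamed blocks are not reachable this way; leapfrogging to them stays the application's job.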

Wiki Engine Description Schema

Wiki Description Schema

Future ideas:

Machine Code Block Web Service

Define an XML-RPC interface that maps block URLs to real blocks. Define how the blocks are returned.

MoinMoin: MachineCodeBlocks3 (last edited 2007-10-29 19:05:59 by localhost)