Valentin JANIAUT

E-Mail

<valeuf AT gmail DOT com>

Wiki URL

http://wiki.valeuf.org/

HomePage URL

http://www.valeuf.org/

Country (born / living in)
France
Academic experience
  • Master's degree in Computer Science at UTBM (1)

  • Exchange student (and now dual degree) at KAIST (2) in South Korea

  • Summer camp at the IRLab (3) of NTU (4)

Your current occupation
Student
Software projects you have already participated in

Experience Level

Experience in coding in general
started 2004, 5000 hours
Experience in Python coding
started 2008, 800 hours
Experience in HTML
started 2004, 400 hours
Experience in CSS
started 2004, 400 hours
Experience in Javascript
started 2006, 200 hours
Experience with Mercurial
started in 2009; I use it for all my private projects.
Your favourite programming language(s), best first
C/C++, Python, Java, Objective-C.
Tools you use for development
Mostly Vim for simple projects; otherwise Eclipse, and during my last internship, Xcode and Microsoft Visual Studio.
Did you already do full day work (8h/5d) over some weeks on some software project yet?
I worked for 6 months in Seoul as an intern. I was working around 9 to 10 hours a day, and I also worked regularly on weekends, since this is usual in South Korea.
My point of view about the SoC
Unlike working for a closed-source company, the Summer of Code is an appealing program, because it enables students to join a FOSS project. Furthermore, in my recent internship and freelance job I got used to working with ugly code and a lack of documentation, because we did not have time to write it in an elegant manner. So working on a FOSS project, with the enriching advice of a mentor, will be an opportunity for me to improve my coding skills in a constructive and creative community. Finally, the stipend would be an additional motivation to give my best.

First Project: Support for Different DOM Conversions

Proposal

MoinMoin is already a great tool for writing reports, theses and other substantial papers. First of all, it lets authors focus on the content instead of the presentation. Secondly, it makes documents easily accessible from any computer with an internet connection. Thirdly, the different export functions can produce top-quality documents for printing, publishing ... Some other features are also really useful. If you have to write a report with several authors, it makes it really simple to manage the ACLs and to have different people working on the same document at the same time. The desktop mode, with its synchronization process, provides easy-to-set-up cloud storage for your texts.

However, support for several import/export formats (such as DocBook) is not yet implemented in moin2, so I am offering to carry out this task this summer. Here is the list of the conversions that are not yet supported:

So my first goal will be to write converters that fully support the different MoinMoin features: in particular (where it makes sense), all the basic macros supported by MoinMoin 2.0 (such as FootNote, PageBreak ...) and attachments (especially pictures) for the DOM -> DocBook converter.
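As an illustration of the DOM -> DocBook direction, here is a minimal sketch using Python's xml.etree.ElementTree. It assumes a simplified DOM whose tag names (page, h, p, a, footnote) are hypothetical placeholders rather than the real moin2 DOM, and it omits the DocBook namespace for brevity. Headings are mapped to DocBook's bridgehead instead of nested sections to keep the example short.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")

# Hypothetical tag names of a simplified DOM, mapped to DocBook element names.
# The real moin2 DOM has its own namespace and tag set.
DOCBOOK_TAGS = {
    "page": "article",
    "h": "bridgehead",  # free-floating heading; real code would nest <section>s
    "p": "para",
}

def dom_to_docbook(node):
    """Recursively convert a simplified DOM element into a DocBook element."""
    if node.tag == "a":
        # DocBook 5 expresses links as <link xlink:href="...">.
        out = ET.Element("link", {XLINK_HREF: node.get("href", "")})
        target = out
    elif node.tag == "footnote":
        out = ET.Element("footnote")
        target = ET.SubElement(out, "para")  # DocBook footnotes wrap their text in a para
    else:
        # Unknown tags fall back to the generic inline element <phrase>.
        out = ET.Element(DOCBOOK_TAGS.get(node.tag, "phrase"))
        target = out
    target.text = node.text
    for child in node:
        converted = dom_to_docbook(child)
        converted.tail = child.tail  # keep the text that follows the child
        target.append(converted)
    return out

dom = ET.fromstring('<page><h>Intro</h><p>See <a href="http://moinmo.in/">MoinMoin</a>.'
                    '<footnote>A wiki engine.</footnote></p></page>')
print(ET.tostring(dom_to_docbook(dom), encoding="unicode"))
```

A real converter would also nest sections by heading level and handle attachments; this only shows the recursive, tag-by-tag mapping.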

Since this task should not take the whole summer, I would like to add some other goals. The first is working on the documentation of moin2, since the current source code lacks clear explanations, especially about the new DOM tree and the other elements related to my work. This task is required by the Summer of Code program anyway, so it can also be an opportunity for me to improve MoinMoin's documentation system.

Secondly, if I have enough time left, I would like to work on some other minor projects:

Unit Tests

After reading some comments about unit tests, I think that I really have to write tests for the different converters. The best approach would be a generic test for the circular conversion (A -> DOM -> A): when a document is converted to the DOM tree and then converted back to the same format, we should obtain exactly the same document.
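The generic circular test could look like the following sketch. The toy markup ("= text =" for a heading, any other non-empty line for a paragraph) and the converter names are hypothetical stand-ins for real moin2 converters. The test is a plain function using assert, the style that py.test collects; which test runner is used is an assumption here.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy markup, used only to illustrate the round-trip idea:
# "= text =" is a heading, any other non-empty line is a paragraph.
def toy_to_dom(source):
    page = ET.Element("page")
    for line in source.splitlines():
        if line.startswith("= ") and line.endswith(" ="):
            ET.SubElement(page, "h").text = line[2:-2]
        elif line:
            ET.SubElement(page, "p").text = line
    return page

def dom_to_toy(page):
    lines = []
    for node in page:
        lines.append("= %s =" % node.text if node.tag == "h" else node.text)
    return "\n".join(lines)

def round_trip(to_dom, from_dom, source):
    """Generic A -> DOM -> A conversion; works for any pair of converters."""
    return from_dom(to_dom(source))

def test_toy_round_trip():
    source = "= Introduction =\nMoinMoin is a wiki engine."
    assert round_trip(toy_to_dom, dom_to_toy, source) == source
```

Because round_trip only takes the two converter functions as parameters, the same test can be reused for HTML, DocBook or any other format by passing a different pair.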

To write these tests, the best starting point is a list of the different kinds of information handled by our converters (paragraphs, headings, author ...). We should distinguish two kinds of information: visible and hidden. Visible information is what can be seen in the final document (for example a paragraph or a link). Hidden information is not always printed, but it is important as well (for example the author's name or the title of the document).

So we can write separate sets of tests for the two categories of information. For this, we can simply write small, basic documents in the different formats. These documents should be representative of the different kinds of information we can have.
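One way to keep the two categories separate is to extract them independently and compare them before and after a conversion, so that a failing test says which kind of information was lost. In this sketch the tag names, the metadata attributes (author, title, language) and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical split of the information a converter must preserve.
VISIBLE_TAGS = {"h", "p", "a", "img"}                # rendered content
HIDDEN_ATTRIBUTES = {"author", "title", "language"}  # metadata on the root element

def visible_information(dom):
    """Ordered (tag, text) pairs for every visible node."""
    return [(node.tag, (node.text or "").strip())
            for node in dom.iter() if node.tag in VISIBLE_TAGS]

def hidden_information(dom):
    """Metadata stored as attributes of the root (a hypothetical convention)."""
    return {k: v for k, v in dom.attrib.items() if k in HIDDEN_ATTRIBUTES}

# Small sample documents, one per kind of information (hypothetical samples).
SAMPLES = {
    "paragraph": "<page><p>Hello</p></page>",
    "heading": "<page><h>Title</h><p>Body</p></page>",
    "author": '<page author="Valentin Janiaut"><p>Text</p></page>',
}

def check_sample(name, convert_back_and_forth):
    """Run one sample through a circular conversion and check both categories."""
    original = ET.fromstring(SAMPLES[name])
    result = convert_back_and_forth(original)
    assert visible_information(result) == visible_information(original), \
        name + ": visible information lost"
    assert hidden_information(result) == hidden_information(original), \
        name + ": hidden information lost"
```

A converter that silently drops metadata would pass the visible checks and fail only on the author sample, which is exactly the distinction the two categories are meant to expose.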

Schedule

If I am not running out of time, I would also like to look at SVG support: in particular a DOM -> SVG converter, and SVG support in the other converters (so that you can embed SVG figures in a DocBook document, for example).
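For the SVG idea, a DocBook figure can reference an SVG file through imagedata with format="SVG". The following sketch builds such a figure with ElementTree. The file name and caption are hypothetical, and embedding the SVG markup inline instead of referencing a file is not shown.

```python
import xml.etree.ElementTree as ET

def svg_figure(fileref, caption):
    """Build a DocBook <figure> that references an external SVG file."""
    figure = ET.Element("figure")
    ET.SubElement(figure, "title").text = caption
    media = ET.SubElement(figure, "mediaobject")
    image = ET.SubElement(media, "imageobject")
    ET.SubElement(image, "imagedata", {"fileref": fileref, "format": "SVG"})
    # Text alternative for outputs that cannot render the image.
    text = ET.SubElement(media, "textobject")
    ET.SubElement(text, "phrase").text = caption
    return figure

print(ET.tostring(svg_figure("diagram.svg", "Conversion pipeline"), encoding="unicode"))
```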

For your information, I will still be studying at my university until June 25th. However, I have a very light schedule, since I did not take many courses: I have no classes on Thursday and Friday, and only a single class on Monday morning. So I am planning to work from Thursday to Monday (including the weekend) so as not to fall behind schedule. I believe this is achievable, considering that I am currently working around 30 to 35 hours per week for a video game company on an iPhone project (this contract ends in early April).

And after the SoC?

I am a regular user of MoinMoin. I rarely spend a day without adding, editing or deleting pages on my personal wiki. It has become one of my best tools for managing my notes, and I am finally getting rid of the mass of papers I had on my desk. I also use it to compose my reports.

I see the SoC as a good opportunity to earn money with an interesting summer job. But I also hope that I will enjoy hacking on MoinMoin, and possibly keep doing it afterwards, especially for the DocBook support. I use this feature regularly, and I would be happy to get rid of the problems that currently exist between the DocBook formatter and the latest versions of MoinMoin. So I hope to develop solid know-how about the DOM tree and DocBook conversion, so that I can keep providing efficient support for these features. It would be great if, at the end of my SoC, I knew enough to fix small bugs, like the Include macro one, in a short amount of time.

General Information

Misc

Application Comments

Reimar Bauer

Are you familiar with unit tests and test-driven development? As with documentation, your code must be covered by unit tests. I missed that in your development tools and in the schedule. At which point will you provide unit tests for review?

Thomas Waldmann

Some questions:

Valentin Janiaut

@ReimarBauer: I am not very familiar with unit tests and test-driven development; this is something quite new for me. Some friends told me about it, but I have never really used it. However, it looks really interesting and quite easy to use, so I think it is a good idea to try it for this project. I am sure it will improve the quality of my code.

I have also edited the schedule to follow a test-driven approach. My current workflow (prototype, test, then clean up the code) no longer seems so good!

@Thomas Waldmann:

Theoretically, the conversion should be a bijection, which means that if we run the conversion HTML -> DOM -> HTML, the output should be the same as the input. Therefore the converters should not lose any visible information, such as a paragraph, a picture or a link. But they should also be able to convert the hidden information (what we can call metadata). For example, the author of a document should not be lost, even though it is not always visible.

I plan to list the different kinds of information that an HTML or DocBook document can contain, and make sure that there is a suitable node to handle each of them in the DOM. If there is not, I will add it (and document it). From this list, I can write a test for each kind of information (or group of kinds). These tests will check that the information is not lost after a circular conversion.
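The inventory described above could start as plain data, with a helper that reports which kinds of information have no DOM node yet. Every format entry and node name here is a hypothetical placeholder, not the actual content of the moin2 DOM.

```python
# Hypothetical inventory: which kinds of information each format can carry,
# and which DOM node kinds already exist to hold them.
FORMAT_INFORMATION = {
    "html": {"paragraph", "heading", "link", "image", "title", "keywords", "author"},
    "docbook": {"paragraph", "heading", "link", "image", "title", "author", "footnote"},
}
DOM_NODES = {"paragraph", "heading", "link", "image", "title", "footnote"}

def missing_dom_nodes(format_information, dom_nodes):
    """Return, per format, the sorted kinds of information with no DOM node yet."""
    return {fmt: sorted(kinds - dom_nodes)
            for fmt, kinds in format_information.items()
            if kinds - dom_nodes}

print(missing_dom_nodes(FORMAT_INFORMATION, DOM_NODES))
```

Each reported entry is a candidate for a new, documented DOM node and for its own round-trip test.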

As I explained before, unit tests will be an excellent way to make sure that the converters work correctly.

I mean all things like images, videos and so on. I saw that in moin2 everything is now part of the DOM tree. But we still need to work out how to put the different attachments into the DocBook document. There are several solutions (providing an archive with the attachments, using base64 to embed the media, or simply having an external link pointing to the resource ...). We have to choose the best one and implement it for the different converters.
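To compare the attachment strategies listed above, here is a minimal sketch of each one using only the standard library (base64, zipfile). The URL layout and the archive structure (document.xml plus an attachments/ folder) are assumptions made for illustration.

```python
import base64
import io
import zipfile

def data_uri(content, mime_type):
    """Embed an attachment directly in the document as a base64 data: URI."""
    return "data:%s;base64,%s" % (mime_type, base64.b64encode(content).decode("ascii"))

def external_reference(base_url, name):
    """Point to the attachment on the wiki instead (hypothetical URL layout)."""
    return "%s/%s" % (base_url.rstrip("/"), name)

def bundle(document, attachments):
    """Ship the converted document and its attachments as one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("document.xml", document)
        for name, content in attachments.items():
            archive.writestr("attachments/" + name, content)
    return buffer.getvalue()
```

Base64 keeps everything in one file but makes the media about a third larger, and not every DocBook processor may accept data: URIs. The archive and the external link keep the document small, but they depend on extra files or on the wiki being reachable.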

First of all, I think it would be good to check whether the HTML page is valid, and to focus on valid pages at the start, because it is hard to define a good behavior when the markup is not valid. Later, we can see how to handle pages with some basic problems. We should also define which (X)HTML version we will support.
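For XHTML input, a first and cheap check is whether the page is well-formed XML. The sketch below uses ElementTree for that. Note that this is not full validation against an (X)HTML DTD or schema, and that named entities such as &nbsp; are rejected because no DTD is loaded.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xhtml(text):
    """Return True if text parses as XML.

    This only checks well-formedness, which XHTML requires; it does not
    validate the page against an (X)HTML DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```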

As for tools, I would like to use the HTMLParser module provided by Python (html.parser in Python 3) to parse the HTML. An HTML document is relatively close to a tree, so I think it is possible to simply walk the document like a tree to convert it into the DOM tree.
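A sketch of this approach: the parser below (Python 3's html.parser) receives start tags, end tags and text as events, and uses a stack to rebuild the tree as an ElementTree. The root element name and the handling of unclosed or stray tags are simplifying assumptions; mapping the resulting tree onto moin2's DOM would be a second step.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}

class TreeBuilder(HTMLParser):
    """Rebuild an element tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # hypothetical root name
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Boolean attributes arrive with a value of None.
        element = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: never pushed on the stack.
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close everything up to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]  # text after a child goes into that child's tail
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def parse_html(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```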

We also need to think about how to convert the different metadata (such as keywords, description ...) into the DOM tree. I am not sure yet how to do it correctly.
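One possible approach, sketched under assumptions: collect the title and the <meta name=... content=...> pairs while parsing, then store them under the DOM root. Representing them as meta child elements of a page root is a hypothetical convention, not how moin2 actually stores metadata.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class MetaCollector(HTMLParser):
    """Collect <title> and <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") and attributes.get("content") is not None:
            self.metadata[attributes["name"].lower()] = attributes["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data

def metadata_to_dom(html_text):
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    # Hypothetical convention: one <meta name=... content=...> child per item.
    root = ET.Element("page")
    for name, value in sorted(collector.metadata.items()):
        ET.SubElement(root, "meta", {"name": name, "content": value})
    return root
```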

Bastian Blank

Do you have knowledge of formal descriptions of such formats with XML Schema or RELAX NG?

Valentin Janiaut

@Bastian Blank: I do not have extensive knowledge of the formal definition of these formats with XML Schema or RELAX NG. I know the DocBook definition quite well, because I use this syntax regularly and often read its documentation, but I have never tried to read or write it formally. I also know SVG a little, because I looked into it for my django-resume project a few months ago. In the end I decided to use LaTeX with Django's template engine, which gave better results.

I also have some knowledge of RELAX NG, because I used it for a project in one of my previous courses. We had to define the syntax of a programming language using RELAX NG, mostly in its compact form. (It was quite a weird language built on XML; the goal was to show the relations between the different elements of a language, and to use something easier than Flex/Yacc to write the formal description of a language.)

Do you think we should use one of these formal descriptions when writing the converters, for example to check the validity of our input and output?

  1. University of Technology of Belfort-Montbéliard

  2. Korea Advanced Institute of Science and Technology

  3. Web Mining and Information Retrieval Lab

  4. National Taiwan University

MoinMoin: ValentinJaniaut/GSoC/Application (last edited 2010-04-27 11:27:04 by ValentinJaniaut)