Valentin JANIAUT
<valeuf AT gmail DOT com>
- Wiki URL
- HomePage URL
- Country (born / living in)
- France
- Academic experience
- Your current occupation
- Student
- Software projects you have already participated in
OpenOffice.org Education Project: I worked on scanner support under Mac OS X. The code has not been accepted by OOo.
django-resume: a very simple Django application to manage résumés. You can find the repository here: django-resume repo.
Puzzle Path: an iPhone game I made during my last internship. It is unfortunately closed-source, but you can still read a presentation or the report I wrote with MoinMoin about this experience.
eBay on FIRE: a search engine that retrieves eBay items from a picture. I built it for a laboratory in Taiwan last summer. You can read a short review of what I did, or a more complete article about my research.
MoinMoin: I started using MoinMoin in 2008. Recently, I hacked some pieces of code to fix problems I had with the DocBook formatter: I wrote a quick patch to solve a bug with the Include macro and the DocBook formatter, and made another minor tweak to prettify the XML document generated by the formatter.
Experience Level
- Experience in coding in general
- started 2004, 5000 hours
- Experience in Python coding
- started 2008, 800 hours
- Experience in HTML
- started 2004, 400 hours
- Experience in CSS
- started 2004, 400 hours
- Experience in Javascript
- started 2006, 200 hours
- Experience with Mercurial
- started in 2009; I use it for all my private projects.
- Your favourite programming language(s), best first
- C/C++, Python, Java, Objective-C.
- Tools you use for development
- Mostly Vim for simple projects; otherwise Eclipse, and, during my last internship, Xcode and Microsoft Visual Studio.
- Did you already do full day work (8h/5d) over some weeks on some software project yet?
- I worked for 6 months in Seoul as an intern. I worked around 9-10 hours a day, and regularly on weekends too, as is usual in South Korea.
- My point of view about the SoC
- Unlike working on closed-source software for a company, the Summer of Code is an appealing program because it enables students to join a FOSS project. In my recent internship and freelance jobs, I got used to working with ugly code and a lack of documentation, because we did not have time to write things in an elegant manner. Working on a FOSS project, with the enriching advice of a mentor, will be an opportunity for me to improve my coding skills in a constructive and creative community. The stipend would also be an additional motivation to give my best.
First Project : Different DOM Conversions support
Proposal
MoinMoin is already a great tool for writing reports, theses and other substantial papers. First, it lets the author focus on the content instead of the presentation. Second, it makes documents easily accessible from any computer with an Internet connection. Third, the different export functions can produce top-quality documents for printing, publishing, and so on. Other features are also really useful: if you write a report with several authors, ACLs make access management simple, and different people can work on the same document at the same time. The desktop mode, with its synchronization process, provides easy-to-set-up cloud storage for your texts.
However, support for the different import/export formats (such as DocBook) is not yet done in moin2, so I am offering to carry out this task next summer. Here is the list of conversions not yet supported:
html -> DOM
docbook -> DOM
DOM -> docbook
My first goal will therefore be to write converters that fully support the different MoinMoin features: in particular (where it makes sense) all the basic macros supported by MoinMoin 2.0 (such as FootNote, PageBreak, ...), and attachments (especially pictures) for the DOM -> DocBook converter.
Since this task is not very long to achieve, I would like to add some other goals. First, working on the Moin2 documentation, since the current source code lacks clear explanations, especially about the new DOM tree and the other elements related to my work. Documentation is required by the Summer of Code program anyway, so it is also an opportunity for me to improve the documentation system of MoinMoin.
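As an illustration of the macro mapping, here is a minimal sketch of how a FootNote could come out in DocBook, built with Python's `xml.etree.ElementTree`. The function names are hypothetical and the input is a plain string rather than a real moin2 DOM node; the only DocBook fact relied on is that `<footnote>` wraps block content such as `<para>`.

```python
import xml.etree.ElementTree as ET


def footnote_to_docbook(text):
    """Map the text of a FootNote macro to a DocBook <footnote> element."""
    footnote = ET.Element("footnote")
    # <footnote> holds block content, so the text goes inside a <para>.
    para = ET.SubElement(footnote, "para")
    para.text = text
    return footnote


def paragraph_with_footnote(body, note):
    """Hypothetical helper: a paragraph whose text is followed by a footnote."""
    para = ET.Element("para")
    para.text = body
    para.append(footnote_to_docbook(note))
    return para


if __name__ == "__main__":
    # <para>MoinMoin 2.0<footnote><para>Still in development.</para></footnote></para>
    print(ET.tostring(paragraph_with_footnote("MoinMoin 2.0", "Still in development."),
                      encoding="unicode"))
```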
Second, if I have time left, I would like to work on some other minor projects:
Conversion to XSL-FO: can replace DocBook for document printing.
Using SVG in the DOM: can be really useful for handling SVG graphics in a DocBook document.
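To give an idea of the XSL-FO target, the sketch below builds a minimal XSL-FO document (one page master, one page sequence, one `fo:block` per paragraph) with `xml.etree.ElementTree`. It assumes the DOM has already been flattened into a list of paragraph strings; the function names and page dimensions are illustrative choices. The result would then be rendered by an XSL-FO processor such as Apache FOP.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualified name of an XSL-FO element, e.g. fo('block')."""
    return "{%s}%s" % (FO_NS, tag)


def paragraphs_to_xslfo(paragraphs):
    """Build a minimal XSL-FO document with one fo:block per paragraph string."""
    root = ET.Element(fo("root"))
    # Page geometry: a single A4 page master (illustrative values).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm"})
    ET.SubElement(page, fo("region-body"))
    # Content: a page sequence whose flow fills the region-body.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = text
    return root


if __name__ == "__main__":
    print(ET.tostring(paragraphs_to_xslfo(["Hello", "World"]), encoding="unicode"))
```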
Unit Tests
After reading some comments about unit tests, I think that I really have to write tests for the different converters. The best approach would be a generic test for the circular conversion (A -> DOM -> A): when a document is converted to the DOM tree and then back to the same format, we should obtain exactly the same document.
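A generic circular test could look like the sketch below, written with Python's `unittest`. The converter functions are hypothetical placeholders that use `xml.etree.ElementTree` as a stand-in for the moin2 DOM; real tests would plug in the actual converters from the converter2 package instead.

```python
import unittest
import xml.etree.ElementTree as ET


# Placeholder converters: ElementTree stands in for the moin2 DOM here.
def xhtml_to_dom(text):
    return ET.fromstring(text)


def dom_to_xhtml(tree):
    return ET.tostring(tree, encoding="unicode")


class RoundTripTest(unittest.TestCase):
    # Small hand-written inputs; each should survive A -> DOM -> A unchanged.
    SAMPLES = [
        '<p>Hello</p>',
        '<p>A <a href="http://moinmo.in/">link</a> inside.</p>',
        '<div><h1>Title</h1><p>Body</p></div>',
    ]

    def assert_round_trip(self, to_dom, from_dom, document):
        self.assertEqual(from_dom(to_dom(document)), document)

    def test_xhtml_round_trip(self):
        for sample in self.SAMPLES:
            with self.subTest(sample=sample):
                self.assert_round_trip(xhtml_to_dom, dom_to_xhtml, sample)
```

It can be run with `python -m unittest`. A byte-for-byte comparison is strict: whitespace, attribute order or entity handling may differ without any information being lost, so comparing a normalized form may turn out to be more practical.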
The best way to write these tests is to list the different kinds of information handled by our converters (paragraphs, headings, author, ...). We should distinguish two kinds of information: visible and hidden. Visible information can be seen in the final document (for example a paragraph or a link). Hidden information is not always printed, but is also important (for example the author name or the document title).
So we can write separate test sets for the two categories. For this, we just need small, basic documents in the different formats that cover well the different kinds of information we can have.
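The two categories can be encoded directly as test data. In this sketch, each case records its category, the piece of information, a small DocBook-like snippet and the path where that information lives; a case fails if the element is missing or its text changed after the round trip. The snippets omit the DocBook 5 namespace for brevity, and `round_trip` is a hypothetical placeholder for docbook -> DOM -> docbook.

```python
import xml.etree.ElementTree as ET

# Hypothetical test catalogue: (category, information, snippet, path to the information).
CASES = [
    ("visible", "paragraph", "<article><para>Some text</para></article>", "para"),
    ("visible", "emphasis",
     "<article><para><emphasis>Important</emphasis></para></article>", "para/emphasis"),
    ("hidden", "title", "<article><info><title>Report</title></info></article>", "info/title"),
    ("hidden", "author",
     "<article><info><author><personname>Jane Doe</personname></author></info></article>",
     "info/author/personname"),
]


def round_trip(snippet):
    """Placeholder for docbook -> DOM -> docbook; plug in the real converters here."""
    return ET.tostring(ET.fromstring(snippet), encoding="unicode")


def lost_information(cases, category, convert=round_trip):
    """Names of the pieces of information of `category` that do not survive `convert`."""
    lost = []
    for case_category, name, snippet, path in cases:
        if case_category != category:
            continue
        before = ET.fromstring(snippet).find(path)
        after = ET.fromstring(convert(snippet)).find(path)
        if after is None or after.text != before.text:
            lost.append(name)
    return lost
```

A converter that drops the `<info>` block would then fail only the hidden set, which makes the failure easy to locate.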
Schedule
Community Bonding Period (April 26 - May 24): Read documentation and code to assimilate the DOM tree. List the different information contained in an HTML or DocBook document to determine which information should be made accessible through the DOM tree. Start writing unit tests for each kind of information. The tests will work for both the HTML and DocBook converters.
May 24 - June 11 (18 days): Write an HTML->DOM converter that passes all the previously written tests.
June 11 - June 19 (8 days): Clean up the code of the HTML->DOM converter.
June 21 - June 25 (4 days): Exam period at my home university.
June 28 - July 12 (15 days): Write DocBook -> DOM -> DocBook converters that pass all the unit tests for the visible information.
Goal for the mid-term evaluation:
July 12 - July 20 (7 days): Write the remaining parts of the DocBook->DOM->DocBook converters so that they pass all the unit tests.
July 20 - July 28 (7 days): Clean up the code of the DocBook->DOM->DocBook converters.
July 28 - August 1 (4 days): Write tests for a DOM -> XSL-FO converter.
August 1 - August 11 (10 days): Write a DOM->XSL-FO converter that passes these tests.
August 11 - August 16 (5 days): Clean up the code of the DOM->XSL-FO converter.
If I do not run out of time, I would also like to look at SVG support: in particular a DOM -> SVG converter, and SVG support in the other converters (so that you can embed SVG figures in a DocBook document, for example).
For your information, I will still be studying at my university until June 25th. However, I have a very light schedule, since I did not take many courses: I have no classes on Thursday and Friday, and a single class on Monday morning. So I plan to work from Thursday to Monday (including weekends) so as not to fall behind schedule. I believe this is achievable, considering that I am currently working around 30-35 hours per week for a video game company on an iPhone project (this contract will end in early April).
And after the SoC?
I am a regular user of MoinMoin. I almost never spend a day without adding, editing or deleting pages on my personal wiki. It has become one of my best tools for managing my notes, and I am finally getting rid of the mass of paper I had on my desk. I also use it to write my reports.
I see the SoC as a good opportunity to earn money with an interesting summer job. But I also hope that I will enjoy hacking on MoinMoin, and eventually keep doing it afterwards, especially for DocBook support. I use this feature regularly, and I would be happy to get rid of the current problems with the DocBook formatter and the latest versions of MoinMoin. I hope to develop good know-how about the DOM tree and DocBook conversion so that I can keep providing efficient support for these features. It would be great if, at the end of my SoC, I knew enough to fix small bugs, like the Include macro one, in a short amount of time.
Link Farm
General Information
Check: http://hg.moinmo.in/moin/2.0-storage-dom-bblank, especially the converter2 package.
http://moinmo.in/MoinMoin2.0 See Dom Based Transformation
http://moinmo.in/MoinDev/WikiDom Nice schema
http://moinmo.in/WikiDomFormatter Too old?
http://moinmo.in/BastianBlank/TreeOutputFormatter Previous GSoC
Feature requests related to the project
http://moinmo.in/FeatureRequests/ImportDocBookWhishlist Could easily be done with a docbook -> DOM converter
Misc
http://en.wikipedia.org/wiki/XSL_Formatting_Objects Starting point for XSL-FO
http://www.docbook.org/tdg/en/html/part2.html DocBook Reference
http://en.wikipedia.org/wiki/DocBook Starting point for DocBook
http://www.w3schools.com/dom/default.asp DOM Tutorial
Application Comments
Reimar Bauer
Are you familiar with unit tests and test-driven development? As with documentation, your code must be covered by unit tests. I missed that in your development tools and in the schedule. At which point will you provide unit tests for review?
Thomas Waldmann
Some questions:
what's an important property the html->dom (together with the dom->html) converter should have? same question for docbook->dom and dom->docbook?
- how will you make this sure?
- how (in general) will you make sure correct operation of the converters? how exactly?
- what exactly do you mean with "attachments"? note that moin 2.0 does not have attachments like 1.9 did.
how would you implement the html->dom conversion? tools used?
Valentin Janiaut
@ReimarBauer: I am not very familiar with unit tests and test-driven development: this is quite new to me. Some friends told me about it, but I never really used it. However, it looks really interesting and quite easy to use, so I think it is a good idea to use it for this project. I am sure it will improve the quality of my code.
I also edited the schedule to follow a test-driven approach. My current approach (prototype, test, then clean up the code) does not seem so good now!
@Thomas Waldmann :
what's an important property the html->dom (together with the dom->html) converter should have? same question for docbook->dom and dom->docbook?
Theoretically, the conversion should be a bijection, which means that if we run HTML -> DOM -> HTML, the output should be the same as the input. Therefore the converters must not lose any visible information, such as a paragraph, a picture or a link. But they should also be able to convert the hidden information (what we can call metadata). For example, the author of a document should not be lost, even though it is not always visible.
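A bijection has two directions: format -> DOM -> format must give back the input, and DOM -> format -> DOM must give back the same tree. The sketch below checks both, comparing trees structurally rather than as strings. `xml.etree.ElementTree` again stands in for the moin2 DOM and the converter names are hypothetical; the sample document includes a `<meta name="author">` tag as an example of hidden information.

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Structural equality: tag, attributes, text, tail and children, recursively."""
    if (a.tag, a.attrib, a.text, a.tail) != (b.tag, b.attrib, b.text, b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


# Hypothetical converters; ElementTree stands in for the moin2 DOM.
def html_to_dom(text):
    return ET.fromstring(text)


def dom_to_html(tree):
    return ET.tostring(tree, encoding="unicode")


def check_bijection(documents, trees):
    """True if both HTML -> DOM -> HTML and DOM -> HTML -> DOM are lossless on the samples."""
    forward = all(dom_to_html(html_to_dom(doc)) == doc for doc in documents)
    backward = all(same_tree(html_to_dom(dom_to_html(tree)), tree) for tree in trees)
    return forward and backward


# Attributes are written in alphabetical order so the string comparison also
# holds on Python < 3.8, where ElementTree sorted attributes when serializing.
SAMPLE = ('<html><head><meta content="Jane Doe" name="author" /></head>'
          '<body><p>Hello</p></body></html>')
```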
- how will you make this sure?
I plan to list the different kinds of information that an HTML or a DocBook document can contain, and make sure that there is a correct node to handle each of them in the DOM. If not, I should add one (and document it). From this list, I can write tests for each kind of information (or group of information). These tests will check that the information is not lost after a circular conversion.
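The "is there a DOM node for it?" check can be partly automated: collect every element name used in a sample document and report those with no entry in the mapping. The mapping below is hypothetical (the right-hand names are placeholders, not the actual moin2 DOM vocabulary). It uses `html.parser`, the Python 3 name of the `HTMLParser` module.

```python
from html.parser import HTMLParser

# Hypothetical HTML -> DOM mapping; the right-hand names are placeholders,
# not the actual moin2 DOM vocabulary.
HTML_TO_DOM = {
    "p": "paragraph",
    "a": "link",
    "h1": "heading", "h2": "heading", "h3": "heading",
    "em": "emphasis",
    "strong": "strong",
    "img": "object",
}

# Structural tags assumed to be handled elsewhere (document node, metadata).
IGNORED = {"html", "head", "body", "title", "meta"}


class TagCollector(HTMLParser):
    """Record every element name that appears in a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />,
        # through HTMLParser's default handle_startendtag.
        self.tags.add(tag)


def unmapped_tags(html, mapping=HTML_TO_DOM, ignored=IGNORED):
    """Sorted list of element names that have no DOM counterpart yet."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return sorted(tag for tag in collector.tags if tag not in mapping and tag not in ignored)
```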
- how (in general) will you make sure correct operation of the converters? how exactly?
As I explained before, unit tests will be an excellent way to make sure that the converters work correctly.
- what exactly do you mean with "attachments"? note that moin 2.0 does not have attachments like 1.9 did.
I mean all the media such as images, videos and so on. I saw that in moin2 everything is now part of the DOM tree. But we still need to decide how to put these attachments into the DocBook document. There are different solutions: provide an archive with the attachments, embed the media using base64, or just have external links pointing to the resources. We have to choose the best one and implement it for the different converters.
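Two of the options above are easy to sketch with the standard library: embedding the media as a base64 `data:` URI, or shipping the DocBook file and its media together in a ZIP archive with relative `fileref` paths. The function names and the archive layout (`document.xml`, `media/`) are assumptions, and not every DocBook toolchain is guaranteed to accept `data:` URIs in `fileref`.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET


def data_uri(content, mime_type):
    """Option 1: embed the media in the document itself as a base64 data: URI."""
    return "data:%s;base64,%s" % (mime_type, base64.b64encode(content).decode("ascii"))


def image_object(fileref):
    """DocBook mediaobject pointing at an image (relative path, URL or data: URI)."""
    mediaobject = ET.Element("mediaobject")
    imageobject = ET.SubElement(mediaobject, "imageobject")
    ET.SubElement(imageobject, "imagedata", {"fileref": fileref})
    return mediaobject


def bundle_docbook(docbook_xml, media):
    """Option 2: ship the DocBook file and its media together in one ZIP archive.

    `media` maps file names to bytes; they are stored under media/, so the
    document should reference them as fileref="media/<name>".
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("document.xml", docbook_xml)
        for name, content in media.items():
            archive.writestr("media/" + name, content)
    return buffer.getvalue()
```

The third option (external links) needs no extra code: `image_object` simply receives the resource URL as its `fileref`.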
how would you implement the html->dom conversion? tools used?
First of all, I think it would be good to check whether the HTML page is valid, and to focus on valid pages at first, since it is hard to define good behavior when the code is not valid. We can later look at handling pages with some basic problems. We should also define which (X)HTML version we will support.
For the tools, I would like to use the HTMLParser module from the Python standard library to parse the HTML. An HTML document is relatively close to a tree, so I think it is possible to just walk the document like a tree to convert it into the DOM tree.
We also need to think about how to convert the different metadata (like keywords, description, ...) into the DOM tree. I am not sure yet how to do it correctly.
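With the standard library alone, the cheapest first check is XML well-formedness for XHTML input, which is weaker than validity against a DTD or schema. A minimal sketch; note that HTML-only constructs such as an unclosed `<br>` or the `&nbsp;` entity will be rejected by this check, since the XML parser does not load the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def check_xhtml(text):
    """Return (ok, message). Checks XML well-formedness only, not DTD/schema validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        line, column = error.position
        return False, "not well-formed at line %d, column %d" % (line, column)
    return True, "well-formed"
```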
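The tree-walking idea can be sketched on top of `html.parser.HTMLParser`: keep a stack of open elements, push on start tags, pop on end tags, and attach text to the current element. Here `xml.etree.ElementTree` elements stand in for the moin2 DOM; the `document` wrapper node and the list of void elements are assumptions, and a real converter would emit moin2 DOM nodes instead.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content or an end tag in HTML (partial list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilderParser(HTMLParser):
    """Walk the HTML event stream and rebuild it as an ElementTree tree."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # hypothetical wrapper node
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Boolean attributes come with value None; store them as "".
        element = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_startendtag(self, tag, attrs):
        # <tag ... />: add the element without opening it.
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Pop up to the matching open element, which also closes any
        # elements left unclosed inside it; unmatched end tags are ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(html):
    parser = TreeBuilderParser()
    parser.feed(html)
    parser.close()
    return parser.root
```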
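One possible starting point for the metadata question: collect `<title>` and `<meta name=... content=...>` into a dictionary, which the converter could then attach to the document node. This hypothetical sketch covers the extraction step only; how moin2 should store this metadata in the DOM is exactly the open question.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <title> and <meta name=... content=...> from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attributes:
            # Meta names are case-insensitive, so normalize the key.
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data


def extract_metadata(html):
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta
```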
Bastian Blank
Do you have knowledge about formal descriptions of such formats with XML Schema or RELAX NG?
Valentin Janiaut
@Bastian Blank: I do not have extensive knowledge of the formal definitions of these formats with XML Schema or RELAX NG. I know the DocBook format quite well, because I use this syntax and read its documentation regularly, but I have never tried to read or write its formal definition. I also know SVG a little, because I looked at it for my django-resume project a few months ago. In the end I decided to use LaTeX with the Django template engine, which gave a better result.
I also have some knowledge of RELAX NG, because I used it for a project in one of my previous courses. We had to define the syntax of a programming language using RELAX NG, especially its compact form. (It was quite a weird language based on XML; the aim was to show the relations between the different elements of a language, and to use something easier than Flex/Yacc to write its formal description.)
Do you think we should use one of the numerous formal descriptions when writing the different converters, to check the validity of our input/output?