MoinMoin always searches in comments, but I think this is bad.
Observation
It is very frustrating to search for something, get some matches, and then not be able to find your search word(s) on the page. Finally, I discovered (and then forgot, and rediscovered, about 10 times) that MoinMoin is searching through the invisible comments in the markup.
My belief: When users search, they care about finding things in the visible text. It is rare that anyone actually wants to find text that is in a comment. The same is true for macro names. The common case should be the default, and the rare case can be supported using an option of some sort.
I have 2 suggestions for a solution:
- By default, do not search in comments/macro calls (hidden text). Provide a search prefix (like regex: or title:) that allows searching within the hidden areas.
- A simpler possibility: Always match on all hidden text, but then divide the results into two groups: those that match without looking in hidden text, and those that required looking in the hidden text in order to be a match. Then show the obvious matches first, followed by a line stating something like "Matches generated by searching hidden text (comments and macros)".
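The second suggestion can be sketched roughly as follows. This is only an illustration, not MoinMoin code: it assumes that comments are lines starting with ## and that macro calls look like <<Name(args)>> (older markup used a different macro syntax), and the names visible_text and partition_hits are made up for this example.

```python
import re

# Assumed markup conventions (see the note above):
# a comment is a whole line starting with "##", a macro call is <<...>>.
COMMENT_LINE = re.compile(r"^##.*$", re.MULTILINE)
MACRO_CALL = re.compile(r"<<.*?>>")


def visible_text(markup):
    """Return the markup with comment lines and macro calls removed."""
    without_comments = COMMENT_LINE.sub("", markup)
    return MACRO_CALL.sub("", without_comments)


def partition_hits(pages, term):
    """Split matching pages into (visible_hits, hidden_only_hits).

    pages maps a page name to its raw markup. A page lands in the second
    list when the term occurs only inside comments or macro calls.
    """
    needle = term.lower()
    visible, hidden_only = [], []
    for name, markup in sorted(pages.items()):
        if needle not in markup.lower():
            continue  # not a hit at all
        if needle in visible_text(markup).lower():
            visible.append(name)
        else:
            hidden_only.append(name)
    return visible, hidden_only
```

The result page would then list the first group as usual and put the second group under a line such as "Matches generated by searching hidden text (comments and macros)".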
Scenario A
Here's my latest example. I would like to search for all bugs about the GUI editor that are not marked as fixed. So I enter t:gui t:bugs -MoinMoinBugFixed and do a text search. It gives me only subpages of bugs, but no actual bugs. So, I could conclude that they're all fixed, right? WRONG.
Since every bug is created using BugTemplate, they pretty much all contain the following markup:
## If you are a moin core developer, replace the category to Category* in these cases:
## Category MoinMoinNoBug - if this is not a bug.
## Category MoinMoinBugConfirmed - if you can confirm the bug on current code.
## Category MoinMoinBugFixed - after the bug is fixed in current code.
This means that every bug will be a hit for these bug states, no matter what its actual state is.
Scenario B
Searching in macro names: try searching for information or discussion about certain macros. You will get tens or hundreds of hits for some of them, because they are used on many pages.
Task
Trying to get good search matches that are truly relevant to what I'm looking for.
Users
Me.
Discussion
The problem is not that search specifically includes comments, macros, etc.; it just processes the full page source text without knowing about the different markups and their meaning. As moin supports more than just wiki markup, I guess there is no easy solution to this, except using other search terms.
It's just an idea I haven't thought much about, but maybe after each display request for a page, we could save the HTML that Moin produces for the page to the page's directory, and maybe also strip all the unneeded HTML markup, so that we have only the pure content of the page as it is seen by the user. This even means that the results of macro calls, e.g. of TaskPlanner, would become searchable! This would be great! The same goes for parser calls and so on. Maybe there could also be an option in Xapian search for where to search: in the revisions dir of a page ("raw search") or in the content dir of a page ("content search"). -- OliverSiemoneit 2007-01-13 19:25:07
If the cache is enabled, the HTML code is already saved, so only searching in the cache needs to be discussed. -- ReimarBauer 2007-01-13 19:52:38
However, what happens if a page has never been displayed/cached? This is especially true for underlay pages on new installations. Or not? Maybe we could fix that by always providing a cached version of the page in the distribution. And what about all that annoying HTML markup? Is there an easy way to strip the header and clean e.g. the page body of divs, JavaScript and so on? These things could cause the same problem as the wiki markup above, I think. In that case the only real step forward would be that macro output is searched as well, while content and markup would still not be separated. What will happen if we also have more metadata and a semantic Moin? How do we deal with these things? They should not be stripped away. They are not content, but meta-content. Or could this be left to the "raw search" function? -- OliverSiemoneit 2007-01-13 21:52:25
I just saw on the Xapian website that Xapian also allows indexing of HTML files. Is this indexing intelligent enough to omit HTML markup and JavaScript, i.e. does Xapian only index the real output content of a page, e.g. like Google does? Then it would be an easy task to search the cache as suggested by Reimar, markup and content would be separated, and the output of macros would be included. -- OliverSiemoneit 2007-01-13 21:52:25
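For illustration, here is a minimal sketch of what "index only the real output content" could mean, using only the Python standard library. It says nothing about how Xapian's own HTML indexing behaves; the class name VisibleTextExtractor and the function html_to_text are made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never arrive here; they go to handle_comment,
        # which does nothing by default, so they are dropped.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the user-visible text of an HTML fragment."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining chunks with spaces is a simplification: a word split by inline tags would be broken in two. A real indexer would also have to decide what to do with metadata, which is the open question raised above.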
I hadn't thought about the different parser aspect... I would like to suggest an alternate solution... MoinMoin could search the markup as it does now, but then for every page that was a hit (should be a small subset), check the cached rendering or create it if it doesn't exist. This shouldn't slow down the search too much. -- SteveDavison 2007-01-19 20:51:06
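A rough sketch of this two-stage idea, under stated assumptions: pages is a plain dict of page name to raw markup, rendered_cache is a dict standing in for the page cache, and render is whatever function turns markup into the text a user sees. None of these names are MoinMoin APIs.

```python
def two_stage_search(term, pages, rendered_cache, render):
    """Search raw markup first, then confirm each hit in the rendered text.

    Stage 1 is the cheap search over markup, as done today. Stage 2 runs
    only for the (hopefully small) set of stage-1 hits and fills the cache
    on demand when a page has never been rendered.
    """
    needle = term.lower()
    confirmed = []
    for name, markup in sorted(pages.items()):
        if needle not in markup.lower():
            continue  # stage 1: no match in the markup at all
        if name not in rendered_cache:
            rendered_cache[name] = render(markup)  # render on demand
        if needle in rendered_cache[name].lower():
            confirmed.append(name)  # stage 2: match is actually visible
    return confirmed


def toy_render(markup):
    """Stand-in renderer for the example: drops lines starting with '##'."""
    lines = [line for line in markup.splitlines() if not line.startswith("##")]
    return "\n".join(lines)
```

Because stage 2 only touches pages that already matched, the extra rendering cost is bounded by the number of raw hits rather than by the size of the wiki.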