Request: Search Only in Visible Text
Improve quality of searches by excluding comments, macros, etc.
The Problem
Certain searches can return WAY too many results, so many that you can't find what you're looking for. Often this is not because your search terms are bad, but because the search function matches in the raw markup.
A typical symptom: you run a search, click one of the matches to see what it says, and can't find your search word(s) anywhere on the page, because they occur only in the markup and never make it into the rendered HTML.
I could list a bunch of examples, but I hope the issue is obvious. For more info, see UsabilityObservation/SearchingInsideComments.
My assertion is that when people do a search, they want matches on what they'll see when they view the page normally, not what they would see when editing the markup. Only on rare occasions does anyone need to search within comments, directives (acl), and macro names. So it would still be nice to have the ability to search raw markup, but it should not be the norm.
Possible Solutions
Search cached page output
One approach: after the rendered output is cached, strip all the HTML markup from it and save the resulting plain-text copy for searching. (Alternatively, the formatter could be altered to generate the HTML and the plain-text copy at the same time.)
Good:
- Searches would be no slower than they are now.
- Search algorithm would need very few changes.
- Correctly handles any parser, automatically.
Bad:
- Could this slow page serving?
- Would almost double page cache storage space.
Different:
- Would search macro output and inlined attachment contents
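To make the idea concrete, here is a rough sketch of the "strip the HTML and search what is left" step, using only Python's standard library. It is not taken from the wiki's code: the names `VisibleTextExtractor`, `visible_text`, `search`, and the `include_raw` flag are all hypothetical, and it assumes that both the raw markup and the rendered HTML of each page are available to the search.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes only; tags and HTML comments are dropped."""

    # Contents of these elements are never shown to the reader.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join with spaces so words in adjacent elements do not run
        # together, then collapse all runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def visible_text(html):
    """Return the text a reader would see in the rendered page."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def search(pages, term, include_raw=False):
    """Return sorted names of pages containing `term` (case-insensitive).

    `pages` maps a page name to a (raw_markup, rendered_html) pair.
    This layout is an assumption made for the sketch. By default only
    visible text is searched; include_raw=True searches the markup.
    """
    needle = term.lower()
    hits = []
    for name, (raw, html) in pages.items():
        haystack = raw if include_raw else visible_text(html)
        if needle in haystack.lower():
            hits.append(name)
    return sorted(hits)
```

With a page whose markup contains a comment but whose rendered HTML does not, `search(pages, "word")` would skip the page, while `search(pages, "word", include_raw=True)` would still find it. In a real implementation the stripped text would presumably be computed once and stored next to the cache entry rather than recomputed on every search.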
Discussion
I don't like losing the feature of searching non-displayed code. It is useful, for example, to search parts of ACLs or to search comments in the code of pages. A good compromise would be for the search to return only results from visible text by default, and to return results from invisible text only when asked to. -- DavidLinke 2007-03-03 11:31:14
- Currently there is no such thing as "cached output" to search through. What we have is code compiled to bytecode that renders the page. It would need to be executed to produce the rendered HTML output.