UrlGrab
Description
This macro gets the content from a URL and inserts it into the wiki page.
For pages that are password protected (and use a session cookie), the macro can be told to log in to the remote site. Attention: although it is possible to hide the login information from the wiki source text, absolute security is not guaranteed. Use at your own risk.
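The mechanism behind the login support is the usual cookie dance: a first request to the login URL makes the server set a session cookie, which is then sent along when the protected page is fetched. Here is a minimal sketch of that idea in Python, with placeholder URLs and form fields; it illustrates the principle and is not the macro's own code:

    import urllib.parse
    import urllib.request
    from http.cookiejar import CookieJar

    # One opener shared by both requests, so the session cookie set by
    # the login response is replayed on the second request.
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # 1. Log in: POST the form fields; the server answers with a session cookie.
    form = urllib.parse.urlencode({"login": "user", "password": "secret"})
    opener.open("http://my.site.com/login", form.encode("ascii"))

    # 2. Grab the protected page; the cookie is sent automatically.
    html = opener.open("http://my.site.com/my/page").read().decode("utf-8")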
Usage for trivial pages:
[[UrlGrab(URL="http://my.site.com/my/page...")]]
There are known issues with this macro:
- It does not work with MoinMoin 1.6.x and above.
- In its trivial usage, the wiki page receives the whole remote page, including <head>, CSS and Javascript:
  - unless the remote page is trivial, the results will mostly look surprising;
  - there is a way to keep only the wanted parts of the remote page, using the Filter argument (see below, and the example after this list), but it requires quite a bit of reverse-engineering of the remote page and is sensitive to changes in its structure.
- A better approach would be to use an IFRAME. I can look into that if there are enough requests.
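To illustrate the Filter approach just mentioned, a call along these lines keeps only what sits between the <body> tags; the regex here is a made-up example and has to be adapted to the actual structure of the remote page:

    [[UrlGrab(URL="http://my.site.com/my/page", Filter="<body[^>]*>(.*)</body>")]]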
Download & Release Notes
Download | Release Version | Moin Version | Release Notes
         | 0.1.1           | 1.3          |
Usage
Version: v0.1.1
Usage:

    [[UrlGrab]]
    [[UrlGrab(KEYWORD=VALUE [, ...])]]

If no arguments are given, the usage is inserted in the HTML result.

Possible keywords:

- Help = 0, 1, 2
  Displays 1: short or 2: full help in the page. Default: 0 (i.e. no help).
- Url = 'STRING' or '$varname'
  URL of the page to grab. Mandatory.
- LoginUrl = 'STRING' or '$varname'
  LoginForm = {'name': 'value', ...}
  URL of the page to visit to perform a login, and the form fields.
  Default: empty (i.e. do not perform a login).
- CookieCopy = 'STRING' or '$varname'
  Creates a new cookie by duplicating an existing one.
  Form: ExistingCookieName NewName Value
  Example: Bugzilla_login COLUMNLIST priority assigned_to status_whiteboard short_desc
- Debug = 0 or 1
  Dumps the fetched and filtered content to display the HTML. Useful for tuning filters and for reverse-engineering login forms. Default: 0 (i.e. no debug).
- Encoding = 'STRING'
  Specifies the name of the encoding used by the source HTML.
- Separator = 'HTML_STRING' or '$varname'
  HTML text inserted between matches, if the filter includes lists or tuples. Default: empty (i.e. no separator).
- Filter = 'FILTER_STRING', or list of Filters, or tuple of Filters, or '$varname'
  A filter string has one of these forms:

      REGEX        : if no match, use input and stop processing (default)
      *S*REGEX     : if no match, use input and stop processing (default)
      *C*REGEX     : if no match, use input and continue processing
      *s*REGEX     : if no match, just stop processing
      *c*REGEX     : if no match, just continue processing
      *TEXT*REGEX  : if no match, fail with TEXT as error message

  The prefix can also be e.g. *=s* (etc.), in which case a case-sensitive match is done. A regex may contain expressions between ()'s, in which case the result is the concatenation of the matches between ()'s.
  Filters can be chained as follows:
  - Tuple of filters, i.e. filters between ()'s: the filters are applied in sequence, until one fails and requires to stop; a filter in the sequence can be a string, a list or a tuple.
  - List of filters, i.e. filters between []'s: the filters are applied in parallel; the results are concatenated; a filter in the list can be a string, a list or a tuple.
  The Filter parameter is mandatory.

Keywords can also be given in upper or lower case, or abbreviated. Example: SearchText, searchtext, SEARCHTEXT, st, ST, Pages, p, etc.

Some values may be a string beginning with '$', in which case the rest of the value is a variable name whose value is defined in the wikiconfig.py file as follows:

    class Macro_UrlGrab:
        my_variable1 = "my string value"
        my_variable2 = {"my": "dict", "value": ""}

This allows confidential values (such as credentials) to be kept in wikiconfig.py, known only to the wiki site admin, and used in wiki pages without being revealed.

----

Sample 1: grab a Bugzilla page

Wiki page:

    ...
    [[UrlGrab(LoginUrl="$bz_login_url", Filter="$bz_filter", URL="http://my.bugzilla.site/cgi-bin/bugzilla/buglist.cgi?bug_status=__open__")]]
    ...

wikiconfig.py:

    ...
    class Macro_UrlGrab:
        # Bugzilla login URL to a generic account:
        bz_login_url = "http://my.bugzilla.site/cgi-bin/bugzilla/query.cgi?GoAheadAndLogIn=1&Bugzilla_login=lucky.starr@asimov.com&Bugzilla_password=SpaceRanger"
        # Chained filters to keep only the buglist table:
        bz_filter = (
            # keep the bugs table:
            '(<table class="bz_buglist".*)<div id="footer">',
            # remove the footer box:
            '(.*)<table>.*action="long_list.cgi">.*</table>'
        )
    ...
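To make the chaining rules above concrete, here is one possible reading of the filter semantics, sketched in Python. This is an interpretation of the description above, not the macro's actual implementation:

    import re

    def run_filter(flt, text):
        """Apply a filter (string, list or tuple) to `text`.
        Returns (result, stop): `stop` aborts an enclosing sequence."""
        if isinstance(flt, tuple):
            # Tuple: apply the filters in sequence until one says stop.
            for sub in flt:
                text, stop = run_filter(sub, text)
                if stop:
                    break
            return text, False
        if isinstance(flt, list):
            # List: apply the filters in parallel on the same input and
            # concatenate their results.
            return "".join(run_filter(sub, text)[0] for sub in flt), False

        # Plain filter string: peel off the optional *...* prefix
        # (default mode 'S'; a leading '=' makes the match case-sensitive).
        sensitive, mode, regex = False, "S", flt
        m = re.match(r"\*(=?)([^*]+)\*(.*)", flt, re.DOTALL)
        if m:
            sensitive, mode, regex = bool(m.group(1)), m.group(2), m.group(3)
        flags = re.DOTALL | (0 if sensitive else re.IGNORECASE)

        hit = re.search(regex, text, flags)
        if hit:
            # Concatenate the ()-groups if any, else keep the whole match.
            groups = [g for g in hit.groups() if g is not None]
            return "".join(groups) if groups else hit.group(0), False
        if mode == "S":            # no match: use input and stop
            return text, True
        if mode == "C":            # no match: use input and continue
            return text, False
        if mode == "s":            # no match: just stop
            return "", True
        if mode == "c":            # no match: just continue
            return "", False
        raise ValueError(mode)     # *TEXT*REGEX: fail with TEXT as message

    # Example: keep only the <body> of a page.
    page = '<html><head>junk</head><body>IP: 1.2.3.4</body></html>'
    print(run_filter('<body>(.*)</body>', page)[0])   # -> IP: 1.2.3.4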
Example
[[UrlGrab(URL="http://checkip.dyndns.org/")]]
The above line will embed this site's IP address in the following line:
UrlGrab(URL="http://checkip.dyndns.org/")
However, the macro is not installed here, so it does not work at this time.
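If only the address itself is wanted, a Filter can strip away the surrounding HTML. The regex below assumes the remote page's body reads "Current IP Address: n.n.n.n", which is an assumption about that site and may change over time:

    [[UrlGrab(URL="http://checkip.dyndns.org/", Filter="Current IP Address: ([0-9.]+)")]]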
Copyright
Pascal Bauermeister <pascal DOT bauermeister AT gmail DOT com>
License
GPL
Bugs
Discussion
This page could use some more examples.
I don't get the usage of this macro.
It's useful for aggregating data from other sites into a wiki page. For instance, I use it to embed the results of a bugzilla query into a wiki page for a project. This provides your wiki with dynamic data, which you would otherwise have to update manually every time it changed.