UrlGrab
Description
This macro fetches the content of a URL and inserts it into the wiki page.
For pages that are password protected (and use a session cookie), the macro can be told to log in to the remote site. Attention: although it is possible to hide the login information from the wiki source text, absolute security is not guaranteed. Use at your own risk.
Usage for trivial pages:
<<UrlGrab(URL="http://my.site.com/my/page...")>>
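Conceptually, a trivial grab amounts to fetching the remote page and splicing its content into the rendered output, roughly like this (an illustrative Python sketch, not the macro's actual code; the URL is a placeholder):

import urllib.request

# Fetch the raw HTML of the remote page; the macro inserts this
# content into the wiki page at the position of the macro call.
html = urllib.request.urlopen("http://my.site.com/my/page").read()
print(html.decode("utf-8", "replace")[:200])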
There are known issues with this macro:
- Does not work with MoinMoin 1.6.x and above.
- In its trivial usage, the wiki page receives the whole remote page, including <head>, CSS, and JavaScript:
  - Unless the remote page is trivial, the results will look mostly surprising.
  - There is a way to keep only the wanted parts of the remote page, using the Filter argument (see below), but it requires a fair amount of reverse-engineering of the remote page and is sensitive to changes in its structure.
A better approach would be to use an IFRAME. I can look into that if there are enough requests.
Download & Release Notes
Download | Release Version | Moin Version | Release Notes
 | 0.1.1 | 1.3 |
Usage
Version: v0.1.1
Usage:
[[ UrlGrab ]]
[[ UrlGrab (KEYWORD=VALUE [, ...] ) ]]
If no arguments are given, the usage is inserted in the HTML result.
Possible keywords:
Help = 0, 1, 2
Displays 1:short or 2:full help in the page.
Default: 0 (i.e. no help).
Url = 'STRING' or '$varname'
URL of page to grab.
Mandatory.
LoginUrl = 'STRING' or '$varname'
LoginForm = {'name': 'value', ...}
URL of page to visit to perform a login, and the form fields.
Default: empty (i.e. do not perform login).
CookieCopy = 'STRING' or '$varname'
Allows creating a new cookie by duplicating an existing one.
Form:
ExistingCookieName NewName Value
Example:
Bugzilla_login COLUMNLIST priority assigned_to status_whiteboard short_desc
Debug = 0 or 1
Dumps the fetched and filtered content as raw HTML. Useful
for tuning filters and for reverse-engineering login forms.
Default: 0 (i.e. no debug).
Encoding = 'STRING'
Specifies the name of the encoding used by the source HTML.
Separator = 'HTML_STRING' or '$varname'
HTML text inserted between matches, if the filter includes lists or tuples.
Default: empty (i.e. no separator).
Filter = 'FILTER_STRING', or list of Filter, or tuple of Filter
or '$varname'
A filter string has one of these forms:
REGEX : if no match, use input and stop processing (default)
*S*REGEX : if no match, use input and stop processing (default)
*C*REGEX : if no match, use input and continue processing
*s*REGEX : if no match, just stop processing
*c*REGEX : if no match, just continue processing
*TEXT*REGEX : if no match, fail with TEXT as error message
The prefix may also include '=' (e.g. *=s*), in which case the
match is case-sensitive.
A regex may contain expressions between ()'s, in which case the
result will be the concatenation of the matches between ()'s.
It is possible to chain filters as follows (see the sketch below):
- Tuple of filters, i.e. filters between ()'s:
  the filters are applied in sequence, until one fails and
  requires stopping; a filter in the tuple can be a string, a
  list or a tuple.
- List of filters, i.e. filters between []'s:
  the filters are applied in parallel and their results are
  concatenated; a filter in the list can be a string, a list
  or a tuple.
The filter parameter is mandatory.
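To make the chaining rules concrete, here is a rough Python sketch of how such a filter chain could be evaluated. This is an illustration written for this page, not the macro's actual code; the *TEXT* error form is omitted and the case-sensitivity prefix is only noted, not implemented:

import re

def apply_filter(flt, text, separator=""):
    # Returns (result, keep_going); keep_going turns False once a
    # non-matching filter asks to stop processing.
    if isinstance(flt, tuple):
        # Tuple: apply the filters in sequence, feeding each result
        # into the next filter.
        for f in flt:
            text, keep_going = apply_filter(f, text, separator)
            if not keep_going:
                return text, False
        return text, True
    if isinstance(flt, list):
        # List: apply the filters in parallel on the same input and
        # concatenate the results.
        results = [apply_filter(f, text, separator)[0] for f in flt]
        return separator.join(results), True
    # Plain string: an optional *X* prefix, then a regular expression.
    mode, regex = "S", flt
    m = re.match(r"\*(=?)([SsCc])\*", flt)
    if m:
        mode, regex = m.group(2), flt[m.end():]
        # "=" in the prefix would request a case-sensitive match
        # (not implemented in this sketch).
    match = re.search(regex, text, re.DOTALL)
    if match:
        # Concatenate the ()-groups if any, else keep the whole match.
        groups = match.groups()
        return ("".join(g or "" for g in groups) if groups
                else match.group(0)), True
    # No match: the prefix decides what happens.
    if mode == "S":
        return text, False   # keep input and stop (the default)
    if mode == "C":
        return text, True    # keep input and continue
    if mode == "s":
        return "", False     # drop the result and stop
    return "", True          # "c": drop the result and continue

# Example: keep the <title> element, then strip its tags.
html = "<html><head><title>Hello</title></head><body>Hi</body></html>"
result, _ = apply_filter(("(<title>.*</title>)", "<title>(.*)</title>"), html)
print(result)   # -> Hello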
Keywords can also be given in upper or lower case, or abbreviated.
Example: Filter, filter, FILTER, f, etc.
Some values may be a string beginning with '$', in which case the rest
of the value is a variable name, whose value is defined in the
wikiconfig.py file as follows:

class Macro_UrlGrab:
    my_variable1 = "my string value"
    my_variable2 = {"my": "dict", "value": ""}

This allows confidential values (like credentials) to be defined in
wikiconfig.py, known only to the wiki site admin, and used in wiki
pages without being revealed.
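A wiki page can then reference such a value by name, without the value itself ever appearing in the page source, e.g.:

<<UrlGrab(Url="$my_variable1")>>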
----
Sample 1: Grab a Bugzilla page
Wiki page:
...
[[UrlGrab(LoginUrl="$bz_login_url", Filter="$bz_filter", URL="http://my.bugzilla.site/cgi-bin/bugzilla/buglist.cgi?bug_status=__open__")]]
...
wikiconfig.py:
...
class Macro_UrlGrab:
    # Bugzilla login URL to a generic account:
    bz_login_url = "http://my.bugzilla.site/cgi-bin/bugzilla/query.cgi?GoAheadAndLogIn=1&Bugzilla_login=lucky.starr@asimov.com&Bugzilla_password=SpaceRanger"
    # Chained filters to keep only the buglist table:
    bz_filter = (
        # keep the bugs table:
        '(<table class="bz_buglist".*)<div id="footer">',
        # remove the footer box:
        '(.*)<table>.*action="long_list.cgi">.*</table>',
    )
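For reference, the login behaviour of this sample boils down to something like the following (a sketch using Python's standard library, written for this page; it is not how the macro is implemented internally):

import urllib.request
import http.cookiejar

# A cookie-aware opener: visiting the login URL stores the session
# cookie, which is then sent along with the second request.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Step 1: log in (here via query parameters, as in bz_login_url above).
login_url = ("http://my.bugzilla.site/cgi-bin/bugzilla/query.cgi"
             "?GoAheadAndLogIn=1&Bugzilla_login=lucky.starr@asimov.com"
             "&Bugzilla_password=SpaceRanger")
opener.open(login_url)

# Step 2: fetch the protected page; the cookie is sent automatically.
buglist_url = ("http://my.bugzilla.site/cgi-bin/bugzilla/buglist.cgi"
               "?bug_status=__open__")
html = opener.open(buglist_url).read()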
Example
[[UrlGrab(URL="http://checkip.dyndns.org/")]]
The above line would embed this site's IP address in the following line:
UrlGrab(URL="http://checkip.dyndns.org/")
However, the macro is not installed on this wiki, so the call above is rendered as plain text instead of being executed.
Copyright
Pascal Bauermeister <pascal DOT bauermeister AT gmail DOT com>
License
GPL
Bugs
Discussion
This page could use some more examples.
I don't get the usage of this macro.
It's useful for aggregating data from other sites into a wiki page. For instance, I use it to embed the results of a bugzilla query into a wiki page for a project. This provides your wiki with dynamic data, which you would otherwise have to update manually every time it changed.
