UrlGrab
Description
This macro fetches the content of a URL and inserts it into the wiki page.
For pages that are password protected (and use a session cookie), the macro can be told to log in to the remote site. Attention: although it is possible to hide the login information from the wiki source text, absolute security is not guaranteed. Use at your own risk.
Usage for trivial pages:
<<UrlGrab(URL="http://my.site.com/my/page...")>>
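Conceptually, a trivial grab amounts to fetching the remote page and splicing its content into the rendered output, roughly like this (an illustrative Python sketch, not the macro's actual code; the URL is a placeholder):

import urllib.request

# Fetch the raw HTML of the remote page; the macro inserts this
# content into the wiki page at the position of the macro call.
html = urllib.request.urlopen("http://my.site.com/my/page").read()
print(html.decode("utf-8", "replace")[:200])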
There are known issues with this macro:
- Does not work with MoinMoin 1.6.x and above.
- In its trivial usage, the wiki page receives the whole remote page, including <head>, CSS, and JavaScript:
  - Unless the remote page is trivial, the results will look mostly surprising.
  - There is a way to keep only the wanted parts of the remote page, using the Filter argument (see below), but it requires a fair amount of reverse-engineering of the remote page and is sensitive to changes in its structure.
A better approach would be to use an IFRAME. I can look into that if there are enough requests.
Download & Release Notes
Download | Release Version | Moin Version | Release Notes
 | 0.1.1 | 1.3 |
Usage
Version: v0.1.1
Usage:
[[ UrlGrab ]]
[[ UrlGrab (KEYWORD=VALUE [, ...] ) ]]
If no arguments are given, the usage is inserted in the HTML result.
Possible keywords:
Help = 0, 1, 2
Displays 1:short or 2:full help in the page.
Default: 0 (i.e. no help).
Url = 'STRING' or '$varname'
URL of page to grab.
Mandatory.
LoginUrl = 'STRING' or '$varname'
LoginForm = {'name': 'value', ...}
URL of page to visit to perform a login, and the form fields.
Default: empty (i.e. do not perform login).
CookieCopy = 'STRING' or '$varname'
Allows creating a new cookie by duplicating an existing one.
Form:
ExistingCookieName NewName Value
Example:
Bugzilla_login COLUMNLIST priority assigned_to status_whiteboard short_desc
Debug = 0 or 1
Dumps the fetched and filtered content as raw HTML. Useful
for tuning filters and for reverse-engineering login forms.
Default: 0 (i.e. no debug).
Encoding = 'STRING'
Specifies the name of the encoding used by the source HTML.
Separator = 'HTML_STRING' or '$varname'
HTML text inserted between matches, if the filter includes lists or tuples.
Default: empty (i.e. no separator).
Filter = 'FILTER_STRING', or list of Filter, or tuple of Filter
or '$varname'
A filter string has one of these forms:
REGEX : if no match, use input and stop processing (default)
*S*REGEX : if no match, use input and stop processing (default)
*C*REGEX : if no match, use input and continue processing
*s*REGEX : if no match, just stop processing
*c*REGEX : if no match, just continue processing
*TEXT*REGEX : if no match, fail with TEXT as error message
The prefix may also include '=' (e.g. *=s*), in which case the
match is case-sensitive.
A regex may contain expressions between ()'s, in which case the
result will be the concatenation of the matches between ()'s.
It is possible to chain filters as follows (see the sketch below):
- Tuple of filters, i.e. filters between ()'s:
  the filters are applied in sequence, until one fails and
  requires stopping; a filter in the tuple can be a string, a
  list or a tuple.
- List of filters, i.e. filters between []'s:
  the filters are applied in parallel and their results are
  concatenated; a filter in the list can be a string, a list
  or a tuple.
The filter parameter is mandatory.
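To make the chaining rules concrete, here is a rough Python sketch of how such a filter chain could be evaluated. This is an illustration written for this page, not the macro's actual code; the *TEXT* error form is omitted and the case-sensitivity prefix is only noted, not implemented:

import re

def apply_filter(flt, text, separator=""):
    # Returns (result, keep_going); keep_going turns False once a
    # non-matching filter asks to stop processing.
    if isinstance(flt, tuple):
        # Tuple: apply the filters in sequence, feeding each result
        # into the next filter.
        for f in flt:
            text, keep_going = apply_filter(f, text, separator)
            if not keep_going:
                return text, False
        return text, True
    if isinstance(flt, list):
        # List: apply the filters in parallel on the same input and
        # concatenate the results.
        results = [apply_filter(f, text, separator)[0] for f in flt]
        return separator.join(results), True
    # Plain string: an optional *X* prefix, then a regular expression.
    mode, regex = "S", flt
    m = re.match(r"\*(=?)([SsCc])\*", flt)
    if m:
        mode, regex = m.group(2), flt[m.end():]
        # "=" in the prefix would request a case-sensitive match
        # (not implemented in this sketch).
    match = re.search(regex, text, re.DOTALL)
    if match:
        # Concatenate the ()-groups if any, else keep the whole match.
        groups = match.groups()
        return ("".join(g or "" for g in groups) if groups
                else match.group(0)), True
    # No match: the prefix decides what happens.
    if mode == "S":
        return text, False   # keep input and stop (the default)
    if mode == "C":
        return text, True    # keep input and continue
    if mode == "s":
        return "", False     # drop the result and stop
    return "", True          # "c": drop the result and continue

# Example: keep the <title> element, then strip its tags.
html = "<html><head><title>Hello</title></head><body>Hi</body></html>"
result, _ = apply_filter(("(<title>.*</title>)", "<title>(.*)</title>"), html)
print(result)   # -> Hello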
Keywords can also be given in upper or lower case, or abbreviated.
Example: Filter, filter, FILTER, f, etc.
Some values may be a string beginning with '$', in which case the rest
of the value is a variable name, whose value is defined in the
wikiconfig.py file as follows:

class Macro_UrlGrab:
    my_variable1 = "my string value"
    my_variable2 = {"my": "dict", "value": ""}

This allows confidential values (like credentials) to be defined in
wikiconfig.py, known only to the wiki site admin, and used in wiki
pages without being revealed.
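A wiki page can then reference such a value by name, without the value itself ever appearing in the page source, e.g.:

<<UrlGrab(Url="$my_variable1")>>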
----
Sample 1: Grab a Bugzilla page
Wiki page:
...
[[UrlGrab(LoginUrl="$bz_login_url", Filter="$bz_filter", URL="http://my.bugzilla.site/cgi-bin/bugzilla/buglist.cgi?bug_status=__open__")]]
...
wikiconfig.py:
...
class Macro_UrlGrab:
    # Bugzilla login URL to a generic account:
    bz_login_url = "http://my.bugzilla.site/cgi-bin/bugzilla/query.cgi?GoAheadAndLogIn=1&Bugzilla_login=lucky.starr@asimov.com&Bugzilla_password=SpaceRanger"
    # Chained filters to keep only the buglist table:
    bz_filter = (
        # keep the bugs table:
        '(<table class="bz_buglist".*)<div id="footer">',
        # remove the footer box:
        '(.*)<table>.*action="long_list.cgi">.*</table>',
    )
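For reference, the login behaviour of this sample boils down to something like the following (a sketch using Python's standard library, written for this page; it is not how the macro is implemented internally):

import urllib.request
import http.cookiejar

# A cookie-aware opener: visiting the login URL stores the session
# cookie, which is then sent along with the second request.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Step 1: log in (here via query parameters, as in bz_login_url above).
login_url = ("http://my.bugzilla.site/cgi-bin/bugzilla/query.cgi"
             "?GoAheadAndLogIn=1&Bugzilla_login=lucky.starr@asimov.com"
             "&Bugzilla_password=SpaceRanger")
opener.open(login_url)

# Step 2: fetch the protected page; the cookie is sent automatically.
buglist_url = ("http://my.bugzilla.site/cgi-bin/bugzilla/buglist.cgi"
               "?bug_status=__open__")
html = opener.open(buglist_url).read()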
Example
[[UrlGrab(URL="http://checkip.dyndns.org/")]]
The above line would embed this site's IP address in the following line:
UrlGrab(URL="http://checkip.dyndns.org/")
However, the macro is not installed on this wiki, so the call above is rendered as plain text instead of being executed.
Copyright
Pascal Bauermeister <pascal DOT bauermeister AT gmail DOT com>
License
GPL
Bugs
Discussion
This page could use some more examples.
I don't get the usage of this macro.
It's useful for aggregating data from other sites into a wiki page. For instance, I use it to embed the results of a bugzilla query into a wiki page for a project. This provides your wiki with dynamic data, which you would otherwise have to update manually every time it changed.
