Attachment 'spam.py'

"""
Extract links from spam and return ready-to-paste regular expressions.
"""

import re
import sys
from urllib.parse import urlparse

urlPattern = re.compile(r'\bhttps?://[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]+',
                        re.IGNORECASE)


def extractPatterns(text):
    # A dict is used as an ordered set: each network location is kept once
    patterns = {}
    for link in urlPattern.findall(text):
        # Antispam cares only about the network location
        netloc = urlparse(link)[1]
        # Ignore a leading www subdomain
        if netloc.startswith('www.'):
            netloc = netloc[len('www.'):]
        # Escape dots so they match literally in a regular expression
        netloc = netloc.replace('.', r'\.')
        patterns[netloc] = None
    return list(patterns.keys())


def run():
    # The file to scan is given as the first command-line argument
    with open(sys.argv[1]) as f:
        text = f.read()
    patterns = extractPatterns(text)
    print('\n'.join(patterns))


if __name__ == '__main__':
    run()
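
As a rough illustration of what the script produces, the sketch below repeats the extraction logic in a self-contained form and runs it on a made-up spam sample. The sample text, its domains, the snake_case names and the final step of joining the patterns into one `blacklist` expression are assumptions added for illustration; they are not part of spam.py.

```python
import re
from urllib.parse import urlparse

# Same URL pattern as in spam.py
url_pattern = re.compile(r'\bhttps?://[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]+',
                         re.IGNORECASE)


def extract_patterns(text):
    """Return escaped network locations found in text, in first-seen order."""
    patterns = {}
    for link in url_pattern.findall(text):
        netloc = urlparse(link).netloc
        # Drop a leading www subdomain only
        if netloc.startswith('www.'):
            netloc = netloc[len('www.'):]
        # Escape dots so they match literally
        patterns[netloc.replace('.', r'\.')] = None
    return list(patterns)


# Hypothetical spam sample; the domains are made up for illustration.
sample = """
Buy now at http://www.cheap-pills.example/offer?id=1
or visit http://cheap-pills.example/other and
https://casino.example.org/win today!
"""

patterns = extract_patterns(sample)
print('\n'.join(patterns))

# One possible use (an assumption): join the patterns into a single expression.
blacklist = re.compile('|'.join(patterns))
print(bool(blacklist.search('see http://casino.example.org/x')))
```

Both links to the first domain collapse into a single pattern, since only the network location is kept and the leading `www.` is dropped.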

Attached Files

  • [get | view] (2005-07-30 19:32:09, 0.7 KB) [[attachment:spam.py]]