Description

For a wiki with a significant number of users, saving a page is slow due to an inefficient implementation of getSubscribers. This is similar to the problem described in MoinMoinBugs/GetSubscribersPerformanceProblem, but it doesn't address the other cause of slowness in the getSubscribers function.

Steps to reproduce

  1. Have a few thousand user accounts
  2. truss/strace the web process
  3. Save a change to a page, and notice that every user file is scanned on every page save.
  4. On this page, go to Info > General Page Infos and notice that it takes 5-7 seconds for the page to load. Most of the time is spend getting subscriber information.

  5. After implementing a workaround (described below), page saves and "General Page Infos" load significantly faster.

Example

(no example)

Component selection

Details

Apparently first found in:

MoinMoin Version

1.8.3

OS and Version

Solaris 10

Python Version

2.5

Server Setup

apache + mod_wsgi

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

en

But also reproducible in:

MoinMoin Version

1.9.4

OS and Version

Debian GNU/Linux 6.0

Python Version

2.6.6

Server Setup

apache + mod_wsgi

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

en

There's nothing system-specific about this, though. It's a scalability problem in the code.

Workaround

There's a patch: see subscribercache.patch

Discussion

I reviewed the first patch (didn't try it yet) and it looks quite good (for the goal of touching not too much code, but as you noted, it duplicates some code and that should be solved later, also we likely need more tests).

What's maybe missing (see user.User.isSubscribedTo) is the .valid check - it has to be checked when .valid is True.

So:

Attached an updated patch after feedback from Thomas that it could do with some locking - see subscribercache.patch. Locking seems to work OK.

I did another review of the (current) patch and (while looking at the src) found one issue:

If you run a wiki farm with a shared user_dir (which is not uncommon if the wikis are somehow related and one wants SSO), the cache will be stored and updated per-wiki (as the scope is 'wiki').

So if a user in wiki1 updates his subscriptions, the user profile will get updated for all wikis (as it is shared), but the pagesub cache will only get updated in wiki1, all other farm wikis (wiki2, wiki3, ...) will have an outdated and inconsistent cache. Over time, inconsistencies will increase.

So I guess we'll need to make the caching scope configurable "wiki" or "farm". Maybe the caching scope could be also None (default) which would mean to use the original, uncached code.

Also, as this is adding a new cache moin maint cleancache script likely also needs an update.

Can you fix and update the patch?

-- ThomasWaldmann 2012-09-03 16:26:15

User Feedback?

Who is using this patch? Can you tell about your results with it?

Used by GNOME (http://live.gnome.org/): "We have a large amount of users and MoinMoin is unworkable without it. We actually got offers to fully handle the switch to another Wiki software (this is pretty complicated due to the size of our wiki)."

Used by Debian (http://wiki.debian.org/): "Patch is working fine for us in daily use on a moin 1.9.4 installation. It doesn't check .valid at the moment, but if that's important I could work on that too."

For moin 1.9, only well-tested and maintained stuff will be acceptable. moin2 has a whoosh-index for most metadata, including userprofiles, and will use it to efficiently compute subscriptions (was implemented in GoogleSoc2013).

In further discussion on IRC with Thomas: performance problems are always going to be likely in this area. This cache helps using this style of lookup (check all user records for a subscriber match for this page); switching to the inverse lookup (link to subscribers from the page) would be faster, but would not work with regexp-style subscription requests. (Note: regex matching was also implemented in a as-fast-as-it-gets way in GoogleSoc2013.)

I made a version of the patch suitable to be imported and applied in the config file, without having to modify the wiki engine's sources: subscriber_cache.py. We are using that at the Python wiki.

Plan


CategoryMoinMoinBugFixed CategoryFeaturePatched CategoryMoinMoinPatch

MoinMoin: MoinMoinBugs/GetSubscribersSlow (last edited 2014-02-14 16:47:29 by ThomasWaldmann)