Jeremy Smith's blog

Tipping Point for a Wiki to Become Self-Correcting

Commenting on RSS4Lib:: Campus Wikis and Wiki Authority and j's scratchpad:

While pointing out Case Western Reserve University's CaseWiki on RSS4Lib, Ken wonders how large a wiki community must be before it becomes self-correcting.

Well, speaking for our wiki, it has yet to show self-correcting behavior, and I can share our numbers. At this point, we have about 600 registered users on the wiki. At first blush, that may sound like a lot, but very few of the users perform more than one edit. Here is a chart I whipped up showing the users and how many edits and page creations they have done:


Most users stop by, log in (which grants them their first edit during the auto-generation of their User: page), and never actually commit an edit of their own. But really, that's expected. The top 2 bars (I removed the data for the internal MediaWiki user) are Greg and myself, the administrators of the application. In pie chart form, you can really see how many of the edits we account for (we're the two big slices):


But at this stage of the game, with the wiki still so young, that's to be expected.

I don't think the answer to "when does a wiki become self-correcting" lies in the total number of users as much as in the number of a certain "type" of user. After all, you can end up with 40k registered users; but if only 6 of them are ever doing any edits...

Here are the same graphs again, but this time I removed everyone with only one edit, along with Greg's data and mine:



Breaking that data down into some metrics:

# of Edits            # of Users
100 or more           7
Between 20 and 100    16
Between 2 and 20      49

Another way to look at that data is that 8% of the users account for 55% of the changes; 13% of the users account for 65% of the changes.
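The breakdown above can be recomputed from a plain list of per-user edit counts. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical and the sample counts are invented for illustration (they are not the Case Wiki data). It also assumes the bucket boundaries are 100+, 20 to 99, and 2 to 19, since the table does not say which bucket an edit count of exactly 20 or 100 falls in.

```python
def bucket_users(edit_counts):
    """Count users per bucket (assumed boundaries: 100+, 20-99, 2-19)."""
    buckets = {"100 or more": 0, "20 to 99": 0, "2 to 19": 0}
    for n in edit_counts:
        if n >= 100:
            buckets["100 or more"] += 1
        elif n >= 20:
            buckets["20 to 99"] += 1
        elif n >= 2:
            buckets["2 to 19"] += 1
        # Users with a single edit are left out, as in the table.
    return buckets


def share_of_edits(edit_counts, top_fraction):
    """Fraction of all edits made by the most active `top_fraction` of users."""
    ranked = sorted(edit_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no edits to analyze")
    # Round to the nearest whole user, but always include at least one.
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


# Invented sample: one edit count per user.
sample_counts = [150, 40, 5, 3, 2, 1, 1, 1, 1, 1]

print(bucket_users(sample_counts))
print(f"Top 10% of users: {share_of_edits(sample_counts, 0.10):.0%} of edits")
print(f"Top 20% of users: {share_of_edits(sample_counts, 0.20):.0%} of edits")
```

Feeding in the real per-user counts would let you check figures like "8% of the users account for 55% of the changes" directly, instead of reading them off a chart.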

So with those kinds of numbers, we have yet to reach a self-correcting stage. But I think those numbers are still soft because many of those persons play inside of "walled gardens." That is, they spend a lot of time editing their own pages, over and over. There isn't a lot of cross-pollination going on.

Therefore, in conclusion, to summarize: I don't know. But at the point in time when the Case Wiki begins to exhibit self-correcting behavior (which I have no idea how to measure, so I'll just go with my gut feelings as I observe the wiki changes), I will make sure to do a lengthy blog post with plenty of data and graphs and analysis and such.


  1.

    You may be closer to self-correction than you think. Yesterday I went to a presentation by Jimmy Wales, who showed some user analysis data about the English Wikipedia. What he said was that roughly 50% of all edits were being done by 0.7% of all users, and about 75% were being done by somewhere in the 5% range. Using that as a reference, I'd say your wiki is doing pretty well.

  2.

    If you did that coding in PHP, I can easily turn it into a Special Page on the wiki. But, you probably did it in Perl :(. What other data metrics do you think would be helpful to export on Special:Graphs?

  3.
    "If you did that coding in PHP, I can easily turn it into a Special Page on the wiki. But, you probably did it in Perl :("

    Actually, I just used phpMyAdmin and some SQL to gather the data. Exported the data to a CSV. Then, used Excel to whip it around into graphs.

  4.

    From all the studies I read when completing my Masters in Library & Information Science, you are pretty close to the expected participation level. In the information world, we estimate that 80% of the content is created by 20% of the potential participants. We also see that 80% of the citations in publications come from 20% of the sources used. The 80-20 rule is fairly standard in all information fields.

  5.

    I'm putting together a page that automatically generates graphs similar to those above for the wiki. Brian, can you think of any plots that would be beneficial from an information science perspective? One graph I inferred from your post would be a side-by-side bar graph that lists percentage of edits next to percentage of users. I'm trying to think of a graph where we can actually visualize the tipping point...

  6.

    Jeremy, is this the query you used:

    SELECT DISTINCT COUNT(*) AS frequency, userfreq.edits
    FROM (
        SELECT COUNT(*) AS edits, rev_user_text
        FROM revision
        WHERE rev_user != 0
        GROUP BY rev_user_text
    ) AS userfreq
    GROUP BY userfreq.edits
    ORDER BY edits DESC

  7.
    "is this the query you used"

    That looks right.
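For anyone who wants to see what that query returns without touching a live wiki, here is a rough Python sketch that runs it against an in-memory SQLite database. Several things here are assumptions: the table is a stripped-down stand-in containing only the two columns the query touches, the rows are made up, and the function name is hypothetical. The sketch also drops the DISTINCT, since the GROUP BY already yields one row per edit count.

```python
import sqlite3

# Same shape as the query in the thread: the inner SELECT counts edits per
# registered user, the outer SELECT counts how many users share each total.
QUERY = """
SELECT COUNT(*) AS frequency, userfreq.edits
FROM (
    SELECT COUNT(*) AS edits, rev_user_text
    FROM revision
    WHERE rev_user != 0
    GROUP BY rev_user_text
) AS userfreq
GROUP BY userfreq.edits
ORDER BY userfreq.edits DESC
"""


def edit_histogram(rows):
    """Return (number_of_users, edits_each) pairs, highest edit count first.

    `rows` is an iterable of (rev_user, rev_user_text) tuples, one per
    revision. Rows with rev_user = 0 are excluded by the query's WHERE clause.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Minimal stand-in table, not the real MediaWiki schema.
        conn.execute(
            "CREATE TABLE revision (rev_user INTEGER, rev_user_text TEXT)"
        )
        conn.executemany("INSERT INTO revision VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Made-up revisions: alice edits three times, bob and carol once each,
# and one logged-out visitor (rev_user = 0) edits twice.
SAMPLE_ROWS = [
    (1, "alice"), (1, "alice"), (1, "alice"),
    (2, "bob"),
    (3, "carol"),
    (0, "10.0.0.1"), (0, "10.0.0.1"),
]

for frequency, edits in edit_histogram(SAMPLE_ROWS):
    print(f"{frequency} user(s) with {edits} edit(s)")
```

Each output row is one bar of the histogram: how many users landed on a given edit count. That is the data that would then be exported to CSV and charted.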

  8.

    I look forward to how you represent the success of the Case Wiki. Wikis are constantly discussed in the information field.

    One option is a graph similar to these that shows the number of entries and/or edits per user. In addition, I think it would be interesting to see a report or table showing which external materials are linked to in order to validate the wiki content (Case web sites, Case blog, external web sites, etc.). Also, something that shows how much time is spent on entries would demonstrate that wiki entries are not just random thoughts, but well-organized information.

  9.

    There are a handful of tools available for MediaWiki with article citing in mind. One recent addition to Wikipedia is a "cite this article" link on every page. There are also some extensions for MediaWiki that allow you to cite where the content was obtained. I will look into deploying these in the near future.

    Your idea about the amount of time spent on entries is interesting. I don't think that has ever been done before. The closest we have is the number of edits per page. I think it might be possible to code up something.

    If you have any further suggestions, please record them on

  10.

    Oh, I forgot to address the interconnection of pages. My latest mini-project on the wiki is this whole graphing idea. A plan of mine is to expand the current category graphs (see for an example) and make graphs for individual articles accessible. Since we record what web sites refer to the Case Wiki content, I could easily add those as nodes to this graph. As far as recording where the information came from, that is up to the editors.