Citizendium Forums
November 28, 2014, 11:57:35 UTC *
News: This forum is now a read-only archive. Project members may post on the new forum. Non-members may use only the "Open Forum" group, but still must register before posting (it's easy!). Posts will otherwise be deleted.
Author Topic: Proposal: Fork Wikipedia and launch with some A1-class model subjects  (Read 15107 times)
Peter Hitchmough
Forum Participant
***
Posts: 107


Not short of things to do


« on: October 02, 2006, 21:18:11 UTC »

Why doesn't Citizendium do a complete(-ish) fork of WP and show its colours by launching some excellent revised approved content on Day 1?

Rewrite a complete section pre-launch.
- This would need a working party under a working Chief Subject Editor, and could be started now.

Of course, we could launch more subjects if we can put the effort in.

This links with my comments on triage - pick a subject where we have a good chance of making an excellent difference.
Comments? Offers?

The long haul is where the difference will really lie, but it's worth demonstrating Citizendium's Unique Selling Point.

-Peter
Logged
Larry Sanger
Founding Editor-in-Chief
Forum Regular
*****
Posts: 1830



WWW
« Reply #1 on: October 02, 2006, 21:41:45 UTC »

Hmm, INTERESTING idea.  I hadn't thought of just, e.g., using the Textop wiki to start working on a relatively small set of improved articles.

I like it.  Let's talk about it!  There are a zillion things to settle about even this small proposal.  For one thing I don't want Textop to get slammed.  :-)

I can always of course just install another instance of MediaWiki myself and...hmm.  You have my mind going.
« Last Edit: October 02, 2006, 21:43:22 UTC by Larry Sanger » Logged

My CZ user page: http://en.citizendium.org/wiki/User:Larry_Sanger
Please link to your CZ user page in your signature, too!
To do that, click on Profile > Forum Profile Information.
Peter Hitchmough
Forum Participant
***
Posts: 107


Not short of things to do


« Reply #2 on: October 02, 2006, 21:49:43 UTC »

We aim to please.  :-D

...all of us do.




Of course I think the articles need to be kept to an "alpha test" availability. Needs real editors.

-Peter
Logged
Larry Sanger
Founding Editor-in-Chief
Forum Regular
*****
Posts: 1830



WWW
« Reply #3 on: October 02, 2006, 21:57:13 UTC »

But let's not use the Textop wiki for this.  I've decided that isn't a good idea.
Logged

Zachary Pruckowski
Forum Communicator
****
Posts: 933


« Reply #4 on: October 02, 2006, 22:14:51 UTC »

Go for it.  I think our ultimate goal should be 1400 improved pages by year's end.  That's 0.1% of Wikipedia.  That way, if we get the "popular" articles, we have a good shot at having a noticeable difference from WP.
Logged

Clare Westhorpe
Forum Newcomer
*
Posts: 5


« Reply #5 on: October 02, 2006, 22:55:26 UTC »

Do you want us to earmark articles in our fields that won't require too much re-writing, or a free-for-all? Either way we'll get an interesting and diverse selection of material to present. Or would you prefer a special few are chosen so that we can all look more closely at the re-writing process? (the editorial process comes to mind)

I'd suggest we each work on an article or two to improve, then review the work in a short time frame.

Clare Westhorpe
Logged
Larry Sanger
Founding Editor-in-Chief
Forum Regular
*****
Posts: 1830



WWW
« Reply #6 on: October 03, 2006, 00:07:46 UTC »

The next step is actually to develop a plan for this pilot project, which I'm working on and which I will post to citizendium-l.

But, Clare, marking (somehow) articles that don't need a lot more improvement couldn't hurt.

UPDATE: I've posted the plan:

https://lists.purdue.edu/pipermail/citizendium-l/2006-October/000500.html

and you can comment here:

http://textop.org/smf/index.php?topic=82.0
« Last Edit: October 03, 2006, 02:53:05 UTC by Larry Sanger » Logged

Zachary Pruckowski
Forum Communicator
****
Posts: 933


« Reply #7 on: October 03, 2006, 04:15:21 UTC »

Quote
Do you want us to earmark articles in our fields that won't require too much re-writing?

I definitely think this has a lot of promise.  If we can help a few hundred articles along in a hurry, that'll make a great set of comparisons against WP, and it'll be nice to have a bunch of "featured articles" that we can point to and say "This is how a CZ article should look".  I think our best candidates are categorically failed featured articles, good articles, and featured articles that lost their featured status.
Logged

Jason Sanford
Forum Member
**
Posts: 57


« Reply #8 on: October 03, 2006, 18:08:09 UTC »

From a marketing point of view, this would be a great thing to do. Has anyone ever checked out the list of featured articles people at Wikipedia want to have? (Access it at http://en.wikipedia.org/wiki/Wikipedia:List_of_featured_articles_English_Wikipedia_should_have) While additions and subtractions could easily be made to this list, if we started off with solid articles on all of these subjects I think people would sit up and take notice.

I'd prefer using this list instead of the "most viewed" list because the latter list contains many articles which are not overly important but are listed merely because of sexual content or a tie-in with current pop culture.
« Last Edit: October 03, 2006, 19:16:36 UTC by Jason Sanford » Logged
Jason Sanford
Forum Member
**
Posts: 57


« Reply #9 on: October 03, 2006, 19:03:04 UTC »

I have created a fork of this Wikipedia list on our own wiki at http://www.textop.org/wiki/index.php?title=Proposed_Articles_for_Citizendium_Pilot_Project. Currently there are 703 articles on the list. I hope people will add or subtract articles as they see fit and help turn this into something which can be used with the CZ Pilot Project Larry Sanger is proposing.
Logged
J. Noel Chiappa
Forum Participant
***
Posts: 286


J. Noel Chiappa


WWW
« Reply #10 on: October 04, 2006, 23:25:08 UTC »

I think the right way to go is what I'll call a "virtual fork", which is to say that we don't actually make a copy of all Wikipedia articles on day X. Rather, we should set it up so that if we don't have an article on "Foo", we cough up the contents of the current "Foo" article on Wikipedia (with the resulting page of contents suitably captioned to indicate that - and also indicating that it should not be relied upon).

That way, we always have exactly the same content as Wikipedia, except that for items which we have worked on and meet our standards, people see ours instead. In other words, there will be no point to ever looking on Wikipedia for anything; by coming here you get either i) the latest and greatest from Wikipedia, or ii) (hopefully even better :-) you get an article which we have carefully reviewed, sourced, checked, and copy-edited.

I don't think we can really expect to go online and catch many viewers with just a "few" articles (even 10,000 articles is a "few"). It's just not worth the hassle of coming here, finding out that we don't have an article on your-favourite-topic, and then looking on Wikipedia. On the other hand, if we can make it zero extra work to consult both, then I think we've got something that has a chance of taking off.
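
The "virtual fork" lookup described in this post could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Page`, `serve_article`, `native`, and `fetch_wikipedia` are hypothetical, and how the Wikipedia text is actually obtained is deliberately left as a pluggable function.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Page:
    title: str
    text: str
    source: str   # "citizendium" or "wikipedia"
    caption: str  # notice shown above the article (empty for our own articles)


# Hypothetical wording for the caption the post asks for.
UNREVIEWED_NOTICE = (
    "This content comes from Wikipedia, has not been reviewed here, "
    "and should not be relied upon."
)


def serve_article(
    title: str,
    native: Dict[str, str],
    fetch_wikipedia: Callable[[str], str],
) -> Page:
    """Return our own article if one exists, else the captioned Wikipedia copy."""
    if title in native:
        # An article we have worked on takes precedence.
        return Page(title, native[title], "citizendium", "")
    # Otherwise fall back to the Wikipedia text and label it clearly.
    return Page(title, fetch_wikipedia(title), "wikipedia", UNREVIEWED_NOTICE)
```

A reader asking for "Foo" would get the reviewed text whenever `native` contains it, and the captioned Wikipedia text otherwise, which is the "zero extra work" behaviour the post is after.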
Logged

Noel's Citi-page

"There's no sense in being precise when you don't even know what you're talking about."   -- John von Neumann
Zachary Pruckowski
Forum Communicator
****
Posts: 933


« Reply #11 on: October 05, 2006, 01:55:26 UTC »

Quote
I think the right way to go is what I'll call a "virtual fork", which is to say that we don't actually make a copy of all Wikipedia articles on day X. Rather, we should set it up so that if we don't have an article on "Foo", we cough up the contents of the current "Foo" article on Wikipedia (with the resulting page of contents suitably captioned to indicate that - and also indicating that it should not be relied upon).

That way, we always have exactly the same content as Wikipedia, except that for items which we have worked on and meet our standards, people see ours instead. In other words, there will be no point to ever looking on Wikipedia for anything; by coming here you get either i) the latest and greatest from Wikipedia, or ii) (hopefully even better :-) you get an article which we have carefully reviewed, sourced, checked, and copy-edited.

I don't think we can really expect to go online and catch many viewers with just a "few" articles (even 10,000 articles is a "few"). It's just not worth the hassle of coming here, finding out that we don't have an article on your-favourite-topic, and then looking on Wikipedia. On the other hand, if we can make it zero extra work to consult both, then I think we've got something that has a chance of taking off.


We can't hotlink to the WP articles (it breaks their forking rules).  We have to download the articles and host them ourselves.  At absolute best, we can keep checking the "recent changes" page and update changed pages based off that, but that's expensive and might cost both WP and CZ a good bit of bandwidth.
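
As a rough sketch of the "check recent changes and update changed pages" idea, assuming the change list can be reduced to (title, revision number) pairs: the function name `titles_to_refresh` and the data shapes are hypothetical, and nothing here talks to a real Wikipedia endpoint.

```python
from typing import Dict, Iterable, List, Tuple


def titles_to_refresh(
    local_revisions: Dict[str, int],
    recent_changes: Iterable[Tuple[str, int]],
) -> List[str]:
    """Pick the titles whose hosted copy is older than the newest listed change.

    local_revisions maps title -> revision number we currently host.
    recent_changes yields (title, revision number) pairs, in any order.
    """
    newest: Dict[str, int] = {}
    for title, revision in recent_changes:
        # Several edits to one page collapse into a single download.
        if revision > newest.get(title, -1):
            newest[title] = revision
    # Pages we do not host yet count as out of date (revision -1).
    return sorted(
        title
        for title, revision in newest.items()
        if revision > local_revisions.get(title, -1)
    )
```

Collapsing repeated edits into one download per page is one way to keep the bandwidth cost mentioned above down, though every changed page is still fetched whether or not anyone reads it.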
Logged

Larry Sanger
Founding Editor-in-Chief
Forum Regular
*****
Posts: 1830



WWW
« Reply #12 on: October 05, 2006, 07:11:28 UTC »

Actually, I believe there's a recent changes RSS feed, which Google uses.

JNC, I think you have it right.  You've concisely stated one of the main reasons for an all-at-once fork.  If I understand you correctly, then what you're suggesting is the same as what I suggest as a "progressive fork": http://www.citizendium.org/#progressive_fork
Logged

J. Noel Chiappa
Forum Participant
***
Posts: 286


J. Noel Chiappa


WWW
« Reply #13 on: October 05, 2006, 11:36:15 UTC »

Quote
We can't hotlink to the WP articles (it breaks their forking rules).  We have to download the articles and host them ourselves.

Actually, I was speaking more of the "architecture" rather than the "implementation" - i.e. exactly how (technically) it is that we get the latest Wikipedia content was something I wasn't trying to work out. That's a purely technical issue that CZ's tech people would have to work out, probably in consultation with the Wikipedia folks. I was speaking more of what the people who consult CZ would actually see.

But actually, when it comes to content, Wikipedia does have this whole architecture with replicated slave servers (which is how they do their secondary clusters, IIRC - haven't checked their layout recently), and a further level of caching servers (squids). I wonder if it wouldn't be possible to plug into that system at some level, either as a slave server, or a caching server. At the beginning, traffic from CZ should be pretty minimal.
Logged

Zachary Pruckowski
Forum Communicator
****
Posts: 933


« Reply #14 on: October 05, 2006, 18:22:47 UTC »

Quote
But actually, when it comes to content, Wikipedia does have this whole architecture with replicated slave servers (which is how they do their secondary clusters, IIRC - haven't checked their layout recently), and a further level of caching servers (squids). I wonder if it wouldn't be possible to plug into that system at some level, either as a slave server, or a caching server. At the beginning, traffic from CZ should be pretty minimal.

Are you talking physically plugging in?  That was a suggestion on the technical boards - FedEx HDDs back and forth every week.  Because I don't think the issue is the traffic we'd potentially generate (on the CPU end) but the bandwidth costs.

That RSS feed of recent changes is pretty attractive though.
Logged

J. Noel Chiappa
Forum Participant
***
Posts: 286


J. Noel Chiappa


WWW
« Reply #15 on: October 08, 2006, 23:58:05 UTC »

Quote
Wikipedia does have this whole architecture with replicated slave servers (which is how they do their secondary clusters, IIRC - haven't checked their layout recently), and a further level of caching servers (squids). I wonder if it wouldn't be possible to plug into that system at some level

Quote
Are you talking physically plugging in?  That was a suggestion on the technical boards - FedEx HDDs back and forth every week.  Because I don't think the issue is the traffic we'd potentially generate (on the CPU end) but the bandwidth costs.

That RSS feed of recent changes is pretty attractive though.

Hi, I went back and looked and refreshed my memory of how the Wikipedia distant sites (e.g. in the Netherlands) actually work (well, according to the latest stuff anyone wrote down :-), and it turns out they don't actually have database slaves there, just Squids. (Although if I understand how their system works, they could have more than just Squids there; it seems to me they could have either just some Apache servers, or Apaches and a slave server, to feed the Apaches. But perhaps someone has done the numbers and just Squids are a better solution?)

Also, I understand your point about the bandwidth, and I've also looked at "Wikipedia:Mirrors_and_forks" on Wikipedia, which says:

Quote
remote loading .. is an unacceptable use of Wikimedia server resources. Even remote loading websites with little legitimate traffic can generate significant load on our servers

I understand and take to heart the point about load/bandwidth. With that in mind, I was wondering if the following would be acceptable to the Wikimedia crew: configure some set of Citizendium machine(s) to be Apaches (and probably Squids too, eventually); to Wikipedia, these would look and act just like part of the Wikipedia server pool. For articles where Citizendium doesn't have a "native" article, Citizendium would serve up the current Wikipedia article, using the local cache of the Wikipedia content.

This should be entirely acceptable, on the load/bandwidth front: my reasoning is that if Citizendium didn't exist, all of the queries to Citizendium would wind up at Wikipedia, and their servers would have to handle them. This way, they'd actually get less traffic, because Citizendium's pool of Apaches (and later Squids) would be intercepting and handling some of the queries for them.

Looking at this another way, if Citizendium does that, and can do a good job of caching, would the pages Citizendium does load from them really take more bandwidth than either i) an RSS feed, or ii) reloading the entire database on a regular basis? Either one of those is going to be sending Citizendium articles (and updates) that it probably never needs. Isn't there a good chance that that would require more bandwidth than the bandwidth needed to support what is, effectively, a pool of pseudo-Wikipedia Apache and Squid servers?
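
The comparison in the paragraph above can be made concrete with a toy count. This is a sketch only: it counts page transfers rather than bytes, the event format and the name `count_transfers` are hypothetical, and it assumes the cache is notified whenever a page is edited, which a real deployment would have to arrange.

```python
from typing import Iterable, Set, Tuple


def count_transfers(events: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (feed_transfers, on_demand_transfers) for a stream of events.

    Each event is ("edit", title) or ("view", title).
    """
    feed = 0
    on_demand = 0
    fresh: Set[str] = set()  # titles whose cached copy is current
    for kind, title in events:
        if kind == "edit":
            feed += 1             # a feed ships every edit, viewed or not
            fresh.discard(title)  # assumed: the cache learns the page changed
        elif kind == "view":
            if title not in fresh:
                on_demand += 1    # cache miss: fetch the page once
                fresh.add(title)
    return feed, on_demand
```

With four edits spread over three pages and two views of only one of them, the feed moves four updates while the cache fetches a single page; a heavily read, heavily edited page would narrow that gap.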

Am I reasoning correctly, or am I missing something (probably something obvious :-)?
Logged

geni
Forum Member
**
Posts: 23


« Reply #16 on: October 16, 2006, 00:00:13 UTC »


Quote
Am I reasoning correctly, or am I missing something (probably something obvious :-)?


Images
Logged