Citizendium Forums
November 24, 2009, 10:54:51 AM *
Author Topic: How many articles originally appeared from wikipedia vs. articles that can  (Read 6084 times)
tkjazzer
Guest
« on: September 16, 2007, 04:50:12 PM »

Before a license decision is made, I would like to know exactly

How many articles "originally appeared on wikipedia" and therefore have to be GFDL?

Now, this does not mean the number of "external articles", because editing those external articles would still be GFDL, and some of those have become internal articles, right? After so many edits on CZ an external article becomes internal? And those are still mandatory GFDL until all the sentences are changed... correct?

Now, back in high school and college, we were told that paraphrasing is still copying. So does that mean that if someone paraphrased every single sentence from Wikipedia, the article still originated at Wikipedia and therefore still needs the GFDL?

Overall, how many articles must be GFDL or should be GFDL and how many articles are open to any license? 
Logged
Stephen Ewen
Guest
« Reply #1 on: September 16, 2007, 05:56:57 PM »

I'd guesstimate 10% total are CZ original, thus open to any license.
Logged
Derek Harkness
Forum Regular
****
Posts: 543


« Reply #2 on: September 16, 2007, 06:08:43 PM »

Why doesn't checking the Wikipedia-sourced box on the edit page add the article to something like [[Category:Wikipedia Sourced]]? This would make tracking these articles much easier.

Also, some articles have lost their Wikipedia check box while people were editing. Probably just careless clicking, perhaps a bug in the system, I don't know. The why is not so important. However, why can't we lock that checkbox once it's clicked and require an Editor, or preferably a Constable, to unclick it?

Third idea: Should the article source be moved to the checklist/metadata page?
Logged

Stephen Ewen
Guest
« Reply #3 on: September 16, 2007, 06:18:05 PM »

Quote
Why doesn't checking the Wikipedia-sourced box on the edit page add the article to something like [[Category:Wikipedia Sourced]]? This would make tracking these articles much easier.

Also, some articles have lost their Wikipedia check box while people were editing. Probably just careless clicking, perhaps a bug in the system, I don't know. The why is not so important. However, why can't we lock that checkbox once it's clicked and require an Editor, or preferably a Constable, to unclick it?

Third idea: Should the article source be moved to the checklist/metadata page?

I completely agree.  Categories for both WP sourced and CZ original articles would make MANY tasks MUCH easier.
Logged
tkjazzer
Guest
« Reply #4 on: September 16, 2007, 11:22:10 PM »

I think that data is extremely important before a license decision is made.  Does anyone else think this data is important to have before making a decision?

Logged
a.a.s.
Forum Communicator
***
Posts: 152


« Reply #5 on: September 17, 2007, 03:27:46 AM »

As of beginning of September we had about
3180 CZ-originated (~82%) and
700 WP-sourced (~18%) articles,

update: this was incorrect (thanks Derek) see my posts below; corrected numbers:
2140 (~61%) CZ "pure"
1360 (~39%) WP-tagged
according to what we tag as WP-sourced.

We still have a small portion of articles (an estimated < 5%, say) that are WP-sourced and not tagged yet.
From time to time I do a massive check and tag (semi-automated, quite stupid, but still quite efficient).
Nonetheless, the proportion of WP-sourced articles is decreasing (it was about 20% in July, 19% in August).
This agrees with my rough observations of new pages: they are often created from scratch, or by import of the author's *own work* from WP.

BTW, importing one's own work now seems to be the main part of imports from WP, if not virtually the totality. Random importing is not very popular (I hope it never will be); consider also that a random import without an interested "owner" often gets no improvements here and, according to our policies, eventually gets (speedy-)deleted.

Anyway, it seems that quite a big proportion of our articles would fall under the new license.
« Last Edit: September 17, 2007, 06:21:52 AM by Aleksander Stos » Logged

Stephen Ewen
Guest
« Reply #6 on: September 17, 2007, 03:31:33 AM »

they are often created from scratch, or by import of author's *own work* from WP.

Ah, yea, I forgot about that factor.   Grin

I think in the long run more articles will be WP-sourced than CZ original.

An interesting thing would be to look at what proportion of approved articles are WP-sourced, and to separate out those that are an "import of an author's *own work* from WP."
« Last Edit: September 17, 2007, 03:34:53 AM by Stephen Ewen » Logged
Derek Harkness
Forum Regular
****
Posts: 543


« Reply #7 on: September 17, 2007, 03:40:15 AM »

Quote
As of beginning of September we had about
3180 CZ-originated (~82%) and
700 WP-sourced (~18%) articles,
according to what we tag as WP-sourced.
I don't think these stats are right. I think you have mixed up WP-sourced with external articles. The number of Wikipedia-sourced articles must be significantly greater than the number of external ones. Remember that the Wikipedia checkbox and licence apply to any Wikipedia content on a page, no matter how much it has been edited and added to. So even some class 1 or 2 (and a few class 3) articles would have to have GFDL licences on them, since they are in part Wikipedia-sourced.

Quote
importing one's own work now seems to be the main part of imports from WP, if not virtually the totality. Random importing is not very popular (I hope it never will be); consider also that a random import without an interested "owner" often gets no improvements here and, according to our policies, eventually gets (speedy-)deleted.
Importing work that is entirely your own but was previously posted to Wikipedia is not Wikipedia-sourced. These articles are sourced from their author. Many of the True Viper articles and many by Richard Jensen fall into this category. These are not WP-sourced even though WP has a copy.

An article only becomes WP-sourced if the author imported their own work AND carried over some edits that other Wikipedians had added.

Additionally, much of the 18% that is external has not been modified and should get a speedydelete tag stuck on. So what we are worried about for the licence is WP-sourced less external/for deletion.
« Last Edit: September 17, 2007, 03:42:54 AM by Derek Harkness » Logged

a.a.s.
Forum Communicator
***
Posts: 152


« Reply #8 on: September 17, 2007, 03:57:16 AM »

I don't think these stats are right. I think you have mixed up WP-sourced with external articles.
Well, I just counted those WP-tagged in the database, the numbers should be correct as they stand.
But I just realized that I didn't account for subpages (i.e. each subpage was counted separately as an article, and no subpage is WP-tagged). So indeed these numbers don't reflect reality. Some updates are in order on my part.

Quote
Importing work that is entirely your own but was previously posted to Wikipedia is not Wikipedia-sourced. These articles are sourced from their author. Many of the True Viper articles and many by Richard Jensen fall into this category. These are not WP-sourced even though WP has a copy.

An article only becomes WP-sourced if the author imported their own work AND carried over some edits that other Wikipedians had added.

Sure. The problem is that the latter is the most common case (esp. for RJ; and many vipers too)...
And the point is that the burden of proof of full authorship seems to be on the CZ side. This means that unless an explicit declaration is made (e.g. with the WPauthor template and/or links to the WP version) or a careful check is done, articles with exact similarities to their WP counterparts should be WP-tagged.
« Last Edit: September 17, 2007, 06:16:59 AM by Aleksander Stos » Logged

a.a.s.
Forum Communicator
***
Posts: 152


« Reply #9 on: September 17, 2007, 06:15:52 AM »

Sorry for multi-posting; here goes the update of the stats.

I confess I found a bug - the previous numbers were giving the status of the first edit in the revision history, while we are interested in the last one. So Derek was right -- sorry for misleading info.
Furthermore, I eliminated all the subpages.
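
For anyone curious how the two fixes change the count, here is a minimal sketch in Python. It is only an illustration: the `pages` data, the `is_subpage` helper and the slash-in-title rule are hypothetical and do not reflect the actual script or the CZ database layout.

```python
from typing import Dict, List, Tuple

# Hypothetical data: page title -> WP-sourced flag of each revision, oldest first.
pages: Dict[str, List[bool]] = {
    "Alpha": [False, True],         # tagged as WP-sourced in a later edit
    "Beta": [False, False],
    "Gamma": [True, True],
    "Gamma/Bibliography": [False],  # a subpage, not an article of its own
}

def is_subpage(title: str) -> bool:
    # Assumption for this sketch: a slash in the title marks a subpage.
    return "/" in title

def count_wp_tagged(pages: Dict[str, List[bool]],
                    use_last_revision: bool,
                    skip_subpages: bool) -> Tuple[int, int]:
    """Return (number of WP-tagged pages, number of pages counted)."""
    tagged = 0
    counted = 0
    for title, flags in pages.items():
        if skip_subpages and is_subpage(title):
            continue
        counted += 1
        # The bug: reading the first revision's flag instead of the last one.
        flag = flags[-1] if use_last_revision else flags[0]
        tagged += int(flag)
    return tagged, counted

buggy = count_wp_tagged(pages, use_last_revision=False, skip_subpages=False)
fixed = count_wp_tagged(pages, use_last_revision=True, skip_subpages=True)
print(buggy, fixed)
```

On this toy data the first-revision count undercounts tagged pages and the subpages inflate the total, which is the direction of the error described above.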

Now, corrected numbers
2140 "pure" CZ (~61%)
1360  WP-tagged (~39%)

Are these numbers bug-free? Arguably so. Check for yourself. Try generating a hundred random pages, say. Skip any subpages and disambigs. I've done it (by hand) and it gave pretty much exactly that proportion. With Firefox it's not that difficult -- just keep the "random page" text in the searchbox and press F3/Enter/End repeatedly...
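
A rough way to judge whether such a hand count agrees with the database figures is to compare the sample share with the full-count share, allowing for sampling error. The sketch below assumes a hypothetical hand count of 39 WP-tagged pages out of 100 (the exact count is not given here) and uses the usual binomial standard error; `sample_share` is an illustrative helper, not an existing tool.

```python
import math
from typing import Tuple

def sample_share(tagged: int, sample_size: int) -> Tuple[float, float]:
    """Return the observed share and its approximate standard error."""
    p = tagged / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p, se

# Share of WP-tagged articles according to the corrected full count.
expected = 1360 / (2140 + 1360)

# Hypothetical hand count: 39 WP-tagged pages among 100 random articles.
p, se = sample_share(39, 100)

# Treat the sample as consistent if it lies within two standard errors.
consistent = abs(p - expected) <= 2 * se
print(f"expected={expected:.3f} sample={p:.3f} se={se:.3f} consistent={consistent}")
```

With only a hundred pages the standard error is about five percentage points, so a sample like this can confirm the rough proportion but not the exact figure.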
« Last Edit: September 17, 2007, 06:36:44 AM by Aleksander Stos » Logged

Derek Harkness
Forum Regular
****
Posts: 543


« Reply #10 on: September 17, 2007, 06:19:51 AM »

Quote
Well, I just counted those WP-tagged in the database, the numbers should be correct as they stand.
According to the stats page you made at http://en.citizendium.org/wiki/CZ:Statistics#Checklisted_articles you have 19.3% external articles. All of them should be tagged as WP-sourced (I suspect some are not). Many internal articles should be tagged as WP-sourced too. So the number must be a lot greater than 19.3%; how did you get 18%?

Perhaps the 18% is internal articles that are WP-tagged? In which case the total of WP-sourced articles would be 18 + 19.3 = 37.3%?
Logged

a.a.s.
Forum Communicator
***
Posts: 152


« Reply #11 on: September 17, 2007, 06:24:54 AM »

Now it's corrected, and the way I got 18% was explained too...
I think we were posting at the same time...  Wink
Anyway, thanks for being suspicious (you were right!).
Logged

Aleta Curry
Forum Regular
****
Posts: 1105


« Reply #12 on: September 17, 2007, 04:29:08 PM »


I think in the long run more articles will be WP-sourced than CZ original.


Hope not
Logged

http://en.citizendium.org/wiki/User:Aleta_Curry

Lady Astor, to Winston Churchill:  Sir, if you were my husband, I'd put poison in your tea!

Churchill:  Madam, if I were your husband, I'd drink it!
tkjazzer
Guest
« Reply #13 on: September 17, 2007, 07:03:30 PM »

Well, either way.  For me, it's all about the knowledge existing.  I don't really care where it comes from.  I don't really care what license it is under.

However, I do care about frustration.

I think it will be frustrating to have 2 licenses and have a problem with combining articles. 

Steve, could you explain why you think more articles will be WP sourced in the future?

Tom
Logged
George Swan
Forum Communicator
***
Posts: 134


« Reply #14 on: October 26, 2007, 05:12:41 PM »

1 So, should I assume that porting an earlier version, of which I am the sole author, is preferable to porting a later version that has had input from other Wikipedians?

2 What if the later version had limited input from other Wikipedians? Spelling and grammar corrections, or wikilinking? Am I right to assume this is sufficient input that I can no longer consider myself the sole author?

3 What if I use a later version that had limited input from other Wikipedians, but I excise all the paragraphs other Wikipedians added, leaving only my own work?

4 Someone wrote above that they learned, in high school, that paraphrasing was copying. For our purposes is this really correct? You can't paraphrase something you don't understand. The person who wrote the original passage owns its expression, the particular expression they wrote -- not the actual fact, or idea, that they expressed. Mind you, IANAL, but I think the other writer was incorrect about how concerned we should be about paraphrasing.

5 About the GFDL: I thought that, since it requires giving credit to everyone who contributed, if we used material that had been licensed to Wikipedia, Wikipedia's edit history would have to be available for it to be used here. If that is true, wouldn't it be a problem if an article here were based on a Wikipedia article whose original version was later deleted? If it were deleted there would be no publicly available edit history. So I am concerned we would no longer be honoring the original Wikipedia contributors' GFDL rights. Similarly, if the Wikipedia version were renamed, this might be almost as bad as if it were deleted.

6 Does this imply we really ought to mirror the previous edit history of articles that were originally from Wikipedia?

7 I am sorry. I am new here, and I assumed that the limited number of contributions I have made here were under the GFDL. If that is not true, under what license did I release that material, and are the rights I retain very different from the rights I retain under the GFDL?

I am breaking the chain for my most important question.

I have read of some commercial sites that offer a wiki to their paying customers and that advertised a tool that would archive all the articles in a list into a package that could be de-archived on another wiki that used the same software. I dunno if that commercial site used MediaWiki. I dunno if the archive preserved the entire edit history.

Am I correct to assume we have nothing like that here? That there is no tool like this built into MediaWiki?

As I wrote elsewhere, a small but dedicated group on Wikipedia seems to have decided to remove a bunch of articles of which I am the primary author, under "notability". I would love to be able to pack them all up, and unpack them somewhere more welcoming.

Cheers!
Logged
