Citizendium Forums
March 20, 2010, 01:45:53 PM *
Author Topic: Gene articles and bots  (Read 13336 times)
David Goodman
Forum Communicator
***
Posts: 247


« Reply #15 on: April 18, 2007, 11:07:19 PM »

All of this is a clear illustration of the problems we will encounter by having the NC restriction. The use of CZ will be increased, both in nature and amount, by the least restrictive possible licensing.

David E. Volk
Forum Communicator
***
Posts: 192


David Volk at Stingaree


« Reply #16 on: April 22, 2007, 04:15:40 AM »

I see several problems with this idea.  First, the overwrite of human-added text previously mentioned.  Second, there is nearly no information for most of the (human) genes.  Although there are a lot of geneticists, they tend, I think, to mostly work on established genes, paying particular attention to oncogenes, for example.  Because of our current lack of knowledge in a systems biology sense, much of the gene ontology is likely to be wrong and will need to be updated, which gets back to the first point again.  There is also the problem of multiple reading frames for some genes.
How about a thought experiment: make a bot to extract every word out of a good dictionary for each language on earth, make stubs for all of them, and see how many people volunteer to upgrade all of the stubs and inter-relate them?    I did like the example gene page, but only because a lot of information is available for the particular gene that was selected, including a 3D structure of the protein.  The final point is what kind of label will you put on an open reading frame with no known function and no name other than ORF #####?  How does that help the layman?

Andrew Su
New Arrival
*
Posts: 11


« Reply #17 on: April 24, 2007, 03:20:37 PM »

Hi David,

I see several problems with this idea.  First the overwrite of human-added text previously mentioned. 

Did you also have a problem with the proposed response?  http://forum.citizendium.org/index.php/topic,697.msg5853.html#msg5853

Second, there is nearly no information for most of the (human)  genes.  Although there are a lot of geneticists, they tend, I think, to mostly work on established genes, paying particular attention to oncogenes, for example.  Because of our current lack of knowledge in a systems biology sense, much of the gene ontology is likely to be wrong and will need to be updated, getting back to first point again.  Also the problem with multiple reading frames for some genes.

Agreed that many genes are virtually unannotated, which is why I proposed a rough guesstimate of 10k gene pages (as opposed to the full set of ~25k mammalian genes).  I disagree, however, that "much of the gene ontology is likely to be wrong".  GO annotation is both incomplete and imprecise (i.e., very general annotation, like "cellular metabolism"), but I'd hesitate to say inaccurate.  Do you have any specific examples or studies to support this?  Regardless, even incomplete and imprecise annotation will require updates, and I hope the proposal in the previous post addresses those concerns. 

Also agreed that most genes have multiple possible protein products, but like most of the gene databases available now (and much of the literature), I propose we start with this gene-level of abstraction.

How about a thought experiment: make a bot to extract every word out of a good dictionary for each language on earth, make stubs for all of them and see how many people volunteer to upgrade all of the stubs and inter-relate them?   

If someone had proposed a similar thought experiment five years ago, replacing "every word out of a good dictionary" with "every entry out of a good encyclopedia", I might have been skeptical too.  But I am continually surprised and impressed by people's desire to share knowledge.  And molecular biology in particular, I think, is ready for this idea.  Of course, reasonable people can disagree on that, which is why we're very open to suggestions on how to improve the idea.  But personally, I'm in too deep, so not doing it is not an option ;) (so long as it doesn't harm the larger WP/CZ efforts). 

I did like the example gene page, but only because a lot of information is available for the particular gene that was selected, including a 3D structure of the protein.  The final point is what kind of label will you put on an open reading frame with no known function and no name other than ORF #####?  How does that help the layman?

Agreed, let's start with genes (~10k?) that have at least a framework to hang free-text annotation off of, and the richness of these stubs will vary widely.  In the upcoming pilot experiment for ~10 genes, we'll get a range in degree of annotation. 

(And of course, I agree with David Goodman's thoughts on using the least restrictive license possible.)

Cheers,
-andrew

David E. Volk
Forum Communicator
***
Posts: 192


David Volk at Stingaree


« Reply #18 on: April 30, 2007, 03:08:42 PM »

Hi Andrew, :D

I just now came across your reply.  I thought I was set up to receive these sorts of things in my email, but apparently not.  If you or anyone else can help me with that, please do reply.

I am all for more knowledge, so don't let me discourage you.  I am just pointing out possible things to consider.

Did you happen to read about the Macaque genome in Science last week?  One of the interesting things in it is that sometimes "disease" gene variants in humans are actually the "normal" genes in Macaque, and that the Macaque variant is the ancestral gene more closely resembling everything else. 

At present, I have no explicit references for you, nor the desire to look them up, but it often happens (in seminars, symposia) that proteins have multiple names, because different people working on, say, vole foot fungus and human blindness will declare different functions for the same protein (gene product).  These often come out of gene knockout studies.  But since the cell works through a highly concerted, inter-related network of proteins with redundancies built in for many vital functions, the results of gene knockout studies can be very misleading, though not necessarily so.  The gene ontology question is just something that will work itself out over the next ten years or so.

Good luck with the project.






Andrew Su
New Arrival
*
Posts: 11


« Reply #19 on: August 13, 2007, 05:57:56 PM »

Time to resurrect this dormant thread... 

We just got done with a trial run of our bot over on Wikipedia.  We took the 33 most cited genes (according to Entrez Gene and PubMed).  Of these, 8 had no existing WP pages when searching for the gene symbol, name, or aliases.  We created stubs for these genes from data in public domain databases.

http://en.wikipedia.org/wiki/MMP9    
http://en.wikipedia.org/wiki/HIF1A    
http://en.wikipedia.org/wiki/PTGS2    
http://en.wikipedia.org/wiki/NFKB1    
http://en.wikipedia.org/wiki/TGFB1
http://en.wikipedia.org/wiki/PPARG    
http://en.wikipedia.org/wiki/AKT1    
http://en.wikipedia.org/wiki/MAPK1

In addition, for the remaining 25 genes for which WP pages did previously exist, we could certainly update these pages with our new gene infobox in a semi-automated way.  For illustration, I've updated two of them:

http://en.wikipedia.org/wiki/Apolipoprotein_E
http://en.wikipedia.org/wiki/Amyloid_precursor_protein
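
For anyone who wants to picture the triage step described above (search by symbol, name, or aliases, then either create a stub or update an existing page), here is a minimal Python sketch. It is not the bot's actual code: the Gene class, the normalize and partition_genes helpers, and the sample records are all hypothetical, and the set of existing page titles is assumed to have been fetched separately.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    symbol: str
    name: str
    aliases: list[str] = field(default_factory=list)

    def candidate_titles(self) -> list[str]:
        # Titles under which a page for this gene might already exist.
        return [self.symbol, self.name, *self.aliases]


def normalize(title: str) -> str:
    # Treat underscores as spaces and ignore case, so that
    # "Apolipoprotein_E" and "apolipoprotein E" compare equal.
    return title.strip().replace("_", " ").casefold()


def partition_genes(genes, existing_titles):
    """Split genes into (needs a new stub, already has a page)."""
    existing = {normalize(t) for t in existing_titles}
    to_create, to_update = [], []
    for gene in genes:
        if any(normalize(t) in existing for t in gene.candidate_titles()):
            to_update.append(gene)
        else:
            to_create.append(gene)
    return to_create, to_update


# Illustrative data only; a real run would use the full gene list and
# the titles actually present on the wiki.
genes = [
    Gene("APOE", "apolipoprotein E"),
    Gene("MMP9", "matrix metallopeptidase 9", ["GELB"]),
]
to_create, to_update = partition_genes(genes, ["Apolipoprotein_E"])
print([g.symbol for g in to_create])  # ['MMP9']
print([g.symbol for g in to_update])  # ['APOE']
```

Genes in to_create would get a fresh stub, while genes in to_update would only receive the infobox, leaving any human-written text alone.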

I expect that we'll get full bot approval at WP in the next couple of weeks, and start generating these stubs and enhancing pages within a month.  Seems like this is a good time to revisit whether a parallel effort at CZ would be desirable.  If yes, then I see two issues that need to be resolved:

  • CZ license:  this has been previously discussed, but to summarize briefly... I work at a for-profit company, and one part of what my small group does is host a free and public web site (http://symatlas.gnf.org) for gene annotation and expression data.  We'd like to incorporate any user-contributed content on WP/CZ back into our SymAtlas portal, but if any of the noncommercial licenses apply to these gene stubs, that'd pretty much be a deal-breaker here.
  • Bot policy: still don't see any sort of bot policy on CZ.  We'd need to test if our WP bot would work on CZ.  Also, there was previous talk of running bots from CZ servers.  We'd need to figure out that arrangement.

Feedback welcome.  Cheers,
-andrew

Andrew Su
New Arrival
*
Posts: 11


« Reply #20 on: August 21, 2007, 01:33:05 PM »

FYI, I've added a sample cluster at [[APP]] (http://en.citizendium.org/wiki/APP).  Comments welcome...

-andrew

Stephen Ewen
Guest
« Reply #21 on: September 04, 2007, 08:00:08 PM »

To counterbalance this, the first bot we allow is to import the articles from 1911 Britannica. The article texts are in the public domain and will always be. It should be easy to find the Public Domain version that was uploaded to wikipedia or via other websites. Legal precedent says that a copy of a work that is in the public domain is in the public domain. Like say a picture of a page in the 1911 Britannica. That also makes http://www.1911encyclopedia.org/ fair game, contrary to whatever the policy in their disclaimers and terms attempts to limit. 

-Jason Potkanski

I'd urge serious caution here. 

In my understanding, while you can use the 1911 EB any way you want, that does not mean you may obtain it anywhere you want.  The matter is not as simple as copyright or the lack thereof.

The "terms of use" at www.1911encyclopedia.org are essentially a contract between any user of the site and LoveToKnow Corp. Inc., the provider of the site, and they could file suit on that basis.

This part of the terms should especially give serious pause: "You may not access our networks, computers, or Contents in any manner that could damage, disable, overburden, or impair them", which is clearly possible through the activity of a bot. 

By merely entering www.1911encyclopedia.org, users are bound by whatever parts of its terms of use, its contract, that will stand up in court. 

  • CZ license: this has been previously discussed, but to summarize briefly... I work at a for-profit company, and one part of what my small group does is host a free and public web site (http://symatlas.gnf.org) for gene annotation and expression data.  We'd like to incorporate any user-contributed content on WP/CZ back into our SymAtlas portal, but if any of the noncommercial licenses apply to these gene stubs, that'd pretty much be a deal-breaker here.

Feedback welcome.  Cheers,
-andrew

All Creative Commons non-commercial licenses allow the copyright holder to give permission to use the work commercially. I'd think declining you that would be the actual deal-breaker, and that such a decline would be simply unthinkable.
« Last Edit: September 08, 2007, 12:24:19 PM by Stephen Ewen »
Chris Day
Forum Regular
****
Posts: 1024



« Reply #22 on: January 15, 2008, 12:44:38 PM »


Just an update. This project is still going ahead on Wikipedia.  I saw the following blog post recently.

http://mndoci.com/blog/2008/01/13/the-genes-wiki-project/
