Wikipedia does have this whole architecture with replicated slave servers (which is how they do their secondary clusters, IIRC - haven't checked their layout recently), and a further level of caching servers (Squids). I wonder whether it would be possible to plug into that system at some level.
Are you talking physically plugging in? That was a suggestion on the technical boards - FedEx HDDs back and forth every week. Because I don't think the issue is the traffic we'd potentially generate (on the CPU end) but the bandwidth costs.
That RSS feed of recent changes is pretty attractive though.
Hi, I went back and refreshed my memory of how the Wikipedia distant sites (e.g. in the Netherlands) actually work (well, according to the latest stuff anyone wrote down :-), and it turns out they don't actually have database slaves there, just Squids. (Although if I understand how their system works, they could have more than just Squids there; it seems to me they could have either just some Apache servers, or Apaches plus a slave database server to feed the Apaches. But perhaps someone has done the numbers and Squids alone are a better solution?)
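For what it's worth, a Squid-only remote cache site of the kind described above is essentially a reverse-proxy ("accelerator") setup. Here's a minimal sketch in Squid 2.6-style configuration; the hostnames are placeholders I made up, not anything from the actual Wikimedia configs:

```
# Hypothetical squid.conf fragment -- hostnames here are invented placeholders.
http_port 80 accel defaultsite=en.wikipedia.org

# Forward cache misses to the upstream Apache pool; 'originserver' marks the
# peer as the real web server rather than a sibling cache.
cache_peer apaches.example.org parent 80 0 no-query originserver

# Cache objects on local disk so repeat requests never leave this site.
cache_dir ufs /var/spool/squid 10000 16 256
```

With something like this, only cache misses generate traffic back to the main cluster, which is presumably why Squids alone may be enough at the remote sites.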
Also, I understand your point about the bandwidth, and I've also looked at "Wikipedia:Mirrors_and_forks" on Wikipedia, which says:
remote loading .. is an unacceptable use of Wikimedia server resources. Even remote loading websites with little legitimate traffic can generate significant load on our servers
I understand and take to heart the point about load/bandwidth. With that in mind, I was wondering if the following would be acceptable to the Wikimedia crew: configure some set of Citizendium machine(s) to be Apaches (and probably Squids too, eventually); to Wikipedia, these would look and act just like part of the Wikipedia server pool. For articles where Citizendium doesn't have a "native" article, Citizendium would serve up the current Wikipedia article, using the local cache of the Wikipedia content.
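The per-request fallback I have in mind is simple. A minimal sketch in Python, with invented names (`serve_article`, the two lookup tables) purely to illustrate the idea, not any real Citizendium or MediaWiki code:

```python
# Hypothetical sketch of the fallback described above; all names are invented.
def serve_article(title, citizendium_db, wikipedia_cache):
    """Serve a native Citizendium article if one exists; otherwise fall back
    to the locally cached copy of the Wikipedia article, so the request
    never has to reach Wikimedia's servers."""
    article = citizendium_db.get(title)  # native Citizendium article?
    if article is not None:
        return article
    # No native article: serve the cached Wikipedia version instead.
    return wikipedia_cache.get(title)
```

The point of the sketch is just that the Wikipedia lookup hits the local cache pool, not Wikimedia's servers, except on a cache miss.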
This should be entirely acceptable on the load/bandwidth front: my reasoning is that if Citizendium didn't exist, all of the queries to Citizendium would wind up at Wikipedia, and their servers would have to handle them. This way, they'd actually get less traffic, because Citizendium's pool of Apaches (and later Squids) would be intercepting and handling some of those queries for them.
Looking at this another way: if Citizendium does that, and does a good job of caching, would the pages Citizendium does load from them really take more bandwidth than either i) an RSS feed, or ii) reloading the entire database on a regular basis? Either of those is going to be sending Citizendium articles (and updates) that it will probably never need; isn't there a good chance that would require more bandwidth than supporting what is, effectively, a pool of pseudo-Wikipedia Apache and Squid servers?
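To make that comparison concrete, here's a back-of-envelope calculation. Every number in it is a made-up assumption for illustration only - I have no idea what the real dump sizes or miss rates would be:

```python
# Illustrative back-of-envelope only; every figure below is an assumption.
dump_gb_per_week = 10.0        # assumed size of a full weekly database reload
avg_page_kb = 50.0             # assumed size of one rendered article fetch
misses_per_week = 100_000      # assumed cache misses that actually reach Wikipedia

# Bandwidth if we only fetch pages on demand, on a cache miss.
on_demand_gb = misses_per_week * avg_page_kb / 1024 / 1024

print(f"full reload:      {dump_gb_per_week:.1f} GB/week")
print(f"on-demand misses: {on_demand_gb:.1f} GB/week")
```

Under these (made-up) numbers the on-demand misses come to under 5 GB/week, less than the full reload - and unlike the reload, that figure shrinks as the cache hit rate improves and as Citizendium grows its own native articles.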
Am I reasoning correctly, or am I missing something (probably something obvious :-)?