Wednesday, October 29, 2014

#Wikimedia - Men at work; preparing a #presentation IV - #WCN2014

The Dutch community has one question to answer: what do we do with the information that is available in Dutch, and how will we make it available? Currently there are 3,054,955 items [1] with a Dutch label and 1,890,905 items [1] that link to the Dutch Wikipedia. It follows that 1,164,050 items have a Dutch label but no article in Dutch; that is 62% on top of the items that do have an article.

This is a substantial amount of information that can be presented in Dutch. Similar numbers can be calculated for any language; for English the figure is 39% and for German 121%.
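The 62% follows from the two Dutch counts: items with a Dutch label but no Dutch article, divided by items that do have one. A quick check of that arithmetic in Python; the counts are the ones quoted above, and the function name is only illustrative.

```python
def extra_label_ratio(labelled: int, linked: int) -> float:
    """Items with a label but no article, relative to items with an article."""
    without_article = labelled - linked
    return without_article / linked


# Counts for Dutch as quoted in this post.
DUTCH_LABELLED = 3_054_955
DUTCH_LINKED = 1_890_905

ratio = extra_label_ratio(DUTCH_LABELLED, DUTCH_LINKED)
print(f"{DUTCH_LABELLED - DUTCH_LINKED:,} items without a Dutch article, {ratio:.0%}")
```

The same function applied to the English or German counts would give their 39% and 121%; a figure above 100% simply means there are more labelled items without an article than with one.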

Arguably, these items fulfill notability requirements somewhere. Arguably, the Swedes have demonstrated that having more information available revitalised their community. Arguably, allowing for search results from Wikidata is an easy first step towards opening up all our available knowledge.

[1] These links take a few minutes to load; they provide real-time information.

Tuesday, October 28, 2014

#Wikidata - #algorithm for updating labels

Amir is the #pywikibot guru; he runs Dexbot, the only bot with more than 20,000,000 edits. Amir regularly tinkers with the routines he uses. Sometimes he gets better performance, sometimes a better result.

The algorithm for adding labels has changed several times, and the result of the latest change can be seen in the statistics below. You may notice several spikes; the last one is captured in the latest dump. It resulted in many more labels for items that already had one label.

It is people like Amir who make a real difference. One bot request of his for Commons will help the Commoners see that Wikidata knows about the people mentioned in the Creator templates. Jobs like this are essential if the wikidatification of media files is to succeed.
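Dexbot's own code is not shown here, and this post does not describe Amir's label algorithm. Purely as an illustration of the kind of rule involved, here is a hypothetical Python sketch that proposes labels from sitelinks: when an item links to an article on a Wikipedia but has no label in that wiki's language, it takes the article title with any trailing disambiguation in parentheses removed. The site-to-language table, the function name and the sample item are made up; only the item layout (`labels`, `sitelinks`) follows Wikidata's entity JSON.

```python
import re

# Hypothetical mapping from Wikipedia site IDs to label languages.
SITE_LANGUAGE = {"enwiki": "en", "nlwiki": "nl", "dewiki": "de", "svwiki": "sv"}

# Matches a trailing disambiguator such as " (planet)".
DISAMBIGUATION = re.compile(r"\s*\([^)]*\)$")


def missing_labels(item: dict) -> dict:
    """Propose {language: label} for sitelinks whose language has no label yet."""
    labels = item.get("labels", {})
    proposed = {}
    for site, link in item.get("sitelinks", {}).items():
        lang = SITE_LANGUAGE.get(site)
        if lang is None or lang in labels or lang in proposed:
            continue  # unknown wiki, or a label already exists
        proposed[lang] = DISAMBIGUATION.sub("", link["title"])
    return proposed


# Invented sample item that already has one (English) label.
item = {
    "labels": {"en": {"language": "en", "value": "Mercury"}},
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Mercury (planet)"},
        "nlwiki": {"site": "nlwiki", "title": "Mercurius (planeet)"},
        "dewiki": {"site": "dewiki", "title": "Merkur (Planet)"},
    },
}
print(missing_labels(item))  # {'nl': 'Mercurius', 'de': 'Merkur'}
```

A rule like this only adds labels where a sitelink exists, which is one way an item that already has one label can end up with many more.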

#Wikimedia - Men at work; preparing a #presentation III - #WCN2014

The bane of every live demonstration is software that just does not work. My intention is to show #Wikidata in action and to demonstrate Reasonator and AutoList2. Judging by the experience of the last few weeks, I have a 50% chance of a reasonable result on the day.

Many factors can play up. Time-outs at Wikidata are not unusual at the moment, and when Wikidata does not play ball, everything downstream from it suffers as a consequence. It means that I may not have an up-to-date list of recent deaths because ToolScript does not function.

AutoList2 relies on WIDaR, and both rely on being able to contact Wikidata reliably. Without this, AutoList2 does not run.

The subject of my presentation is firmly solution-oriented. I can always fall back on screenshots, but that feels like cheating.

Monday, October 27, 2014

#Wikidata - #dead in #2014

A milestone is often a reason for celebration. Wikidata now knows about more than 10,000 people who died in 2014. This is more than Wikidata knew about all of 2013 at the time, but by now we "know" about 4,292 more people who died in 2013. For 2014, the deaths of some 329 people are still waiting to be registered, and obviously there are two more months to go.

People wonder what the attraction is of "killing people off". Registering a death is not nice; it is only worthwhile because of the potential it has:
  • Reasonator displays the latest information
  • Wikipedias can compare what they know and what Wikidata knows
  • External sources can compare what they know and what we know
  • It can draw attention to the people who died
It takes time for such effects to be realised.
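The comparisons in the list above come down to set differences. A minimal Python sketch, assuming both sides have already been reduced to the same identifiers (for instance Wikidata item IDs); the names below are placeholders, not real people.

```python
def compare_lists(wikipedia: set[str], wikidata: set[str]) -> dict[str, set[str]]:
    """Split two collections of recent deaths into what one side misses."""
    return {
        "only_wikipedia": wikipedia - wikidata,  # still to be registered in Wikidata
        "only_wikidata": wikidata - wikipedia,   # the Wikipedia may want to add these
        "both": wikipedia & wikidata,
    }


# Placeholder data for illustration only.
wikipedia_deaths = {"person A", "person B", "person C"}
wikidata_deaths = {"person B", "person C", "person D"}

result = compare_lists(wikipedia_deaths, wikidata_deaths)
print(sorted(result["only_wikipedia"]))  # ['person A']
print(sorted(result["only_wikidata"]))   # ['person D']
```

An external source works the same way; only where the identifiers come from changes.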

#Wikimedia - Men at work; preparing a #presentation II - #WCN2014

Mr van Asselt was a prominent professor at Utrecht University. He is one of many professors known to Wikipedia. Given that I regularly harvest data from categories, it made sense for me to use the English Wikipedia, as it has an article about Mr van Asselt.

The equivalent category on the Dutch Wikipedia knows about more faculty members, and there are categories in several other languages as well. Each of them may know about faculty members that the others do not.

As we aim to share the "sum of all available knowledge" with our readers, Mr van Asselt is a timely reminder to the audience of the Dutch Wikimedia conference that no Wikipedia knows it all.
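Harvesting the equivalent categories on several Wikipedias and merging them by Wikidata item is a set union, and the difference between that union and any one category shows what that Wikipedia is missing. A Python sketch with invented category contents:

```python
def merge_categories(members: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Union the members of equivalent categories and list what each wiki lacks."""
    everyone: set[str] = set().union(*members.values())
    missing = {wiki: everyone - known for wiki, known in members.items()}
    return everyone, missing


# Hypothetical faculty members per Wikipedia, already mapped to Wikidata items.
faculty = {
    "enwiki": {"A", "B"},
    "nlwiki": {"A", "B", "C", "D"},
    "dewiki": {"B", "E"},
}

everyone, missing = merge_categories(faculty)
print(len(everyone))              # 5
print(sorted(missing["nlwiki"]))  # ['E']
```

Even the largest category in this example lacks someone, which is the point made above.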

#Wikimedia - Men at work; preparing a #presentation I - #WCN2014

This Saturday I will give a presentation about #Wikidata at the annual conference of the Dutch Wikimedia chapter. As I have a day job too, I have started preparing. I want my presentation to be factual, challenging and inspiring.

The facts are simple: Wikidata is almost two years old. It started by incorporating all interwiki links. The development team is really small; it does an awesome job, and typically Wikidata is available, responsive and up to the job. The ambitions are huge; the challenge is to add to the existing workload while keeping the ship afloat.

If there is to be a challenge in my presentation, it will be about our aim "to share in the sum of all knowledge". Our aim should be to share all the knowledge available to us with our readers. At this time only a few Wikipedias go the extra mile and inform their readers that we have information available in one of our other projects. They do this by adding results from a Wikidata search and showing as much text as is available in the local language.

One challenge is to do this for the Dutch Wikipedia as well.
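Showing "as much text as is available in the local language" needs a fallback for items that have no Dutch label or description. A minimal Python sketch over the `labels`/`descriptions` part of Wikidata's entity JSON; the fallback order (Dutch, then English, then any language) is an assumption, not how any particular Wikipedia does it, and the entity is invented.

```python
from typing import Optional, Tuple


def pick_text(entity: dict, field: str, languages: list[str]) -> Optional[Tuple[str, str]]:
    """Return (language, value) for the first preferred language present in `field`."""
    texts = entity.get(field, {})
    for lang in languages:
        if lang in texts:
            return lang, texts[lang]["value"]
    # Last resort: any language at all, so the reader sees something.
    first = next(iter(texts.items()), None)
    if first is None:
        return None
    lang, text = first
    return lang, text["value"]


# Invented entity: an English label, but only a Dutch description.
entity = {
    "labels": {"en": {"language": "en", "value": "Example professor"}},
    "descriptions": {"nl": {"language": "nl", "value": "Nederlands hoogleraar"}},
}
print(pick_text(entity, "labels", ["nl", "en"]))        # ('en', 'Example professor')
print(pick_text(entity, "descriptions", ["nl", "en"]))  # ('nl', 'Nederlands hoogleraar')
```

Returning the language alongside the text lets the page mark which results are not in Dutch.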

Saturday, October 25, 2014

#Wikipedia - The Manley-O.-Hudson medal

One of the recipients of the Manley-O.-Hudson medal died. The article prominently mentions that Mr Lowenfeld was a recipient, and it refers to the article about the award where all the recipients are mentioned. Both articles exist only in German.

The wonderful news is that Magnus did it again; his Linked Items tool allowed me to associate many humans with this award.

If you consider international law important, all the recipients of this award are important. That is a great reason to have at least the basic information available in any language, including English.

#Wikidata - #vaccines

For Wikidata, those items that are not known to be "something" are the worst. There are many of them; the last processable dump had some 3,758,186 items without any statement. Injecting them with a healthy dose of substance makes it easier to process them.
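Counting items without any statement is a single pass over a dump. A minimal Python sketch, assuming the layout of Wikidata's JSON dumps (one JSON array with one entity per line, each line ending in a comma); a tiny in-memory sample stands in for a real dump.

```python
import io
import json
from typing import TextIO


def count_without_statements(dump: TextIO) -> tuple[int, int]:
    """Return (items seen, items with no statements) from a line-per-entity JSON dump."""
    items = empty = 0
    for line in dump:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # array brackets and blank lines carry no entity
        entity = json.loads(line)
        if entity.get("type") != "item":
            continue  # skip properties
        items += 1
        if not entity.get("claims"):  # {} or [] both count as "no statements"
            empty += 1
    return items, empty


sample = io.StringIO(
    "[\n"
    '{"type": "item", "id": "Q1", "claims": {"P31": [{}]}},\n'
    '{"type": "item", "id": "Q2", "claims": {}},\n'
    '{"type": "property", "id": "P31", "claims": {}}\n'
    "]\n"
)
print(count_without_statements(sample))  # (2, 1)
```

For a real compressed dump, the same function can be given `gzip.open(path, "rt", encoding="utf-8")`; the line-by-line reading keeps memory use flat.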

As more and more people read about Ebola, the vaccines developed for Ebola gain attention as well. In Wikidata they are now known as vaccines. I have no clue how to indicate that a vaccine is intended for a specific condition like whooping cough, measles or Ebola.

PS I loved the cartoon produced by the "Anti Vaccine Society". Do note the golden cow in the picture. :)

Thursday, October 23, 2014

#Wikipedia - One size does not fit all

In Wikipedia we are used to seeing our readers as one big group. They all read the same article, they all get the same infoboxes and they all get the same categories. It is a reasonable approach when Wikipedia is only a pile of text without data to separate out potential differences in interest.

One obvious consequence is that reasonable expectations decide what is shown and what it looks like. When there are too many categories, they no longer get attention. So which categories should be shown? The problem is that this "one size fits all" approach shows too much for some and too little for others.

Thanks to Wikidata it is possible to allow for preferences. For many categories Wikidata knows what they are about; they group humans by, for instance, their alma mater, their sports club or their gender. When our readers have the option to choose which kinds of category they are interested in, there is no longer a "need" to decide which categories to keep. It is just a matter of choosing which categories to show by default.

Any and all other kinds of category can then be selected by the reader.
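Once Wikidata says what kind of category each one is, the choice becomes a filter: a default set of kinds, replaced by whatever the reader selects. A Python sketch; the category names, their kinds and the defaults are invented for illustration.

```python
from typing import Optional

# Hypothetical: the kind of each category on one article, as Wikidata might describe it.
CATEGORIES = {
    "Alumni of Utrecht University": "alma mater",
    "AFC Ajax players": "sports club",
    "Male sociologists": "gender",
    "1950 births": "year of birth",
}

# Hypothetical default shown to readers who never set a preference.
DEFAULT_KINDS = {"alma mater", "year of birth"}


def visible_categories(categories: dict[str, str], chosen: Optional[set[str]] = None) -> list[str]:
    """Show categories whose kind the reader chose, or the default kinds."""
    kinds = DEFAULT_KINDS if chosen is None else chosen
    return sorted(name for name, kind in categories.items() if kind in kinds)


print(visible_categories(CATEGORIES))
# ['1950 births', 'Alumni of Utrecht University']
print(visible_categories(CATEGORIES, {"sports club"}))
# ['AFC Ajax players']
```

No category has to be removed from the article; only the default needs agreement.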

Tuesday, October 21, 2014

#Wikidata - Thank you Magnus

Mr A. H. Halsey is the first person who can be put to rest now that ToolScript works again. Mr Halsey was a sociologist; he died on 14 October 2014.

Thank you Magnus, you are wonderful.

Monday, October 20, 2014

#Charkop - a Vidhan Sabha constituency

Data about politics and politicians regularly finds its way to Wikidata. When an item gets my attention, I often add all associated items to Wikidata as well. Charkop is a constituency in Maharashtra; according to an associated category, there are many more.

Given that the software I use is broken at this time, I can blog about one dilemma instead.

Charkop is a Vidhan Sabha constituency; it is part of the Mumbai North Lok Sabha constituency. The question is whether Charkop "is in the administrative territorial entity" Mumbai North or Maharashtra.
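One consideration: if only the immediate parent is recorded, the state can still be derived by following the chain upwards. A Python sketch over a hand-made parent map (not live Wikidata data), standing in for "located in the administrative territorial entity" (P131) statements and assuming Mumbai North is itself recorded as being in Maharashtra.

```python
# Hypothetical parent links standing in for P131 statements.
PARENT = {
    "Charkop": "Mumbai North",
    "Mumbai North": "Maharashtra",
    "Maharashtra": "India",
}


def containing_entities(place: str) -> list[str]:
    """Follow parent links upward, nearest first; stop if a cycle appears."""
    chain: list[str] = []
    seen = {place}
    while place in PARENT:
        place = PARENT[place]
        if place in seen:
            break  # guard against bad data looping back on itself
        seen.add(place)
        chain.append(place)
    return chain


print(containing_entities("Charkop"))
# ['Mumbai North', 'Maharashtra', 'India']
```

This does not settle the dilemma itself: a Lok Sabha constituency is an electoral district, so whether that first link belongs in P131 at all is exactly the open question.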

#Google - Let us #share in the sum of all #knowledge

Dear Google, in our own ways, we share the aspiration to share in the sum of all knowledge. We are really happy to share everything we have with you. Our licenses are designed to share widely.

Dear Google, could you please help us make sure that our Labs webservices survive your bots? What we do not want is for your bots not to run. What we want is for our webservers to serve our own needs first and use all the spare capacity for you. As it is, our software dies.

We really want you to have our data, and there are several other ways in which you can get all our data anyway. For this reason, please help us with our software so that we can continue to share the sum of all our available knowledge with you.
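One way to "serve our own needs first" is to rate-limit crawler traffic separately from everyone else's. A minimal token-bucket sketch in Python; the user-agent check, the rate and the burst size are assumptions, and nothing here describes how Labs actually handles bots.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policy: crawlers share one small bucket; other users are not throttled.
crawler_bucket = TokenBucket(rate=1.0, capacity=5)


def should_serve(user_agent: str) -> bool:
    if "Googlebot" in user_agent:
        return crawler_bucket.allow()
    return True
```

A refused crawler request would get a "try again later" response such as HTTP 503 rather than a dead server, so our own users keep priority and the crawler still gets the spare capacity.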