Thursday, April 24, 2014

Adrianne.. "we can edit"

Adrianne is one of the few Wikipedians who has her own articles, and deservedly so. It is sad that she had to die to get them, and it is sad to see that some do not recognise her notability and want her article deleted. They cite policy. It used to be policy to "ignore all rules"...

Never mind, but I do. That someone who played a professional game of soccer just once is "notable" enough for an article is proof enough for me that Wikipedia has some dumb notions of notability.

To honour Adrianne, editathons are being organised and everybody is invited to join. To quote the convocation:
Her work is recognized internationally as helping to encourage more women to contribute to Wikipedia to tackle the gender gap and systemic bias in its content. Wadewitz was one of the first academics to bring Wikipedia into the classroom as part of the Wikipedia Education Program, working with her students to improve Wikipedia instead of writing traditional term papers. 
As anyone can join, we have done some work adding more humans to Wikidata and indicating their sexes. As Wikidata gains more data, its data will reflect more accurately what the sex ratio is for any given Wikipedia. There are 1.45% more humans, 1.13% more men and 1.15% more women since last Sunday. It was suggested that we harvest information from the Frau and Mann categories of the German Wikipedia, and they prove to be a rich source of data.

Obviously, knowing only the gender of a human is not really interesting. What is interesting is to know the sex ratio among the recipients of **Whatever** award, among the professionals in a field, or among the professors from 1960 to 1970. At this time Wikidata does not have enough data to have a clue. As it gains more data, the results will slowly but surely become statistically significant. I am sure some statisticians are able to say when Wikidata has enough data.
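
To give a feel for what "enough data" could mean, here is a minimal sketch. It uses the textbook normal-approximation margin of error for a proportion, which assumes a simple random sample; the people known to Wikidata are certainly not that, so treat the numbers as an illustration only. The function name marginOfError and the 15% figure are made up for this example.

```javascript
// Approximate 95% margin of error for an observed proportion p in a sample of size n.
// Assumption: simple random sampling and the normal approximation (textbook formula).
function marginOfError(p, n) {
  const z = 1.96; // z-score for a 95% confidence level
  return z * Math.sqrt((p * (1 - p)) / n);
}

// Hypothetical example: 15% women observed in groups of different sizes.
for (const n of [20, 2000, 200000]) {
  const moe = marginOfError(0.15, n);
  console.log(`n=${n}: 15% +/- ${(moe * 100).toFixed(1)} percentage points`);
}
```

With a group of 20 people the uncertainty is as large as the ratio itself; with a few thousand it shrinks to a point or two.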

I am sure Adrianne would applaud this development.

Wednesday, April 23, 2014

#Wikidata - More 2014 deaths

When you are interested in which notable people died in 2014, you will find that Wikidata is more complete than any Wikipedia. The ToolScript tool provides a wonderful opportunity to add even more people to this list.

Sadly, for many people recognition as being notable is left until the moment they die. Then again, it is a perfect moment to write a Wikipedia article as an obituary. Magnus wrote yet another script that creates an item for the articles that did not have one yet.

This is the script that finds all the items that are in need of some attention:
// Start with an empty list of Wikidata items.
all_items = ts.getNewList('','wikidata');
// Take the Wikidata item of the Italian "died in 2014" category; its sitelinks name the same category on the other Wikipedias.
cat = ts.getNewList('it','wikipedia').addPage('Category:Morti nel 2014').getWikidataItems().loadWikidataInfo();
$.each ( cat.pages[0].wd.sitelinks , function ( site , sitelink ) {
  var s = ts.getNewList(site).addPage(sitelink.title); // Page list for that site, with category
  if ( s.pages[0].page_namespace != 14 ) return ; // Namespace 14 is the category namespace; skip anything else
  // Walk the category tree and keep only the items that have no date of death (P570) yet.
  var items = ts.categorytree({language:s.language,project:s.project,root:s.pages[0].page_title,redirects:'none'}).getWikidataItems().hasProperty("P570",false);
  all_items = all_items.join(items);
} );
Yes, ToolScript comes with documentation :)
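
For those who do not have ToolScript at hand, the idea behind the script can be sketched in plain JavaScript on made-up data. This is not ToolScript and not how Magnus implemented it; the item ids, the shape of the objects and the helper names withoutProperty and join are assumptions for illustration. In particular, whether ToolScript's own join removes duplicates is a guess on my part.

```javascript
// Mock items as they might look once the Wikidata info is loaded; all ids are made up.
const italianCategory = [
  { id: 'Q-A', claims: { P31: ['Q5'], P570: ['2014-04-08'] } },
  { id: 'Q-B', claims: { P31: ['Q5'] } },
];
const germanCategory = [
  { id: 'Q-B', claims: { P31: ['Q5'] } },
  { id: 'Q-C', claims: {} },
];

// Keep the items that do NOT have the given property.
function withoutProperty(items, property) {
  return items.filter((item) => !(property in item.claims));
}

// Merge two lists, keeping each item id only once.
function join(listA, listB) {
  const byId = new Map();
  for (const item of [...listA, ...listB]) byId.set(item.id, item);
  return [...byId.values()];
}

// Collect, per category, the people without a date of death (P570).
let needAttention = [];
for (const category of [italianCategory, germanCategory]) {
  needAttention = join(needAttention, withoutProperty(category, 'P570'));
}
console.log(needAttention.map((item) => item.id)); // [ 'Q-B', 'Q-C' ]
```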

#Wikidata - Mr Ramashankar Rajbhar, member of the Lok Sabha

The elections in India are underway. Voters will elect representatives for the constituencies of the Lok Sabha. Mr Rajbhar is an incumbent; he represents Salempur in Uttar Pradesh.

When people are to vote, it makes sense to know who they vote for. Once the results of the elections are declared, it helps when information is available about the people who represent them.

As you may know, it takes considerable effort to write articles, and for many if not all representatives there is no article in every language of India. For Mr Rajbhar there is only an article in English; there are no labels in any of the languages of India.

When labels are added, these politicians can be found in their own language in Wikipedia. It is a first step. To know who represents a district, someone has to add this information. Most districts are known at Wikidata; add labels and they can be found as well. Add statements about who represents the district and people will know who to write to.

Tuesday, April 22, 2014

#Wikipedia - the search for ఈలా గాంధీ in Telugu

Mrs Ela Gandhi or ఈలా గాంధీ in Telugu does not have an article yet in the Telugu Wikipedia. This does not mean that she is not of interest to the people who seek information about her.

Now that the Telugu Wikipedia has added this one line to its common.js (it is at the bottom), people will be able to find her when they search for her. At the bottom you find the search results from Wikidata.

I think Mrs Gandhi already looks smashing in the Reasonator in Telugu :) What is missing is a label for one of the political parties of South Africa.

#Export of a #Wikidata #query

AutoList is the tool where you can operate on the results of a query based on Wikidata data.

It is quite magical that you can now actually download a file with Wikidata data. In this example, you find 22 of the more than 3000 people who Wikidata knows to have died in 2014.

Downloading the data is a new feature that was created so that people do not have to copy and paste values to a spreadsheet any more. No more typos and no more copy errors, but best of all, it is so convenient.
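
To show why a download beats copy and paste, here is a minimal sketch of turning query results into a CSV file that a spreadsheet can open. I do not know which format AutoList actually produces; the rows, the column names and the helpers csvField and toCsv are made up for illustration.

```javascript
// Quote a value when it contains a comma, a double quote or a line break (RFC 4180 style).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text: one header line, then one line per row.
function toCsv(columns, rows) {
  const lines = [columns.map(csvField).join(',')];
  for (const row of rows) {
    const values = columns.map((column) => (row[column] === undefined ? '' : row[column]));
    lines.push(values.map(csvField).join(','));
  }
  return lines.join('\r\n');
}

// Made-up rows standing in for the result of a query.
const rows = [
  { item: 'Q-1', label: 'Example, Person', died: '2014-01-02' },
  { item: 'Q-2', label: 'Another "Example"', died: '2014-03-04' },
];
const csv = toCsv(['item', 'label', 'died'], rows);
console.log(csv);
```

The quoting is the part that goes wrong when people paste by hand: a label with a comma in it ends up in two cells.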

#Wikidata - Rulan Chao Pian and the Otto Kinkeldey Award

Mrs Rulan Chao Pian was a distinguished musicologist. She was one of the first female professors at Harvard University and she received the Otto Kinkeldey Award in 1968.

Currently Wikidata knows only one recipient of the Otto Kinkeldey Award. There is no Wikipedia article for the award, and finding more recipients is not easy. Only 20 Harvard University employees are known; obviously, calculating a sex ratio based on these numbers gets you a meaningless number.

What Wikidata needs are people who are eager to provide us with the data they care for. When Sarah Stierch asks "are there lists of things I could work on", my answer is: "work on the things you know, the things you love and even the things you get paid for".

The beauty of Wikidata is that as long as statements are verifiable, it does not matter much who adds them. I have no problem with a Harvard intern adding its professors as "employees" and adding pertinent information like when they joined Harvard or became tenured professors. I welcome people from the American Musicological Society when they add all the winners of its awards. Obviously, the Russian, the Chinese and the Indonesian counterparts of the AMS are equally welcome.

When you are involved in a certain field, please make sure that your field is well represented. With tools like Reasonator, WDQ and AutoList you can find how well it is represented at any time. When you find that additional properties are needed in Wikidata, we can discuss this. But do get involved.
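
As an illustration of checking on your own field, here is a minimal sketch that builds the query strings for the sex ratio among the recipients of one award. The CLAIM, NOCLAIM and AND keywords are how I understand the WDQ query syntax, so do check the WDQ documentation before relying on them. The function name sexRatioQueries and the award number 12345 are placeholders, not a real award.

```javascript
// Wikidata identifiers used below:
// P31 = instance of, Q5 = human, P21 = sex or gender,
// Q6581097 = male, Q6581072 = female, P166 = award received.
const HUMAN = 'CLAIM[31:5]';

// Build one query per group for the recipients of a given award.
// awardItemNumber is the number of the award's item, without the leading Q.
function sexRatioQueries(awardItemNumber) {
  const recipients = `${HUMAN} AND CLAIM[166:${awardItemNumber}]`;
  return {
    all: recipients,
    male: `${recipients} AND CLAIM[21:6581097]`,
    female: `${recipients} AND CLAIM[21:6581072]`,
    unknown: `${recipients} AND NOCLAIM[21]`, // humans without any sex or gender statement
  };
}

// 12345 is a placeholder; replace it with the item number of the award you care about.
const queries = sexRatioQueries(12345);
console.log(queries);
```

Paste each string into AutoList and the counts tell you how well your field is covered, and how many people still lack a gender.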

Monday, April 21, 2014

#Reasonator - the premier of New South Wales

When you consider #quality, there are many approaches to it. When Mr Neville Wran died, it was the right moment to mark everyone who held the office of premier of New South Wales as such. That is the quantitative approach to quality.

Making sure that there are pictures for the places he is associated with, or indicating who preceded and succeeded him, provides more in-depth information, and that is quality as well.

It is good to pay some attention to the incumbent of a function. Mr Baird assumed the office on April 17th. The information about him got some tender love and care.

What is funny is the text associated with his predecessor: "43rd and current Premier of New South Wales". It demonstrates the problem with fixed text nicely. With automated descriptions that is not much of a problem; they do the trick in any language, given enough labels.
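
To make the difference with fixed text concrete, here is a minimal sketch of how a description can be composed from statements and labels. It is not how Reasonator does it; the item ids, the labels and the helpers labelFor and describe are made up for illustration.

```javascript
// Made-up items and labels; in Wikidata the labels come from the items themselves.
const labels = {
  'Q-politician': { en: 'politician', nl: 'politicus' },
  'Q-premier': { en: 'Premier of New South Wales' }, // no Dutch label yet
};

// Statements about a person refer to items, never to text.
const person = { occupation: 'Q-politician', positionHeld: 'Q-premier' };

// Look up the label in the requested language; fall back to the item id when it is missing.
function labelFor(itemId, language) {
  const known = labels[itemId] || {};
  return known[language] || itemId;
}

// Compose the description at the moment it is shown, so it cannot go stale.
function describe(subject, language) {
  return [subject.occupation, subject.positionHeld]
    .map((itemId) => labelFor(itemId, language))
    .join(', ');
}

console.log(describe(person, 'en')); // politician, Premier of New South Wales
console.log(describe(person, 'nl')); // politicus, Q-premier
```

Nothing in the statements says "current", so nothing has to be rewritten when a successor takes office; and the Dutch result shows what "given enough labels" means in practice.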

Sunday, April 20, 2014

#Wikidata - its sex ratio

In a perfect world, Wikidata knows the sex for each person where Wikipedia has pertinent information; every Wikipedia. In a perfect world you query Wikidata for the sex ratio of each Wikipedia.

As we know, the world is not perfect; Wikidata currently knows about 1,332,383 "humans", of which 760,616 are male and 154,455 are female. This makes for 57% male, 12% female and 31% unknown. Many items still need to be identified as human as well.
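
For those who want to redo the sums when the numbers change, a minimal sketch of the calculation; the function name shares is made up for this example.

```javascript
// Percentage of male, female and unknown among all known humans, rounded to whole percents.
function shares(total, male, female) {
  const unknown = total - male - female; // humans without a known sex
  const pct = (count) => Math.round((count / total) * 100);
  return { male: pct(male), female: pct(female), unknown: pct(unknown) };
}

// The numbers from this post.
console.log(shares(1332383, 760616, 154455)); // { male: 57, female: 12, unknown: 31 }
```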

With a selection like the 12,800 known Harvard alumni, we find that there are 5,359 males and 840 females. This is 42% male, 7% female and 51% unknown. Before we compiled these numbers, missing items were created for each known alumnus, and all of them were marked as human and as Harvard alumni as well.

The problem Wikidata faces is not only the underrepresentation of women; the problem is also the lack of data about the gender of known humans. The nice thing about statistics is that now that we have some numbers, we can track how Wikidata evolves in its information about the sexes.

Saturday, April 19, 2014

#Sources for causes

What to do when #Wikidata tells you there is a problem? Keep calm and cite your sources.

For a year now, people have been pouring data into Wikidata, and there is a lot of it. This data is coming from many places; among them all the Wikipedias. They do not necessarily agree on everything.

One area where it particularly makes sense to cooperate is the recently departed. Many of the people who were notable enough for an article are old, and they die. They die in droves.

As some people are described in several languages, you may find that those other Wikipedians knew about it first. So you may learn about even more deaths in the ranks. Another thing that happens is that people enter a different date... OOPS...

This is where you keep calm and cite your sources. People only die once, so this is the time to be assertive about your sources.

At this time Wikidata is happy when you sort it out, get it right and update its data accordingly. Adding sources is really appreciated, but at this time we are mostly happy when you confirm that we have the same data.

#Wikidata - Heroes of the Soviet Union

On the Russian Wikipedia, there are 10898 entries in the category for heroes of the Soviet Union. Only 9740 of them have a Wikidata item. With the Creator tool it is easy to add the missing 1158 items. It gently adds them one at a time.

Adding statements for over 10,000 heroes is a bit too much for the AutoList tool. There are several edits to make. First, every person has to be marked as human, and then they have to receive the recognition in Wikidata for the hero they are.
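
A minimal sketch of the sums involved and of the two statements each hero needs. The count of edits is an upper bound, since some items will already have one of the statements. The helper statementsFor and the ids Q-EXAMPLE and Q-HERO-AWARD are placeholders; look up the real item for the award before using anything like this.

```javascript
// Numbers from the Russian Wikipedia category.
const inCategory = 10898;
const withItem = 9740;
const missingItems = inCategory - withItem; // items still to be created

// The two statements every hero needs:
// P31 (instance of) = Q5 (human), and P166 (award received) = the award.
function statementsFor(itemId, awardId) {
  return [
    { item: itemId, property: 'P31', value: 'Q5' },
    { item: itemId, property: 'P166', value: awardId },
  ];
}

// Upper bound on the number of statement edits once every hero has an item.
const statementsPerHero = statementsFor('Q-EXAMPLE', 'Q-HERO-AWARD').length;
const upperBoundEdits = inCategory * statementsPerHero;

console.log(missingItems);    // 1158
console.log(upperBoundEdits); // 21796
```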

It is much better to use a bot for this. What clinches it is that many people on the Russian Wikipedia have a template with much more information than just this one award: dates of birth and death, places of birth and death, and the other awards they have received.

The Russian Wikipedia is a really rich resource and it will be wonderful when more of its information is reflected in Wikidata.

#Wikidata - Eli Saslow, Pulitzer Prize and George Polk Award winner

No #Wikipedia article for Mr Saslow yet though. Some work was done on the George Polk Award, and it was found that, among many others, Mr Saslow was missing.

To demonstrate the potential for quality of Wikidata, missing winners were added. Some of the issues that were found were:

  • people do not have an article
  • people are not part of the George Polk awards recipients category
  • people do not have a Wikidata item
  • some of the recipients are not people
Mr Saslow is a great example of a person who you would expect to have a Wikipedia article. But given the way the community works, he will get one once someone feels the need to write it.

When journalism and sharing information are important to you, consider this: in Wikidata the Pulitzer Prize for Explanatory Reporting currently has only one recipient: Mr Saslow. His alma mater has three alumni; two more were added for him not to be alone. His employer, a major quality newspaper, has one employee...

But still, the fact that Wikidata does know these things demonstrates that its quality is improving.

Friday, April 18, 2014

#Wikidata - awards and politics

Edward Snowden received the Ridenhour Truth-Telling Prize. For some Mr Snowden and the Ridenhour prize may be controversial. However, it does not mean that they are irrelevant or not notable. The award has been added to Wikidata together with many of its recipients.

Several of the recipients do not have a Wikipedia article and that is fine. This may change. The Ridenhour prize was named after a Mr Ridenhour. He was a journalist and he received the George Polk Prize. This was not obvious because in the article there was no reference to the category. This has been remedied.

The world is an imperfect place and we can improve it by cherishing the people who matter. By stating the obvious, by sharing in the sum of all knowledge.

PS This is the George Polk Award Recipients category, and this is its Reasonator entry.

#Reasonator - #Taxonomy, picture this

When you are on a train, a bit bored, it helps when your railway company provides you with free Wi-Fi; mine does. It is the perfect setting to add pictures of species to Wikidata. To do this, the latest tool by Magnus is really good.

The "Wikidata species images on Commons" tool is the perfect companion for those idle moments. All you do is look at pictures and decide which picture shows off a species best. When you do not like any of them, that is fine too. You also have the option to add range maps for a species.

The result of all this can be experienced in the Reasonator.