Monday, August 18, 2014

#Twitter - #WikiParliaments.. but what about #Wikidata and #Austria?

Twitter advertised several things that I might like. WikiParliaments could be one of them. Today I learned that Othmar Tödling died. He was a member of the "Nationalrat" of Austria. As such he might be very much of interest to WikiParliaments.

Politicians are human too; they die. When they do, it is often noted in a category what function they held. Today I started adding statements for those humans who hold or held the function of parliamentarian in Austria.

My hope is that people who care about parliaments will make these items even prettier and embellish them with even more statements and qualifiers.

Sunday, August 17, 2014

#MediaWiki - #MediaViewer rehashed

Some things are plain stupid; sometimes I am and sometimes someone else is. I filed a bug about my experience of the MediaViewer. For me it is a show stopper; it prevents me from using it easily.

The problem is that Chrome shows a really awful URL for an image with funny characters in its title. When I look at it using the MediaViewer it is bad but it looks fine when I look at it from the Commons page.
  • File:%C3%89cole_normale_sup%C3%A9rieure_de_Paris,_26_January_2013.jpg
  • File:École normale supérieure de Paris, 26 January 2013.jpg
According to the Bugzilla triage I must be stupid because it works; it complies with specifications and indeed, technically, it works. It just stopped working for me.
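The two titles above are the same file name; the first is simply the percent-encoded UTF-8 form of the second. A minimal sketch using Python's standard `urllib.parse` shows the round trip. The helper names `display_title` and `url_title` are my own, and treating underscores as spaces is an assumption based on the two titles shown above.

```python
from urllib.parse import quote, unquote

# The two forms quoted in this post.
encoded = "File:%C3%89cole_normale_sup%C3%A9rieure_de_Paris,_26_January_2013.jpg"
readable = "File:École normale supérieure de Paris, 26 January 2013.jpg"


def display_title(url_title_text: str) -> str:
    """Decode percent-escapes (UTF-8) and show underscores as spaces."""
    return unquote(url_title_text).replace("_", " ")


def url_title(title: str) -> str:
    """Turn spaces into underscores and percent-encode non-ASCII characters.

    The colon and comma are kept as-is so the result matches the form above.
    """
    return quote(title.replace(" ", "_"), safe=":,")


print(display_title(encoded))
print(url_title(readable))
```

Both forms identify the same file; the difference is only in what gets shown to the reader.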

Several reactions are possible. My choice was to shrug, mutter "it is the user experience, stupid" and get on with my life. Others find it a precursor to the invasion of an evil overlord who does not understand the world, and they prepare for war.

By filing a bug and by posting this blog I have rid myself of my frustrations. I know several developers; I met many of them at Wikimania and I know they are really dedicated and mean well. I also know that such things pass. I am sure someone will see the light or Google will fix Chrome (if that is where the bug lives). In the end, the result is that I do not look at images that often.

#Wikidata - sources or confidence

At this time Wikidata has more than 36,396,372 statements; these statements are associated with some 15,335,451 items. The majority of these items have fewer than five statements and, even worse, for many items it is not known what they are about.

When you consider the quality of this data, there are two schools of thought. There are those who insist on sources with every statement, and there are those who have confidence in the validity of the data because they know where it came from.

Either way, when you want to assert that a specific approach is superior, it becomes a numbers game, and understanding the relative merits is what it is all about. When something is sourced, you can be confident that it was highly probable at the time of the sourcing. There is however no certainty that the data remains stable. Confidence can be maintained by regularly comparing the data with what the source has to say.

When the data is regularly compared, it does not matter that much if Wikidata has source information itself. The source is typically one of the Wikipedias and they are said to have sources; this may provide us with enough reasons for confidence. The comparison of data increases this confidence, particularly when multiple sources prove to be in agreement.

Practically, the basic building blocks to start comparing exist. It has been done before by Amir and he produced long lists of differences. Three things are needed to establish new best practices:
  • a well defined place needs to be found where such reports can be published
  • communities need to understand that it raises confidence in their project
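To make the idea of a difference report concrete, here is a minimal sketch. It assumes both sides can be reduced to a mapping of item to property to value; the function name `compare`, the item names and the values are placeholders of mine, not real data, and this is not how Amir produced his lists.

```python
def compare(ours: dict, theirs: dict) -> list:
    """Return (item, property, our value, their value) for every disagreement.

    A value that is missing on one side is reported as None.
    """
    differences = []
    for item in sorted(ours.keys() | theirs.keys()):
        our_statements = ours.get(item, {})
        their_statements = theirs.get(item, {})
        for prop in sorted(our_statements.keys() | their_statements.keys()):
            our_value = our_statements.get(prop)
            their_value = their_statements.get(prop)
            if our_value != their_value:
                differences.append((item, prop, our_value, their_value))
    return differences


# Placeholder data, purely illustrative.
wikidata_side = {
    "item-A": {"date of birth": "1920-01-01", "date of death": "2014-06-02"},
    "item-B": {"date of birth": "1930-05-05"},
}
source_side = {
    "item-A": {"date of birth": "1920-01-01"},
    "item-B": {"date of birth": "1931-05-05"},
}

report = compare(wikidata_side, source_side)
for row in report:
    print(row)
```

Running such a comparison regularly is what keeps the confidence up; an empty report means both sides agree on everything they hold.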

#Wikidata - giving a #category an application

Many #Wikimedia categories have interlanguage links. Obviously these linked categories do not all have the same content. Someone has to add the articles; sometimes it gets done and sometimes it doesn't. Often articles just do not exist.

When the facts that are implicit in what a category is about make it to all the items in all the categories, typically you have a superset in Wikidata. It does not stop there; items in Wikidata may be included that are not in any of those linked categories.
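The superset idea can be sketched with plain sets. This assumes each linked category can be reduced to the set of items its articles are about; the wiki codes and item names below are placeholders of mine, and the helper names are hypothetical.

```python
# Placeholder membership of one category as linked across three Wikipedias.
category_members = {
    "en": {"item-1", "item-2", "item-3"},
    "de": {"item-2", "item-4"},
    "nl": {"item-1"},
}


def superset(members_by_wiki: dict) -> set:
    """Everything that is in the category on at least one wiki."""
    return set().union(*members_by_wiki.values())


def missing_per_wiki(members_by_wiki: dict) -> dict:
    """Per wiki, the items in the superset that its own category lacks."""
    everything = superset(members_by_wiki)
    return {wiki: everything - members for wiki, members in members_by_wiki.items()}


print(sorted(superset(category_members)))
for wiki, missing in missing_per_wiki(category_members).items():
    print(wiki, sorted(missing))
```

The per-wiki difference is the interesting part: it lists what is either not yet categorised or not yet written in that Wikipedia.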

This is all theoretical unless ... unless you can query Wikidata and use the results. Much data has been added to Wikidata based on the content of categories, and queries have been used to identify missing items; this is done using AutoList2. This is one application; it is used by some of the "advanced" users of Wikidata.

What is even more interesting is showing what Wikidata things should be in a category. This is done using Reasonator. At this time, over 690 categories include statements that define a query. This query is already complex enough that the Wikidata functionality will not be able to express the results.

These queries could be of use to "advanced" Wikipedians because they are a basis for identifying articles that have not been categorised or articles that still need to be written in their Wikipedia. For everyone else it is just interesting; this information exists and it is readily available. It is one way of learning that Wikidata knows, for instance, about 121,922 politicians.

Saturday, August 16, 2014

#Wikidata - application for its long tail

When Lauren Bacall died this week, it was all over the news. When Marjorie Stapp died on June 2, 2014 it was noted in the English Wikipedia only yesterday. Today it is known to Wikidata, and several bits of information were added to the item about Mrs Stapp as well.

Among those statements is her identifier in the IMDB. The IMDB does not know yet about the demise of Mrs Stapp and it is not unlikely that there are more actors and actresses we know about that have died. Providing external sources like the IMDB with an RSS feed of the changes that are made in Wikidata is not hard.

When we share our information in this way, we gain friends. With these new friends we may do friendly things like noting differences between the data that we hold. Equally important, we add a reason why people might maintain the data that is in Wikidata. As our data gains in application, we will grow and diversify our community.
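As for the RSS feed of changes mentioned above, here is a minimal sketch of what producing one could look like, using only Python's standard `xml.etree.ElementTree`. The shape of a change record, the channel details, the example.org link and the identifier "nm0000000" are all assumptions of mine; how the changes are actually collected from Wikidata is left out.

```python
import xml.etree.ElementTree as ET


def build_feed(changes: list) -> str:
    """Build a small RSS 2.0 document with one <item> per change record."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Changes for items with an external identifier"
    ET.SubElement(channel, "link").text = "https://example.org/feed"  # placeholder
    ET.SubElement(channel, "description").text = "Sketch of a change feed"
    for change in changes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = (
            f"{change['label']}: {change['property']} = {change['value']}"
        )
        # The external identifier lets the receiving party match the record.
        ET.SubElement(item, "guid", isPermaLink="false").text = change["external_id"]
    return ET.tostring(rss, encoding="unicode")


# One hypothetical change record; the identifier is a placeholder.
changes = [
    {
        "label": "Marjorie Stapp",
        "property": "date of death",
        "value": "2014-06-02",
        "external_id": "nm0000000",
    }
]

feed = build_feed(changes)
print(feed)
```

The receiving side only needs the external identifier and the changed statement to decide whether its own record needs a look.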

Thursday, August 14, 2014

#Wikimedia - the quality of access to the sum of all human knowledge

Again, a big flare up of "we the community" demand this and that. Again, what Wikipedia and the Wikimedia Foundation are about is conveniently forgotten. At Wikimania there was a really interesting presentation by Raph Koster, author of "A Theory of Fun for Game Design". Well recommended once it is available for viewing.

An abstraction of the current hoo-ha is in there, and this community is described as the monsters who rule it all (my words, his pictures). These people who impose their world on others have forgotten what the game is about. It is about providing access to the sum of all knowledge. From that perspective their issues with the multimedia viewer are hardly significant compared with the increased ease for people who just access the parts of human knowledge we do give access to.

My pet example of "the community" not caring about providing access to our available knowledge is the decision that easy and obvious access to fonts adds clutter to the user interface and is therefore not acceptable. About seven percent of a population is dyslexic and it is extremely hard to find and enable the OpenDyslexic font. It took a MediaWiki developer over two minutes and he enabled it in a way I did not know existed. He knew it existed; he knew the name of the font. This demonstrates how relevant seven percent of our reader population is to our community.

Should we primarily care about access or is it a playground for monsters?

#Wikidata - It ain't got a thing

A rose, a rose, is a rose by any other name as beautiful. Eh, actually people are quite smart and know a rose when they see one. Machines need to be told what a rose is.

Wikidata has this requirement of being usable by machines. So we need to know what thing a thing is and for all humans it needs to be stated that all of them are considered human.

Several high powered people at Wikimania expressed the opinion that for Wikidata to get in full swing, we have to identify every thing.

I have identified a few hundred "list articles". Items that start with "List of " or "Member of " for instance. I have identified a lot of "group of people" who were supposed to be born in the XXth century.

At Wikidata a thing is bad. We cannot safely select it, we cannot auto describe it. We should get rid of every thing.

Saturday, August 09, 2014

#Wikidata - Dear Lila, it is all about the application

Three months into the job, Lila did an analysis of where we are with our projects. The way she presented it was very much traditional; Wikipedia, and English Wikipedia at that. The challenges were not that traditional; much of the public will be mobile and they will not be where they are today.

Another requirement is that all the new people need to be able to contribute. Removing the existing road blocks is absolutely necessary.

When people are to contribute, they have to have a reason to contribute. They will need to benefit from the effort. This year Commons will be wikidatafied and it will become possible to search in multiple languages. The Amnesty International community may add the people on their watch list to Wikidata. In this way what we do in Wikidata gets more of an application.

When we start thinking in terms of how people will be able to use the data we have in store for them, we will find more contributors. Their data will become better connected. The value of our data will increase and we will realise the aspiration of more people in more countries being involved in what we do. We will not only share the sum of our available information; they will put it to use for us.

Friday, August 08, 2014

Dear #Wikipedia, they are not what we call a "human"

At #Wikidata it hurts when you are cheating. For us a human is singular; he or she has a date of birth, maybe a date of death and that is what we expect to find in "20th-century births" and all its subcategories.

We could argue that a horse, a cat or a dog has a date of birth as well but really, Scott Alexander and Larry Karaszewski, for instance, are not one human, nor are they singular. Together they do not have a date of birth; they have two.

Because of the problems articles like the one about Scott and Larry generate, we put them on a "black list". We make them a "group of people". In this way we will not consider them for all kinds of subsequent statements. We will not make them an alumnus or give them an occupation. That is reserved for humans.

Thursday, August 07, 2014

#Wikimania - Mr Salil Shetty, #Amnesty International

Mr Shetty spoke at Wikimania 2014. He explained how much Amnesty International and the Wikimedia Foundation have in common. He did a good job and many of the people at the conference proved to have been involved with campaigns of Amnesty in the past. One of the things it does is ask for international attention for people who are in trouble; they are often in jail, they have been tortured and, as the record shows, attention helps.

What we could do at Wikidata is decide that all the people who need attention for reasons valid to Amnesty International are notable enough to have an item. We would have all the notable information about these people; this includes their profession, the fact that AI considers them at risk, and links to more information, for instance at the website of AI. This provides the basic information when people decide that there are enough reasons to write a Wikipedia article.

All too often the tipping point for writing an article is the death of a person on a list like this. My hope is that there will be few of these occasions when an article actually gets written.