Saturday, November 26, 2016

The problem with #science explained with #Wikipedia

It is a recurring theme. People study a subject and reality turns out to be different. The science is flawless, the results are impressive and indeed important strides forward are made. The study of heart disease is a great example; many studies resulted in an improved life expectancy for men, particularly white men. The Dutch Hartstichting is raising funds for new research because of this existing bias in research. For women in the Netherlands, heart disease is the number one killer because heart disease is different in women; it was not noticed before because heart disease in women was not studied.

Wikipedia as it is commonly known in research has the same problem. It is not Wikipedia as we know it, it is English Wikipedia. My contributions to Wikipedia have not been to English Wikipedia; they went to the Dutch Wikipedia, and I will not be noticed as one of the most prolific contributors to Wikimedia projects because my contributions to "Wikipedia" are hardly significant.

As I blogged before, scientific papers do not get published when they do not involve English Wikipedia. The consequence is that when people quote research, their quotes include this bias, and strictly speaking what they quote is not necessarily true when you consider Wikipedia as a whole. The problem with biased research is that the policies of the WMF are based on the known "facts".

Nothing new so far. We all know it when we are honest. So what can we do to remove some of the bias? The first thing is to devalue any and all research that is English Wikipedia only. It covers less than half of what we do. The second thing is to evaluate research for its algorithms. When both the algorithms and the data are available, it is possible to run the algorithm on a more inclusive data set and check the validity. With the quality of Wikidata data as a source on all the Wikipedias improving, such an approach is increasingly feasible. The last thing is for the Wikimedia Foundation itself to address this bias. With English Wikipedia being less than 50% of its traffic and workflow, it would be good if a similar percentage of its efforts were focused on the bigger half of what we all do.

So what is the harm? We expect all Wikipedians largely to do what "Wikipedians" do. However, we are not all English Wikipedians. The needs other people have are not discussed, not taken seriously. We have seen wonderful examples of potential functionality showcased, but they are not taken further, not taken into production, because they do not fit the preconceived ideas of what we do; they are not part of the road map. The projects in Wikidata are not about Wikidata but about how to make us all into one big data glob, and USING the data is only seen in relation to Wikipedia articles. We do not know how much Wikidata is used; some studies are done but they are in relation to "Wikipedia" and that is not relevant to me. We find that Wikisource gains more and more content that may be valuable to our readers, but we do not market this data because we never did marketing for Wikipedia. There are several websites that only do this, in a way that could be much improved if we took Wikisource seriously.

It hurts us to only consider English Wikipedia and this bias in research and policy is more damaging than the bias that is considered by the English Wikipedians.

Wednesday, November 23, 2016

#Bias in #research

Actually, it starts with something else. You need to publish, so you have to select a subject to study that will be of interest to the publisher.

As a consequence hardly any research is done about the other Wikipedias. I have been informed by a reliable source that it has to be English or it will not be published.

Now Wikimedia Foundation, how about that? Is there any research done on Wikipedia or is all the research biased in this way?

Tuesday, November 01, 2016

#Wikidata year 4; What Gupta year is that?

Wikidata is celebrating its fourth birthday. It is celebrated with some mighty fine gifts. It is a time to reflect on what has gone before and what is ahead of us. Obviously there are challenges we face, and my gift is a set of queries / questions I do not know how to address. I focus on the Gupta empire because it currently has my interest.

During the era of the Gupta empire there was a "Gupta year". An article refers to it and my first question is: what date would the birthdate of Wikidata be in Gupta years?
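
A rough way to approach the question, as a minimal sketch in Python. It assumes the Gupta era is counted from an epoch of about 319 CE (sources differ by a year or so) and that Wikidata went live in 2012; both values, and the function name `era_year`, are assumptions for illustration and not something the article states. It also ignores where the Gupta new year falls, so the result can be off by one.

```python
# Assumption: the Gupta era is counted from roughly 319 CE; sources differ.
ASSUMED_GUPTA_EPOCH_CE = 319


def era_year(ce_year, epoch_ce_year):
    """Elapsed years of an era whose year 0 falls in `epoch_ce_year` CE.

    Hypothetical helper: it only subtracts the epoch and does not account
    for the day on which the era's new year starts.
    """
    return ce_year - epoch_ce_year


# Assumption: Wikidata's birth year is 2012.
wikidata_birth_year_ce = 2012
print(era_year(wikidata_birth_year_ce, ASSUMED_GUPTA_EPOCH_CE))
```

The same function works for any other era once its epoch is known, which is the point: the epoch is data, the algorithm stays the same.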

Obviously there are many maps including the Gupta empire. Can I have them sorted by date please? What other countries bordered the Gupta empire? Who were its rulers and how does the map change over time?

To get answers is nice, but for me it is important that the algorithms involved are relevant to any country, old and new, and to any timeline, old and new. When we can express dates in the "Year Gupta", we can check if dates in Wikidata are indeed Julian or maybe Gregorian.
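
Checking whether a date is Julian or Gregorian comes down to converting both calendars to a common day count and comparing. A minimal sketch in Python, using the well-known integer formulas for the Julian Day Number; the function names are my own and this is not how Wikidata itself stores or converts dates.

```python
def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


# The last day of the Julian calendar in Rome, read as a Gregorian date.
print(jdn_to_gregorian(julian_to_jdn(1582, 10, 4)))
```

When the same event is recorded with two different dates, converting one of them from Julian to Gregorian and finding that they then agree is a strong hint about which calendar each source used.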

When we have continuity in maps over time, we will know which country, which culture, a location was part of, be it a city or the land of a tribe.

Wikidata live long and prosper :)

Saturday, October 29, 2016

#Wikidata - Queen Kumaradevi

Queen Kumaradevi was married to Chandragupta I. According to Wikipedia she was of the Licchavi clan. The coin shows her with her husband; it was minted by their son.

When you read Wikipedia, you will read about daughters of kings married off to nobility. These marriages paint a picture of alliances; they often meant some stability in an often brutal world.

When you are interested in such things, western nobility is well documented. Not so for the nobility of India. I have lately added a series of maharajahs, kings and emperors, and every time I am amazed that nobody beat me to it. I often document who was related to whom, and when I find missing links documented I add items for them. Regularly the missing links are implied but miss a generation.

I am sure of one thing; India has its fair share of people who know and care about such things. How do we get them interested, how do we get proper information about all this in Wikidata?

Sunday, October 23, 2016

Kigeli V, Mwami of Rwanda

Kigeli was the last ruling Mwami of Rwanda. He died October 16.

When a last ruler dies, it follows that there were previous rulers, and there is a lot that is of interest in the history of the mwamis. His father, for instance, was deposed because he refused to become Catholic.

I have added the rule of several mwamis to Wikidata because such basic information is often lacking. Wikipedia articles are often stubs at best and sources are often absent.

Typically a monarch is part of a dynasty. A new dynasty often represents a new family, but certainly a change big enough for it to be recognised as such. The article on the kingdom of Rwanda describes the role of the mother of a king. These mothers are as yet unknown to us and consequently a lot of relevant information is missing.

When you see all those red links, it is obvious that significant red links exist in any language. When they are linked to Wikidata, adding information like the succession of rulers and who is related to whom becomes a task that can be done once and be done well. It is one way to emancipate information that has been of little concern to Wikipedias.

Saturday, October 22, 2016

#Wikidata - statements are doing fine

As of September there are more Wikidata items with 10 or more statements than items with no statements. Wikidata is growing up.

Thursday, September 29, 2016


I read an article, found what was written astounding and signalled that I had to read it again to really understand what is said and what it implies. The article was published in a quality newspaper, the Independent. The reply that I got was: "Indeed. And it's Fisk, so you can't just pretend it is an obscure journalist talking about something that may have happened..."

As I did not know Robert Fisk, I looked him up. I checked his Wikipedia article and found that he indeed has a really good reputation. He received many more awards than were known at Wikidata, so I added several, and it is fun to establish the quality of their sources. For the Lannan Cultural Freedom Prize the Lannan website says it all. It is linked on the item for the award and that should suffice. For the Amnesty International UK Media Award it is not so obvious. It is conferred by the UK branch of Amnesty International, which has no dedicated page for the award. I added the award and the chapter, and had a look at the pages for the award ceremony for each year. These Wikipedia articles refer to webpages that no longer exist.

For the Lannan Cultural Freedom Prize I added the other recipients because it gives some insight into the relevance of the award. I did not do this for the Martha Gellhorn Prize for Journalism.

The point of all this is that reputation amounts to trust in the message that is written. Read the article; it is likely that you are not familiar with the Wahhabi belief, a subset of Sunni Islam that is practiced in Saudi Arabia. The article is about 200 Sunni scholars who denounce the Wahhabi belief. Several major scholars are involved. Have a read and have a think; the article is by a major journalist, published in a major newspaper, about something that is not without consequences.