Discussion:
[Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons
James Heald
2018-09-27 21:34:31 UTC
This recent announcement by the Structured Data team perhaps ought to be
quite a heads-up for us:

https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Commons_-_how_to_structure_coverage

Essentially the team has given up on the hope of using Wikidata
hierarchies to suggest generalised "depicts" values to store for images
on Commons, to match against terms in incoming search requests.

i.e. if an image is of a German Shepherd dog, and identified as such,
the team has given up on trying to infer in general from Wikidata that
'dog' is also a search term that such an image should score positively with.

Apparently the Wikidata hierarchies were simply too complicated, too
unpredictable, and too arbitrary and inconsistent in their design across
different subject areas to be readily assimilated (before one even
starts on the density of bugs and glitches that then undermine them).

Instead, if that image ought to be considered in a search for 'dog', it
looks as though an explicit 'depicts:dog' statement may need to be
present, in addition to 'depicts:German Shepherd'.

Some of the background behind this assessment can be read in
https://phabricator.wikimedia.org/T199119
in particular the first substantive comment on that ticket, by Cparle on
10 July, giving his quick initial read of some of the issues using
Wikidata would face.

SDC was considered a flagship end-application for Wikidata. If the data
in Wikidata is not usable enough to supply the dogfood that project was
expected to rely on, that should be a serious wake-up call, a red flag
we should not ignore.

If the way data is organised across different subjects is currently too
inconsistent and confusing to be usable by our own SDC project, are
there actions we can take to address that? Are there design principles
to be chosen that then need to be applied consistently? Is this
something the community can do, or is some more active direction going
to need to be applied?

Wikidata's 'ontology' has grown haphazardly, with little oversight, like
an untended bank of weeds. Is some more active gardening now required?

-- James.



Stas Malyshev
2018-09-27 22:23:26 UTC
Hi!
Post by James Heald
Apparently the Wikidata hierarchies were simply too complicated, too
unpredictable, and too arbitrary and inconsistent in their design across
different subject areas to be readily assimilated (before one even
starts on the density of bugs and glitches that then undermine them).
The main problem is that there is no standard way (or even defined small
number of ways) to get the hierarchy that is relevant for "depicts" from
current Wikidata data. It may even be that for a specific type or class
the hierarchy is well defined, but the sheer number of different ways it
is done in different areas is overwhelming and ill-suited for automatic
processing. Of course, questions like whether "cat" is the common name of
an animal or a taxon, and which one of these will be used in depicts, add
complexity too.

One way of solving it is to create a special hierarchy for "depicts"
purposes that would serve this particular use case. Another way is to
amend existing hierarchies and meta-hierarchies so that there would be
an algorithmic way of navigating them in a common case. This is
something that would be nice to hear about from people that are
experienced in ontology creation and maintenance.
Post by James Heald
[...] Are there design principles
to be chosen that then need to be applied consistently? Is this
something the community can do, or is some more active direction going
to need to be applied?
I think this is very much something that the community can do.
--
Stas Malyshev
***@wikimedia.org
Thad Guidry
2018-09-28 02:41:23 UTC
James,

It looks like a lot of that Phabricator issue was around taxa? For the
Poodle to show a class of Mammal...

Seems like many of these could be answered if someone responded to
https://www.wikidata.org/wiki/User:Danyaljj on their last question, about
whether an "OR" could be used with linkType in gas:service. No one
answered that final question here:
https://www.wikidata.org/wiki/Wikidata:Request_a_query/Archive/2017/01#Timeout_when_finding_distance_between_two_entities

I tried myself to answer that question and find either Parent Taxon OR
Subclass of a Poodle, but couldn't pull it off using gas:service, even
after an hour of trial and error in many forms, including duplicating the
program twice ...

http://tinyurl.com/yb7wfpwh

#defaultView:Graph
PREFIX gas: <http://www.bigdata.com/rdf/gas#>

SELECT ?item ?itemLabel
WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q38904 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 10 ;
                gas:linkType wdt:P279 .
  }
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q38904 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 10 ;
                gas:linkType wdt:P171 .
  }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
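
One possible way to express the "Parent Taxon OR Subclass of" step, offered as an untested sketch rather than as an answer to the gas:service question: standard SPARQL 1.1 property paths allow alternatives (`|`) and repetition (`*`), so both link types can be followed within a single path. The Python helper below only builds the query text. The function name is hypothetical; Q38904, P279 and P171 are taken from the query above. Unlike SSSP, a property path reports no depth, and whether the query finishes within the query service's timeout has not been checked.

```python
def build_ancestor_query(item_id, link_properties):
    """Build a SPARQL query that follows any of the given properties upward."""
    if not link_properties:
        raise ValueError("at least one link property is required")
    # "wdt:P279|wdt:P171" means: follow either property at every step.
    alternatives = "|".join(f"wdt:{pid}" for pid in link_properties)
    return "\n".join([
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "SELECT DISTINCT ?item WHERE {",
        f"  wd:{item_id} ({alternatives})* ?item .",
        "}",
    ])


query = build_ancestor_query("Q38904", ["P279", "P171"])
print(query)
```

Because `*` also matches the zero-length path, the starting item itself appears in the results.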
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
Ettore RIZZA
2018-09-29 07:48:29 UTC
Hi,

Wikidata's ontology is a mess, and I do not see how it could be
otherwise. While the creation of new properties is controlled, any fool can
decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer a
human or is part of family. Maybe I'm a fool too? I wanted to remove the
claim that a ship <https://www.wikidata.org/wiki/Q11446> is an instance of
"ship type" because it produces weird circular inferences in my
application; but maybe that makes sense to someone else.

There will never be a universal ontology on which everyone agrees. I wonder
(sorry to think aloud) if Wikidata should not rather facilitate the use of
external classifications. Many external ids are knowledge organization
systems (ontologies, thesauri, classifications ...). I dream of a simple
query that could search, in Wikidata, "all elements of the same class as
'poodle' according to the classification of imagenet
<http://imagenet.stanford.edu/synset?wnid=n02113335>".
Thad Guidry
2018-09-29 14:38:10 UTC
Ettore,

Wikidata has the ability to crowdsource... unfortunately, it is not
effectively utilized.

It's because Wikidata does not yet provide a voting feature on
statements, where the more votes a statement has, the more resistance
there is to changing it.
But that breaks the notion of a "wiki" for some folks.
And there we circle back to Gerard's age-old question: should
Wikidata really be considered a wiki at all, for the benefit of society?
Or should it apply voting/resistance to keep it tidy, factual and less
messy?

We have the technology to implement voting/resistance on statements. I
personally would utilize that feature and many others probably would as
well. Sending the low-voted facts back out to applications like
OpenRefine, or to something like the recently sent-out survey vote
mechanism for spam analysis, could highlight where things are untidy and
let people cast votes to clean them up.

"...the burden of proof has to be placed on authority, and it should be
dismantled if that burden cannot be met..."

-Thad
+ThadGuidry <https://plus.google.com/+ThadGuidry>
Ettore RIZZA
2018-09-29 16:58:37 UTC
Hi Thad,

I understand that an open Wiki has its advantages and disadvantages (I
sometimes prefer a system like StackOverflow, where you need a certain
reputation to do some things). I am afraid that a voting system simply
favors the opinions shared by the majority of Wikidata editors, namely a
Western worldview. And even within this subgroup opinions may legitimately
differ.

But there may be ways to avoid messing up the ontology while respecting the
wiki spirit. For example, a warning pop-up every time you edit an
ontological property (P31, P279, P361...). Something like: "OK, you added
the statement 'a poodle is an instance of toy'. Do you agree that a
poodle is now a goods, a work, an artificial physical object?"

But that would only work for manual edits...
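
A minimal sketch of how such a warning could be computed, assuming the class hierarchy is available as a simple mapping from each class to its parents. The toy edges mirror the poodle example above and are not real Wikidata statements; `ancestors` and `preview_edit` are hypothetical names. The idea is to compare what the item inherits before and after the proposed statement and show only the difference.

```python
# Toy "instance of / subclass of" edges taken from the example above; not real data.
PARENTS = {
    "poodle": {"dog"},
    "toy": {"goods"},
    "goods": {"work"},
    "work": {"artificial physical object"},
}


def ancestors(graph, start):
    """Return every class reachable upward from start; cycles are tolerated."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def preview_edit(graph, subject, new_parent):
    """List the classes that subject would newly belong to after the edit."""
    before = ancestors(graph, subject)
    after = before | {new_parent} | ancestors(graph, new_parent)
    return sorted(after - before)


gained = preview_edit(PARENTS, "poodle", "toy")
print(f'Adding "poodle -> toy" also makes poodle a: {", ".join(gained)}')
```

The same check could in principle run on bot edits as well, although how it would be wired into any editing interface is left open here.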
Gerard Meijssen
2018-09-29 17:25:26 UTC
Hoi,
There is also the age-old conundrum where some want to enforce their rules
for the good of all because (argument of the day follows).

First of all, Wikidata is very much a child of Wikipedia. Wikipedia has its
own structures, and people have endeavoured to build those same structures in
Wikidata, never mind that it is a very different medium and never mind that
there are 280+ Wikipedias that might consider things to be different. The
start of Wikidata was also an auspicious occasion where it was thought to
be OK to adopt an external German authority. That proved to be a disaster
and there are still residues of this awful decision. It did not take long to
show the shortcomings of this schedule and it was replaced by something
more sensible.

However, we got something really Wiki and it was all too wild. It did not
take long for me to ask for someone to explain the current structures, and
nobody volunteered. So I did what I do best: I largely ignored the results of
the classes and subclasses. It does not work for me. It works against me, so
my current strategy is to ignore this nonsense and concentrate on including
data. The reason is simple: once data is included, it is easy to slice it,
dice it, and structure it as we see fit at a later date.

So when making our data reusable and more open becomes our priority, we
should agree on it. So far we have not, because we choose to fight each other.
Some have ideas, some have invested too much in what we have at this time.
When we are to make our data reusable, we should agree on what exactly we
aim to achieve. Is it to support Commons, or is it to support some external
standard that is academically sound? I would always favour what is
practical and easily measured.

I would support Commons first. It has the benefit that it will bring our
communities together in a clear objective. It has the benefit that changes
in the operations of Wikidata support the whole of the Wikimedia universe,
and consequently financial, technical and operational needs and
investments are easily understood. It also means that all the bureaucracy
that has materialised will be shown to be in the way when it is.

So my question is not whether we are a Wiki; my question is whether we are
Wiki enough and willing to change our ways for our own good.
Thanks,
GerardM
Daniel Kinzler
2018-10-17 14:04:15 UTC
My (very belated) thoughts on this issue:

Wiki content grows in a messy way, and it stays messy until the messiness causes
problems. Once it causes problems, people are motivated to clean it up.

I propose to implement hierarchical search based on very simple, predictable
rules, e.g. by having a configurable list of transitive relationships that get
evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and
subspecies at first.
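
The rule proposed above could look roughly like the following sketch. Everything specific in it is an assumption made for illustration: the relation names, the depth limits, the toy statements and the function name `expand`. It also assumes each configured relation is followed on its own, without mixing relation types along one path, which is only one of several possible readings of "simple, predictable rules".

```python
from collections import deque

# Hypothetical configuration: relation name -> maximum number of hops to follow.
TRANSITIVE_RELATIONS = {"subclass of": 3, "located in": 2, "parent taxon": 1}

# Toy statements (subject, relation, object); illustrative only, not real data.
STATEMENTS = [
    ("German Shepherd", "subclass of", "dog"),
    ("dog", "subclass of", "pet"),
    ("Dresden", "located in", "Saxony"),
    ("Saxony", "located in", "Germany"),
    ("Germany", "located in", "Europe"),
]


def expand(term, statements, config):
    """Return term plus everything reachable within each relation's depth limit."""
    matches = {term}
    for relation, max_depth in config.items():
        # Index the edges of this one relation only.
        edges = {}
        for subject, rel, obj in statements:
            if rel == relation:
                edges.setdefault(subject, []).append(obj)
        # Breadth-first walk that stops expanding once the depth limit is hit.
        queue = deque([(term, 0)])
        seen = {term}
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        matches |= seen
    return matches


print(sorted(expand("German Shepherd", STATEMENTS, TRANSITIVE_RELATIONS)))
print(sorted(expand("Dresden", STATEMENTS, TRANSITIVE_RELATIONS)))
```

With the depth limit of 2 for "located in", Dresden reaches Saxony and Germany but not Europe, which is the kind of predictable, explainable cut-off the proposal relies on.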

Doing this will NOT produce good results. You would have to implement a lot of
special cases and heuristics to work around dirty data. I say: let it produce
bad results, tell people why the results are bad, and what they can do about it!

The Wikimedia community is AMAZING at making good use of whatever capabilities
the software has, and at adapting content to make the software produce the
results they want. By providing limited but clearly defined software support
for hierarchical search, we allow the community to optimize the content to
work with that search.
Keeping the rules simple means that other consumers can then follow the same
rules, and the content will work for them as well.

-- daniel
--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Luca Martinelli
2018-10-18 13:11:54 UTC
Post by Daniel Kinzler
I say: let it produce bad results, tell people why the results are bad, and
what they can do about it!
TL;DR: let's produce bad results, and let's analyse those results to
find the best practical solution we can come up with.

I totally agree with Daniel here. It is definitely a red flag that we
should tackle head-first, but we need data first. We need to know
*where* the ontology fails, *why* it fails, and *how* we can fix it.

Now is probably the best time to talk about this, not just because
we have a potential big application such as Structured Data, but also
because we have focused on other not-so-easy problems, such as dealing with
isolated sitelinks/projects and trying to establish relations between
items, and between items and other databases.

What we need to do IMHO is to find whatever best practical solution we
have at hand, in order to primarily use it on Wikimedia projects. My
only fear is that such discussions may end up in a swamp because of
"that one user" who doesn't want to apply that particular solution
(not accusing anyone in particular, I've been that user too in some
discussions). Anyway, if we start from data, we can come up with some
solution.

L.
Peter F. Patel-Schneider
2018-10-18 17:05:47 UTC
Post by Daniel Kinzler
[...]
I say: let it produce bad results, tell people why the results are bad, and
what they can do about it!
[...]
-- daniel
My view is that there is a big problem with this for industrial use of Wikidata.

I would very much like to use Wikidata more in my company. However, I view it
as my duty in my company to point out problems with the use of any technology.
So whenever I talk about Wikidata I also have to talk about the problems I
see in the Wikidata ontology and how they will affect use of Wikidata in my
company.

If Wikidata is going to have significant use in my company there needs to be
at least some indication that the problems in Wikidata are being addressed. I
don't see that happening at the moment.


What is the biggest problem I see in Wikidata? It is the poor organization of
the Wikidata ontology. To fix the ontology, beyond doing point fixes, is
going to require some commitment from the Wikidata community.


Peter F. Patel-Schneider
Nuance Communications
Daniel Kinzler
2018-10-18 18:13:13 UTC
Post by Daniel Kinzler
[...]
I say: let it produce bad results, tell people why the results are bad, and
what they can do about it!
[...]
-- daniel
Post by Peter F. Patel-Schneider
My view is that there is a big problem with this for industrial use of Wikidata.
[...]
What is the biggest problem I see in Wikidata? It is the poor organization of
the Wikidata ontology. To fix the ontology, beyond doing point fixes, is
going to require some commitment from the Wikidata community.
I agree. And I think the best way to achieve this is to start using the ontology
as an ontology on Wikimedia projects, and thus expose the fact that the ontology
is broken. This gives an incentive to fix it, and examples of what things should
be possible using that ontology (namely, some level of basic inference).
--
Daniel Kinzler
Principal Software Engineer, MediaWiki Platform
Wikimedia Foundation
Markus Kroetzsch
2018-10-18 21:33:51 UTC
+1 to Daniel

And, on another note, there is also a huge misunderstanding exposed in
the discussion on the search-related tracker item [1]: Cparle there
speaks about "traversing the subclass hierarchy" but is actually looking
at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant
to users who care about clarinets. But surely that's the wrong
direction! You have to look for *sub*classes to find special cases of
what you are looking for. Looking downwards will often lead to much
saner ontologies than when turning your head towards the dizzy heights
of upper ontology. Yes, the few of us looking for instances of "logical
consequence" will still get clarinets, but those who look for instances
of clarinet will merely see instances of alto clarinet, piccolo
clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to
suggest meaningful "upper concepts" to Commons editors, one could simply
enable the use of lower concepts in search. It does not work in all
cases yet, but it does in many.

There are still problems (such as the biological taxonomy being modelled
as a hierarchy of names rather than animal classes, placing dog far away
from mammal), but it is still always much easier to come up with a sane
organisation for the *sub*classes of a concrete class.
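
To make the direction argument concrete, here is a small sketch of search that expands the query term downward instead of expanding each image's "depicts" value upward. The subclass edges, file names and function names are all hypothetical toy data loosely based on the clarinet example, not real Wikidata or Commons content.

```python
# Toy subclass edges (child -> parent); illustrative only, not real data.
SUBCLASS_OF = {
    "alto clarinet": "clarinet",
    "piccolo clarinet": "clarinet",
    "basset horn": "clarinet",
    "clarinet": "woodwind instrument",
}

# Hypothetical Commons files with a single "depicts" value each.
DEPICTS = {
    "File:A.jpg": "basset horn",
    "File:B.jpg": "clarinet",
    "File:C.jpg": "woodwind instrument",
}


def subclasses(term, subclass_of):
    """Return term and all of its direct and indirect subclasses."""
    # Invert the child -> parent mapping so we can walk downward.
    children = {}
    for child, parent in subclass_of.items():
        children.setdefault(parent, []).append(child)
    found = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term):
    """Match files whose depicted item is the term or one of its subclasses."""
    wanted = subclasses(term, SUBCLASS_OF)
    return sorted(f for f, item in DEPICTS.items() if item in wanted)


print(search("clarinet"))
print(search("woodwind instrument"))
```

A search for "clarinet" finds the basset horn image without anyone having to store 'depicts:clarinet' on it, and nothing above "clarinet" in the hierarchy is ever consulted.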

FYI, I recently gave a talk about ontological modelling in Wikidata that
discussed some of the current issues:
https://iccl.inf.tu-dresden.de/web/Misc3058/en (the audience there were
ontology design pattern researchers).

Cheers,

Markus

[1] https://phabricator.wikimedia.org/T199119
[2] http://tinyurl.com/y7tvkuzk
Post by Daniel Kinzler
Wiki content grows in a messy way, and it stays messy until the messiness causes
problems. Once it causes problems, people are motivated to clean it up.
I propose to implement hierarchical search based on very simple, predictable
rules, e.g. by having a configurable list of transitive relationships that get
evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and
subspecies at first.
Doing this will NOT produce good results. You would have to implement a lot of
special cases and heuristics to work around dirty data. I say: let it produce
bad results, tell people why the results are bad, and what they can do about it!
The Wikimedia community is AMAZING at making good use of whatever capabilities
the software, and adapting content to make the software produce the results they
want. By providing limited but clearly defined software support for hierarchical
search, we allow the community to optimize the content to work with that search.
Keeping the rules simple means that other consumers can then follow the same
rules, and the content will work for them as well.
-- daniel
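
A minimal sketch of the "configurable list of transitive relationships
evaluated to a certain depth" idea, in Python. The relation names, the toy
data, and the function name broader_terms are all assumptions made for
illustration; nothing here reflects an actual Commons or Wikidata
implementation.

```python
from collections import deque

# Hypothetical toy data: relation name -> {item: [more general items]}.
EDGES = {
    "subclass of": {
        "basset horn": ["clarinet"],
        "clarinet": ["woodwind instrument"],
        "woodwind instrument": ["musical instrument"],
    },
    "located in": {
        "Dresden": ["Saxony"],
        "Saxony": ["Germany"],
    },
}

# The configurable part: which relations are followed, and how far (assumed values).
TRANSITIVE_RELATIONS = {"subclass of": 2, "located in": 3}


def broader_terms(item, config=TRANSITIVE_RELATIONS, edges=EDGES):
    """Return terms reachable from `item` via each configured relation,
    following that relation no further than its depth limit."""
    found = set()
    for relation, max_depth in config.items():
        graph = edges.get(relation, {})
        queue = deque([(item, 0)])
        seen = {item}
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue  # depth limit reached, do not expand further
            for parent in graph.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    found.add(parent)
                    queue.append((parent, depth + 1))
    return found
```

The rule stays predictable because each relation is followed on its own and
cut off at a fixed depth; a bad result can then be traced back to a specific
statement in the data rather than to a heuristic in the software.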
Hoi,
There is also the age old conundrum where some want to enforce their rules for
the good of all because (argument of the day follows).
First of all, Wikidata is very much a child of Wikipedia. It has its own
structures and people have endeavoured to build those same structures in
Wikidata never mind that it is a very different medium and never mind that there
are 280+ Wikipedias that might consider things to be different.  The start of
Wikidata was also an auspicious occasion where it was thought to be OK to adopt
an external German authority. That proved to be a disaster and there are still
residues of this awful decision. It did not take long to show the shortcomings of
this schedule and it was replaced by something more sensible.
However, we got something really Wiki and it was all too wild. It did not take long
for me to ask for someone to explain the current structures, and nobody
volunteered. So I did what I do best: I largely ignored the results of the
classes and subclasses. It does not work for me. It works against me, so my
current strategy is to ignore this nonsense and concentrate on including data.
The reason is simple: once data is included, it is easy to slice it, dice
it, and structure it as we see fit at a later date.
So when our priority becomes making our data reusable and more open, we should
agree on it. So far we have not because we choose to fight each other. Some have
ideas, some have invested too much in what we have at this time. When we are to
make our data reusable, we should agree on what it is exactly we aim to achieve.
Is it to support Commons, or is it to support some external standard that is
academically sound? I would always favour what is practical and easily measured.
I would support Commons first. It has the benefit that it will bring our
communities together in a clear objective. It has the benefit that changes in
the operations of Wikidata support the whole of the Wikimedia universe and
consequentially financial, technical and operational needs and investments are
easily understood. It also means that all the bureaucracy that has materialised
will show to be in the way when it is.
So my question is not if we are a Wiki, my question is are we a Wiki enough and
willing to change our way for our own good.
Thanks,
      GerardM
Ettore,
Wikidata has the ability of crowdsourcing...unfortunately, it is not
effectively utilized.
It's because Wikidata does not yet provide a voting feature on
statements...where, as the vote gets higher...more resistance to changing the
statement is required.
But that breaks the notion of a "wiki" for some folks.
And there we circle back to Gerard's age old question of ... should Wikidata
really be considered a wiki at all for the benefit of society ?  or should
it apply voting/resistance to keep it tidy, factual and less messy.
We have the technology to implement voting/resistance on statements.  I
personally would utilize that feature and many others probably would as
well.  Crowdsourcing the low voted facts back to applications like
OpenRefine, or the recently sent out Survey vote mechanism for spam analysis
on the low voted statements could highlight where things are untidy and
implement vote casting to clean them up.
"...the burden of proof has to be placed on authority, and it should be
dismantled if that burden cannot be met..."
-Thad
+ThadGuidry <https://plus.google.com/+ThadGuidry>
Hi,
Wikidata's ontology is a mess, and I do not see how it could be
otherwise. While the creation of new properties is controlled, any fool
can decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer
a human or is part of a family. Maybe I'm a fool too? I wanted to remove
the claim that a ship <https://www.wikidata.org/wiki/Q11446> is an
instance of "ship type" because it produces weird circular inferences in
my application; but maybe that makes sense to someone else.
There will never be a universal ontology on which everyone agrees. I
wonder (sorry to think aloud) if Wikidata should not rather facilitate
the use of external classifications. Many external ids are knowledge
organization systems (ontologies, thesauri, classifications ...) I dream
of a simple query that could search, in Wikidata, "all elements of the
same class as 'poodle' according to the classification of imagenet
<http://imagenet.stanford.edu/synset?wnid=n02113335>".
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
James Heald
2018-10-18 23:09:04 UTC
Permalink
Post by Markus Kroetzsch
And, on another note, there is also a huge misunderstanding exposed in
the discussion on the search-related tracker item [1]: Cparle there
speaks about "traversing the subclass hierarchy" but is actually looking
at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant
to users who care about clarinets. But surely that's the wrong
direction! You have to look for *sub*classes to find special cases of
what you are looking for. Looking downwards will often lead to much
saner ontologies than when turning your head towards the dizzy heights
of upper ontology. Yes, the few of us looking for instances of "logical
consequence" will still get clarinets, but those who look for instances
of clarinet merely will see instances of alto clarinet, piccolo
clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to
suggest to Commons editors meaningful "upper concepts", one could simply
enable the use of lower concepts in search. It does not work in all
cases yet, but it does in many.
Not really.

Cparle wants to make sure that people searching for "clarinet" also get
shown images of "piccolo clarinet" etc.

To make this possible, where an image has been tagged "basset horn" he
is therefore looking to add "clarinet" as an additional keyword, so that
if somebody types "clarinet" into the search box, one of the images
retrieved by ElasticSearch will be the basset horn one.

I imagine there are pluses and minuses both ways, whether you try to
make sure one search returns more hits, or try to run multiple searches
each returning fewer hits.

Your suggestion of the latter approach may not involve so much
pre-investigation of the top of the tree, which may be terms that people
are less likely to search for; but on the other hand, the actual
searching may be less efficient than a single indexed search.
Post by Markus Kroetzsch
There are still problems (such as the biological taxonomy being modelled
as a hierarchy of names rather than animal classes, placing dog far away
from mammal), but it is still always much easier to come up with a sane
organisation for the *sub*classes of a concrete class.
For what it's worth, there's currently quite a lively discussion on
Project Chat about issues with the current modelling of biological
taxonomies,
https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centric_vs_name_centric

People on this thread might like to comment on some of the less
fortunate elements of current practice, and the appropriateness of some
of the thoughts that have been suggested.

But the taxo project has become such a walled garden, answerable only to
itself, that people with comments may need to be quite forceful to get
their message through, if we are to deal eg with some of the
difficulties Cparle describes in the ticket at
https://phabricator.wikimedia.org/T199119

-- James.

Markus Kroetzsch
2018-10-19 09:40:06 UTC
Permalink
Hi James,
Post by James Heald
Post by Markus Kroetzsch
And, on another note, there is also a huge misunderstanding exposed in
the discussion on the search-related tracker item [1]: Cparle there
speaks about "traversing the subclass hierarchy" but is actually
looking at *super*classes of, e.g., "Clarinet", which he mostly finds
irrelevant to users who care about clarinets. But surely that's the
wrong direction! You have to look for *sub*classes to find special
cases of what you are looking for. Looking downwards will often lead
to much saner ontologies than when turning your head towards the dizzy
heights of upper ontology. Yes, the few of us looking for instances of
"logical consequence" will still get clarinets, but those who look for
instances of clarinet merely will see instances of alto clarinet,
piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of
trying to suggest to Commons editors meaningful "upper concepts", one
could simply enable the use of lower concepts in search. It does not
work in all cases yet, but it does in many.
Not really.
Cparle wants to make sure that people searching for "clarinet" also get
shown images of "piccolo clarinet" etc.
To make this possible, where an image has been tagged "basset horn" he
is therefore looking to add "clarinet" as an additional keyword, so that
if somebody types "clarinet" into the search box, one of the images
retrieved by ElasticSearch will be the basset horn one.
I imagine there are pluses and minuses both ways, whether you try to
make sure one search returns more hits, or try to run multiple searches
each returning fewer hits.
Your suggestion of the latter approach may not involve so much
pre-investigation of the top of the tree, which may be terms that people
are less likely to search for; but on the other hand, the actual
searching may be less efficient than a single indexed search.
True, but with the Wikidata Query Service we already have infrastructure
that completes millions of search requests of this kind (involving path
queries), so that seems doable for Commons as well. WDQS already has
Wikimedia API bindings that allow it to use Lucene-based results in
addition, if needed (though this would only make sense if the search
should use some content that for some reason cannot be imported into a
query service as graph data, mostly free-text search over longer texts).

I think the approach of completing tags towards the upper classes is not
a good idea in general, since it creates extra work for editors that
requires a million times the resources needed in the other approach: if
the subclass hierarchy is wrong, you only need to fix it once to improve
search for all existing Commons content; if you rely on manual extra
tags, you'd have to add them to every file on Commons and keep it
up-to-date with changes in the concepts -- an enormous, redundant effort
that will invariably lead to a very non-uniform search experience across
otherwise similar media. This seems like a huge waste of editors' time
even if it would work (i.e., if we would live in a world where the
superclasses of a class would be easy to understand and closely related
to the topic that an editor is working on -- which will never happen for
Wikidata or Commons, since both cover such a breadth of topics that
their upper ontology necessarily has to be very general even if modelled
in a clean and fully correct way).

Cheers,

Markus
Post by James Heald
Post by Markus Kroetzsch
There are still problems (such as the biological taxonomy being
modelled as a hierarchy of names rather than animal classes, placing
dog far away from mammal), but it is still always much easier to come
up with a sane organisation for the *sub*classes of a concrete class.
For what it's worth, there's currently quite a lively discussion on
Project Chat about issues with the current modelling of biological
taxonomies,
https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centric_vs_name_centric
People on this thread might like to comment on some of the less
fortunate elements of current practice, and the appropriateness of some
of the thoughts that have been suggested.
But the taxo project has become such a walled garden, answerable only to
itself, that people with comments may need to be quite forceful to get
their message through, if we are to deal eg with some of the
difficulties Cparle describes in the ticket at
 https://phabricator.wikimedia.org/T199119
  -- James.
Luca Martinelli
2018-10-19 10:32:09 UTC
Permalink
On Fri, 19 Oct 2018 at 01:09, James Heald
Post by James Heald
But the taxo project has become such a walled garden, answerable only to
itself, that people with comments may need to be quite forceful to get
their message through, if we are to deal eg with some of the
difficulties Cparle describes in the ticket [...]
Other admins and I are unfortunately aware of this, and this is
exactly what I was referring to in my previous e-mail. I do agree with
you that the situation there is frankly unbearable, and IMHO ending it will
likely also involve "removals" of some users who think they should
be the only ones in charge of deciding what's good and what's not. You
might easily understand why this situation deteriorated like this, but
I acknowledge this is no excuse for it to continue.

L.
Markus Kroetzsch
2018-10-19 10:55:24 UTC
Permalink
Post by Luca Martinelli
On Fri, 19 Oct 2018 at 01:09, James Heald
Post by James Heald
But the taxo project has become such a walled garden, answerable only to
itself, that people with comments may need to be quite forceful to get
their message through, if we are to deal eg with some of the
difficulties Cparle describes in the ticket [...]
Me and other admins are unfortunately aware of this and this is
exactly what I was referring to in my previous e-mail. I do agree with
you the situation there is frankly unbearable, and IMHO it will likely
be ended also through "removals" of some users who think they should
be the only one in charge of deciding what's good and what's not. You
might easily understand why this situation deteriorated like this, but
I acknowledge this is no excuse for it to continue.
Re this tricky situation, it might be good that the taxonomy part of
Wikidata avoid the use of "subclass of" altogether. Doesn't this open up
a path for compromise? Wikidata could intentionally "overload" taxons to
also refer to sets of organisms (in some cases). The taxonomic model
would not be affected by this in any way, since it ignores "subclass
of". Some (historic or debated) taxons could be ignored for this
"colloquial" subclass hierarchy, while other merely colloquially defined
classes of animals could be put in relation to proper species. I think
such overloading is acceptable as long as there cannot be confusion
between which statement refers to which facet of the concept. Then no
use of either facet will be impaired by the presence of the "irrelevant"
extra data.

The only alternative seems to be to build a "mirror taxonomy" that consists
not of taxon names but of animal classes (and that would include "dog"
somewhere in its hierarchy [1]). But then we will need a community-wide
decision on which of the two (class of organisms vs. scientific name) is
the subject of actual Wikipedia articles, which might be a difficult
topic to discuss.

Alternatively, if the taxons are mostly considered as "names" (syntax)
rather than classes of individual organism, then it seems we are
actually building a kind of scientific dictionary here that might rather
belong in the lexeme space.

Whatever happens, this problem needs some solution.

Cheers,

Markus

[1] It seems that the strange position of "dog" is mostly due to the
fact that two taxons are associated with it. In general, this seems an
important issue (many common names are not clearly specifying a taxon),
but in the case of dog it seems that the two taxons are synonyms of one
another, i.e., the taxon for dog simply changed names over time.
Stas Malyshev
2018-10-19 22:41:41 UTC
Permalink
Hi!
Post by James Heald
Cparle wants to make sure that people searching for "clarinet" also get
shown images of "piccolo clarinet" etc.
To make this possible, where an image has been tagged "basset horn" he
is therefore looking to add "clarinet" as an additional keyword, so that
if somebody types "clarinet" into the search box, one of the images
retrieved by ElasticSearch will be the basset horn one.
Generally if the image is tagged with "basset horn" and the user query
is "clarinet", we can do one of the following:

1. Index all upstream-hierarchy for "basset horn" (presumably we would
have to cut off when it gets too deep or too abstract) and then match
directly when searching.

2. Expand hierarchy down-stream from "clarinet" and then match against
search index.

3. Have some manual or automatic process that ensures that both
"clarinet" and "basset horn" are indexed (not necessarily at once) and
rely on it to discover the matches.

The problem with (1) is that if the hierarchy changes, we will have to do
a huge number of updates which might overwhelm the system, and most of
these updates would not even be for things people search for, but we
have no way to know that.

The problem with (2) is that downstream hierarchies explode very fast,
and if you search for "clarinet" and there are 10000 descendants in
these hierarchies, we can't search for all of them, so you may never get
a chance to find the basset horn. Also, of course, querying big
downstream hierarchies takes time too, which means a performance hit.
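
To make options (1) and (2) concrete, here is a rough Python sketch on a toy
hierarchy. The file names, the hierarchy, the caps, and all function names
are assumptions for illustration only; a real setup would use ElasticSearch
and Wikidata rather than in-memory dictionaries.

```python
from collections import deque

# Hypothetical toy hierarchy: term -> direct parent terms.
PARENTS = {
    "basset horn": ["clarinet"],
    "piccolo clarinet": ["clarinet"],
    "clarinet": ["woodwind instrument"],
}
# Hypothetical tagged files.
IMAGE_TAGS = {
    "File:A.jpg": {"basset horn"},
    "File:B.jpg": {"clarinet"},
    "File:C.jpg": {"oboe"},
}


def ancestors(term, max_depth):
    """All terms above `term`, cut off after `max_depth` levels."""
    result, frontier = set(), {term}
    for _ in range(max_depth):
        frontier = {p for t in frontier for p in PARENTS.get(t, [])} - result
        result |= frontier
    return result


def descendants(term, max_terms):
    """`term` plus the terms below it, breadth first, capped at `max_terms`."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    terms, queue = {term}, deque([term])
    while queue and len(terms) < max_terms:
        for child in children.get(queue.popleft(), []):
            if child not in terms and len(terms) < max_terms:
                terms.add(child)
                queue.append(child)
    return terms


def build_expanded_index(max_depth=2):
    """Option 1: store every tag together with its ancestors at index time."""
    index = {}
    for file, tags in IMAGE_TAGS.items():
        for tag in tags:
            for term in {tag} | ancestors(tag, max_depth):
                index.setdefault(term, set()).add(file)
    return index


def search_expanded_query(term, max_terms=100):
    """Option 2: expand the query downwards, then match the raw tags."""
    terms = descendants(term, max_terms)
    return {file for file, tags in IMAGE_TAGS.items() if tags & terms}
```

With the default cap both approaches return File:A.jpg and File:B.jpg for
"clarinet". Setting max_terms=1 in search_expanded_query drops the basset
horn image, which is the truncation risk described for (2); changing PARENTS
forces build_expanded_index to be rerun, which is the update cost described
for (1).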
--
Stas Malyshev
***@wikimedia.org
Markus Kroetzsch
2018-10-20 00:21:52 UTC
Permalink
Post by Stas Malyshev
Hi!
Post by James Heald
Cparle wants to make sure that people searching for "clarinet" also get
shown images of "piccolo clarinet" etc.
To make this possible, where an image has been tagged "basset horn" he
is therefore looking to add "clarinet" as an additional keyword, so that
if somebody types "clarinet" into the search box, one of the images
retrieved by ElasticSearch will be the basset horn one.
Generally if the image is tagged with "basset horn" and the user query
is "clarinet", we can do one of the following:
1. Index all upstream-hierarchy for "basset horn" (presumably we would
have to cut off when it gets too deep or too abstract) and then match
directly when searching.
2. Expand hierarchy down-stream from "clarinet" and then match against
search index.
3. Have some manual or automatic process that ensures that both
"clarinet" and "basset horn" are indexed (not necessarily at once) and
rely on it to discover the matches.
The problem with (1) is that if hierarchy changes, we will have to do
huge number of updates which might overwhelm the system, and most of
these updates would be not even for things people search for, but we
have no way to know that.
The problem with (2) is that downstream hierarchies explode very fast,
and if you search for "clarinet" and there are 10000 descendants in
these hierarchies, we can't search for all of them, so you may never get
a chance to find the basset horn. Also, of course, querying big
downstream hierarchies takes time too, which means performance hit.
Is this such a problem? It is what people now commonly do with P31/P279*
queries. For example, finding 10K instances of (some subclass of)
building takes 9 secs: http://tinyurl.com/y7e5j5sd (I think this is one
of the more complex hierarchies; maybe you know larger downstream
hierarchies one could try?) If you omit the labels, it takes 650ms.
That's maybe not quite autocompletion speed yet, but seems acceptable
for a media search.
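
For readers unfamiliar with the P31/P279* pattern, the following Python
sketch builds such a query as a string. P31 ("instance of") and P279
("subclass of") are the real Wikidata properties; the function name, its
parameters, and the exact query shape are assumptions and are not taken from
the shortened link above. The function only builds text and does not contact
the query service.

```python
import re


def instances_query(class_qid, limit=10000, with_labels=False):
    """Build a SPARQL query for items that are instances of `class_qid`
    or of any of its (transitive) subclasses."""
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError("expected an item id such as 'Q123'")
    select = "?item ?itemLabel" if with_labels else "?item"
    # Fetching labels is optional because it makes the query noticeably slower.
    label_service = (
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        if with_labels
        else ""
    )
    return (
        f"SELECT {select} WHERE {{\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        f"{label_service}"
        f"}}\nLIMIT {limit}"
    )


# Q41176 is assumed here to be the item for "building"; verify before use.
print(instances_query("Q41176", limit=10))
```

The with_labels switch mirrors the observation above that omitting labels
changes the running time considerably.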

Cheers,

Markus
Pine W
2018-10-19 05:09:55 UTC
Permalink
I would appreciate clarification of what is proposed with regard to exposing problematic Wikidata ontology on Wikipedia. If the idea involves inserting poor-quality information onto English Wikipedia in order to spur us to fix problems with Wikidata, then I am likely to oppose it. English Wikipedia is not an endless resource for free labor, and we have too few skilled and good-faith volunteers to handle our already enormous scope of work.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
Gerard Meijssen
2018-10-19 08:08:14 UTC
Permalink
Hoi Pine,
The ontology of Wikidata has nothing to do with English Wikipedia. The
notion that English Wikipedia is the only endless resource of free labour
is pathetic. Its dismissive attitude prevents functional contributions that
will benefit the users of Wikimedia projects.

For authors of "scholarly articles" we have an increasing amount of
information that is impossible for Wikipedia to include. It does not take
much to have a template that shows them (collapsed by default) and links to
"Scholia" information for the paper.

For authors of books we could have a similar template. They could link to
*your local library* where you can check if it is available for reading.
Alternatively we could link to the "Open Library".

What it would do is provide a SERVICE to our readers that is easy enough to
provide, that leverages the data in Wikidata and is of a high quality. The
issue about the ontology has everything to do with the discovery of images
in Commons. It cannot get worse than it is; it is dysfunctional. It only
works for English and I understand that is something you do not really
notice.

Yes, I do recognise Wikidata is a wiki. It is a work in progress and as
such the quality and quantity steadily improve, just like English
Wikipedia.
Thanks,
Gerard
Post by Pine W
I would appreciate clarification what is proposed with regard to exposing
problematic Wikidata ontology on Wikipedia. If the idea involves inserting
poor-quality information onto English Wikipedia in order to spur us to fix
problems with Wikidata, then I am likely to oppose it. English Wikipedia is
not an endless resource for free labor, and we have too few skilled and
good-faith volunteers to handle our already enormous scope of work.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
Markus Kroetzsch
2018-10-19 09:47:43 UTC
Permalink
Post by Pine W
I would appreciate clarification what is proposed with regard to
exposing problematic Wikidata ontology on Wikipedia. If the idea
involves inserting poor-quality information onto English Wikipedia in
order to spur us to fix problems with Wikidata, then I am likely to
oppose it. English Wikipedia is not an endless resource for free labor,
and we have too few skilled and good-faith volunteers to handle our
already enormous scope of work.
You are right, and thankfully this is not what is proposed. The proposal
was to offer people who search for Commons media the (maybe optional)
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there were an application relying on them. But even
with some wrong links, the results a searcher would get would still
include mostly useful hits. Also, at least half of the currently
observed problems with this approach would lead to fewer results (e.g.,
dogs would be hard to include automatically in a search for all
mammals), but in such cases the proposed extension would simply do what
the baseline approach (ignoring the links) would do anyway, so service
would not get any worse. Also, the manual workarounds suggested by some
(adding "mammal" to all pictures of some "dog") would be compatible with
this, so one could do both to improve search experience on both ends.

Best regards,

Markus
Stas Malyshev
2018-10-19 22:28:17 UTC
Permalink
Hi!
Post by Markus Kroetzsch
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there would be an application relying on it. But even
The main problem I see here is not that some links are incorrect - which
may have bad effects, but it's not the most important issue. The most
important one, IMHO, is that there's no way to figure out in any scalable
and scriptable way what "more-general-than" means for any particular case.

It's different for each type of objects and often inconsistent within
the same class (e.g. see confusion between whether "dog" is an animal, a
name of the animal, name of the taxon, etc.) It's not that navigating
the hierarchy would lead us astray - we're not even there yet to have
this problem, because we don't even have a good way to navigate it.

Using instance-of/subclass-of only seems to not be that useful, because
a lot of interesting things are not represented in this way - e.g.
finding out that Donna Strickland (Q56855591) is a woman (Q467) is
impossible using only this hierarchy. We could special-case a bunch of
those but given how diverse Wikidata is, I don't think this will ever
cover any significant part of the hierarchy unless we find a non-ad-hoc
method of doing this.

This also makes it particularly hard to do something like "let's start
using it and fix the issues as we discover them", because the main issue
here is that we don't have a way to start with anything useful beyond a
tiny subset of classes that we can special-case manually. We can't
launch a rocket and figure out how to build the engine later - having a
working engine is a prerequisite to launching the rocket!

There are also significant technical challenges in this - indexing
dynamically changing hierarchy is very problematic, and with our
approach to ontology anything can be a class, so we'd have to constantly
update the hierarchy. But this is more of a technical challenge, which
will come after we have some solution for the above.
--
Stas Malyshev
***@wikimedia.org
Markus Kroetzsch
2018-10-20 00:15:49 UTC
Permalink
Hi Stas,

Thanks for elaborating. I think we could always start with traversing
only "subclass of". In spite of its limits, it does work in many areas
(e.g. buildings, astronomical objects, vehicles, organisations, etc.),
even if far from all. Where it doesn't work, one would simply not
get enough results, but the alternative (do not even use "subclass of")
will just make this problem worse. Any approach of fixing the latter
will also help the former.

Now regarding issues such as dog, woman, and many other things, it seems
clear that what one would need are inference rules. It should be
possible to say somewhere that "if a human is female, then it is also a
woman" without having to add the unwanted statement "instance of woman"
everywhere. Or "if someone has profession 'programmer' then he/she/they
is/are a programmer" -- at least for the purpose of media search. The
case of dogs would be complicated (referring to quantifiers) but still
doable.
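
As a rough illustration of what such inference rules could look like for
media search, here is a small Python sketch. The Rule type, the use of plain
labels instead of item ids, and the function name inferred_tags are all
assumptions; this is not the rule syntax of SQID or of any existing tool.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    conditions: frozenset  # (property, value) pairs that must all be present
    tag: str               # search tag that is inferred when they are


# Hypothetical rules mirroring the two examples in the text.
RULES = [
    Rule(frozenset({("instance of", "human"), ("sex or gender", "female")}), "woman"),
    Rule(frozenset({("occupation", "programmer")}), "programmer"),
]


def inferred_tags(statements, rules=RULES):
    """Return the tags whose rule conditions are all met by `statements`,
    without adding anything to the underlying data."""
    facts = set(statements)
    return {rule.tag for rule in rules if rule.conditions <= facts}


# Statements as they might be stored for a person (labels used for readability).
person = {
    ("instance of", "human"),
    ("sex or gender", "female"),
    ("occupation", "physicist"),
}
print(inferred_tags(person))
```

The stored statements stay untouched: "woman" exists only as a derived
search tag, so no "instance of woman" statement has to be added to the item.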

Obvious questions arise:
* Would we prefer to maintain such rules somewhere rather than adding
the relations they might infer manually? (Probably yes, since one would
need far fewer rules than manual statements, which would always add
redundancy and cause conflicts -- cf. taxonomy modelling discussion --
that are not necessary when applications can select which inference
rules to use without touching the underlying data.)
* How would the rules look to human editors? (We have made some first
proposals for this; see the rules supported by SQID [1]; but one can
come up with other options)
* Where would such rules be managed? (Preferably on Wikidata, but the
encoding in statements would be a challenge; another challenge is how to
associate rules with entities -- usually they make connections between
several entities)
* How would the rules be applied on the live data, especially if there
are many updates? (Doable using known algorithms and based on existing
tools, but still needs some implementation work; I think for a start one
could just reduce the update speed on these "inferred tags" and still
get a big improvement over the case where nothing of this type is done
at all).

So would this be a mid-term goal to overcome this issue? I would think
so, also because there are enough degrees of freedom here to gradually
grow this from simple (only allow rules that effectively add some more
traversal hints) to powerful (have rules that can use qualifiers, as
needed to get from dog to mammal). The main challenge is to find a good
approach for community-editing this part without restricting upfront to
a few special cases (as for the case of the constraints).

Inference rules come up as potential solutions in many similar tasks
where you want users to access/query the data. Imagine someone would
look for the brothers of a person (let's assume we'd built an
intelligent search for such things) -- again, Wikidata has no concept of
"brother" and we would not have any idea how to answer this, unless
somewhere we'd have a rule that defines how you can find
brother-relationships from the data that we actually have. This happens
a lot when you want users who are not familiar with how we organise data
to find things, but the solution cannot be to add every possible
view/inferred statement to Wikidata explicitly.
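
The "brother" case differs from a simple tag rule in that it joins
statements about two entities. A minimal Python sketch, assuming statements
are available as (subject, property, value) triples and using made-up people;
the function name brothers is hypothetical:

```python
def brothers(person, statements):
    """Derive brothers of `person`: siblings whose sex or gender is male."""
    siblings = {o for s, p, o in statements if s == person and p == "sibling"}
    males = {s for s, p, o in statements if p == "sex or gender" and o == "male"}
    return siblings & males


# Made-up example data; none of these people are real Wikidata items.
statements = {
    ("Ada", "sibling", "Ben"),
    ("Ada", "sibling", "Cleo"),
    ("Ben", "sex or gender", "male"),
    ("Cleo", "sex or gender", "female"),
}
print(brothers("Ada", statements))
```

Nothing called "brother" is stored anywhere; the relationship is computed
from the "sibling" and "sex or gender" statements when it is needed.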

Obviously, the rule approach is not something we could deploy anytime
soon, but it could be something to work towards ...

Cheers,

Markus

[1] Example rule with explanation of how it was applied to find a
grandfather of Ada Lovelace: https://tinyurl.com/y7rgmk7o
The qualifier sets (X, Y, Z) are unused here and could be hidden
entirely, but this is just a prototype.
Post by Stas Malyshev
Hi!
Post by Markus Kroetzsch
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there would be an application relying on it. But even
The main problem I see here is not that some links are incorrect - which
may have bad effects, but it's not the most important issue. The most
important one, IMHO, is that there's no way to figure out in any scalable
and scriptable way what "more-general-than" means for any particular case.
It's different for each type of objects and often inconsistent within
the same class (e.g. see confusion between whether "dog" is an animal, a
name of the animal, name of the taxon, etc.) It's not that navigating
the hierarchy would lead us astray - we're not even there yet to have
this problem, because we don't even have a good way to navigate it.
Using instance-of/subclass-of only seems to not be that useful, because
a lot of interesting things are not represented in this way - e.g.
finding out that Donna Strickland (Q56855591) is a woman (Q467) is
impossible using only this hierarchy. We could special-case a bunch of
those but given how diverse Wikidata is, I don't think this will ever
cover any significant part of the hierarchy unless we find a non-ad-hoc
method of doing this.
This also makes it particularly hard to do something like "let's start
using it and fix the issues as we discover them", because the main issue
here is that we don't have a way to start with anything useful beyond a
tiny subset of classes that we can special-case manually. We can't
launch a rocket and figure out how to build the engine later - having a
working engine is a prerequisite to launching the rocket!
There are also significant technical challenges in this - indexing a
dynamically changing hierarchy is very problematic, and with our
approach to ontology anything can be a class, so we'd have to constantly
update the hierarchy. But this is more of a technical challenge, which
will come after we have some solution for the above.
Pine W
2018-10-20 04:51:54 UTC
On Fri, Oct 19, 2018 at 9:47 AM Markus Kroetzsch <
Post by Markus Kroetzsch
Post by Pine W
I would appreciate clarification what is proposed with regard to
exposing problematic Wikidata ontology on Wikipedia. If the idea
involves inserting poor-quality information onto English Wikipedia in
order to spur us to fix problems with Wikidata, then I am likely to
oppose it. English Wikipedia is not an endless resource for free labor,
and we have too few skilled and good-faith volunteers to handle our
already enormous scope of work.
You are right, and thankfully this is not what is proposed. The proposal
was to offer people who search for Commons media the (maybe optional)
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there would be an application relying on it. But even
with some wrong links, the results a searcher would get would still
include mostly useful hits. Also, at least half of the currently
observed problems with this approach would lead to fewer results (e.g.,
dogs would be hard to include automatically to a search for all
mammals), but in such cases the proposed extension would simply do what
the baseline approach (ignoring the links) would do anyway, so service
would not get any worse. Also, the manual workarounds suggested by some
(adding "mammal" to all pictures of some "dog") would be compatible with
this, so one could do both to improve search experience on both ends.
Best regards,
Markus
Hi Markus, I seem to be missing something. Daniel said, "And I think the
best way to achieve this is to start using the ontology as an ontology on
wikimedia projects, and thus expose the fact that the ontology is broken.
This gives incentive to fix it, and examples as to what things should be
possible using that ontology (namely, some level of basic inference)." I
think that I understand the basic idea behind structured data on Commons. I
also think that I understand your statement above. What I'm not
understanding is how Daniel's proposal to "start using the ontology as an
ontology on wikimedia projects, and thus expose the fact that the ontology
is broken." isn't a proposal to add poor quality information from Wikidata
onto Wikipedia and, in the process, give Wikipedians more problems to fix.
Can you or Daniel explain this?

Separately, someone wrote to me off list to make the point that Wikipedians
who are active in non-English Wikipedias also wouldn't appreciate having
their workloads increased by having a large quantity of poor-quality
information added to their edition of Wikipedia. I think that one of the
person's concerns is that my statement could have been interpreted as
implying something like "it's okay to insert poor-quality information on
non-English Wikipedias because their standards are lower". I apologize if I
gave the impression that I would approve of a non-English language edition
of Wikipedia being on the receiving end of an unwelcome large addition of
information that requires significant effort to clean up. Hopefully my
response here will address the concerns that I heard off list, and if not
then I welcome additional feedback.

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
Stas Malyshev
2018-10-20 05:28:29 UTC
Hi!
data on Commons. I also think that I understand your statement above.
What I'm not understanding is how Daniel's proposal to "start using the
ontology as an ontology on wikimedia projects, and thus expose the fact
that the ontology is broken." isn't a proposal to add poor quality
information from Wikidata onto Wikipedia and, in the process, give
Wikipedians more problems to fix. Can you or Daniel explain this?
While I cannot pretend to have expert knowledge and do not purport to
interpret what Daniel meant, I think here we must remember that
Wikipedia, while of course being of huge importance, is not the only
Wikimedia project, so "start using it on Wikimedia projects" does not
necessarily mean "start using it on Wikipedia", still less "start adding
bad information to Wikipedia" (there are other ways to use the data,
including imperfect ontologies - e.g. for search, for bot guidance, for
quality assurance and editor support, and many other ways). I am not
prescribing a specific scenario here, just reminding that "using the
ontology on wikimedia projects" can mean a wide variety of things.
Separately, someone wrote to me off list to make the point that
Wikipedians who are active in non-English Wikipedias also wouldn't
appreciate having their workloads increased by having a large quantity
poor-quality information added to their edition of Wikipedia. I think
I am sure that would be a bad thing. But I don't think anything we are
discussing here would lead to that happening.
--
Stas Malyshev
***@wikimedia.org
Thad Guidry
2018-10-20 10:09:17 UTC
Hi All,

Just to address what Markus was hinting at with inference rules: both
positive and negative rules could be stored. Back in the Freebase days, we
had those, and they were called "mutexes". We used them for "type incompatible"
hints to users and stored those "type incompatible" mutex rules in the
knowledge graph. (Freebase was a type-based system, with
properties under each type.)

Such as: ORGANIZATION != SPORT

You actually have all those type-incompatible mutexes in the Freebase dumps
handed to you, so you could start there. The biggest one was called "Big
Momma Mutex".
Here is an archived email thread to give further context:
https://freebase.markmail.org/thread/z5o7nlnb62n5t22o

Anyway, the point is that those rules worked well for us in Freebase, and I
can see rules also working wonders in various ways in Wikidata.
Maybe it's just a mutex at each class, where multiple statements could hold
rules?
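
To make the "type incompatible" idea concrete, here is a minimal sketch of how such a mutex check might look. It is only an illustration: the table layout, the function name and the extra LOCATION type are assumptions made for this example and do not reflect how Freebase actually stored or evaluated its mutexes.

```python
from itertools import combinations

# Hypothetical mutex table: each entry is a pair of types that must not
# both be asserted for the same entity. Only ORGANIZATION != SPORT comes
# from the message above; the storage format is invented for illustration.
MUTEXES = {frozenset({"ORGANIZATION", "SPORT"})}


def incompatible_pairs(types, mutexes=MUTEXES):
    """Return every pair of an entity's types that violates a mutex rule."""
    return [
        pair
        for pair in combinations(sorted(set(types)), 2)
        if frozenset(pair) in mutexes
    ]


# An entity typed as both a sport and an organization triggers the hint.
print(incompatible_pairs(["SPORT", "ORGANIZATION", "LOCATION"]))
print(incompatible_pairs(["ORGANIZATION", "LOCATION"]))
```

A check like this only produces hints for editors; it does not decide which of the two conflicting types is the wrong one.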

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>
Thomas Douillard
2018-10-20 11:13:51 UTC
There is already something to handle this kind of "mutex" on Wikidata:
"disjoint union of"; see for example its usage on
https://www.wikidata.org/wiki/Q180323 . The statements are used on the talk page by
templates that use them to generate queries to find instances that violate
the mutex: https://www.wikidata.org/wiki/Talk:Q180323 (for example, this
query
<https://query.wikidata.org/#select%20%3Fitem%20where%20%7B%0A%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ180323%20%20minus%20%7B%0A%09%09%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ900457%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ578786%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ405478%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ46993066%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ2253183%20%0A%09%09%7D%0A%09%7D%0A%7D>
, which unsurprisingly does not find anything, because I don't expect to find
a lot of vertebra instances on Wikidata :) )
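
For readers who do not want to decode the link by hand, the following sketch builds a query of the same shape as the one linked above: instances of the whole class, minus the instances of any of its declared parts. The function name and the one-line layout of the generated SPARQL are assumptions made for this example; the item IDs are the ones that appear in the linked query, and the actual talk-page templates may generate their queries differently.

```python
from urllib.parse import quote, unquote


def violation_query(whole, parts):
    """SPARQL for instances of `whole` that fall under none of the `parts`."""
    # P31 = instance of, P279 = subclass of; the path follows one
    # instance-of step and then any number of subclass-of steps.
    path = "wdt:P31/wdt:P279*"
    branches = " union ".join(
        f"{{ ?item {path} wd:{part} }}" for part in parts
    )
    return (
        f"select ?item where {{ ?item {path} wd:{whole} "
        f"minus {{ {branches} }} }}"
    )


query = violation_query(
    "Q180323",
    ["Q900457", "Q578786", "Q405478", "Q46993066", "Q2253183"],
)
print(query)

# Percent-encode the query for use after "#" in a query-service URL.
url = "https://query.wikidata.org/#" + quote(query, safe="")
print(url)
```

Decoding the fragment of such a URL with `unquote` gives back the readable query text.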
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
Markus Kroetzsch
2018-10-20 11:25:35 UTC
Hi Pine,

As I understood Daniel, he did not talk about inserting low quality
content into any project, Wikipedia or other. What I believe he meant
by "using the ontology" is to use it for improving search/discovery
services that help editors find something (i.e., technical
infrastructure, not editorial content). Doing so could lead to an
additional amount of mostly useful results, but it will not yet be
enough to get all results that a user would intuitively expect. Maybe
his wording made this sound a bit too dramatic -- I think he just wanted
to emphasize the point that any actual use will immediately provide
motivation and guidance for Wikidata editors to improve things that are
currently imperfect.

I agree with him in that I think we need to identify ways of moving
gradually forward, offering the small benefits we can already provide
while creating an environment that allows the community to improve
things step by step. If we ask for perfection before even starting, we
will get into a deadlock where we tie up editor resources in redundant
tagging tasks instead of empowering the community to improve the
situation in a sustainable way.

Cheers,

Markus
Daniel Kinzler
2018-10-20 16:41:45 UTC
Hi Pine, sorry for the misleading wording. Let me clarify below.
Hi Markus, I seem to be missing something. Daniel said, "And I think the best
way to achieve this is to start using the ontology as an ontology on wikimedia
projects, and thus expose the fact that the ontology is broken. This gives
incentive to fix it, and examples as to what things should be possible using
that ontology (namely, some level of basic inference)." I think that I
understand the basic idea behind structured data on Commons. I also think that I
understand your statement above. What I'm not understanding is how Daniel's
proposal to "start using the ontology as an ontology on wikimedia projects, and
thus expose the fact that the ontology is broken." isn't a proposal to add poor
quality information from Wikidata onto Wikipedia and, in the process, give
Wikipedians more problems to fix. Can you or Daniel explain this?
What I meant in concrete terms was: let's start using Wikidata items for tagging
on Commons, even though searches based on such tags will currently not
yield very good results, due to the messy state of the ontology, and hope people
fix the ontology to get better search results. If people use "poodle" to tag an
image and it's not found when searching for "dog", this may lead to people
investigating why that is, and coming up with ontology improvements to fix it.

What I DON'T mean is "let's automatically generate navigation boxes for
Wikipedia articles based on an imperfect ontology, and push them on everyone".
I mean, using the ontology to generate navigation boxes for some kinds of
articles may be a nice idea, and could indeed have the same effect - that people
notice problems in the ontology, and fix them. But that would be something the
local wiki communities decide to do, not something that comes from Wikidata or
the Structured Data project.

The point I was trying to make is: the Wiki communities are rather good at
creating structures that serve their purpose, but they do so pragmatically,
along the behavior of the existing tools. So, rather than trying to work around
the quirks of the ontology in software, the software should use very simple
rules (such as following the subclass relation), and let people adapt the data
to this behavior, if and when they find it useful to do so. This approach, over
time, provides better results in my opinion.
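
As a rough illustration of what "very simple rules" could mean for search, the sketch below matches an image tag against a search term by following subclass links only. The edge table and the function names are made up for this example; they are not taken from Wikidata or from the Commons search code.

```python
# Toy "subclass of" edges, invented for illustration. A real system would
# read these links from Wikidata rather than hard-code them.
SUBCLASS_OF = {
    "poodle": {"dog"},
    "dog": {"mammal"},
    "mammal": {"animal"},
}


def ancestors(term, edges=SUBCLASS_OF):
    """All classes reachable from `term` by following subclass links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:  # guards against cycles in messy data
                seen.add(parent)
                stack.append(parent)
    return seen


def matches(search_term, image_tags, edges=SUBCLASS_OF):
    """True if a tag equals the search term or is a subclass of it."""
    return any(
        tag == search_term or search_term in ancestors(tag, edges)
        for tag in image_tags
    )


print(matches("dog", ["poodle"]))            # found via the subclass link
print(matches("dog", ["poodle"], edges={}))  # no links: exact tags only
```

With a link missing, the search simply falls back to exact tag matching, so a gap in the data costs results but does not add wrong ones.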

Also, keep in mind that I was referring to an imperfect *improvement* of search,
the alternative being to only return things tagged with "dog" when searching for
"dog". I was not suggesting to degrade user experience in order to incentivize
editors. I'm rather suggesting the opposite: let's NOT give people a reason to tag
images that show poodles with "poodle" and "dog" and "mammal" and "animal" and
"pet" and...
--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Pine W
2018-10-21 05:59:39 UTC
Hi Daniel,

Thanks for the explanation. I think that I now better understand what
you're proposing. This explanation of the proposal sounds reasonable to me
in a way that my earlier understanding of the proposal did not.

By the way, I don't know what your normal work schedule is, but I usually
don't expect staff to respond to non-urgent emails over the weekend,
although I appreciate it. :) Waiting until Monday is usually fine.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
Olaf Simons
2018-10-22 06:13:04 UTC
Dear Wikibase Enthusiasts,

if you happen to speak German and feel intrigued by the Illuminati, this might be of interest to you:

https://blog.factgrid.de/archives/1151

We will use our upcoming Illuminati-Workshop on Nov. 16/17 to discuss how we can make better use of our Wikibase installation here at Gotha.

https://database.factgrid.de/wiki/Main_Page

The database is filled with metadata of Illuminati documents and (selected) membership information and is supposed to help us with complexities of our Illuminati wiki (https://projekte.uni-erfurt.de/illuminaten/Main_Page), but we do not yet have the clearest idea of what we have produced here or possibly can.

If you feel intrigued - we pay travel expenses and accommodation - contact me before Nov. 5, 2018.

Looking forward to an illuminating workshop,
Olaf

Ettore RIZZA
2018-10-20 13:29:39 UTC
Hello,

It is interesting to note that what Cparle wants are "is a" relationships
based on common sense. For most people, ants are insects, not instances of
taxon. A clarinet is a woodwind instrument, and woodwind instruments are
musical instruments, not an instance of "first order metaclass".

One of the best sources of "common sense" hypernymy is probably the first
sentence of a Wikipedia page. Whether in English, French, or Italian, a woman
is always "a female *human* being."

For "poodle", this would look like (following the links in the English
version of Wikipedia):

- The poodle is a group of formal *dog breeds*

- Dog breeds are *dogs* that...

- The domestic dog (...) is a member of the genus *Canis* (canines)

- Canis is a genus of the *Canidae*

- The biological family Canidae (...) is a lineage of *carnivorans*

- Carnivora (...) is a diverse *scrotiferan* order

- Scrotifera is a clade of *placental mammals*

- Placentalia ("Placentals") is one of the three extant subdivisions of the
class of animals *Mammalia*...

- Mammals are the *vertebrates* within the class Mammalia...


From my point of view, this classification looks much better than the
current relationships in Wikidata's ontology.

The automatic extraction of hypernymic relationships from English texts
(especially Wikipedia) has been studied for a long time and gives good
results, even with simple methods based on hand-crafted rules. In the case
of Wikipedia, the hypernym often has a page itself (and therefore a link to
Wikidata), which could simplify the NLP extraction and the mapping with
Wikidata items.
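
A very small sketch of such a hand-crafted rule is given below: take the first linked page that follows the first copula ("is", "are", ...) in the opening sentence. The function name and the wikitext-style example sentences are assumptions made for illustration, and the rule is far cruder than the methods studied in the literature.

```python
import re

# First copula in the sentence, and the target of a [[wiki link]]
# (stopping before any "|label" or "#section" part).
COPULA = re.compile(r"\b(?:is|are|was|were)\b")
LINK = re.compile(r"\[\[([^\]|#]+)")


def first_sentence_hypernym(wikitext):
    """Return the target of the first link after the first copula, or None."""
    copula = COPULA.search(wikitext)
    if copula is None:
        return None
    link = LINK.search(wikitext, copula.end())
    return link.group(1).strip() if link else None


# Hypothetical wikitext renderings of two sentences quoted in this message.
print(first_sentence_hypernym(
    "The '''poodle''' is a group of formal [[dog breed]]s."))
print(first_sentence_hypernym(
    "'''Mammals''' are the [[vertebrate]]s within the class Mammalia."))
```

Because the link target is a page title, it can in principle be mapped to a Wikidata item through the sitelink. The obvious weakness is that the first link after the copula is sometimes a modifier rather than the hypernym, so the output would still need review.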

Of course, the extracted relationships will not always be "subclass of" or
"instance of". But if someone proposed a new property called "Wikipedia
Hypernyms" (and its inverse property "Wikipedia Hyponyms"), I would use
it more willingly and with more confidence than the current system. This
would also better respect the logic of Wikidata's descriptions.

I mean, if the description of Zoroastrianism (Q9601) says this is an
"Ancient Iranian *religion* founded by Zoroaster", one would expect the
class "religion" to appear much earlier in the hierarchy of superclasses of
this item. If there were this property "Wikipedia Hypernyms", we could
mention it on the same page - since Wikipedia describes Zoroastrianism as
"one of the world's oldest *religions* that remains active." And a SPARQL
query looking for 'all items that have "religion" as "Wikipedia hypernyms"
property' would be much, much faster.

Note: sorry if this reflection is naive or if it has already been
discussed/tested.

Cheers,

Ettore
Peter F. Patel-Schneider
2018-10-20 17:20:16 UTC
For most people, ants are insects, not instances of taxon.
Sure, but Wikidata doesn't have ants being instances of taxon. Instead,
Formicidae (aka ant) is an instance of taxon, which seems right to me.

Here are some extracts from Wikidata as of a few minutes ago, also showing
the English Wikipedia page for the Wikidata item.

https://www.wikidata.org/wiki/Q7386 Formicidae ant
https://en.wikipedia.org/wiki/Ant
instance of taxon
no subclass of statement

https://www.wikidata.org/wiki/Q1390 insect
https://en.wikipedia.org/wiki/Insect
subclass of animal
instance of taxon

What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by
the "Ants are eusocial insects" phrase at the start of
https://en.wikipedia.org/wiki/Ant. I added that statement and put as source
English Wikipedia. (By the way, how can I source a statement to a particular
Wikipedia page?)


I see no reason that this should not be done for other groups of living
organisms where subclass relationships are missing.

peter
Ettore RIZZA
2018-10-20 18:57:34 UTC
Hi,

Post by Peter F. Patel-Schneider
I see no reason that this should not be done for other groups of living
organisms where subclass relationships are missing.
It seems very simple to me. Maybe too simple. Perhaps I am intimidated by
the kilometers of discussions I'm reading about the taxon-centric aspect of
Wikidata, when I'm not a biologist. So, there is no problem if we add that
Cetacea <https://www.wikidata.org/wiki/Q160> is a subclass of aquatic
mammals <https://www.wikidata.org/wiki/Q3039055>, as indicated by its Wikipedia
page <https://en.wikipedia.org/wiki/Cetacea>?

Cheers,

Ettore

Peter F. Patel-Schneider
2018-10-20 20:02:27 UTC
Post by Ettore RIZZA
Hi,
Post by Peter F. Patel-Schneider
I see no reason that this [adding subclass relationships sanctioned by
corresponding Wikipedia pages] should not be done for other groups of living
organisms where subclass relationships are missing.
It seems very simple to me. Maybe too simple. Perhaps I am intimidated by the
kilometers of discussions I'm reading about the taxon-centric aspect of
Wikidata, when I'm not a biologist. So, there is no problem if we add
that Cetacea <https://www.wikidata.org/wiki/Q160> is a subclass of aquatic
mammals <https://www.wikidata.org/wiki/Q3039055>, as indicated by
its Wikipedia page <https://en.wikipedia.org/wiki/Cetacea>?
Cheers,
Ettore
How can there be any effective counter to adding these relationships? Many
Wikidata items correspond to Wikipedia pages. If the true information about
the Wikidata item in the corresponding pages cannot be added to the Wikidata
items, then the correspondence is not correct and should be removed.

peter

PS: Of course, determining truth may be contentious in some cases, but these
will be a small minority.