Thanks for elaborating. I think we could always start with traversing
only "subclass of". In spite of its limits, it does work in many areas
(e.g. buildings, astronomical objects, vehicles, organisations, etc.),
though by no means in all. Where it doesn't work, one would simply not
get enough results, but the alternative (not using "subclass of" at all)
would just make this problem worse. Any approach to fixing the latter
will also help the former.
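
To make the "subclass of" traversal concrete, here is a minimal sketch in Python. It assumes a small in-memory table of direct "subclass of" links; the class names and the helpers `superclasses` and `matches` are hypothetical, made up for illustration, and not taken from Wikidata or from any existing search code.

```python
from collections import deque

# Hypothetical toy data: each class maps to its direct "subclass of" parents.
SUBCLASS_OF = {
    "castle": ["fortification", "building"],
    "fortification": ["building"],
    "building": ["architectural structure"],
    "architectural structure": [],
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls via "subclass of" links."""
    seen = set()
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, []):
            if parent not in seen:  # also guards against cycles in the data
                seen.add(parent)
                queue.append(parent)
    return seen


def matches(item_classes, query_class, subclass_of):
    """True if any of the item's classes equals or specialises query_class."""
    return any(
        c == query_class or query_class in superclasses(c, subclass_of)
        for c in item_classes
    )
```

With this data, a search for "building" would also find an item tagged "castle", while a search for "castle" would not find an item tagged only "building". Where the links are missing, the search simply returns fewer results.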
Now regarding issues such as dog, woman, and many other things, it seems
clear that what one would need are inference rules. It should be
possible to say somewhere that "if a human is female, then it is also a
woman" without having to add the unwanted statement "instance of woman"
everywhere. Or "if someone has profession 'programmer' then he/she/they
is/are a programmer" -- at least for the purpose of media search. The
case of dogs would be complicated (referring to quantifiers) but still
Obvious questions arise:
* Would we prefer to maintain such rules somewhere rather than manually
adding the relations they might infer? (Probably yes, since one would
need far fewer rules than manual statements, which would always add
redundancy and cause conflicts -- cf. taxonomy modelling discussion --
that are not necessary when applications can select which inference
rules to use without touching the underlying data.)
* How would the rules look to human editors? (We have made some first
proposals for this; see the rules supported by SQID; but one can
come up with other options.)
* Where would such rules be managed? (Preferably on Wikidata, but the
encoding in statements would be a challenge; another challenge is how to
associate rules with entities -- usually they make connections between
* How would the rules be applied on the live data, especially if there
are many updates? (Doable using known algorithms and based on existing
tools, but still needs some implementation work; I think for a start one
could just reduce the update speed on these "inferred tags" and still
get a big improvement over the case where nothing of this type is done
So would this be a mid-term goal to overcome this issue? I would think
so, also because there are enough degrees of freedom here to gradually
grow this from simple (only allow rules that effectively add some more
traversal hints) to powerful (have rules that can use qualifiers, as
needed to get from dog to mammal). The main challenge is to find a good
approach for community-editing this part without restricting upfront to
a few special cases (as for the case of the constraints).
Inference rules come up as potential solutions in many similar tasks
where you want users to access/query the data. Imagine someone were to
look for the brothers of a person (let's assume we'd built an
intelligent search for such things) -- again, Wikidata has no concept of
"brother" and we would not have any idea how to answer this, unless
somewhere we'd have a rule that defines how you can find
brother-relationships from the data that we actually have. This happens
a lot when you want users who are not familiar with how we organise data
to find things, but the solution cannot be to add every possible
view/inferred statement to Wikidata explicitly.
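
As a rough illustration of the brother case, the following Python sketch derives brothers at query time from "sibling" and "sex or gender" statements instead of storing them. The people, the triple layout and the function `brothers_of` are hypothetical toy examples; a real rule would have to deal with ranks, qualifiers and missing data.

```python
# Hypothetical statements as (subject, property, value) triples.
STATEMENTS = {
    ("alice", "sibling", "bob"),
    ("bob", "sibling", "alice"),
    ("alice", "sibling", "carol"),
    ("alice", "sex or gender", "female"),
    ("bob", "sex or gender", "male"),
    ("carol", "sex or gender", "female"),
}


def brothers_of(person, statements):
    """Rule: brother(X, Y) if sibling(X, Y) and Y has sex or gender 'male'."""
    siblings = {v for (s, p, v) in statements if s == person and p == "sibling"}
    males = {
        s for (s, p, v) in statements
        if p == "sex or gender" and v == "male"
    }
    return siblings & males
```

Nothing is written back to the data here: the "brother" relation exists only as the result of applying the rule, which is the point of keeping such views out of the stored statements.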
Obviously, the rule approach is not something we could deploy anytime
soon, but it could be something to work towards ...
 Example rule with explanation of how it was applied to find a
grandfather of Ada Lovelace: https://tinyurl.com/y7rgmk7o
The qualifier sets (X, Y, Z) are unused here and could be hidden
entirely, but this is just a prototype.
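
For readers who cannot open the link, here is a small Python sketch of what a grandfather rule with an explanation could look like. It is not the SQID rule syntax and not necessarily the rule behind the link. The rule form, the function `grandfathers` and the hand-entered family facts (not queried from Wikidata) are assumptions made for illustration.

```python
# Hand-entered illustration data, not retrieved from Wikidata.
FATHER = {
    "Ada Lovelace": "Lord Byron",
    "Lord Byron": "John Byron",
    "Anne Isabella Byron": "Ralph Milbanke",
}
MOTHER = {
    "Ada Lovelace": "Anne Isabella Byron",
}


def grandfathers(person):
    """Apply: grandfather(X, Z) if parent(X, Y) and father(Y, Z).

    Returns a list of (grandfather, explanation) pairs, where the
    explanation lists the statements that were used to derive the result.
    """
    results = []
    for relation, table in (("father", FATHER), ("mother", MOTHER)):
        parent = table.get(person)
        if parent is None:
            continue
        grandfather = FATHER.get(parent)
        if grandfather is not None:
            explanation = [
                (person, relation, parent),
                (parent, "father", grandfather),
            ]
            results.append((grandfather, explanation))
    return results
```

Calling `grandfathers("Ada Lovelace")` returns both grandfathers in this toy data, each with the two statements that justify it, which is roughly the kind of explanation an editor would want to see next to an inferred result.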
Post by Stas Malyshev
Post by Markus Kroetzsch
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there were an application relying on them. But even
The main problem I see here is not that some links are incorrect - which
may have bad effects, but it's not the most important issue. The most
important one, IMHO, is that there's no way to figure out in any scalable
and scriptable way what "more-general-than" means for any particular case.
It's different for each type of object and often inconsistent within
the same class (e.g. see the confusion over whether "dog" is an animal, a
name of the animal, a name of the taxon, etc.). It's not that navigating
the hierarchy would lead us astray - we're not even at the point of having
this problem, because we don't even have a good way to navigate it.
Using only instance-of/subclass-of does not seem that useful, because
a lot of interesting things are not represented in this way - e.g.
finding out that Donna Strickland (Q56855591) is a woman (Q467) is
impossible using only this hierarchy. We could special-case a bunch of
those but given how diverse Wikidata is, I don't think this will ever
cover any significant part of the hierarchy unless we find a non-ad-hoc
method of doing this.
This also makes it particularly hard to do something like "let's start
using it and fix the issues as we discover them", because the main issue
here is that we don't have a way to start with anything useful beyond a
tiny subset of classes that we can special-case manually. We can't
launch a rocket and figure out how to build the engine later - having a
working engine is a prerequisite to launching the rocket!
There are also significant technical challenges in this - indexing
a dynamically changing hierarchy is very problematic, and with our
approach to ontology anything can be a class, so we'd have to constantly
update the hierarchy. But this is more of a technical challenge, which
will come after we have some solution for the above.