James Heald
2018-09-27 21:34:31 UTC
This recent announcement by the Structured Data team perhaps ought to be
quite a heads-up for us:
https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Commons_-_how_to_structure_coverage
Essentially the team has given up on the hope of using Wikidata
hierarchies to suggest generalised "depicts" values to store for images
on Commons, to match against terms in incoming search requests.
That is, if an image is of a German Shepherd dog, and identified as such,
the team has given up on trying to infer in general from Wikidata that
'dog' is also a search term that such an image should score positively for.
Apparently the Wikidata hierarchies were simply too complicated, too
unpredictable, and too arbitrary and inconsistent in their design across
different subject areas to be readily assimilated (before one even
starts on the density of bugs and glitches that then undermine them).
Instead, if that image ought to be considered in a search for 'dog', it
looks as though an explicit 'depicts:dog' statement may need to be
present, in addition to 'depicts:German Shepherd'.
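To make the difference concrete, here is a minimal sketch of the two
approaches as I understand them. Everything in it is an assumption for
illustration: the tiny SUBCLASS_OF table, the labels, and the function names
expand_depicts and matches are hypothetical, and none of it is the team's
actual code or real Wikidata content.

```python
# Toy "subclass of" edges. Labels and structure are illustrative
# assumptions, not real Wikidata content.
SUBCLASS_OF = {
    "German Shepherd": ["dog"],
    "dog": ["mammal"],
    "mammal": ["animal"],
}


def expand_depicts(values, hierarchy):
    """Return the given values plus every ancestor reachable in hierarchy."""
    seen = set()
    stack = list(values)
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # already visited; also guards against cycles
        seen.add(item)
        stack.extend(hierarchy.get(item, []))
    return seen


def matches(term, depicts, hierarchy=None):
    """True if term is among the stored (or, with a hierarchy, inferred) values."""
    if hierarchy:
        candidates = expand_depicts(depicts, hierarchy)
    else:
        candidates = set(depicts)
    return term in candidates


# Inference from the hierarchy: one stored statement is enough.
print(matches("dog", ["German Shepherd"], SUBCLASS_OF))  # True

# No inference: the same image is missed in a search for 'dog' ...
print(matches("dog", ["German Shepherd"]))  # False

# ... unless 'depicts:dog' is also stored explicitly.
print(matches("dog", ["German Shepherd", "dog"]))  # True
```

The inferred version only works if the hierarchy is clean and consistent,
which is exactly what is said to be lacking. The explicit version avoids
that dependency, at the cost of storing and maintaining the extra statements.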
Some of the background behind this assessment can be read at
https://phabricator.wikimedia.org/T199119
See in particular the first substantive comment on that ticket, by Cparle on
10 July, giving his quick initial read of some of the issues that using
Wikidata would face.
SDC was considered a flagship end-application for Wikidata. If the data
in Wikidata is not usable enough to supply the dogfood that project was
expected to rely on, that should be a serious wake-up call, a red flag
we should not ignore.
If the way data is organised across different subjects is currently too
inconsistent and confusing to be usable by our own SDC project, are
there actions we can take to address that? Are there design principles
we need to choose and then apply consistently? Is this something the
community can do, or will some more active direction be needed?
Wikidata's 'ontology' has grown haphazardly, with little oversight, like
an untended bank of weeds. Is some more active gardening now required?
-- James.