Discussion:
Advice for storing retail product data on Wikidata or a self-hosted Wikibase instance
Abe Voelker
2018-09-19 19:49:24 UTC
Hello,

I was checking out Wikidata and was wondering if it would be a good fit for
a website I wanted to make to crowdsource data about retail products,
storing properties like product name, description, UPC/GTIN, MPN,
manufacturer, color, size, and so on.

I take it that due to Wikidata's Wikipedia notability requirement I'd have
to operate my own Wikibase instance separate from Wikidata? In that case,
is it still possible to integrate with Wikidata's ontologies, or do I have
to have my own completely separate ontology from scratch (I'd hate to have
to reinvent the real basic properties and constraints)? Are there similar
projects I could look at to get an idea how to partially-fork Wikidata in
this way?

Another thing I'm wondering about is how I would integrate data that
wouldn't necessarily fit into the product data ontology, like customer
reviews of the product, or sale offers (offers having their own properties
like price, availability, condition, and hyperlink to seller's site) -
things that aren't inherent characteristics of the item and change often. I
was wondering if it would be easier to have a "wrapper" website that stores
this data separately, while still integrating with the core product data from
Wikibase. Does anyone have any experience or references to projects doing
an integration like that? I'm wondering what the easiest way to integrate
the two would be - connect directly to the MySQL database, sync databases
with hooks, SPARQL, etc.

Also, I'd like to populate some of the data for this website by crawling
online retail stores and manufacturers and performing edits with a bot.
Some of these sites provide schema.org metadata, so I was wondering if that
makes integration with Wikidata/Wikibase any easier, or do I still have to
do some kind of manual mapping between the two?
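
To make the mapping question concrete, here is a minimal sketch of what a
hand-written mapping from schema.org Product JSON-LD to Wikidata-style
property IDs could look like. The mapping table, the function name
`map_product`, and the sample keys are all hypothetical, and the property
IDs are assumptions that should be checked on wikidata.org before use.
String values such as a manufacturer name would still need to be reconciled
to items (Q-numbers) in a separate step.

```python
import json

# Hypothetical mapping from schema.org Product keys to Wikidata property IDs.
# These IDs are assumptions for illustration; verify each one on wikidata.org.
SCHEMA_TO_WIKIDATA = {
    "gtin13": "P3962",       # Global Trade Item Number (assumed ID)
    "manufacturer": "P176",  # manufacturer
    "color": "P462",         # color
    "material": "P186",      # made from material
}


def map_product(jsonld: str) -> dict:
    """Split a schema.org Product into mapped claims and unmapped leftovers."""
    data = json.loads(jsonld)
    if data.get("@type") != "Product":
        raise ValueError("not a schema.org Product")

    claims = {}
    unmapped = {}
    for key, value in data.items():
        # Skip JSON-LD keywords and the fields handled as label/description.
        if key.startswith("@") or key in ("name", "description"):
            continue
        # Nested objects such as {"@type": "Organization", "name": "..."}
        # are reduced to their name; reconciliation to a Q-number comes later.
        if isinstance(value, dict):
            value = value.get("name")
        prop = SCHEMA_TO_WIKIDATA.get(key)
        if prop is not None:
            claims[prop] = value
        else:
            unmapped[key] = value

    return {
        "label": data.get("name"),
        "description": data.get("description"),
        "claims": claims,
        "unmapped": unmapped,
    }
```

Keeping the unmapped keys around, rather than dropping them, makes it easy
to see which schema.org fields a crawler encounters that the table does not
cover yet.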

Thanks for your patience with this braindump as I'm new to Wikidata and
still trying to wrap my head around things. I did the Wikidata tutorials,
messed around with a local Wikibase install using Docker Compose, and did a
lot of clicking around Wikidata and reading about ontologies, but it feels like
I've just barely scratched the surface!

Thank you,
Abe Voelker
Maxime Lathuilière
2018-09-19 22:17:19 UTC
Hello Abe and welcome!

I'm working on inventaire.io <https://inventaire.io>, which might be the
closest existing thing to what you're describing: for the needs of the
book sharing webapp, we maintain an open bibliographic database using
Wikidata vocabulary and extending Wikidata
<https://wiki.inventaire.io/wiki/Data?lang=en> for entries that don't
match the notability requirements and/or were automatically generated
from data found on the web and that couldn't be reconciled with existing
entities on Wikidata. We build edition data primarily around ISBNs,
which, in their 13-digit form, are valid GTINs. This wasn't built with
Wikibase but with /ad hoc/ software (see repo
<http://github.com/inventaire/inventaire/>), as Wikibase federation
wasn't ready at all when we started and is still missing some critical
pieces today, but we are considering moving the bibliographic data into
a dedicated federated Wikibase instance at some point
<https://github.com/inventaire/inventaire/issues/186>. The rest of
the data (users, inventories, transactions, maybe reviews
<https://trello.com/c/uwdkvGl1/114-book-reviews> in the future) would
keep its current form (documents in CouchDB databases without any
relation to the Wikidata data model).
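
As a side note on the ISBN/GTIN relationship: a 13-digit ISBN uses the same
check-digit scheme as a GTIN-13, so one validator can cover both books and
other products. The sketch below is only an illustration and is not taken
from the Inventaire code base; the function names are made up for this
example.

```python
def gtin13_check_digit(first12: str) -> int:
    """Compute the GTIN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_gtin13(code: str) -> bool:
    """Check a GTIN-13 (or ISBN-13), ignoring hyphens and spaces."""
    digits = code.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return gtin13_check_digit(digits[:12]) == int(digits[12])


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    core = isbn10.replace("-", "").replace(" ", "")[:9]
    prefix = "978" + core
    return prefix + str(gtin13_check_digit(prefix))
```

Normalizing every identifier to its 13-digit form before storing it would
let books and other products share a single lookup key.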

So, answering your question, I don't think Wikidata is the place to
crowdsource data about retail products but I'm convinced a database
doing this should do it using Wikidata vocabulary! And just like we are
glad that WikiProject_Books
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Books> and WikiCite
<https://www.wikidata.org/wiki/Wikidata:WikiCite/Roadmap> exist to work
on a consistent (*/cough/* almost */cough/*) data model for books that we
can reuse within Inventaire
<https://inventaire.github.io/entities-map/>, there are several projects
in or around Wikidata with which such a project could/should work:
- the *WikiObject <https://meta.wikimedia.org/wiki/WikiObject>* sister
project proposal: you've got to check that out. Quico, the main
contributor (in cc), has been doing quite a bit of research on this
closely related project
- Wikidata:WikiProject_Companies
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies>
- Wikidata:WikiProject_Materials
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Materials>
- OpenFoodFacts <http://openfoodfacts.org/>, which also has to deal with
GTINs and product properties, and which has expressed interest in getting
more integrated with Wikidata
<https://en.wiki.openfoodfacts.org/Structured_Data/Wikidata>

Also of interest:
- Open Product Data <http://product-open-data.com/>, a project that
shared your idea but apparently couldn't gain momentum(?)
- the GoodRelations <http://www.heppnetz.de/projects/goodrelations/>
ontology
- OpenCorporates <https://opencorporates.com/>, unfortunately not so
open from what I could tell

I have been dreaming of such a database for a while now (see my (now old
><) articles P2P Resources Management
<https://maxlath.eu/articles/p2p-rm/>, Wikidata and the apt-get of
things
<https://maxlath.eu/articles/wikidata-and-the-apt-get-of-things/>,
Mapping resources using open knowledge
<https://maxlath.eu/articles/mapping-resources-using-open-knowledge/>),
and extending Inventaire to things other than books has always been
among the possible futures, so I would be more than happy to hear more
about any progress on this :)

Bests,

Maxime Lathuilière
maxlath.eu <http://maxlath.eu> - twitter <https://twitter.com/maxlath> -
mastodon <http://mastodon.social/@maxlath> - User:Maxlath
<https://www.wikidata.org/wiki/User:Maxlath>
inventaire.io <https://inventaire.io> - roadmap
<https://trello.com/b/0lKcsZDj/inventaire-roadmap> - code
<https://github.com/inventaire/inventaire> - mastodon
<https://mamot.fr/@inventaire> - twitter
<https://twitter.com/inventaire_io> - facebook
<https://facebook.com/inventaire.io>
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
Abe Voelker
2018-09-20 17:15:39 UTC
Maxime,

Wow, I greatly appreciate you taking the time to write that very thorough
response!

Indeed inventaire.io looks very similar to what I'd like to do, and the
data model page you linked to is very explanatory and helpful - thank you!
I've been so torn because, as you say, using Wikidata vocabulary seems like
the correct way to go, and I do really like Wikidata's editor (especially
being able to add a reference for every statement). However, I'm not a PHP
developer, nor do I have any experience administering MediaWiki software, so
it would be a very steep hill for me to climb to customize Wikibase to suit
my needs. I may instead end up writing some ad hoc software while trying to
conform to Wikidata vocabulary, as you say, with a view to making
integration with a Wikibase instance easier later on.

Also, I know I made it sound general in my initial email by saying I wanted
to catalog "retail products" but truth be told I'm actually only really
interested in a specific niche of products - firearms and related
accessories. Sorry if that was misleading but I thought being
overly-specific might be distracting from describing what I was trying to
do. In any case conforming to or integrating with a larger "Wikidata for
products" is still aligned with my interest so thank you for that info and
links.

You've given me a lot to pore over and ruminate on. I also see on your
Wikidata page you've authored some useful tools that I will have to check
out. Thanks again for taking the time; if I make anything useful I will
share.

Best Regards,
Abe
Hay (Husky)
2018-09-20 22:09:46 UTC
Hey Abe, Maxime,
Cool that you've been looking into how to use Wikibase / Wikidata for
something like a retail products / firearms database. I could
definitely understand why you're considering that.

I suppose just using the vocabulary and writing your own software is
probably the best way to go. I don't have any experience running my
own Wikibase + MediaWiki instance, but I could imagine it might be a
bit overkill if all you want is a database of firearms with metadata.

Of course, what you could do is try to make integration with Wikidata
as easy as possible. There are a lot of items and properties that you
could reuse, like manufacturers, colours, specific types of guns that
all have a Q-number. If you link those up in your own database you
could, in theory, get all the hard work from the volunteers (like
translations and the like) without having to do the work yourself. The
API is very useful for something like that.
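
As a rough sketch of that idea: if your own records store Q-numbers, the
`wbgetentities` action of the Wikidata API can return labels for them in
several languages. The helper names and the User-Agent string below are
made up for illustration, and error handling is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def labels_url(qids, languages):
    """Build a wbgetentities URL asking only for labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API + "?" + urlencode(params)


def extract_labels(response: dict) -> dict:
    """Reduce a wbgetentities response to {qid: {language: label}}."""
    result = {}
    for qid, entity in response.get("entities", {}).items():
        labels = entity.get("labels", {})
        result[qid] = {lang: item["value"] for lang, item in labels.items()}
    return result


def fetch_labels(qids, languages):
    """Fetch labels over the network (hypothetical User-Agent, no retries)."""
    request = Request(
        labels_url(qids, languages),
        headers={"User-Agent": "product-db-sketch/0.1 (illustrative example)"},
    )
    with urlopen(request) as resp:
        return extract_labels(json.load(resp))
```

Calling something like `fetch_labels(["Q42"], ["en", "fr"])` would then give
you the translated labels without storing them yourself; caching the result
locally is probably wise for anything shown on every page view.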

You could even provide your own API, and as long as you provide
permanent identifiers and a machine-readable format for your items,
data might even flow back to Wikidata.

Kind regards,
-- Hay