Discussion:
[Wikidata] ScienceSource participation: focus list
Charles Matthews
2018-08-15 14:14:44 UTC
It is now possible to participate in the ScienceSource project, by adding "main subject" statements.

The ScienceSource focus list has been up for a little while now at

https://www.wikidata.org/wiki/Wikidata:ScienceSource_focus_list

which has the shortcut WD:SSFL. Wikidata items about biomedical articles can be added to the list as explained on the page, using P5008. That page has other links to expository material. The original grant page at

https://meta.wikimedia.org/wiki/Grants:Project/ScienceSource

gives an overview of the project's aims.

The Listeria-generated page

https://www.wikidata.org/wiki/Wikidata:ScienceSource_focus_list/Main_subject_needed

linked from the focus list page shows which of the items on the list (which is around 3K now, see the SPARQL query on the talk page WT:SSFL) lack a main subject (P921) statement.
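
For readers who want to reproduce that check themselves, here is a minimal sketch in Python that only builds the SPARQL text; it is not the query on WT:SSFL, which may differ. The function name is made up for illustration, and "Q123" is a placeholder: the real ScienceSource item ID has to be taken from the focus list page. The result can be pasted into the Wikidata Query Service.

```python
import re


def missing_main_subject_query(focus_list_qid):
    """Build SPARQL text listing focus-list items with no main subject.

    focus_list_qid is the item used as the P5008 value for the project;
    the caller must supply it (this sketch does not know the real ID).
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", focus_list_qid):
        raise ValueError("expected an item ID such as 'Q123'")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        # Items placed on the focus list via P5008.
        f"  ?item wdt:P5008 wd:{focus_list_qid} .\n"
        # Keep only those without any main subject (P921) statement.
        "  FILTER NOT EXISTS { ?item wdt:P921 ?subject . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


# "Q123" is a placeholder, not the ScienceSource item.
print(missing_main_subject_query("Q123"))
```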

At Wikimania I made the clarification that the focus list is supposed to be better "balanced" than the selection of articles represented on Wikidata as a whole. This is a big issue, but I don't think it is really disputed that the existing literature is more interested in the diseases of prosperous people and prosperous countries. On a straight utilitarian argument about the "greatest good of the greatest number", there is a problem.

Therefore, the composition of the focus list should not be a proportionate reflection of the 17.5M articles represented in Wikidata, by topic. We are looking first to include about 0.2% of articles on the list, bringing it up to about 40K. The Listeria page is a sortable table, and if you sort by "published in" you'll see plenty from PLOS Neglected Tropical Diseases - thanks to Daniel Mietchen for adding a collection of well-cited papers from there.

Later on there should be other lists by topic area, so we can get an idea of balance. Once main subjects (where type of disease is the most important area) build up, SPARQL aggregates can reveal distribution. This bubble chart query

https://tinyurl.com/y89s6nlc

gives a baseline, showing that currently the list's subjects are dominated by infectious diseases.
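
As an illustration of what such an aggregate might look like, the sketch below builds a GROUP BY / COUNT query over main subjects of focus-list items. It is an assumption about the general shape, not the query behind the short link above; the function name and the "Q123" placeholder are hypothetical, and the real focus-list item ID must be supplied.

```python
import re


def subject_distribution_query(focus_list_qid, limit=50):
    """Build SPARQL text counting focus-list items per main subject."""
    if not re.fullmatch(r"Q[1-9][0-9]*", focus_list_qid):
        raise ValueError("expected an item ID such as 'Q123'")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT ?subject ?subjectLabel (COUNT(?item) AS ?count) WHERE {\n"
        # Focus-list items that already carry a main subject (P921).
        f"  ?item wdt:P5008 wd:{focus_list_qid} ;\n"
        "        wdt:P921 ?subject .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        # One row per subject, most frequent first.
        "GROUP BY ?subject ?subjectLabel\n"
        "ORDER BY DESC(?count)\n"
        f"LIMIT {limit}"
    )


# "Q123" is a placeholder, not the ScienceSource item.
print(subject_distribution_query("Q123", limit=20))
```

In the Wikidata Query Service, the result table can then be switched to a bubble chart view.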

Where next? In the coming weeks, the ScienceSource wiki at http://sciencesource.wmflabs.org/ will be developed. Text-mining and annotation there will be the next phase. Downloading of papers to the wiki will depend on accumulating metadata on their Wikidata items.

Charles
Thibaut DEVERAUX
2018-09-01 12:24:33 UTC
Hi

Would it also be a good approach to make a list of items that are clearly
potential main topics (i.e. all diseases, all materials...) and create a bot
that adds these items as main topics if they appear in the title?
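
A rough sketch of the matching step, in Python, might look like this. It only proposes candidates for human review and does not write anything to Wikidata. The function name, the sample titles, and the IDs such as "QID_MALARIA" are all hypothetical placeholders. It matches whole words only, so that a short label like "lead" is not found inside "leading".

```python
import re


def propose_main_subjects(title, candidates):
    """Return (label, item_id) pairs whose label occurs in the title.

    candidates maps a subject label to its item ID. Matching is
    case-insensitive and requires the label to stand as a whole word
    or phrase, not as part of a longer word.
    """
    matches = []
    for label, item_id in candidates.items():
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, title, flags=re.IGNORECASE):
            matches.append((label, item_id))
    return sorted(matches)


# Placeholder IDs: real item IDs would come from the curated topic list.
CANDIDATES = {
    "malaria": "QID_MALARIA",
    "tuberculosis": "QID_TUBERCULOSIS",
    "lead": "QID_LEAD",
}

for title in [
    "Artemisinin resistance in malaria: leading hypotheses",
    "Lead exposure and tuberculosis",
]:
    print(title, "->", propose_main_subjects(title, CANDIDATES))
```

Even with whole-word matching, a title can mention a topic without being about it, so the proposals would still need checking before any P921 statement is added.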

Regards


