Discussion:
[Wikidata] VIAF date scraping errors
Renée Bagslint
2018-08-19 08:37:54 UTC
If you Google "Emily Riehl", you will find that Google tells you that she
was born in 1950: this is certainly complete nonsense. Her date of birth
isn't in her Wikipedia article, which is where Google gets its text from,
but turns out to be in her Wikidata entry, having been added by Reinheitsgebot,
operated by Magnus Manske. It seems that in May when this and many other
dates were being added by the bot, it was scraping VIAF files and
incorrectly parsing the XML for the Marc21 entry 997, which gives everyone
born in the 20th century an arbitrary "floruit" date of 1950 if not
otherwise available. I have to say that I'm not completely sure about every
detail of this diagnosis -- 997 is not even a standard Marc field, it's
reserved for local use: significant dates such as birth and death are
encoded in field 046. But it is clear that the dates being inserted by the
bot can be completely fictitious. This was reported in May but it seems
that it has not been fixed. As a result Wikidata and hence Google are
delivering an unknown number of incorrect dates of birth.
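
To make the 046-versus-997 distinction concrete, here is a minimal sketch of reading a birth date only from field 046 of a MARCXML record and ignoring local-use fields entirely. It assumes the record uses the standard MARC21 "slim" XML namespace and that subfield f of 046 carries the birth date. The sample record, its values, the layout of its 997 field, and the function name are all invented for illustration; this is not the bot's code and not a real VIAF record.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("slim") namespace.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Hypothetical record: the values and the 997 layout are invented.
SAMPLE_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="046" ind1=" " ind2=" ">
    <subfield code="f">1984</subfield>
  </datafield>
  <datafield tag="997" ind1=" " ind2=" ">
    <subfield code="a">1950</subfield>
  </datafield>
</record>
"""


def birth_date_from_046(marcxml):
    """Return the text of 046 $f, or None if the record has no such subfield.

    Local-use fields such as 997 are never consulted, so a placeholder
    stored there cannot be mistaken for a date of birth.
    """
    root = ET.fromstring(marcxml.strip())
    for field in root.iter("{%s}datafield" % MARC_URI):
        if field.get("tag") != "046":
            continue
        for sub in field.findall("marc:subfield", MARC_NS):
            if sub.get("code") == "f" and sub.text:
                return sub.text.strip()
    return None


if __name__ == "__main__":
    print(birth_date_from_046(SAMPLE_RECORD))
```

Returning None when 046 is absent is the important design choice here: no date is better than a placeholder that ends up looking like a sourced fact.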
Jane Darnell
2018-08-28 08:15:43 UTC
Yes, I reached the same conclusion. In fact it's even worse than 1950, and
many people have been given the incorrect birthdate of "1900" for the same
reason. I have started going through women items with 1900-1-1 birthdates,
correcting the birthdate where possible, and where impossible (e.g. simply
not enough information), I have been moving the incorrect date to "1960s"
or "1970s", just to offload the 1900 errors and carve out any chance of
PD-work (died before 1948). A big mess indeed :(
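
For anyone wanting to build a review list, here is a rough sketch of how the suspect values could be flagged. It assumes dates of birth are available as Wikidata-style time strings such as "+1900-01-01T00:00:00Z", and it treats the years 1900 and 1950 with a month and day of 01 or 00 as placeholders. The function name and the set of suspect years are assumptions for illustration, not anything the bot or Wikidata defines.

```python
# Years that appear to be used as placeholders, per this thread.
SUSPECT_YEARS = {1900, 1950}


def is_suspect_birthdate(time_value):
    """Return True if a Wikidata-style time string looks like a placeholder.

    Flags only year-start values (month and day both "01" or "00") in a
    suspect year. BCE dates, which start with "-", are never flagged.
    """
    if time_value.startswith("-"):
        return False
    date_part = time_value.lstrip("+").split("T")[0]
    parts = date_part.split("-")
    if len(parts) != 3:
        return False
    year, month, day = parts
    return (
        int(year) in SUSPECT_YEARS
        and month in ("00", "01")
        and day in ("00", "01")
    )


if __name__ == "__main__":
    for value in ("+1900-01-01T00:00:00Z", "+1984-06-12T00:00:00Z"):
        print(value, is_suspect_birthdate(value))
```

A person genuinely born on 1 January 1900 would be flagged too, so this is only good for producing a list to check by hand, not for automatic correction.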
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata