Discussion:
[Wikidata] BlazeGraph/wikibase:label performance
Markus Kroetzsch
6 years ago
Hi,

I just noticed that BlazeGraph takes an undue amount of time for a
rather simple type of query. The following times out:

SELECT ?item ?itemLabel
WHERE
{
?item wdt:P31 []
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10

Manually forcing a specific query plan makes the query work in <200ms:

SELECT ?item ?itemLabel
WHERE
{
{ SELECT * WHERE { ?item wdt:P31 [] } LIMIT 10 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

But of course the original query should normally be streaming and not
depend on any such smartness to push LIMIT inwards.

Cheers,

Markus
Gerard Meijssen
6 years ago
On a related note, a Reasonator page fails to load, e.g.
https://tools.wmflabs.org/reasonator/?q=Q57328736&lang=en I have also noticed
that tools like SourceMD do not always show the labels for items
that were processed (fewer than 20)...

Performance is getting worse, and particularly when you rely on up-to-date
information, Wikidata is not what we were used to.
Thanks,
GerardM

On Sun, 25 Nov 2018 at 14:56, Markus Kroetzsch <
...
Andra Waagmeester
6 years ago
The issue is with the
"SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }", which
seems to be a very expensive operation.

When you change the query to its more basic form:

SELECT ?item ?itemLabel
WHERE
{
?item wdt:P31 [] ;
rdfs:label ?itemLabel .
FILTER (lang(?itemLabel) = "en")
} LIMIT 10

the results seem to be in before you hit enter.

I am a big fan of the "SERVICE wikibase:label { bd:serviceParam
wikibase:language
"en". }" due to its ease of use, but as said it is so expensive that getting rid
of it is one of the first things I do when a query times out.
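
One caveat with this rewrite: the FILTER form drops items that have no
English label, whereas the label service falls back to the item's Q-id.
A minimal sketch, assuming unlabeled items should still appear (with
?itemLabel left unbound), wraps the label lookup in OPTIONAL and keeps
the LIMIT in a subquery as in the first message:

```sparql
SELECT ?item ?itemLabel
WHERE
{
  # Fetch the 10 items first, so the label lookup runs on 10 rows only.
  { SELECT ?item WHERE { ?item wdt:P31 [] } LIMIT 10 }
  # OPTIONAL keeps items that have no English label.
  OPTIONAL {
    ?item rdfs:label ?itemLabel .
    FILTER (lang(?itemLabel) = "en")
  }
}
```

The explicit subquery is a design choice here: a LIMIT placed after an
OPTIONAL would otherwise depend on the engine evaluating the pattern
lazily.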
Finn Aarup Nielsen
6 years ago
The labeling issue is not related to the (recent) lags in Wikidata Query
Service. In Scholia we are almost always using the second type of
queries (and usually with WITHIN), particularly for GROUP BY
constructs. I am wondering whether the examples on the wiki
note this issue sufficiently clearly.
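
For illustration, and assuming that "WITHIN" above refers to
Blazegraph's named subqueries (WITH ... AS %name, pulled in with
INCLUDE), a sketch of the query from the first message in that style
could look as follows. The name %items is made up for this example:

```sparql
SELECT ?item ?itemLabel
WITH {
  # Named subquery: evaluated once, with the LIMIT applied inside.
  SELECT ?item WHERE { ?item wdt:P31 [] } LIMIT 10
} AS %items
WHERE
{
  INCLUDE %items
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

This syntax is a Blazegraph extension rather than standard SPARQL 1.1,
so it may not carry over to other engines.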

When I view
https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1
it seems that the recent lags are particularly a problem during American
daytime, while at other times the lag is lower. It might not be so
problematic during weekends.

/Finn
Stas Malyshev
6 years ago
Hi!
Post by Markus Kroetzsch
But of course the original query should normally be streaming and not
depend on any such smartness to push LIMIT inwards.
You are correct, but this may be a consequence of how Blazegraph treats
services. I'll try to look into it - it is possible that it doesn't do
streaming correctly there.
--
Stas Malyshev
***@wikimedia.org