Discussion:
[Wikidata] BlazeGraph/wikibase:label performance
Markus Kroetzsch
2018-11-25 13:56:27 UTC
Hi,

I just noticed that BlazeGraph takes an undue amount of time for a
rather simple type of query. The following times out:

SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 []
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10

Manually forcing a specific query plan makes the query work in <200ms:

SELECT ?item ?itemLabel
WHERE
{
  { SELECT * WHERE { ?item wdt:P31 [] } LIMIT 10 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

But of course the original query should normally be streaming and not
depend on any such smartness to push LIMIT inwards.
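The streaming point can be illustrated with a toy Python model. This is only a sketch and not how Blazegraph is implemented: `ITEMS`, `LabelService` and the three plan functions are hypothetical stand-ins, and each call to `label()` simply counts as one label lookup. The model compares a plan that labels every solution before applying LIMIT, a streaming plan that labels solutions lazily, and the manual workaround of pushing LIMIT into a subquery.

```python
from itertools import islice
from typing import Iterable

# Hypothetical toy data: IDs standing in for the bindings of "?item wdt:P31 []".
ITEMS = [f"Q{i}" for i in range(1, 100_001)]


class LabelService:
    """Hypothetical stand-in for the label service; it only counts lookups."""

    def __init__(self) -> None:
        self.lookups = 0

    def label(self, item: str) -> str:
        self.lookups += 1
        return f"label of {item}"


def label_then_limit(items: Iterable[str], svc: LabelService, limit: int) -> list[tuple[str, str]]:
    # Materializing plan: every solution is labelled before LIMIT is applied,
    # so the cost grows with the full result set, not with LIMIT.
    labelled = [(item, svc.label(item)) for item in items]
    return labelled[:limit]


def streaming(items: Iterable[str], svc: LabelService, limit: int) -> list[tuple[str, str]]:
    # Streaming plan: solutions are labelled lazily, and islice stops pulling
    # from the generator once `limit` rows have been produced.
    labelled = ((item, svc.label(item)) for item in items)
    return list(islice(labelled, limit))


def limit_in_subquery(items: Iterable[str], svc: LabelService, limit: int) -> list[tuple[str, str]]:
    # The manual workaround: apply LIMIT first (the inner SELECT), then label
    # only the rows that survive.
    return [(item, svc.label(item)) for item in islice(items, limit)]


if __name__ == "__main__":
    for plan in (label_then_limit, streaming, limit_in_subquery):
        svc = LabelService()
        rows = plan(ITEMS, svc, 10)
        print(f"{plan.__name__}: {len(rows)} rows, {svc.lookups} label lookups")
```

All three plans return the same 10 rows, but the first performs 100000 lookups while the other two perform 10. In the model, a streaming pipeline gets the benefit of the subquery workaround without any rewriting.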

Cheers,

Markus
Gerard Meijssen
2018-11-25 14:10:31 UTC
On top of that, a Reasonator page fails to load, e.g.
https://tools.wmflabs.org/reasonator/?q=Q57328736&lang=en
I have noticed that tools like SourceMD do not always show the labels for
items that were processed (fewer than 20)...

Performance is getting worse, particularly when you rely on up-to-date
information; Wikidata is not what we were used to.
Thanks,
GerardM

_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
Andra Waagmeester
2018-11-25 14:31:41 UTC
The issue is with the
"SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }", which
seems to be a very expensive operation.

When you change the query to its more basic form:

SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 [] ;
        rdfs:label ?itemLabel .
  FILTER (lang(?itemLabel) = "en")
} LIMIT 10

the results seem to be in before you hit enter.

I am a big fan of
"SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }"
because it is so easy to use, but as I said, it is so expensive that getting
rid of it is one of the first things I do when a query times out.
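The rewritten query is faster, but it is probably not strictly equivalent. As far as I understand the label service (an assumption, not something stated in this thread), it keeps every solution and falls back to the entity ID when no label exists in the requested language, whereas the `rdfs:label` join with `FILTER(lang(...) = "en")` drops items without an English label. Under LIMIT 10 the two queries can therefore return different items. The toy Python model below sketches that difference; the data and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]
Labels = Dict[Tuple[str, str], str]

# Hypothetical toy data: items in solution order, labels keyed by (item, language).
ITEMS = ["Q1", "Q2", "Q3", "Q4"]
LABELS: Labels = {
    ("Q1", "en"): "alpha",
    ("Q2", "de"): "beta",
    ("Q3", "en"): "gamma",
}


def label_service_style(items: List[str], labels: Labels, lang: str, limit: int) -> List[Row]:
    # Assumed label-service behaviour: every item yields a row; a missing label
    # in `lang` falls back to the item ID itself.
    rows: List[Row] = []
    for item in items:
        if len(rows) == limit:
            break
        rows.append((item, labels.get((item, lang), item)))
    return rows


def filter_join_style(items: List[str], labels: Labels, lang: str, limit: int) -> List[Row]:
    # rdfs:label join plus FILTER(lang(?itemLabel) = lang): items without a label
    # in `lang` produce no row at all, so LIMIT picks from a smaller set.
    rows: List[Row] = []
    for item in items:
        if len(rows) == limit:
            break
        label = labels.get((item, lang))
        if label is not None:
            rows.append((item, label))
    return rows


if __name__ == "__main__":
    print(label_service_style(ITEMS, LABELS, "en", 2))  # [('Q1', 'alpha'), ('Q2', 'Q2')]
    print(filter_join_style(ITEMS, LABELS, "en", 2))    # [('Q1', 'alpha'), ('Q3', 'gamma')]
```

Whether that difference matters depends on the use case. For a quick sample of labelled items it usually does not, but a query that must list every match would silently lose the unlabelled ones.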
Finn Aarup Nielsen
2018-11-25 14:33:09 UTC
The labeling issue is not related to the (recent) lags in the Wikidata Query
Service. In Scholia we almost always use the second type of query (usually
with WITH), particularly for GROUP BY constructs. I wonder whether the
examples on the wiki point out this issue clearly enough.
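The GROUP BY case suggests a related rule of thumb: aggregate first (in a subquery, or in one of Blazegraph's `WITH { ... } AS %name` named subqueries) and call the label service only on the aggregated rows. The toy Python model below counts label lookups for both orders. `CountingLabeler` and the two functions are hypothetical and do not reproduce Scholia's actual queries.

```python
from collections import Counter
from typing import Iterable


class CountingLabeler:
    """Hypothetical stand-in for the label service; it only counts lookups."""

    def __init__(self) -> None:
        self.lookups = 0

    def label(self, key: str) -> str:
        self.lookups += 1
        return f"label of {key}"


def label_then_group(keys: Iterable[str], labeler: CountingLabeler) -> Counter:
    # Labels every input row, then groups: one lookup per row.
    return Counter((key, labeler.label(key)) for key in keys)


def group_then_label(keys: Iterable[str], labeler: CountingLabeler) -> Counter:
    # Groups first (the inner, aggregating query), then labels each group key once.
    counts = Counter(keys)
    return Counter({(key, labeler.label(key)): n for key, n in counts.items()})


if __name__ == "__main__":
    rows = ["Q5"] * 600 + ["Q6"] * 300 + ["Q7"] * 100  # hypothetical GROUP BY keys
    a, b = CountingLabeler(), CountingLabeler()
    assert label_then_group(rows, a) == group_then_label(rows, b)
    print(a.lookups, b.lookups)  # 1000 3
```

Both orders produce the same grouped counts, but labelling after aggregation needs one lookup per group instead of one per row.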

When I view
https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1
it seems that the recent lags are particularly a problem during American
daytime and lower at other times. They might be less of a problem during
weekends.

/Finn
Stas Malyshev
2018-11-26 20:26:24 UTC
Hi!
Post by Markus Kroetzsch
But of course the original query should normally be streaming and not
depend on any such smartness to push LIMIT inwards.
You are correct, but this may be a consequence of how Blazegraph treats
services. I'll try to look into it - it is possible that it doesn't do
streaming correctly there.
--
Stas Malyshev
***@wikimedia.org