Discussion:
[Wikidata] BlazeGraph and OPTIONAL
Markus Kroetzsch
2018-11-25 14:51:15 UTC
Hi,

I am puzzled by the behaviour of a SPARQL query. Maybe there is an error
with BlazeGraph here, but hopefully I am just overlooking something.

The query is as follows: http://tinyurl.com/y95jpmhq

SELECT ?item ?birthdate ?spouse
WHERE
{
   { ?item wdt:P569 ?birthdate
     FILTER (year(?birthdate)>1900)
     ?item wdt:P26 []
   } OPTIONAL {
     ?item wdt:P26 ?spouse
     FILTER (year(?birthdate) = 1947)
   }
   # FILTER (year(?birthdate) = 1947) ## For testing: works correctly
} LIMIT 1000

What this should do: "Select married people born after 1900, and,
optionally, also select their spouses, but only for people born in
1947." What BlazeGraph does is: "Select married people born after 1900;
never select any spouses, even if the person is born in 1947".

The 1000 results should contain lines for 1947 births, so you can see
they have no spouse. The commented out filter at the bottom can be used
instead of the inner filter to verify that the condition has no typos
and really matches some of the items.

It seems that BlazeGraph gets the scope of ?birthdate wrong here, and
rather processes the whole query inside out, applying the FILTER to the
optional pattern (where ?birthdate is not bound) and then using the
(empty) result in a binary LeftJoin operation. In reality, LeftJoin in
the SPARQL algebra is a ternary operator that applies the FILTER to the
Join of both sides to determine if we have an optional match or not:

* See "Definition: LeftJoin" in Section 18.5 of the spec [1].

Filters within optional patterns become the third parameter in the
LeftJoin operation when translating queries as in my example:

* See example "{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 FILTER(?v1<3) } }" in
Section 18.2.3 of the spec [1].
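
For reference, the two spec passages cited above can be written out roughly as follows. This is a sketch in the notation of the SPARQL 1.1 algebra, not a verbatim quotation: the first line is the ternary LeftJoin definition, and the second is the translation of the spec's example in its simplified form (with the Join against the empty group pattern dropped).

```latex
\[
\mathrm{LeftJoin}(\Omega_1, \Omega_2, \mathit{expr})
  = \mathrm{Filter}\bigl(\mathit{expr}, \mathrm{Join}(\Omega_1, \Omega_2)\bigr)
    \;\cup\; \mathrm{Diff}(\Omega_1, \Omega_2, \mathit{expr})
\]
\[
\texttt{\{ ?s :p1 ?v1 OPTIONAL \{ ?s :p2 ?v2 FILTER(?v1<3) \} \}}
\;\mapsto\;
\mathrm{LeftJoin}\bigl(\mathrm{BGP}(\texttt{?s :p1 ?v1}),\;
                       \mathrm{BGP}(\texttt{?s :p2 ?v2}),\;
                       (\texttt{?v1<3})\bigr)
\]
```

Here Diff keeps those solutions of the left side for which no compatible solution on the right side makes the expression true, so the filter expression is evaluated on the merged solution, where ?v1 is bound.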

Is my interpretation correct or did I overlook something? Is this a
known problem?

Cheers,

Markus

[1] https://www.w3.org/TR/sparql11-query
--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://kbs.inf.tu-dresden.de/
Sylvain Boissel
2018-11-25 15:16:30 UTC
Remove the { } around the first three lines and it should work fine.
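
Applied to the query from the original post, that suggestion could look like the sketch below. It assumes the wdt: prefix is predefined, as it is on the Wikidata Query Service; the indentation and the explicit dots are only for readability, and the commented-out test filter is left out.

```sparql
SELECT ?item ?birthdate ?spouse
WHERE
{
  # Required part: people born after 1900 who have some spouse statement
  ?item wdt:P569 ?birthdate .
  FILTER (year(?birthdate) > 1900)
  ?item wdt:P26 [] .
  # Optional part now sits directly inside the same group graph pattern,
  # so ?birthdate from the lines above is visible to the inner FILTER
  OPTIONAL {
    ?item wdt:P26 ?spouse
    FILTER (year(?birthdate) = 1947)
  }
} LIMIT 1000
```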

_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
Markus Kroetzsch
2018-11-25 22:40:47 UTC
Indeed, thanks! I could already have seen this from the SPARQL grammar:
OptionalGraphPattern is a special part within one group graph pattern
(analogously to how FILTER works), whereas my braces would make it a
binary operator between two group graph patterns (analogously to how
UNION works). So BlazeGraph is totally correct here.

Markus