When you parse simple sentences in Prolog, you get grammars that look something like this:

sentence --> np, vp.
np --> pronoun.
np --> noun.
vp --> iv.
vp --> tv.

pronoun --> [i].
noun --> [cheese].
iv --> [sit].
tv --> [like], np.

Now with this, it’s possible to parse sentences as complex and interesting as “i like cheese” or “i sit”. Wow! Except, just accepting them is not very interesting:

| ?- phrase(sentence, [i, like, cheese]).

true 
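
And since DCGs are just Prolog, input the grammar doesn't cover simply fails (the exact toplevel response varies by Prolog system; yours may print false instead):

| ?- phrase(sentence, [cheese, like]).

no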

The computer scientist will want to actually obtain the parse tree, which is going to have the same structure as the grammar:

sentence(s(NP,VP)) --> np(NP), vp(VP).
np(pronoun(P)) --> pronoun(P).
np(noun(N)) --> noun(N).
vp(iv(IV)) --> iv(IV).
vp(tv(TV, NP)) --> tv(TV), np(NP).

pronoun(i) --> [i].
noun(cheese) --> [cheese].
iv(sit) --> [sit].
tv(like) --> [like].

This gives us a parse tree:

| ?- phrase(sentence(S), [i, like, cheese]).

S = s(pronoun(i),tv(like,noun(cheese))) 

However, the result is hostile to evaluation. Why should the rest of my program have to know everything about my grammar? We should insulate it with a nice intermediate representation—and since we’re using Prolog, let’s make that intermediate representation look like a Prolog query: likes(i, cheese)!

Wait a second. For this to work, I need to return a functor likes/2, but I don’t have likes/2 until after I parse i. And, I have to put cheese in there after parsing likes! This tree has a totally different shape than the input!

The solution turns out to be:

  • Use a lambda calculus-styled function as the verb phrase
  • Apply the noun phrases to that verb phrase to produce the result

In other words, the verb produced by vp//1 isn’t going to be likes/2; it’s going to be a lambda expression which, when applied to the subject, produces that likes/2 term. It’s going to be λx.likes(x,cheese). In Prolog, we’ll encode this as X^likes(X, cheese), and then we can have a reduction step that looks like reduce(X^Term, X, Term):

| ?- reduce(X^likes(X,cheese),i,T).

T = likes(i,cheese)
X = i

So round three:

reduce(X^Term, X, Term).

sentence(S) --> np(NP), vp(VP), { reduce(VP, NP, S) }.
np(P) --> pronoun(P).
np(N) --> noun(N).
vp(IV) --> iv(IV).
vp(VP) --> tv(TV), np(NP), { reduce(TV, NP, VP) }.

pronoun(i) --> [i].
noun(cheese) --> [cheese].
iv(X^sits(X)) --> [sit].
tv(X^Y^likes(Y,X)) --> [like].

Note that we flipped the variables around in likes/2 because the next thing in the parse tree is the direct object, so that will be the first reduction, and the subject (which we obtained before) is going to get applied last, when the whole parse is complete. And the result?

| ?- phrase(sentence(S), [i,like,cheese]).

S = likes(i,cheese) 
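
To see the two reductions happen in order, we can run them by hand. These queries assume the reduce/3 and grammar definitions above (exact variable names in the output will vary by toplevel). First the direct object goes in, then the subject:

| ?- reduce(X^Y^likes(Y,X), cheese, VP).

VP = Y^likes(Y,cheese)
X = cheese

| ?- reduce(Y^likes(Y,cheese), i, S).

S = likes(i,cheese)
Y = i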

Groovy. But… it turns out the reduce step is so simple that unification can do its work for us, without calling it explicitly. Ready?

sentence(S) --> np(NP), vp(NP^S).
np(P) --> pronoun(P).
np(N) --> noun(N).
vp(IV) --> iv(IV).
vp(TV) --> tv(NP^TV), np(NP).

pronoun(i) --> [i].
noun(cheese) --> [cheese].
iv(X^sits(X)) --> [sit].
tv(X^Y^likes(Y,X)) --> [like].

Kind of spooky looking, seeing np(NP), vp(NP^S), but it still works: the structure produced by vp//1 looks like X^foo(...,X,...) for some term, and writing NP^S forces NP and that X to be equal, which is evaluation the way the lambda calculus does it. Proof that it works:

| ?- phrase(sentence(S), [i,like,cheese]).

S = likes(i,cheese) 
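
We can also ask vp//1 alone what it produces, to see the lambda expression before the subject has been applied. This query assumes the final grammar above (_A is a fresh variable; its name will differ by system):

| ?- phrase(vp(VP), [like, cheese]).

VP = _A^likes(_A,cheese)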

This is cool. But it turns out most noun phrases are not atoms like i; they work out to be variables bound by quantifiers. The most obvious example is a sentence like “a student wrote a program”, which should not parse into wrote(student, program) but into something more complex…