
Is this sentence a question?

One way to answer this is via part-of-speech (PoS) parsing. It is trivial for a Clojurian to turn to Stanford's CoreNLP, a Java library, and parse a sentence into its constituent parts. You get not only word- and phrase-level tags (this is a noun, this is a verb phrase), but also clause-level tags that tell you this is a direct wh-question (SBARQ) or this is a yes/no question (SQ).

CoreNLP will parse a sentence into an elaborate data structure where all this tag goodness is buried. The API undoubtedly allows us to unearth it. However, I haven't read the documentation, because I noticed that the string representation of that data structure is a list. And as a Lisper, I think to myself: say no more.

"(ROOT (SQ (VBZ Is) (NP (DT this) (NN sentence)) (NP (DT a) (NN question)) (. ?)))"

The Lisp reader will instantly recognize its favorite nugget and turn it into an S-expression.

(ROOT (SQ (VBZ Is) (NP (DT this) (NN sentence)) (NP (DT a) (NN question)) (. ?)))  

Now we have a data structure where each element is either a symbol or another list.
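
A minimal sketch of that reading step, assuming the parser's output has already been captured in a string (the name parse-string is made up for illustration). It uses clojure.edn/read-string rather than the core read-string; that is a choice of this sketch, made because the EDN reader only reads data and never evaluates it.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical name: the bracketed string printed for the parse.
(def parse-string
  "(ROOT (SQ (VBZ Is) (NP (DT this) (NN sentence)) (NP (DT a) (NN question)) (. ?)))")

;; Read the text into nested lists of symbols.
(def parse-tree (edn/read-string parse-string))

(first parse-tree)           ; the symbol ROOT
(first (second parse-tree))  ; the symbol SQ
(list? (second parse-tree))  ; true: the SQ clause is itself a list
```

One caveat: commas count as whitespace to the Clojure reader, so a punctuation node such as (, ,) would read as an empty list.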

By now, my brain is in recursion mode: give me the first element of the list. Is it a symbol? If yes, check whether it equals SBARQ or SQ. If it does, exit with true. If not, recur with the rest of the list. If the first element is not a symbol but a list, recur into that nested list. If the first element is neither a symbol nor a list, it must be nil, and we're done processing.

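;; parse-tree is the S-expression produced by the reader above
(loop [x parse-tree]
  (let [elm (first x)]
    (cond
      ;; a nested list: descend into it
      (list? elm) (recur elm)
      ;; a symbol: stop on a question tag, otherwise look at the rest
      (symbol? elm) (if (or (= elm 'SBARQ)
                            (= elm 'SQ))
                      true
                      (recur (rest x)))
      ;; nothing left to inspect
      :else false)))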
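
To try the loop at a REPL, it can be wrapped in a function. The sketch below does that; the names question?, any-question-clause?, yes-no and statement are illustrative, and the second parse string is hand-written rather than actual CoreNLP output. Note that the loop only ever descends into the first nested list it meets, which suffices for parses like the one above, where the clause tag sits directly under ROOT. The tree-seq variant is an alternative that checks the label of every node, in case a full traversal is wanted.

```clojure
(require '[clojure.edn :as edn])

(def question-tags '#{SBARQ SQ})

(defn question?
  "The loop from the post as a function: true when the walk meets SBARQ or SQ."
  [parse-tree]
  (loop [x parse-tree]
    (let [elm (first x)]
      (cond
        (list? elm)   (recur elm)
        (symbol? elm) (if (contains? question-tags elm)
                        true
                        (recur (rest x)))
        :else         false))))

(defn any-question-clause?
  "Alternative sketch: true when any node in the tree is labeled SBARQ or SQ."
  [parse-tree]
  (boolean
    (some #(and (sequential? %) (contains? question-tags (first %)))
          (tree-seq sequential? seq parse-tree))))

(def yes-no
  (edn/read-string
    "(ROOT (SQ (VBZ Is) (NP (DT this) (NN sentence)) (NP (DT a) (NN question)) (. ?)))"))

;; Hand-written declarative parse, for contrast.
(def statement
  (edn/read-string
    "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))"))

(question? yes-no)     ; true
(question? statement)  ; false
```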

Sure, a Perl hacker would have slapped a regex on the string representation and been done with it. And maybe I should have read the Javadoc for LabeledScoredTreeNode in the first place. But recursion carries an elegance that gets to me every single time.