Determine if a sentence is an inquiry

一向 2020-12-23 14:54

How can I detect if a search query is in the form of a question?

For example, a customer might search for "how do I track my order" (notice no question mark).

6 answers
  • 2020-12-23 15:16

    I took a stab at this. My goal was something lightweight that requires no additional libraries and lets each developer control a few necessary elements: which characters get padded with spaces, which negative contractions count only in the first word position, and which common question words are allowed anywhere in the sentence. I wrote two functions that, when passed a value from an Angular 6 HTML page, do a pretty good job for most of my cases.

    I don't include "don't" as a starter word because it can start a statement as often as a question. Don't you think?

    Angular HTML:

              <input matInput type="text" placeholder="{{Prompt}}" [(ngModel)]="value">
    

    .ts functions:

      // Returns true when the sentence starts with a question starter
      // or contains a question element anywhere.
      isQuestion(sentence: string = this.value): boolean {
        const qElements: string[] = ["who", "what", "when", "where", "why", "how", "?"];
        const qStarters: string[] = ["which", "won't", "can't", "isn't", "aren't", "is", "do", "does", "will", "can"];
        const padChars: string[] = ["?", "-", "/"];
        let temp = sentence.toLowerCase();
        // Surround each pad character with spaces so it becomes its own token.
        for (const c of padChars) {
          temp = this.padChar(temp, c);
        }
        // Split on runs of whitespace so stray spaces do not produce empty tokens.
        const tokens = temp.trim().split(/\s+/);
        if (qStarters.includes(tokens[0])) {
          return true;
        }
        return qElements.some((v) => tokens.includes(v));
      }

      // Inserts a space before and after every occurrence of myChar that lacks one.
      padChar(myString: string, myChar: string): string {
        let output = myString;
        let position = output.indexOf(myChar);
        // Use >= 0 so an occurrence at index 0 does not end the loop early.
        while (position >= 0) {
          // Pad on the left unless the character starts the string or follows a space.
          if (position > 0 && output.charAt(position - 1) !== " ") {
            output = output.slice(0, position) + " " + output.slice(position);
            position += 1; // the character moved one place to the right
          }
          // Pad on the right unless the character ends the string or precedes a space.
          if (position + 1 < output.length && output.charAt(position + 1) !== " ") {
            output = output.slice(0, position + 1) + " " + output.slice(position + 1);
          }
          position = output.indexOf(myChar, position + 1);
        }
        return output;
      }
    
  • 2020-12-23 15:17

    See also: How to find out if a sentence is a question (interrogative)?

    My answer from that question:

    In a syntactic parse of a question (obtained through a toolkit like NLTK), the correct structure will be in the form of:

    (SBARQ (WH+ (W+) ...)
           (SQ ...*
               (V+) ...*)
           (?))
    

    So, using any one of the available syntactic parsers, a tree with an SBARQ node that (optionally) has an embedded SQ indicates that the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how), and the SQ holds the inverted phrase.

    For example:

    (SBARQ 
      (WHNP 
        (WP What)) 
      (SQ 
        (VBZ is) 
        (NP 
          (DT the) 
          (NN question)))
      (. ?))
    

    Of course, a lot of preceding clauses will cause errors in the parse (which can be worked around), as will really poorly written questions. For example, the title of this post, "How to find out if a sentence is a question?", will have an SBARQ but not an SQ.
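As a rough sketch of the check described above, the following TypeScript reads a bracketed parse tree that some parser has already produced and reports whether it contains an SBARQ node and whether that node has an embedded SQ. The names `ParseNode`, `parseBracketed`, `findLabel`, and `inspectQuestionParse` are invented for this illustration and do not come from any parser library; obtaining the parse itself is assumed to happen elsewhere.

```typescript
// A node in a bracketed (Penn Treebank style) parse tree; leaf words are not stored.
interface ParseNode {
  label: string;
  children: ParseNode[];
}

// Turn a string such as "(SBARQ (WHNP (WP What)) ...)" into a ParseNode tree.
function parseBracketed(text: string): ParseNode {
  const tokens: string[] = text.match(/\(|\)|[^\s()]+/g) || [];
  let pos = 0;

  function readNode(): ParseNode {
    pos += 1; // skip "("
    const node: ParseNode = { label: tokens[pos], children: [] };
    pos += 1; // skip the label
    while (pos < tokens.length && tokens[pos] !== ")") {
      if (tokens[pos] === "(") {
        node.children.push(readNode());
      } else {
        pos += 1; // skip a leaf word
      }
    }
    pos += 1; // skip ")"
    return node;
  }

  if (tokens[0] !== "(") {
    throw new Error("expected a bracketed parse tree");
  }
  return readNode();
}

// Depth-first search for the first node carrying the given label.
function findLabel(node: ParseNode, label: string): ParseNode | undefined {
  if (node.label === label) {
    return node;
  }
  for (const child of node.children) {
    const hit = findLabel(child, label);
    if (hit) {
      return hit;
    }
  }
  return undefined;
}

// Report whether the tree has an SBARQ node and whether an SQ sits below it.
function inspectQuestionParse(tree: string): { hasSbarq: boolean; hasEmbeddedSq: boolean } {
  const sbarq = findLabel(parseBracketed(tree), "SBARQ");
  if (!sbarq) {
    return { hasSbarq: false, hasEmbeddedSq: false };
  }
  const hasEmbeddedSq = sbarq.children.some((c) => findLabel(c, "SQ") !== undefined);
  return { hasSbarq: true, hasEmbeddedSq };
}
```

Passing the "What is the question ?" tree shown earlier to `inspectQuestionParse` reports both an SBARQ and an embedded SQ, while a plain declarative tree rooted at S reports neither.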

  • 2020-12-23 15:19

    Finding out whether a sentence is a question is not the easiest task, because there are many ways people ask questions, and many of them do not follow grammar rules. It is therefore hard to find a good rule set for detection. In such situations, I would go for machine learning and train an algorithm on an annotated text corpus (creating a corpus and selecting a feature set can take some time). Machine-learning-based recognition should give you better recall than the rule-based approach. Here are step-by-step instructions:

    1. Manually create a training data set: get a text collection annotated with whether each item is a question, or create such a corpus on your own (it should contain more than 100 documents, and many of the questions should not be straightforward ones).
    2. Find the most important features: extract parts of speech and 5W1H words (what, which, ..., how), get the position of the verb in each sentence, and collect anything else that could be useful for recognizing a question.
    3. Create a feature vector for each sentence (you need both positive and negative examples) based on the extracted information, e.g.:

      | Has ? | A verb on second position | Has 5W1H | Is 5W1H on 1st position in sentence | ... | length of sentence | Is a question |

    4. Use the vectors to train a machine learning algorithm, e.g., maximum entropy or SVM (you can use Weka or KNIME).

    5. Use the trained algorithm for the question recognition.

    6. If needed (e.g., when new question examples come in), repeat the steps.
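Steps 2 and 3 could look roughly like the sketch below, which turns one sentence into a small numeric feature vector. It covers only the features that need no extra tooling: the verb-position feature is omitted because it requires a part-of-speech tagger, and training (steps 4 and 5) is left to whatever toolkit you choose. The names `QuestionFeatures`, `FIVE_W_ONE_H`, `extractFeatures`, and `toVector` are invented for this example, and the exact word list is an assumption.

```typescript
// Features taken from the example header above, minus the verb-position
// feature, which would need a part-of-speech tagger.
interface QuestionFeatures {
  hasQuestionMark: number; // 1 if the sentence contains "?"
  has5W1H: number;         // 1 if any 5W1H word occurs
  starts5W1H: number;      // 1 if the first word is a 5W1H word
  length: number;          // sentence length in words
}

// Assumed 5W1H list (including "which", as in step 2); adjust to your data.
const FIVE_W_ONE_H: string[] = ["who", "what", "when", "where", "why", "which", "how"];

function extractFeatures(sentence: string): QuestionFeatures {
  const lower = sentence.toLowerCase();
  // Words are runs of letters and apostrophes; contractions stay whole.
  const words: string[] = lower.match(/[a-z']+/g) || [];
  const isWh = (w: string): boolean => FIVE_W_ONE_H.indexOf(w) >= 0;
  return {
    hasQuestionMark: lower.indexOf("?") >= 0 ? 1 : 0,
    has5W1H: words.some(isWh) ? 1 : 0,
    starts5W1H: words.length > 0 && isWh(words[0]) ? 1 : 0,
    length: words.length,
  };
}

// Flatten the features into the numeric vector a learner would consume.
function toVector(f: QuestionFeatures): number[] {
  return [f.hasQuestionMark, f.has5W1H, f.starts5W1H, f.length];
}
```

For "how do I track my order", `toVector(extractFeatures(...))` yields `[0, 1, 1, 6]`. Each labelled sentence in the corpus would get such a vector plus its "is a question" label before being handed to the learner.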

  • 2020-12-23 15:31

    To identify start-words on question sentences, you should go through a large text corpus looking for sentences that end in a ?, and figure out the most frequent start-words you find in those.

    A few you missed that come to mind include WHICH, AM, ARE, WAS, WERE, MAY, MIGHT, CAN, COULD, WILL, SHALL, WOULD, SHOULD, HAS, HAVE, HAD, and DID. Perhaps also IF to go with WHEN. Also consider IN, AT, TO, FROM, and ON, plus maybe UNDER and OVER. All depends on the sort of query system you have and how much latitude in natural language queries you hope to provide your users with.

    Similarly, you should examine all your own queries that people have already made in the same light, finding which of their questions actually do end in a ? to help identify similar ones which do not.
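A minimal sketch of that counting step, assuming the corpus (or your own query log) is already available as an array of sentences: it keeps only entries that end in a question mark and tallies their first words. The function name `countQuestionStartWords` is made up for this example.

```typescript
// Count the first word of every sentence that ends in "?" and return
// [word, count] pairs sorted by descending count (ties broken alphabetically).
function countQuestionStartWords(sentences: string[]): Array<[string, number]> {
  // A prototype-free object, so words like "constructor" are counted safely.
  const counts: { [word: string]: number } = Object.create(null);
  for (const sentence of sentences) {
    const trimmed = sentence.trim();
    if (trimmed.charAt(trimmed.length - 1) !== "?") {
      continue; // only explicit questions count as evidence
    }
    const match = trimmed.toLowerCase().match(/[a-z']+/);
    if (match) {
      counts[match[0]] = (counts[match[0]] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((word): [string, number] => [word, counts[word]])
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1));
}
```

The most frequent words at the top of the result are candidates for the start-word list; the same list can then be applied to queries that lack the question mark.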

    That should find a lot of the interrogatives; are imperatives also a possibility?

    Depending on how fancy you want to get, you might consider using something like WordNet as a start for part-of-speech tagging. It's mostly for synonym sets, including hypernym, hyponym, holonym, and meronym information, but I believe it will have the other information you're looking for as well.

    Wikipedia has a couple of articles on question answering and natural language search engines. Both have references you might care to pursue. You might also glance through these PDF papers:

    • “Towards a Theory of Natural Language Interfaces to Databases”
    • “Natural language question answering: the view from here”
    • “Natural Language Question Answering Model Applied to a Document Retrieval System”.

    Lastly, the START Natural Language Question Answering System from MIT seems interesting.

  • 2020-12-23 15:33

    You are going to need a much more advanced form of linguistic analysis to pull this off. Need proof? Okay...

    Does are female deer.

    Where there is a will, there is a way.

    When the time comes, I'll jump!

    Why no. I do not have any pumpernickel.
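To see the problem concretely, the four sentences above can be run through a deliberately naive first-word check. The starter list and the names `NAIVE_STARTERS`, `naiveIsQuestion`, `COUNTEREXAMPLES`, and `falsePositives` are invented for this illustration.

```typescript
// Deliberately naive rule: a sentence "is a question" if its first word is on this list.
const NAIVE_STARTERS: string[] = ["who", "what", "when", "where", "why", "how", "does", "is", "can", "will"];

function naiveIsQuestion(sentence: string): boolean {
  const match = sentence.toLowerCase().match(/[a-z']+/);
  return match !== null && NAIVE_STARTERS.indexOf(match[0]) >= 0;
}

// The four statements from this answer.
const COUNTEREXAMPLES: string[] = [
  "Does are female deer.",
  "Where there is a will, there is a way.",
  "When the time comes, I'll jump!",
  "Why no. I do not have any pumpernickel.",
];

// Statements that the naive rule wrongly flags as questions.
const falsePositives: string[] = COUNTEREXAMPLES.filter((s) => naiveIsQuestion(s));
```

All four statements end up in `falsePositives`, because each one begins with a word that also commonly begins a question.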

  • 2020-12-23 15:34

    In support of JohnFx's answer, it gets even worse. The following are clearly questions:

    • Have you got any questions
    • Will this answer suffice
    • A question, what's that

    And then you'll find that users start entering the following kind of queries:

    • I'd like to know what questions are.

    Is that even a question? Syntactically, no, but it does deserve a reply that could easily be termed an answer. (These kinds of queries may be quite common, depending on your user population.)

    Bottom line: if you're not going to handle questions in a special, linguistically sophisticated way (such as constructing a direct answer using natural language generation), recognizing them may not even be interesting. Picking the right keywords from the query may be much more rewarding.
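As one simple way to illustrate the keyword idea, the sketch below drops stop words from a query and keeps the rest. The stop-word list is a tiny hypothetical one chosen for the examples in this thread, and `STOP_WORDS` and `extractKeywords` are invented names; a real system would use a fuller list or a proper ranking scheme.

```typescript
// Hypothetical stop-word list for illustration only; a real system needs a fuller one.
const STOP_WORDS: string[] = [
  "a", "an", "the", "i", "i'd", "my", "do", "does", "is", "are",
  "to", "how", "what", "like", "know",
];

// Keep every word of the query that is not a stop word, in original order.
function extractKeywords(query: string): string[] {
  const words: string[] = query.toLowerCase().match(/[a-z']+/g) || [];
  return words.filter((w) => STOP_WORDS.indexOf(w) < 0);
}
```

With this list, "how do I track my order" and "track my order" both reduce to `["track", "order"]`, so the search can treat them the same way without deciding whether either one is a question.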
