Parse natural language

Backend · Unresolved · 3 answers · 742 views
粉色の甜心 2021-02-06 17:17

To start: I know this system will have flaws!

NOTE: I'm adding a few other language tags because I don't think this problem is specific to PHP.

3 answers
  • 2021-02-06 17:44

    Parsing natural language is non-trivial. If you want a true natural-language parser, I'd recommend using an existing project or library. Here's a web-based parser, based on the Stanford Parser. Or Wikipedia is a good jumping-off point.

    Having said that, if you're willing to restrict the syntax and the keywords involved, you might be able to simplify it. First you need to know what's important: you have 'things' (lights, fan) in 'places' (bedroom, kitchen) that need to go into a specific state ('on', 'off').

    I would get the string into an array of words, either using strtok() or just explode() on ' '.
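
    In JavaScript, which the other examples on this page use, String.prototype.split plays the role of PHP's explode(). The sketch below is minimal. It assumes you also want to lowercase the input and drop punctuation, and the tokenize name is made up for illustration:

```javascript
// Hypothetical helper: turn a typed or spoken command into an array of lowercase words.
function tokenize(command) {
  return command
    .toLowerCase()
    .replace(/[^a-z\s]/g, ' ')           // assumption: only ASCII letters matter; punctuation becomes a space
    .split(/\s+/)                        // like explode(' '), but tolerant of repeated spaces
    .filter((word) => word.length > 0);  // drop empty strings left by leading/trailing spaces
}

console.log(tokenize('Turn my kitchen lights on, please!'));
```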

    Now that you have an array of words, start at the end and go backwards looking for a 'state' (on or off). Keep going backwards looking for a 'thing', and finally a 'place'. If you hit another state, start again.

    Let me try and do that in pseudocode:

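    // inArray is the array of (lowercase) words.
    // The word lists are examples only; extend them for your own things and places.
    const isState = (word) => ['on', 'off'].includes(word);
    const isThing = (word) => ['lights', 'fan'].includes(word);
    const isPlace = (word) => ['bedroom', 'kitchen'].includes(word);

    function parseCommands(inArray) {
        const commands = [];
        let currentPlace = null;
        let currentThing = null;
        let currentState = null;
        for (let i = inArray.length - 1; i >= 0; i--) {
            const word = inArray[i];

            if (isState(word)) {
                // A state ends a clause (we're reading backwards), so start afresh.
                currentState = word;
                currentPlace = null;
                currentThing = null;
            } else if (currentState) {
                if (isThing(word)) {
                    currentThing = word;
                    currentPlace = null;
                } else if (currentThing) {
                    if (isPlace(word)) {
                        currentPlace = word;
                        // Apply currentState to currentThing in currentPlace
                        commands.push({ place: currentPlace, thing: currentThing, state: currentState });
                    }
                    // Skip words that are not a place, thing or state.
                }
                // Skip when we don't have a thing to go with our state.
            }
            // Skip when we don't have a current state and we haven't found a state.
        }
        return commands;
    }

    console.log(parseCommands('turn my kitchen lights on'.split(' ')));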
    

    Having written that, it's clear that it should have used a state machine and switch statements, which goes to show I should have designed it on paper first. If it gets any more complex, you'll want a state machine to implement the logic; the states would be 'lookingForState', 'lookingForThing', and so on.

    Also, you don't really need currentPlace as a variable, but I'll leave it in because it makes the logic clearer.
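
    Here is one possible sketch of the same backward scan, written with an explicit mode variable and a switch statement. The mode names 'lookingForState' and 'lookingForThing' come from the suggestion above. Everything else is an assumption for illustration: 'lookingForPlace', the word lists, the function name and the result objects. The words are also assumed to be lowercase already.

```javascript
// Example vocabularies; these word lists are assumptions, extend them as needed.
const STATES = new Set(['on', 'off']);
const THINGS = new Set(['lights', 'fan']);
const PLACES = new Set(['bedroom', 'kitchen']);

// Backward scan written as an explicit state machine (mode + switch).
function parseBackwards(words) {
  const commands = [];
  let mode = 'lookingForState';
  let state = null;
  let thing = null;

  for (let i = words.length - 1; i >= 0; i--) {
    const word = words[i];

    // A state word always starts a new clause, whatever mode we are in.
    if (STATES.has(word)) {
      state = word;
      thing = null;
      mode = 'lookingForThing';
      continue;
    }

    switch (mode) {
      case 'lookingForState':
        break; // ignore everything until the first 'on' or 'off'
      case 'lookingForThing':
        if (THINGS.has(word)) {
          thing = word;
          mode = 'lookingForPlace';
        }
        break;
      case 'lookingForPlace':
        if (THINGS.has(word)) {
          thing = word; // a newer thing replaces the old one, as in the pseudocode
        } else if (PLACES.has(word)) {
          // Stay in this mode so "kitchen and bedroom lights on" yields two commands.
          commands.push({ place: word, thing, state });
        }
        break;
    }
  }
  // We scanned backwards, so reverse to return commands in sentence order.
  return commands.reverse();
}

console.log(parseBackwards('turn my kitchen lights on and my bedroom fan off'.split(' ')));
```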

    EDIT

    If you want to support 'turn the lights in the bedroom on', you'll need to adjust the logic (you need to save the 'place' when you don't yet have a thing). If you also want to support 'turn on the lights in the bedroom', you'll need to go even further.

    Thinking about it, I wonder if you can just do:

    have a currentState variable and arrays currentPlaces and currentThings
    for each word:
        if it's a state:
            store it in currentState
        if it's a thing or place:
            add it to the appropriate array
            if currentState is set and there is content in currentPlaces and currentThings:
                apply currentState to all currentThings in all currentPlaces
    

    That's not quite there, but one of those implementations might give you a starting point.
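
    Here is one possible runnable reading of that second idea, offered only as a sketch. It scans forwards and collects things and places into arrays. It applies the state as soon as a state, at least one thing and at least one place have all been seen, checking after every keyword (including states). Clearing everything after each application is my own assumption, since the outline doesn't say when to reset. It accepts both 'turn on the lights in the bedroom' and 'turn the lights in the bedroom on'. As EDIT 2 explains, it still trips over sentences that join clauses with 'and'. The word lists and function name are hypothetical.

```javascript
// Example vocabularies (assumptions).
const STATES = new Set(['on', 'off']);
const THINGS = new Set(['lights', 'fan']);
const PLACES = new Set(['bedroom', 'kitchen']);

function parseForwards(words) {
  const commands = [];
  let currentState = null;
  let currentThings = [];
  let currentPlaces = [];

  for (const word of words) {
    if (STATES.has(word)) currentState = word;
    else if (THINGS.has(word)) currentThings.push(word);
    else if (PLACES.has(word)) currentPlaces.push(word);
    else continue; // filler word such as 'turn', 'the', 'in'

    // Once all three parts are known, apply the state to every thing in every place.
    if (currentState && currentThings.length > 0 && currentPlaces.length > 0) {
      for (const place of currentPlaces) {
        for (const thing of currentThings) {
          commands.push({ place, thing, state: currentState });
        }
      }
      // Assumption: start afresh for the next clause.
      currentState = null;
      currentThings = [];
      currentPlaces = [];
    }
  }
  return commands;
}

console.log(parseForwards('turn on the lights in the bedroom'.split(' ')));
console.log(parseForwards('turn the lights in the bedroom on'.split(' ')));
```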

    EDIT 2

    OK, I tested it out and there are a few issues due to the way English is structured. The problem is that if you want to support both 'Turn on ...' and 'Turn ... on', you need my second pseudocode, but that doesn't work easily because of the 'and's in the sentence. For example:

    Turn my kitchen lights on and my bedroom and living room lights off.

    The first 'and' joins two statements; the second 'and' joins two places. The correct way to handle this is to diagram the sentence to work out what applies to what.

    There are two quick options. First, you could insist on a different word or phrase to join two commands:

    Turn my kitchen lights on then my bedroom and living room lights off.
    Turn my kitchen lights on and also my bedroom and living room lights off.
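
    With a dedicated joining phrase, the sentence can be split into clauses before any keyword matching happens. The sketch below is minimal and assumes 'then' and 'and also' are the only separators. The separator list and function name are hypothetical. Each resulting clause can then be passed to whichever word-level parser you use.

```javascript
// Hypothetical separator list: only these phrases join two separate commands,
// so a plain 'and' is free to join places or things inside one command.
const SEPARATOR = /\s+(?:and\s+also|then)\s+/i;

function splitCommands(sentence) {
  return sentence
    .replace(/[.!?]+$/, '') // drop trailing sentence punctuation
    .split(SEPARATOR)
    .map((clause) => clause.trim())
    .filter((clause) => clause.length > 0);
}

console.log(splitCommands('Turn my kitchen lights on then my bedroom and living room lights off.'));
console.log(splitCommands('Turn my kitchen lights on and also my bedroom and living room lights off.'));
```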

    Alternatively, and this is probably easier, you can insist on only accepting commands of the form 'Turn ... on/off'. This works with my first pseudocode above.

    JavaScript example of the first pseudocode.

    Note: you'll probably need to heavily pre-process the string if there's any chance of punctuation, etc. You might also want to look at replacing 'living room' (and similar two-word phrases) with 'livingroom', rather than just matching one word and hoping for the best as I'm doing. Also, the code could be simplified a bit, but I wanted to keep it close to the pseudocode example.
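
    Here is a minimal pre-processing sketch along those lines. It lowercases the input, turns punctuation into spaces, and collapses known two-word phrases into single tokens before splitting. The phrase table and function name are assumptions; add whatever multi-word rooms or devices you need.

```javascript
// Hypothetical phrase table: multi-word names collapsed into single tokens.
const PHRASES = [
  ['living room', 'livingroom'],
  ['ceiling fan', 'ceilingfan'],
];

function preprocess(command) {
  // Lowercase, and turn anything that isn't a letter (punctuation, digits, hyphens) into a space.
  let text = command.toLowerCase().replace(/[^a-z\s]/g, ' ');
  // Normalise whitespace so "living   room" still matches the phrase table.
  text = text.replace(/\s+/g, ' ').trim();
  for (const [phrase, token] of PHRASES) {
    // Safe to build a RegExp here because phrases contain only letters and spaces.
    text = text.replace(new RegExp(`\\b${phrase}\\b`, 'g'), token);
  }
  return text === '' ? [] : text.split(' ');
}

console.log(preprocess('Turn ON the Living-Room lights!'));
```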

    EDIT 3

    New JavaScript example

    This handles some extra sentences and is cleaned up a bit. It still relies on the 'state' coming at the end of each clause, because that's what it uses as the trigger to apply the actions (this version could probably read forwards instead of backwards). Also, it will not handle something like:

    Turn my kitchen fan and my bedroom lights on and living room lights off.
    

    You'd have to do something more complex to understand the relationship between 'kitchen' and 'fan', and between 'bedroom' and 'lights'.
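
    One rough way to approximate that relationship is a heuristic, not real parsing. Read forwards, treat a place word as modifying the next thing word, queue up each (place, thing) pair, and flush the queue when a state word arrives. This is a sketch under several assumptions: 'living room' has already been collapsed to 'livingroom', the word lists and function name are made up, and, like the version above, the state still has to come at the end of each clause. A thing with no preceding place gets a null place.

```javascript
// Example vocabularies (assumptions).
const STATES = new Set(['on', 'off']);
const THINGS = new Set(['lights', 'fan']);
const PLACES = new Set(['bedroom', 'kitchen', 'livingroom']);

function parseNounPhrases(words) {
  const commands = [];
  let pending = [];        // (place, thing) pairs waiting for a state
  let pendingPlace = null; // most recent place not yet attached to a thing

  for (const word of words) {
    if (PLACES.has(word)) {
      pendingPlace = word;
    } else if (THINGS.has(word)) {
      // Heuristic: a place attaches to the thing that follows it ("kitchen fan").
      pending.push({ place: pendingPlace, thing: word });
      pendingPlace = null;
    } else if (STATES.has(word)) {
      // The state applies to every pair collected since the previous state.
      for (const pair of pending) commands.push({ ...pair, state: word });
      pending = [];
      pendingPlace = null;
    }
  }
  return commands;
}

const words = 'turn my kitchen fan and my bedroom lights on and livingroom lights off'.split(' ');
console.log(parseNounPhrases(words));
```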

    Some combination of those techniques is probably enough to do something fairly impressive, as long as whoever is entering or speaking the commands follows some basic rules.

  • 2021-02-06 17:47

    That's certainly not the most efficient solution, but here's one. You can definitely improve it, for example by caching the regular expressions, but you get the idea. The last item in every sub-array is the operation.


    var s = 'Turn my kitchen lights on and my bedroom lights on and living room lights off and my test and another test off',
        r = s.replace(/^Turn|\s*my/g, '').match(/.+? (on|off)/g).map(function(item) {
            var items = item.trim().replace(/^and\s*/, '').split(/\s*and\s*/),
                last = items.pop().split(' '),
                op = last.pop();
            return items.concat([last.join(' '), op]);
        });
    
    console.log(r);
    

    Mind explaining the logic you used? I'm reading the code, but I was curious whether you could explain it in words.

    The logic is quite simple actually, perhaps too simple:

    var s = 'Turn my kitchen lights on and my bedroom lights on and living room lights off and my test and another test off',
        r = s
            .replace(/^Turn|\s*my/g, '') //remove noisy words
            .match(/.+? (on|off)/g) //capture all groups of [some things][on|off]
            //for each of those groups, generate a new array from the returned results
            .map(function(item) {
                var items = item.trim()
                        .replace(/^and\s*/, '') //remove and[space] at the beginning of string
                        //split on and to get all things, for instance if we have
                        //test and another test off, we want ['test', 'another test off']
                        .split(/\s*and\s*/),
                    //split the last item on spaces, with previous example we would get
                    //['another', 'test', 'off']
                    last = items.pop().split(' '),
                    op = last.pop(); //on/off will always be the last item in the array, pop it
                //items now contains ['test'], concatenate with the array passed as argument
                return items.concat(
                    [
                        //last is ['another', 'test'], rejoin it together to give 'another test'
                        last.join(' '),
                        op //this is the operation
                    ]
                );
            });
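
    This follows up on the note about caching regular expressions. It is a sketch of the same approach with the patterns compiled once and given word boundaries. That way, words that merely contain 'and' or start with 'on'/'off' (such as 'candle' or 'office') aren't split apart or mistaken for the operation. The function name is hypothetical. The sketch still assumes the same sentence shape as the example above, and it lowercases the input first.

```javascript
// Regexes compiled once and reused (the "caching" mentioned above).
const NOISE = /^turn\b|\bmy\b/g;          // noise words to strip
const CLAUSE = /.+?\b(on|off)\b/g;        // [some things][on|off], whole words only
const LEADING_AND = /^and\s+/;            // 'and ' at the start of a clause
const AND_SPLIT = /\s+and\s+/;            // 'and' as a separate word only

function parseCommands(sentence) {
  const cleaned = sentence.toLowerCase().replace(NOISE, '').replace(/\s+/g, ' ').trim();
  const clauses = cleaned.match(CLAUSE) || [];
  return clauses.map((clause) => {
    const items = clause.trim().replace(LEADING_AND, '').split(AND_SPLIT);
    const last = items.pop().split(' ');
    const op = last.pop(); // 'on' or 'off' is always the last word
    return items.concat([last.join(' '), op]);
  });
}

console.log(parseCommands('Turn my kitchen lights on and my bedroom lights on and living room lights off and my test and another test off'));
console.log(parseCommands('Turn my candle lights on'));
```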
    

    EDIT: When I posted this answer, I hadn't realized how complex and flexible you needed this to be. The solution I provided only works for sentences structured like my example, with identifiable noise words and a specific command order. For anything more complex, you will have no choice but to build a parser like @SpaceDog suggested. I will try to come up with something as soon as I have time.

  • 2021-02-06 17:55

    I have been working on parsing menus and recipes (not finished yet), and this is my approach:

    • find the sentence separators (I use "and" as well as others)
    • parse each sentence to find the keywords you need (lights/bulbs/etc., on/off)
    • if you have a limited set of places (kitchen, bathroom, etc.):
      • search for those keywords and remove the others
    • else:
      • remove the extra words that people might use (bright, colorful, etc.)
    • store the result in an array, something like:
      • what
      • where
    • if you do not have one of the fields, leave it blank
    • for each result, check what you have, and if a field is blank, fill it in from the previous parse

    E.g.: Turn the lights on in the bedroom and in the kitchen

    • 1:
      • turn the lights on in the bedroom
      • what: lights on
      • where: bedroom
    • 2:
      • in the kitchen
      • what:
      • where: kitchen

    what_2 is empty, so it is filled from the previous parse: what_2 = lights on

    Keep in mind that sometimes you need to fill the array from the next result instead (depending on how the sentence is structured, though this is rare). I add a "+" or "-" marker so I know whether to look forward or backward for the missing parts while parsing.
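
    This is a rough sketch of the "fill the blank from the previous parse" step. It assumes each clause has already been reduced to a { what, where } record, with an empty string for anything missing; the record shape and function name are illustrative. Only the backward fill is shown. The forward ("+") case would be a second pass running in the other direction.

```javascript
// Each record is one clause; an empty string means "not mentioned in this clause".
function fillFromPrevious(records) {
  const filled = [];
  for (const record of records) {
    const previous = filled[filled.length - 1];
    filled.push({
      what: record.what || (previous ? previous.what : ''),
      where: record.where || (previous ? previous.where : ''),
    });
  }
  return filled;
}

// "Turn the lights on in the bedroom and in the kitchen"
const parsed = [
  { what: 'lights on', where: 'bedroom' },
  { what: '', where: 'kitchen' },
];
console.log(fillFromPrevious(parsed));
```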
