[Israel.pm] Parsing complex query syntax

Mikhael Goikhman migo at homemail.com
Wed Feb 4 13:32:13 PST 2009


On 04 Feb 2009 17:56:08 +0200, Chanan Berler wrote:
> 
> I need to extract the 'and ' 'or' and 'not' from the statement, still
> keep the order (since I need to rebuild the string into sql statement)
> Problem: the Service/ServiceGroup or Host/HosName can have the word
> 'and' / 'or' / 'not' inside
> 
> Str = "
>     HostGroup='generic-hostgroup' and Attempt='5' and
>     ServiceGroup='Sample Service View' or Service='WINDOWS CPU' and
>     State='Critical' and REGEX=(| and Mem: (.*?)%|,=,80)
> "
> 
> Does anyone has any idea ? maybe just the start..

I suggest using quotewords from Text::ParseWords (which is part of the
standard perl distribution). I also think you would do better to use
consistent single (or double) quotes for REGEX as well; the parentheses
look inconsistent here.
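
To illustrate why quotewords helps, here is a minimal sketch (the sample
string is made up for this illustration). With $keep set to 1 the quotes
are preserved, and a delimiter that appears inside quotes is not split on:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Made-up sample: the first value contains the word 'and' inside quotes.
my $str = q{Service='Disk and Memory' and State='Critical'};

# First argument is a delimiter regexp, second is $keep (1 = keep quotes).
# Only the unquoted ' and ' acts as a delimiter.
my @parts = quotewords('\s+and\s+', 1, $str);

print "$_\n" for @parts;
```

Note that an unquoted value such as REGEX=(| and Mem: ...) would still be
split at its inner ' and ', which is why quoting it consistently matters.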

In Podius::Query I solve pretty much the same problem.

The syntax there is a bit different: it has no 'or' operator (instead,
'|' expresses "or" within the names and values of sub-conditions), and
every operator has a readable English alternative. That is, the syntax
has 'and/&&', 'not/!' and a range of operators: binary 'NE/!=',
'LIKE/=~'; unary 'DEFINED/++', 'FALSE/-' and more. However, the idea and
implementation would be the same. (In your syntax, you should first
split by 'or', then by 'and', and finally extract the optional 'not'.)
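
The or/and/not order for your syntax could be sketched roughly like this.
It assumes that all values (including REGEX) are consistently quoted;
split_conditions and the sample string are hypothetical names made up for
this illustration, not code from any module:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: returns a list of OR-groups; each group is a list
# of [ $negated, $condition_text ] pairs that are ANDed together.
sub split_conditions {
    my ($str) = @_;
    $str =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my @or_groups;
    # 'and' binds tighter than 'or' (as in SQL), so split by 'or' first.
    for my $or_part (quotewords('(?i)\s+or\s+', 1, $str)) {
        my @and_parts;
        for my $cond (quotewords('(?i)\s+and\s+', 1, $or_part)) {
            # Finally strip an optional leading 'not'.
            my $not = ($cond =~ s/^not\s+//i) ? 1 : 0;
            push @and_parts, [ $not, $cond ];
        }
        push @or_groups, \@and_parts;
    }
    return @or_groups;
}

my @groups = split_conditions(
    q{HostGroup='generic-hostgroup' and ServiceGroup='Sample or View' }
  . q{or not Service='WINDOWS CPU' and State='Critical'}
);

for my $group (@groups) {
    print join(' AND ', map { ($_->[0] ? 'NOT ' : '') . $_->[1] } @$group), "\n";
}
```

Because the order of the groups and of the conditions inside each group
is preserved, the result can be walked in sequence to rebuild the SQL.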

So, to parse queries like:

  real_name = "Chanan Berler" && age > 20 && remarks|title =~ 'some regexp'

something like this code may be used:

    my @subconds = quotewords('(?i)\s*(?:&&|\band\b)\s*', 1, $cond);

    my @subcond_entries = map {
        s/^\s+//;
        s/\s+$//;
        my $not = s/^!\s*//;

        # loop over all operators to find $names (e.g. 'author') and $op (e.g. '=')
        # [much of syntax-specific code skipped]

        my @args = quotewords('\s*[|,;]\s*', 0, $_);
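        # %one_arg_ops is a hypothetical set of operators that take one argument
        die "operator '$op' requires exactly 1 argument\n"
            if $one_arg_ops{$op} && @args != 1;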

        [ $names, ($not ? $opposite_ops{$op} : $op), \@args ],
    } @subconds;
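
For reference, a self-contained variant of the idea that actually runs
might look like the following. This is only a sketch under simplifying
assumptions, not the Podius::Query code: the %opposite_ops table, the
$op_re regexp and the parse_query name are made up here, and only a few
binary symbolic operators are supported.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Assumed operator table for this sketch; maps each operator to its negation.
my %opposite_ops = (
    '='  => '!=', '!=' => '=',
    '>'  => '<=', '<=' => '>',
    '<'  => '>=', '>=' => '<',
    '=~' => '!~', '!~' => '=~',
);

# Longer operators come first, so that '>=' is not read as '>'.
my $op_re = join '|', map { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %opposite_ops;

sub parse_query {
    my ($cond) = @_;
    my @entries;
    for my $subcond (quotewords('(?i)\s*(?:&&|\band\b)\s*', 1, $cond)) {
        $subcond =~ s/^\s+//;
        $subcond =~ s/\s+$//;
        my $not = ($subcond =~ s/^!\s*//) ? 1 : 0;

        # Leftmost operator separates the name part from the value part.
        my ($names, $op, $value) = $subcond =~ /^(.+?)\s*($op_re)\s*(.+)$/s
            or die "cannot parse sub-condition: $subcond\n";

        my @names = quotewords('\s*\|\s*', 0, $names);
        my @args  = quotewords('\s*[|,;]\s*', 0, $value);

        push @entries, [ \@names, ($not ? $opposite_ops{$op} : $op), \@args ];
    }
    return @entries;
}

my @entries = parse_query(
    q{real_name = "Chanan Berler" and age > 20 && !remarks|title =~ 'some regexp'}
);

for my $e (@entries) {
    printf "%s %s %s\n",
        join('|', @{ $e->[0] }), $e->[1], join('|', @{ $e->[2] });
}
```

Each entry is [ \@names, $op, \@args ], with a leading '!' already folded
into the operator, so building the final statement is a simple loop.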

I hope you get the idea. The whole code for parsing this syntax is
web-cached (until podius.wox.org is offline for several days) here:

  http://www.google.com/search?q=cache:VieFCNaN8ooJ:podius.wox.org/mainline/perllib/Podius/Query.pm

Regards,
Mikhael.

-- 
perl -e 'print+chr(64+hex)for+split//,d9b815c07f9b8d1e'


