[Israel.pm] patterns findings

sawyer x xsawyerx at gmail.com
Wed Mar 18 10:48:18 PDT 2009

We have all made suggestions here, and they are all good ones, but I
think that in the end they aren't the right way to approach this.

Basically, you have to define a set of patterns, or a common element
for these patterns, in some format.
There is a virtually endless (or at least seriously large) number of
possibilities here, and enumerating them could easily fill up your
memory, crash the program and shoot the kernel's young daughter along
the way. Simply put, you are trying to brute-force your way through
this, and it's not only risky, problematic and difficult, but also
rather unlikely that you'll be able to get what you need - in any
language, might I add.

This is just a matter of trying to solve the wrong problem. Your
problem is the MySQL bottleneck. I would start by focusing on that.
There are (as Gabor kindly suggested) proper tools to evaluate your
queries. There are free MySQL tools, and there is also private
consulting by MySQL experts (including, but not limited to, the MySQL
development team themselves - very cool people, by the way), since
query optimization is such a serious business.

If you figure "hell, I got Perl, I could try to check some stuff with
that", that's a good idea, but it should be implemented from a
different angle. Try to catch whole queries and classify them by time
frame. Check which queries were running when the services started
having speed problems. You could narrow it down pretty quickly by
taking those suspect queries, running them in a loop and timing them
to see which take the most RAM/CPU. You could also add a monitoring
service to MySQL to check the process load on it and see which
processes have been waiting for a long time (it will actually show
you which query is waiting and for how long).
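
To make the "catch whole queries and classify them by time frame" idea
a bit more concrete, here is a minimal sketch. It is only an
illustration under assumptions: the log format (a "YYYY-MM-DD HH:MM:SS"
timestamp followed by the query on one line) is hypothetical, the
sample lines are made up, and the sub names normalize_query,
count_by_minute and report are my own, not from any module. Adjust the
regexes to whatever your tracing log really looks like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed (hypothetical) log format, one query per line:
#   2009-03-18 10:48:18 SELECT name FROM users WHERE id = 42

# Reduce a query to its "shape" so that queries differing only in
# their literal values are counted together.
sub normalize_query {
    my ($sql) = @_;
    $sql =~ s/'(?:[^'\\]|\\.)*'/?/g;    # single-quoted string literals
    $sql =~ s/\b\d+(?:\.\d+)?\b/?/g;    # bare numeric literals
    $sql =~ s/\s+/ /g;                  # collapse runs of whitespace
    $sql =~ s/^ | $//g;                 # trim leading/trailing space
    return $sql;
}

# Count query shapes per one-minute time frame.
# Returns a hash ref: { "YYYY-MM-DD HH:MM" => { shape => count } }
sub count_by_minute {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # Skip anything that does not look like a timestamped query.
        next unless $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d):\d\d\s+(.+)$/;
        $count{$1}{ normalize_query($2) }++;
    }
    return \%count;
}

# Print each time frame, most frequent query shapes first.
sub report {
    my ($count) = @_;
    for my $minute (sort keys %$count) {
        my $shapes = $count->{$minute};
        print "$minute\n";
        for my $shape (sort { $shapes->{$b} <=> $shapes->{$a} || $a cmp $b }
                       keys %$shapes) {
            printf "  %6d  %s\n", $shapes->{$shape}, $shape;
        }
    }
}

# Made-up sample lines, only to show the calling convention.
my @sample_log = (
    "2009-03-18 10:48:18 SELECT name FROM users WHERE id = 42",
    "2009-03-18 10:48:40 SELECT name FROM users WHERE id = 7",
    "2009-03-18 10:49:02 UPDATE users SET visits = 3 WHERE id = 7",
);
report(count_by_minute(@sample_log));
```

Once you know which minutes were slow, the shapes with the highest
counts in those minutes are the ones worth timing individually.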

I don't think anyone here can provide you with exactly what you need
because not only does it not exist, it does not exist for a good
reason! At least that's the way I see it.

Good luck with it.

On Wed, Mar 18, 2009 at 7:20 PM, Evgeny <evgeny.zislis at gmail.com> wrote:
> Another approach might be to write a log analyzer using some kind of
> parser generator, it can create statistics regarding which patterns
> are the most common and are candidates for optimization.
> It might take a couple of days to write, but something simple can be
> done with Parse::RecDescent or Parse::Yapp. And it can also be used
> for more in-depth log analysis when enhanced.
> - evgeny
> On Wed, Mar 18, 2009 at 5:17 PM, Avishalom Shalit <avishalom at gmail.com> wrote:
>> if you want a generalized pattern matcher (simple generalizations may
>> include white spaces, complex ones may include "number" patterns or
>> "Last Name" patterns)
>> i do not think a module is there just for you,
>> what i would recommend is to start going over it manually, creating
>> your own list of patterns.
>> (in which case enumerate them, and count re occurrences in an array. )
>> and once you add something to the list, grep for the lines that don't
>> match any of your patterns to see what you have left.
>> i would guess that pretty quickly you would cover over 90% of your logs.
>> then it would get slower, perhaps never reaching 100%.
>> in one word -
>> manually
>> 2009/3/18 Gabor Szabo <szabgab at gmail.com>:
>>> 2009/3/18 Yossi Itzkovich <Yossi.Itzkovich at ecitele.com>:
>>>> Gabor,
>>>> The problem is that I don't know the patterns - I want the script to find them.
>>>> Let me explain the need:
>>>> We have a big tracing log of SQL queries to the DB. We want to analyze it and find out whether there are repeating sequences of the same queries, and optimize them (make one big query, or change the application code).
>>> Well, if you give a real example instead of an abstract one then you
>>> can get more help.
>>> I guess there are tools out there to analyze such a log if it is a standard log.
>>> So you could either show us a few lines of the real log file or tell
>>> us what tool
>>> produced it. Is it the logging message of DBI ?
>>> If you really have no clue of what a repeating string can look like
>>> then just go with
>>> if ($str =~ /(.+).*\1/) {
>>>    print $1;
>>> }
>>> or better yet tell the thing you want the repeating string to be at
>>> least 10 characters long:
>>> if ($str =~ /(.{10,}).*\1/) {
>>>    print $1;
>>> }
>>> Gabor
>>> _______________________________________________
>>> Perl mailing list
>>> Perl at perl.org.il
>>> http://mail.perl.org.il/mailman/listinfo/perl
>> --
>> -- vish