[Israel.pm] my new module on CPAN

Gabor Szabo gabor at szabgab.com
Mon Mar 5 22:16:38 PST 2012

Hi Pinkhas,

On Sat, Mar 3, 2012 at 8:31 PM, Pinkhas Nisanov <pinkhas at nisanov.com> wrote:
> Hi,
> It happened to me again: I found another algorithm
> that is implemented in the J language and not in Perl.
> This algorithm is based on matrix operations and
> I thought that Perl with its PDL could be a perfect
> platform to implement it.
> It's the "Markov Clustering Algorithm" (MCL), a scalable
> and fast algorithm for finding clusters in a graph.
> It's used in bioinformatics and data retrieval systems.
> Actually it could be good everywhere you need
> clustering.
> So I did it, there is a new module: Algorithm::MCL.
> http://search.cpan.org/~pinkhasn/Algorithm-MCL-0.003/lib/Algorithm/MCL.pm
> It should be very fast because it uses PDL.
> I still have not tested it on large graphs because
> I have no such data (maybe someone can help here).
> Please, check it. It's my first module on CPAN
> so any comments are welcome.

Congratulations :)

Not that I understand what it should do, but I installed it and tried it.
I guess people who know what the Markov Cluster Algorithm is will know
what this is.

Some minor comments:

What is unclear from the SYNOPSIS is what "MyClass" is in there
(and if you are using that already, I'd recommend MyClass->new
and not the indirect notation of new MyClass).
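
A minimal sketch of the direct notation. MyClass here is only a
stand-in class defined for the example, since the SYNOPSIS does not say
what the real one is:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "MyClass" mentioned in the SYNOPSIS.
package MyClass;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

package main;

# Direct notation: always parsed as a method call on the class MyClass.
my $obj = MyClass->new(name => 'vertex1');

# The indirect form "new MyClass(...)" forces the parser to guess whether
# "new" is a function or a method, which can go wrong in surprising ways.
print ref($obj), "\n";
```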

If you indent out the ##### in the synopsis you'll get two separate examples,
and then you'd need to add use Algorithm::MCL; to the second one as well.

Why do you need to use scalar references there?

My feeling is that the example should have the original data in an array of
pairs that would be passed to the addEdge method.
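
Something along these lines is what I have in mind. It is only a sketch:
Demo::Graph is a made-up stand-in that just records the edges, so the
code runs without the real module, and the vertex names are invented.
With the real module the loop would call its addEdge method instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in class: it only stores the edges it is given.
package Demo::Graph;

sub new {
    my ($class) = @_;
    return bless { edges => [] }, $class;
}

sub addEdge {
    my ($self, $v1, $v2) = @_;
    push @{ $self->{edges} }, [ $v1, $v2 ];
    return $self;
}

package main;

# The original data: an array of pairs, one pair per edge.
my @edges = (
    [ 'a', 'b' ],
    [ 'a', 'c' ],
    [ 'b', 'c' ],
    [ 'd', 'e' ],
);

# Pass each pair to addEdge.
my $graph = Demo::Graph->new;
$graph->addEdge(@$_) for @edges;

print scalar @{ $graph->{edges} }, " edges added\n";
```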

In the docs I'd link to PDL with L<PDL>,
and it seems the module needs a bit more documentation.

You can specify in the Makefile.PL where the public version control
system for this module is.
Having one helps with getting patches.
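
A sketch of how that could look in Makefile.PL, assuming a reasonably
recent ExtUtils::MakeMaker and a git repository. The repository URLs are
placeholders, and the rest of the real Makefile.PL is not shown:

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'Algorithm::MCL',
    VERSION_FROM => 'lib/Algorithm/MCL.pm',
    META_MERGE   => {
        'meta-spec' => { version => 2 },
        resources   => {
            repository => {
                type => 'git',
                # Placeholder URLs: replace with the real repository.
                url  => 'https://example.com/your-user/Algorithm-MCL.git',
                web  => 'https://example.com/your-user/Algorithm-MCL',
            },
        },
    },
);
```

The repository then ends up in the META files of the distribution, so
the CPAN web sites can link to it.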

I looked at the tests too:

ok(1/2 == $matrix1->at(1, 1), "stochastic 1");
could be better written as
is($matrix1->at(1, 1), 1/2, "stochastic 1");

ok(includeVertex($cluster1, $val4) > 0, "vertex is not in cluster - 1");
could be better written as
cmp_ok(includeVertex($cluster1, $val4), '>', 0, "vertex is not in cluster - 1");
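
The reason is the diagnostics: when ok() fails it can only say that the
test failed, while is() and cmp_ok() also print the value they got and
what they expected. A small self-contained sketch, with plain variables
standing in for $matrix1->at(1, 1) and includeVertex(...):

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Made-up values standing in for the results of the real calls.
my $cell  = 0.5;
my $count = 3;

# On failure is() prints both the "got" and the "expected" value.
is($cell, 1/2, "stochastic 1");

# On failure cmp_ok() prints both operands and the operator used.
cmp_ok($count, '>', 0, "count is positive");
```

Note that is() compares its arguments as strings, so for floating point
results that are computed rather than exact, cmp_ok with '==' or a
tolerance check might fit better.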

You could randomly generate a big data set and check that it does not crash,
does not leak memory, and finishes in a reasonable time (for some value of
reasonable), without actually checking correctness.
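
A rough sketch of the data generation and the timing part. The helper
names random_edges and timed_run are made up for this example, the sizes
are arbitrary, and the workload is a placeholder that would be replaced
by code feeding the edges to the module. The memory leak check is not
covered here.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Build $n_edges random pairs over vertices numbered 0 .. $n_vertices - 1.
sub random_edges {
    my ($n_vertices, $n_edges) = @_;
    die "need at least 2 vertices\n" if $n_vertices < 2;
    my @edges;
    while (@edges < $n_edges) {
        my $v1 = int(rand($n_vertices));
        my $v2 = int(rand($n_vertices));
        next if $v1 == $v2;    # skip self-loops
        push @edges, [ $v1, $v2 ];
    }
    return \@edges;
}

# Run $code on $edges and return the elapsed wall-clock seconds.
# If $code dies, the exception propagates and the script fails.
sub timed_run {
    my ($code, $edges) = @_;
    my $start = Time::HiRes::time();
    $code->($edges);
    return Time::HiRes::time() - $start;
}

my $edges = random_edges(1_000, 10_000);

# Placeholder workload: it only counts the edges. Replace the body with
# code that adds the edges to the module under test and runs it.
my $elapsed = timed_run(sub { my $n = @{ $_[0] }; return $n }, $edges);

printf "%d edges processed in %.3f seconds\n", scalar @$edges, $elapsed;
```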

Finally, I think I'd ask on the PDL mailing list. They probably have a
lot more insight into this.

I hope some of these will help!

