[Israel.pm] Case Study: Splitting a Hash into Two
Shlomi Fish
shlomif at iglu.org.il
Wed Jul 14 11:48:30 PDT 2004
Hi!
Today I hanged around FreeNode's #perl channel. Someon asked for a way to
generate two hashes out of one, with one containing only the keys that start
with underscore, and the other the rest. After some thinking and planning I
came out with this code:
<<<
sub split_hash
{
my $h = shift;
# us is underscore not the United States.
my (%us_hash, %non_us_hash);
while (my ($k, $v) = each(%$h))
{
(($k =~ /^_/o) ? \%us_hash : \%non_us_hash)->{$k} = $v;
}
return (\%us_hash, \%non_us_hash);
}
>>>
(tested, BTW).
Notice the use of the ? : and the references. I first thought of using if and
$us_hash{$k} = $v; and $non_us_hash{$k} = $v; but then thought it had too
much duplicate code, and came with this solution.
Then I thought to myself if this can be done in Functional Programming as
well. So I set out to write a functional function:
<<<
sub split_hash_functional
{
my $h = shift;
my @keys = keys(%$h);
my $helper;
$helper = sub {
my ($rest_of_keys, @subsets) = (@_);
if (scalar(@$rest_of_keys) == 0)
{
return (@subsets);
}
else
{
my (@subsets_rec) = $helper->([ @{$rest_of_keys}[1..
$#$rest_of_keys] ], @subsets);
my $k = $rest_of_keys->[0];
return (
map {
+{
(($_ == (($k =~ /^_/o) ? 0 : 1)) ?
( $k => $h->{$k}) :
()
), %{$subsets_rec[$_]}
}
}
(0..1)
);
}
};
return $helper->([@keys], +{}, +{});
}
>>>
(Note that whatever assignment exists there, is just after my and done once
for every variable/function scope combination, so it doesn't count as a
violation of the FP principles).
Now, this is somewhat more hideous, and less legible, and also longer. Maybe
the in-expression indentation made it seems like this.
To prove that this algorithm was indeed functional I implemented it in
Haskell:
<<<
type Key = String
type Value = String
type AssocPair = (Key,Value)
type AssocList = [AssocPair]
assoc_key (key,value) = key
assoc_value (key,value) = value
find_group :: Key -> Int
find_group "" = 1
find_group ('_':rest) = 0
find_group (a:rest) = 1
split_assoc_list :: AssocList -> [AssocList]
split_assoc_list [] = [[],[]]
split_assoc_list (a:rest) =
assign_list verdict (split_assoc_list rest)
where
verdict = find_group (assoc_key a)
assign_list :: Int -> [AssocList] -> [AssocList]
assign_list i [] = []
assign_list 0 (b:bs) = ((a:b):bs)
assign_list i (b:bs) = (b:assign_list (i-1) bs)
orig_hash :: AssocList
orig_hash = [("key1" , "world"),("_key2" , "Yes"),
("foo" , "fooish"),
("_bar" , "baz"),
("_quux" , "Temperament"),
("lambda" , "Hello"),
("one" , "fish")]
result = (split_assoc_list orig_hash)
>>>
Most of the code is not really needed, but it served for debugging purposes.
While writing this code I ran into a lot of strong typing gotchas, but I was
able to implement it eventually.
Now, I realize that my Haskell code tends to be somewhat explicit, and by
using the built-in functions one can sometimes make the job easier. I asked
William Lee Irwin III, who I know to be a Haskell expert on the IRC (on a
different channel and network), how to do it, and he gave me several ways.
The most elegant is:
<<<
split_assoc_list = map (map snd)
. groupBy (\ (x,_) (y,_) -> x == y)
. sortBy (\ (x,_) (y,_) -> x `compare` y)
. map (\s -> ("_" `isPrefixOf` (fst s), s))
>>>
It took me some time to understand it. Note that this function uses a sort
routine so it is O(N*log(N)) instead of O(N). WLI wrote some routines that
are linear time, but they are longer and the code is somewhat uglier.
That's it for today folks. Now go back to work.
Regards,
Shlomi Fish
--
---------------------------------------------------------------------
Shlomi Fish shlomif at iglu.org.il
Homepage: http://shlomif.il.eu.org/
Knuth is not God! It took him two days to build the Roman Empire.
More information about the Perl
mailing list