[Israel.pm] Case Study: Splitting a Hash into Two

Shlomi Fish shlomif at iglu.org.il
Wed Jul 14 11:48:30 PDT 2004


Today I hanged around FreeNode's #perl channel. Someon asked for a way to 
generate two hashes out of one, with one containing only the keys that start 
with underscore, and the other the rest. After some thinking and planning I 
came out with this code:

sub split_hash
    my $h = shift;
    # us is underscore not the United States.
    my (%us_hash, %non_us_hash);
    while (my ($k, $v) = each(%$h))
        (($k =~ /^_/o) ? \%us_hash : \%non_us_hash)->{$k} = $v;
    return (\%us_hash, \%non_us_hash);

(tested, BTW).

Notice the use of the ? : and the references. I first thought of using if and 
$us_hash{$k} = $v; and $non_us_hash{$k} = $v; but then thought it had too 
much duplicate code, and came with this solution.

Then I thought to myself if this can be done in Functional Programming as 
well. So I set out to write a functional function:

sub split_hash_functional
    my $h = shift;
    my @keys = keys(%$h);

    my $helper;

    $helper = sub {
        my ($rest_of_keys, @subsets) = (@_);
        if (scalar(@$rest_of_keys) == 0)
            return (@subsets);
            my (@subsets_rec) = $helper->([ @{$rest_of_keys}[1..
$#$rest_of_keys] ], @subsets);
            my $k = $rest_of_keys->[0];
            return (
                map { 
                        (($_ == (($k =~ /^_/o) ? 0 : 1)) ?
                            ( $k => $h->{$k}) :
                        ), %{$subsets_rec[$_]}

    return $helper->([@keys], +{}, +{});

(Note that whatever assignment exists there, is just after my and done once 
for every variable/function scope combination, so it doesn't count as a 
violation of the FP principles).

Now, this is somewhat more hideous, and less legible, and also longer. Maybe 
the in-expression indentation made it seems like this.

To prove that this algorithm was indeed functional I implemented it in 

type Key = String
type Value = String
type AssocPair = (Key,Value)
type AssocList = [AssocPair]

assoc_key (key,value) = key
assoc_value (key,value) = value

find_group :: Key -> Int
find_group "" = 1
find_group ('_':rest) = 0
find_group (a:rest) = 1

split_assoc_list :: AssocList -> [AssocList]
split_assoc_list [] = [[],[]]
split_assoc_list (a:rest) =
    assign_list verdict (split_assoc_list rest)
        verdict = find_group (assoc_key a)
        assign_list :: Int -> [AssocList] -> [AssocList]
        assign_list i [] = []
        assign_list 0 (b:bs) = ((a:b):bs)
        assign_list i (b:bs) = (b:assign_list (i-1) bs)

orig_hash :: AssocList
orig_hash =  [("key1" , "world"),("_key2" , "Yes"),
    ("foo" , "fooish"),
    ("_bar" , "baz"),
    ("_quux" , "Temperament"),
    ("lambda" , "Hello"),
    ("one" , "fish")]

result = (split_assoc_list orig_hash)

Most of the code is not really needed, but it served for debugging purposes. 
While writing this code I ran into a lot of strong typing gotchas, but I was 
able to implement it eventually.

Now, I realize that my Haskell code tends to be somewhat explicit, and by 
using the built-in functions one can sometimes make the job easier. I asked 
William Lee Irwin III, who I know to be a Haskell expert on the IRC (on a 
different channel and network), how to do it, and he gave me several ways. 
The most elegant is:

split_assoc_list = map (map snd)
    . groupBy (\ (x,_) (y,_) -> x == y)
    . sortBy (\ (x,_) (y,_) -> x `compare` y)
    . map (\s -> ("_" `isPrefixOf` (fst s), s))

It took me some time to understand it. Note that this function uses a sort 
routine so it is O(N*log(N)) instead of O(N). WLI wrote some routines that 
are linear time, but they are longer and the code is somewhat uglier. 

That's it for today folks. Now go back to work.


	Shlomi Fish

Shlomi Fish      shlomif at iglu.org.il
Homepage:        http://shlomif.il.eu.org/

Knuth is not God! It took him two days to build the Roman Empire.

More information about the Perl mailing list