[Israel.pm] Removing meta characters (^[[1m and ^[[0m) from a file

Oron Peled oron at actcom.co.il
Sun Jun 27 12:52:27 PDT 2010


Hi,

First a short answer to Guy -- 'col -b' won't help because it removes the
meta sequences of backspaces (that's what the 'b' stands for) used in
Unix/Linux manuals (cat pages FWIW). These sequences were used as a
neutral format that is later translated to terminal specific escape
sequences by programs such as more/less etc.
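For comparison, the backspace-based overstrike format that 'col -b' handles can be roughly approximated in Perl by deleting every character that is followed by a backspace. This is only a simplified sketch: the real col(1) tracks column positions, and the helper name strip_overstrike is hypothetical. It also shows why this approach cannot help here, because ANSI sequences contain no backspaces.

```perl
use strict;
use warnings;

# Rough approximation of `col -b` for nroff overstrike text: delete every
# character that is followed by a backspace, keeping the one written last.
sub strip_overstrike {
    my ($text) = @_;
    $text =~ s/.\x08//g;    # \x08 is the backspace character (^H)
    return $text;
}

my $bold      = "b\x08bo\x08ol\x08ld\x08d";   # "bold", overstruck
my $underline = "_\x08x";                     # underlined "x"
my $ansi      = "\e[1mbold\e[0m";             # ANSI bold, no backspaces

my @results = map { strip_overstrike($_) } $bold, $underline, $ansi;

print "$results[0]\n";    # prints "bold"
print "$results[1]\n";    # prints "x"
# The ANSI string contains no backspaces, so it comes back unchanged:
print(($results[2] eq $ansi) ? "unchanged\n" : "changed\n");
```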

On Sunday, 27 June 2010 12:08:09 Erez David wrote:
> s/\e[\[01]m//g  doesn't do the job, since the first [ is not a real character
> it is a meta character...

Not exactly. These sequences are part of the ANSI standard for escape
sequences used to highlight text (bold/underline/etc.) on terminals.
So we are not talking about a special character, but about special *strings*.

All these sequences have a common form. The simplest format is:

 <ESCAPE>[<numeric_code>m

The ASCII code of ESC (decimal 27, as mentioned by someone else here) is
commonly written as ^[ (control+left bracket), because pressing Control
together with '[' (ASCII 91) produces exactly this code: 91 - 64 = 27.

BTW: if the escape key on your keyboard is broken, you can use control+left
     bracket as a substitute, because it is really the same character.
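The arithmetic can be checked directly in Perl. This sketch assumes the traditional terminal convention that Control clears the upper bits of a key's code (equivalent to code & 0x1F, or subtracting 64 for '[' and its neighbours); the variable name is illustrative only.

```perl
use strict;
use warnings;

# On traditional terminals, holding Control clears the upper bits of the
# key's code (code & 0x1F). '[' is 91 (0x5B); 91 & 0x1F is 27 (0x1B) = ESC.
my $ctrl_bracket = ord('[') & 0x1F;

printf "'[' is %d\n", ord('[');              # 91
printf "Ctrl+[ gives %d\n", $ctrl_bracket;   # 27
printf "Perl's \\e is %d\n", ord("\e");      # 27
print "same character\n" if chr($ctrl_bracket) eq "\e";
```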

The result of this is that when writing the sequence as text, it is often
presented as: ^[[2m

But note that the first bracket is part of "control+bracket", which simply
means the escape character, while the second bracket is the real '['
character, which is part of the sequence (the example above is exactly
4 characters long).
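A quick way to confirm the 4-character length is to split the string into characters and print their codes: the escape character shows up as 27 and the real bracket as 91.

```perl
use strict;
use warnings;

my $seq = "\e[2m";    # ESC, '[', '2', 'm'

# Show each character's code: ESC is 27, the real '[' is 91.
my @codes = map { ord } split //, $seq;

print length($seq), "\n";         # prints 4
print join(' ', @codes), "\n";    # prints "27 91 50 109"
```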

Now you should see the problem with your regex -- an unescaped '[' in a
regex opens a character class, so it must be escaped to match a literal
bracket. A correct regex should be:
    s/\e\[\d+m//
But this only removes the first sequence on each line, because it lacks
the /g modifier. Let's fix it:
    s/\e\[\d+m//g
(Greediness is not an issue here: \d can never match the closing 'm'.)
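To see the difference between the patterns discussed in this thread, here is a small comparison on a sample line. The visible() helper is hypothetical and only prints ESC as ^[ so the output stays readable; it shows that the character-class pattern removes nothing, that without /g only the first sequence goes, and that the greedy and non-greedy quantifiers give the same result.

```perl
use strict;
use warnings;

# Show ESC as ^[ so the output is readable on a terminal.
sub visible { my ($s) = @_; $s =~ s/\e/^[/g; return $s }

my $line = "plain \e[1mbold\e[0m plain";

# Unescaped '[': the class [\[01] matches ONE of '[', '0', '1', so this
# looks for ESC + one char + 'm' and never matches "\e[1m".
(my $class_version = $line) =~ s/\e[\[01]m//g;

# Escaped '[' but no /g: only the first sequence is removed.
(my $no_g = $line) =~ s/\e\[\d+m//;

# Escaped '[' with /g: every sequence is removed.
(my $with_g = $line) =~ s/\e\[\d+m//g;

# Non-greedy variant: same result, since \d can never consume the 'm'.
(my $lazy = $line) =~ s/\e\[\d+?m//g;

print join("\n", map { visible($_) } $class_version, $no_g, $with_g, $lazy), "\n";
```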

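This covers all the simple cases. However, ANSI allows more complex
sequences that carry several numeric parameters separated by semicolons
(for example, foreground and background colors):
     ^[[32;45m
So let's try to generalize:
     s/\e\[[\d;]*m//g

This accepts any number of parameters, including none (^[[m is a plain
reset), so hopefully it covers all the text-attribute cases (not tested).
Other escape sequences, such as cursor movement, end in a different
letter and are not matched.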
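The sketch below compares a loose "any number of parameters" pattern, \e\[[\d;]*m, with the stricter two-number pattern \e\[\d+?(;\d+?)?m. It assumes only SGR sequences (those ending in 'm') need removing; strip_sgr is a hypothetical helper name. The looser pattern would also accept odd input such as ^[[;;m, which is harmless for this purpose.

```perl
use strict;
use warnings;

# Strip SGR ("Select Graphic Rendition") sequences: ESC '[' followed by
# zero or more numeric parameters separated by ';', ending in 'm'.
sub strip_sgr {
    my ($text) = @_;
    $text =~ s/\e\[[\d;]*m//g;
    return $text;
}

my @samples = (
    "\e[1mbold\e[0m",          # one parameter
    "\e[32;45mcolor\e[0m",     # two parameters
    "\e[1;32;45mmore\e[m",     # three parameters, and a bare reset
);

for my $s (@samples) {
    # The stricter two-number pattern, for comparison.
    (my $two = $s) =~ s/\e\[\d+?(;\d+?)?m//g;
    printf "%-8s two-number pattern left escapes: %s\n",
        strip_sgr($s), ($two =~ /\e/ ? 'yes' : 'no');
}
```

To clean a whole file before parsing it, the same substitution works as a one-liner (the file names are placeholders):

    perl -pe 's/\e\[[\d;]*m//g' input.txt > clean.txt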

Have fun.

> 
> On Sun, Jun 27, 2010 at 12:02 PM, Shlomi Fish <shlomif at iglu.org.il> wrote:
> 
> > On Sunday 27 Jun 2010 11:27:02 Erez David wrote:
> > > Hi,
> > >
> > > I am reading a file which has some meta characters in it.
> > > These meta characters are ^[[1m and ^[[0m, which are used to make some
> > > text bold.
> > >
> > > I am looking for the best way to remove these meta characters from the
> > > file before I parse it (whether by regex or any other way...).
> > >
> >
> > You can use a regex. Untested:
> >
> > s/\e[\[01]m//g
> >
> > Regards,
> >
> >        Shlomi Fish
> >
> > > Thanks
> > >
> > > Erez
> >
> 

