[Israel.pm] Removing meta characters (^[[1m and ^[[0m) from a file
oron at actcom.co.il
Sun Jun 27 12:52:27 PDT 2010
First, a short answer to Guy -- 'col -b' won't help because it removes the
backspace sequences (that's what the 'b' stands for) used in preformatted
Unix/Linux manual pages (cat pages, FWIW). These sequences were used as a
neutral format that programs such as more/less later translate into
terminal-specific escape sequences.
On Sunday, 27 June 2010 12:08:09 Erez David wrote:
> s/\e[\m//g doesn't do the job, since the first [ is not a real character;
> it is a meta character...
Not exactly. The sequences presented are part of the ANSI standard of escape
sequences used to highlight text (bold/underline/etc.) on terminals.
So we are not talking about a special character, but about special *strings*.
All these sequences have a common form. The simplest format is:
  ESC [ <number> m
The ASCII code of ESC (decimal 27, as mentioned by someone else here) is
commonly written as ^[ (control+left bracket) because control+[ produces
exactly that ASCII code.
BTW: if the escape key on your keyboard is broken, you can use control+left
bracket as a substitute because it really is the same character.
The result of this is that when writing the sequence as text, it is often
presented as: ^[[2m
But note that the first bracket is part of "control+bracket" which simply
means the escape character, and the second bracket is the real '['
character which is part of the sequence (the length of the example above
is exactly 4 characters)
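To make the point concrete, here is a small Perl sketch (the variable names are just for illustration). It relies on "\e" being Perl's escape for the ESC character and on the usual rule that control+X yields the code of X minus 64:

```perl
use strict;
use warnings;

# "\e" is Perl's escape for the ESC character.
my $esc = "\e";
printf "ESC code:    %d\n", ord($esc);          # 27

# '[' is ASCII 91; control+[ yields 91 - 64 = 27, i.e. ESC.
printf "Ctrl+[ code: %d\n", ord('[') - 64;      # 27

# The text form ^[[2m is ESC, '[', '2', 'm': four characters.
my $seq = "\e[2m";
printf "length:      %d\n", length($seq);       # 4
```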
Now you should see the problem with your regex -- an unescaped '[' in a regex
opens a character class, so it is a special character for the regex engine
and must be escaped as '\[' to match a literal bracket.
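A quick way to see this, as a sketch: compile both forms at run time and check which one Perl accepts. The patterns are kept in strings (names are hypothetical) so that the bad one fails inside eval instead of stopping the whole script:

```perl
use strict;
use warnings;

my $unescaped = '\e[\m';   # the pattern from the quoted attempt
my $escaped   = '\e\[';    # ESC followed by a literal '['

# Returns 1 if the pattern compiles, 0 if the regex compiler rejects it.
sub compiles {
    my ($pattern) = @_;
    my $ok = eval { my $re = qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

# The unescaped '[' opens a character class that is never closed.
print "unescaped compiles: ", compiles($unescaped), "\n";
print "escaped compiles:   ", compiles($escaped),   "\n";
```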
A correct regex should be:
But this also has an error because it's greedy. Let's fix it:
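A sketch of the two steps, assuming the first attempt just escapes the bracket and uses a greedy '.*', and the fix switches to the non-greedy '.*?' (both patterns and the sample text are assumptions for illustration):

```perl
use strict;
use warnings;

my $text = "\e[1mWarning\e[0m: disk almost full";

# Greedy: '.*' runs up to the LAST 'm' in the string (the one in "almost"),
# so real text is deleted together with the escape sequences.
(my $greedy = $text) =~ s/\e\[.*m//g;

# Non-greedy: '.*?' stops at the first 'm' after each ESC [.
(my $lazy = $text) =~ s/\e\[.*?m//g;

print "greedy: '$greedy'\n";   # 'ost full'
print "lazy:   '$lazy'\n";     # 'Warning: disk almost full'
```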
This covers all the simple cases. However, ANSI allows for more complex
sequences that specify two numbers (e.g. two colors) for background
and foreground. E.g.:
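For instance, ESC [ 31 ; 42 m asks for a red foreground (31) on a green background (42), with the two numbers separated by a semicolon. The sketch below uses that sequence as an assumed example and shows that a pattern allowing only one number between '[' and 'm' misses it:

```perl
use strict;
use warnings;

# ESC [ 31 ; 42 m : two parameters separated by a semicolon.
my $seq = "\e[31;42m";

# Illustrative pattern that accepts exactly one number before the 'm'.
my $single = qr/\e\[\d+m/;

my $matched = ($seq =~ $single) ? "yes" : "no";
print "single-number pattern matches: $matched\n";   # no
```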
So let's try to generalize:
Hopefully, this will cover all cases (not tested).
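One possible generalization, as a sketch: allow any run of digits and semicolons between the '[' and the 'm'. The helper name strip_sgr and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: remove ESC [ <digits and semicolons> m sequences.
sub strip_sgr {
    my ($text) = @_;
    # [\d;]* accepts "1", "31;42", and also the empty form ESC [ m.
    $text =~ s/\e\[[\d;]*m//g;
    return $text;
}

my $sample = "\e[1mbold\e[0m, \e[31;42mcolored\e[m, plain";
print strip_sgr($sample), "\n";    # bold, colored, plain
```

This only targets sequences that end in 'm' (the highlighting ones discussed here); other escape sequences, such as cursor movement, are left alone.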
> On Sun, Jun 27, 2010 at 12:02 PM, Shlomi Fish <shlomif at iglu.org.il> wrote:
> > On Sunday 27 Jun 2010 11:27:02 Erez David wrote:
> > > Hi,
> > >
> > > I am reading a file which has some meta characters in it.
> > > These meta characters are: ^[[1m and ^[[0m, which are used to bold some
> > > text.
> > >
> > > I am looking for the best way to remove these meta characters from the
> > > file before I parse it. (Whether by regex or any other way...)
> > >
> > You can use a regex. Untested:
> > s/\e[\m//g
> > Regards,
> > Shlomi Fish
> > > Thanks
> > >
> > > Erez
> > --
> > -----------------------------------------------------------------
> > Shlomi Fish http://www.shlomifish.org/
> > Funny Anti-Terrorism Story - http://shlom.in/enemy
> > God considered inflicting XSLT as the tenth plague of Egypt, but then
> > decided against it because he thought it would be too evil.
> > Please reply to list if it's a mailing list post - http://shlom.in/reply .