[Israel.pm] Removing meta characters (^[[1m and ^[[0m) from a file

Erez David introx at gmail.com
Sun Jun 27 21:02:36 PDT 2010


Thanks Oron, it works!

On Sun, Jun 27, 2010 at 10:52 PM, Oron Peled <oron at actcom.co.il> wrote:

> Hi,
>
> First a short answer to Guy -- 'col -b' won't help because it removes the
> meta sequences of backspaces (that's what the 'b' stands for) used in
> Unix/Linux manuals (cat pages FWIW). These sequences were used as a
> neutral format that is later translated to terminal specific escape
> sequences by programs such as more/less etc.
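
To make the backspace format concrete, here is a minimal sketch of what stripping it amounts to. It assumes the usual nroff overstrike convention (bold is a character, a backspace, then the same character again; underline is '_', a backspace, then the character). The sample string is made up for illustration.

```perl
use strict;
use warnings;

# nroff-style overstriking: bold is "X\bX", underline is "_\bX".
# (In a double-quoted Perl string, \b is a backspace, code 8.)
my $man = "N\bNA\bAM\bME\bE  _\bf_\bi_\bl_\be";

# Drop every character that is immediately followed by a backspace,
# together with the backspace itself. \x08 is used because \b means
# "word boundary" inside a regex.
(my $plain = $man) =~ s/.\x08//g;
print "$plain\n";
```

None of this touches ESC sequences, which is why 'col -b' leaves ^[[1m and ^[[0m in place.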
>
> On Sunday, 27 June 2010 12:08:09 Erez David wrote:
> > s/\e[\[01]m//g doesn't do the job, since the first [ is not a real
> > character; it is a meta character...
>
> Not exactly. The sequences shown are part of the ANSI standard for escape
> sequences, used to highlight text (bold/underline/etc.) on terminals.
> So we are not talking about a special character, but about special *strings*.
>
> All these sequences have a common form. The simplest format is:
>
>  <ESCAPE>[<numeric_code>m
>
> The ESC character (ASCII decimal 27, as mentioned by someone else here) is
> commonly written as ^[ (control+left bracket) because holding Control while
> typing '[' (ASCII 91) produces code 27, which is exactly this character.
>
> BTW: if the escape key on your keyboard is broken, you can use control+left
>     bracket as a substitute, because it really is the same character.
>
> The result of this is that when writing the sequence as text, it is often
> presented as: ^[[2m
>
> But note that the first bracket is part of "control+bracket" which simply
> means the escape character, and the second bracket is the real '['
> character which is part of the sequence (the length of the example above
> is exactly 4 characters)
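
A small sketch of the point above, assuming Perl's "\e" string escape for ESC. The variable names are only for illustration.

```perl
use strict;
use warnings;

# "^[" is caret notation for ESC: Control clears bit 0x40 of '[' (0x5B),
# which gives 0x1B, i.e. decimal 27.
my $esc = chr( ord('[') ^ 0x40 );
print "ESC is code ", ord($esc), "\n";    # 27
print "same as \\e: ", ( $esc eq "\e" ? "yes" : "no" ), "\n";

# The sequence usually displayed as ^[[2m is ESC, '[', '2', 'm'.
my $seq = "\e[2m";
print "length: ", length($seq), "\n";
print "second character: ", substr( $seq, 1, 1 ), "\n";
```

So the first bracket you see on screen is only half of the two-glyph display of one character, and the second one is a real '['.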
>
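> Now you should see the problem with your regex -- the '[' in a regex
> opens a character class, so it is a special character for the regex and
> must be escaped. A correct regex should be:
>    s/\e\[\d+m//g
> (The /g removes every sequence on a line, not only the first.)
> Greediness is harmless here: \d+ can only consume digits, so it stops
> at the 'm'. The non-greedy form matches exactly the same text:
>    s/\e\[\d+?m//g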
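
Here is a quick check of the escaped versus the unescaped bracket, as a sketch on a made-up sample line. I added /g so that every sequence on the line is removed.

```perl
use strict;
use warnings;

my $line = "plain \e[1mbold\e[0m plain";

# Escaping '[' makes it a literal bracket instead of opening a character class.
(my $clean = $line) =~ s/\e\[\d+m//g;
print "$clean\n";

# \d+ can only consume digits, so the non-greedy form removes the same text.
(my $lazy = $line) =~ s/\e\[\d+?m//g;
print( ( $lazy eq $clean ) ? "greedy and non-greedy agree\n" : "they differ\n" );

# The unescaped pattern means: ESC, ONE character out of '[', '0', '1', then 'm'.
# The real sequences have two characters between ESC and 'm', so nothing matches.
(my $unchanged = $line) =~ s/\e[\[01]m//g;
print( ( $unchanged eq $line ) ? "unescaped pattern matched nothing\n"
                               : "unescaped pattern matched\n" );
```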
>
> This covers all the simple cases. However, ANSI allows more complex
> sequences that specify two numbers (e.g. two colors) for foreground
> and background. E.g.:
>     ^[[32;45m
> So let's try to generalize:
>     s/\e\[\d+?(;\d+?)?m//g
>
> Hopefully, this will cover all cases (not tested).
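
A slightly more general sketch, in case the file contains sequences with more than two numbers or with none at all (a bare ^[[m also resets the attributes). It assumes only the highlighting sequences ending in 'm' need to go. The helper name strip_sgr and the sample strings are mine, not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the highlighting sequences, i.e.
# ESC [ <zero or more numbers separated by ';'> m
sub strip_sgr {
    my ($text) = @_;
    $text =~ s/\e\[[\d;]*m//g;
    return $text;
}

my @samples = (
    "\e[1mbold\e[0m",                        # one number
    "\e[32;45mgreen on magenta\e[m",         # two numbers, then a bare reset
    "\e[1;4;31mbold underlined red\e[0m",    # three numbers
);
print strip_sgr($_), "\n" for @samples;
```

For a whole file, the same substitution should work as a one-liner with perl -pe, redirecting the output to a new file.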
>
> Have fun.
>
> >
> > On Sun, Jun 27, 2010 at 12:02 PM, Shlomi Fish <shlomif at iglu.org.il> wrote:
> >
> > > On Sunday 27 Jun 2010 11:27:02 Erez David wrote:
> > > > Hi,
> > > >
> > > > I am reading a file which has some meta characters in it.
> > > > These meta characters are ^[[1m and ^[[0m, which are used to make
> > > > some text bold.
> > > >
> > > > I am looking for the best way to remove these meta characters from
> > > > the file before I parse it (whether by regex or any other way...).
> > > >
> > >
> > > You can use a regex. Untested:
> > >
> > > s/\e[\[01]m//g
> > >
> > > Regards,
> > >
> > >        Shlomi Fish
> > >
> > > > Thanks
> > > >
> > > > Erez
> > >
> > > --
> > > -----------------------------------------------------------------
> > > Shlomi Fish       http://www.shlomifish.org/
> > > Funny Anti-Terrorism Story - http://shlom.in/enemy
> > >
> > > God considered inflicting XSLT as the tenth plague of Egypt, but then
> > > decided against it because he thought it would be too evil.
> > >
> > > Please reply to list if it's a mailing list post - http://shlom.in/reply .
> > >
> >
>