[Israel.pm] RegEx in HTML Character

Yona Shlomo yona at cs.technion.ac.il
Mon Jan 28 21:00:40 PST 2008

On Mon, 28 Jan 2008, Georges EL OJAIMI wrote:

> Hello,
> Yona Shlomo wrote:
>> How does the following help prevent HTML characters and SQL
>> injection into the database?

Can you answer this question? How does this transformation
of yours help prevent SQL injections?

>>> [b]bold[/b]
>>> [i]italic[/i]
>>> [u]underline[/u]
>>> [url=http://www.url.com]url[/url]
>>> I want to replace each tag on the fly by its real HTML tag while
>>> displaying it to the end user.
>>> Is there a way to replace all these tags by their equivalents? I am
>>> having a problem detecting the brackets []
> I will remove all escape characters except these ones. example:
> /<[//]{0,1}(B|b)[^><]*>/g by dynamically passing all the needed tags.
>> Can you guarantee that square brackets are only used as your
>> markup?
>> Is your [url=....] the equivalent of the HTML <a href=...> ?
> Yes, it is

You can try the following hack, but it is risky:

s,<url(="[^"]+")>([^<]+)</url>,<a href=$1>$2</a>,g

Note that the regular expression above does not try to balance
your markup's opening and closing tags, nor is it aware of
whitespace, quoting, or escaping issues.
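
For illustration only, here is a minimal sketch of a slightly more
careful approach, assuming the input uses the square-bracket tags
from the original question ([b], [i], [u], and [url=...]). The helper
names escape_html and bbcode_to_html are hypothetical, and limiting
links to http and https is my own assumption. It escapes the HTML
metacharacters first and only then rewrites the tags it recognizes:

```perl
use strict;
use warnings;

# Hypothetical helper: escape HTML metacharacters so that user input
# cannot introduce markup of its own.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: convert [b], [i], [u] and [url=...] to HTML.
sub bbcode_to_html {
    my ($text) = @_;
    my $html = escape_html($text);

    # Paired tags: \1 forces the closing tag to name the same letter
    # as the opening tag. Repeating until nothing changes lets nested
    # tags such as [b][i]...[/i][/b] be converted as well.
    1 while $html =~ s{\[([biu])\](.*?)\[/\1\]}{<$1>$2</$1>}gis;

    # Links: accept only http/https targets (an assumption), with no
    # whitespace or closing bracket inside the address.
    $html =~ s{\[url=(https?://[^\]\s]+)\](.*?)\[/url\]}{<a href="$1">$2</a>}gis;

    return $html;
}

print bbcode_to_html('[b]bold[/b] and [url=http://www.url.com]url[/url]'), "\n";
# prints: <b>bold</b> and <a href="http://www.url.com">url</a>
```

Unmatched or unknown tags are simply left as escaped text. This is
still a regex hack and not a real parser, and it does nothing about
SQL injection, which has to be handled separately where the query
is built.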

Shlomo Yona
yona at cs.technion.ac.il

