Best practices on handling user-generated HTML

Ari Najarian
Hi all,

  This is a fairly common question on LassoTalk, and I've read through the previous threads that address my question. However, I'm hoping to solicit some insight on the best approach to processing HTML input to mitigate script injection attacks, because I still don't have a definitive answer.

  In my opinion, regular expressions are a dumpster fire, and will never be able to effectively weed out all the different string permutations that could conceal malicious code. So this approach doesn't seem feasible for me, because it's security theatre whack-a-mole.

  A more effective approach might be XML tree traversal, which would allow me to specify a whitelist of tags and attributes on the first pass, perhaps combining this with regular expressions on the second pass to validate the remaining attributes. This seems like a better approach, but the first rule of programming is "don't", so I'm wondering if anybody else out there has already written this code before. I'd be shocked if I'm the first, and if so, then I'd be happy to share what I write.
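  As a rough illustration of the whitelist idea described above, here is a minimal Python sketch. It is not Lasso and not a vetted sanitizer. It uses the standard library's html.parser instead of a strict XML parser, on the assumption that user-submitted HTML is rarely well-formed XML. The ALLOWED table, the SAFE_SCHEMES tuple and the sanitize() name are all hypothetical choices made for this example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names permitted on that tag.
ALLOWED = {
    "p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
    "ul": set(), "ol": set(), "li": set(), "br": set(), "a": {"href"},
}
VOID = {"br"}                                   # tags that take no closing tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}              # drop these tags and their text


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0                     # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return                              # unknown tag: drop it, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Second pass on the remaining attributes: only known URL schemes.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED and tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))   # text is re-escaped


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

  The output is rebuilt from parsed events, so nothing from the input is copied through verbatim. The sketch does not balance tags or handle every parser quirk, so it is a starting point rather than a guarantee.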

  But is this even the best approach? Maybe instead of even allowing users to submit HTML, I configure my rich text editor to use a different markup format, like Markdown. That way, I mitigate the risk of malicious HTML, since whatever input the user supplies would be run through a parser that then generates HTML. A quick search revealed that Jono has already started a Markdown parser at https://github.com/iamjono/markdown , so I wouldn't be violating the first rule of programming. This also enforces a limited subset of HTML tags, which might be more predictable when it's time to render into a template.

  Is there a fourth approach that's better? Is there a community consensus about which approach is the most sensible? Are there tools, libraries or Lasso tags I don't know about that solve this problem? It seems like Lasso's gigantic standard library unfortunately lacks an HTML sanitization method. I'm basically looking for something like CodeIgniter's security->xss_clean($string) method, but without having to debase myself by using PHP.

  Any and all comments, pointers and insight would be appreciated!

  Ari.

#############################################################

This message is sent to you because you are subscribed to
  the mailing list Lasso [hidden email]
Official list archives available at http://www.lassotalk.com
To unsubscribe, E-mail to: <[hidden email]>
Send administrative queries to  <[hidden email]>

Re: Best practices on handling user-generated HTML

Brad Lindsay
Ari,

Be careful about using Markdown since HTML is completely valid inside
of Markdown and therefore won’t be escaped.

I’ve used two approaches when I’ve needed to do this:

1. A regex-based whitelist of tags, none of which may carry attributes

2. I’ve also used CKEditor http://ckeditor.com
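
One possible reading of the first approach, sketched in Python rather than Lasso: escape everything, then restore only whitelisted tags that appear in their bare, attribute-free form. This is an assumption about how such a filter could work, not a description of Brad's actual code. ALLOWED_TAGS and allow_bare_tags() are hypothetical names.

```python
import html
import re

# Hypothetical whitelist of tags accepted only without attributes.
ALLOWED_TAGS = ("b", "i", "em", "strong", "p", "br", "ul", "ol", "li")

# Matches an escaped bare tag such as &lt;b&gt; or &lt;/b&gt;.
_BARE_TAG = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED_TAGS), re.IGNORECASE)


def allow_bare_tags(user_input):
    # Step 1: escape everything, so no markup survives by default.
    escaped = html.escape(user_input, quote=True)
    # Step 2: turn exact matches for whitelisted bare tags back into real tags.
    return _BARE_TAG.sub(
        lambda m: "<%s%s>" % (m.group(1), m.group(2).lower()), escaped
    )


print(allow_bare_tags('<b>bold</b> <script>alert(1)</script>'))
print(allow_bare_tags('<b onclick="x()">hi</b>'))
```

Because the regex only ever un-escapes exact matches, a tag with any attribute stays escaped and is displayed as text. Tags are not balanced, so a stray closing tag can remain in the output.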


HTH,
Brad



On June 29, 2016 at 2:09:31 PM, Ari Najarian ([hidden email]) wrote:

> [...]

Re: Best practices on handling user-generated HTML

Marc Vos-3
In reply to this post by Ari Najarian
There are two tags that strip HTML:

http://www.lassosoft.com/tagswap/detail/mv_stripHTML

http://www.lassosoft.com/tagswap/detail/string_removehtml

Maybe these give you a starting point.
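
For comparison, stripping every tag and keeping only the text could look roughly like the Python sketch below. It is not a port of either tag linked above, whose exact behavior is not shown in this thread. strip_html() is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden = 0          # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.hidden += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.hidden:
            self.hidden -= 1

    def handle_data(self, data):
        if not self.hidden:
            self.parts.append(data)


def strip_html(text):
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(strip_html('<p>Hello <b>world</b><script>alert(1)</script>!</p>'))
```

The result is plain text with entities decoded, so it still has to be HTML-escaped when it is written into a page.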

- -
Marc


On 29-06-2016 20:09, "Ari Najarian" <[hidden email] on behalf of [hidden email]> wrote:

> [...]

Re: Best practices on handling user-generated HTML

Jolle Carlestam-2
In reply to this post by Ari Najarian
I doubt there’s a safe method to allow any user to submit HTML and clean it up. CodeIgniter's security->xss_clean, which you refer to, is definitely not it; it seems to have a very bad reputation.

My approach would be one of two:
1/ Trust the user. This is what I do in my present setup: only authorized users with certain privileges are allowed to submit HTML. I use CKEditor for it.
2/ Use another markup language. In the Lasso 8 days I enhanced something that Fletcher came up with (LassoWiki): an alternate markup that was turned into HTML on the server.
The announcement is here: http://www.lassotalk.com/Ann-WYSIWYG-editor-for-LassoWiki-markup.lasso?234922

I suppose I can dig up that code, but it is Lasso 8 and I have no plans to port it to Lasso 9 for the time being.

HDB
Jolle


Re: Best practices on handling user-generated HTML

Jolle Carlestam-2
On 30 June 2016 at 09:45, Jolle Carlestam <[hidden email]> wrote:
>
> 2/ Use another markup language. In the Lasso 8 days I enhanced something that Fletcher came up with (LassoWiki) that was an alternate markup turned into HTML on the server.
> The announcement is here: http://www.lassotalk.com/Ann-WYSIWYG-editor-for-LassoWiki-markup.lasso?234922
>
> I suppose I can dig up that code. But, it is Lasso 8 and I have no plans on making it Lasso 9 for the time being.

I did find it. Looks like there’s some thinking in there that’s still valid. On the client side it was an extension of FCKEditor. Today’s replacement would be CKEditor, but to do that both the client-side code and the server-side code would need to be rewritten.

CKEditor has a BBCode plugin that could be used, either as is or as inspiration.
http://ckeditor.com/addon/bbcode
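
To make the alternate-markup idea concrete, here is a minimal server-side sketch in Python (not Lasso, and unrelated to the LassoWiki code). It escapes the whole submission first and then converts a tiny, hypothetical BBCode subset into HTML. The tag table, the URL pattern and the bbcode_to_html() name are assumptions for illustration only.

```python
import html
import re

# Hypothetical subset: BBCode tag -> HTML element.
_SIMPLE = {"b": "strong", "i": "em", "u": "u"}

# Only http(s) URLs without whitespace, brackets, quotes or angle brackets.
_URL = re.compile(
    r"\[url=(https?://[^\s\[\]\"<>]+)\](.*?)\[/url\]",
    re.IGNORECASE | re.DOTALL,
)


def bbcode_to_html(source):
    # Escape first: any HTML the user typed becomes inert text.
    out = html.escape(source, quote=True)
    # Then generate the only HTML that can appear in the output.
    for bb, tag in _SIMPLE.items():
        pattern = re.compile(
            r"\[%s\](.*?)\[/%s\]" % (bb, bb), re.IGNORECASE | re.DOTALL
        )
        out = pattern.sub(r"<%s>\1</%s>" % (tag, tag), out)
    return _URL.sub(r'<a href="\1" rel="nofollow">\2</a>', out)


print(bbcode_to_html('[b]hi[/b] <script>alert(1)</script>'))
print(bbcode_to_html('[url=https://example.com]site[/url]'))
```

Since every tag in the output is produced by the converter itself, the set of possible tags and attributes is fixed in advance. Misnested BBCode can still yield misnested HTML, which this sketch does not repair.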

The bottom-line argument for not using HTML: if you can’t trust the user, then there’s no way you can trust HTML supplied by the user, and there’s no safe parsing method available to change that.

HDB
Jolle


Re: Best practices on handling user-generated HTML

jasonhuck
This can also be a useful part of your toolkit for dealing with HTML:

http://www.html-tidy.org/



On Thu, Jun 30, 2016 at 5:35 AM, Jolle Carlestam <[hidden email]>
wrote:

> [...]

Re: Best practices on handling user-generated HTML

Bil Corry-3
I suggest using some type of markup substitution as a first line of defense, then serving the untrusted user input through DOMPurify (a JS library that strips out dangerous HTML):

https://github.com/cure53/DOMPurify/blob/master/README.md

The team behind that library are experts in XSS bypass techniques - you're not going to find a better solution (other than not serving untrusted user input at all).

- Bil

> On Jun 30, 2016, at 5:12 AM, Jason Huck <[hidden email]> wrote:
> [...]