# sanitize

`sanitize` is a Crystal library for transforming HTML/XML trees. It's primarily used to sanitize HTML from untrusted sources in order to prevent [XSS attacks](http://en.wikipedia.org/wiki/Cross-site_scripting) and other adversities.

It builds on stdlib's [`XML`](https://crystal-lang.org/api/XML.html) module to parse HTML/XML. Based on [libxml2](http://xmlsoft.org/), it's a solid parser and turns malformed and malicious input into valid and safe markup.

* Code: [https://github.com/straight-shoota/sanitize](https://github.com/straight-shoota/sanitize)
* API docs: [https://straight-shoota.github.io/sanitize/api/latest/](https://straight-shoota.github.io/sanitize/api/latest/)
* Issue tracker: [https://github.com/straight-shoota/sanitize/issues](https://github.com/straight-shoota/sanitize/issues)
* Shardbox: [https://shardbox.org/shards/sanitize](https://shardbox.org/shards/sanitize)

## Installation

1. Add the dependency to your `shard.yml`:

   ```yaml
   dependencies:
     sanitize:
       github: straight-shoota/sanitize
   ```

2. Run `shards install`

## Sanitization Features

The `Sanitize::Policy::HTMLSanitizer` policy applies the following sanitization steps. Except for the first one (which is essential to the entire process), all can be disabled or configured.

* Turns malformed and malicious HTML into valid and safe markup.
* Strips HTML elements and attributes not included in the safe list.
* Sanitizes URL attributes (like `href` or `src`) with a customizable sanitization policy.
* Adds `rel="nofollow"` to all links and `rel="noopener"` to links with `target`.
* Validates values of accepted attributes `align`, `width` and `height`.
* Filters `class` attributes based on a whitelist (by default all classes are rejected).

## Usage

Transformation is based on rules defined by `Sanitize::Policy` implementations.

The recommended standard policy for HTML sanitization is `Sanitize::Policy::HTMLSanitizer.common`, which represents good defaults for most use cases.
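It sanitizes user input against a known safe list of accepted elements and their attributes.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# Text without markup passes through unchanged.
sanitizer.process(%(foo)) # => %(foo)

# The markup below is illustrative (assumed); print the results to inspect
# what the policy keeps.

# A table with the cells `foo` and `bar`.
puts sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>))

# A rendered document: a paragraph with a link, a code block and a second paragraph.
html = <<-HTML
  <p>Sanitization with <a href="https://shardbox.org/shards/sanitize">https://shardbox.org/shards/sanitize</a> is not that difficult.</p>
  <pre><code>puts "Hello World!"</code></pre>
  <p>Hello world!</p>
  HTML
puts sanitizer.process(html)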
```

## Limitations

Sanitizing CSS is not supported. Thus `style` attributes can't be accepted in a safe way.
CSS sanitization features may be added when a CSS parsing library is available.

## Security

If you want to privately disclose security issues, please contact [straightshoota](https://keybase.io/straightshoota) on Keybase or [straightshoota@gmail.com](mailto:straightshoota@gmail.com) (PGP: `DF2D C9E9 FFB9 6AE0 2070 D5BC F0F3 4963 7AC5 087A`).

## Contributing

1. Fork it ([https://github.com/straight-shoota/sanitize/fork](https://github.com/straight-shoota/sanitize/fork))
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

## Contributors

- [Johannes Müller](https://github.com/straight-shoota) - creator and maintainer