Module: Oga::XML::Entities

Defined in:
lib/oga/xml/entities.rb

Overview

Module for encoding/decoding XML and HTML entities. The mapping of HTML entities can be found in HTML::Entities::DECODE_MAPPING.

Constant Summary

DECODE_MAPPING =

Hash containing XML entities and the corresponding characters.

The &amp; mapping must come last to ensure proper conversion of non-encoded to encoded forms (see Text#to_xml).

Returns:

  • (Hash)
{
  '&lt;'   => '<',
  '&gt;'   => '>',
  '&apos;' => "'",
  '&quot;' => '"',
  '&amp;'  => '&',
}
ENCODE_MAPPING =

Hash containing characters and the corresponding XML entities.

Returns:

  • (Hash)
{
  '&' => '&amp;',
  '>' => '&gt;',
  '<' => '&lt;',
}
ENCODE_ATTRIBUTE_MAPPING =

Hash containing characters and the corresponding XML entities to use when encoding XML/HTML attribute values.

Returns:

  • (Hash)
{
  '&' => '&amp;',
  '>' => '&gt;',
  '<' => '&lt;',
  "'" => '&apos;',
  '"' => '&quot;'
}
AMPERSAND =

Returns:

  • (String)
'&'.freeze
REGULAR_ENTITY =

Regexp for matching XML/HTML entities such as "&nbsp;".

Returns:

  • (Regexp)
/&[a-zA-Z0-9]+;/
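
A minimal sketch of how this pattern can be paired with a Hash in String#gsub. The values are copied into local variables so the snippet runs without Oga; the variable names and sample strings are illustrative assumptions, not part of the library.

```ruby
# Local copies of the documented values, so the sketch is standalone.
regular_entity = /&[a-zA-Z0-9]+;/
decode_mapping = {
  '&lt;'   => '<',
  '&gt;'   => '>',
  '&apos;' => "'",
  '&quot;' => '"',
  '&amp;'  => '&',
}

sample = 'a &lt; b &amp;&amp; c &gt; d'

# String#gsub with a Hash replaces each match with the value stored under it.
decoded = sample.gsub(regular_entity, decode_mapping)

# With a plain Hash, a match that has no key is replaced by an empty string.
unknown = 'x&nbsp;y'.gsub(regular_entity, decode_mapping)

puts decoded
puts unknown.inspect
```

The second call shows standard Ruby behavior for a Hash without a default value: the pattern matches any name-like entity, so names missing from the mapping are dropped rather than kept.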
NUMERIC_CODE_POINT_ENTITY =

Regexp for matching XML/HTML numeric entities such as "&#38;".

Returns:

  • (Regexp)
/&#(\d+);/
HEX_CODE_POINT_ENTITY =

Regexp for matching XML/HTML hex entities such as "&#x3C;".

Returns:

  • (Regexp)
/&#x([a-fA-F0-9]+);/
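
The decimal and hexadecimal patterns each capture only the digits of the code point. The sketch below shows one plausible way to turn a capture into a character. The library's own conversion helper is not shown on this page, so the Integer and Array#pack calls here are an assumption, not Oga's implementation.

```ruby
numeric_entity = /&#(\d+);/
hex_entity     = /&#x([a-fA-F0-9]+);/

# The capture group holds only the digits, without the "&#" or "&#x" prefix.
decimal_digits = '&#38;'[numeric_entity, 1]
hex_digits     = '&#x3C;'[hex_entity, 1]

# Parsing the digits in the matching base yields the code point.
decimal_code_point = Integer(decimal_digits, 10)
hex_code_point     = Integer(hex_digits, 16)

# Array#pack('U*') converts code points into a UTF-8 string.
chars = [decimal_code_point, hex_code_point].pack('U*')

puts chars
```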
ENCODE_REGEXP =

Returns:

  • (Regexp)
Regexp.new(ENCODE_MAPPING.keys.join('|'))
ENCODE_ATTRIBUTE_REGEXP =

Returns:

  • (Regexp)
Regexp.new(ENCODE_ATTRIBUTE_MAPPING.keys.join('|'))
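
Both encode regexps are built by joining the mapping keys with "|". A small sketch of what that produces, using a local copy of ENCODE_MAPPING; the Regexp.union variant is shown only for comparison and is not what the library uses.

```ruby
encode_mapping = { '&' => '&amp;', '>' => '&gt;', '<' => '&lt;' }

# Joining the keys with "|" yields the alternation /&|>|</.
encode_regexp = Regexp.new(encode_mapping.keys.join('|'))

# Regexp.union builds an equivalent alternation and escapes each key,
# which would matter only if a key were a metacharacter such as "." or "|".
escaped_regexp = Regexp.union(encode_mapping.keys)

matches         = 'a < b & c > d'.scan(encode_regexp)
escaped_matches = 'a < b & c > d'.scan(escaped_regexp)

puts encode_regexp.source
puts matches.inspect
```

None of the documented keys is a regexp metacharacter, so the plain join is safe for these mappings.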

Class Method Details

.decode(input, mapping = DECODE_MAPPING) ⇒ String

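Decodes XML entities. Named entities found in the mapping are replaced first, followed by decimal ("&#38;") and hexadecimal ("&#x3C;") code point entities. Input that contains no ampersand is returned unchanged.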

Parameters:

  • input (String)
  • mapping (Hash) (defaults to: DECODE_MAPPING)

Returns:

  • (String)


# File 'lib/oga/xml/entities.rb', line 71

def self.decode(input, mapping = DECODE_MAPPING)
  return input unless input.include?(AMPERSAND)

  input = input.gsub(REGULAR_ENTITY, mapping)

  if input.include?(AMPERSAND)
    input = input.gsub(NUMERIC_CODE_POINT_ENTITY) do |found|
      pack_string($1, 10) || found
    end
  end

  if input.include?(AMPERSAND)
    input = input.gsub(HEX_CODE_POINT_ENTITY) do |found|
      pack_string($1, 16) || found
    end
  end

  input
end
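
The listing above calls pack_string, which is not documented on this page. The sketch below reproduces the three passes in a standalone module so they can be traced. EntitySketch and its pack_string are hypothetical stand-ins written for illustration, and the real helper may behave differently.

```ruby
module EntitySketch
  DECODE_MAPPING = {
    '&lt;' => '<', '&gt;' => '>', '&apos;' => "'", '&quot;' => '"', '&amp;' => '&'
  }
  AMPERSAND                 = '&'.freeze
  REGULAR_ENTITY            = /&[a-zA-Z0-9]+;/
  NUMERIC_CODE_POINT_ENTITY = /&#(\d+);/
  HEX_CODE_POINT_ENTITY     = /&#x([a-fA-F0-9]+);/

  # Hypothetical helper: returns the character for a code point,
  # or nil when the digits do not form a valid code point.
  def self.pack_string(digits, base)
    [Integer(digits, base)].pack('U')
  rescue ArgumentError, RangeError
    nil
  end

  def self.decode(input, mapping = DECODE_MAPPING)
    # Fast path: nothing to decode without an ampersand.
    return input unless input.include?(AMPERSAND)

    # Pass 1: named entities, looked up in the mapping.
    input = input.gsub(REGULAR_ENTITY, mapping)
    # Pass 2: decimal code points; keep the original text on failure.
    input = input.gsub(NUMERIC_CODE_POINT_ENTITY) { |found| pack_string($1, 10) || found }
    # Pass 3: hexadecimal code points; keep the original text on failure.
    input.gsub(HEX_CODE_POINT_ENTITY) { |found| pack_string($1, 16) || found }
  end
end

puts EntitySketch.decode('1 &lt; 2, &#38; and &#x3C;')
```

In this sketch the string is printed as "1 < 2, & and <". An input such as "&amp;lt;" becomes "&lt;", because each gsub call scans the string once and does not revisit its own replacements.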

.encode(input, mapping = ENCODE_MAPPING) ⇒ String

Encodes special characters as XML entities.

Parameters:

  • input (String)
  • mapping (Hash) (defaults to: ENCODE_MAPPING)

Returns:

  • (String)


# File 'lib/oga/xml/entities.rb', line 96

def self.encode(input, mapping = ENCODE_MAPPING)
  input.gsub(ENCODE_REGEXP, mapping)
end
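
A short sketch of what this substitution does to a few inputs, using local copies of ENCODE_MAPPING and ENCODE_REGEXP so it runs without Oga. The sample strings and the partial mapping are illustrative assumptions.

```ruby
encode_mapping = { '&' => '&amp;', '>' => '&gt;', '<' => '&lt;' }
encode_regexp  = Regexp.new(encode_mapping.keys.join('|'))

# Each special character is replaced by its entity in a single scan.
encoded = 'a < b && c'.gsub(encode_regexp, encode_mapping)

# Quotes are not part of this mapping, so they pass through unchanged.
quoted = %q{say "hi"}.gsub(encode_regexp, encode_mapping)

# Text that is already encoded is encoded again.
twice = '&lt;'.gsub(encode_regexp, encode_mapping)

# The regexp is fixed, so a custom mapping that lacks a matched
# character replaces that character with an empty string.
partial = 'a < b'.gsub(encode_regexp, { '&' => '&amp;' })

puts encoded
puts twice
puts partial.inspect
```

The last case follows from the method body shown above: the mapping argument changes the replacements but not the set of characters matched, so a custom mapping presumably needs an entry for each of "&", ">" and "<".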

.encode_attribute(input) ⇒ String

Encodes special characters in an XML attribute value.

Parameters:

  • input (String)

Returns:

  • (String)


# File 'lib/oga/xml/entities.rb', line 104

def self.encode_attribute(input)
  input.gsub(ENCODE_ATTRIBUTE_REGEXP, ENCODE_ATTRIBUTE_MAPPING)
end
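
As an illustration of why attribute values use the larger mapping, the sketch below encodes a value that contains both kinds of quotes and then embeds it in a double-quoted attribute. The constants are local copies, and the attribute name and sample value are made up for the example.

```ruby
attribute_mapping = {
  '&' => '&amp;',
  '>' => '&gt;',
  '<' => '&lt;',
  "'" => '&apos;',
  '"' => '&quot;'
}
attribute_regexp = Regexp.new(attribute_mapping.keys.join('|'))

value = %q{Tom & "Jerry's" <show>}

# Quotes are encoded too, so the value cannot terminate the attribute early.
encoded = value.gsub(attribute_regexp, attribute_mapping)

# Hypothetical attribute built from the encoded value.
attribute = %Q{title="#{encoded}"}

puts attribute
```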