Class: Oga::XML::SaxParser

Inherits:
Parser
  • Object
Defined in:
lib/oga/xml/sax_parser.rb

Overview

The SaxParser class provides the basic interface for writing custom SAX parsers. All callback methods defined in Parser are delegated to a dedicated handler class.

To write a custom handler for the SAX parser, create a class that implements one (or many) of the following callback methods:

  • on_document
  • on_doctype
  • on_cdata
  • on_comment
  • on_proc_ins
  • on_xml_decl
  • on_text
  • on_element
  • on_element_children
  • on_attribute
  • on_attributes
  • after_element

For example:

class SaxHandler
  def on_element(namespace, name, attrs = {})
    puts name
  end
end

You can then use it as follows:

handler = SaxHandler.new
parser  = Oga::XML::SaxParser.new(handler, '<foo />')

parser.parse

For information on the callback arguments see the documentation of the corresponding methods in Parser.
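Since a handler only needs to implement the callbacks it cares about, the delegation can be pictured as a guarded method call. The following is a rough sketch in plain Ruby, not Oga's actual code: `MiniDispatcher` and `CommentCounter` are hypothetical names, and the body of `run_callback` is an assumption based on the `respond_to?` checks visible in the method sources on this page.

```ruby
# Hypothetical stand-in for the parser side of the delegation.
class MiniDispatcher
  def initialize(handler)
    @handler = handler
  end

  # Assumed behaviour: forward a callback only when the handler defines it.
  def run_callback(name, *args)
    @handler.send(name, *args) if @handler.respond_to?(name)
  end
end

# Hypothetical handler that implements a single callback.
class CommentCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  def on_comment(text)
    @count += 1
  end
end

counter    = CommentCounter.new
dispatcher = MiniDispatcher.new(counter)

dispatcher.run_callback(:on_comment, 'a note')
dispatcher.run_callback(:on_cdata, 'skipped') # the handler has no on_cdata

puts counter.count # prints 1
```

Callbacks the handler does not define are simply skipped in this sketch, so a handler stays as small as the task requires.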

Element Callbacks

The SAX parser changes the behaviour of both on_element and after_element. In the regular parser, after_element only takes an Element instance. In the SAX parser it instead takes a namespace name and the element name, which makes it easier to tell which element a callback belongs to.

An example:

class SaxHandler
  def on_element(namespace, name, attrs = {})
    # ...
  end

  def after_element(namespace, name)
    puts name # => "foo", "bar", etc
  end
end
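
Because both callbacks receive the namespace and name, a handler can track nesting without holding on to Element instances. The following is a minimal sketch rather than part of Oga: `PathHandler` and its `paths` reader are hypothetical names, and the callbacks are invoked by hand, in the order assumed for `<root><item /></root>`, so the snippet runs without the library installed.

```ruby
# Hypothetical handler that records the path of every element it sees.
class PathHandler
  attr_reader :paths

  def initialize
    @stack = []
    @paths = []
  end

  # Called when an element is opened.
  def on_element(namespace, name, attrs = {})
    @stack << (namespace ? "#{namespace}:#{name}" : name)
    @paths << @stack.join('/')
  end

  # Called when the same element is closed.
  def after_element(namespace, name)
    @stack.pop
  end
end

path_handler = PathHandler.new

# Simulated callback order for <root><item /></root>.
path_handler.on_element(nil, 'root')
path_handler.on_element(nil, 'item')
path_handler.after_element(nil, 'item')
path_handler.after_element(nil, 'root')

p path_handler.paths # => ["root", "root/item"]
```

With the real parser, the same handler would be passed to Oga::XML::SaxParser.new as in the overview example.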

Attributes

Attributes returned by on_attribute are passed as a Hash as the third argument of the on_element callback. The keys of this Hash are the attribute names (optionally prefixed by their namespace) and the values are the attribute values. Define on_attribute in your handler to control individual attributes, and on_attributes to control the final set.
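
To make the shape of that Hash concrete, the sketch below replays the default behaviour in plain Ruby. It mirrors the on_attribute and on_attributes sources listed further down this page but leaves out entity decoding. `attribute_pair` and `merge_pairs` are hypothetical helper names, not Oga methods.

```ruby
# Mirrors the default on_attribute: one single-pair Hash per attribute,
# with the namespace (if any) prefixed to the name.
def attribute_pair(name, ns = nil, value = nil)
  key = ns ? "#{ns}:#{name}" : name

  { key => value }
end

# Mirrors the default on_attributes: merge the pairs into one Hash.
def merge_pairs(pairs)
  merged = {}

  pairs.each do |pair|
    pair.each { |key, value| merged[key] = value }
  end

  merged
end

pairs = [
  attribute_pair('href', 'xlink', '#top'),
  attribute_pair('class', nil, 'nav')
]

attrs = merge_pairs(pairs)

puts attrs['xlink:href'] # prints #top
puts attrs['class']      # prints nav
```

A handler that defines its own on_attribute can return a different structure, in which case its on_attributes (or on_element) has to expect that structure instead.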

Direct Known Subclasses

HTML::SaxParser

Constant Summary

Constants inherited from Parser

Parser::CONFIG, Parser::TOKEN_ERROR_MAPPING

Instance Method Summary

Methods inherited from Parser

#_rule_0, #_rule_1, #_rule_10, #_rule_11, #_rule_12, #_rule_13, #_rule_14, #_rule_15, #_rule_16, #_rule_17, #_rule_18, #_rule_19, #_rule_2, #_rule_20, #_rule_21, #_rule_22, #_rule_23, #_rule_24, #_rule_25, #_rule_26, #_rule_27, #_rule_28, #_rule_29, #_rule_3, #_rule_30, #_rule_31, #_rule_32, #_rule_33, #_rule_34, #_rule_35, #_rule_36, #_rule_37, #_rule_38, #_rule_39, #_rule_4, #_rule_40, #_rule_41, #_rule_42, #_rule_5, #_rule_6, #_rule_7, #_rule_8, #_rule_9, #each_token, #on_cdata, #on_comment, #on_doctype, #on_document, #on_element_children, #on_proc_ins, #on_xml_decl, #parser_error

Constructor Details

#initialize(handler, *args) ⇒ SaxParser

Returns a new instance of SaxParser

Parameters:

  • handler (Object)

    The SAX handler to delegate callbacks to.

See Also:

  • Oga::XML::Parser#initialize


# File 'lib/oga/xml/sax_parser.rb', line 71

def initialize(handler, *args)
  @handler = handler

  super(*args)
end

Instance Method Details

#after_element(namespace_with_name) ⇒ Object

Manually define after_element so it can take a namespace and name. This differs a bit from the regular after_element which only takes an Element instance.

Parameters:

  • namespace_with_name (Array)


# File 'lib/oga/xml/sax_parser.rb', line 93

def after_element(namespace_with_name)
  run_callback(:after_element, *namespace_with_name)

  return
end

#on_attribute(name, ns = nil, value = nil) ⇒ Object

Manually define this method since for this one we do want the return value so it can be passed to on_element.

See Also:

  • Oga::XML::Parser#on_attribute


# File 'lib/oga/xml/sax_parser.rb', line 103

def on_attribute(name, ns = nil, value = nil)
  if @handler.respond_to?(:on_attribute)
    return run_callback(:on_attribute, name, ns, value)
  end

  key = ns ? "#{ns}:#{name}" : name

  if value
    value = EntityDecoder.try_decode(value, @lexer.html?)
  end

  {key => value}
end

#on_attributes(attrs) ⇒ Hash

Merges the attributes together into a Hash.

Parameters:

  • attrs (Array)

Returns:

  • (Hash)


# File 'lib/oga/xml/sax_parser.rb', line 121

def on_attributes(attrs)
  if @handler.respond_to?(:on_attributes)
    return run_callback(:on_attributes, attrs)
  end

  merged = {}

  attrs.each do |pair|
    # Hash#merge requires an extra allocation, this doesn't.
    pair.each { |key, value| merged[key] = value }
  end

  merged
end

#on_element(namespace, name, attrs = []) ⇒ Array

Manually define on_element so we can ensure that after_element always receives the namespace and name.

Returns:

  • (Array)

See Also:

  • Oga::XML::Parser#on_element


# File 'lib/oga/xml/sax_parser.rb', line 82

def on_element(namespace, name, attrs = [])
  run_callback(:on_element, namespace, name, attrs)

  [namespace, name]
end

#on_text(text) ⇒ Object

Parameters:

  • text (String)


# File 'lib/oga/xml/sax_parser.rb', line 137

def on_text(text)
  if @handler.respond_to?(:on_text)
    unless inside_literal_html?
      text = EntityDecoder.try_decode(text, @lexer.html?)
    end

    run_callback(:on_text, text)
  end

  return
end
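
As an illustration of a handler that consumes on_text, the sketch below gathers the text found inside title elements. It is only an example under stated assumptions: `TitleHandler` and `titles` are hypothetical names, the callbacks are invoked by hand so the snippet runs without Oga, and the handler appends rather than assigns in case the parser reports a text node in several chunks.

```ruby
# Hypothetical handler that collects the text inside <title> elements.
class TitleHandler
  attr_reader :titles

  def initialize
    @titles   = []
    @in_title = false
  end

  def on_element(namespace, name, attrs = {})
    return unless name == 'title'

    @in_title = true
    @titles << String.new
  end

  # Append so that text delivered in several pieces is kept together.
  def on_text(text)
    @titles.last << text if @in_title
  end

  def after_element(namespace, name)
    @in_title = false if name == 'title'
  end
end

title_handler = TitleHandler.new

# Simulated callbacks for <title>Hello, world</title><p>ignored</p>.
title_handler.on_element(nil, 'title')
title_handler.on_text('Hello, ')
title_handler.on_text('world')
title_handler.after_element(nil, 'title')
title_handler.on_element(nil, 'p')
title_handler.on_text('ignored')
title_handler.after_element(nil, 'p')

p title_handler.titles # => ["Hello, world"]
```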