Class: Oga::XML::PullParser

Inherits:
Parser
  • Object
Defined in:
lib/oga/xml/pull_parser.rb

Overview

The PullParser class can be used to parse an XML document incrementally instead of parsing it as a whole. This results in lower memory usage and potentially faster parsing times. The downside is that pull parsers are typically more difficult to use compared to DOM parsers.

Basic parsing using this class works as follows:

parser = Oga::XML::PullParser.new('... xml here ...')

parser.parse do |node|
  if node.is_a?(Oga::XML::Element)
    # Handle the element here.
  end
end

This parser yields proper XML node instances such as Element. Doctypes and XML declarations are ignored by this parser.

Constant Summary

DISABLED_CALLBACKS =

Returns:

  • (Array)
[
  :on_document,
  :on_doctype,
  :on_xml_decl,
  :on_element_children
]
BLOCK_CALLBACKS =

Returns:

  • (Array)
[
  :on_cdata,
  :on_comment,
  :on_text,
  :on_proc_ins
]
NODE_SHORTHANDS =

Returns the shorthands that can be used for various node classes.

Returns:

  • (Hash)
{
  :text            => XML::Text,
  :node            => XML::Node,
  :cdata           => XML::Cdata,
  :element         => XML::Element,
  :doctype         => XML::Doctype,
  :comment         => XML::Comment,
  :xml_declaration => XML::XmlDeclaration
}
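
The listing here does not show how DISABLED_CALLBACKS and BLOCK_CALLBACKS are consumed, so the following is only a minimal sketch of one plausible approach: generating the callback methods from the two lists. SketchBaseParser, SketchPullParser, the trimmed-down constant values and the array "nodes" are all hypothetical stand-ins, not part of Oga.

```ruby
# Hypothetical stand-in for the parent parser: each callback builds and
# returns a "node" (here just a two-element Array).
class SketchBaseParser
  def on_text(value)
    [:text, value]
  end

  def on_comment(value)
    [:comment, value]
  end

  def on_doctype(value)
    [:doctype, value]
  end
end

class SketchPullParser < SketchBaseParser
  # Shortened lists, for illustration only.
  DISABLED_CALLBACKS = [:on_doctype]
  BLOCK_CALLBACKS    = [:on_text, :on_comment]

  attr_reader :node

  def initialize(&block)
    @block = block
  end

  # Disabled callbacks build nothing and return nil.
  DISABLED_CALLBACKS.each do |name|
    define_method(name) { |*args| nil }
  end

  # Block callbacks let the parent build the node, remember it as the
  # current node, and hand it to the stored block.
  BLOCK_CALLBACKS.each do |name|
    define_method(name) do |*args|
      @node = super(*args)
      @block.call(@node)
      nil
    end
  end
end

seen   = []
parser = SketchPullParser.new { |node| seen << node }

parser.on_doctype('html') # ignored, nothing is yielded
parser.on_text('hello')
parser.on_comment('note')

p seen # => [[:text, "hello"], [:comment, "note"]]
```

Under these assumptions, a disabled callback never reaches the block, which would be consistent with doctypes and XML declarations being ignored by the parser.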

Constants inherited from Parser

Oga::XML::Parser::CONFIG, Oga::XML::Parser::TOKEN_ERROR_MAPPING

Instance Attribute Summary

Instance Method Summary

Methods inherited from Parser

#_rule_0, #_rule_1, #_rule_10, #_rule_11, #_rule_12, #_rule_13, #_rule_14, #_rule_15, #_rule_16, #_rule_17, #_rule_18, #_rule_19, #_rule_2, #_rule_20, #_rule_21, #_rule_22, #_rule_23, #_rule_24, #_rule_25, #_rule_26, #_rule_27, #_rule_28, #_rule_29, #_rule_3, #_rule_30, #_rule_31, #_rule_32, #_rule_33, #_rule_34, #_rule_35, #_rule_36, #_rule_37, #_rule_38, #_rule_39, #_rule_4, #_rule_40, #_rule_41, #_rule_42, #_rule_5, #_rule_6, #_rule_7, #_rule_8, #_rule_9, #each_token, #on_attribute, #on_attributes, #on_cdata, #on_comment, #on_doctype, #on_document, #on_element_children, #on_proc_ins, #on_text, #on_xml_decl, #parser_error

Constructor Details

#initialize(*args) ⇒ PullParser

Returns a new instance of PullParser



# File 'lib/oga/xml/pull_parser.rb', line 57

def initialize(*args)
  super
  @nesting = []
end

Instance Attribute Details

#nesting ⇒ Array (readonly)

Array containing the names of the currently nested elements.

Returns:

  • (Array)


# File 'lib/oga/xml/pull_parser.rb', line 26

def nesting
  @nesting
end

#node ⇒ Oga::XML::Node (readonly)

Returns:

  • (Oga::XML::Node)

# File 'lib/oga/xml/pull_parser.rb', line 22

def node
  @node
end

Instance Method Details

#after_element(*args) ⇒ Object



# File 'lib/oga/xml/pull_parser.rb', line 146

def after_element(*args)
  nesting.pop

  return
end

#on(type, nesting = []) ⇒ Object

Calls the supplied block if the current node type and optionally the nesting match. This method allows you to write this:

parser.parse do |node|
  parser.on(:text, %w{people person name}) do
    puts node.text
  end
end

Instead of this:

parser.parse do |node|
  if node.is_a?(Oga::XML::Text) and parser.nesting == %w{people person name}
    puts node.text
  end
end

When calling this method you can specify the following node types:

  • :cdata
  • :comment
  • :element
  • :text

Examples:

parser.on(:element, %w{people person name}) do

end

Parameters:

  • type (Symbol)

    The type of node to act upon. This is a symbol as returned by Node#node_type.

  • nesting (Array) (defaults to: [])

    The element name nesting to act upon. It is compared for equality against #nesting; an empty Array (the default) matches any nesting.



# File 'lib/oga/xml/pull_parser.rb', line 106

def on(type, nesting = [])
  if node.is_a?(NODE_SHORTHANDS[type])
    if nesting.empty? or nesting == self.nesting
      yield
    end
  end
end
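
The matching rules in the source above can be tried out without a real document. The sketch below copies the logic of #on into a hypothetical SketchMatcher class, with SketchText and SketchElement standing in for Oga::XML::Text and Oga::XML::Element; none of these names exist in Oga.

```ruby
# Hypothetical stand-ins for Oga::XML::Text and Oga::XML::Element.
SketchText    = Struct.new(:text)
SketchElement = Struct.new(:name)

class SketchMatcher
  SHORTHANDS = { :text => SketchText, :element => SketchElement }

  attr_accessor :node, :nesting

  def initialize
    @nesting = []
  end

  # Same checks as #on: the node type must match, and the nesting must be
  # either empty or equal to the current nesting.
  def on(type, nesting = [])
    if node.is_a?(SHORTHANDS[type])
      yield if nesting.empty? || nesting == self.nesting
    end
  end
end

matcher         = SketchMatcher.new
matcher.node    = SketchText.new('Alice')
matcher.nesting = %w{people person name}

hits = []

matcher.on(:text) { hits << :any_nesting }
matcher.on(:text, %w{people person name}) { hits << :full_path }
matcher.on(:text, %w{person name}) { hits << :partial_path }
matcher.on(:element) { hits << :wrong_type }

p hits # => [:any_nesting, :full_path]
```

The nesting is compared as a whole, so a partial path such as %w{person name} does not match. Going by the source shown above, a type that is not a key of NODE_SHORTHANDS would pass nil to is_a?, which raises a TypeError in Ruby.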

#on_element(*args) ⇒ Object



# File 'lib/oga/xml/pull_parser.rb', line 135

def on_element(*args)
  @node = super

  nesting << @node.name

  @block.call(@node)

  return
end
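
Taken together, #on_element and #after_element treat the nesting Array as a stack: a name is pushed when an element is created and popped once the element is done. The sketch below replays that push/pop sequence by hand for a small document. SketchNestingTracker is a hypothetical stand-in, and the exact moment at which the real parser invokes #after_element is an assumption here.

```ruby
class SketchNestingTracker
  attr_reader :nesting

  def initialize
    @nesting = []
  end

  # Push the element name, as #on_element does.
  def on_element(name)
    nesting << name
    nil
  end

  # Pop the innermost element name, as #after_element does.
  def after_element
    nesting.pop
    nil
  end
end

tracker  = SketchNestingTracker.new
snapshot = []

# Simulates <people><person><name /></person></people>
tracker.on_element('people')
tracker.on_element('person')
tracker.on_element('name')
snapshot << tracker.nesting.dup # inside <name>

tracker.after_element # leaves <name>
tracker.after_element # leaves <person>
snapshot << tracker.nesting.dup # back inside <people>

tracker.after_element # leaves <people>
snapshot << tracker.nesting.dup # outside all elements

p snapshot # => [["people", "person", "name"], ["people"], []]
```

The snapshots use dup because the same Array is mutated in place. Code that wants to keep the nesting for later should copy it rather than hold on to the Array returned by #nesting.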

#parse {|node| ... } ⇒ Object

Parses the input and yields every node to the supplied block.

Yield Parameters:

  • (Oga::XML::Node)


# File 'lib/oga/xml/pull_parser.rb', line 65

def parse(&block)
  @block = block

  super

  return
end
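
Because #parse ends with a bare return, it returns nil, and any result has to be collected inside the block. The sketch below illustrates that calling pattern with a hypothetical SketchBlockParser that yields items from a prepared Array instead of parsing XML.

```ruby
# Hypothetical stand-in: yields prepared items instead of parsed nodes.
class SketchBlockParser
  def initialize(nodes)
    @nodes = nodes
  end

  # Stores the block, yields every item to it and returns nil.
  def parse(&block)
    @block = block

    @nodes.each { |node| @block.call(node) }

    return
  end
end

names  = []
parser = SketchBlockParser.new(%w{Alice Bob})

# The return value carries no data; the block does the collecting.
result = parser.parse { |node| names << node.upcase }

p result # => nil
p names  # => ["ALICE", "BOB"]
```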