crow

Utility library


Project maintained by CaptainCrowbar

XML

Crow Library by Ross Smith

#include "crow/xml.hpp"
namespace Crow::Xml;


Introduction

The facilities in the Crow::Xml namespace allow generation and parsing of XML (https://www.w3.org/TR/xml/). This is a fairly minimal implementation that does not attempt to support every feature in the XML standard.

Supporting types

class Error: public std::runtime_error {
    std::string text() const;
};

Exception used to report XML construction errors. The text() function returns the part of the input text that appears to be invalid.

enum class Options: int {
    none = 0,
    autoclose,  // Implicit closing
    comments,   // Keep comments
    encoded,    // Text is already encoded
    foldws,     // Collapse whitespace
    icase,      // Case insensitive names
    keyonly,    // Attribute values are optional
    noxmldecl,  // No default XML declaration
    selfclose,  // HTML self-closing elements
    xentity,    // HTML character entities
    xml = none,
    html = autoclose | foldws | icase | keyonly
        | noxmldecl | selfclose | xentity,
};

Bitmask flags controlling the behaviour of XML parsing and generation; most of them are relevant only to parsing. The default behaviour, with all options switched off, follows the syntax of the XML standard, apart from the limitations noted in the introduction. The html option switches on every flag except comments and encoded, so that markup is parsed as HTML5.

Except where the descriptions of individual functions below state otherwise, options should be assumed to affect only XML parsing, and are ignored in any other context.
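Because Options is a scoped enum used as a bitmask, flags are combined with | and tested with &. The following minimal sketch shows how that can work. The synopsis above does not give the enumerator values or the operator functions, so the power-of-two values and the has() helper here are hypothetical.

```cpp
// Hypothetical stand-in for Crow::Xml::Options. The power-of-two values are
// assumed for illustration; the library header may assign them differently.
enum class Options: int {
    none       = 0,
    autoclose  = 1 << 0,  // Implicit closing
    comments   = 1 << 1,  // Keep comments
    encoded    = 1 << 2,  // Text is already encoded
    foldws     = 1 << 3,  // Collapse whitespace
    icase      = 1 << 4,  // Case insensitive names
    keyonly    = 1 << 5,  // Attribute values are optional
    noxmldecl  = 1 << 6,  // No default XML declaration
    selfclose  = 1 << 7,  // HTML self-closing elements
    xentity    = 1 << 8,  // HTML character entities
    xml        = none,
    // Inside the enum body the enumerators still have type int, so | works here
    html       = autoclose | foldws | icase | keyonly | noxmldecl | selfclose | xentity,
};

// Outside the enum body, a scoped enum needs an explicit operator to combine flags
constexpr Options operator|(Options a, Options b) noexcept {
    return static_cast<Options>(static_cast<int>(a) | static_cast<int>(b));
}

// Hypothetical helper: true if any bit of flag is set in opt
constexpr bool has(Options opt, Options flag) noexcept {
    return (static_cast<int>(opt) & static_cast<int>(flag)) != 0;
}
```

With these definitions, has(Options::html, Options::icase) is true and has(Options::html, Options::comments) is false, which matches the description of the html option.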

enum class NodeType: int {
    null,
    xmldecl,
    dtd,
    cdata,
    comment,
    entity,
    processing,
    text,
    element,
    document,
};

Values returned by the Node::type() function, matching the type of the node.

class AttributeMap {
    class attr_iterator;
        // Bidirectional const iterator
        // Dereferences to a pair of strings
    std::string attr(const std::string& key) const;
    Irange<attr_iterator> attr_range() const;
    void clear_attrs() noexcept;
    void erase_attr(const std::string& key) noexcept;
    bool has_attr(const std::string& key) const noexcept;
    size_t num_attrs() const noexcept;
    void set_attr(const std::string& key);
    void set_attr(const std::string& key, const std::string& value);
};

A key-value map used for element attributes.
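To illustrate the interface, here is a minimal sketch of an attribute map built on std::map. The class name AttributeMapSketch is hypothetical. Two behaviours are assumptions rather than documented facts: attr() returns an empty string for a missing key, and the one-argument set_attr() stores a key-only attribute with an empty value.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch using the same member names as AttributeMap;
// this is not the Crow implementation.
class AttributeMapSketch {
public:
    // Value for key, or an empty string if the key is absent (assumption)
    std::string attr(const std::string& key) const {
        auto it = attrs_.find(key);
        return it == attrs_.end() ? std::string() : it->second;
    }
    void clear_attrs() noexcept { attrs_.clear(); }
    void erase_attr(const std::string& key) noexcept { attrs_.erase(key); }
    bool has_attr(const std::string& key) const noexcept { return attrs_.count(key) != 0; }
    std::size_t num_attrs() const noexcept { return attrs_.size(); }
    // Key-only attribute (see Options::keyonly), assumed to have an empty value
    void set_attr(const std::string& key) { attrs_[key] = std::string(); }
    void set_attr(const std::string& key, const std::string& value) { attrs_[key] = value; }
private:
    std::map<std::string, std::string> attrs_;  // Keys kept in sorted order
};
```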

Functions

std::string decode_text(std::string_view str, Options opt = {});
std::string encode_text(std::string_view str);

Encode or decode text using character entities. The decode_text() function respects the Options::xentity flag. The encode_text() function will only generate the standard XML character entities, along with code point entities for non-whitespace control characters.
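The following minimal sketch implements the encoding rule described above as a hypothetical free function, encode_text_sketch(). It makes two assumptions: it writes hexadecimal character references, and it treats bytes below 0x20 (other than tab, LF and CR) plus DEL as the control characters to escape. The library's exact output format may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical encoder: the five standard XML entities, plus numeric
// references for non-whitespace control characters (format assumed)
std::string encode_text_sketch(std::string_view str) {
    std::string out;
    out.reserve(str.size());
    for (char c: str) {
        auto u = static_cast<unsigned char>(c);
        switch (c) {
            case '&':  out += "&amp;"; break;
            case '<':  out += "&lt;"; break;
            case '>':  out += "&gt;"; break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:
                if ((u < 0x20 && c != '\t' && c != '\n' && c != '\r') || u == 0x7f) {
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(u));
                    out += buf;
                } else {
                    out += c;  // Everything else, including UTF-8 bytes, passes through
                }
        }
    }
    return out;
}
```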

bool is_valid_name(std::string_view name) noexcept;

True if this is a valid XML element, attribute, or entity name.
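The check below is a simplified illustration based on the Name production in the XML 1.0 specification. The function is_valid_name_sketch() is hypothetical. It checks the ASCII part of the grammar exactly but accepts any non-ASCII byte, so it is more permissive than a full implementation.

```cpp
#include <string_view>

// First character: ASCII letter, '_' or ':' (any non-ASCII byte accepted)
constexpr bool is_name_start(unsigned char c) noexcept {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || c == '_' || c == ':' || c >= 0x80;
}

// Later characters may also be digits, '-' or '.'
constexpr bool is_name_char(unsigned char c) noexcept {
    return is_name_start(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
}

// Hypothetical, simplified validity check for element/attribute/entity names
constexpr bool is_valid_name_sketch(std::string_view name) noexcept {
    if (name.empty() || !is_name_start(static_cast<unsigned char>(name[0])))
        return false;
    for (char c: name.substr(1))
        if (!is_name_char(static_cast<unsigned char>(c)))
            return false;
    return true;
}
```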

NodeType node_type(const NodePtr& node) noexcept;

Returns node->type() if the pointer is non-null, or NodeType::null if it is null.

XML node classes

class Node;
    class SimpleNode: public Node;
        class PrologueNode: public SimpleNode;
            class Xmldecl: public PrologueNode;
            class Dtd: public PrologueNode;
        class InlineNode: public SimpleNode;
            class Cdata: public InlineNode;
            class Comment: public InlineNode;
            class Entity: public InlineNode;
            class Processing: public InlineNode;
            class Text: public InlineNode;
    class CompoundNode: public Node;
        class Element: public CompoundNode;
        class Document: public CompoundNode;

XML node classes, described in detail below.

using NodePtr          = std::shared_ptr<Node>;
using SimpleNodePtr    = std::shared_ptr<SimpleNode>;
using PrologueNodePtr  = std::shared_ptr<PrologueNode>;
using XmldeclPtr       = std::shared_ptr<Xmldecl>;
using DtdPtr           = std::shared_ptr<Dtd>;
using InlineNodePtr    = std::shared_ptr<InlineNode>;
using CdataPtr         = std::shared_ptr<Cdata>;
using CommentPtr       = std::shared_ptr<Comment>;
using EntityPtr        = std::shared_ptr<Entity>;
using ProcessingPtr    = std::shared_ptr<Processing>;
using TextPtr          = std::shared_ptr<Text>;
using CompoundNodePtr  = std::shared_ptr<CompoundNode>;
using ElementPtr       = std::shared_ptr<Element>;
using DocumentPtr      = std::shared_ptr<Document>;

Defined for convenience.

Node base class

class Node {
    class search_iterator;
        // Forward iterator
        // Dereferences to a non-null NodePtr
    using search_range = Irange<search_iterator>;
    virtual ~Node() noexcept;
    virtual NodePtr clone() const = 0;
    virtual NodeType type() const noexcept = 0;
    std::string inner(Options opt = Options::xml) const;
    std::string outer(Options opt = Options::xml) const;
    search_range search(Options opt = Options::none) const;
    search_range search(NodeType type, Options opt = Options::none) const;
    search_range search(const std::string& element, Options opt = Options::none) const;
};

Common abstract base class for all XML node classes. Derived classes do not have public constructors; they must be created on the heap using each concrete class’s static create() function.

The clone() function returns a shallow copy of the current node. The type() function is a quick way of checking the node’s type.
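The pattern described above can be sketched as follows: an abstract base, concrete classes without public constructors, and a static create() factory that returns a shared_ptr. This is a heavily reduced, hypothetical model with a single Text node type and a cut-down NodeType, not the Crow source. It also includes a null-safe node_type() in the style of the documented free function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

enum class NodeType: int { null, text, element };  // Reduced for the sketch

class Node {
public:
    virtual ~Node() noexcept = default;
    virtual std::shared_ptr<Node> clone() const = 0;
    virtual NodeType type() const noexcept = 0;
protected:
    Node() = default;            // Not public: only derived classes construct
    Node(const Node&) = default;
};

using NodePtr = std::shared_ptr<Node>;

class Text: public Node {
public:
    // make_shared cannot reach a private constructor, so use new directly
    static std::shared_ptr<Text> create(std::string str) {
        return std::shared_ptr<Text>(new Text(std::move(str)));
    }
    NodePtr clone() const override { return std::shared_ptr<Text>(new Text(*this)); }
    NodeType type() const noexcept override { return NodeType::text; }
    const std::string& str() const noexcept { return str_; }
private:
    explicit Text(std::string str): str_(std::move(str)) {}
    Text(const Text&) = default;
    std::string str_;
};

// Safe to call on a null pointer, like the documented node_type()
NodeType node_type(const NodePtr& node) noexcept {
    return node ? node->type() : NodeType::null;
}
```

Writing "Text t("x");" outside the class would fail to compile, which is the point of the pattern: every node lives on the heap behind a shared pointer.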

The inner() and outer() functions return the node’s inner and outer XML. The outer() function returns the complete XML for the node, including the node’s own markup. The inner() function returns only the XML text inside the node, and returns an empty string for all node types except Element. The only options that affect these functions are Options::keyonly and Options::selfclose.

The search() functions iterate over all descendant nodes of the current node. For nodes that are not derived from CompoundNode this will always be empty. Optionally, a node type or element name can be supplied; only nodes with the specified type, or elements with the specified name, will be visited by the search iterators. The only option that has any effect here is Options::icase, which makes element name matching case insensitive.
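The hypothetical sketch below makes the search semantics concrete by collecting the matching descendants of a simple tree. It assumes a depth-first pre-order traversal (the documentation only says that all descendants are visited) and ASCII-only case folding for icase. It also collects results eagerly into a vector, whereas the real search() returns an iterator range.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal tree: elements have a name, other nodes an empty name
struct TreeNode {
    std::string name;
    std::vector<std::shared_ptr<TreeNode>> children;
};

// Compare element names, optionally ignoring ASCII case (like Options::icase)
bool names_match(const std::string& a, const std::string& b, bool icase) {
    if (!icase)
        return a == b;
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(), [] (char x, char y) {
            return std::tolower(static_cast<unsigned char>(x))
                == std::tolower(static_cast<unsigned char>(y));
        });
}

// Depth-first, pre-order walk over all descendants of node (node itself is
// not visited), collecting elements whose name matches element
void search_elements(const TreeNode& node, const std::string& element, bool icase,
        std::vector<const TreeNode*>& out) {
    for (const auto& child: node.children) {
        if (!child->name.empty() && names_match(child->name, element, icase))
            out.push_back(child.get());
        search_elements(*child, element, icase, out);
    }
}
```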

Simple node base class

class SimpleNode: public Node;

Abstract base class for nodes that do not contain any other nodes.

Prologue node classes

class PrologueNode: public SimpleNode;

Abstract base class for node types that normally appear only in a document’s prologue (Xmldecl and Dtd).

class Xmldecl: public PrologueNode {
    static std::shared_ptr<Xmldecl> create(std::string_view str = {});
};

An XML declaration of the form "<?xml...?>" (case insensitive). The bracketing "<?xml" and "?>" will be inserted if they are not supplied. If Xmldecl::create() is called with no arguments, or with an empty string, the standard default XML declaration will be used.

class Dtd: public PrologueNode {
    static std::shared_ptr<Dtd> create(std::string_view str);
};

A document type declaration of the form "<!DOCTYPE...>". The bracketing "<!DOCTYPE" and ">" will be inserted if they are not supplied.

Inline node classes

class InlineNode: public SimpleNode;

Abstract base class for simple inline nodes.

class Cdata: public InlineNode {
    static std::shared_ptr<Cdata> create(std::string_view str);
};

A character data block of the form "<![CDATA[...]]>". The bracketing "<![CDATA[" and "]]>" will be inserted if they are not supplied.

class Comment: public InlineNode {
    static std::shared_ptr<Comment> create(std::string_view str);
};

A comment of the form "<!--...-->". The bracketing "<!--" and "-->" will be inserted if they are not supplied.

class Entity: public InlineNode {
    static std::shared_ptr<Entity> create(std::string_view str);
};

An inline entity reference of the form "&...;". The bracketing "&" and ";" will be inserted if they are not supplied.

class Processing: public InlineNode {
    static std::shared_ptr<Processing> create(std::string_view str);
};

A processing instruction of the form "<?...?>". The bracketing "<?" and "?>" will be inserted if they are not supplied.
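Each of the simple node types above adds its delimiters when they are missing. The following minimal sketch of that rule assumes that the opening and closing delimiters are checked independently. add_brackets() and its helpers are hypothetical names, and the documentation does not say how the library handles partially bracketed input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// True if s begins with prefix, optionally ignoring ASCII case
bool starts_with(std::string_view s, std::string_view prefix, bool icase = false) {
    if (s.size() < prefix.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        auto a = static_cast<unsigned char>(s[i]);
        auto b = static_cast<unsigned char>(prefix[i]);
        if (icase) {
            a = static_cast<unsigned char>(std::tolower(a));
            b = static_cast<unsigned char>(std::tolower(b));
        }
        if (a != b)
            return false;
    }
    return true;
}

// True if s ends with suffix
bool ends_with(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() && s.substr(s.size() - suffix.size()) == suffix;
}

// Hypothetical helper: add whichever delimiters are missing
std::string add_brackets(std::string_view str, std::string_view open,
        std::string_view close, bool icase = false) {
    std::string out;
    if (!starts_with(str, open, icase))
        out += open;
    out += str;
    if (!ends_with(str, close))
        out += close;
    return out;
}
```

With open set to "<?xml " (including the trailing space) and icase enabled, add_brackets("version=\"1.0\"", "<?xml ", "?>", true) yields <?xml version="1.0"?>, and an input that already carries "<?XML ... ?>" is returned unchanged.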

class Text: public InlineNode {
    std::string plain() const;
    static std::shared_ptr<Text> create(std::string_view str,
        Options opt = Options::xml);
};

A plain text block. By default, the string passed to create() is assumed to be plain text that needs to be encoded; if Options::encoded is passed, it is assumed to be already encoded. No other option affects create().

The plain() function returns the node’s contents as decoded plain text.
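Decoding is the reverse of encode_text(). Below is a minimal, hypothetical sketch of the kind of decoding plain() performs on standard XML text. It replaces the five standard entities and decimal or hexadecimal character references, writes code points as UTF-8, and copies anything it does not recognise through unchanged. HTML entities (Options::xentity) are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Append code point cp to out as UTF-8 (cp is assumed to be <= 0x10FFFF)
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xc0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xe0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else {
        out += static_cast<char>(0xf0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3f));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    }
}

// Hypothetical decoder: standard XML entities and numeric references only
std::string decode_text_sketch(std::string_view str) {
    std::string out;
    std::size_t i = 0;
    while (i < str.size()) {
        std::size_t semi = str[i] == '&' ? str.find(';', i) : std::string_view::npos;
        if (semi == std::string_view::npos) {
            out += str[i++];  // Ordinary character, or '&' with no ';' after it
            continue;
        }
        std::string_view name = str.substr(i + 1, semi - i - 1);
        bool ok = true;
        if (name == "amp") out += '&';
        else if (name == "lt") out += '<';
        else if (name == "gt") out += '>';
        else if (name == "quot") out += '"';
        else if (name == "apos") out += '\'';
        else if (name.size() > 1 && name[0] == '#') {
            // Numeric reference: "#65" (decimal) or "#x41" (hexadecimal)
            bool hex = name[1] == 'x' || name[1] == 'X';
            std::string_view digits = name.substr(hex ? 2 : 1);
            std::uint32_t cp = 0;
            ok = !digits.empty() && digits.size() <= 8;  // 8 digits cannot overflow
            for (char c: digits) {
                int d = c >= '0' && c <= '9' ? c - '0'
                    : hex && c >= 'a' && c <= 'f' ? c - 'a' + 10
                    : hex && c >= 'A' && c <= 'F' ? c - 'A' + 10
                    : -1;
                if (d < 0) { ok = false; break; }
                cp = cp * (hex ? 16 : 10) + static_cast<std::uint32_t>(d);
            }
            ok = ok && cp <= 0x10ffff;
            if (ok) append_utf8(out, cp);
        } else {
            ok = false;
        }
        if (ok) i = semi + 1;    // Skip the whole reference
        else out += str[i++];    // Unrecognised: keep the '&' literally
    }
    return out;
}
```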

Compound node classes

class CompoundNode: public Node {
    class iterator;
        // Random access const iterator
        // Dereferences to a non-null NodePtr
    iterator begin() const noexcept;
    iterator end() const noexcept;
    NodePtr child(size_t i) const noexcept;
    size_t children() const noexcept;
    void clear() noexcept;
    bool empty() const noexcept;
    void erase(iterator i) noexcept;
    const NodePtr& front() const noexcept;
    const NodePtr& back() const noexcept;
    iterator insert(iterator i, NodePtr node);
    void pop_back() noexcept;
    void push_back(NodePtr node);
    CompoundNode& operator+=(NodePtr node);
};

Abstract base class for nodes that contain other nodes (Element and Document). Most of these functions have their normal meaning for containers. The child() function will return a child node by index, or a null pointer if the index is out of bounds. The += operator is equivalent to push_back(). Inserting a null NodePtr will have no effect.
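Two container rules here are non-standard: child() returns null for an out-of-range index, and null insertions are ignored. Both can be sketched with a std::vector of shared pointers. CompoundSketch and Child are hypothetical placeholder types, and the sketch shows only a subset of the members.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Child {};  // Placeholder for a node type
using ChildPtr = std::shared_ptr<Child>;

class CompoundSketch {
public:
    using iterator = std::vector<ChildPtr>::const_iterator;
    iterator begin() const noexcept { return nodes_.begin(); }
    iterator end() const noexcept { return nodes_.end(); }
    // Null instead of undefined behaviour when i is out of range
    ChildPtr child(std::size_t i) const noexcept {
        return i < nodes_.size() ? nodes_[i] : ChildPtr();
    }
    std::size_t children() const noexcept { return nodes_.size(); }
    iterator insert(iterator i, ChildPtr node) {
        if (!node)
            return i;  // Inserting null has no effect
        return nodes_.insert(i, std::move(node));
    }
    void push_back(ChildPtr node) {
        if (node)
            nodes_.push_back(std::move(node));
    }
    CompoundSketch& operator+=(ChildPtr node) {
        push_back(std::move(node));
        return *this;
    }
private:
    std::vector<ChildPtr> nodes_;
};
```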

class Element: public CompoundNode, public AttributeMap {
    std::string name() const { return name_; }
    static std::shared_ptr<Element> create(const std::string& name);
};

An XML element. This is constructed with only a name; attributes can be added after construction.

class Document: public CompoundNode {
    XmldeclPtr xmldecl() const noexcept;
    DtdPtr dtd() const noexcept;
    static std::shared_ptr<Document> create(Options opt = {});
    static std::shared_ptr<Document> create(std::string_view xml,
        Options opt = {});
};

An XML document. The first version of create() returns an empty document (except possibly for an XML declaration, as described below); the second version parses an existing document.

If the parsed document contains an XML declaration, it will be retained regardless of the Options::noxmldecl flag. Otherwise, a default Xmldecl will be created unless suppressed by the flag.

Document creation          Xmldecl in doc?   Noxmldecl flag?   Document xmldecl
Default create             N/A               N/A               Default xmldecl
Create from options alone  N/A               No                Default xmldecl
Create from options alone  N/A               Yes               None
Create from XML source     No                No                Default xmldecl
Create from XML source     No                Yes               None
Create from XML source     Yes               Ignored           Xmldecl from source
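The table above reduces to a small decision function. This is a hypothetical restatement for clarity, not the library's code. DeclChoice and choose_xmldecl() are illustrative names.

```cpp
// Hypothetical labels for the three possible outcomes in the table
enum class DeclChoice { default_decl, none, from_source };

// parsed_from_source: document created from XML text
// source_has_decl:    that text contained its own XML declaration
// noxmldecl:          Options::noxmldecl was passed
constexpr DeclChoice choose_xmldecl(bool parsed_from_source, bool source_has_decl,
        bool noxmldecl) noexcept {
    if (parsed_from_source && source_has_decl)
        return DeclChoice::from_source;  // The flag is ignored in this case
    return noxmldecl ? DeclChoice::none : DeclChoice::default_decl;
}
```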