PHP Functions and Constants for Quicker and More Efficient XML Parsing

Published Aug 8, 2017
Updated Oct 2, 2019

XML is a universal format for storing data as plain text that both people and computers can read. XML parsing means reading an XML document and turning it into a form that your code can work with.

PHP has built-in functions and predefined constants for the Expat XML parser. This tutorial explains them in detail.

XML Parsing: Main Tips

  • XML functions allow you to parse XML documents, but not to validate them.
  • Expat is an event-based parser that allows you to process and manage XML documents in PHP.

Expat Parser

Expat is an event-based parser. A parser of this kind treats an XML file as a series of events (such as the start of an element, the end of an element, or character data) and calls a handler function you specify whenever an event occurs. It does not build a full document tree in memory, which makes it lightweight and well-suited for fast web applications.

Expat is a non-validating parser: it ignores any DTDs linked to a document. If a document is not well-formed, parsing stops with an XML parsing error.

Remember: the Expat parser is not designed for document validation. It only reports an error when a document is not well-formed.
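To illustrate the event-based model, here is a minimal sketch that records every event the parser reports. The sample document and the $events array are made up for this illustration; the handlers are anonymous functions, but named functions would work as well.

```php
<?php
// Hypothetical sample document; any well-formed XML string would do.
$xml = '<note><to>Tove</to><from>Jani</from></note>';

// Collects a short description of each event, in the order it occurs.
$events = [];

$parser = xml_parser_create();

// One handler for opening tags, one for closing tags.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";
    }
);

// Handler for the text between tags.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "text $data";
    }
);

// The third argument marks this as the final (here: only) chunk of data.
xml_parse($parser, $xml, true);

// Since PHP 8.0 the parser is an object and is released automatically,
// so an explicit free is only needed on older versions.
if (PHP_VERSION_ID < 80000) {
    xml_parser_free($parser);
}

print_r($events);
```

Because case folding is enabled by default, the element names arrive in upper case (NOTE, TO, FROM). Note that for longer text the character data handler may be called more than once for a single run of text.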

List of Functions

Look at the list below. Functions that can be used for XML parsing are listed alphabetically.

Note: all of these functions are part of PHP. Therefore, you do not need to install any third-party applications.

In the column on the right, the PHP versions in which each function is available are indicated:

Function | Description | PHP version
utf8_decode() | Decodes UTF-8 strings into ISO-8859-1 | 3 and newer
utf8_encode() | Encodes ISO-8859-1 strings into UTF-8 | 3 and newer
xml_error_string() | Gets the XML parsing error string | 3 and newer
xml_get_current_byte_index() | Gets the current byte index from the PHP XML parser | 3 and newer
xml_get_current_column_number() | Gets the current column number from the PHP XML parser | 3 and newer
xml_get_current_line_number() | Gets the current line number from the PHP XML parser | 3 and newer
xml_get_error_code() | Gets the XML parsing error code | 3 and newer
xml_parse() | Parses XML documents | 3 and newer
xml_parse_into_struct() | Parses XML data into array values | 3 and newer
xml_parser_create_ns() | Creates an XML parser that has namespace support | 4 and newer
xml_parser_create() | Creates a PHP XML parser | 3 and newer
xml_parser_free() | Frees the PHP XML parser | 3 and newer
xml_parser_get_option() | Gets options from the PHP XML parser | 3 and newer
xml_parser_set_option() | Sets options in the PHP XML parser | 3 and newer
xml_set_character_data_handler() | Sets the handler function for character data | 3 and newer
xml_set_default_handler() | Sets the default handler function | 3 and newer
xml_set_element_handler() | Sets the handler functions for the start and end of elements | 3 and newer
xml_set_end_namespace_decl_handler() | Sets the handler function for the end of namespace declarations | 4 and newer
xml_set_external_entity_ref_handler() | Sets the handler function for external entities | 3 and newer
xml_set_notation_decl_handler() | Sets the handler function for notation declarations | 3 and newer
xml_set_object() | Uses the PHP XML parser within an object | 4 and newer
xml_set_processing_instruction_handler() | Sets the handler function for processing instructions | 3 and newer
xml_set_start_namespace_decl_handler() | Sets the handler function for the start of namespace declarations | 4 and newer
xml_set_unparsed_entity_decl_handler() | Sets the handler function for unparsed entity declarations | 3 and newer
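As a short illustration of one of the listed functions, the sketch below uses xml_parse_into_struct() to load a small document into arrays instead of registering handlers. The sample document and the variable names are assumptions chosen for this example.

```php
<?php
// Hypothetical sample document.
$xml = '<para><note>simple note</note></para>';

$parser = xml_parser_create();

$values = []; // filled with one entry per tag (or piece of character data)
$index  = []; // maps each tag name to the positions of its entries in $values

xml_parse_into_struct($parser, $xml, $values, $index);

// Each entry carries the tag name, its type (open, complete, close or cdata)
// and its nesting level; "complete" entries may also carry a value.
foreach ($values as $entry) {
    echo $entry['tag'], ' ', $entry['type'], ' level ', $entry['level'], PHP_EOL;
}
```

This approach is convenient for small documents. For large files, the handler-based approach is usually the better fit, because the whole structure does not have to be held in memory.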

Error Codes and Constants

You might encounter errors during parsing. When xml_parse() fails, it returns 0, and you can retrieve the error code with xml_get_error_code(). Here are the error codes it can return:

Constant
XML_ERROR_NONE (int)
XML_ERROR_NO_MEMORY (int)
XML_ERROR_SYNTAX (int)
XML_ERROR_NO_ELEMENTS (int)
XML_ERROR_INVALID_TOKEN (int)
XML_ERROR_UNCLOSED_TOKEN (int)
XML_ERROR_PARTIAL_CHAR (int)
XML_ERROR_TAG_MISMATCH (int)
XML_ERROR_DUPLICATE_ATTRIBUTE (int)
XML_ERROR_JUNK_AFTER_DOC_ELEMENT (int)
XML_ERROR_PARAM_ENTITY_REF (int)
XML_ERROR_UNDEFINED_ENTITY (int)
XML_ERROR_RECURSIVE_ENTITY_REF (int)
XML_ERROR_ASYNC_ENTITY (int)
XML_ERROR_BAD_CHAR_REF (int)
XML_ERROR_BINARY_ENTITY_REF (int)
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (int)
XML_ERROR_MISPLACED_XML_PI (int)
XML_ERROR_UNKNOWN_ENCODING (int)
XML_ERROR_INCORRECT_ENCODING (int)
XML_ERROR_UNCLOSED_CDATA_SECTION (int)
XML_ERROR_EXTERNAL_ENTITY_HANDLING (int)
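The following sketch shows one way to detect and report a parsing error. The malformed sample document is made up for this illustration; the exact error code and message you see may depend on the XML library your PHP build uses, so no specific output is shown.

```php
<?php
// Hypothetical malformed document: <to> is closed with </from>.
$xml = '<note><to>Tove</from></note>';

$parser = xml_parser_create();

// xml_parse() returns 1 on success and 0 on failure.
$ok = xml_parse($parser, $xml, true);

$code    = xml_get_error_code($parser); // one of the XML_ERROR_* constants
$message = xml_error_string($code);     // human-readable description

if (!$ok) {
    printf(
        "XML error %d (%s) at line %d, column %d\n",
        $code,
        $message,
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)
    );
}
```

For a mismatched closing tag like this one, the code would be expected to correspond to XML_ERROR_TAG_MISMATCH.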

These constants are the options you can pass to xml_parser_set_option() and xml_parser_get_option():

Constant | Description
XML_OPTION_CASE_FOLDING (int) | Controls whether case folding is enabled for the XML parser. By default, it is enabled.
XML_OPTION_TARGET_ENCODING (int) | Sets which target encoding to use in this XML parser.
XML_OPTION_SKIP_TAGSTART (int) | Indicates how many characters should be skipped from the beginning of the tag name.
XML_OPTION_SKIP_WHITE (int) | Indicates whether to skip values that consist of whitespace characters.
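Here is a minimal sketch of how two of these options might be used together. The sample document and variable names are assumptions for this illustration.

```php
<?php
// Hypothetical sample document with line breaks and indentation between tags.
$xml = "<note>\n  <to>Tove</to>\n</note>";

$parser = xml_parser_create();

// Keep tag names exactly as written instead of converting them to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Skip values that consist only of whitespace (the indentation between tags).
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);

xml_parse_into_struct($parser, $xml, $values);

// Collect the tag names in document order.
$tags = array_column($values, 'tag');
print_r($tags);
```

With case folding disabled, the tag names keep their original lower-case spelling. With XML_OPTION_SKIP_WHITE enabled, the whitespace between the tags does not produce separate entries in $values.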

XML Parsing: Summary

  • PHP has a built-in extension for Expat, a lightweight event-based XML parser. Event-based parsers treat an XML file as a series of individual events.
  • Expat allows you to parse XML files, but it cannot validate them. If a document is not well-formed, parsing fails with an error.
  • You can use the PHP XML functions listed in this tutorial to create XML parsers and define XML event handlers in your code.