Which XML parser is the best one in Java?

This chapter covers parsing in validating and nonvalidating mode, XML Schema, XML Namespaces, and binary XML. Namespaces are a mechanism for differentiating element and attribute names. If you require a general introduction to these technologies, consult the XML resources listed in "Related Documents" of the preface. The DOM specifications are published at three levels. SAX, unlike DOM, is not a W3C specification.

JCR 1. JAXP version 1. An instantiated parser invokes the parse method to read an XML document. Figure illustrates the basic parsing process, using XMLParser. DOM provides classes and methods to navigate and process the tree.

Structural manipulations of the XML tree, such as re-ordering elements, adding to and deleting elements and attributes, and renaming elements, can be performed.
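The structural manipulations described above can be sketched with the standard JAXP DOM API that ships with the JDK, rather than the Oracle XMLParser class the text refers to. The class name DomEditSketch, the sample memo document, and the note element are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the standard JAXP and W3C DOM APIs.
class DomEditSketch {

    /** Parses XML text into an in-memory DOM tree using whichever JAXP parser is installed. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Re-orders, adds, and deletes parts of the tree, then lists the root's child names. */
    static String restructure(Document doc) {
        Element root = doc.getDocumentElement();
        root.appendChild(root.getFirstChild());      // re-order: move the first child to the end
        Element note = doc.createElement("note");    // add a new element
        note.setTextContent("added later");
        root.appendChild(note);
        root.removeAttribute("draft");               // delete an attribute

        StringBuilder names = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) names.append(',');
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<memo draft=\"yes\"><to>Ann</to><from>Bob</from></memo>");
        System.out.println(restructure(doc));
    }
}
```

Because the whole tree is held in memory, each of these changes is a plain method call on a node; the cost is the memory footprint that the following paragraphs discuss.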

Interactive applications can store the object model in memory, enabling users to access and manipulate it. DOM as a standard does not support XPath. As of Oracle 11g Release 1, XDK offers a scalable, pluggable DOM, which relieves problems of memory inefficiency, limited scalability, and lack of control over the DOM configuration.

One of the two ways in which the scalable DOM can interact with the data is through the abstract InfosetReader and InfosetWriter interfaces.

Using the lazy materialization mechanism, XDK only creates nodes that are accessed and frees unused nodes from memory. Applications can process very large XML documents with improved scalability. DOM configurations can be made to suit different applications. You can configure the DOM with different access patterns such as read-only, streaming, transient update, and shadow copy, making the most efficient use of memory and getting the best performance in your applications.

SAX processes the input document element by element and can report events and significant data to callback methods in the application.

It is useful for search operations and other programs that do not need to manipulate an XML tree. In general, the advantage of JAXP is that you can use it to write interoperable applications.
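A search of the kind mentioned above might look like the following minimal sketch. It uses the JAXP SAXParserFactory and the standard DefaultHandler callbacks from the JDK, not the Oracle SAXParser class; the class name SaxSearchSketch and the sample staff document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; collects matching text without ever building a tree.
class SaxSearchSketch {

    /** Returns the text content of every element whose qualified name equals elementName. */
    static List<String> findText(String xml, String elementName) {
        List<String> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current;   // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(elementName) && current != null) {
                    hits.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String xml = "<staff><name>Ann</name><role>dev</role><name>Bob</name></staff>";
        System.out.println(findText(xml, "name"));
    }
}
```

The handler keeps only the text it is interested in, so memory use stays roughly constant regardless of document size.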

If an application uses features available through JAXP, then it can very easily switch the implementation. Only some of the Oracle-specific features are available through the extension mechanism provided in JAXP.

If an application uses these extensions, however, then the flexibility of switching implementation is lost. The XML parser for Java can parse unqualified element types and attribute names as well as those in namespaces. Namespaces are a mechanism to resolve or avoid name collisions between element types or attributes in XML documents by providing "universal" names. Consider an example XML document that declares two XML namespaces.

The example associates the com prefix with the first namespace and the emp prefix with the second namespace. The following terms describe the names in such a document.

Namespace prefix, which is a prefix declared with xmlns. In the example, emp and com are namespace prefixes.

Local name, which is the name of an element or attribute without the namespace prefix. In the example, employee and company are local names.

Qualified name, which is the local name plus the prefix. In the example, emp:employee and com:company are qualified names.

Expanded name, which is obtained by substituting the namespace URI for the namespace prefix.

Applications invoke the parse method to parse XML documents.
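The four kinds of name can be read off a parsed element with the standard DOM accessors, as in this sketch. The namespace URI http://example.com/company is a made-up placeholder (the original example document is not reproduced here), the class name is hypothetical, and writing the expanded name as {uri}local is one common notation chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class using the JDK's JAXP DOM parser.
class NamespaceNamesSketch {

    /** Describes the prefix, local name, qualified name, and expanded name of the root element. */
    static String describeRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // JAXP factories are not namespace aware by default
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "prefix=" + root.getPrefix()
                    + " local=" + root.getLocalName()
                    + " qualified=" + root.getNodeName()
                    + " expanded={" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder namespace URI, chosen only for this example.
        System.out.println(describeRoot("<com:company xmlns:com=\"http://example.com/company\"/>"));
    }
}
```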

Typically, applications invoke initialization and termination methods in association with the parse method. You can use the setValidationMode method defined in oracle. XMLParser to set the parser mode to validating or nonvalidating. If the XML document does conform, then the document is valid, which means that the structure of the document conforms to the DTD or schema rules.

A nonvalidating parser checks for well-formedness only. In lax schema validation, the parser tries to validate part or all of the instance document as long as it can find the schema definition, and it does not raise an error if it cannot find the definition (see the sample program XSDLax). In strict schema validation, the parser tries to validate the whole instance document, raising errors if it cannot find the schema definition or if the instance does not conform to the definition.
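For comparison, here is a minimal sketch of validating against an XML Schema with the standard javax.xml.validation API. This is a swapped-in technique, not the Oracle setValidationMode mechanism described above; the class name, the tiny age schema, and the sample documents are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class; the schema below exists only for this example.
class SchemaValidationSketch {

    static final String AGE_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"age\" type=\"xs:int\"/>"
          + "</xs:schema>";

    /** Returns true when the document is well formed and conforms to the schema. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for an unusable schema, a malformed document, or a schema violation.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(AGE_XSD, "<age>42</age>"));
        System.out.println(isValid(AGE_XSD, "<age>old</age>"));
    }
}
```

This corresponds most closely to the strict behavior described above: a document whose root element has no definition in the schema is rejected rather than skipped.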

If the DTD is not present, then the parser is set to nonvalidating mode. If neither is present, then the parser is set to nonvalidating mode. In addition to setting the validation mode with setValidationMode , you can use the oracle. The XMLParser. Chapter 7, "Using the Schema Processor for Java" to learn about validation. The compression algorithm is based on tokenizing the XML tags. The assumption is that any XML document repeats a number of tags and so tokenizing these tags gives considerable compression.

The degree of compression depends on the type of document: the larger the tags and the lesser the text content, the better the compression. Table describes the two types of compression. The goal is to reduce the size of the XML document without losing the structural and hierarchical information of the DOM tree. The serialized stream regenerates the DOM tree when read back. Use the writeExternal method to generate compressed XML and the readExternal method to reconstruct it. The methods are in the oracle.

XMLDocument class. When the binary stream is read back, the SAX events are generated. To generate compressed XML, instantiate oracle. Pass the object to SAXParser. Use the oracle. You can treat the compressed stream as a serialized stream, but the data in the stream is more controlled and managed than the compression implemented by Java's default serialization. This section contains the following topics:. The demo programs are distributed among the subdirectories described in Table For example, you can use the XSLT stylesheet iden.

This method is used by many of the other demo programs. A request for XML tokens is registered with the setToken method. It transforms an input XML document with a given input stylesheet. This demo builds the result of XSL transformations as a DocumentFragment and so does not support xsl:output features. The demo streams the result of the XSL transformation and so supports xsl:output features. The basic steps are as follows:. For example:. It checks for both well-formedness and validity.

Table oraxml Command-Line Options. You can validate the document family. The W3C standard library org. Along with org. This class implements an XML 1. This class contains factory methods used to create a scalable, pluggable DOM.

The program DOMSample. The steps provide reference to tables that provide possible methods and interfaces you can use at that point. The following code fragment from DOMSample. Parse the input XML document by invoking the parse method.

The program builds a tree of Node objects in memory. This code fragment from DOMSample. URL class:. As illustrated by the following code fragment, DOMSample. You can use this handle to access every part of the parsed XML document. See Table It then loops through each item in the list and calls getNodeName to print the name of each element:.

The program implements the printElementAttributes method, which calls Document. It then loops through each element in the list and calls Element.

It then calls Node. Reset the parser state by invoking the reset method. The parser is now ready to parse a new document. The following tables provide useful methods and interfaces to use in creating an application such as the one just created in "Performing Basic DOM Parsing".

Set the validation mode of the parser. Table describes the flags that you can use with this method. Table lists the interfaces that the XMLDocument class implements. Generate a NamedNodeMap containing the attributes of this node if it is an element or null otherwise.

Retrieve recursively all elements that match a given tag name under a certain level. Obtain the expanded name of the element. This method is specified in the NSName interface. Obtain the local name for this element. Obtain the namespace URI of this node, or null if it is unspecified.

Obtain the value of this node, depending on its type. This mode is in the Node interface. Obtain the qualified name for an element.

The underlying data can be either internal or plug-in, and both can be in binary XML. Plug-in data is data that has already been parsed and therefore can be transferred from one processor to another without requiring parsing. Users can also plug in their own implementations. The InfosetReader retrieves sequential events from the XML stream and queries the state and data from these events. In the following example, the XML data is scanned to retrieve the QName s and attributes of all elements:.

Copying: To support shadow copy of DOM across documents, a new copy of InfosetReader can be created to ensure thread safety, using the Clone method. Moving Focus: To support lazy materialization, the InfosetReader may have the ability to move focus to any location specified by Offset Optional. InfosetWriter is an extension of the InfosetReader interface that supports data writing. Users cannot modify this implementation. You can save the XML text as either of the following:.

References to binary XML: You can save the section reference of binary XML instead of actual data, if you know that the data source is available for deserialization. To save as references to binary XML, use true as the argument for the save command. Using lazy materialization, you can plug in an empty DOM, which can pull in more data when needed and free nodes when they are no longer needed.

The rest of the DOM tree can be expanded later if it is accessed. A node may have unexpanded child and sibling nodes, but its parent and ancestors are always expanded.

Each node maintains the InfosetReader.Offset property of the next node so that the DOM can pull additional data to create the next node. The DOM navigation interface allows access to neighboring nodes such as first child, last child, parent, previous or next sibling. If node creation is needed, it is always done in document order. In the case of scalable DOM, retrieval by index does not cause the expansion of all previous nodes, but their ancestor nodes are materialized.

XPath evaluation can cause materialization of all intermediate nodes in memory. Supporting DOM navigation requires adding cross references among nodes. In automatic dereferencing mode, some of the links are weak references, which can be freed during garbage collection. Node release is based on the importance of the links: Links to parent nodes cannot be dropped because ancestors provide context for in-scope namespaces and it is difficult to retrieve dropped parent nodes using streaming APIs such as InfosetReader.

The scalable DOM always holds its parent and previous sibling strongly but holds its children and following sibling weakly. When the Java Virtual Machine frees the nodes, references to them are still available in the underlying data so they can be recreated if needed.

In manual dereferencing mode, the DOM depends on the application to explicitly dereference a document fragment from the whole tree. There are no weak references. This mode is recommended if an application processes the data in a deterministic order, because it avoids the extra overhead of repeatedly releasing and recreating nodes. Note that dereferencing nodes is different from removing nodes from a DOM tree.

The node can still be accessed and recreated from its parent, previous, and following siblings. However, a variable that holds the node will throw an error when accessing the node after the node has been freed.

When the copy method is used, it creates just the root node of the fragment being copied, and the subtree can be expanded on demand. Data sharing is for the underlying data, not the DOM nodes themselves. The DOM specification requires that the clone and its original have different node identities, and that they have different parent nodes.

The DOM API supports update operations such as adding, deleting nodes, setting, deleting, changing, and inserting values. Normal update operations are available and do not interfere with each other. This merges all the changes with the original data and serializes the data in persistent storage. If you do not save a modified DOM explicitly, the changes are lost once the transaction ends. For additional scalability, the scalable DOM can use backend storage for binary data through the PageManager interface.

When the binary stream is read back, the SAX events are generated. The following are the sample Java files in its subdirectories common , comp , dom , jaxp , sax , xslt :.

A request for the XML tokens is registered using the setToken method. During tokenizing, the parser does not validate the document and does not include or read internal or external utilities. Use make for UNIX or make. Run the sample program to build the DOM tree from the compressed stream if you have done the last step.

Run the sample program for regenerating the SAX events from the compressed stream if you have done the last step:. In some applications, it is not necessary to validate the XML document. In this case, a DTD is not required. Optionally, use DOMParser. The following example illustrates how to use the DOMNamespace class:. See the comments in this source code for a guide to the use of methods.

The program begins with these comments:. For the attributes, the method getNodeValue returns the value of this node, depending on its type. Here is another excerpt from later in this program:. Applications can register a SAX handler to receive notification of various parser events. This interface enables an application to set and query features and properties in the parser, to register event handlers for document processing, and to initiate a document parse.

All SAX interfaces are assumed to be synchronous: the parse methods must not return until parsing is complete, and readers must wait for an event-handler callback to return before reporting the next event.

This interface replaces the now deprecated SAX 1. The XMLReader interface contains two important enhancements over the old parser interface:. Figure shows the main steps for coding with the SAXParser class. Declare a new SAXParser object. Table lists all the available methods.

Parse methods return when parsing completes. Meanwhile the process waits for an event-handler callback to return before reporting the next event. This example illustrates how you can use SAXParser class and several handler interfaces. The parser reports parsing events directly through callback functions such as setDocumentLocator and startDocument.

This application uses handlers to deal with the different events. To count the elements with a particular tag name, use the getElementsByTagName method, which returns a node list of all descendant elements with the given tag name.

The length of that node list is the number of elements with that tag name.
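Counting elements this way takes only a couple of lines with the standard DOM API, as in this sketch. The class name ElementCountSketch and the sample list document are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class using the JDK's JAXP DOM parser.
class ElementCountSketch {

    /** Counts every element named tagName anywhere in the document. */
    static int countElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getElementsByTagName searches all descendants, not just direct children.
            return doc.getElementsByTagName(tagName).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<list><item/><group><item/></group><item/></list>";
        System.out.println(countElements(xml, "item"));
    }
}
```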

If you check the DOM specification, referring to the table discussing the node type, you will find that if you are creating an element node, its node value is null , and cannot be set.

However, you can create a text node and append it to the element node. You can then put the value in the text node. Here is how to efficiently obtain the value of first child node of the element without going through the DOM tree. If you do not need the entire tree, use the SAX interface to return the desired data. Since it is event-driven, it does not have to parse the whole document. This method is used to extract contents from the tree or subtree based on the select patterns allowed by XSL.
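Pattern-based selection of this kind can also be sketched with the standard javax.xml.xpath API, which is a substitute for the Oracle selectNodes method and not the same call. The NamespaceContext below plays a role similar to a namespace resolver: it maps a prefix used in the expression to a namespace URI, independently of the prefixes in the document. The class name, the urn:example:emp URI, and the x prefix are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; standard JAXP XPath, not the Oracle selectNodes API.
class XPathSelectSketch {

    /** Maps exactly one prefix to one namespace URI for use in XPath expressions. */
    static NamespaceContext singlePrefix(String prefix, String uri) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }
            @Override public Iterator<String> getPrefixes(String u) {
                return uri.equals(u)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyIterator();
            }
        };
    }

    /** Returns the text content of every node selected by the expression. */
    static List<String> selectText(String xml, String expression, NamespaceContext resolver) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(resolver);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<e:staff xmlns:e=\"urn:example:emp\"><e:name>Ann</e:name><e:name>Bob</e:name></e:staff>";
        // The expression uses prefix x; the document uses e. Only the URI has to match.
        System.out.println(selectText(xml, "//x:name", singlePrefix("x", "urn:example:emp")));
    }
}
```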

The optional second parameter of selectNodes is used to resolve namespace prefixes (that is, it returns the expanded namespace URI for a given prefix). XMLElement resolves the prefixes based on the input document. You can use the NSResolver interface if you need to override the namespace definitions. The following is an example of XML document generation starting from information contained in simple variables, such as when a client fills in a Java form and wants to obtain an XML document.

For each key in the enumeration, call createElement on the DOM document to create an element named after the key, with a child text node holding the value of the hash table entry for that key.
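The approach just described can be sketched as follows. A Map stands in for the hash table and its enumeration, and the class name FormToXmlSketch and the person, name, and city fields are hypothetical; only standard DOM calls are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; builds a DOM document from simple key/value pairs.
class FormToXmlSketch {

    /** Creates <rootName> with one child element per map entry, each holding a text node. */
    static Document fromFields(String rootName, Map<String, String> fields) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Element child = doc.createElement(field.getKey());   // the key must be a legal XML name
            // An element's own node value is null, so the value goes into a child text node.
            child.appendChild(doc.createTextNode(field.getValue()));
            root.appendChild(child);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>();   // keeps insertion order
        form.put("name", "Ann");
        form.put("city", "Oslo");
        Element root = fromFields("person", form).getDocumentElement();
        System.out.println(root.getNodeName() + " has " + root.getChildNodes().getLength() + " fields");
    }
}
```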

appendChild only works within a single tree, and the example uses two different ones. You need to use importNode or adoptNode instead. If you create an element node, its nodeValue is null and hence cannot be set. You have asked for it to be a big text chunk, which is what it will give you. Given a binary input stream with no external encoding information, the parser automatically figures out the character encoding based on the byte order mark and encoding declaration of the XML document.
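The difference between appendChild and importNode across two trees can be seen in this sketch, which uses only standard DOM calls. The class name ImportNodeSketch and the orders and order documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class showing why nodes must be imported before they are appended.
class ImportNodeSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns true when appending a node owned by another document is rejected. */
    static boolean directAppendFails(Document target, Document source) {
        try {
            target.getDocumentElement().appendChild(source.getDocumentElement());
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    /** Copies the source root into the target tree; the source document is left untouched. */
    static Node copyInto(Document target, Document source) {
        Node copy = target.importNode(source.getDocumentElement(), true);   // true = deep copy
        return target.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) {
        Document orders = parse("<orders/>");
        Document order = parse("<order id=\"7\"/>");
        System.out.println("direct append rejected: " + directAppendFails(orders, order));
        System.out.println("imported: " + copyInto(orders, order).getNodeName());
    }
}
```

importNode copies the node and leaves the original in place, whereas adoptNode moves it out of the source document; which one fits depends on whether the source tree is still needed.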

Any well-formed document in any supported encoding can be successfully parsed using the following sample code:. FileWriter class should not be used in writing XML files because it depends on the default character encoding of the runtime environment. The output file can suffer from a parsing error or data loss if the document contains characters that are not available in the default character encoding.
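As a sketch of the encoding detection described above, the following hands the parser raw bytes instead of characters, so the parser itself reads the encoding declaration. It uses the JDK's JAXP parser rather than the Oracle one; the class name and the sample documents are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class; the parser, not the application, decides how to decode the bytes.
class EncodingDetectSketch {

    /** Parses raw bytes and returns the text content of the root element. */
    static String rootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))   // a byte stream, not a Reader
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><w>caf\u00e9</w>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><w>caf\u00e9</w>"
                .getBytes(StandardCharsets.UTF_8);
        // Both byte sequences differ on disk but decode to the same text.
        System.out.println(rootText(latin1).equals(rootText(utf8)));
    }
}
```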

Using a Java class that assumes the default file encoding can cause problems. The byte sequence 0xc2, 0x82 is valid UTF-8. The character can be distorted when getAsciiStream is called. If this does not work, try to print out the characters to make sure that they are not distorted before they are sent to the parser. You can read in accented characters in their hex or decimal format within the XML document.

Use that encoding or something different, depending on the tool or operating system you are using. If the subsequent bytes do not form a valid UTF-8 sequence, you get an error. This error just means that your editor is not saving the file with UTF-8 encoding.

For example, it might be saving it with ISO encoding. The encoding is a particular scheme used to write the Unicode character number representation to disk. Just adding this string to the top of the document does not cause your editor to write out the bytes representing the file to disk using UTF-8 encoding:.

You need to include the proper encoding declaration in your document according to the specification. You cannot use setEncoding to set the encoding for your input document. setEncoding is used with oracle. XMLDocument to set the correct encoding for the printing. You cannot use System. You need to use an output stream which is encoding aware (for example, OutputStreamWriter). You can construct an OutputStreamWriter and use the write(char[], int, int) method to print.
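A minimal sketch of encoding-aware output with OutputStreamWriter follows. The class name EncodedOutputSketch and the helper toBytes are hypothetical; the point is that the charset is named explicitly instead of being taken from the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; writes characters through an explicitly chosen encoding.
class EncodedOutputSketch {

    /** Encodes the text with the given charset and returns the resulting bytes. */
    static byte[] toBytes(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            char[] chars = text.toCharArray();
            writer.write(chars, 0, chars.length);   // the write(char[], int, int) overload
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><w>caf\u00e9</w>";
        // The declared encoding and the encoding actually used to write must agree.
        System.out.println(toBytes(xml, StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The same text produces different bytes under different charsets, which is why the declaration at the top of the document has to match the charset passed to the writer.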

There is no way to directly include binary data within the document; however, there are two ways to work around this:. The limitation on the encoding technique is to ensure that it only produces legal characters for the CDATA section.
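One way to meet that constraint is Base64, which emits only letters, digits, '+', '/', and '='. The source does not name a specific encoding, so Base64 is an assumption here, as are the class name and the payload element.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; Base64 is one possible encoding, chosen for this illustration.
class BinaryInXmlSketch {

    /** Wraps binary data in a CDATA section after encoding it as Base64 text. */
    static String wrap(byte[] data) {
        // Base64 output never contains "]]>", so it cannot end the CDATA section early.
        return "<payload><![CDATA[" + Base64.getEncoder().encodeToString(data) + "]]></payload>";
    }

    /** Parses the document and decodes the payload back into the original bytes. */
    static byte[] unwrap(String xml) {
        try {
            String text = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
            return Base64.getDecoder().decode(text);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read payload", e);
        }
    }

    public static void main(String[] args) {
        byte[] original = {0, (byte) 0xff, 0x10, 0x3c};
        byte[] restored = unwrap(wrap(original));
        System.out.println(java.util.Arrays.equals(original, restored));
    }
}
```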

Just load the file as you load an HTML page. You must use the entity references. Needless to say, all of the parsers proposed (I haven't checked all of them, but I'm almost sure) comply with a JAXP implementation, so technically you can use any of them. If speed and memory are no problem, dom4j is a really good option. If you need speed, using a StAX parser like Woodstox is the right way, but you have to write more code to get things done and you have to get used to processing XML in streams.
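To give a feel for the streaming style, here is a minimal StAX sketch. It uses the javax.xml.stream implementation bundled with the JDK; Woodstox implements the same API, so the code should look the same with it on the classpath. The class name and sample document are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; pulls events from the parser instead of building a tree.
class StaxNamesSketch {

    /** Returns the local name of every element, in document order. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is kept once it moves on.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><c>text</c></a>"));
    }
}
```

Unlike SAX, where the parser calls back into the application, the loop here drives the parser, which tends to make state handling easier to follow at the cost of more hand-written code than DOM.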

I wouldn't recommend this if you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster with XSLT-to-bytecode compilation) than Java manipulation.

