XML (eXtensible Markup Language) has long been a standard for data representation and configuration in enterprise applications. Whether it's web services (SOAP), Spring configuration files, or data interchange between systems, XML provides a structured, hierarchical way of representing information.
In Java, handling XML files involves parsing, reading, modifying, and writing structured data. The two most commonly used APIs are:
- DOM (Document Object Model): Loads the entire XML into memory, creating a tree structure. Best for small to medium files and when random access is needed.
- SAX (Simple API for XML): Event-driven, sequential parsing. Best for large files or when streaming is required.
This tutorial covers DOM and SAX parsers in Java, with real-world examples and best practices.
Basics of Java I/O
- InputStream/OutputStream → Work with binary data.
- Reader/Writer → Work with text data (XML, JSON, CSV).
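- XML is text, but a parser usually works best with an InputStream: given the raw bytes, it can detect the encoding from the XML declaration. Use a Reader/Writer only when the encoding is already known.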
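As a rough illustration of how a parser deals with encoding, the sketch below hands the parser raw bytes and lets it read the encoding from the XML declaration. The class name EncodingAwareParse, the method rootText, and the sample document are assumptions made for this example only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, named for this example only
class EncodingAwareParse {

    // Parses XML from raw bytes so the parser can pick up the encoding
    // from the XML declaration instead of a caller-chosen charset.
    static String rootText(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        // The declaration says ISO-8859-1, and the bytes are encoded to match
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Zo\u00EB</name>";
        System.out.println(rootText(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Had the same bytes been wrapped in a Reader with the wrong charset, the parser would have received already-decoded (and possibly corrupted) characters, since a Reader bypasses the declared encoding.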
Reading XML Files with DOM Parser
Example XML (employees.xml)
<employees>
    <employee>
        <id>101</id>
        <name>Alice</name>
        <role>Developer</role>
    </employee>
    <employee>
        <id>102</id>
        <name>Bob</name>
        <role>Manager</role>
    </employee>
</employees>
DOM Parser Example
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.*;
import java.io.File;

public class DomReadExample {
    public static void main(String[] args) {
        try {
            File file = new File("employees.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

            // Parse the whole file into an in-memory tree
            Document doc = dBuilder.parse(file);
            // Merge adjacent text nodes so text content is read consistently
            doc.getDocumentElement().normalize();

            // Visit every <employee> element in document order
            NodeList nodeList = doc.getElementsByTagName("employee");
            for (int i = 0; i < nodeList.getLength(); i++) {
                Node node = nodeList.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    // Read the text of the first <id>, <name>, and <role> child
                    String id = element.getElementsByTagName("id").item(0).getTextContent();
                    String name = element.getElementsByTagName("name").item(0).getTextContent();
                    String role = element.getElementsByTagName("role").item(0).getTextContent();
                    System.out.println(id + " - " + name + " - " + role);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Writing XML Files with DOM Parser
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import java.io.File;

public class DomWriteExample {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

            // Start from an empty document and add the root element
            Document doc = dBuilder.newDocument();
            Element root = doc.createElement("employees");
            doc.appendChild(root);

            // Build one <employee> with its <id>, <name>, and <role> children
            Element employee = doc.createElement("employee");
            root.appendChild(employee);

            Element id = doc.createElement("id");
            id.appendChild(doc.createTextNode("103"));
            employee.appendChild(id);

            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("Charlie"));
            employee.appendChild(name);

            Element role = doc.createElement("role");
            role.appendChild(doc.createTextNode("Designer"));
            employee.appendChild(role);

            // Serialize the in-memory tree to a file
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File("employees_out.xml"));
            transformer.transform(source, result);
            System.out.println("XML file created successfully.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Reading XML Files with SAX Parser
SAX Parser Example
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;
import java.io.File;

public class SaxReadExample {
    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                // Collects the text of the current element; characters() may be
                // called more than once for a single run of text
                private final StringBuilder text = new StringBuilder();

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    // A new element starts, so discard text collected so far
                    text.setLength(0);
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    // XML names are case-sensitive, so compare with equals()
                    if (qName.equals("name")) {
                        System.out.println("Name: " + text.toString().trim());
                    } else if (qName.equals("role")) {
                        System.out.println("Role: " + text.toString().trim());
                    }
                }
            };
            saxParser.parse(new File("employees.xml"), handler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
DOM vs SAX: When to Use?
- DOM: Random access, modification of XML, easier API.
- SAX: Faster, memory-efficient, better for large files.
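To make the "modification" point concrete, here is a minimal sketch that changes one employee's role in a DOM tree and serializes the result. It assumes the employees.xml structure used earlier; the class name DomModifyExample and the method changeRole are illustrative names, not part of any API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
class DomModifyExample {

    // Sets the <role> of the employee whose <id> matches, then returns the XML.
    static String changeRole(String xml, String employeeId, String newRole) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Random access: look up any element in the in-memory tree
        NodeList employees = doc.getElementsByTagName("employee");
        for (int i = 0; i < employees.getLength(); i++) {
            Element employee = (Element) employees.item(i);
            String id = employee.getElementsByTagName("id").item(0).getTextContent();
            if (id.equals(employeeId)) {
                employee.getElementsByTagName("role").item(0).setTextContent(newRole);
            }
        }

        // Serialize the modified tree back to a string
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employees><employee><id>101</id><name>Alice</name>"
                + "<role>Developer</role></employee></employees>";
        System.out.println(changeRole(xml, "101", "Architect"));
    }
}
```

An in-place edit like this has no direct equivalent in SAX, which only reports events as it reads; that is the main reason to accept DOM's memory cost.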
Advanced I/O with NIO.2
- Open XML files with Files.newBufferedReader() (text, encoding known) or Files.newInputStream() (bytes, encoding detected by the parser).
- Monitor XML configuration changes with WatchService.
- Lock files during modifications with FileChannel.lock().
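The first point in the list above might look like the following sketch, which opens the file through NIO.2 with an explicit UTF-8 charset and closes it with try-with-resources. The class name NioXmlRead and the method countEmployees are assumptions for illustration, and the file is assumed to be UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
class NioXmlRead {

    // Counts <employee> elements in a UTF-8 encoded XML file.
    static int countEmployees(Path xmlFile) throws Exception {
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader reader = Files.newBufferedReader(xmlFile, StandardCharsets.UTF_8)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getElementsByTagName("employee").getLength();
        }
    }

    public static void main(String[] args) throws Exception {
        // Write a small sample file to a temporary location, then read it back
        Path file = Files.createTempFile("employees", ".xml");
        Files.write(file, "<employees><employee/><employee/></employees>"
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(countEmployees(file));
        Files.delete(file);
    }
}
```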
Performance & Best Practices
- Use DOM for small/medium XMLs requiring modification.
- Use SAX for large files or sequential reads.
- Explicitly handle UTF-8 encoding.
- Always use try-with-resources for streams.
- For very large XMLs, consider StAX (Streaming API for XML).
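Since StAX is mentioned but not shown, here is a small sketch of pull parsing with the standard javax.xml.stream API. It assumes the employees.xml structure from earlier; StaxReadExample and readNames are names made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named for this example only
class StaxReadExample {

    // Collects the text of every <name> element, pulling one event at a time.
    static List<String> readNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // getElementText() reads the text and moves to the end tag
                    names.add(reader.getElementText());
                }
            }
        } finally {
            // XMLStreamReader is not AutoCloseable, so close it explicitly
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employees><employee><name>Alice</name></employee>"
                + "<employee><name>Bob</name></employee></employees>";
        System.out.println(readNames(xml));
    }
}
```

Like SAX, this keeps only the current event in memory, but the loop is ordinary control flow rather than a set of callbacks, which tends to be easier to follow.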
Framework Case Studies
- Spring: applicationContext.xml configuration files.
- Hibernate: hibernate.cfg.xml for ORM configuration.
- SOAP Web Services: XML-based communication.
- Log4j: XML configuration files.
- Microservices: Legacy systems often rely on XML APIs.
Real-World Scenarios
- Config Management: Application settings in XML.
- Data Interchange: Import/export utilities.
- Log Analysis: Structured XML logs.
- Legacy Systems: Interfacing with XML-heavy services.
- ETL Pipelines: XML as intermediate format.
📌 What's New in Java Versions?
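- Java 7+: NIO.2 simplified file handling (Files, Path).
- Java 8: Streams and lambdas simplify post-processing of parsed data. XML is not line-oriented, so reading it line by line is not a substitute for a parser.
- Java 11: Files.readString and Files.writeString read or write a whole XML file as a String.
- Java 17: Performance improvements in NIO.
- Java 21: Virtual threads can help when many XML documents are processed concurrently and the work is I/O-bound.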
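As a brief sketch of the Java 11 methods, the example below writes an XML string to a file and reads it back in one call each. It requires Java 11 or later; the class name ReadWriteStringExample and the method roundTrip are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for this example only
class ReadWriteStringExample {

    // Writes the XML text to the file, then reads the whole file back.
    static String roundTrip(Path file, String xml) throws IOException {
        Files.writeString(file, xml, StandardCharsets.UTF_8);
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("employees", ".xml");
        System.out.println(roundTrip(file, "<employees><employee/></employees>"));
        Files.delete(file);
    }
}
```

Both methods load the entire file into memory, so they suit small configuration files rather than large documents.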
Conclusion & Key Takeaways
XML remains a key data format in many enterprise systems. With DOM and SAX parsers, Java developers can handle XML efficiently for both small and large-scale applications.
Key Takeaways:
- Use DOM for editable XML structures.
- Use SAX for large files and streaming.
- Combine with NIO.2 for modern file handling.
- Always consider encoding and performance requirements.
FAQ
Q1. What’s the difference between DOM and SAX parsers?
A: DOM loads entire XML in memory; SAX processes sequentially with events.
Q2. Which parser is faster for large XML files?
A: SAX, as it doesn’t load the entire file.
Q3. Can I modify XML with SAX parser?
A: No, SAX is read-only; use DOM or StAX.
Q4. What is StAX parser?
A: The Streaming API for XML, a pull parser: the application requests the next event itself instead of receiving callbacks, and memory use stays low as with SAX.
Q5. How do I handle XML namespaces?
A: Enable namespace awareness in DocumentBuilderFactory or SAXParserFactory.
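A minimal sketch of namespace-aware DOM parsing follows. The namespace URI http://example.com/hr, the hr prefix, and the names NamespaceAwareExample and firstName are all assumptions chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
class NamespaceAwareExample {

    // Returns the text of the first <name> element in the given namespace, or null.
    static String firstName(String xml, String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Off by default; without it, namespace lookups do not work
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Match on namespace URI and local name, not on the prefix
        NodeList names = doc.getElementsByTagNameNS(namespaceUri, "name");
        return names.getLength() > 0 ? names.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical namespace URI, used only in this example
        String xml = "<hr:employees xmlns:hr=\"http://example.com/hr\">"
                + "<hr:employee><hr:name>Alice</hr:name></hr:employee></hr:employees>";
        System.out.println(firstName(xml, "http://example.com/hr"));
    }
}
```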
Q6. Can I pretty-print XML output?
A: Yes, configure the Transformer with indentation properties.
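One way this might look is sketched below, using the standard OutputKeys.INDENT property. The indent width is implementation-specific and is left at the default here; PrettyPrintExample and prettyPrint are names invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
class PrettyPrintExample {

    // Parses compact XML and returns it with line breaks between elements.
    static String prettyPrint(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Standard JAXP property: ask the serializer to add line breaks and indentation
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint(
                "<employees><employee><name>Charlie</name></employee></employees>"));
    }
}
```

The same two setOutputProperty calls can be added to the Transformer in DomWriteExample before calling transform.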
Q7. Is DOM thread-safe?
A: No, synchronization is needed for multi-threaded access.
Q8. How do frameworks like Spring use XML?
A: For configuration (applicationContext.xml) and bean definitions.
Q9. What happens if XML is malformed?
A: Parsers throw SAXParseException or similar errors.
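A short sketch of catching that exception and reporting where the problem is follows. The class name MalformedXmlExample and the method check are illustrative, and the returned messages are just one possible format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, named for this example only
class MalformedXmlExample {

    // Reports whether the XML is well-formed and, if not, where parsing failed.
    static String check(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return "well-formed";
        } catch (SAXParseException e) {
            // Carries the position of the error in the input
            return "malformed at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            // Other parser errors without position information
            return "parse error: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<employees><employee/></employees>"));
        // <employee> is never closed, so the parser rejects the document
        System.out.println(check("<employees><employee></employees>"));
    }
}
```

SAXParseException must be caught before SAXException because it is a subclass. The default error handler may also print a message to standard error before the exception reaches the catch block.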
Q10. Should I use XML or JSON?
A: JSON is preferred for modern apps, but XML is still critical for legacy/enterprise systems.