Specifying file encoding when writing DOM Documents

Assume we have a fully parsed org.w3c.dom.Document:

Document doc;
//parse doc etc...

Calling LSSerializer's writeToString method without specifying an encoding produces XML that is declared as UTF-16 by default, which is rarely practical:

DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer lsSerializer = implLS.createLSSerializer();
lsSerializer.getDomConfig().setParameter("format-pretty-print", true);
String result = lsSerializer.writeToString(doc);

This outputs:

<?xml version="1.0" encoding="UTF-16"?>
...
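For readers who want to reproduce this, here is a minimal self-contained sketch. The class name DefaultEncodingDemo, the parse helper and the sample XML are made up for illustration; only the serializer calls come from the snippet above. As far as I can tell, the UTF-16 declaration follows from writeToString returning a Java String, which is a sequence of UTF-16 code units.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name and the sample XML are illustrative only.
public class DefaultEncodingDemo {

    // Parses an XML string into a DOM Document with the JDK's default parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Serializes with writeToString and no encoding configured anywhere.
    public static String serializeWithDefaults(Document doc) {
        DOMImplementationLS implLS =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer lsSerializer = implLS.createLSSerializer();
        return lsSerializer.writeToString(doc);
    }

    public static void main(String[] args) {
        Document doc = parse("<root><child>text</child></root>");
        // The XML declaration in the printed string names UTF-16.
        System.out.println(serializeWithDefaults(doc));
    }
}
```

Writing such a string to a UTF-8 file unchanged would leave a declaration that no longer matches the bytes on disk, which is why the encoding needs to be set explicitly.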

Unfortunately, specifying an encoding isn’t trivial. Here are two solutions that don’t require any third-party libraries:

1. Using org.w3c.dom.ls.LSOutput

DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer lsSerializer = implLS.createLSSerializer();
lsSerializer.getDomConfig().setParameter("format-pretty-print", true);

LSOutput lsOutput = implLS.createLSOutput();
lsOutput.setEncoding("UTF-8");
Writer stringWriter = new StringWriter();
lsOutput.setCharacterStream(stringWriter);
lsSerializer.write(doc, lsOutput);

String result = stringWriter.toString();
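Since the goal is usually a file rather than a String, a variation worth considering is to give LSOutput a byte stream instead of a Writer, so that the serializer itself produces the encoded bytes. The following is only a sketch under that assumption: LsOutputBytesDemo, its parse helper and the sample XML are hypothetical names, and a ByteArrayOutputStream stands in for a real file stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample data are illustrative only.
public class LsOutputBytesDemo {

    // Parses an XML string into a DOM Document with the JDK's default parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Serializes the document to bytes in the requested encoding.
    public static byte[] serialize(Document doc, String encoding) {
        DOMImplementationLS implLS =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer lsSerializer = implLS.createLSSerializer();

        LSOutput lsOutput = implLS.createLSOutput();
        lsOutput.setEncoding(encoding);
        // A byte stream lets the serializer do the character encoding itself.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        lsOutput.setByteStream(bytes);
        lsSerializer.write(doc, lsOutput);
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = serialize(parse("<root>caf\u00e9</root>"), "UTF-8");
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

To write an actual file, the returned array could be passed to java.nio.file.Files.write, or a FileOutputStream could be handed to setByteStream directly.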

2. Using javax.xml.transform.Transformer

Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// Xalan-specific property; other Transformer implementations may ignore it
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
DOMSource source = new DOMSource(doc);
Writer stringWriter = new StringWriter();
StreamResult streamResult = new StreamResult(stringWriter);
transformer.transform(source, streamResult);
String result = stringWriter.toString();
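The Transformer approach can likewise target bytes instead of a String by wrapping an OutputStream in the StreamResult. This is a rough sketch, not a definitive recipe: TransformerBytesDemo, its parse helper and the sample XML are hypothetical, and the in-memory stream is only a placeholder for a file stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample data are illustrative only.
public class TransformerBytesDemo {

    // Parses an XML string into a DOM Document with the JDK's default parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Serializes the document to bytes in the requested encoding.
    public static byte[] serialize(Document doc, String encoding) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            // With an OutputStream target, the transformer encodes the bytes itself.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(bytes));
            return bytes.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = serialize(parse("<root>caf\u00e9</root>"), "UTF-8");
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Passing a FileOutputStream to StreamResult instead would write the file directly, with the declared encoding and the bytes on disk in agreement.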

7 thoughts on “Specifying file encoding when writing DOM Documents”

  1. Amine

    Thank you, it’s a very useful post! It helped me a lot.

    However, I have a problem with both solutions: the result is not well formatted. In other words, how can I add a new line, or force one to be inserted, after each ‘>’?
    For example, the current result for both solutions is:

    The desired result is:
