{"id":1150,"date":"2011-08-10T01:01:35","date_gmt":"2011-08-09T23:01:35","guid":{"rendered":"http:\/\/raftaman.net\/?p=1150"},"modified":"2021-05-15T11:46:39","modified_gmt":"2021-05-15T09:46:39","slug":"specifying-file-encoding-when-writing-dom-documents","status":"publish","type":"post","link":"https:\/\/possiblelossofprecision.net\/?p=1150","title":{"rendered":"Specifying file encoding when writing DOM Documents"},"content":{"rendered":"<p>Assume we have a fully parsed <a href=\"http:\/\/download.oracle.com\/javase\/6\/docs\/api\/org\/w3c\/dom\/Document.html\">org.w3c.dom.Document<\/a>:<\/p>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\nDocument doc;\r\n\/\/parse doc etc...\r\n<\/pre>\n<p>Serializing it with <a href=\"http:\/\/download.oracle.com\/javase\/6\/docs\/api\/org\/w3c\/dom\/ls\/LSSerializer.html\">LSSerializer<\/a>&#8217;s writeToString method always produces XML declared as UTF-16, which is rather impractical for files. writeToString cannot be told to use another encoding: it returns a Java String (a DOMString), and the DOM Level 3 Load and Save specification defines its encoding as UTF-16. For example,<\/p>\n<pre class=\"brush: java; highlight: [5]; title: ; notranslate\" title=\"\">\r\nDOMImplementation impl = doc.getImplementation();\r\nDOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature(&quot;LS&quot;, &quot;3.0&quot;);\r\nLSSerializer lsSerializer = implLS.createLSSerializer();\r\nlsSerializer.getDomConfig().setParameter(&quot;format-pretty-print&quot;, true);\r\nString result = lsSerializer.writeToString(doc);\r\n<\/pre>\n<p>will output<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-16&quot;?&gt;\r\n...\r\n<\/pre>\n<p>Unfortunately, specifying a different encoding isn&#8217;t obvious. Here are two solutions that don&#8217;t require any third-party libraries:<\/p>\n<h2>1. 
Using <a href=\"http:\/\/download.oracle.com\/javase\/6\/docs\/api\/org\/w3c\/dom\/ls\/LSOutput.html\">org.w3c.dom.ls.LSOutput<\/a><\/h2>\n<pre class=\"brush: java; highlight: [6,7,8,9,10]; title: ; notranslate\" title=\"\">\r\nDOMImplementation impl = doc.getImplementation();\r\nDOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature(&quot;LS&quot;, &quot;3.0&quot;);\r\nLSSerializer lsSerializer = implLS.createLSSerializer();\r\nlsSerializer.getDomConfig().setParameter(&quot;format-pretty-print&quot;, true);\r\n\r\nLSOutput lsOutput = implLS.createLSOutput();\r\nlsOutput.setEncoding(&quot;UTF-8&quot;); \/\/ written to the XML declaration\r\nWriter stringWriter = new StringWriter();\r\nlsOutput.setCharacterStream(stringWriter);\r\nlsSerializer.write(doc, lsOutput); \/\/ serialize into stringWriter\r\n\r\nString result = stringWriter.toString();\r\n<\/pre>\n<h2>2. Using <a href=\"http:\/\/download.oracle.com\/javase\/6\/docs\/api\/javax\/xml\/transform\/Transformer.html\">javax.xml.transform.Transformer<\/a><\/h2>\n<pre class=\"brush: java; title: ; notranslate\" title=\"\">\r\nTransformer transformer = TransformerFactory.newInstance().newTransformer();\r\ntransformer.setOutputProperty(OutputKeys.ENCODING, &quot;UTF-8&quot;);\r\ntransformer.setOutputProperty(OutputKeys.INDENT, &quot;yes&quot;);\r\n\/\/ non-standard Xalan property, also accepted by the built-in JDK transformer\r\ntransformer.setOutputProperty(&quot;{http:\/\/xml.apache.org\/xslt}indent-amount&quot;, &quot;2&quot;);\r\nDOMSource source = new DOMSource(doc);\r\nWriter stringWriter = new StringWriter();\r\nStreamResult streamResult = new StreamResult(stringWriter);\r\ntransformer.transform(source, streamResult);\r\nString result = stringWriter.toString();\r\n<\/pre>\n<p>In both cases, result is still a Java String: the encoding setting determines what the XML declaration says, but the characters are not UTF-8 encoded until the string is written out. When saving it, use a writer with the same charset, or let the serializer produce the bytes itself by passing an OutputStream instead of a Writer (lsOutput.setByteStream(...) or new StreamResult(outputStream)).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Assume we have a fully parsed org.w3c.dom.Document: Document doc; \/\/parse doc etc&#8230; Serializing it with LSSerializer&#8217;s writeToString method always produces XML declared as UTF-16, which is rather impractical for files. For example, DOMImplementation impl = doc.getImplementation(); DOMImplementationLS implLS 
= (DOMImplementationLS) impl.getFeature(&quot;LS&quot;, &quot;3.0&quot;); LSSerializer lsSerializer = implLS.createLSSerializer(); lsSerializer.getDomConfig().setParameter(&quot;format-pretty-print&quot;, true); String result = lsSerializer.writeToString(doc); will output &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-16&quot;?&gt; &#8230; Unfortunately, specifying&#8230; <a href=\"https:\/\/possiblelossofprecision.net\/?p=1150\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[4,21,43],"class_list":["post-1150","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-java","tag-utf8","tag-xml"],"_links":{"self":[{"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=\/wp\/v2\/posts\/1150","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1150"}],"version-history":[{"count":19,"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=\/wp\/v2\/posts\/1150\/revisions"}],"predecessor-version":[{"id":2671,"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=\/wp\/v2\/posts\/1150\/revisions\/2671"}],"wp:attachment":[{"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1150"},{"taxonomy":"post_tag","embeddable"
:true,"href":"https:\/\/possiblelossofprecision.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1150"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}