Oracle Type Code Mappings

Use case: Imagine that you need to represent type codes for the Oracle data types across PL/SQL, Java and C.

Challenge: Oracle uses different type codes in different contexts. This is fair enough if they need to adhere to standards (eg JDBC), but there’s really no excuse for using different type codes in the documentation, in PL/SQL, in DBMS_TYPES and Oracle Call Interface (OCI). In my view it’s a bit of a mess.

Solution: I’ve spent quite some time mapping between these slightly different type codes and also chosen a “uniform type code” across these. My rule of thumb has been that I use the type code from the Oracle Database 11g Release 2 documentation (Data Types). If the documentation doesn’t provide one, I use the one returned by one of the DBMS_SQL.DESCRIBE_COLUMNS% procedures as this is probably what most Oracle developers have used over time.

I’ve listed the mappings in the table below. If a cell is empty it’s because it’s missing or not supported for that particular type system (eg DUMP doesn’t work with all data types). I’ve underlined where the type codes differ from the Oracle documentation or between DBMS_SQL and the other type systems.

Data Type Uniform
Type
Code
Oracle
Doc.
DBMS_SQL DBMS_TYPES
TYPECODE_%
JDBC
java.sql.Types
OCI DUMP V$SQL_
BIND_
DATA.
DATATYPE
VARCHAR2 1 1 1 9
VARCHAR2
1
VARCHAR
12
VARCHAR
1
SQLT_CHR
1 1
NVARCHAR2 1 1 1 287
NVARCHAR2
12
VARCHAR
1
SQLT_CHR
1 1
NUMBER 2 2 2 NUMBER 2
NUMERIC
2
SQLT_NUM
2 2
FLOAT 2 2 2 2
NUMBER
2
NUMERIC
2
SQLT_NUM
2 2
LONG 8 8 8 -1
LONGVARCHAR
8
SQLT_LNG
8
DATE 12 12 12 12
DATE
93
TIMESTAMP1
12
SQLT_DAT
12 12
BINARY_FLOAT 21 21 100 100
BFLOAT
100 100
SQLT_IBFLOAT
100 100
BINARY_DOUBLE 22 22 101 101
BDOUBLE
101 101
SQLT_IBFLOAT
101 101
TIMESTAMP 180 180 180 187
TIMESTAMP
93
TIMESTAMP1
187
SQLT_TIMESTAMP
180 180
TIMESTAMP
WITH TIME ZONE
181 181 181 188
TIMESTAMP_TZ
-101 188
SQLT_TIMESTAMP_TZ
181 181
TIMESTAMP
WITH LOCAL
TIME ZONE
231 231 231 232
TIMESTAMP_LTZ
-102 232
SQLT_TIMESTAMP_LTZ
231 231
INTERVAL YEAR
TO
MONTH
182 182 182 189
INTERVAL_YM
-103 189
SQLT_INTERVAL_YM
182 182
INTERVAL DAY
TO
SECOND
183 183 183 190
INTERVAL_DS
-104 190
SQLT_INTERVAL_DS
183 183
RAW 23 23 23 95
RAW
-3
VARBINARY
23
SQLT_BIN
23 23
LONG RAW 24 24 24 -4
LONGVARBINARY
24
SQLT_LBI
24
ROWID 69 69 11 -8 104
SQLT_RDD
69 69
UROWID 208 208 208 104
UROWID
-8 104
SQLT_RDD
208 208
CHAR 96 96 96 96
CHAR
1
CHAR
96
SQLT_AFC
96 96
NCHAR 96 96 96 286
NCHAR
1
CHAR
96
SQLT_AFC
96 96
CLOB 112 112 112 112
CLOB
2005
CLOB
112
SQLT_CLOB
112
NCLOB 112 112 112 288
NCLOB
2005
CLOB
112
SQLT_CLOB
112
BLOB 113 113 113 113
BLOB
2004
BLOB
113
SQLT_BLOB
113
BFILE 114 114 114 114
BFILE
-13 114
SQLT_BFILEE
114
XMLTYPE 109 109 58
OPAQUE2
2007 108
SQLT_NTY
58
ANYDATA 109 109 58
OPAQUE2
2007 108
SQLT_NTY
58
ANYDATASET 109 109 58
OPAQUE2
2007 108
SQLT_NTY
58
ANYTYPE 109 109 58
OPAQUE2
2007 108
SQLT_NTY
58
Object type 109 109 108
OBJECT
2002
STRUCT
108
SQLT_NTY
121
VARRAY 109 109 247
VARRAY
2003
ARRAY
108
SQLT_NTY
Nested table 109 109 248
TABLE
2003
ARRAY
108
SQLT_NTY
REF 111 111 110
REF
2006
REF
110
SQLT_REF
111
Strong REF
CURSOR
102 102 -10 116
SQLT_RSET
102
Weak REF
CURSOR
102 102 -10 116
SQLT_RSET
102

1 Probably because in Java a true Date cannot hold a time value.

2 XMLTYPE, ANYDATA, ANYDATASET and ANYTYPE are not “normal object types”. They are all declared like “CREATE TYPE <OWNER>.<OBJECT_TYPE> AS OPAQUE VARYING (*) USING LIBRARY....

The type codes from PL/SQL, Java and C have been verified through creation of a table or SELECT statement using the various data types and description of these through:

  • PL/SQL: DBMS_SQL.DESCRIBE_COLUMNS3 on a DBMS_SQL cursor representing the above mentioned table/SELECT.
  • Java: java.sql.ResultSetMetaData obtained through the java.sql.ResultSet representing the above mentioned table/SELECT.
  • C: OCIParamGet(…, OCI_HTYPE_STMT, …) + OCIAttrGet(…, OCI_ATTR_DATA_TYPE, …) on a statement handle representing the above mentioned table/SELECT.

The type codes have been verified on Oracle Database 11g Release 2 11.2.0.1.0 Personal Edition on Windows 7. When I have more time I’ll create an automated test suite in order to verify my findings on Oracle Database 10g Release 1, Release 2 and Oracle Database 11g Release 1 and also verify on other platforms like Linux and Solaris.

I welcome your feedback and suggestions for improvements and corrections.

Comparing XML in Oracle

Introduction

This blog post is about comparing a specific type of XML documents with the purpose of obtaining details about their differences, within an Oracle database. This specific type is documents representing table data/resultsets conforming to Oracle’s canonical form, ie XML with the following structure (the comments <!-- ... --> are only included for illustration):

<ROWSET>
  <ROW><!-- Row 1 -->
    <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
    <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
    <!-- ... -->
    <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
  </ROW>
  <ROW><!-- Row 2 -->
    <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
    <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
    <!-- ... -->
    <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
  </ROW>
  <!-- ... -->
  <ROW><!-- Row n -->
    <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
    <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
    <!-- ... -->
    <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
   </ROW>
</ROWSET>

So basically there’s an XML element for each column and the name of the element corresponds to the column name. There’s a <ROW> fragment for each row in the resultset and the root element is called <ROWSET> (all XML documents must have at most one root element).

The diffing method I’m going to describe and implement in this blog post is focusing on a limited version of Oracle canonical form, namely it is assumed that:

  • None of the columns in the table are using object types, including collections and XMLTYPE.
  • Object tables are not used.
  • XML element attributes are not used.
  • NULL values are represented by an empty tag, eg <ENAME/> or <ENAME></ENAME>.
  • We regard the document order to be significant — like comparing sets in SQL.

This means that we can assume that all the column values are atomic, single and simple values, which means that the comparison implementation can be simplified.

The primary use case for this XML is comparison (or XML diffing) for automated unit/regression tests that need to verify that resultsets from the Program Under Test (PUT) are identical to a given expected outcome. It’s a convenient way to specify the expected outcome in XML and the PUT resultsets can then be converted to XML and compared. My blog post Converting Between Oracle Data and XML describes how to convert the Oracle data to XML.

The differ should return NULL in case there are no differences and it must support node values larger than 32KB.

The code in this blog post requires Oracle Database 11g Release 1 or newer and has been tested on the following versions (both on Windows XP Service Pack 3 32-bit):

  • Oracle Database 11g Release 1 11.1.0.6.0.
  • Oracle Database 11g Release 2 11.2.0.1.0.

In a future follow up to this blog post I’ll implement a more generic XML differ that will work with Oracle Database 10g Release 1 or newer.

Example

Here’s an example of what I would like to be able to do. Given the following two XML documents:

<ROWSET>
  <ROW>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
    <LOC>BOSTON</LOC>
  </ROW>
</ROWSET>

and

<ROWSET>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>XOPERATIONSETC</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>50</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>ALBANY</LOC>
  </ROW>
</ROWSET>

I would like an XML differ to pick up the following differences and only those:

  • Department 10 has been removed.
  • Department 50 has been added.
  • Department 40’s name and location have changed.

The XML differ should return this in a structured form, preferably in XML, perhaps something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Differencing export generated by Altova® DiffDog® version 2009 sp1
  Enterprise Edition - for more information please visit www.altova.com -->
<diff_result>
  <diff_info comparison_mode="text_or_xml">
    <source_left name="c:\temp\xml1.xml" uri="file:///c:/temp/xml1.xml"/>
    <source_right name="c:\temp\xml2.xml" uri="file:///c:/temp/xml2.xml"/>
  </diff_info>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET"/>
      <position>1</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET"/>
      <position>1</position>
    </right_location>
    <left_content>
      <element>
        <ROW>
          <DEPTNO>10</DEPTNO>
          <DNAME>ACCOUNTING</DNAME>
          <LOC>NEW YORK</LOC>
        </ROW>
      </element>
    </left_content>
  </xml_diff>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET/ROW[4]/DNAME"/>
      <position>1</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET/ROW[3]/DNAME"/>
      <position>1</position>
    </right_location>
    <left_content>
      <element>OPERATIONS</element>
    </left_content>
    <right_content>
      <element>XOPERATIONSETC</element>
    </right_content>
  </xml_diff>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET/ROW[4]/LOC"/>
      <position>1</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET/ROW[3]/LOC"/>
      <position>1</position>
    </right_location>
    <left_content>
      <element>BOSTON</element>
    </left_content>
    <right_content>
      <element>NEW YORK</element>
    </right_content>
  </xml_diff>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET"/>
      <position>5</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET"/>
      <position>4</position>
    </right_location>
    <right_content>
      <element>
        <ROW>
          <DEPTNO>50</DEPTNO>
          <DNAME>ACCOUNTING</DNAME>
          <LOC>ALBANY</LOC>
        </ROW>
      </element>
    </right_content>
  </xml_diff>
</diff_result>

which is what Altova DiffDog 2009 returns, or even better:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="10"]]]></XPATH>
    <OPERATION>DELETE</OPERATION>
    <VALUE1><![CDATA[<ROW>
  <DEPTNO>10</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>NEW YORK</LOC>
</ROW>]]></VALUE1>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="50"]]]></XPATH>
    <OPERATION>INSERT</OPERATION>
    <VALUE2><![CDATA[<ROW>
  <DEPTNO>50</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>ALBANY</LOC>
</ROW>]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/DNAME]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[OPERATIONS]]></VALUE1>
    <VALUE2><![CDATA[XOPERATIONSETC]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/LOC]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[BOSTON]]></VALUE1>
    <VALUE2><![CDATA[NEW YORK]]></VALUE2>
  </DIFFERENCE>
</DIFFERENCES>

which provides the same information but in a more compact and useful format.

Alternatives

For XML diffing within the Oracle database we have various alternatives:

  • Open source PL/SQL XML Compare Project: DBI_XML_COMPAREDBI_XML_COMPARE.GET_XML_DIFFERENCES.
  • Commercial software Quest Code Tester for Oracle’s built-in XML differ: QU_XML.GET_XML_DIFFERENCES.
  • Open source Java code that could be loaded into the database as Java stored procedures, eg XMLUnit.
  • Oracle’s XMLDIFF function that was introduced in Oracle Database 11g Release 1.
  • Write your own from scratch.

Let’s look at each of these alternatives in turn.

PL/SQL XML Compare Project

The function returns a collection of records describing the nodes of the two documents that differ. You can then traverse the result to understand the differences and you could also convert this to XML. Let’s look at what DBI_XML_COMPAREDBI_XML_COMPARE.GET_XML_DIFFERENCES returns when converted to XML using the following code snippet:

create or replace function xmlcdata2(
  c in clob
)
return clob as

/**
 * Wraps CLOB value up in a CDATA section. Oracle's built-in XMLCDATA function
 * was not introduced until 10.2, doesn't supporting "chaining" CDATA sections
 * and doesn't support CLOBs.
 * @param   c       CLOB to be wrapped up in a CDATA section. CDATA section
 *                  chaining is used if C contains the sequence ']]>'.
 * @return  CDATA section.
 */

  result clob;

begin
  dbms_lob.createtemporary(result, false, dbms_lob.call);
  dbms_lob.append(result, '<![CDATA[');
  if c is not null then
    -- Append the value. Replace embedded ]]> -- CDATA chaining.
    dbms_lob.append(result, replace(c, ']]>', ']]]]><![CDATA[>'));
  end if;
  dbms_lob.append(result, ']]>');

  return result;
end xmlcdata2;
/

create or replace function xmldiff_dbi_xml_compare(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Compares two XML documents and returns the differences in XML. Based on
 * DBI_XML_COMPARE.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Document describing differences based on an XML version of what is
 *          return from DBI_XML_COMPARE.GET_XML_DIFFERENCES. NULL if no
 *          differences.
 */

  result clob;
  differences dbi_xml_compare.t_nodes;

begin
  differences := dbi_xml_compare.get_xml_differences(
    xml1,
    xml2,
    p_ignore_whitespace => dbi_xml_compare.gc_whitespace_normalize
  );

  if differences is not null and differences.count > 0 then
    dbms_lob.createtemporary(result, true, dbms_lob.call);
    dbms_lob.append(result, '<dbi_xml_compare>');
  end if;

  for i in nvl(differences.first, 0) .. nvl(differences.last, -1) loop
    dbms_lob.append(result, '<difference>');
    dbms_lob.append(result, '<document_id>');
    dbms_lob.append(result, xmlcdata2(to_char(differences(i).document_id)));
    dbms_lob.append(result, '</document_id>');
    dbms_lob.append(result, '<node_id>');
    dbms_lob.append(result, xmlcdata2(differences(i).node_id));
    dbms_lob.append(result, '</node_id>');
    dbms_lob.append(result, '<node_values>');

    for j in
        nvl(differences(i).node_values.first, 0) ..
        nvl(differences(i).node_values.last, -1) loop
      dbms_lob.append(result, '<node_value id="' || j || '">');
      dbms_lob.append(result, xmlcdata2(differences(i).node_values(j)));
      dbms_lob.append(result, '</node_value>');
    end loop;

    dbms_lob.append(result, '</node_values>');
    dbms_lob.append(
      result,
      '<difference_type>' ||
        case differences(i).difference_type
          when dbi_xml_compare.gc_identical_value then
            'dbi_xml_compare.gc_identical_value'
          when dbi_xml_compare.gc_different_value then
            'dbi_xml_compare.gc_different_value'
          when dbi_xml_compare.gc_no_equivalent_node then
            'dbi_xml_compare.gc_no_equivalent_node'
          when dbi_xml_compare.gc_ambiguous_element then
            'dbi_xml_compare.gc_ambiguous_element'
        end ||
      '</difference_type></difference>'

    );
  end loop;

  if differences is not null and differences.count > 0 then
    dbms_lob.append(result, '</dbi_xml_compare>');
  end if;

  if result is not null then
    return xmltype(result);
  end if;

  return null;
end xmldiff_dbi_xml_compare;
/

we would get the following differences for our example (abbreviated for legibility, a total of 48 differences were found):

<dbi_xml_compare>
  <difference>
    <document_id><![CDATA[1]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{1}/LOC{NEW YORK}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <difference>
    <document_id><![CDATA[1]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/LOC{BOSTON}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <!-- snip -->
  <difference>
    <document_id><![CDATA[1]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{1}/DEPTNO{10}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <difference>
    <document_id><![CDATA[2]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/DNAME{ACCOUNTING}/#text]]></node_id>
    <node_values>
      <node_value id="1"><![CDATA[ACCOUNTING]]></node_value>
    </node_values>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <difference>
    <document_id><![CDATA[2]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/LOC{ALBANY}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <!-- snip -->
  <difference>
    <document_id><![CDATA[2]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/DNAME{ACCOUNTING}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
</dbi_xml_compare>

which clearly is not what we need — basically the differ got so confused that it couldn’t match anything.

This was tested with DBI_XML_COMPARE version 1.3. DBI_XML_COMPARE does not support element values larger than 32KB.

Quest Code Tester for Oracle

Like DBI_XML_COMPAREDBI_XML_COMPARE.GET_XML_DIFFERENCES, QU_XML.GET_XML_DIFFERENCES returns a collection of records. Using the following snippet to convert this to XML:

create or replace function xmldiff_qu_xml(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Compares two XML documents and returns the differences in XML. Based on
 * QU_XML.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Document describing differences based on an XML version of what is
 *          return from QU_XML.GET_XML_DIFFERENCES. NULL if no differences.
 */

  nodes qu_xml.t_nodes;
  value varchar2(32767);

  result clob;

begin
  nodes := qu_xml.get_xml_differences(xml1, xml2);

  if nodes.count > 0 then
    dbms_lob.createtemporary(result, true, dbms_lob.call);
    dbms_lob.append(result, '<qu_xml>');
    for i in nvl(nodes.first, 0) .. nvl(nodes.last, -1) loop
      if nodes(i).node_values.count > 0 then
        value := nodes(i).node_values(nodes(i).node_values.first);
      else
        value := '';
      end if;
      dbms_lob.append(result, '<diff_node>');
      dbms_lob.append(result, '<name>');
      dbms_lob.append(result, xmlcdata2(nodes(i).node_id));
      dbms_lob.append(result, '</name>');
      dbms_lob.append(result, '<value>');
      dbms_lob.append(result, xmlcdata2(value));
      dbms_lob.append(result, '</value>');
      dbms_lob.append(result, '<difference>' ||
          case nodes(i).difference_type
            when 1 then
              'Value difference'
            when 2 then
              'Node removed/added'
            when 3 then
              'Ambiguous'
          end ||
          '</difference></diff_node>'
      );
    end loop;
    dbms_lob.append(result, '</qu_xml>');
  end if;

  if result is not null then
    return xmltype(result);
  end if;

  return null;
end xmldiff_qu_xml;
/

we would get the following differences (abbreviated for legibility, a total of 48 differences were found):

<qu_xml>
  <diff_node>
    <name><![CDATA[/#document/ROWSET{1}/ROW{1}/LOC{NEW YORK}]]></name>
    <value><![CDATA[]]></value>
    <difference>Node removed/added</difference>
  </diff_node>
  <diff_node>
    <name><![CDATA[/#document/ROWSET{1}/ROW{4}/LOC{BOSTON}]]></name>
    <value><![CDATA[]]></value>
    <difference>Node removed/added</difference>
  </diff_node>
  <!-- snip -->
  <diff_node>
    <name><![CDATA[/#document/ROWSET{1}/ROW{4}/DNAME{ACCOUNTING}]]></name>
    <value><![CDATA[]]></value>
    <difference>Node removed/added</difference>
  </diff_node>
</qu_xml>

This was tested with Quest Code Tester for Oracle 1.9.1.505. QU_XML does not support element values larger than 32KB.

Open Source Java Code

It would be convenient if we could find an open source Java project that meets our requirements. If one was found, we could load it into the database using loadjava, assuming that the JDK used by the Java code is compatible with the JDK version of our Oracle database. One such open source project to investigate is XMLUnit, which is an extension to JUnit — the Java unit testing framework.

Here’s some Java code that uses XMLUnit for comparison of our example (setup of the XML documents has been left out):

package com.appatra.blog;

import com.topologi.diffx.DiffXException;

import java.io.IOException;

import java.util.List;

import javax.xml.transform.TransformerConfigurationException;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.Difference;
import org.custommonkey.xmlunit.XMLTestCase;

import org.jdom.JDOMException;

import org.xml.sax.SAXException;

/**
 * Test of XMLUnit's capabilities of XML comparison.
 */

public class XmlDiffXmlUnit extends XMLTestCase {
  public XmlDiffXmlUnit() {
    super("XmlDiffXmlUnit");
  }

  private void test()
  throws SAXException, IOException {
    String xml1 = /* Left out... */;
    String xml2 = /* Left out... */;

    DetailedDiff myDiff = new DetailedDiff(compareXML(xml2, xml1));
    List<Difference> allDifferences = (List<Difference>)myDiff.getAllDifferences();

    for (Difference diff : allDifferences) {
      System.err.println(diff);
    }
  }

  public static void main(String[] args)
  throws SAXException, IOException {
    XmlDiffXmlUnit xmlDiffXmlUnit = new XmlDiffXmlUnit();

    xmlDiffXmlUnit.test();
  }
}

This produces the following output (12 differences, wrapped for improved legibility):

Expected text value '20' but was '10' -
  comparing <DEPTNO ...>20</DEPTNO> at
  /ROWSET[1]/ROW[1]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>10</DEPTNO> at /ROWSET[1]/ROW[1]/DEPTNO[1]/text()[1]
Expected text value 'RESEARCH' but was 'ACCOUNTING' -
  comparing <DNAME ...>RESEARCH</DNAME> at
  /ROWSET[1]/ROW[1]/DNAME[1]/text()[1] to
  <DNAME ...>ACCOUNTING</DNAME> at /ROWSET[1]/ROW[1]/DNAME[1]/text()[1]
Expected text value 'DALLAS' but was 'NEW YORK' -
  comparing <LOC ...>DALLAS</LOC> at
  /ROWSET[1]/ROW[1]/LOC[1]/text()[1] to
  <LOC ...>NEW YORK</LOC> at /ROWSET[1]/ROW[1]/LOC[1]/text()[1]
Expected text value '30' but was '20' -
  comparing <DEPTNO ...>30</DEPTNO> at
  /ROWSET[1]/ROW[2]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>20</DEPTNO> at /ROWSET[1]/ROW[2]/DEPTNO[1]/text()[1]
Expected text value 'SALES' but was 'RESEARCH' -
  comparing <DNAME ...>SALES</DNAME> at
  /ROWSET[1]/ROW[2]/DNAME[1]/text()[1] to
  <DNAME ...>RESEARCH</DNAME> at /ROWSET[1]/ROW[2]/DNAME[1]/text()[1]
Expected text value 'CHICAGO' but was 'DALLAS' -
  comparing <LOC ...>CHICAGO</LOC> at
  /ROWSET[1]/ROW[2]/LOC[1]/text()[1] to
  <LOC ...>DALLAS</LOC> at /ROWSET[1]/ROW[2]/LOC[1]/text()[1]
Expected text value '40' but was '30' -
  comparing <DEPTNO ...>40</DEPTNO> at
  /ROWSET[1]/ROW[3]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>30</DEPTNO> at /ROWSET[1]/ROW[3]/DEPTNO[1]/text()[1]
Expected text value 'XOPERATIONSETC' but was 'SALES' -
  comparing <DNAME ...>XOPERATIONSETC</DNAME> at
  /ROWSET[1]/ROW[3]/DNAME[1]/text()[1] to
  <DNAME ...>SALES</DNAME> at /ROWSET[1]/ROW[3]/DNAME[1]/text()[1]
Expected text value 'NEW YORK' but was 'CHICAGO' -
  comparing <LOC ...>NEW YORK</LOC> at
  /ROWSET[1]/ROW[3]/LOC[1]/text()[1] to
  <LOC ...>CHICAGO</LOC> at /ROWSET[1]/ROW[3]/LOC[1]/text()[1]
Expected text value '50' but was '40' -
  comparing <DEPTNO ...>50</DEPTNO> at
  /ROWSET[1]/ROW[4]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>40</DEPTNO> at /ROWSET[1]/ROW[4]/DEPTNO[1]/text()[1]
Expected text value 'ACCOUNTING' but was 'OPERATIONS' -
  comparing <DNAME ...>ACCOUNTING</DNAME> at
  /ROWSET[1]/ROW[4]/DNAME[1]/text()[1] to
  <DNAME ...>OPERATIONS</DNAME> at /ROWSET[1]/ROW[4]/DNAME[1]/text()[1]
Expected text value 'ALBANY' but was 'BOSTON' -
  comparing <LOC ...>ALBANY</LOC> at
  /ROWSET[1]/ROW[4]/LOC[1]/text()[1] to
  <LOC ...>BOSTON</LOC> at /ROWSET[1]/ROW[4]/LOC[1]/text()[

This was tested with XMLUnit version 1.3.

XMLDIFF

Oracle introduced a new function XMLDIFF in Oracle Database 11g Release 1. This function generates an XML document with the differences conforming to an Xdiff schema. Let’s look at how this function works on our example:

create or replace function xmldiff_xmldiff(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Compares two XML documents and returns the differences in XML. Based on
 * XMLDIFF, hence only supported with Oracle Database 11g Release 1 and newer.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Document describing differences based on an XML version of what is
 *          return from QU_XML.GET_XML_DIFFERENCES. NULL if no differences.
 */

  diff xmltype;

begin
  select xmldiff(xml1, xml2)
  into   diff
  from   dual;
  if diff is not null then
    if diff.extract('/*/*') is not null then
      return diff;
    end if;
  end if;

  return null;
end xmldiff_xmldiff;
/

which returns the following based on our example (once again, 12 differences):

<xd:xdiff
    xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd
    http://xmlns.oracle.com/xdb/xdiff.xsd"
    xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <?oracle-xmldiff operations-in-docorder="true"
    output-model="snapshot"
    diff-algorithm="global"?>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[1]/DEPTNO[1]/text()[1]">
    <xd:content>20</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[1]/DNAME[1]/text()[1]">
    <xd:content>RESEARCH</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[1]/LOC[1]/text()[1]">
    <xd:content>DALLAS</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[2]/DEPTNO[1]/text()[1]">
    <xd:content>30</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[2]/DNAME[1]/text()[1]">
    <xd:content>SALES</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[2]/LOC[1]/text()[1]">
    <xd:content>CHICAGO</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[3]/DEPTNO[1]/text()[1]">
    <xd:content>40</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[3]/DNAME[1]/text()[1]">
    <xd:content>XOPERATIONSETC</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[3]/LOC[1]/text()[1]">
    <xd:content>NEW YORK</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[4]/DEPTNO[1]/text()[1]">
    <xd:content>50</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[4]/DNAME[1]/text()[1]">
    <xd:content>ACCOUNTING</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[4]/LOC[1]/text()[1]">
    <xd:content>ALBANY</xd:content>
  </xd:update-node>
</xd:xdiff>

Common Issue

A common problem with these alternatives is that none of these differs has a concept of a primary key for an XML fragment. In our example, the XML is Oracle’s canoncial form representing the data in the SCOTT.DEPT table. The primary key of this table is the DEPTNO column. If the differs had known this or had spent time analyzing the XML and detected it, they would have produced a much more useful output, instead of getting very confused and thinking that everything has changed.

The last one of the alternatives was to write your own XML differ from scratch and I’ll come back to this in a later blog post. For now, let’s look at how we can improve on XMLDIFF by writing a wrapper around it that detects primary key usage and internally uses XMLDIFF to diff the fragments for each primary key value found.

XMLDIFF2

The algorithm in our extended XMLDIFF function that we’re going to call XMLDIFF2 is:

  1. Prepare CLOB variable for constructing the difference XML.
  2. Find minimum number of leading elements that form primary key values (unique lookup).
  3. Look for primary key values in XML1 not present in XML2: These primary key values have been deleted.
  4. Look for primary key values in XML2 not present in XML1: These primary key values have been inserted.
  5. Diff all common primary key values, ie in both XML1 and XML2.
  6. Return NULL if no differences found.

The implementation uses the XMLCDATA2 function that I’ve listed earlier. Here is the listing of XMLDIFF2:

create or replace function xmldiff2(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Extension to Oracle's XMLDIFF function, available with Oracle 11.1 and newer.
 * This version is to be used primarily with simple XML documents in Oracles
 * "canonical form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Assumptions: It is assumed that none of the column elements in the XML
 * contain XML, such that the maximum level of XML elements is 3. It is also
 * assumed that datetime XML elements are passed in using XML Schema compliant
 * formats ('yyyy-mm-dd"T"hh24:mi:ss' for DATE, 'yyyy-mm-dd"T"hh24:mi:ss.ff9'
 * for TIMESTAMP and 'yyyy-mm-dd"T"hh24:mi:ss.ff9tzh:tzm' for TIMESTAMP WITH
 * [LOCAL] TIME ZONE).
 * Extensions: This version understands a concept of leading elements at level 3
 * forming a primary key and ignores document element order.
 * Differences are returned in the following form:
 * <DIFFERENCES>
 *   <DIFFERENCE>
 *     <XPATH><![CDATA[<XPATH1>]]></XPATH>
 *     <OPERATION>(INSERT|DELETE|UPDATE)</OPERATION>
 *     <VALUE1><![CDATA[<VALUE_FROM_XML1>]]></VALUE1>
 *     <VALUE2><![CDATA[<VALUE_FROM_XML2>]]></VALUE2>
 *   </DIFFERENCE>
 *   <DIFFERENCE>
 *     <XPATH><![CDATA[<XPATH1>]]></XPATH>
 *     <OPERATION>(INSERT|DELETE|UPDATE)</OPERATION>
 *     <VALUE1><![CDATA[<VALUE_FROM_XML1>]]></VALUE1>
 *     <VALUE2><![CDATA[<VALUE_FROM_XML2>]]></VALUE2>
 *   </DIFFERENCE>
 *   ...
 *   <DIFFERENCE>
 *     <XPATH><![CDATA[<XPATH1>]]></XPATH>
 *     <OPERATION>(INSERT|DELETE|UPDATE)</OPERATION>
 *     <VALUE1><![CDATA[<VALUE_FROM_XML1>]]></VALUE1>
 *     <VALUE2><![CDATA[<VALUE_FROM_XML2>]]></VALUE2>
 *   </DIFFERENCE>
 * </DIFFERENCES>
 * <VALUE1> is only present for DELETE and UPDATE operations. <VALUE2> is only
 * present for INSERT and UPDATE operations.
 * Limitations. Namespace and attributes are not supported. Works only with
 * Oracle 11.1 and newer.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Differences, NULL if none found.
 */

  ---  Xdiff namespace.
  xmldiff_ns constant varchar2(48) :=
      'xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd"';

  --- Element overview, ie name and count.
  type element_overview_t is record (
    name   varchar2(4000),
    occurs integer
  );
  type level2_elements_c is
  table of element_overview_t
  index by binary_integer;

  --- Level 2 element overview for XML1.
  level2_elements1 level2_elements_c;
  --- Level 2 element overview for XML2.
  level2_elements2 level2_elements_c;

  -- Primary key XPath expressions.
  type varchar2_4000_lookup is
  table of boolean
  index by varchar2(4000);
  pkvs varchar2_4000_lookup;

  --- Number of elements required to form primary key.
  pk_element_count integer := null;

  --- Assuming Oracle canonica form: Root element name.
  element1_name constant varchar2(6) := 'ROWSET';
  --- Assuming Oracle canonica form: Element name for each row (level 2).
  element2_name constant varchar2(3) := 'ROW';

  ---
  type pk_element_t is record (
    name  varchar2(4000),
    value varchar2(4000)
  );
  type pk_element_c is
  table of pk_element_t
  index by binary_integer;
  pk_elements pk_element_c;

  -- Extract of XML1.
  xml1e xmltype;
  -- Extract of XML2.
  xml2e xmltype;
  --- Result from SYS.XMLDIFF.
  diff2 xmltype;
  --- Return result.
  result xmltype;

  --- Column value for XML1.
  cv1 clob;
  --- Column value for XML2.
  cv2 clob;
  --- Primary key XPath.
  pk_xpath varchar2(32767);
  --- Primary key XPath for XML1.
  pk_xpath1 varchar2(4000);
  --- Primary key XPath for XML2.
  pk_xpath2 varchar2(4000);
  --- Primary key value.
  pkv varchar2(4000);
  --- Column name.
  cname varchar2(32767);

  -- XPath.
  xpath varchar2(4000);

  --- Difference document.
  diff clob;
  --- XML update fragment.
  xmlu clob;

  procedure diff_nodes;

/**
 * Parses the level 2 elements in given XML document.
 * @param   xml   XML document.
 * @return  Collection of level 2 elements.
 */

  function get_level2_elements(
    xml in xmltype
  )
  return level2_elements_c as

    level2_elements level2_elements_c;

  begin
    select name,
           count(*)
    bulk collect into level2_elements
    from   (
             select x.getrootelement() name
             from   table(xmlsequence(extract(xml, '/*/*'))) x
           )
    group  by name;

    return level2_elements;
  end get_level2_elements;

/**
 * Finds the number of leading elements it takes to form a unique lookup, ie the
 * minimum number of elements that are required for a primary key.
 * @param   xml     XML document.
 * @param   level2_elements
 *                  Collection of level 2 elements.
 * @return  Number of elements in primary key for this document.
 */

  function get_pk_element_count(
    xml in xmltype,
    level2_elements in level2_elements_c
  )
  return pls_integer as

    j pls_integer;
    n pls_integer := 1;
    m pls_integer;
    xpath varchar2(32767);
    pk_count pls_integer;
    done boolean := false;

  begin
    for i in 1 .. level2_elements(1).occurs loop
      -- For each element in the collection.
      j := 1;
      xpath := '/' || element1_name || '/' || element2_name || '[';
      for pk in (
            select value(x) element
            from   table(
                     xmlsequence(
                       extract(xml, '/' || element1_name || '/' ||
                           element2_name || '[' || i || ']/*'
                       )
                     )
                   ) x
          ) loop
        -- For each element for the ith element of the collection.
        pk_elements(j).name := pk.element.getrootelement;
        pk_elements(j).value := pk.element.extract('//text()').getstringval;

        if j > 1 then
          xpath := rtrim(xpath, ']') || ' and ';
        end if;
        xpath := xpath || pk_elements(j).name || '="' ||
            pk_elements(j).value || '"';
        xpath := xpath || ']';

        if j = n then
          select count(*)
          into   pk_count
          from   table(xmlsequence(extract(xml, xpath)));

          done := pk_count = 1 or j = 32;
          exit when done;

          n := n + 1;
        end if;

        j := j + 1;
      end loop;
    end loop;

    return n;
  end get_pk_element_count;

/**
 * Appends a fragment to the differences XML document.
 */

  procedure append_fragment(
    fragment in clob
  ) as

  begin
    dbms_lob.append(diff, fragment);
  end append_fragment;

/**
 * Check that each fragment from first document identified through each primary
 * key can be found in the second document.
 * @param   operation
 *                  Operation to use for difference fragment, ie 'DELETE' if
 *                  it's missing or 'INSERT' if it has been inserted.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 */

  procedure check_existence(
    operation in varchar2,
    xml1 in xmltype,
    xml2 in xmltype
  ) as

    relative_xpath varchar2(32767);
    pk_exists boolean;
    value_id pls_integer;

  begin
    for collection_elements in (
          select value(x) node
          from   table(
                   xmlsequence(
                     extract(
                       xml1, '/' || element1_name || '/' || element2_name
                     )
                   )
                 ) x
        ) loop
      relative_xpath := '/' || element1_name || '/' || element2_name || '[';
      for i in 1 .. pk_element_count loop
        if i > 1 then
          relative_xpath := relative_xpath || ' and ';
        end if;
        relative_xpath := relative_xpath || pk_elements(i).name || '="' ||
            collection_elements.node.extract(
              '/*/*[' || i || ']/text()'
            ).getstringval || '"';
      end loop;
      relative_xpath := relative_xpath || ']';

      pk_exists := pkvs.exists(relative_xpath);
      if pk_exists or xml2.existsnode(relative_xpath) = 1 then
        pkvs(relative_xpath) := true;
      else
        dbms_lob.createtemporary(xmlu, true, dbms_lob.call);
        dbms_lob.append(
          xmlu,
          '<DIFFERENCE>' ||
            '<XPATH>' || xmlcdata2(relative_xpath) || '</XPATH>' ||
            '<OPERATION>' || operation || '</OPERATION>'
        );
        if operation = 'INSERT' then
          value_id := 2;
        elsif operation = 'DELETE' then
          value_id := 1;
        end if;
        dbms_lob.append(xmlu, '<VALUE' || value_id || '>');
        dbms_lob.append(
          xmlu,
          xmlcdata2(rtrim(xml1.extract(relative_xpath).getclobval, chr(10)))
        );
        dbms_lob.append(xmlu, '</VALUE' || value_id || '>');
        dbms_lob.append(xmlu, '</DIFFERENCE>');
        append_fragment(xmlu);
      end if;
    end loop;
  end check_existence;

/**
 * Diff the fragments found for the primary key values common between XML1 and
 * XML2. This is done by calling Oracle's XMLDIFF function.
 */

  procedure diff_common_pks as

  begin
    pkv := pkvs.first;
    while pkv is not null loop
      xpath := pkv;
      -- Note the extra GETCLOBVAL and XMLTYPE. Seem that Oracle XMLTYPE has a
      -- bug here as XML1E = XML1 if not done!
      xml1e := xmltype(xml1.extract(xpath).getclobval);
      xml2e := xmltype(xml2.extract(xpath).getclobval);

      select xmldiff(xml1e, xml2e)
      into   diff2
      from   dual;

      diff_nodes;

      pkv := pkvs.next(pkv);
    end loop;
  end diff_common_pks;

/**
 * Inspect the diff result from Oracle's XMLDIFF and append to our difference
 * document.
 */

  procedure diff_nodes as

    node_type varchar2(100);
    operation varchar2(6);
    diff_type varchar2(18);
    xpath xmltype;
    content xmltype;

  begin
    for cv in (
          select value(x) v
          from   table(xmlsequence(extract(diff2, '/xd:xdiff/*', xmldiff_ns))) x
        ) loop
      -- For each difference found.
      xpath := cv.v.extract('//@xd:xpath', xmldiff_ns);
      if xpath is null then
        xpath := cv.v.extract('//@xd:parent-xpath', xmldiff_ns);
      end if;
      cname := xpath.getstringval;
      cname := substr(cname, instr(cname, '/', 2) + 1);
      cname := substr(cname, 1, instr(cname, '[') - 1);
      node_type := cv.v.extract('//@xd:node-type', xmldiff_ns).getstringval;

      diff_type := cv.v.getrootelement;
      if diff_type like '%delete-node' then
        -- Node was deleted. Was it an element value or an element?
        cv1 := xml1e.extract('//' || cname || '/text()').getclobval;
        cv2 := '';
        if node_type = 'text' then
          -- Was non-NULL but is now NULL.
          operation := 'UPDATE';
        else
          -- Node has been removed.
          operation := 'DELETE';
        end if;
      elsif diff_type like '%append-node' then
        -- Node has been added. Was it an element value or an element?
        cv1 := '';
        cv2 := xml2e.extract('//' || cname || '/text()').getclobval;
        if node_type = 'text' then
          operation := 'UPDATE';
        else
          -- Node has been inserted.
          operation := 'INSERT';
        end if;
      elsif diff_type like '%insert-node-before' then
        -- Node has been added, find which.
        content := cv.v.extract('/*/xd:content/*', xmldiff_ns);
        cname := content.getrootelement;
        cv1 := '';
        cv2 := content.extract('//text()').getclobval;
        operation := 'INSERT';
      elsif diff_type like '%update-node' then
        -- Node has been updated.
        cv1 := xml1e.extract('//' || cname || '/text()').getclobval;
        cv2 := cv.v.extract('//xd:content/text()', xmldiff_ns).getclobval;
        operation := 'UPDATE';
      end if;

      dbms_lob.createtemporary(xmlu, false, dbms_lob.call);
      dbms_lob.append(
        xmlu,
        '<DIFFERENCE>' ||
          '<XPATH>' || xmlcdata2(pkv || '/' || cname) || '</XPATH>' ||
          '<OPERATION>' || operation || '</OPERATION>'
      );

      if operation in ('DELETE', 'UPDATE') then
        dbms_lob.append(xmlu, '<VALUE1>');
        dbms_lob.append(xmlu, xmlcdata2(cv1));
        dbms_lob.append(xmlu, '</VALUE1>');
      end if;
      if operation in ('INSERT', 'UPDATE') then
        dbms_lob.append(xmlu, '<VALUE2>');
        dbms_lob.append(xmlu, xmlcdata2(cv2));
        dbms_lob.append(xmlu, '</VALUE2>');
      end if;

      dbms_lob.append(xmlu, '</DIFFERENCE>');
      append_fragment(xmlu);
    end loop;
  end diff_nodes;

begin
  pk_xpath1 := substr(pk_xpath, 1, instr(pk_xpath, '/', -1) - 1);
  pk_xpath2 := substr(pk_xpath, length(pk_xpath1) + 2);

  dbms_lob.createtemporary(diff, true, dbms_lob.call);
  dbms_lob.append(diff, '<DIFFERENCES>');
  -- TODO: NULL handling.

  level2_elements1 := get_level2_elements(xml1);
  level2_elements2 := get_level2_elements(xml2);

  -- Find minimum number of leading elements that form PK values.
  pk_element_count := greatest(
    get_pk_element_count(xml1, level2_elements1),
    get_pk_element_count(xml2, level2_elements2)
  );

  -- Look for PKs in XML1 not present in XML2: DELETE fragment.
  check_existence('DELETE', xml1, xml2);

  -- Look for PKs in XML2 not present in XML1: INSERT fragment.
  check_existence('INSERT', xml2, xml1);

  -- Diff all common PKs: UPDATE fragment.
  diff_common_pks;

  dbms_lob.append(diff, '</DIFFERENCES>');

  if length(diff) = 13 + 14 and diff = '<DIFFERENCES></DIFFERENCES>' then
    -- No differences.
    result := null;
  else
    result := xmltype(diff);
  end if;

  return result;
end xmldiff2;
/

Examples

Here are a few examples on how to use XMLDIFF2.

The first example is the example used for comparison of the various alternatives described earlier:

declare
  xml1 xmltype := xmltype('
<ROWSET>
  <ROW>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
    <LOC>BOSTON</LOC>
  </ROW>
</ROWSET>');
  xml2 xmltype := xmltype('
<ROWSET>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>XOPERATIONSETC</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>50</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>ALBANY</LOC>
  </ROW>
</ROWSET>');
  diff xmltype;
begin
  diff := xmldiff2(xml1, xml2);
  dbms_output.put_line(diff.getclobval(0, 2));
end;
/

which results in:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="10"]]]></XPATH>
    <OPERATION>DELETE</OPERATION>
    <VALUE1><![CDATA[<ROW>
  <DEPTNO>10</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>NEW YORK</LOC>
</ROW>]]></VALUE1>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="50"]]]></XPATH>
    <OPERATION>INSERT</OPERATION>
    <VALUE2><![CDATA[<ROW>
  <DEPTNO>50</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>ALBANY</LOC>
</ROW>]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/DNAME]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[OPERATIONS]]></VALUE1>
    <VALUE2><![CDATA[XOPERATIONSETC]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/LOC]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[BOSTON]]></VALUE1>
    <VALUE2><![CDATA[NEW YORK]]></VALUE2>
  </DIFFERENCE>
</DIFFERENCES>

Another example, this time with just one row in the resultset:

declare
  xml1 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <SAL>1300</SAL>
    <COMM/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>');
  xml2 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>ANALYST</JOB>
    <MGR>7782</MGR>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <SAL>1500</SAL>
    <COMM>100</COMM>
    <DEPTNO>20</DEPTNO>
  </ROW>
</ROWSET>');
  diff xmltype;
begin
  diff := xmldiff2(xml1, xml2);
  dbms_output.put_line(diff.getclobval(0, 2));
end;
/

which results in the following:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/JOB]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[CLERK]]></VALUE1>
    <VALUE2><![CDATA[ANALYST]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/SAL]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[1300]]></VALUE1>
    <VALUE2><![CDATA[1500]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/COMM]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[]]></VALUE1>
    <VALUE2><![CDATA[100]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/DEPTNO]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[10]]></VALUE1>
    <VALUE2><![CDATA[20]]></VALUE2>
  </DIFFERENCE>
</DIFFERENCES>

And a final example, this time an element that is removed and an element that is added:

declare
  xml1 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <REMOVED>abc</REMOVED>
    <SAL>1300</SAL>
    <COMM/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>');
  xml2 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <ADDED>xyz</ADDED>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <SAL>1300</SAL>
    <COMM/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>');
  diff xmltype;
begin
  diff := xmldiff2(xml1, xml2);
  dbms_output.put_line(diff.getclobval(0, 2));
end;
/

which gives the following result:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/ADDED]]></XPATH>
    <OPERATION>INSERT</OPERATION>
    <VALUE2><![CDATA[xyz]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/REMOVED]]></XPATH>
    <OPERATION>DELETE</OPERATION>
    <VALUE1><![CDATA[abc]]></VALUE1>
  </DIFFERENCE>
</DIFFERENCES>

Conclusion

I’ve described different options for comparing XML documents in the Oracle database and implemented an enhanced version of Oracle’s XMLDIFF that can be used to compare SQL datasets for relational tables not using object types, when represented on Oracle canonical form.

References

Converting Between Oracle Data and XML

Introduction

This blog post describes various mechanisms for converting between Oracle data and XML, and vice versa, focusing on the former. Many blog posts have been written on this subject. Most are very short and only demonstrate the usage but not the associated issues and I’ll try to dig a little deeper than most others and also mention the issues I’ve encountered or are aware of using these techniques.

I find combining Oracle data and XML useful for the following use cases:

  • Extraction, Transformation and Load (ETL) for a datawarehouse or other types of transfers: Data is on a structured form with widespread tool (XML editors), programming language and standard support (eg XSLT, JAX, DOM etc) and is easier to transform in the process.
  • Easier to manipulate in a decent XML editor with expand/collapse (hiding irrelevant data sets) and XPath functionality (used for extracting fragments of the XML document).
  • Comparison: Comparing data sets for equality, eg in automated unit or regression testing. It’s easier to write a good, generic XML differ than a good, generic SQL result set differ. My next blog post will cover this.

The examples in this blog post assume that you are logged on as SCOTT and that you work in a SQL*Plus environment with the following settings and data in DEPT and EMP tables intact from installation:

set serveroutput on format truncated
set long 100000
set pagesize 50000
set linesize 1000

alter session set nls_date_format = 'dd.mm.yyyy hh24:mi:ss';
alter session set nls_timestamp_format = 'dd.mm.yyyy hh24:mi:ss.ff';
alter session set nls_timestamp_tz_format = 'dd.mm.yyyy hh24:mi:ss.ff tzh:tzm';

Oracle Functionality

In the following sections I’ll give an overview of the functionality Oracle provides in terms of converting between Oracle data and XML. Most of this functionality works on what Oracle calls canonical form, which is XML with the following structure (the comments <!-- ... --> are only included for illustration):

<ROWSET>
 <ROW><!-- Row 1 -->
  <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
  <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
  <!-- ... -->
  <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 </ROW>
 <ROW><!-- Row 2 -->
 <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
  <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
  <!-- ... -->
  <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 </ROW>
 <!-- ... -->
 <ROW><!-- Row n -->
  <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
  <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
  <!-- ... -->
  <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 </ROW>
</ROWSET>

So basically there’s an XML element for each column and the name of the element corresponds to the column name. There’s a <ROW> fragment for each row in the query and the root element is called <ROWSET> (all XML documents must have at most one root element).

In this blog post I’ll focus on DBMS_XMLGEN and XMLTYPE.

DBMS_XMLQUERY

DBMS_XMLQUERY provides functionality for obtaining XML data based on a dynamic SQL SELECT statement, optionally using bind parameters (all of datatype VARCHAR2). It’s possible to control names of tags generated and you can also control the date/time format (same format for all) but unfortunately ownly down to milliseconds as internally, Java’s java.text.SimpleDateFormat is used, which doesn’t support higher precision than milliseconds. Also, it’s inconvenient that it’s not possible to have empty elements with no attributes for NULL values, only either no element or an empty element with an attribute NULL="YES". XML is returned as CLOB.

There are two ways of using DBMS_XMLQUERY:

  • Quick and easy in one call, but less flexible: GETXML that works on a SQLQUERY parameter of type VARCHAR2 or CLOB and returning a CLOB.
  • Fully flexible by a series of calls, all working on a context handle:
    • NEWCONTEXT called with SQLQUERY parameter of type VARCHAR2 or CLOB and returning a context handle of type CTXTYPE. This handle must be used in subsequent calls.
    • One or more calls that set various options controlling the XML generation (the SET% procedures), such as NULL value handling, number of rows to process, bind variable values, tag names, etc.
    • CLOSECONTEXT called with the context handle, which closes the context and frees resources.

DBMS_XMLQUERY was introduced with Oracle 8i Release 3 and it has been superseded by DBMS_XMLGEN. DBMS_XMLQUERY is implemented in Java and hence is not supported with Oracle Database Express Edition.

DBMS_XMLGEN

DBMS_XMLGEN was introduced with Oracle Database 9i Release 1. It supersedes DBMS_XMLQUERY, is implemented in C (hence faster) and is a little more flexible than DBMS_XMLQUERY in some ways so in general you should use DBMS_XMLGEN over DBMS_XMLQUERY. Bind variable support (all of datatype VARCHAR2), XSLT support and pretty printing was introduced in Oracle Database 10g Release 1.

In general, DBMS_XMLGEN mimics the functionality in DBMS_XMLQUERY with subtle differences. One of these is that it’s not possible to pass in a query larger than 32KB, but contrary to DBMS_XMLQUERY you can also pass in a SYS_REFCURSOR, so you can use the OPEN-FOR statement that from Oracle Database 11g Release 1 supports CLOB. It’s also possible to obtain the XML as an XMLTYPE, not only a CLOB.

Here are a few examples:

Selecting all rows and columns from SCOTT.DEPT will produce the following result:

select dbms_xmlgen.getxml('select * from dept order by deptno') xml
from   dual;

XML
---------------------------
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <DEPTNO>10</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>NEW YORK</LOC>
 </ROW>
 <ROW>
  <DEPTNO>20</DEPTNO>
  <DNAME>RESEARCH</DNAME>
  <LOC>DALLAS</LOC>
 </ROW>
 <!-- Abbreviated... -->
 <ROW>
  <DEPTNO>40</DEPTNO>
  <DNAME>OPERATIONS</DNAME>
  <LOC>BOSTON</LOC>
 </ROW>
</ROWSET>

Selecting all rows and columns from SCOTT.EMP will produce the following result:

select dbms_xmlgen.getxml('select * from emp order by empno') xml
from   dual;

XML
--------------------------------
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <EMPNO>7369</EMPNO>
  <ENAME>SMITH</ENAME>
  <JOB>CLERK</JOB>
  <MGR>7902</MGR>
  <HIREDATE>17.12.1980 00:00:00</HIREDATE>
  <SAL>800</SAL>
  <DEPTNO>20</DEPTNO>
 </ROW>
 <ROW>
  <EMPNO>7499</EMPNO>
  <ENAME>ALLEN</ENAME>
  <JOB>SALESMAN</JOB>
  <MGR>7698</MGR>
  <HIREDATE>20.02.1981 00:00:00</HIREDATE>
  <SAL>1600</SAL>
  <COMM>300</COMM>
  <DEPTNO>30</DEPTNO>
 </ROW>
 <!-- Abbreviated... -->
 <ROW>
  <EMPNO>7934</EMPNO>
  <ENAME>MILLER</ENAME>
  <JOB>CLERK</JOB>
  <MGR>7782</MGR>
  <HIREDATE>23.01.1982 00:00:00</HIREDATE>
  <SAL>1300</SAL>
  <DEPTNO>10</DEPTNO>
 </ROW>
</ROWSET>

It’s noted how the date values match the session NLS_DATE_FORMAT 'dd.mm.yyyy hh24:mi:ss'. It is also noted that no elements are returned for column with NULL values, eg for COMM for EMPNO = 7934. If you prefer to always have an element, either with or without an attribute indicating NULL, you can control this with the following type of code, also demonstrating how to work with the context handle:

variable xml clob

declare
  context dbms_xmlgen.ctxtype;
begin
  context := dbms_xmlgen.newcontext('select * from emp where empno = 7934');
  dbms_xmlgen.setnullhandling(context, dbms_xmlgen.empty_tag);
  :xml := dbms_xmlgen.getxml(context);
  dbms_xmlgen.closecontext(context);
end;
/

print :xml

XML
------------------------------------------
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <EMPNO>7934</EMPNO>
  <ENAME>MILLER</ENAME>
  <JOB>CLERK</JOB>
  <MGR>7782</MGR>
  <HIREDATE>23.01.1982 00:00:00</HIREDATE>
  <SAL>1300</SAL>
  <COMM/>
  <DEPTNO>10</DEPTNO>
 </ROW>
</ROWSET>

Note how the empty element <COMM/> is now included in the XML. Another alternative is:

variable xml clob

declare
  context dbms_xmlgen.ctxtype;
begin
  context := dbms_xmlgen.newcontext('select * from emp where empno = 7934');
  dbms_xmlgen.setnullhandling(context, dbms_xmlgen.null_attr);
  dbms_xmlgen.setindentationwidth(context, 2);
  :xml := dbms_xmlgen.getxml(context);
  dbms_xmlgen.closecontext(context);
end;
/

print :xml

XML
----------------------------------------------------------------
<?xml version="1.0"?>
<ROWSET xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance">
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <HIREDATE>23.01.1982 00:00:00</HIREDATE>
    <SAL>1300</SAL>
    <COMM xsi:nil = "true"/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>

Note how the attribute xsi:nil is now included with a value of "true". Seen that this uses a namespace xsi a namespace specification has been added to the root element <ROWSET>. Also note that we added a call to control the indentation width for each level, in this case a value of 2 instead of the default 1.

If you work with LONG columns it’s worth knowing that DBMS_XMLGEN truncates values of LONG columns to 32KB. For more information, please refer to Adrian Billington’s blog post mentioned at the end of this blog post. I’m also describing a slightly different technique for converting LONG to CLOB in my blog post Converting a LONG Column to a CLOB on the fly.

XMLELEMENT, XMLFOREST, XMLAGG etc

Oracle provides a collections of functions that you can use to construct the XML based on your relational (optionally object-relational) data. Here’s an example on a hierarchical XML construction with employees under departments:

select xmlelement(
         "DEPARTMENTS",
         xmlagg(
           xmlelement(
             "DEPARTMENT",
             xmlforest(
               d.deptno,
               d.dname,
               (
                 select xmlagg(
                          xmlelement(
                            "EMPLOYEE",
                             xmlforest(
                               e.empno,
                               xmlcdata(e.ename) ename,
                               e.hiredate
                             )
                           )
                        )
                 from   emp e
                 where  e.deptno = d.deptno and
                        e.empno in (7369, 7499, 7934)
               ) employees
             )
           )
         )
       ) x
from   dept d;

(output wrapped and indented for improved readability)

<DEPARTMENTS>
  <DEPARTMENT>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <EMPLOYEES>
      <EMPLOYEE>
        <EMPNO>7934</EMPNO>
        <ENAME><![CDATA[MILLER]]></ENAME>
        <HIREDATE>1982-01-23</HIREDATE>
      </EMPLOYEE>
    </EMPLOYEES>
  </DEPARTMENT>
  <DEPARTMENT>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <EMPLOYEES>
      <EMPLOYEE>
        <EMPNO>7369</EMPNO>
        <ENAME><![CDATA[SMITH]]></ENAME>
        <HIREDATE>1980-12-17</HIREDATE>
      </EMPLOYEE>
    </EMPLOYEES>
  </DEPARTMENT>
  <DEPARTMENT>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <EMPLOYEES>
      <EMPLOYEE>
        <EMPNO>7499</EMPNO>
        <ENAME><![CDATA[ALLEN]]></ENAME>
        <HIREDATE>1981-02-20</HIREDATE>
      </EMPLOYEE>
    </EMPLOYEES>
  </DEPARTMENT>
  <DEPARTMENT>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
  </DEPARTMENT>
</DEPARTMENTS>

If you prefer empty tags for NULL values, you can take this approach instead, writing a little more code (it would have been nice if XMLFOREST/XMLELEMENT etc would take options for handling NULL values):

select xmlelement(
         "DEPARTMENTS",
         xmlagg(
           xmlelement(
             "DEPARTMENT",
             xmlforest(
               d.deptno,
               d.dname,
               (
                 select xmlagg(
                          xmlelement(
                            "EMPLOYEE",
                             xmlelement("EMPNO", e.empno),
                             xmlelement("ENAME", xmlcdata(e.ename)),
                             xmlelement("HIREDATE", e.hiredate),
                             xmlelement("COMM", e.comm)
                           )
                        )
                 from   emp e
                 where  e.deptno = d.deptno and
                        e.empno in (7369, 7499, 7934)
               ) employees
             )
           )
         )
       ).getclobval() x
from   dept d;

(output wrapped and indented for improved readability)

<DEPARTMENTS>
  <DEPARTMENT>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <EMPLOYEES>
      <EMPLOYEE>
        <EMPNO>7934</EMPNO>
        <ENAME><![CDATA[MILLER]]></ENAME>
        <HIREDATE>1982-01-23</HIREDATE>
        <COMM></COMM>
      </EMPLOYEE>
    </EMPLOYEES>
  </DEPARTMENT>
  <DEPARTMENT>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <EMPLOYEES>
      <EMPLOYEE>
        <EMPNO>7369</EMPNO>
        <ENAME><![CDATA[SMITH]]></ENAME>
        <HIREDATE>1980-12-17</HIREDATE>
        <COMM></COMM>
      </EMPLOYEE>
    </EMPLOYEES>
  </DEPARTMENT>
  <DEPARTMENT>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <EMPLOYEES>
      <EMPLOYEE>
        <EMPNO>7499</EMPNO>
        <ENAME><![CDATA[ALLEN]]></ENAME>
        <HIREDATE>1981-02-20</HIREDATE>
        <COMM>300</COMM>
      </EMPLOYEE>
    </EMPLOYEES>
  </DEPARTMENT>
  <DEPARTMENT>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
  </DEPARTMENT>
</DEPARTMENTS>

It’s worth noting that these functions generate XML with date/time values conforming to XML Schema standards, which is great. This was changed in Oracle Database 10g Release 2 and in earlier releases the NLS settings were used (not so great). In this example we enclosed the values of ENAME in a CDATA section (starting with <![CDATA[ and ending with ]]>), which means that Oracle doesn’t have to escape the special characters that could be found in the ENAME column. The Oracle function XMLCDATA is used for this, but it has several limitations: It doesn’t support values larger than 4000 bytes and it doesn’t handle embedded ]]> characters within the text (that would terminate the CDATA section), which is normally treated by chaining several CDATA sections for the element value, ie by replacing any occurrence of ']]>' with ']]]]><![CDATA[>'.

Please be aware that DATE values with time parts are truncated to the date value only, which is very unfortunate:

select xmlelement(
         "DATETIME",
         xmlforest(
           trunc(sysdate) date_no_time,
           sysdate date_with_time,
           systimestamp timestamp_with_tz,
           localtimestamp timestamp_local_tz
         )
       ) xml
from   dual;

(output wrapped and indented for improved readability)

<DATETIME>
  <DATE_NO_TIME>2011-01-26</DATE_NO_TIME>
  <DATE_WITH_TIME>2011-01-26</DATE_WITH_TIME>
  <TIMESTAMP_WITH_TZ>2011-01-26T10:47:38.026000+01:00</TIMESTAMP_WITH_TZ>
  <TIMESTAMP_LOCAL_TZ>2011-01-26T10:47:38.026000</TIMESTAMP_LOCAL_TZ>
</DATETIME>

You can convert the dates to timestamps, but it’s a little inconvenient and this means you get fractional seconds as well, even though they’ll always be zero:

select xmlelement(
         "DATETIME",
         xmlforest(
           trunc(sysdate) date_no_time,
           cast(sysdate as timestamp) date_with_time,
           systimestamp timestamp_with_tz,
           localtimestamp timestamp_local_tz
         )
       ) xml
from   dual;

(output wrapped and indented for improved readability)

<DATETIME>
  <DATE_NO_TIME>2011-01-26</DATE_NO_TIME>
  <DATE_WITH_TIME>2011-01-26T10:50:32.000000</DATE_WITH_TIME>
  <TIMESTAMP_WITH_TZ>2011-01-26T10:47:38.026000+01:00</TIMESTAMP_WITH_TZ>
  <TIMESTAMP_LOCAL_TZ>2011-01-26T10:47:38.026000</TIMESTAMP_LOCAL_TZ>
</DATETIME>

Personally, I prefer DBMS_XMLGEN to these functions as you need to write more fiddly SQL to get it working (it would be nice if you could work on all columns with a * for instance, but instead you need to list all the columns), but I guess that’s a matter of taste.

XMLTYPE

The XMLTYPE object type has various means of conversions, eg from any object type instance to corresponding XML (CREATEXML static function and constructor taking a parameter of type "<ADT_1>") and vice versa, ie from the XMLTYPE instance to an object type instance (member procedure TOOBJECT).

Here’s an example:

select xmltype(sys.aq$_agent('name', 'address', 1)).getclobval() xml
from   dual;

XML
--------------------------------------------------------------------------------
<AQ_x0024__AGENT><NAME>name</NAME><ADDRESS>address</ADDRESS><PROTOCOL>1</PROTOCO
L></AQ_x0024__AGENT>

This output is not that user-friendly as GETCLOBVAL by default has it’s own idea on how and when to indent the elements. If you’re on Oracle Database 11g Release 1 or newer, there’s an enhanced version of GETCLOBVAL you can use:

select xmltype(sys.aq$_agent('name', 'address', 1)).getclobval(0, 2) xml
from   dual;

XML
----------------------------
<AQ_x0024__AGENT>
  <NAME>name</NAME>
  <ADDRESS>address</ADDRESS>
  <PROTOCOL>1</PROTOCOL>
</AQ_x0024__AGENT>

It may come as a surprise that the root tag isn’t called <AQ$_AGENT> but $ is replaced by _x0024_ because $ is an invalid character for XML element names.

DBMS_XMLSAVE

DBMS_XMLSAVE can be used to INSERT, UPDATE and DELETE data in a table based on XML in Oracle canonical form. It works on a context handle that you obtain, set options, call either INSERTXML, UPDATEXML or DELETEXML and then close the context again.

Example:

select *
from   dept
order  by deptno;

declare
  context dbms_xmlsave.ctxtype;
  row_count integer;
  xml_insert xmltype := xmltype('
<ROWSET>
 <ROW>
  <DEPTNO>50</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>ALBANY</LOC>
 </ROW>
 <ROW>
  <DEPTNO>60</DEPTNO>
  <DNAME>TO_BE_DELETED</DNAME>
  <LOC>N/A</LOC>
 </ROW>
</ROWSET>
  ');
  xml_update xmltype := xmltype('
<ROWSET>
 <ROW>
  <DEPTNO>40</DEPTNO>
  <DNAME>XOPERATIONSETC</DNAME>
  <LOC>NEW YORK</LOC>
 </ROW>
</ROWSET>
  ');
  xml_delete xmltype := xmltype('
<ROWSET>
 <ROW>
  <DEPTNO>60</DEPTNO>
 </ROW>
</ROWSET>
  ');
begin
  -- Create new context for INSERT.
  context := dbms_xmlsave.newcontext('SCOTT.DEPT');
  -- Insert rows.
  row_count := dbms_xmlsave.insertxml(context, xml_insert.getclobval);
  -- Close the context.
  dbms_xmlsave.closecontext(context);
  dbms_output.put_line(row_count || ' row(s) inserted.');

  -- Create new context for UPDATE.
  context := dbms_xmlsave.newcontext('SCOTT.DEPT');
  -- Set primary key column for UPDATE to work.
  dbms_xmlsave.clearupdatecolumnlist(context);
  dbms_xmlsave.setkeycolumn(context, 'DEPTNO');
  -- UPDATE the document.
  row_count := dbms_xmlsave.updatexml(context, xml_update.getclobval);
  -- Close the context.
  dbms_xmlsave.closecontext(context);
  dbms_output.put_line(row_count || ' row(s) updated.');

  -- Create new context for DELETE.
  context := dbms_xmlsave.newcontext('SCOTT.DEPT');
  -- Set primary key column for DELETE to work.
  dbms_xmlsave.clearupdatecolumnlist(context);
  dbms_xmlsave.setkeycolumn(context, 'DEPTNO');
  -- DELETE the document.
  row_count := dbms_xmlsave.deletexml(context, xml_delete.getclobval);
  -- Close the context.
  dbms_xmlsave.closecontext(context);
  dbms_output.put_line(row_count || ' row(s) deleted.');
end;
/

select *
from   dept
order  by deptno;

rollback;

Running the above would produce the following output:

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

2 row(s) inserted.
1 row(s) updated.
1 row(s) deleted.

PL/SQL procedure successfully completed.

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 XOPERATIONSETC NEW YORK
        50 ACCOUNTING     ALBANY

Rollback complete.

Like DBMS_XMLQUERY, DBMS_XMLSAVE is implemented in Java and hence it’s not supported with Oracle Database Express Edition.

DBMS_XMLSTORE

DBMS_XMLSTORE was introduced with Oracle Database 10g Release 1 and should be used instead of DBMS_XMLSAVE in order to work around limitations in the latter, such as unsupported TIMESTAMP fractional seconds with precisions larger than 3 decimal places and also that the time part of DATE values is mandatory. DBMS_XMLSTORE is written in C and provides better performance than DBMS_XMLSAVE. DBMS_XMLSTORE uses NLS settings for date/time values so you must be careful to set these for XML Schema compliance as described elsewhere in this blog post.

Example: As the example for DBMS_XMLSAVE, simply replace DBMS_XMLSAVE with DBMS_XMLSTORE.

DBMS_METADATA

DBMS_METADATA can be used to obtain DDL for your database objects. This can either be returned as DDL or XML. Example:

select dbms_metadata.get_xml('TABLE', 'DEPT') xml
from   dual;

(Output abbreviated for legibility)

DBMS_METADATA.GET_XML('TABLE','DEPT')
------------------------------------------------------------
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <TABLE_T>
   <VERS_MAJOR>1</VERS_MAJOR>
   <VERS_MINOR>3 </VERS_MINOR>
   <OBJ_NUM>73194</OBJ_NUM>
   <SCHEMA_OBJ>
    <OBJ_NUM>73194</OBJ_NUM>
    <DATAOBJ_NUM>73194</DATAOBJ_NUM>
    <OWNER_NUM>84</OWNER_NUM>
    <OWNER_NAME>SCOTT</OWNER_NAME>
    <NAME>DEPT</NAME>
    <NAMESPACE>1</NAMESPACE>
    <!-- Snip -->
   </SCHEMA_OBJ>
   <STORAGE>
    <FILE_NUM>4</FILE_NUM>
    <BLOCK_NUM>130</BLOCK_NUM>
    <TYPE_NUM>5</TYPE_NUM>
    <TS_NUM>4</TS_NUM>
    <BLOCKS>8</BLOCKS>
    <EXTENTS>1</EXTENTS>
    <INIEXTS>8</INIEXTS>
    <MINEXTS>1</MINEXTS>
    <MAXEXTS>2147483645</MAXEXTS>
    <EXTSIZE>128</EXTSIZE>
    <!-- Snip -->
   </STORAGE>
   <COL_LIST>
    <COL_LIST_ITEM>
     <OBJ_NUM>73194</OBJ_NUM>
     <COL_NUM>1</COL_NUM>
     <INTCOL_NUM>1</INTCOL_NUM>
     <SEGCOL_NUM>1</SEGCOL_NUM>
     <PROPERTY>0</PROPERTY>
     <NAME>DEPTNO</NAME>
     <TYPE_NUM>2</TYPE_NUM>
     <LENGTH>22</LENGTH>
     <PRECISION_NUM>2</PRECISION_NUM>
     <SCALE>0</SCALE>
     <NOT_NULL>1</NOT_NULL>
     <!-- Snip -->
    </COL_LIST_ITEM>
   </COL_LIST>
  </TABLE_T>
 </ROW>
</ROWSET>

XSQL

Oracle XSQL pages publishing framework is an extensible platform for publishing XML in multiple formats. It can also be used to call PL/SQL APIs, INSERT/UPDATE/DELETE data based on one (unfortunately only one) posted document. It has been available since Oracle Database 8i Release 3 and is accessible through either a Java Servlet hosted on your application server, the command line (xsql.bat batch file/xsql shell script) or your own Java program. XSQL is not available from inside the database, but you can load the oraclexsql.jar into the database with loadjava and write your own PL/SQL wrapper for it. Example of how to insert two rows into DEPT and query all the rows afterwards:
Contents of dept.xsql:

<page connection="scott" xmlns:xsql="urn:oracle-xsql">
  <xsql:insert-request table="dept"/>
  <xsql:query>
    select *
    from   dept
  </xsql:query>
</page>

Contents of dept.xml:

<ROWSET>
 <ROW>
  <DEPTNO>-1</DEPTNO>
  <DNAME>Dept A</DNAME>
  <LOC>Loc A</LOC>
 </ROW>
 <ROW>
  <DEPTNO>-2</DEPTNO>
  <DNAME>Dept B</DNAME>
  <LOC>Loc B</LOC>
 </ROW>
</ROWSET>

Contents of dept_xsql.bat (adapt this, potentially %ORACLE_HOME%\bin\xsql.bat and XSQLConfig.xml to your environment):

@setlocal
rem %ORACLE_HOME%\xdk\admin\XSQLConfig.xml must be configured with a connection called "scott".
rem Set all CP in C:\Oracle\O112\BIN\xsql.bat
set ORACLE_HOME=c:\oracle\o112
set JAVA_HOME=%ORACLE_HOME%\jdk
set CP=%ORACLE_HOME%\RDBMS\jlib\xdb.jar
xsql.bat dept.xsql posted-xml=dept.xml
@endlocal

After invoking dept_xsql.bat the result is (output wrapped and indented for improved legibility):

<page>
 <xsql-status action="xsql:insert-request" rows="2"/>
 <ROWSET>
  <ROW num="1">
   <DEPTNO>-1</DEPTNO>
   <DNAME>Dept A</DNAME>
   <LOC>Loc A</LOC>
  </ROW>
  <ROW num="2">
   <DEPTNO>-2</DEPTNO>
   <DNAME>Dept B</DNAME>
   <LOC>Loc B</LOC>
  </ROW>
  <ROW num="3">
   <DEPTNO>10</DEPTNO>
   <DNAME>ACCOUNTING</DNAME>
   <LOC>NEW YORK</LOC>
  </ROW>
  <ROW num="4">
   <DEPTNO>20</DEPTNO>
   <DNAME>RESEARCH</DNAME>
   <LOC>DALLAS</LOC>
  </ROW>
  <ROW num="5">
   <DEPTNO>30</DEPTNO>
   <DNAME>SALES</DNAME>
   <LOC>CHICAGO</LOC>
  </ROW>
  <ROW num="6">
   <DEPTNO>40</DEPTNO>
   <DNAME>OPERATIONS</DNAME>
   <LOC>BOSTON</LOC>
  </ROW>
 </ROWSET>
</page>

XML Schema Compliance

In order to be compliant with XML Schema, all DATE and TIMESTAMP values must adhere to the following Oracle formats:

  • DATE: 'yyyy-mm-dd"T"hh24:mi:ss' (for time part as well, note the T between date and time part) or 'yyyy-mm-dd' without time part.
  • TIMESTAMP: 'yyyy-mm-dd"T"hh24:mi:ss.ff9' or 'yyyy-mm-dd"T"hh24:mi:ss.ff'.
  • TIMESTAMP WITH [LOCAL] TIME ZONE: 'yyyy-mm-dd"T"hh24:mi:ss.ff9tzh:tzm' or 'yyyy-mm-dd"T"hh24:mi:ss.fftzh:tzm'.

Unfortunately, Oracle’s DBMS_XMLGEN and XMLTYPE (plus XMLELEMENT, XMLFOREST etc prior to Oracle Database 10g Release 2) do not adhere to the XML Schema standard and instead by default they generate the values according to the current NLS settings. Your session’s NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT and NLS_TIMESTAMP_TZ_FORMAT must be set to one of the values mentioned above in order to work around this.

XML Character Entities

The default behavior is that characters that have a special meaning in XML are converted to their corresponding XML character entities as it would otherwise be invalid XML. Such characters are <, >, " and ', which are converted to &lt;, &gt;, &quot; and &apos; respectively:

select dbms_xmlgen.getxml(q'[select '<>"''' test from dual]') xml
from   dual;

XML
-----------------------------------
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <TEST>&lt;&gt;&quot;&apos;</TEST>
 </ROW>
</ROWSET>

By enclosing the text in CDATA sections you avoid this required escaping as previously mentioned.

Encoding

Encoding of an XML document can be specified in the document prolog, such as:

<?xml version="1.0" encoding="ISO-8859-1"?>
<blogs>
  <blog>
    <url>ellebaek.wordpress.com</url>
    <author>Finn Ellebæk Nielsen</author>
  </blog>
</blogs>

If you omit the encoding, UTF-8 is assumed by default by most XML parsers.

Code

I’ve written a PL/SQL package in order to demonstrate how to generate XML based on Oracle data, using DBMS_XMLGEN, XMLTYPE and a few tricks. My primary goal was to achieve the the following functionality:

  • Get XML from dynamic SQL query with optional support for bind variables.
  • Get XML from open REF CURSOR.
  • Get XML from object type instance.
  • Get XML from collection.
  • Get XML from ANYDATA instance.
  • Get XML from ANYDATASET instance.
  • Get XML from ANYTYPE instance.

The code has been tested with the following releases and editions (all 32-bit versions under Windows XP SP3):

  • Oracle Database 10g Release 1 10.1.0.5.0 Personal Edition.
  • Oracle Database 10g Release 2 10.2.0.4.0 Personal Edition.
  • Oracle Database 10g Release 2 10.2.0.1.0 Express Edition.
  • Oracle Database 11g Release 1 11.1.0.6.0 Personal Edition.
  • Oracle Database 11g Release 2 11.2.0.1.0 Personal Edition.

The first version of this package had the following issues:

  1. It used packaged types for the bind variable collection. This is fine if you use PL/SQL to call the function FROM_SELECT with a bind variable collection. However, it wasn’t possible to call this function directly from a PL/SQL expression or from SQL, so I changed the implementation to use two object types instead.
  2. FROM_REF_CURSOR didn’t generate date values in the XML Schema compliant format. Instead, date values were generated according to the NLS_DATE_FORMAT active when the REF CURSOR was opened, at least on some Oracle versions on some platforms and sometimes only for the first row. This was attempted fixed by calling DBMS_XMLGEN.RESTARTQUERY after having changed the NLS settings but this also proved unstable.
  3. Changing the NLS settings on the fly has the following side effect: Any call to TO_CHAR/TO_DATE/TO_TIMESTAMP etc in the incoming query/REF CURSOR (including PL/SQL functions called) without a specific date/time format will change semantics as the result will adhere to the XML Schema compliant formats (for dates, truncated to the least of the lengths of the original NLS_DATE_FORMAT and the XML compliant NLS_DATE_FORMAT), instead of the original NLS setting in place when calling XML_GEN. This may seem critical, however it’s considered poor practice to call these functions without a specific format.

Anyway, in order to work around the issues with XML Schema-compliant date/time values I decided to fix this for the third updated version of the code. My idea was to implement some code that would do the following:

  • Fetch the dynamic SQL/REF CURSOR into memory or a table.
  • Change the NLS settings for XML Schema-compliance.
  • Open a new REF CURSOR on the copy in memory/table.
  • Call DBMS_XMLGEN on the new REF CURSOR.
  • Change the NLS settings back to the previous session NLS.

This way we would have no side effects as the NLS would be changed after the original data had been fetched. However, this proved to be a major task to implement and in fact I’ve written two separate blog posts about it:

In fact, had I known it would have been that difficult, I would probably have chosen to write an improved version of DBMS_XMLGEN instead. Perhaps a subject for a future blog post or product?

You control the NLS handling through the optional parameter DATE_TIME_OPTION. If using this with a value of either DATE_TIME_XML_SET_NLS_COPY_MEM or DATE_TIME_XML_SET_NLS_COPY_TAB you need the version of REF_CURSOR_COPY from this blog post: Copying/Transforming a REF CURSOR in Oracle 10g+.

Here is the code:

create or replace type bind_variable_t as object (

/**
 * Bind variables: Name and value.
 * @version     $Revision: 1 $
 */

  --- Name.
  name varchar2(30),
  --- Value.
  value varchar2(32767)
);
/

create or replace type bind_variable_c as

/**
 * Unordered collection of bind variables.
 * @version     $Revision: 1 $
 */

table of bind_variable_t;
/

create or replace package xml_gen
authid current_user as
/**
 * XML utilities for converting Oracle data and objects to XML.
 * @version   $Revision: 3 $
 * @author    Finn Ellebaek Nielsen.
 */

  /**
   * Generate date/time values according to current session NLS settings.
   */
  date_time_sess_nls             constant pls_integer := 1;

  /**
   * Generate XML Schema-compliant date/time values. This is done by internally
   * by changing the NLS settings and generating the XML. This has the side
   * effect that any implit/explicit calls to TO_CHAR without a specific format
   * mask will work on the internal NLS setting. Also, for REF CURSORs, the
   * internal NLS setting will not be in effect in all cases, because Oracle
   * pre-allocated bindings to the session NLS in effect when the REF CURSOR was
   * opened.
   */
  date_time_xml_set_nls          constant pls_integer := 2;

  /**
   * Generate XML Schema-compliant date/time values. This is done by internally
   * by changing the NLS settings, restarting the query and generating the XML.
   * For dynamic SQL, this has the side effect that any implit/explicit calls to
   * TO_CHAR without a specific format mask will work on the internal NLS
   * setting. For REF CURSORs, the internal NLS setting sometimes will be in
   * effect because the REF CURSOR was restarted. However, when is very unstable
   * across Oracle versions. On some versions DATE columns will have a truncated
   * value if the session NLS_DATE_FORMAT was shorter than the XML Schema-
   * compliant one, on some it will have a format according to the session
   * NLS_DATE_FORMAT. TIMESTAMP% columns seem to work correctly. On some
   * versions Oracle will throw an error "ORA-01801: date format is too long for
   * internal buffer" if the session NLS format mask in effect when opening the
   * REF CURSOR was shorter that what is required for XML Schema-compliance.
   * This also depends on how many rows are returned by the REF CURSOR.
   * Sometimes it doesn't work correctly for the first row, but it works
   * correctly for subsequent rows.
   * In general, you should avoid using this setting as it's very unstable.
   * Setting your session's NLS_DATE_FORMAT to at least 23 characters makes it
   * slightly more stable.
   */
  date_time_xml_set_nls_restart  constant pls_integer := 3;

  /**
   * Generate XML Schema-compliant date/time values. This is done by internally
   * by fetching the incoming dynamic SQL or REF CURSOR to memory, changing the
   * NLS settings, generating the XML on the memory copy. This has no side
   * effects on implit/explicit calls to TO_CHAR without a specific format mask
   * but the performance is affected.
   * Used with the FROM_SELECT function this is only supported on Oracle 11.1
   * and newer.
   */
  date_time_xml_set_nls_copy_mem constant pls_integer := 4;

  /**
   * Generate XML Schema-compliant date/time values. This is done by internally
   * by fetching the incoming dynamic SQL or REF CURSOR to a global temporary
   * table, changing the NLS settings, generating the XML on the table copy.
   * This has no side effects on implit/explicit calls to TO_CHAR without a
   * specific format mask but the performance is affected.
   * Used with the FROM_SELECT function on Oracle versions older than 11 (or if
   * you're using quoted bind variable names -- necessary because DBMS_SQL does
   * not support quoted bind variable names) this attempts to locate the bind
   * variables in SELECT_STATEMENT and this is not done in a fool proof manner
   * if you're using quoted bind variable names using non-identifier characters.
   */
  date_time_xml_set_nls_copy_tab constant pls_integer := 5;

  function from_select(
    select_statement in varchar2,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

  function from_select(
    select_statement in varchar2,
    bind_variables in bind_variable_c,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

  function from_ref_cursor(
    ref_cursor in out sys_refcursor,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

  function from_anydata(
    ad in anydata,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

  function from_anydataset(
    ads in anydataset,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

  function from_anytype(
    "at" in anytype,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

  procedure set_xml_nls_formats;
  procedure set_session_nls_formats;
end xml_gen;
/

create or replace package body xml_gen as
  --- @version  $Revision: 3 $

  /**
   * XML Schema compliant datetime format, no fraction, no time zone
   * (Oracle DATE).
   */
  xml_nls_date_format constant varchar2(23) :=
      'yyyy-mm-dd"T"hh24:mi:ss';
  /**
   * XML Schema compliant datetime format, with fraction, no time zone (Oracle
   * TIMESTAMP).
   */
  xml_nls_timestamp_format constant varchar2(27) :=
      'yyyy-mm-dd"T"hh24:mi:ss.ff9';
  /**
   * XML Schema compliant datetime format, with fraction, with time zone (Oracle
   * TIMESTAMP WITH [LOCAL] TIME ZONE).
   */
  xml_nls_timestamp_tz_format constant varchar2(34) :=
      'yyyy-mm-dd"T"hh24:mi:ss.ff9tzh:tzm';

  /**
   * Collection of V$NLS_PARAMETERS.VALUE, indexed by
   * V$NLS_PARAMETERS.PARAMETER.
   */
  type nls_parameters_c is table of
  v$nls_parameters.value%type
  index by v$nls_parameters.parameter%type;

  --- Session NLS formats.
  session_nls_parameters nls_parameters_c;
  --- NLS Formats required for XML compliance.
  xml_nls_parameters nls_parameters_c;

  -- Empty bind variable collection.
  bind_variables_empty bind_variable_c;

  function get_xml(
    context in dbms_xmlgen.ctxtype,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype;

/**
 * Gets the result set of a given SELECT statement as XML in Oracle's
 * "canonical form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   select_statement
 *                  SELECT statement.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  Result set as XML.
 */

  function from_select(
    select_statement in varchar2,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype as

  begin
    return from_select(
      select_statement, bind_variables_empty, date_time_option
    );
  end from_select;

/**
 * Gets the result set of a given SELECT statement as XML in Oracle's
 * "canonical form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   select_statement
 *                  SELECT statement with bind variables.
 * @param   bind_variables
 *                  Collection of bind variables to be bound.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  Result set as XML.
 */

  function from_select(
    select_statement in varchar2,
    bind_variables in bind_variable_c,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype as

    context dbms_xmlgen.ctxtype;
    dbms_sql_cursor binary_integer;
    n pls_integer;
    ref_cursor sys_refcursor;
    quoted_bind_variable_names boolean := false;

  begin
    if substr(
         upper(ltrim(select_statement, chr(13) || chr(10) || ' ')),
         1,
         6
       ) != 'SELECT' then
      -- Invalid statement given, not a SELECT. Attempting to prevent injection.
      raise_application_error(
        -20000,
        'xml_gen.from_select(select_statement => ''' ||
          select_statement || '''): ' ||
          'Only SELECT statements are allowed'
      );
    end if;
    if date_time_option in (
          date_time_xml_set_nls_copy_mem,
          date_time_xml_set_nls_copy_tab
        ) then
      if bind_variables is null or bind_variables.count = 0 then
        open ref_cursor for
        select_statement;
      else
        for i in
              nvl(bind_variables.first, 0) ..
              nvl(bind_variables.last, -1)
            loop
          if substr(bind_variables(i).name, 1, 1) = '"' then
            quoted_bind_variable_names := true;
          end if;
        end loop;

        if portable.get_major_version < 11 or quoted_bind_variable_names then
          -- Older than Oracle 11.1. Dynamically build REF CURSOR OPEN
          -- statement. This is not foolproof if quoted bind variable names are
          -- used and we might not find them correctly.
          declare
            select_statement_upper varchar2(32767) := upper(select_statement);
            first_bind_positions dbms_utility.number_array;
            sorted_bind_variables bind_variable_c := bind_variable_c();
            j pls_integer;
            open_statement varchar2(32767);
            next_char varchar2(1);
            quoted_bind_variable_name boolean;
          begin
            for i in
                  nvl(bind_variables.first, 0) ..
                  nvl(bind_variables.last, -1)
                loop
              -- Locate the bind variables in the SELECT statement.
              n := 1;
              quoted_bind_variable_name :=
                substr(bind_variables(i).name, 1, 1) = '"';
              loop
                if quoted_bind_variable_name then
                  -- Case sensitive search, bind variable name quoted..
                  j := instr(
                    select_statement,
                    ':' || bind_variables(i).name,
                    1,
                    n
                  );
                else
                  -- Case insensitive search, bind variable name not quoted.
                  j := instr(
                    select_statement_upper,
                    ':' || upper(bind_variables(i).name),
                    1,
                    n
                  );
                end if;
                -- Ensure that this is the whole word and not part of another
                -- bind variable. Check that next character cannot be part of an
                -- identifier.
                next_char := substr(
                  select_statement_upper,
                  j + length(bind_variables(i).name) + 1,
                  1
                );
                exit when next_char is null or
                    ascii(next_char) <= 32 or
                    instr('+-/*(),|''', next_char) > 0;
                n := n + 1;
              end loop;
              if j = 0 then
                -- Not found, raise error.
                raise_application_error(
                  -20001,
                  'xml_gen.from_select: Bind variable ' ||
                      bind_variables(i).name || ' not found in select_statement'
                );
              end if;
              first_bind_positions(j) := i;
            end loop;

            j := first_bind_positions.first;
            while j is not null loop
              -- Sorted list of bind positions, sort the bind variables
              -- accordingly.
              sorted_bind_variables.extend(1);
              sorted_bind_variables(sorted_bind_variables.count) :=
                  bind_variables(first_bind_positions(j));
              j := first_bind_positions.next(j);
            end loop;

            open_statement :=
                'begin' || chr(10) ||
                '  open :rc for ' || chr(10) ||
                '''' || replace(select_statement, '''', '''''') || '''' || chr(10) ||
                '  using ';

            for i in
                  nvl(sorted_bind_variables.first, 0) ..
                  nvl(sorted_bind_variables.last, -1)
                loop
              open_statement := open_statement ||
                  '''' || replace(sorted_bind_variables(i).value, '''', '''''') ||
                      ''', ';
            end loop;
            open_statement :=
                rtrim(open_statement, ', ') || ';' || chr(10) ||
                'end;';

            execute immediate open_statement
            using in out ref_cursor;
          end;
        else
          -- Oracle 11.1 or newer -- open with DBMS_SQL, then convert to REF
          -- CURSOR. This does not support quoted bind variable names as
          -- DBMS_SQL will raise a "ORA-01008: not all variables bound" even
          -- through it doesn't raise a "ORA-01006: bind variable does not
          -- exist".
          dbms_sql_cursor := dbms_sql.open_cursor;
          dbms_sql.parse(dbms_sql_cursor, select_statement, dbms_sql.native);

          for i in
                nvl(bind_variables.first, 0) ..
                nvl(bind_variables.last, -1)
              loop
            -- Bind each bind variable.
            dbms_sql.bind_variable(
              dbms_sql_cursor,
              bind_variables(i).name,
              bind_variables(i).value
            );
          end loop;

          begin
            n := dbms_sql.execute(dbms_sql_cursor);
          exception
            when others then
              dbms_output.put_line('xml_gen.from_select');
              dbms_output_put_line(
                'select_statement = ''' || select_statement || ''''
              );
              if sqlcode = -1008 then
                -- ORA-01008: not all variables bound.
                for i in
                      nvl(bind_variables.first, 0) ..
                      nvl(bind_variables.last, -1)
                    loop
                  -- Bind each bind variable.
                  dbms_output_put_line(
                    bind_variables(i).name || ' => ' ||
                        '''' || bind_variables(i).value || ''''
                  );
                  if substr(bind_variables(i).name, 1, 1) = '"' then
                    dbms_output.put_line(
                      'ERROR: DBMS_SQL doesn''t support quoted bind ' ||
                          'variable names (EXECUTE IMMEDIATE does)'
                    );
                  end if;
                end loop;
              end if;
              dbms_output.put_line(sqlerrm);
              dbms_output.put_line(dbms_utility.format_error_backtrace);
              raise;
          end;

          execute immediate
          'begin :ref_cursor := dbms_sql.to_refcursor(:dbms_sql_cursor); end;'
          using out ref_cursor, in out dbms_sql_cursor;
        end if;
      end if;

      return from_ref_cursor(ref_cursor, date_time_option);
    else
      context := dbms_xmlgen.newcontext(select_statement);

      if bind_variables is not null then
        for i in
              nvl(bind_variables.first, 0) ..
              nvl(bind_variables.last, -1)
            loop
          -- Bind each bind variable.
          dbms_xmlgen.setbindvalue(
            context,
            bind_variables(i).name,
            bind_variables(i).value
          );
        end loop;
      end if;
    end if;

    return get_xml(context, date_time_option);
  end from_select;

/**
 * Gets the result set of a given REF CURSOR as XML in Oracle's "canonical
 * form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   select_statement
 *                  SELECT statement.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  Result set as XML.
 */

  function from_ref_cursor(
    ref_cursor in out sys_refcursor,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype is

    context dbms_xmlgen.ctxtype;
    result xmltype;
    rc2 sys_refcursor;
    destination char(1);
    type_name user_types.type_name%type;

  begin
    if date_time_option in (
          date_time_xml_set_nls_copy_mem,
          date_time_xml_set_nls_copy_tab
        ) then
      -- Copy to memory/table first, then change NLS, then generate off the
      -- copy.
      if date_time_option = date_time_xml_set_nls_copy_mem then
        destination := 'M';
      else
        destination := 'T';
      end if;

      rc2 := ref_cursor_copy.to_ref_cursor(
        ref_cursor,
        type_name,
        destination,
        plsql_block_before_copy_open => 'xml_gen.set_xml_nls_formats;'
      );

      ref_cursor := rc2;
    end if;

    context := dbms_xmlgen.newcontext(ref_cursor);
    result := get_xml(context, date_time_option);
    begin
      if ref_cursor%isopen then
        close ref_cursor;
      end if;
    exception
      when others then
        null;
    end;
    return result;
  end from_ref_cursor;

/**
 * Gets the result set of a given ANYDATA instance as XML in Oracle's "canonical
 * form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * This is the function to use when you need to obtain XML for object type/
 * collection instances and ANYDATA in general. You use ANYDATA.CONVERTOBJECT
 * and ANYDATA.CONVERTCOLLECTION respectively to convert from object types and
 * collections for this. XMLTYPE can also convert these directly but has several
 * flaws in terms of incurrent DATE/TIMESTAMP formats, missing elements for NULL
 * values, etc.
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   ad      ANYDATA instance.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  ANYDATA instance as XML.
 */

  function from_anydata(
    ad in anydata,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype as

    ref_cursor sys_refcursor;

  begin
    open ref_cursor for
    select ad
    from   dual;

    -- Get rid of surrounding XML due to the SELECT statement.
    return from_ref_cursor(ref_cursor, date_time_option).extract('/*/*/*/*');
  end from_anydata;

/**
 * Gets the result set of a given ANYDATASET instance as XML in Oracle's
 * "canonical form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   ads     ANYDATASET instance.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  ANYDATASET instance as XML.
 */

  function from_anydataset(
    ads in anydataset,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype as

    ref_cursor sys_refcursor;

  begin
    open ref_cursor for
    select ads
    from   dual;

    -- Get rid of surrounding XML due to the SELECT statement.
    return from_ref_cursor(ref_cursor, date_time_option).extract('/*/*/*/*');
  end from_anydataset;

/**
 * Gets the result set of a given ANYTYPE instance as XML in Oracle's "canonical
 * form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   "at"    ANYTYPE instance.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  ANYTYPE instance as XML.
 */

  function from_anytype(
    "at" in anytype,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype as

    ref_cursor sys_refcursor;

  begin
    open ref_cursor for
    select "at"
    from   dual;

    -- Get rid of surrounding XML due to the SELECT statement.
    return from_ref_cursor(ref_cursor, date_time_option).extract('/*/*/*/*');
  end from_anytype;

/**
 * Gets the result set of a given DBMS_XMLGEN context as XML in Oracle's
 * "canonical form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Text columns will have their values converted to XML character entities, also
 * if it's XML, HTML etc. This is not done with XMLTYPE column values as we
 * assume it's valid XML.
 * @param   context DBMS_XMLGEN context created based on either SELECT statement
 *                  or REF CURSOR.
 * @param   date_time_option
 *                  Date/time options. One of the DATE_TIME_% constants from the
 *                  package specification.
 * @return  Result set as XML.
 */

  function get_xml(
    context in dbms_xmlgen.ctxtype,
    date_time_option in pls_integer := date_time_xml_set_nls
  )
  return xmltype as

    result xmltype;

  begin
    if date_time_option in (
          date_time_xml_set_nls,
          date_time_xml_set_nls_restart
        ) then
      set_xml_nls_formats;
    end if;
    if date_time_option = date_time_xml_set_nls_restart then
      dbms_xmlgen.restartquery(context);
    end if;
    dbms_xmlgen.setnullhandling(context, dbms_xmlgen.empty_tag);
    dbms_xmlgen.setconvertspecialchars(context, true);
    result := dbms_xmlgen.getxmltype(context);
    dbms_xmlgen.closecontext(context);
    if date_time_option in (
          date_time_xml_set_nls,
          date_time_xml_set_nls_restart,
          date_time_xml_set_nls_copy_mem,
          date_time_xml_set_nls_copy_tab
        ) then
      set_session_nls_formats;
    end if;

    return result;
  exception
    when others then
      if date_time_option in (
            date_time_xml_set_nls,
            date_time_xml_set_nls_restart,
            date_time_xml_set_nls_copy_mem,
            date_time_xml_set_nls_copy_tab
          ) then
        set_session_nls_formats;
      end if;
      raise;
  end get_xml;

/**
 * Sets specific NLS date/timestamp formats required for XML compliance.
 */

  procedure set_xml_nls_formats is
  pragma autonomous_transaction;

    needs_commit boolean := false;

  begin
    for nls in (
          select /*+ cursor_sharing_exact */
                 parameter,
                 value
          from   v$nls_parameters
          where  parameter like 'NLS_%FORMAT' and
                 parameter not like 'NLS\_TIME\_%' escape '\'
        ) loop
      session_nls_parameters(nls.parameter) := nls.value;
      if nls.value != xml_nls_parameters(nls.parameter) then
        -- Change NLS format for XML-compliance.
        dbms_session.set_nls(
          nls.parameter,
          '''' || xml_nls_parameters(nls.parameter) || ''''
        );

        needs_commit := true;
      end if;
    end loop;

    if needs_commit then
      commit;
    end if;
  end set_xml_nls_formats;

/**
 * Restores NLS date/timestamp formats as it was before setting up for XML
 * compliance.
 */

  procedure set_session_nls_formats is
  pragma autonomous_transaction;

    needs_commit boolean := false;
    nls_parameter v$nls_parameters.parameter%type;

  begin
    nls_parameter := xml_nls_parameters.first;
    while nls_parameter is not null loop
      if xml_nls_parameters(nls_parameter) !=
          session_nls_parameters(nls_parameter) then
        -- Restore original NLS format.
        dbms_session.set_nls(
          nls_parameter,
          '''' || session_nls_parameters(nls_parameter) || ''''
        );

        needs_commit := true;
      end if;

      nls_parameter := xml_nls_parameters.next(nls_parameter);
    end loop;

    if needs_commit then
      commit;
    end if;
  exception
    when no_data_found then
      -- Ignore that we don't have a backup. Presumably because
      -- SET_XML_NLS_FORMATS hasn't been called.
      null;
  end set_session_nls_formats;

begin
  xml_nls_parameters('NLS_DATE_FORMAT') := xml_nls_date_format;
  xml_nls_parameters('NLS_TIMESTAMP_FORMAT') := xml_nls_timestamp_format;
  xml_nls_parameters('NLS_TIMESTAMP_TZ_FORMAT') := xml_nls_timestamp_tz_format;
end xml_gen;
/

Example Usage

Here are a few examples of usage:

set long 100000
set pagesize 50000
set linesize 1000

alter session set nls_date_format = 'dd.mm.yyyy hh24:mi:ss';
alter session set nls_timestamp_format = 'dd.mm.yyyy hh24:mi:ss.ff';
alter session set nls_timestamp_tz_format = 'dd.mm.yyyy hh24:mi:ss.ff tzh:tzm';

variable xml clob

prompt INSERT

begin
  :xml := '';
  :xml := xml_gen.from_select(
    'insert into dept values (50, ''abc'', ''def'')'
  ).getclobval();
end;
/

print :xml

prompt FROM_SELECT

begin
  :xml := '';
  :xml := xml_gen.from_select('select * from dept').getclobval();
end;
/

print :xml

prompt FROM_SELECT with bind variables

begin
  :xml := xml_gen.from_select(
    'select * from emp where empno in (:empno1, :empno2)',
    bind_variable_c(
      bind_variable_t('empno1', '7499'),
      bind_variable_t('empno2', '7934')
    )
  ).getclobval();
end;
/

print :xml

prompt FROM_REF_CURSOR with bind variables

declare
  rc sys_refcursor;
begin
  open rc for
  'select * from emp where empno in (:empno1, :empno2)'
  using 7369, 7499;

  :xml := xml_gen.from_ref_cursor(rc).getclobval();
end;
/

print :xml

-- grant execute on sys.aq$_subscribers to scott;
-- grant execute on sys.aq$_agent to scott;

prompt FROM_ANYDATA object type instance

begin
  :xml := '';
  :xml := xml_gen.from_anydata(
    anydata.convertobject(sys.aq$_agent('name', 'address', 1))
  ).getclobval();
end;
/

print :xml

prompt FROM_ANYDATA collection

begin
  :xml := '';
  :xml := xml_gen.from_anydata(
    anydata.convertcollection(
      sys.aq$_subscribers(
        sys.aq$_agent('name1', 'address1', 1),
        sys.aq$_agent('name2', 'address2', 2)
      )
    )
  ).getclobval();
end;
/

print :xml

prompt FROM_ANYTYPE throws ORA-19201

declare
  a anytype;
begin
  anytype.begincreate(
    typecode => dbms_types.typecode_object,
    atype => a
  );
  a.addattr(
    typecode => dbms_types.typecode_number,
    aname => 'n',
    prec => 38,
    scale => 0,
    len => null,
    csid => null,
    csfrm => null
  );
  a.addattr(
    typecode => dbms_types.typecode_date,
    aname => 'd',
    prec => 5,
    scale => 5,
    len => null,
    csid => null,
    csfrm => null
  );
  a.endcreate;

  begin
    :xml := '';
    :xml := xml_gen.from_anytype(a).getclobval();
  exception
    when others then
      dbms_output.put_line(sqlerrm);
  end;

  begin
    select sys_xmlgen(a).getclobval()
    into   :xml
    from   dual;
  exception
    when others then
      dbms_output.put_line(sqlerrm);
  end;
end;
/

print :xml

prompt FROM_ANYDATASET throws ORA-19201

declare
  a anytype;
  ads anydataset;
begin
  anytype.begincreate(
    typecode => dbms_types.typecode_object,
    atype => a
  );
  a.addattr(
    typecode => dbms_types.typecode_number,
    aname => 'n',
    prec => 38,
    scale => 0,
    len => null,
    csid => null,
    csfrm => null
  );
  a.addattr(
    typecode => dbms_types.typecode_varchar2,
    aname => 'vc2',
    prec => null,
    scale => null,
    len => 10,
    csid => null,
    csfrm => null
  );
  a.endcreate;

  anydataset.begincreate( dbms_types.typecode_object, a, ads);
  for i in 1 .. 2 loop
    ads.addinstance;
    ads.piecewise;
    ads.setnumber(i);
    ads.setvarchar2('vc2 ' || i);
  end loop;
  ads.endcreate;

  begin
    :xml := '';
    :xml := xml_gen.from_anydataset(ads).getclobval();
  exception
    when others then
      dbms_output.put_line(sqlerrm);
  end;

  begin
    select sys_xmlgen(ads).getclobval()
    into   :xml
    from   dual;
  exception
    when others then
      dbms_output.put_line(sqlerrm);
  end;
end;
/

print :xml

Source Code

You can download the source code here.
Feel free to use the code at your own risk. I welcome your feedback and suggestions for improvements but the code as such is not supported.

Conclusion

Did I achieve my goals? Not entirely, but close. I’ll describe this in detail in the following sections.

Object Type Instance

Object type instance: Unfortunately, Oracle doesn’t allow us to write custom PL/SQL code that can receive or return every possible object type instance. Oracle reserves the datatypes "<ADT_1>" (ADT is an acronym for Abstract Data Type, another term used for object types) and "<COLLECTION_1>" for SYS objects so we’re not allowed to use them. This is very unfortunate, as in this case we could just pass them on to other Oracle functionality that would know what to do with them.

We have a few workarounds:

  • Convert the object type instance and collection to an ANYDATA instance by calling the ANYDATA.CONVERTOBJECT and ANYDATA.CONVERTCOLLECTION “constructors” respectively and then calling XML_GEN.FROM_ANYDATA.
  • Add an overloaded FROM_OBJECT function in XML_GEN for each object type you need to support. This should simply convert the object type instance to an ANYDATA instance (by calling the ANYDATA.CONVERTOBJECT “constructor”) and invoke XML_GEN.FROM_ANYDATA. If your object types inherit from a few base object types this would be easily done by just adding overloads for the base object types.
  • Add a member function to each of the object types called TO_XML, which basically does what the overloaded FROM_OBJECT function mentioned above would do.

ANYTYPE and ANYDATASET

ANYTYPE and ANYDATASET are not supported by DBMS_XMLGEN and any of the other XML generating functionality. Oracle throws the following error if you invoke with any of those datatypes:

ORA-19201: Datatype not supported

I considered implementing a workaround that would generate the XML by recursively inspecting the structure (and values for ANYDATASET) but I’ll leave this for a future version of my XML_GEN package.

Related Blog Posts

A few other related and interesting blog posts are (please notify me if you know of other interesting, relevant blog posts on this subject and I’ll add them to this list):

References

Converting a LONG Column to a CLOB on the fly

Background

The arcane LONG datatype has as you’re aware of many intrinsic limitations but unfortunately Oracle still uses quite a few LONG columns in the data dictionary. It’s beyond me why these haven’t been converted to CLOB a very long time ago (when Oracle deprecated usage of LONG). At the same time, Oracle only provides the following means of converting these LONG columns to eg CLOB:

  1. TO_LOB: A function that converts a LONG column to a CLOB. Can only be used in a SELECT list of a subquery in an INSERT statement. This means that the function is only useful if you’re converting the underlying table definition.
  2. ALTER TABLE <table_name> MODIFY <long_column_name> CLOB: This converts the column datatype and the data as well.
  3. Use DBMS_REDEFINITION to redefine the column datatype.
  4. Use Oracle Data Pump to convert the column datatype.
  5. Use CAST function to cast from LONG to CLOB: Unfortunately, CAST doesn’t support LONG.
  6. Write a PL/SQL function that performs a SELECT ... INTO l FROM user_views, where l is a PL/SQL variable of type LONG. However, in PL/SQL, a LONG variable can only hold up to 32,760 characters (yes, not 32,767, ie different from VARCHAR2) so this will only solve the problem for small to medium sized views.
  7. Use dynamic SQL with DBMS_SQL.COLUMN_VALUE_LONG to access the LONG piecewise.

Problem

Imagine the following scenario:

  • You need to access the source of a given view.
  • USER_VIEWS.TEXT contains the view source but it is a column with datatype LONG so very limited in use.
  • Using DBMS_METADATA to obtain the source is not an option for various reasons, eg performance, practicality, etc.

Solution

Seen that we need to access the data from the view, we cannot use TO_LOB or any of the other solutions that alter the underlying table column definition. However, the DBMS_SQL.COLUMN_VALUE_LONG function comes to the rescue as this allows us to fetch the LONG data piecewise and construct a CLOB with the same data. This obviously means that we have to use dynamic SQL for the query of the table/view we’re trying to convert the LONG column for.

We have two different methods of applying dynamic SQL to this problem:

  1. Dynamic SQL for just the LONG column. We need to write a function that receives the primary key value(s) of the underlying table/view (could be ROWID if a table) as input parameter(s), builds a SELECT statement for the underlying table/view for the LONG column using bind variable(s) for the primary key value(s), does the DBMS_SQL magic and uses the DBMS_SQL.COLUMN_VALUE_LONG function. For scalability, the solution should keep a collection of parsed statements and re-use those without re-parsing.
  2. Dynamic SQL for the whole underlying table/view generated and called in a pipelined table function that utilizes object types.

For this blog post I’ll use solution 2, for which I’ll demonstrate the following:

  1. Create a standalone function with “LONG-to-CLOB” functionality.
  2. Create an object type with attributes corresponding to USER_VIEWS. This object type will have member methods used with DBMS_SQL.
  3. Create an object type collection based on the object type from 2.
  4. Create a standalone function that takes an optional argument for a WHERE clause used against USER_VIEWS, using a combination of DBMS_SQL and 1., 2., and 3. above. This method is a pipelined table function.
  5. Optionally create a view on top of the standalone function from 4.

You can apply the structure of this solution in use cases where you need to access a LONG value in a table/view as a CLOB, without converting the underlying persistent column: The standalone function from 1. above is generic — items 2. through 5. are specific to the underlying table/view.

We’ll go through the different parts in the following sections.

“LONG-to-CLOB” Function

This is the function that converts a LONG column to a CLOB value through a DBMS_SQL cursor that has been parsed, prepared (given column “defined” with DBMS_SQL.DEFINE_COLUMN_LONG) and executed:

create or replace function long_to_clob(
  dbms_sql_cursor in integer,
  col_id in integer
)
return clob as

/**
 * Fetches LONG column value and converts it to a CLOB.
 * @param   dbms_sql_cursor
 *                  DBMS_SQL cursor parsed, prepared (given column "defined"
 *                  with DBMS_SQL.DEFINE_COLUMN_LONG) and executed.
 * @param   col_id  Column ID.
 * @return  LONG column value as a CLOB.
 */

  long_val long;
  long_len integer;
  buf_len  integer := 32760;
  cur_pos  number := 0;

  result   clob;

begin
  -- Create CLOB.
  dbms_lob.createtemporary(result, false, dbms_lob.call);

  -- Piecewise fetching of the LONG column, appending to the CLOB.
  loop
    dbms_sql.column_value_long(
      dbms_sql_cursor,
      col_id,
      buf_len,
      cur_pos,
      long_val,
      long_len
    );
    exit when long_len = 0;
    dbms_lob.append(result, long_val);
    cur_pos := cur_pos + long_len;
  end loop;

  return result;
end long_to_clob;
/

Object Type

In Oracle 11.2.0.1.0, USER_VIEWS has the following columns:

SQL> desc user_views
 Name             Null?    Type
 ---------------- -------- --------------
 VIEW_NAME        NOT NULL VARCHAR2(30)
 TEXT_LENGTH               NUMBER
 TEXT                      LONG
 TYPE_TEXT_LENGTH          NUMBER
 TYPE_TEXT                 VARCHAR2(4000)
 OID_TEXT_LENGTH           NUMBER
 OID_TEXT                  VARCHAR2(4000)
 VIEW_TYPE_OWNER           VARCHAR2(30)
 VIEW_TYPE                 VARCHAR2(30)
 SUPERVIEW_NAME            VARCHAR2(30)
 EDITIONING_VIEW           VARCHAR2(1)
 READ_ONLY                 VARCHAR2(1)

which means that we could create our object type like the following (notice how TEXT is represented by a CLOB instead of the original LONG):

create or replace type user_views_t as object (
/**
 * Object type representing columns in data dictionary view USER_VIEWS, with the
 * TEXT column represented by a CLOB instead of a LONG.
 */

  view_name        varchar2(30),
  text_length      number,
  -- CLOB instead of LONG.
  text             clob,
  type_text_length number,
  type_text        varchar2(4000),
  oid_text_length  number,
  oid_text         varchar2(4000),
  view_type_owner  varchar2(30),
  view_type        varchar2(30),
  superview_name   varchar2(30),
  editioning_view  varchar2(1),
  read_only        varchar2(1),

  constructor function user_views_t
  return self as result,
  constructor function user_views_t(dbms_sql_cursor in integer)
  return self as result,

  member procedure define_columns(dbms_sql_cursor in integer)
);
/

The object type has two constructors and one member function. The member function is used to define the columns for DBMS_SQL and in order to be able to use the attributes, this has to be a member function (working on an object type instance as opposed to a static function) and the first constructor is used to create such a dummy instance, with all the attributes set to NULL. The second constructor sets all attributes to corresponding column values in a given fetched DBMS_SQL cursor.

This is the implementation of the object type body:

create or replace type body user_views_t as
/**
 * Constructor. Sets all attributes to NULL.
 * @return  New object type instance.
 */

  constructor function user_views_t
  return self as result as

  begin
    return;
  end user_views_t;

/**
 * Constructor. Sets all attributes to corresponding column values in fetched
 * DBMS_SQL cursor.
 * @param   dbms_sql_cursor
 *                  Executed and fetched DBMS_SQL cursor on a query from
 *                  USER_VIEWS.
 * @return  New object type instance.
 */

  constructor function user_views_t(dbms_sql_cursor in integer)
  return self as result as

  begin
    dbms_sql.column_value(dbms_sql_cursor, 01, view_name);
    dbms_sql.column_value(dbms_sql_cursor, 02, text_length);
    -- Convert LONG to CLOB.
    text := long_to_clob(dbms_sql_cursor,  03);
    dbms_sql.column_value(dbms_sql_cursor, 04, type_text_length);
    dbms_sql.column_value(dbms_sql_cursor, 05, type_text);

    dbms_sql.column_value(dbms_sql_cursor, 06, oid_text_length);
    dbms_sql.column_value(dbms_sql_cursor, 07, oid_text);
    dbms_sql.column_value(dbms_sql_cursor, 08, view_type_owner);
    dbms_sql.column_value(dbms_sql_cursor, 09, view_type);
    dbms_sql.column_value(dbms_sql_cursor, 10, superview_name);

    dbms_sql.column_value(dbms_sql_cursor, 11, editioning_view);
    dbms_sql.column_value(dbms_sql_cursor, 12, read_only);

    return;
  end user_views_t;

/**
 * Defines all columns in DBMS_SQL cursor.
 * @param   Parsed DBMS_SQL cursor on a query from USER_VIEWS.
 */

  member procedure define_columns(dbms_sql_cursor in integer) as

  begin
    dbms_sql.define_column(dbms_sql_cursor, 01, view_name, 30);
    dbms_sql.define_column(dbms_sql_cursor, 02, text_length);
    -- LONG column.
    dbms_sql.define_column_long(dbms_sql_cursor, 03);
    dbms_sql.define_column(dbms_sql_cursor, 04, type_text_length);
    dbms_sql.define_column(dbms_sql_cursor, 05, type_text, 4000);

    dbms_sql.define_column(dbms_sql_cursor, 06, oid_text_length);
    dbms_sql.define_column(dbms_sql_cursor, 07, oid_text, 4000);
    dbms_sql.define_column(dbms_sql_cursor, 08, view_type_owner, 30);
    dbms_sql.define_column(dbms_sql_cursor, 09, view_type, 30);
    dbms_sql.define_column(dbms_sql_cursor, 10, superview_name, 30);

    dbms_sql.define_column(dbms_sql_cursor, 11, editioning_view, 1);
    dbms_sql.define_column(dbms_sql_cursor, 12, read_only, 1);
  end define_columns;
end;
/

Notice how the second constructor uses our LONG_TO_CLOB function.

The process of matching the original table/view columns into attributes and handling of these in one of the constructors and the DEFINE_COLUMNS method is tedious and error prone. If you need to do this often you should consider writing a generator that generates the object type specification and body based on the definition of a given table/view.

Object Type Collection

This is the implementation of the object type collection, using a nested table:

create or replace type user_views_c as
table of user_views_t;
/

We need this object type collection for the pipelined table function as this pipes back a collection of object type instances back to the SQL engine — namely, one object type instance for each row in USER_VIEWS this function finds.

Pipelined Table Function

This is the implementation of the standalone pipelined table function that takes an optional parameter to be matched against the VIEW_NAME column in a LIKE expression and uses DBMS_SQL to parse the query from USER_VIEWS, uses the USER_VIEWS_T and USER_VIEWS_C object types (which in turn calls LONG_TO_CLOB) and pipes the rows back to the SQL engine (type PTF suffix refers to “Pipelined Table Function”):

create or replace function user_views_ptf(
  view_name_like in varchar2 := '%'
)
return user_views_c pipelined as

/**
 * Gets collection of user views representing rows in USER_VIEWS. Resolved
 * through a dynamic SQL query against USER_VIEWS and the TEXT column is
 * converted from a LONG to a CLOB.
 * @param   view_name_like
 *                  LIKE expression used in a WHERE clause predicate against
 *                  USER_VIEWS.VIEW_NAME. Default '%', ie all.
 * @return  Collection that can be used in a FROM clause with a TABLE() cast.
 */

  query           varchar2(200);
  dbms_sql_cursor binary_integer;
  n               pls_integer;
  each            user_views_t;

begin
  query :=
     'select * ' ||
     'from   user_views uv ' ||
     'where  uv.view_name like :view_name_like';

  -- Create cursor, parse and bind.
  dbms_sql_cursor := dbms_sql.open_cursor;
  dbms_sql.parse(dbms_sql_cursor, query, dbms_sql.native);
  dbms_sql.bind_variable(dbms_sql_cursor, 'view_name_like', view_name_like);

  -- Define columns through dummy object type instance.
  each := user_views_t();
  each.define_columns(dbms_sql_cursor);

  -- Execute.
  n := dbms_sql.execute(dbms_sql_cursor);

  -- Fetch all rows, pipe each back.
  while dbms_sql.fetch_rows(dbms_sql_cursor) > 0 loop
    each := user_views_t(dbms_sql_cursor);

    pipe row(each);
  end loop;

  dbms_sql.close_cursor(dbms_sql_cursor);
exception
  when others then
    dbms_output.put_line('long_to_clob: ' || sqlerrm);
    dbms_output.put_line(dbms_utility.format_error_backtrace);
    if dbms_sql.is_open(dbms_sql_cursor) then
      dbms_sql.close_cursor(dbms_sql_cursor);
    end if;

    raise;
end user_views_ptf;
/

It’s inconvenient that we can’t create this function as a member function on USER_VIEWS_T where it really belongs. However this is not possible because that would introduce a cyclical dependency between USER_VIEWS_T and USER_VIEWS_C, which is not allowed (even using forward object type declarations).

SELECT

You can select from the pipelined table function with a TABLE expression, such as:

select *
from   table(user_views_ptf('V%'));

Installation

You need to install the solution objects in the following order:

  1. LONG_TO_CLOB function.
  2. USER_VIEWS_T object type specification.
  3. USER_VIEWS_T object type body.
  4. USER_VIEWS_C object type collection.
  5. USER_VIEWS_PTF function.

Test

A small test case:

create or replace view v1 as
select 1 n,
       'abc' vc2,
       sysdate d,
       systimestamp(6) t
from   dual;

SQL> select view_name,
  2         text
  3  from   table(user_views_ptf('V%'));

VIEW_NAME                      TEXT
------------------------------ ------------------------
V1                             select 1 n,
                                      'abc' vc2,
                                      sysdate d,
                                      systimestamp(6) t
                               from   dual

View

You can optionally create a view on top of the pipelined table function. In this case, you cannot push the LIKE expression into the argument to the table function so the optimizer has no alternative to perform a full table scan on USER_VIEWS and then a match on the returned rows on whatever predicate the view is used with.

Maintenance

Every time you need to convert a LONG column in a table/view to a CLOB you need to do the following:

  1. Create the object type with attributes corresponding to the columns of the table/view you need to obtain data for. The object type needs two constructors and the member function DEFINE_COLUMNS as for the USER_VIEWS_T object type.
  2. Create the object type collection for 1.
  3. Create the standalone pipelined table function.
  4. Optionally create a view on top of 3.

If you prefer to use packages over standalone functions, you could bundle them up in a package called LONG_TO_CLOB and rename the function LONG_TO_CLOB to TO_CLOB.