Comparing XML in Oracle

Introduction

This blog post is about comparing a specific type of XML documents with the purpose of obtaining details about their differences, within an Oracle database. This specific type is documents representing table data/resultsets conforming to Oracle’s canonical form, ie XML with the following structure (the comments <!-- ... --> are only included for illustration):

<ROWSET>
  <ROW><!-- Row 1 -->
    <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
    <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
    <!-- ... -->
    <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
  </ROW>
  <ROW><!-- Row 2 -->
    <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
    <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
    <!-- ... -->
    <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
  </ROW>
  <!-- ... -->
  <ROW><!-- Row n -->
    <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
    <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
    <!-- ... -->
    <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
   </ROW>
</ROWSET>

So basically there’s an XML element for each column and the name of the element corresponds to the column name. There’s a <ROW> fragment for each row in the resultset and the root element is called <ROWSET> (all XML documents must have at most one root element).

The diffing method I’m going to describe and implement in this blog post is focusing on a limited version of Oracle canonical form, namely it is assumed that:

  • None of the columns in the table are using object types, including collections and XMLTYPE.
  • Object tables are not used.
  • XML element attributes are not used.
  • NULL values are represented by an empty tag, eg <ENAME/> or <ENAME></ENAME>.
  • We regard the document order to be significant — like comparing sets in SQL.

This means that we can assume that all the column values are atomic, single and simple values, which means that the comparison implementation can be simplified.

The primary use case for this XML is comparison (or XML diffing) for automated unit/regression tests that need to verify that resultsets from the Program Under Test (PUT) are identical to a given expected outcome. It’s a convenient way to specify the expected outcome in XML and the PUT resultsets can then be converted to XML and compared. My blog post Converting Between Oracle Data and XML describes how to convert the Oracle data to XML.

The differ should return NULL in case there are no differences and it must support node values larger than 32KB.

The code in this blog post requires Oracle Database 11g Release 1 or newer and has been tested on the following versions (both on Windows XP Service Pack 3 32-bit):

  • Oracle Database 11g Release 1 11.1.0.6.0.
  • Oracle Database 11g Release 2 11.2.0.1.0.

In a future follow up to this blog post I’ll implement a more generic XML differ that will work with Oracle Database 10g Release 1 or newer.

Example

Here’s an example of what I would like to be able to do. Given the following two XML documents:

<ROWSET>
  <ROW>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
    <LOC>BOSTON</LOC>
  </ROW>
</ROWSET>

and

<ROWSET>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>XOPERATIONSETC</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>50</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>ALBANY</LOC>
  </ROW>
</ROWSET>

I would like an XML differ to pick up the following differences and only those:

  • Department 10 has been removed.
  • Department 50 has been added.
  • Department 40’s name and location have changed.

The XML differ should return this in a structured form, preferably in XML, perhaps something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Differencing export generated by Altova® DiffDog® version 2009 sp1
  Enterprise Edition - for more information please visit www.altova.com -->
<diff_result>
  <diff_info comparison_mode="text_or_xml">
    <source_left name="c:\temp\xml1.xml" uri="file:///c:/temp/xml1.xml"/>
    <source_right name="c:\temp\xml2.xml" uri="file:///c:/temp/xml2.xml"/>
  </diff_info>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET"/>
      <position>1</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET"/>
      <position>1</position>
    </right_location>
    <left_content>
      <element>
        <ROW>
          <DEPTNO>10</DEPTNO>
          <DNAME>ACCOUNTING</DNAME>
          <LOC>NEW YORK</LOC>
        </ROW>
      </element>
    </left_content>
  </xml_diff>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET/ROW[4]/DNAME"/>
      <position>1</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET/ROW[3]/DNAME"/>
      <position>1</position>
    </right_location>
    <left_content>
      <element>OPERATIONS</element>
    </left_content>
    <right_content>
      <element>XOPERATIONSETC</element>
    </right_content>
  </xml_diff>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET/ROW[4]/LOC"/>
      <position>1</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET/ROW[3]/LOC"/>
      <position>1</position>
    </right_location>
    <left_content>
      <element>BOSTON</element>
    </left_content>
    <right_content>
      <element>NEW YORK</element>
    </right_content>
  </xml_diff>
  <xml_diff>
    <left_location>
      <parent xpath="/ROWSET"/>
      <position>5</position>
    </left_location>
    <right_location>
      <parent xpath="/ROWSET"/>
      <position>4</position>
    </right_location>
    <right_content>
      <element>
        <ROW>
          <DEPTNO>50</DEPTNO>
          <DNAME>ACCOUNTING</DNAME>
          <LOC>ALBANY</LOC>
        </ROW>
      </element>
    </right_content>
  </xml_diff>
</diff_result>

which is what Altova DiffDog 2009 returns, or even better:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="10"]]]></XPATH>
    <OPERATION>DELETE</OPERATION>
    <VALUE1><![CDATA[<ROW>
  <DEPTNO>10</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>NEW YORK</LOC>
</ROW>]]></VALUE1>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="50"]]]></XPATH>
    <OPERATION>INSERT</OPERATION>
    <VALUE2><![CDATA[<ROW>
  <DEPTNO>50</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>ALBANY</LOC>
</ROW>]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/DNAME]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[OPERATIONS]]></VALUE1>
    <VALUE2><![CDATA[XOPERATIONSETC]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/LOC]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[BOSTON]]></VALUE1>
    <VALUE2><![CDATA[NEW YORK]]></VALUE2>
  </DIFFERENCE>
</DIFFERENCES>

which provides the same information but in a more compact and useful format.

Alternatives

For XML diffing within the Oracle database we have various alternatives:

  • Open source PL/SQL XML Compare Project: DBI_XML_COMPAREDBI_XML_COMPARE.GET_XML_DIFFERENCES.
  • Commercial software Quest Code Tester for Oracle’s built-in XML differ: QU_XML.GET_XML_DIFFERENCES.
  • Open source Java code that could be loaded into the database as Java stored procedures, eg XMLUnit.
  • Oracle’s XMLDIFF function that was introduced in Oracle Database 11g Release 1.
  • Write your own from scratch.

Let’s look at each of these alternatives in turn.

PL/SQL XML Compare Project

The function returns a collection of records describing the nodes of the two documents that differ. You can then traverse the result to understand the differences and you could also convert this to XML. Let’s look at what DBI_XML_COMPAREDBI_XML_COMPARE.GET_XML_DIFFERENCES returns when converted to XML using the following code snippet:

create or replace function xmlcdata2(
  c in clob
)
return clob as

/**
 * Wraps CLOB value up in a CDATA section. Oracle's built-in XMLCDATA function
 * was not introduced until 10.2, doesn't supporting "chaining" CDATA sections
 * and doesn't support CLOBs.
 * @param   c       CLOB to be wrapped up in a CDATA section. CDATA section
 *                  chaining is used if C contains the sequence ']]>'.
 * @return  CDATA section.
 */

  result clob;

begin
  dbms_lob.createtemporary(result, false, dbms_lob.call);
  dbms_lob.append(result, '<![CDATA[');
  if c is not null then
    -- Append the value. Replace embedded ]]> -- CDATA chaining.
    dbms_lob.append(result, replace(c, ']]>', ']]]]><![CDATA[>'));
  end if;
  dbms_lob.append(result, ']]>');

  return result;
end xmlcdata2;
/

create or replace function xmldiff_dbi_xml_compare(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Compares two XML documents and returns the differences in XML. Based on
 * DBI_XML_COMPARE.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Document describing differences based on an XML version of what is
 *          return from DBI_XML_COMPARE.GET_XML_DIFFERENCES. NULL if no
 *          differences.
 */

  result clob;
  differences dbi_xml_compare.t_nodes;

begin
  differences := dbi_xml_compare.get_xml_differences(
    xml1,
    xml2,
    p_ignore_whitespace => dbi_xml_compare.gc_whitespace_normalize
  );

  if differences is not null and differences.count > 0 then
    dbms_lob.createtemporary(result, true, dbms_lob.call);
    dbms_lob.append(result, '<dbi_xml_compare>');
  end if;

  for i in nvl(differences.first, 0) .. nvl(differences.last, -1) loop
    dbms_lob.append(result, '<difference>');
    dbms_lob.append(result, '<document_id>');
    dbms_lob.append(result, xmlcdata2(to_char(differences(i).document_id)));
    dbms_lob.append(result, '</document_id>');
    dbms_lob.append(result, '<node_id>');
    dbms_lob.append(result, xmlcdata2(differences(i).node_id));
    dbms_lob.append(result, '</node_id>');
    dbms_lob.append(result, '<node_values>');

    for j in
        nvl(differences(i).node_values.first, 0) ..
        nvl(differences(i).node_values.last, -1) loop
      dbms_lob.append(result, '<node_value id="' || j || '">');
      dbms_lob.append(result, xmlcdata2(differences(i).node_values(j)));
      dbms_lob.append(result, '</node_value>');
    end loop;

    dbms_lob.append(result, '</node_values>');
    dbms_lob.append(
      result,
      '<difference_type>' ||
        case differences(i).difference_type
          when dbi_xml_compare.gc_identical_value then
            'dbi_xml_compare.gc_identical_value'
          when dbi_xml_compare.gc_different_value then
            'dbi_xml_compare.gc_different_value'
          when dbi_xml_compare.gc_no_equivalent_node then
            'dbi_xml_compare.gc_no_equivalent_node'
          when dbi_xml_compare.gc_ambiguous_element then
            'dbi_xml_compare.gc_ambiguous_element'
        end ||
      '</difference_type></difference>'

    );
  end loop;

  if differences is not null and differences.count > 0 then
    dbms_lob.append(result, '</dbi_xml_compare>');
  end if;

  if result is not null then
    return xmltype(result);
  end if;

  return null;
end xmldiff_dbi_xml_compare;
/

we would get the following differences for our example (abbreviated for legibility, a total of 48 differences were found):

<dbi_xml_compare>
  <difference>
    <document_id><![CDATA[1]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{1}/LOC{NEW YORK}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <difference>
    <document_id><![CDATA[1]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/LOC{BOSTON}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <!-- snip -->
  <difference>
    <document_id><![CDATA[1]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{1}/DEPTNO{10}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <difference>
    <document_id><![CDATA[2]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/DNAME{ACCOUNTING}/#text]]></node_id>
    <node_values>
      <node_value id="1"><![CDATA[ACCOUNTING]]></node_value>
    </node_values>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <difference>
    <document_id><![CDATA[2]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/LOC{ALBANY}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
  <!-- snip -->
  <difference>
    <document_id><![CDATA[2]]></document_id>
    <node_id><![CDATA[/#document/ROWSET{1}/ROW{4}/DNAME{ACCOUNTING}]]></node_id>
    <node_values/>
    <difference_type>dbi_xml_compare.gc_no_equivalent_node</difference_type>
  </difference>
</dbi_xml_compare>

which clearly is not what we need — basically the differ got so confused that it couldn’t match anything.

This was tested with DBI_XML_COMPARE version 1.3. DBI_XML_COMPARE does not support element values larger than 32KB.

Quest Code Tester for Oracle

Like DBI_XML_COMPAREDBI_XML_COMPARE.GET_XML_DIFFERENCES, QU_XML.GET_XML_DIFFERENCES returns a collection of records. Using the following snippet to convert this to XML:

create or replace function xmldiff_qu_xml(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Compares two XML documents and returns the differences in XML. Based on
 * QU_XML.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Document describing differences based on an XML version of what is
 *          return from QU_XML.GET_XML_DIFFERENCES. NULL if no differences.
 */

  nodes qu_xml.t_nodes;
  value varchar2(32767);

  result clob;

begin
  nodes := qu_xml.get_xml_differences(xml1, xml2);

  if nodes.count > 0 then
    dbms_lob.createtemporary(result, true, dbms_lob.call);
    dbms_lob.append(result, '<qu_xml>');
    for i in nvl(nodes.first, 0) .. nvl(nodes.last, -1) loop
      if nodes(i).node_values.count > 0 then
        value := nodes(i).node_values(nodes(i).node_values.first);
      else
        value := '';
      end if;
      dbms_lob.append(result, '<diff_node>');
      dbms_lob.append(result, '<name>');
      dbms_lob.append(result, xmlcdata2(nodes(i).node_id));
      dbms_lob.append(result, '</name>');
      dbms_lob.append(result, '<value>');
      dbms_lob.append(result, xmlcdata2(value));
      dbms_lob.append(result, '</value>');
      dbms_lob.append(result, '<difference>' ||
          case nodes(i).difference_type
            when 1 then
              'Value difference'
            when 2 then
              'Node removed/added'
            when 3 then
              'Ambiguous'
          end ||
          '</difference></diff_node>'
      );
    end loop;
    dbms_lob.append(result, '</qu_xml>');
  end if;

  if result is not null then
    return xmltype(result);
  end if;

  return null;
end xmldiff_qu_xml;
/

we would get the following differences (abbreviated for legibility, a total of 48 differences were found):

<qu_xml>
  <diff_node>
    <name><![CDATA[/#document/ROWSET{1}/ROW{1}/LOC{NEW YORK}]]></name>
    <value><![CDATA[]]></value>
    <difference>Node removed/added</difference>
  </diff_node>
  <diff_node>
    <name><![CDATA[/#document/ROWSET{1}/ROW{4}/LOC{BOSTON}]]></name>
    <value><![CDATA[]]></value>
    <difference>Node removed/added</difference>
  </diff_node>
  <!-- snip -->
  <diff_node>
    <name><![CDATA[/#document/ROWSET{1}/ROW{4}/DNAME{ACCOUNTING}]]></name>
    <value><![CDATA[]]></value>
    <difference>Node removed/added</difference>
  </diff_node>
</qu_xml>

This was tested with Quest Code Tester for Oracle 1.9.1.505. QU_XML does not support element values larger than 32KB.

Open Source Java Code

It would be convenient if we could find an open source Java project that meets our requirements. If one was found, we could load it into the database using loadjava, assuming that the JDK used by the Java code is compatible with the JDK version of our Oracle database. One such open source project to investigate is XMLUnit, which is an extension to JUnit — the Java unit testing framework.

Here’s some Java code that uses XMLUnit for comparison of our example (setup of the XML documents has been left out):

package com.appatra.blog;

import com.topologi.diffx.DiffXException;

import java.io.IOException;

import java.util.List;

import javax.xml.transform.TransformerConfigurationException;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.Difference;
import org.custommonkey.xmlunit.XMLTestCase;

import org.jdom.JDOMException;

import org.xml.sax.SAXException;

/**
 * Test of XMLUnit's capabilities of XML comparison.
 */

public class XmlDiffXmlUnit extends XMLTestCase {
  public XmlDiffXmlUnit() {
    super("XmlDiffXmlUnit");
  }

  private void test()
  throws SAXException, IOException {
    String xml1 = /* Left out... */;
    String xml2 = /* Left out... */;

    DetailedDiff myDiff = new DetailedDiff(compareXML(xml2, xml1));
    List<Difference> allDifferences = (List<Difference>)myDiff.getAllDifferences();

    for (Difference diff : allDifferences) {
      System.err.println(diff);
    }
  }

  public static void main(String[] args)
  throws SAXException, IOException {
    XmlDiffXmlUnit xmlDiffXmlUnit = new XmlDiffXmlUnit();

    xmlDiffXmlUnit.test();
  }
}

This produces the following output (12 differences, wrapped for improved legibility):

Expected text value '20' but was '10' -
  comparing <DEPTNO ...>20</DEPTNO> at
  /ROWSET[1]/ROW[1]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>10</DEPTNO> at /ROWSET[1]/ROW[1]/DEPTNO[1]/text()[1]
Expected text value 'RESEARCH' but was 'ACCOUNTING' -
  comparing <DNAME ...>RESEARCH</DNAME> at
  /ROWSET[1]/ROW[1]/DNAME[1]/text()[1] to
  <DNAME ...>ACCOUNTING</DNAME> at /ROWSET[1]/ROW[1]/DNAME[1]/text()[1]
Expected text value 'DALLAS' but was 'NEW YORK' -
  comparing <LOC ...>DALLAS</LOC> at
  /ROWSET[1]/ROW[1]/LOC[1]/text()[1] to
  <LOC ...>NEW YORK</LOC> at /ROWSET[1]/ROW[1]/LOC[1]/text()[1]
Expected text value '30' but was '20' -
  comparing <DEPTNO ...>30</DEPTNO> at
  /ROWSET[1]/ROW[2]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>20</DEPTNO> at /ROWSET[1]/ROW[2]/DEPTNO[1]/text()[1]
Expected text value 'SALES' but was 'RESEARCH' -
  comparing <DNAME ...>SALES</DNAME> at
  /ROWSET[1]/ROW[2]/DNAME[1]/text()[1] to
  <DNAME ...>RESEARCH</DNAME> at /ROWSET[1]/ROW[2]/DNAME[1]/text()[1]
Expected text value 'CHICAGO' but was 'DALLAS' -
  comparing <LOC ...>CHICAGO</LOC> at
  /ROWSET[1]/ROW[2]/LOC[1]/text()[1] to
  <LOC ...>DALLAS</LOC> at /ROWSET[1]/ROW[2]/LOC[1]/text()[1]
Expected text value '40' but was '30' -
  comparing <DEPTNO ...>40</DEPTNO> at
  /ROWSET[1]/ROW[3]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>30</DEPTNO> at /ROWSET[1]/ROW[3]/DEPTNO[1]/text()[1]
Expected text value 'XOPERATIONSETC' but was 'SALES' -
  comparing <DNAME ...>XOPERATIONSETC</DNAME> at
  /ROWSET[1]/ROW[3]/DNAME[1]/text()[1] to
  <DNAME ...>SALES</DNAME> at /ROWSET[1]/ROW[3]/DNAME[1]/text()[1]
Expected text value 'NEW YORK' but was 'CHICAGO' -
  comparing <LOC ...>NEW YORK</LOC> at
  /ROWSET[1]/ROW[3]/LOC[1]/text()[1] to
  <LOC ...>CHICAGO</LOC> at /ROWSET[1]/ROW[3]/LOC[1]/text()[1]
Expected text value '50' but was '40' -
  comparing <DEPTNO ...>50</DEPTNO> at
  /ROWSET[1]/ROW[4]/DEPTNO[1]/text()[1] to
  <DEPTNO ...>40</DEPTNO> at /ROWSET[1]/ROW[4]/DEPTNO[1]/text()[1]
Expected text value 'ACCOUNTING' but was 'OPERATIONS' -
  comparing <DNAME ...>ACCOUNTING</DNAME> at
  /ROWSET[1]/ROW[4]/DNAME[1]/text()[1] to
  <DNAME ...>OPERATIONS</DNAME> at /ROWSET[1]/ROW[4]/DNAME[1]/text()[1]
Expected text value 'ALBANY' but was 'BOSTON' -
  comparing <LOC ...>ALBANY</LOC> at
  /ROWSET[1]/ROW[4]/LOC[1]/text()[1] to
  <LOC ...>BOSTON</LOC> at /ROWSET[1]/ROW[4]/LOC[1]/text()[

This was tested with XMLUnit version 1.3.

XMLDIFF

Oracle introduced a new function XMLDIFF in Oracle Database 11g Release 1. This function generates an XML document with the differences conforming to an Xdiff schema. Let’s look at how this function works on our example:

create or replace function xmldiff_xmldiff(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Compares two XML documents and returns the differences in XML. Based on
 * XMLDIFF, hence only supported with Oracle Database 11g Release 1 and newer.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Document describing differences based on an XML version of what is
 *          return from QU_XML.GET_XML_DIFFERENCES. NULL if no differences.
 */

  diff xmltype;

begin
  select xmldiff(xml1, xml2)
  into   diff
  from   dual;
  if diff is not null then
    if diff.extract('/*/*') is not null then
      return diff;
    end if;
  end if;

  return null;
end xmldiff_xmldiff;
/

which returns the following based on our example (once again, 12 differences):

<xd:xdiff
    xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd
    http://xmlns.oracle.com/xdb/xdiff.xsd"
    xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <?oracle-xmldiff operations-in-docorder="true"
    output-model="snapshot"
    diff-algorithm="global"?>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[1]/DEPTNO[1]/text()[1]">
    <xd:content>20</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[1]/DNAME[1]/text()[1]">
    <xd:content>RESEARCH</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[1]/LOC[1]/text()[1]">
    <xd:content>DALLAS</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[2]/DEPTNO[1]/text()[1]">
    <xd:content>30</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[2]/DNAME[1]/text()[1]">
    <xd:content>SALES</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[2]/LOC[1]/text()[1]">
    <xd:content>CHICAGO</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[3]/DEPTNO[1]/text()[1]">
    <xd:content>40</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[3]/DNAME[1]/text()[1]">
    <xd:content>XOPERATIONSETC</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[3]/LOC[1]/text()[1]">
    <xd:content>NEW YORK</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[4]/DEPTNO[1]/text()[1]">
    <xd:content>50</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[4]/DNAME[1]/text()[1]">
    <xd:content>ACCOUNTING</xd:content>
  </xd:update-node>
  <xd:update-node xd:node-type="text"
      xd:xpath="/ROWSET[1]/ROW[4]/LOC[1]/text()[1]">
    <xd:content>ALBANY</xd:content>
  </xd:update-node>
</xd:xdiff>

Common Issue

A common problem with these alternatives is that none of these differs has a concept of a primary key for an XML fragment. In our example, the XML is Oracle’s canoncial form representing the data in the SCOTT.DEPT table. The primary key of this table is the DEPTNO column. If the differs had known this or had spent time analyzing the XML and detected it, they would have produced a much more useful output, instead of getting very confused and thinking that everything has changed.

The last one of the alternatives was to write your own XML differ from scratch and I’ll come back to this in a later blog post. For now, let’s look at how we can improve on XMLDIFF by writing a wrapper around it that detects primary key usage and internally uses XMLDIFF to diff the fragments for each primary key value found.

XMLDIFF2

The algorithm in our extended XMLDIFF function that we’re going to call XMLDIFF2 is:

  1. Prepare CLOB variable for constructing the difference XML.
  2. Find minimum number of leading elements that form primary key values (unique lookup).
  3. Look for primary key values in XML1 not present in XML2: These primary key values have been deleted.
  4. Look for primary key values in XML2 not present in XML1: These primary key values have been inserted.
  5. Diff all common primary key values, ie in both XML1 and XML2.
  6. Return NULL if no differences found.

The implementation uses the XMLCDATA2 function that I’ve listed earlier. Here is the listing of XMLDIFF2:

create or replace function xmldiff2(
  xml1 in xmltype,
  xml2 in xmltype
)
return xmltype as

/**
 * Extension to Oracle's XMLDIFF function, available with Oracle 11.1 and newer.
 * This version is to be used primarily with simple XML documents in Oracles
 * "canonical form", ie:
 * <ROWSET>
 *   <ROW>
 *     <COLUMN_NAME_1>column_value_1</COLUMN_NAME_1>
 *     <COLUMN_NAME_2>column_value_2</COLUMN_NAME_2>
 *     ...
 *     <COLUMN_NAME_N>column_value_n</COLUMN_NAME_N>
 *   </ROW>
 * </ROWSET>
 * Assumptions: It is assumed that none of the column elements in the XML
 * contain XML, such that the maximum level of XML elements is 3. It is also
 * assumed that datetime XML elements are passed in using XML Schema compliant
 * formats ('yyyy-mm-dd"T"hh24:mi:ss' for DATE, 'yyyy-mm-dd"T"hh24:mi:ss.ff9'
 * for TIMESTAMP and 'yyyy-mm-dd"T"hh24:mi:ss.ff9tzh:tzm' for TIMESTAMP WITH
 * [LOCAL] TIME ZONE).
 * Extensions: This version understands a concept of leading elements at level 3
 * forming a primary key and ignores document element order.
 * Differences are returned in the following form:
 * <DIFFERENCES>
 *   <DIFFERENCE>
 *     <XPATH><![CDATA[<XPATH1>]]></XPATH>
 *     <OPERATION>(INSERT|DELETE|UPDATE)</OPERATION>
 *     <VALUE1><![CDATA[<VALUE_FROM_XML1>]]></VALUE1>
 *     <VALUE2><![CDATA[<VALUE_FROM_XML2>]]></VALUE2>
 *   </DIFFERENCE>
 *   <DIFFERENCE>
 *     <XPATH><![CDATA[<XPATH1>]]></XPATH>
 *     <OPERATION>(INSERT|DELETE|UPDATE)</OPERATION>
 *     <VALUE1><![CDATA[<VALUE_FROM_XML1>]]></VALUE1>
 *     <VALUE2><![CDATA[<VALUE_FROM_XML2>]]></VALUE2>
 *   </DIFFERENCE>
 *   ...
 *   <DIFFERENCE>
 *     <XPATH><![CDATA[<XPATH1>]]></XPATH>
 *     <OPERATION>(INSERT|DELETE|UPDATE)</OPERATION>
 *     <VALUE1><![CDATA[<VALUE_FROM_XML1>]]></VALUE1>
 *     <VALUE2><![CDATA[<VALUE_FROM_XML2>]]></VALUE2>
 *   </DIFFERENCE>
 * </DIFFERENCES>
 * <VALUE1> is only present for DELETE and UPDATE operations. <VALUE2> is only
 * present for INSERT and UPDATE operations.
 * Limitations. Namespace and attributes are not supported. Works only with
 * Oracle 11.1 and newer.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 * @return  Differences, NULL if none found.
 */

  ---  Xdiff namespace.
  xmldiff_ns constant varchar2(48) :=
      'xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd"';

  --- Element overview, ie name and count.
  type element_overview_t is record (
    name   varchar2(4000),
    occurs integer
  );
  type level2_elements_c is
  table of element_overview_t
  index by binary_integer;

  --- Level 2 element overview for XML1.
  level2_elements1 level2_elements_c;
  --- Level 2 element overview for XML2.
  level2_elements2 level2_elements_c;

  -- Primary key XPath expressions.
  type varchar2_4000_lookup is
  table of boolean
  index by varchar2(4000);
  pkvs varchar2_4000_lookup;

  --- Number of elements required to form primary key.
  pk_element_count integer := null;

  --- Assuming Oracle canonica form: Root element name.
  element1_name constant varchar2(6) := 'ROWSET';
  --- Assuming Oracle canonica form: Element name for each row (level 2).
  element2_name constant varchar2(3) := 'ROW';

  ---
  type pk_element_t is record (
    name  varchar2(4000),
    value varchar2(4000)
  );
  type pk_element_c is
  table of pk_element_t
  index by binary_integer;
  pk_elements pk_element_c;

  -- Extract of XML1.
  xml1e xmltype;
  -- Extract of XML2.
  xml2e xmltype;
  --- Result from SYS.XMLDIFF.
  diff2 xmltype;
  --- Return result.
  result xmltype;

  --- Column value for XML1.
  cv1 clob;
  --- Column value for XML2.
  cv2 clob;
  --- Primary key XPath.
  pk_xpath varchar2(32767);
  --- Primary key XPath for XML1.
  pk_xpath1 varchar2(4000);
  --- Primary key XPath for XML2.
  pk_xpath2 varchar2(4000);
  --- Primary key value.
  pkv varchar2(4000);
  --- Column name.
  cname varchar2(32767);

  -- XPath.
  xpath varchar2(4000);

  --- Difference document.
  diff clob;
  --- XML update fragment.
  xmlu clob;

  procedure diff_nodes;

/**
 * Parses the level 2 elements in given XML document.
 * @param   xml   XML document.
 * @return  Collection of level 2 elements.
 */

  function get_level2_elements(
    xml in xmltype
  )
  return level2_elements_c as

    level2_elements level2_elements_c;

  begin
    select name,
           count(*)
    bulk collect into level2_elements
    from   (
             select x.getrootelement() name
             from   table(xmlsequence(extract(xml, '/*/*'))) x
           )
    group  by name;

    return level2_elements;
  end get_level2_elements;

/**
 * Finds the number of leading elements it takes to form a unique lookup, ie the
 * minimum number of elements that are required for a primary key.
 * @param   xml     XML document.
 * @param   level2_elements
 *                  Collection of level 2 elements.
 * @return  Number of elements in primary key for this document.
 */

  function get_pk_element_count(
    xml in xmltype,
    level2_elements in level2_elements_c
  )
  return pls_integer as

    j pls_integer;
    n pls_integer := 1;
    m pls_integer;
    xpath varchar2(32767);
    pk_count pls_integer;
    done boolean := false;

  begin
    for i in 1 .. level2_elements(1).occurs loop
      -- For each element in the collection.
      j := 1;
      xpath := '/' || element1_name || '/' || element2_name || '[';
      for pk in (
            select value(x) element
            from   table(
                     xmlsequence(
                       extract(xml, '/' || element1_name || '/' ||
                           element2_name || '[' || i || ']/*'
                       )
                     )
                   ) x
          ) loop
        -- For each element for the ith element of the collection.
        pk_elements(j).name := pk.element.getrootelement;
        pk_elements(j).value := pk.element.extract('//text()').getstringval;

        if j > 1 then
          xpath := rtrim(xpath, ']') || ' and ';
        end if;
        xpath := xpath || pk_elements(j).name || '="' ||
            pk_elements(j).value || '"';
        xpath := xpath || ']';

        if j = n then
          select count(*)
          into   pk_count
          from   table(xmlsequence(extract(xml, xpath)));

          done := pk_count = 1 or j = 32;
          exit when done;

          n := n + 1;
        end if;

        j := j + 1;
      end loop;
    end loop;

    return n;
  end get_pk_element_count;

/**
 * Appends a fragment to the differences XML document.
 */

  procedure append_fragment(
    fragment in clob
  ) as

  begin
    dbms_lob.append(diff, fragment);
  end append_fragment;

/**
 * Check that each fragment from first document identified through each primary
 * key can be found in the second document.
 * @param   operation
 *                  Operation to use for difference fragment, ie 'DELETE' if
 *                  it's missing or 'INSERT' if it has been inserted.
 * @param   xml1    First XML document.
 * @param   xml2    Second XML document.
 */

  procedure check_existence(
    operation in varchar2,
    xml1 in xmltype,
    xml2 in xmltype
  ) as

    relative_xpath varchar2(32767);
    pk_exists boolean;
    value_id pls_integer;

  begin
    for collection_elements in (
          select value(x) node
          from   table(
                   xmlsequence(
                     extract(
                       xml1, '/' || element1_name || '/' || element2_name
                     )
                   )
                 ) x
        ) loop
      relative_xpath := '/' || element1_name || '/' || element2_name || '[';
      for i in 1 .. pk_element_count loop
        if i > 1 then
          relative_xpath := relative_xpath || ' and ';
        end if;
        relative_xpath := relative_xpath || pk_elements(i).name || '="' ||
            collection_elements.node.extract(
              '/*/*[' || i || ']/text()'
            ).getstringval || '"';
      end loop;
      relative_xpath := relative_xpath || ']';

      pk_exists := pkvs.exists(relative_xpath);
      if pk_exists or xml2.existsnode(relative_xpath) = 1 then
        pkvs(relative_xpath) := true;
      else
        dbms_lob.createtemporary(xmlu, true, dbms_lob.call);
        dbms_lob.append(
          xmlu,
          '<DIFFERENCE>' ||
            '<XPATH>' || xmlcdata2(relative_xpath) || '</XPATH>' ||
            '<OPERATION>' || operation || '</OPERATION>'
        );
        if operation = 'INSERT' then
          value_id := 2;
        elsif operation = 'DELETE' then
          value_id := 1;
        end if;
        dbms_lob.append(xmlu, '<VALUE' || value_id || '>');
        dbms_lob.append(
          xmlu,
          xmlcdata2(rtrim(xml1.extract(relative_xpath).getclobval, chr(10)))
        );
        dbms_lob.append(xmlu, '</VALUE' || value_id || '>');
        dbms_lob.append(xmlu, '</DIFFERENCE>');
        append_fragment(xmlu);
      end if;
    end loop;
  end check_existence;

/**
 * Diff the fragments found for the primary key values common between XML1 and
 * XML2. This is done by calling Oracle's XMLDIFF function.
 */

  procedure diff_common_pks as

  begin
    pkv := pkvs.first;
    while pkv is not null loop
      xpath := pkv;
      -- Note the extra GETCLOBVAL and XMLTYPE. Seem that Oracle XMLTYPE has a
      -- bug here as XML1E = XML1 if not done!
      xml1e := xmltype(xml1.extract(xpath).getclobval);
      xml2e := xmltype(xml2.extract(xpath).getclobval);

      select xmldiff(xml1e, xml2e)
      into   diff2
      from   dual;

      diff_nodes;

      pkv := pkvs.next(pkv);
    end loop;
  end diff_common_pks;

/**
 * Inspect the diff result from Oracle's XMLDIFF and append to our difference
 * document.
 */

  procedure diff_nodes as

    node_type varchar2(100);
    operation varchar2(6);
    diff_type varchar2(18);
    xpath xmltype;
    content xmltype;

  begin
    for cv in (
          select value(x) v
          from   table(xmlsequence(extract(diff2, '/xd:xdiff/*', xmldiff_ns))) x
        ) loop
      -- For each difference found.
      xpath := cv.v.extract('//@xd:xpath', xmldiff_ns);
      if xpath is null then
        xpath := cv.v.extract('//@xd:parent-xpath', xmldiff_ns);
      end if;
      cname := xpath.getstringval;
      cname := substr(cname, instr(cname, '/', 2) + 1);
      cname := substr(cname, 1, instr(cname, '[') - 1);
      node_type := cv.v.extract('//@xd:node-type', xmldiff_ns).getstringval;

      diff_type := cv.v.getrootelement;
      if diff_type like '%delete-node' then
        -- Node was deleted. Was it an element value or an element?
        cv1 := xml1e.extract('//' || cname || '/text()').getclobval;
        cv2 := '';
        if node_type = 'text' then
          -- Was non-NULL but is now NULL.
          operation := 'UPDATE';
        else
          -- Node has been removed.
          operation := 'DELETE';
        end if;
      elsif diff_type like '%append-node' then
        -- Node has been added. Was it an element value or an element?
        cv1 := '';
        cv2 := xml2e.extract('//' || cname || '/text()').getclobval;
        if node_type = 'text' then
          operation := 'UPDATE';
        else
          -- Node has been inserted.
          operation := 'INSERT';
        end if;
      elsif diff_type like '%insert-node-before' then
        -- Node has been added, find which.
        content := cv.v.extract('/*/xd:content/*', xmldiff_ns);
        cname := content.getrootelement;
        cv1 := '';
        cv2 := content.extract('//text()').getclobval;
        operation := 'INSERT';
      elsif diff_type like '%update-node' then
        -- Node has been updated.
        cv1 := xml1e.extract('//' || cname || '/text()').getclobval;
        cv2 := cv.v.extract('//xd:content/text()', xmldiff_ns).getclobval;
        operation := 'UPDATE';
      end if;

      dbms_lob.createtemporary(xmlu, false, dbms_lob.call);
      dbms_lob.append(
        xmlu,
        '<DIFFERENCE>' ||
          '<XPATH>' || xmlcdata2(pkv || '/' || cname) || '</XPATH>' ||
          '<OPERATION>' || operation || '</OPERATION>'
      );

      if operation in ('DELETE', 'UPDATE') then
        dbms_lob.append(xmlu, '<VALUE1>');
        dbms_lob.append(xmlu, xmlcdata2(cv1));
        dbms_lob.append(xmlu, '</VALUE1>');
      end if;
      if operation in ('INSERT', 'UPDATE') then
        dbms_lob.append(xmlu, '<VALUE2>');
        dbms_lob.append(xmlu, xmlcdata2(cv2));
        dbms_lob.append(xmlu, '</VALUE2>');
      end if;

      dbms_lob.append(xmlu, '</DIFFERENCE>');
      append_fragment(xmlu);
    end loop;
  end diff_nodes;

begin
  pk_xpath1 := substr(pk_xpath, 1, instr(pk_xpath, '/', -1) - 1);
  pk_xpath2 := substr(pk_xpath, length(pk_xpath1) + 2);

  dbms_lob.createtemporary(diff, true, dbms_lob.call);
  dbms_lob.append(diff, '<DIFFERENCES>');
  -- TODO: NULL handling.

  level2_elements1 := get_level2_elements(xml1);
  level2_elements2 := get_level2_elements(xml2);

  -- Find minimum number of leading elements that form PK values.
  pk_element_count := greatest(
    get_pk_element_count(xml1, level2_elements1),
    get_pk_element_count(xml2, level2_elements2)
  );

  -- Look for PKs in XML1 not present in XML2: DELETE fragment.
  check_existence('DELETE', xml1, xml2);

  -- Look for PKs in XML2 not present in XML1: INSERT fragment.
  check_existence('INSERT', xml2, xml1);

  -- Diff all common PKs: UPDATE fragment.
  diff_common_pks;

  dbms_lob.append(diff, '</DIFFERENCES>');

  if length(diff) = 13 + 14 and diff = '<DIFFERENCES></DIFFERENCES>' then
    -- No differences.
    result := null;
  else
    result := xmltype(diff);
  end if;

  return result;
end xmldiff2;
/

Examples

Here are a few examples on how to use XMLDIFF2.

The first example is the example used for comparison of the various alternatives described earlier:

declare
  xml1 xmltype := xmltype('
<ROWSET>
  <ROW>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
    <LOC>BOSTON</LOC>
  </ROW>
</ROWSET>');
  xml2 xmltype := xmltype('
<ROWSET>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>XOPERATIONSETC</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>50</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>ALBANY</LOC>
  </ROW>
</ROWSET>');
  diff xmltype;
begin
  diff := xmldiff2(xml1, xml2);
  dbms_output.put_line(diff.getclobval(0, 2));
end;
/

which results in:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="10"]]]></XPATH>
    <OPERATION>DELETE</OPERATION>
    <VALUE1><![CDATA[<ROW>
  <DEPTNO>10</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>NEW YORK</LOC>
</ROW>]]></VALUE1>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="50"]]]></XPATH>
    <OPERATION>INSERT</OPERATION>
    <VALUE2><![CDATA[<ROW>
  <DEPTNO>50</DEPTNO>
  <DNAME>ACCOUNTING</DNAME>
  <LOC>ALBANY</LOC>
</ROW>]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/DNAME]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[OPERATIONS]]></VALUE1>
    <VALUE2><![CDATA[XOPERATIONSETC]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[DEPTNO="40"]/LOC]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[BOSTON]]></VALUE1>
    <VALUE2><![CDATA[NEW YORK]]></VALUE2>
  </DIFFERENCE>
</DIFFERENCES>

Another example, this time with just one row in the resultset:

declare
  xml1 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <SAL>1300</SAL>
    <COMM/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>');
  xml2 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>ANALYST</JOB>
    <MGR>7782</MGR>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <SAL>1500</SAL>
    <COMM>100</COMM>
    <DEPTNO>20</DEPTNO>
  </ROW>
</ROWSET>');
  diff xmltype;
begin
  diff := xmldiff2(xml1, xml2);
  dbms_output.put_line(diff.getclobval(0, 2));
end;
/

which results in the following:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/JOB]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[CLERK]]></VALUE1>
    <VALUE2><![CDATA[ANALYST]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/SAL]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[1300]]></VALUE1>
    <VALUE2><![CDATA[1500]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/COMM]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[]]></VALUE1>
    <VALUE2><![CDATA[100]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/DEPTNO]]></XPATH>
    <OPERATION>UPDATE</OPERATION>
    <VALUE1><![CDATA[10]]></VALUE1>
    <VALUE2><![CDATA[20]]></VALUE2>
  </DIFFERENCE>
</DIFFERENCES>

And a final example, this time an element that is removed and an element that is added:

declare
  xml1 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <REMOVED>abc</REMOVED>
    <SAL>1300</SAL>
    <COMM/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>');
  xml2 xmltype := xmltype('
<ROWSET>
  <ROW>
    <EMPNO>7934</EMPNO>
    <ENAME>MILLER</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7782</MGR>
    <ADDED>xyz</ADDED>
    <HIREDATE>1982-01-23T00:00:00</HIREDATE>
    <SAL>1300</SAL>
    <COMM/>
    <DEPTNO>10</DEPTNO>
  </ROW>
</ROWSET>');
  diff xmltype;
begin
  diff := xmldiff2(xml1, xml2);
  dbms_output.put_line(diff.getclobval(0, 2));
end;
/

which gives the following result:

<DIFFERENCES>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/ADDED]]></XPATH>
    <OPERATION>INSERT</OPERATION>
    <VALUE2><![CDATA[xyz]]></VALUE2>
  </DIFFERENCE>
  <DIFFERENCE>
    <XPATH><![CDATA[/ROWSET/ROW[EMPNO="7934"]/REMOVED]]></XPATH>
    <OPERATION>DELETE</OPERATION>
    <VALUE1><![CDATA[abc]]></VALUE1>
  </DIFFERENCE>
</DIFFERENCES>

Conclusion

I’ve described different options for comparing XML documents in the Oracle database and implemented an enhanced version of Oracle’s XMLDIFF that can be used to compare SQL datasets for relational tables not using object types, when represented on Oracle canonical form.

References

Advertisements

About ellebaek
Sun Certified Programmer for Java 5 (SCJP). Oracle Advanced PL/SQL Developer Certified Professional (OCP Advanced PL/SQL). ISTQB Certified Tester Foundation Level (ISTQB CTFL).

2 Responses to Comparing XML in Oracle

  1. Mihails says:

    Dear Author,

    Small TIP:
    I guess the simple XSL transformation would be much better in your case:
    1. Create XSLT template
    2. Transform XMLDIFF result using XSLT

    This will considerably decrease XMLDIFF2 function’s code and improve performance.

    Cheers!

    • ellebaek says:

      Hi

      Thanks for your comment.

      I cannot see how my requirements can be achieved with XMLDIFF + XSLT: XMLDIFF doesn’t report what I need (because it has no concept of “primary keys”) and it reports lots of stuff that needs to be removed (eg if one “primary key” has been added at the top, everything will be reported as different).

      However, you’re welcome to show me how you could create an XSLT implementing my requirements.

      Cheers

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: