Importing XML using openxml

The openxml procedure is used in the FROM clause of a query to generate a result set from an XML document. openxml uses a subset of the XPath query language to select nodes from an XML document.

Using XPath expressions

When you use openxml, the XML document is parsed and the result is modeled as a tree. The tree is made up of nodes. XPath expressions are used to select nodes in the tree. The following list describes some commonly-used XPath expressions:

  • /   indicates the root node of the XML document

  • //   indicates all descendants of the root, including the root node

  • . (single period)   indicates the current node of the XML document

  • .//   indicates all descendants of the current node, including the current node

  • ..   indicates the parent node of the current node

  • ./@attributename   indicates the attribute of the current node having the name attributename

  • ./childname   indicates the children of the current node that are elements having the name childname

Consider the following XML document:

<inventory>
  <product ID="301" size="Medium">Tee Shirt
    <quantity>54</quantity>
  </product>
  <product ID="302" size="One Size fits all">Tee Shirt
    <quantity>75</quantity>
  </product>
  <product ID="400" size="One Size fits all">Baseball Cap
    <quantity>112</quantity>
  </product>
</inventory>

The <inventory> element is the root node. You can refer to it using the following XPath expression:

/inventory

Suppose that the current node is a <quantity> element. You can refer to this node using the following XPath expression:

.

To find all the <product> elements that are children of the <inventory> element, use the following XPath expression:

/inventory/product

If the current node is a <product> element and you want to refer to the size attribute, use the following XPath expression:

./@size

For a complete list of XPath syntax supported by openxml, see openxml system procedure.

For information about the XPath query language, see [external link] http://www.w3.org/TR/xpath.

Generating a result set using openxml

Each match for the first xpath-query argument to openxml generates one row in the result set. The WITH clause specifies the schema of the result set and how the value is found for each column in the result set. For example, consider the following query:

SELECT * FROM openxml( '<inventory>
                         <product>Tee Shirt
                           <quantity>54</quantity>
                           <color>Orange</color>
                         </product>
                         <product>Baseball Cap
                 <quantity>112</quantity>
                           <color>Black</color>
                         </product>
                        </inventory>',
                       '/inventory/product' )
WITH ( Name CHAR (25) './text()',
       Quantity CHAR(3) 'quantity',
       Color CHAR(20) 'color');

The first xpath-query argument is /inventory/product, and there are two <product> elements in the XML, so two rows are generated by this query.

The WITH clause specifies that there are three columns: Name, Quantity, and Color. The values for these columns are taken from the <product>, <quantity> and <color> elements. The query above generates the following result:

Name Quantity Color
Tee Shirt 54 Orange
Baseball Cap 112 Black

For more information, see openxml system procedure.

Using openxml to generate an edge table

The openxml procedure can be used to generate an edge table, a table that contains a row for every element in the XML document. You may want to generate an edge table so that you can query the data in the result set using SQL.

The following SQL statement creates a variable, x, that contains an XML document. The XML generated by the query has a root element called <root>, which is generated using the XMLELEMENT function, and elements are generated for each column in the Employees, SalesOrders, and Customers tables using FOR XML AUTO with the ELEMENTS modifier specified.

For information about the XMLELEMENT function, see XMLELEMENT function [String].

For information about FOR XML AUTO, see Using FOR XML AUTO.

CREATE VARIABLE x XML;
SET x=(SELECT XMLELEMENT( NAME root,
         (SELECT * FROM Employees
         KEY JOIN SalesOrders
         KEY JOIN Customers
         FOR XML AUTO, ELEMENTS)));
SELECT x;

The generated XML looks as follows (the result has been formatted to make it easier to read—the result returned by the query is one continuous string):

<root>
 <Employees>
  <EmployeeID>299</EmployeeID>
  <ManagerID>902</ManagerID>
  <Surname>Overbey</Surname>
  <GivenName>Rollin</GivenName>
  <DepartmentID>200</DepartmentID>
  <Street>191 Companion Ct.</Street>
  <City>Kanata</City>
  <State>CA</State>
  <Country>USA</Country>
  <PostalCode>94608</PostalCode>
  <Phone>5105557255</Phone>
  <Status>A</Status>
  <SocialSecurityNumber>025487133</SocialSecurityNumber>
  <Salary>39300.000</Salary>
  <StartDate>1987-02-19</StartDate>
  <BirthDate>1964-03-15</BirthDate>
  <BenefitHealthInsurance>Y</BenefitHealthInsurance>
  <BenefitLifeInsurance>Y</BenefitLifeInsurance>
  <BenefitDayCare>N</BenefitDayCare>
  <Sex>M</Sex>
  <SalesOrders>
  <ID>2001</ID>
  <CustomerID>101</CustomerID>
  <OrderDate>2000-03-16</OrderDate>
  <FinancialCode>r1</FinancialCode>
  <Region>Eastern</Region>
  <SalesRepresentative>299</SalesRepresentative>
    <Customers>
    <ID>101</ID>
    <Surname>Devlin</Surname>
    <GivenName>Michael</GivenName>
    <Street>114 Pioneer Avenue</Street>
    <City>Kingston</City>
    <State>NJ</State>
    <PostalCode>07070</PostalCode>
    <Phone>2015558966</Phone>
    <CompanyName>The Power Group</CompanyName>
    </Customers>
  </SalesOrders>
</Employees>
...

The following query uses the descendant-or-self (//*) XPath expression to match every element in the above XML document, and for each element the id metaproperty is used to obtain an ID for the node, and the parent (../) XPath expression is used with the ID metaproperty to get the parent node. The localname metaproperty is used to obtain the name of each element. Metaproperty names are case sensitive, so ID or LOCALNAME cannot be used as metaproperty names.

SELECT * FROM openxml( x, '//*' )
 WITH (ID INT '@mp:id',
       parent INT '../@mp:id',
       name CHAR(25) '@mp:localname',
       text LONG VARCHAR 'text()' )
ORDER BY ID;

The result set generated by this query shows the ID of each node, the ID of the parent node, and the name and content for each element in the XML document.

ID parent name text
5 (NULL) root (NULL)
16 5 Employees (NULL)
28 16 EmployeeID 299
55 16 ManagerID 902
79 16 Surname Overbey
... ... ... ...
Using openxml with xp_read_file

So far, XML that was generated with a procedure like XMLELEMENT has been used. You can also read XML from a file and parse it using the xp_read_file procedure. Suppose the file c:\inventory.xml has the following contents:

<inventory>
   <product>Tee Shirt
      <quantity>54</quantity>
      <color>Orange</color>
   </product>
   <product>Baseball Cap
      <quantity>112</quantity>
      <color>Black</color>
   </product>
</inventory>

You can use the following statement to read and parse the XML in the file:

CREATE VARIABLE x XML;
SELECT xp_read_file( 'c:\\inventory.xml' ) 
 INTO x;
SELECT * FROM openxml( x, '//*' )
 WITH (ID INT '@mp:id',
       parent INT '../@mp:id',
       name CHAR(128) '@mp:localname',
       text LONG VARCHAR 'text()' )
ORDER BY ID;
Querying XML in a column

If you have a table with a column that contains XML, you can use openxml to query all the XML values in the column at once. This can be done using a lateral derived table.

The following statements create a table with two columns, ManagerID and Reports. The Reports column contains XML data generated from the Employees table.

CREATE TABLE test (ManagerID INT, Reports XML);
INSERT INTO test
SELECT ManagerID, XMLELEMENT( NAME reports,
              XMLAGG( XMLELEMENT( NAME e, EmployeeID))) 
FROM Employees
GROUP BY ManagerID;

Execute the following query to view the data in the test table:

SELECT * FROM test
ORDER BY ManagerID;

This query produces the following result:

ManagerID Reports
501
<reports>
 <e>102</e>
 <e>105</e>
 <e>160</e>
 <e>243</e>
 ...
</reports>
703
<reports>
 <e>191</e> 
 <e>750</e>
 <e>868</e>
 <e>921</e>
 ...
</reports>
902
<reports>
 <e>129</e>
 <e>195</e>
 <e>299</e>
 <e>467</e>
 ...
</reports>
1293
<reports>
 <e>148</e>
 <e>390</e>
 <e>586</e>
 <e>757</e>
 ...
</reports>
... ...

The following query uses a lateral derived table to generate a result set with two columns: one that lists the ID for each manager, and one that lists the ID for each employee that reports to that manager:

SELECT ManagerID, EmployeeID
FROM test, LATERAL( openxml( test.Reports, '//e' )
WITH (EmployeeID INT '.') ) DerivedTable
ORDER BY ManagerID, EmployeeID;

This query generates the following result:

ManagerID EmployeeID
501 102
501 105
501 160
501 243
... ...

For more information about lateral derived tables, see FROM clause.