The sample Order document is designed for a purchase order application. Customers submit orders, which are identified by a date and a customer ID. Each order item has an item ID, an item name, a quantity, and a unit designation.
It might display on screen like this:
ORDER
Date: July 4, 2003
Customer ID: 123
Customer Name: Acme Alpha
Items:
Item ID | Item Name | Quantity |
---|---|---|
987 | Coupler | 5 |
654 | Connector | 3 dozen |
579 | Clasp | 1 |
The following is one representation of this data in XML:
<?xml version="1.0"?>
<Order>
  <Date>2003/07/04</Date>
  <CustomerId>123</CustomerId>
  <CustomerName>Acme Alpha</CustomerName>
  <Item>
    <ItemId> 987</ItemId>
    <ItemName>Coupler</ItemName>
    <Quantity>5</Quantity>
  </Item>
  <Item>
    <ItemId>654</ItemId>
    <ItemName>Connector</ItemName>
    <Quantity unit="12">3</Quantity>
  </Item>
  <Item>
    <ItemId>579</ItemId>
    <ItemName>Clasp</ItemName>
    <Quantity>1</Quantity>
  </Item>
</Order>
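As a rough illustration of how an application might read this document, the following Python sketch uses the standard library's xml.etree.ElementTree module. The function name read_order and the treatment of the unit attribute as a multiplier (so that a Quantity of 3 with unit="12" means 36 pieces) are assumptions made for this example; they are not part of the Order document itself.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<Order>
  <Date>2003/07/04</Date>
  <CustomerId>123</CustomerId>
  <CustomerName>Acme Alpha</CustomerName>
  <Item> <ItemId> 987</ItemId> <ItemName>Coupler</ItemName> <Quantity>5</Quantity> </Item>
  <Item> <ItemId>654</ItemId> <ItemName>Connector</ItemName> <Quantity unit="12">3</Quantity> </Item>
  <Item> <ItemId>579</ItemId> <ItemName>Clasp</ItemName> <Quantity>1</Quantity> </Item>
</Order>"""


def read_order(xml_text):
    """Return the order header fields and a list of (id, name, pieces) tuples."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("Item"):
        quantity = item.find("Quantity")
        # Assumption: a missing unit attribute means a unit of 1.
        unit = int(quantity.get("unit", "1"))
        items.append((
            item.findtext("ItemId").strip(),  # tolerate stray whitespace
            item.findtext("ItemName"),
            int(quantity.text) * unit,
        ))
    return {
        "date": root.findtext("Date"),
        "customer_id": root.findtext("CustomerId"),
        "customer_name": root.findtext("CustomerName"),
        "items": items,
    }


order = read_order(ORDER_XML)
print(order["customer_name"], order["items"])
```

Because every value sits inside an element named for what it holds, the code looks up data by name rather than by its position on the page.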
The XML document has two unique characteristics:
The XML document does not indicate type style, color, or layout for displaying items.
The markup tags are strictly nested. Each opening tag (<tag>) has a corresponding closing tag (</tag>).
The XML document for the order data consists of:
The XML declaration, <?xml version="1.0"?>, identifying “Order” as an XML document.
XML represents documents as character data. In each document, you specify the character encoding (character set), either explicitly or implicitly. To explicitly specify the character set, include it in the XML declaration. For example:
<?xml version="1.0" encoding="ISO-8859-1"?>
If you do not include the character set in the XML declaration, the default, UTF-8, is used.
When the default character sets of the client and server differ, Adaptive Server bypasses normal character-set translations so that the declared character set continues to match the actual character set. See “Character sets and XML data”.
User-created element tags, such as <Order>…</Order>, <CustomerId>…</CustomerId>, <Item>…</Item>.
Text data, such as “Acme Alpha,” “Coupler,” and “579.”
Attributes embedded in element tags, such as <Quantity unit="12">. This embedding allows you to customize elements.
If your document contains these components, and the element tags are strictly nested, it is called a well-formed XML document. In the example above, element tags describe the data they contain, and the document contains no formatting instructions.
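One way to experiment with well-formedness and encoding declarations is to hand a document to a parser and see whether it is accepted. The Python sketch below does this with the standard library's xml.etree.ElementTree module. The helper name is_well_formed and the customer name "Acme Café" are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# Strictly nested tags: each opening tag is closed before its parent closes.
NESTED = b'<?xml version="1.0"?><Item><ItemId>987</ItemId></Item>'
# Crossed tags: <Item> is closed while <ItemId> is still open.
CROSSED = b'<?xml version="1.0"?><Item><ItemId>987</Item></ItemId>'

# Hypothetical name with a non-ASCII character, stored as ISO-8859-1 bytes
# (0xE9 is "é") and declared as such in the XML declaration.
LATIN1 = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
          b'<CustomerName>Acme Caf\xe9</CustomerName>')
# The same bytes with no encoding declared are read as UTF-8, where a lone
# 0xE9 byte is not a valid character sequence.
UNDECLARED = b'<?xml version="1.0"?><CustomerName>Acme Caf\xe9</CustomerName>'

print(is_well_formed(NESTED))
print(is_well_formed(CROSSED))
print(ET.fromstring(LATIN1).text)
print(is_well_formed(UNDECLARED))
```

The last case shows why the declared character set has to match the actual bytes: a parser that falls back to the UTF-8 default cannot decode ISO-8859-1 data that was never declared.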
Here is another example of an XML document:
<?xml version="1.0"?>
<Info>
  <OneTag>1999/07/04</OneTag>
  <AnotherTag>123</AnotherTag>
  <LastTag>Acme Alpha</LastTag>
  <Thing>
    <ThingId> 987</ThingId>
    <ThingName>Coupler</ThingName>
    <Amount>5</Amount>
  </Thing>
  <Thing>
    <ThingId>654</ThingId>
    <ThingName>Connector</ThingName>
  </Thing>
  <Thing>
    <ThingId>579</ThingId>
    <ThingName>Clasp</ThingName>
    <Amount>1</Amount>
  </Thing>
</Info>
This example, called “Info,” is also a well-formed document and has the same structure and data as the XML Order document. However, it would not be recognized by a processor designed for Order documents because the document type definition (DTD) that Info uses is different from that of the Order document. For more information about DTDs, see “XML document types”.
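The effect can be sketched in Python. The standard library's xml.etree.ElementTree module does not validate against a DTD, so the hypothetical function customer_id_from_order below stands in for an Order processor by checking the element names it expects. The shortened documents are abbreviations made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the two documents, keeping only the header fields.
ORDER_XML = "<Order><Date>2003/07/04</Date><CustomerId>123</CustomerId></Order>"
INFO_XML = "<Info><OneTag>1999/07/04</OneTag><AnotherTag>123</AnotherTag></Info>"


def customer_id_from_order(xml_text):
    """Return the customer ID, accepting only documents shaped like an Order."""
    root = ET.fromstring(xml_text)
    if root.tag != "Order":
        raise ValueError(f"expected an Order document, got <{root.tag}>")
    customer_id = root.findtext("CustomerId")
    if customer_id is None:
        raise ValueError("Order document has no CustomerId element")
    return customer_id


print(customer_id_from_order(ORDER_XML))
try:
    customer_id_from_order(INFO_XML)
except ValueError as err:
    print("rejected:", err)
```

Both inputs parse without error, since both are well formed. Only the Order document carries the element names the processor was written to look for.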
Consider a purchase order application. Customers submit orders, which are identified by a Date and the CustomerID, and which list one or more items, each of which has an ItemID, ItemName, Quantity, and units.
The data for such an order might be displayed on a screen as follows:
ORDER
Date: July 4, 1999
Customer ID: 123
Customer Name: Acme Alpha
Items:
Item ID | Item Name | Quantity |
---|---|---|
987 | Coupler | 5 |
654 | Connector | 3 dozen |
579 | Clasp | 1 |
This data indicates that the customer named “Acme Alpha,” whose Customer Id is “123”, submitted an order on 1999/07/04 for couplers, connectors, and clasps.
The HTML text for this display of order data is as follows:
<html>
<body>
<p>ORDER
<p>Date: July 4, 1999
<p>Customer ID: 123
<p>Customer Name: Acme Alpha
<p>Items:</p>
<table bgcolor=white align=left border="3" cellpadding=3>
<tr><td><B>Item ID </B></td> <td><B>Item Name </B></td> <td><B>Quantity </B></td></tr>
<tr><td>987</td> <td>Coupler</td> <td>5</td></tr>
<tr><td>654</td> <td>Connector</td> <td>3 dozen</td></tr>
<tr><td>579</td> <td>Clasp</td> <td>1</td></tr>
</table>
</body>
</html>
This HTML text has certain limitations:
It contains both data and formatting specifications.
The data is the Customer Id and the various Customer Names, Item Names, and Quantities.
The formatting specifications are the indications for type style (<b>…</b>), color (bgcolor=white), and layout (<table>…</table>), and also the supplementary field names, such as “Customer Name”.
The structure of HTML documents is not well suited for extracting data.
Some elements, such as tables, require strictly bracketed opening and closing tags, but other elements, such as paragraph tags (“<p>”), have optional closing tags.
Some elements, such as paragraph tags (“<p>”), are used for many sorts of data, so it is difficult to distinguish between a “123” that is a Customer ID and a “123” that is an Item ID without specialized inference from surrounding field names.
This merging of data and formatting, and the lack of strict phrase structure, make it difficult to adapt HTML documents to different presentation styles and to use HTML documents for data interchange and storage. XML is similar to HTML, but includes restrictions and extensions that address these drawbacks.
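To make the extraction problem concrete, here is a minimal Python sketch that pulls the customer ID out of the paragraph part of the HTML display, using the standard library's html.parser module. The class name ParagraphText and the function customer_id_from_html are hypothetical. Note that the code has to treat a new <p> as the end of the previous paragraph, and it can identify the value only by matching the visible label text “Customer ID”.

```python
from html.parser import HTMLParser

HTML_TEXT = """<html> <body>
<p>ORDER <p>Date: July 4, 1999 <p>Customer ID: 123 <p>Customer Name: Acme Alpha
<p>Items:</p>
<table>
<tr><td>987</td> <td>Coupler</td> <td>5</td></tr>
</table> </body> </html>"""


class ParagraphText(HTMLParser):
    """Collect the text of each <p>; a new <p> implicitly ends the previous one."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs.append("")
            self._in_paragraph = True
        elif tag == "table":
            # A table also ends an open paragraph.
            self._in_paragraph = False

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self.paragraphs[-1] += data


def customer_id_from_html(html_text):
    """Find the customer ID by matching the on-screen field label."""
    parser = ParagraphText()
    parser.feed(html_text)
    parser.close()
    for paragraph in parser.paragraphs:
        label, _, value = paragraph.partition(":")
        if label.strip() == "Customer ID":
            return value.strip()
    return None


print(customer_id_from_html(HTML_TEXT))
```

If the page author rewords the label or moves the value into a table cell, this extraction breaks, even though the underlying data is unchanged.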