Regular expression syntax varies from product to product. SQL Anywhere syntax for regular expressions matches closely the Perl 5 syntax, but also supports some conventions found in other products. Use the following section to understand the full list of syntax conventions supported by SQL Anywhere.
Depending on what you are searching for, and how your data is structured, there can be several ways to construct the regular expression you need for matching. The syntax examples in the next sections are very basic, and are designed to show the syntax in simple context. In many cases, there will be several ways you can construct the expression to return the same result.
Regular expression matching is case insensitive on case-insensitive databases.
The REGEXP, and SIMILAR TO search conditions attempt to match an entire string, not a substring within the string. For example,
'car'
matches only car, not carwash or vicar. However, you could specify wildcards on either side of the pattern you are trying
to match, to achieve substring matching behavior. For example, '%car%'
matches car, as well as carwash, and vicar. Alternatively, you can rewrite your query to make use the REGEXP_SUBSTR function,
which is designed to search for substrings within a string. See REGEXP_SUBSTR function [String].
Metacharacters are symbols or characters that have a special meaning within a regular expression. The list of regular expression metacharacters are:
Wildcards allow you to define the type of characters to look for, without specifying what the character should be. For example, the underscore character (_) matches any single character.
Quantifiers specify the number of times a pattern must repeat to constitute a match. For example, .*abc
means "find a pattern that ends with abc, and starts with any character repeated any number of times". Consequently, aabc,
xyzabc, and abc match, but bc and abcc do not.
The REGEXP search condition and the REGEXP_SUBSTR function support all of the syntax conventions in the table below, but the SIMILAR TO search expression does not. Conventions supported by SIMILAR TO have a Y in the SIMILAR TO column.
Character | Additional information | SIMILAR TO |
---|---|---|
% | Matches any number of characters (including no characters). | Y |
. | Matches any single character (equivalent to '_'). For example, 'a.cd' matches any string of four characters that starts with a and ends with cd.
|
|
_ (underscore character) | Matches any single character. For example, 's_t' matches any string of three characters that starts with s and ends with t.
|
Y |
( ) | Used for grouping parts of the regular expression so that you can apply quantifiers. For example, (ab)* will match zero or more repetitions of ab. As with mathematical expressions, you can use grouping to control the order in
which the parts of a regular expression are evaluated.
|
Y |
[ ] |
Matches a single character against the enclosed set of characters. Square brackets around a set of characters constitute a character class. Wildcards and quantifiers have no special meaning within the brackets. You can use a hyphen in a character class to denote a range. For example, To include a literal minus sign (-), caret (^), or right-angle bracket (]) character in the matching set of characters, it must be escaped. For SIMILAR TO (but not REGEXP or REGEXP_SUBSTR function), the meta characters *, ?, +, _, |, (, ), { must be escaped within a character class. When a caret as the first character inside of a bracket expression, it matches a single character not listed in the brackets.
For example, If a caret is between characters in a character set, it matches anything to the left of the caret, minus those in the set
on the right side of the caret. For example, SQL Anywhere also supports sub-character classes such as POSIX character classes. See Regular expressions: Special sub-character classes. |
Y |
| | Matches against the expressions on either side of the vertical bar (also referred to as alternation). The vertical bar is interpreted as an OR, and matching stops at the first match made starting from the leftmost pattern. Consequently, you should list the patterns in descending order of preference. You can specify an unlimited number of alternative patterns. | Y |
* | Matches a character 0 or more times. For example, 'bre*' matches br, bre, and bree.
|
Y |
? | Matches a character 0 or 1 times. For example, 'colou?r' matches color and colour.
|
Y |
+ | Matches a character 1 or more times. For example, 'bre+' matches bre and bree, but not br.
|
Y |
{m} | Matches a character exactly m times. For example, '519-[0-9]{3}-[0-9]{4}' matches a phone number in the 519 area code (providing the data is formatted in the manner defined in the syntax).
|
Y |
{m, } | Matches a character at least m times. For example, '[0-9]{5,}' matches any string of five or more numbers.
|
Y |
{m, n} | Matches a character at least m times, but not more than n times. For example, '_{5,10}' matches any string with between 5 and 10 (inclusive) characters.
|
Y |
Assertions test whether a condition is true, and affect the position in the string where matching begins. Assertions do not return characters, in that the assertion pattern is not included in the final match. These assertions are supported by the REGEXP search condition and the REGEXP_SUBSTR function. These conventions are not supported by the SIMILAR TO search expression.
Lookahead and lookbehind assertions can be useful with REGEXP_SUBSTR when trying to split a string. For example, you can return the list of street names (without the street numbers) in the Address column of the Customers table by executing the following statement:
SELECT REGEXP_SUBSTR( Street, '(?<=^\\S+\\s+).*$' ) FROM Customers; |
Another example is if you want to use a regular expression to verify that a password conforms to certain rules. You could use a zero width assertion similar to the following:
IF password REGEXP '(?=%[[:digit:]])(?=%[[:alpha:]]%[[:alpha:]])[[:word:]]{4,12}' MESSAGE 'Password conforms' TO CLIENT; ELSE MESSAGE 'Password does not conform' TO CLIENT; END IF |
The password is valid if all of the following are true:
The following table contains the assertions supported by SQL Anywhere:
Syntax | Meaning |
---|---|
(?=pattern) |
Positive lookahead zero-width assertion Looks to see if the current position in the string is immediately followed by an occurrence of pattern, without pattern becoming part of the match string. For example, |
(?!pattern) |
Negative lookahead zero-width assertions Looks to see if the current position in the string is not immediately followed by an occurrence of pattern, without pattern becoming part of the match string. So, For example, |
(?<=pattern) |
Positive lookbehind zero-width assertions Looks to see if the current position in the string is immediately preceded by an occurrence of pattern, without pattern becoming part of the match string. So, For example, |
(?<!pattern) |
Negative lookbehind zero-width assertions Looks to see if the current position in the string is not immediately preceded by an occurrence of pattern, without pattern becoming part of the match string. For example, |
(?>pattern) |
Possessive local subexpression Matches only the largest prefix of the remaining string that matches pattern. For example, in |
(?:pattern) |
Non-capturing block This is functionally equivalent to just pattern, and is provided for compatibility. For example, in Likewise, |
(?#text) |
Used for comments. The content of text is ignored. |
Character classes are a set of characters, enclosed in square brackets, against which characters in a string are matched.
Sub-character classes are special character classes embedded within a larger character class. In addition to custom character classes where you
define the set of characters to match (for example, [abxq4]
limits the set of matching characters to a, b, x, q, and 4), SQL Anywhere supports sub-character classes such as most of
the POSIX character classes. For example, [[:alpha:]]
represents the set of all upper- and lower-case letters.
The REGEXP search condition and the REGEXP_SUBSTR function support all of the syntax conventions in the table below, but the SIMILAR TO search expression does not. Conventions supported by SIMILAR TO have a Y in the SIMILAR TO column.
In REGEXP and when using the REGEXP_SUBSTR function, sub-character classes can be negated using a caret. For example, [[:^alpha:]]
matches the set of all characters except alpha characters.
Sub-character class | Additional information | SIMILAR TO | ||
---|---|---|---|---|
[:alpha:] |
Matches upper- and lowercase alphabetic characters in the current collation. For example, |
Y | ||
[:alnum:] |
Match digits, and upper- and lowercase alphabetic characters in the current collation. For example, |
Y | ||
[:digit:] |
Match digits in the current collation. For example, |
Y | ||
[:lower:] |
Match lowercase alphabetic characters in the current collation. For example, Equivalent to |
Y | ||
[:space:] |
Match a single blank (' '). For example, the following statement searches Contacts.City for any city with a two word name:
|
Y | ||
[:upper:] |
Match uppercase alphabetic characters in the current collation. For example, Equivalent to |
Y | ||
[:whitespace:] |
Match a whitespace character such as space, tab, formfeed, and carriage return. |
Y | ||
[:ascii:] |
Match any seven-bit ASCII character (ordinal value between 0 and 127). |
|||
[:blank:] |
Match a blank space, or a horizontal tab. |
|||
[:cntrl:] |
Match ASCII characters with an ordinal value of less than 32, or character value 127 (control characters). Control characters include newline, form feed, backspace, and so on. |
|||
[:graph:] |
Match printed characters.
|
|||
[:print:] |
Match printed characters and spaces.
|
|||
[:punct:] |
Match one of: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. The |
|||
[:word:] |
Match alphabetic, digit, or underscore characters in the current collation.
|
|||
[:xdigit:] |
Match a character that is both ASCII and in the character class [0-9A-Fa-f].
|
The following syntax conventions are supported by the REGEXP search condition and the REGEXP_SUBSTR function. These conventions are not supported by the SIMILAR TO search expression.
Regular expression syntax | Name and meaning | ||
---|---|---|---|
^ |
When the caret is outside of a character set, the caret matches the start of a string. For example, A caret used inside a character class denotes negation or subtraction. See the description of angle brackets ([]) for character sets for information on the use of the caret in character sets: Regular expressions: Wildcards and quantifiers. |
||
$ |
Matches the end of a string. For example, |
||
\0xxx |
Matches the character whose value is \0xxx, where xxx is any sequence of octal digits, and 0 is a zero. For example, |
||
\a |
Matches the bell character. |
||
\A |
Used outside of a character set to match the start of a string. Equivalent to |
||
\b |
Matches a backspace character. |
||
\B |
Matches the backslash character (\). |
||
\cX |
Matches a named control character. For example, \cZ for ctrl-Z. |
||
\d |
Matches a digit in the current collation. For example, the following statement searches Contacts.Phone for all phone numbers that end with 00:
\d can be used both inside and outside of character classes, and is equivalent to |
||
\D |
Matches anything that is not a digit. This is the opposite of \d. \D can be used both inside and outside of character classes, and is equivalent to Be careful when using the negated shorthands inside square brackets. |
||
\e |
Matches the escape character. |
||
\E |
Ends the treatment of metacharacters as non-metacharacters, initiated by a \Q. For a list of regular expression metacharacters, see Metacharacters used for regular expression syntax. |
||
\f |
Matches a form feed. |
||
\n |
Matches a new line. |
||
\Q |
Treat all metacharacters as non-metacharacters, until \E is encountered. For example, For a list of regular expression metacharacters, see Metacharacters used for regular expression syntax. |
||
\r |
Matches a carriage return. |
||
\s |
Matches a space or a character treated as whitespace. For example, the following statement returns all product names from Products.ProductName that have at least one space in the name:
\s can be used both inside and outside of character classes, and is equivalent to |
||
\S |
Matches a non-whitespace character. This is the opposite of \s, and is equivalent to \S can be used both inside and outside of character classes. See Regular expressions: Special sub-character classes. Be careful when using the negated shorthands inside square brackets. |
||
\t |
Matches a horizontal tab. |
||
\v |
Matches a vertical tab. |
||
\w |
Matches a alphabetic character, digit, or underscore in the current collation. For example, the following statement returns all surnames from Contacts.Surname that are exactly seven alpha-numeric characters in length:
\w can be used both inside and outside of character classes. See Regular expressions: Special sub-character classes. Equivalent to |
||
\W |
Matches anything that is not an alphabetic character, digit, or underscore in the current collation. This is the opposite
of \w, and is equivalent to This regular expression can be used both inside and outside of character classes. See Regular expressions: Special sub-character classes. |
||
\xhh |
Matches the character whose value is 0xhh, where hh is, at most, two hex digits. For example, Equivalent to \x{hh}. |
||
\x{hhh} |
Matches the character whose value is 0xhhh, where hhh is, at most, eight hex digits. |
||
\z and \Z |
Matches the position (not character) at the end of the string. Equivalent to |
Send feedback about this page via email or DocCommentXchange | Copyright © 2008, iAnywhere Solutions, Inc. - SQL Anywhere 11.0.0 |