Regular expressions are supported with the SIMILAR TO, and REGEXP search conditions, and the REGEXP_SUBSTR function. For SIMILAR TO, regular expression syntax is consistent with the ANSI/ISO SQL standard. For REGEXP and REGEXP_SUBSTR, regular expression syntax and support is consistent with Perl 5.
Regular expressions are used by REGEXP and SIMILAR TO to match a string, whereas regular expressions are used by REGEXP_SUBSTR to match a substring. To achieve substring matching behavior for REGEXP and SIMILAR TO, you can specify wildcards on either side of the pattern
you are trying to match. For example, REGEXP '.*car.*'
matches car, carwash, and vicar. Or, you can rewrite your query to make use the REGEXP_SUBSTR function. See REGEXP_SUBSTR function [String].
Regular expression matching with SIMILAR TO is case- and accent-insensitive. REGEXP and REGEXP_SUBSTR is not affected by the database accent and case sensitivity. See LIKE, REGEXP, and SIMILAR TO: Differences in character comparisons.
Metacharacters are symbols or characters that have a special meaning within a regular expression.
The treatment of metacharacters can vary depending on:
whether the regular expression is being used with the SIMILAR TO or REGEXP search conditions, or the REGEXP_SUBSTR function
whether the metacharacter is inside of a character class in the regular expression
Before continuing, you should understand the definition of a character class. A character class is a set of characters enclosed in square brackets, against which characters in a string are matched.
For example, in the syntax SIMILAR TO 'ab[1-9]'
, [1-9] is a character class and matches one digit in the range of 1 to 9, inclusive. The treatment of metacharacters in a
regular expression can vary depending on whether the metacharacter is placed inside a character class. Specifically, most
metacharacters are handled as regular characters when positioned inside of a character class.
For SIMILAR TO (only), the metacharacters *, ?, +, _, |, (, ), { must be escaped within a character class.
To include a literal minus sign (-), caret (^), or right-angle bracket (]) character in a character class, it must be escaped.
The list of supported regular expression metacharacters is provided below. Almost all metacharacters are treated the same when used by SIMILAR TO, REGEXP, and REGEXP_SUBSTR:
Character | Additional information |
---|---|
[ and ] |
Left and right square brackets are used to specify a character class. A character class is a set of characters to match against. With the exception of the hyphen (-) and the caret (^), metacharacters and quantifiers (such as * and {m}, respectively) specified within a character class have no special meaning and are evaluated as actual characters. SQL Anywhere also supports sub-character classes such as POSIX character classes. See Regular expressions: Special sub-character classes. |
* | The asterisk can be used to match a character 0 or more times. For example, REGEXP '.*abc' matches a string that ends with abc, and starts with any prefix. So, aabc, xyzabc, and abc match, but bc and abcc do not.
|
? | The question mark can be used to match a character 0 or 1 times. For example, 'colou?r' matches color and colour.
|
+ | The plus sign can be used to match a character 1 or more times. For example, 'bre+' matches bre and bree, but not br.
|
- |
A hyphen can be used within a character class to denote a range. For example, For details on how ranges are evaluated by REGEXP and SIMILAR TO, see LIKE, REGEXP, and SIMILAR TO: Differences in character comparisons. |
% |
The percent sign can be used with SIMILAR TO to match any number of characters. The percent sign is not considered a metacharacter for REGEXP and REGEXP_SUBSTR. When specified, it matches a percent sign (%). |
_ (underscore character) |
The underscore can be used with SIMILAR TO to match a single character. The underscore is not considered a metacharacter for REGEXP and REGEXP_SUBSTR. When specified, it matches an underscore (_). |
| |
The pipe symbol is used to specify alternative patterns to use for matching the string. In a string of patterns separated by a vertical bar, the vertical bar is interpreted as an OR and matching stops at the first match made starting from the leftmost pattern. So, you should list the patterns in descending order of preference. You can specify an unlimited number of alternative patterns. |
( and ) | Left and right parenthesis are metacharacters when used for grouping parts of the regular expression. For example, (ab)* matches zero or more repetitions of ab. As with mathematical expressions, you use grouping to control the order in which
the parts of a regular expression are evaluated.
|
{ and } |
Left and right curly braces are metacharacters when used for specifying quantifiers. Quantifiers specify the number of times a pattern must repeat to constitute a match. For example:
|
\ | The backslash is used as an escape character for metacharacters. It can also be used to escape non-metacharacters. |
^ |
For REGEXP and REGEXP_SUBSTR, when a caret is outside a character class, the caret matches the start of a string. For example,
When used inside a character class, the following behavior applies:
|
$ |
When used with REGEXP and REGEXP_SUBSTR, matches the end of a string. For example, When used with SIMILAR TO, it matches a question mark. |
. |
When used with REGEXP and REGEXP_SUBSTR, matches any single character. For example, When used with SIMILAR TO, matches a period (.). |
: |
The colon is used within a character set to specify a subcharacter class. For example, |
Sub-character classes are special character classes embedded within a larger character class. In addition to custom character classes where you
define the set of characters to match (for example, [abxq4]
limits the set of matching characters to a, b, x, q, and 4), SQL Anywhere supports sub-character classes such as most of
the POSIX character classes. For example, [[:alpha:]]
represents the set of all upper- and lower-case letters.
The REGEXP search condition and the REGEXP_SUBSTR function support all the syntax conventions in the table below, but the SIMILAR TO search expression does not. Conventions supported by SIMILAR TO have a Y in the SIMILAR TO column.
In REGEXP and when using the REGEXP_SUBSTR function, sub-character classes can be negated using a caret. For example, [[:^alpha:]]
matches the set of all characters except alpha characters.
Sub-character class | Additional information | SIMILAR TO | ||
---|---|---|---|---|
[:alpha:] |
Matches upper- and lowercase alphabetic characters in the current collation. For example, |
Y | ||
[:alnum:] |
Match digits, and upper- and lowercase alphabetic characters in the current collation. For example, |
Y | ||
[:digit:] |
Match digits in the current collation. For example, |
Y | ||
[:lower:] |
Match lowercase alphabetic characters in the current collation. For example, |
Y | ||
[:space:] |
Match a single blank (' '). For example, the following statement searches Contacts.City for any city with a two word name:
|
Y | ||
[:upper:] |
Match uppercase alphabetic characters in the current collation. For example, |
Y | ||
[:whitespace:] |
Match a whitespace character such as space, tab, formfeed, and carriage return. |
Y | ||
[:ascii:] |
Match any seven-bit ASCII character (ordinal value between 0 and 127). |
|||
[:blank:] |
Match a blank space, or a horizontal tab.
|
|||
[:cntrl:] |
Match ASCII characters with an ordinal value of less than 32, or character value 127 (control characters). Control characters include newline, form feed, backspace, and so on. |
|||
[:graph:] |
Match printed characters.
|
|||
[:print:] |
Match printed characters and spaces.
|
|||
[:punct:] |
Match one of: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. The |
|||
[:word:] |
Match alphabetic, digit, or underscore characters in the current collation.
|
|||
[:xdigit:] |
Match a character that is in the character class [0-9A-Fa-f]. |
The following syntax conventions are supported by the REGEXP search condition and the REGEXP_SUBSTR function, and they assume that the backslash is the escape character. These conventions are not supported by the SIMILAR TO search expression.
Regular expression syntax | Name and meaning | ||
---|---|---|---|
\0 xxx |
Matches the character whose value is \0xxx, where xxx is any sequence of octal digits, and 0 is a zero. For example, |
||
\a |
Matches the bell character. |
||
\A |
Used outside a character set to match the start of a string. Equivalent to |
||
\b |
Matches a backspace character. |
||
\B |
Matches the backslash character (\). |
||
\c X |
Matches a named control character. For example, \cZ for ctrl-Z. |
||
\d |
Matches a digit in the current collation. For example, the following statement searches Contacts.Phone for all phone numbers that end with 00:
\d can be used both inside and outside character classes, and is equivalent to |
||
\D |
Matches anything that is not a digit. This is the opposite of \d. \D can be used both inside and outside character classes, and is equivalent to Be careful when using the negated shorthands inside square brackets. |
||
\e |
Matches the escape character. |
||
\E |
Ends the treatment of metacharacters as non-metacharacters, initiated by a \Q. For a list of regular expression metacharacters, see Regular expressions: Metacharacters. |
||
\f |
Matches a form feed. |
||
\n |
Matches a new line. |
||
\Q |
Treat all metacharacters as non-metacharacters, until \E is encountered. For example, For a list of regular expression metacharacters, see Regular expressions: Metacharacters. |
||
\r |
Matches a carriage return. |
||
\s |
Matches a space or a character treated as whitespace. For example, the following statement returns all product names from Products.ProductName that have at least one space in the name:
\s can be used both inside and outside character classes, and is equivalent to |
||
\S |
Matches a non-whitespace character. This is the opposite of \s, and is equivalent to \S can be used both inside and outside character classes. See Regular expressions: Special sub-character classes. Be careful when using the negated shorthands inside square brackets. |
||
\t |
Matches a horizontal tab. |
||
\v |
Matches a vertical tab. |
||
\w |
Matches a alphabetic character, digit, or underscore in the current collation. For example, the following statement returns all surnames from Contacts.Surname that are exactly seven alpha-numeric characters in length:
\w can be used both inside and outside character classes. See Regular expressions: Special sub-character classes. Equivalent to |
||
\W |
Matches anything that is not an alphabetic character, digit, or underscore in the current collation. This is the opposite
of \w, and is equivalent to This regular expression can be used both inside and outside character classes. See Regular expressions: Special sub-character classes. |
||
\x hh |
Matches the character whose value is 0xhh, where hh is, at most, two hex digits. For example, Equivalent to \x{hh}. |
||
\x{ hhh } |
Matches the character whose value is 0xhhh, where hhh is, at most, eight hex digits. |
||
\z and \Z |
Matches the position (not character) at the end of the string. Equivalent to |
Assertions test whether a condition is true, and affect the position in the string where matching begins. Assertions do not return characters; the assertion pattern is not included in the final match. These assertions are supported by the REGEXP search condition and the REGEXP_SUBSTR function. These conventions are not supported by the SIMILAR TO search expression.
Lookahead and lookbehind assertions can be useful with REGEXP_SUBSTR when trying to split a string. For example, you can return the list of street names (without the street numbers) in the Address column of the Customers table by executing the following statement:
SELECT REGEXP_SUBSTR( Street, '(?<=^\\S+\\s+).*$' ) FROM Customers; |
Another example is if you want to use a regular expression to verify that a password conforms to certain rules. You could use a zero width assertion similar to the following:
IF password REGEXP '(?=.*[[:digit:]])(?=.*[[:alpha:]].*[[:alpha:]])[[:word:]]{4,12}' MESSAGE 'Password conforms' TO CLIENT; ELSE MESSAGE 'Password does not conform' TO CLIENT; END IF |
The password is valid when the following are true:
password has at least one digit (zero width positive assertion with [[:digit:]])
password has at least two alphabetic characters (zero width positive assertion with [[:alpha:]].*[[:alpha:]])
password contains only alpha-numeric or underscore characters ([[:word:]])
password is at least 4 characters, and at most 12 characters ({4,12})
The following table contains the assertions supported by SQL Anywhere:
Syntax | Meaning |
---|---|
(?= pattern ) |
Positive lookahead zero-width assertion
Looks to see if the current position in the string is immediately followed by an occurrence of pattern, without pattern becoming part of the match string. For example, |
(?! pattern ) |
Negative lookahead zero-width assertions
Looks to see if the current position in the string is not immediately followed by an occurrence of pattern, without pattern becoming part of the match string. So, For example, |
(?<= pattern ) |
Positive lookbehind zero-width assertions
Looks to see if the current position in the string is immediately preceded by an occurrence of pattern, without pattern becoming part of the match string. So, For example, |
(?<! pattern ) |
Negative lookbehind zero-width assertions Looks to see if the current position in the string is not immediately preceded by an occurrence of pattern, without pattern becoming part of the match string. For example, |
(?> pattern ) |
Possessive local subexpression Matches only the largest prefix of the remaining string that matches pattern. For example, in |
(?: pattern ) |
Non-capturing block This is functionally equivalent to just pattern, and is provided for compatibility. For example, in Likewise, |
(?# text ) |
Used for comments. The content of text is ignored. |
Discuss this page in DocCommentXchange.
|
Copyright © 2010, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.0 |