regexp(n)	      Tcl Built-In Commands		regexp(n)

_________________________________________________________________

NAME
       regexp - Match a regular expression against a string

SYNOPSIS
       regexp ?switches? exp string ?matchVar? ?subMatchVar
       subMatchVar ...?
_________________________________________________________________

DESCRIPTION
       Determines whether the regular expression exp matches part
       or  all	of  string  and	 returns  1  if	 it does, 0 if it
       doesn't.
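
       A minimal sketch of the return value; the sample strings are
       invented for illustration:

```tcl
# Sample strings are invented for illustration.
set found   [regexp {b+} "aabbcc"]   ;# "bb" occurs inside the string
set missing [regexp {x+} "aabbcc"]   ;# no "x" anywhere in the string
puts "$found $missing"               ;# prints: 1 0
```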

       If additional arguments are specified after string, they
       are treated as the names of variables in which to return
       information about which part(s) of string matched exp.
       The variable matchVar will be set to the part of string
       that matched all of exp.  The first subMatchVar will contain
       the characters in string that matched the leftmost
       parenthesized subexpression within exp, the next subMatchVar
       will contain the characters that matched the next
       parenthesized subexpression to the right in exp, and so on.
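
       As a sketch of how the variables are filled in, assuming a
       hypothetical ``key=value'' input line (the variable names
       whole, key and value are illustrative only):

```tcl
# Hypothetical input. Braces around exp stop Tcl itself from
# interpreting characters such as [ ] and $ inside the pattern.
set line "color=blue"
set matched [regexp {([a-z]+)=([a-z]+)} $line whole key value]
puts "matched=$matched whole=$whole key=$key value=$value"
# prints: matched=1 whole=color=blue key=color value=blue
```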

       If the initial arguments to regexp start with - then  they
       are  treated as switches.  The following switches are cur-
       rently supported:

       -nocase	 Causes upper-case characters  in  string  to  be
		 treated  as  lower case during the matching pro-
		 cess.

       -indices	 Changes what  is  stored  in  the  subMatchVars.
		 Instead  of storing the matching characters from
		 string, each variable will contain a list of two
		 decimal  strings giving the indices in string of
		 the first and last characters	in  the	 matching
		 range of characters.

       --	 Marks the end of switches.  The argument follow-
		 ing this one will be treated as exp even  if  it
		 starts with a -.
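
       The three switches can be sketched as follows; the input
       strings and variable names are made up for the example:

```tcl
# -nocase: upper-case characters in string are treated as lower case.
set r1 [regexp -nocase {hello} "Say HELLO"]

# -indices: sub receives the first and last index of the text matched
# by (o+) rather than the text itself.
regexp -indices {l(o+)k} "please look here" m sub

# --: the argument after it is taken as exp although it begins with "-".
set r2 [regexp -- {-v} "tclsh -v"]

puts "r1=$r1 sub=$sub r2=$r2"   ;# prints: r1=1 sub=8 9 r2=1
```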

       If there are more subMatchVars than parenthesized
       subexpressions within exp, or if a particular subexpression
       in exp doesn't match the string (e.g. because it was in a
       portion of the expression that wasn't matched), then the
       corresponding subMatchVar will be set to ``-1 -1'' if
       -indices has been specified or to an empty string otherwise.
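
       A small sketch of both cases, using an invented pattern with
       one unused alternative and one surplus variable:

```tcl
# (b) takes no part in the match, and exp has no third parenthesized
# subexpression to go with the variable "extra".
regexp {(a)|(b)} "a" m first second extra
puts "first=$first second=<$second> extra=<$extra>"
# prints: first=a second=<> extra=<>

regexp -indices {(a)|(b)} "a" m i1 i2 i3
puts "i1=<$i1> i2=<$i2> i3=<$i3>"
# prints: i1=<0 0> i2=<-1 -1> i3=<-1 -1>
```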

REGULAR EXPRESSIONS
       Regular	expressions are implemented using Henry Spencer's
       package (thanks, Henry!), and much of the  description  of
       regular expressions below is copied verbatim from his man-
       ual entry.

       A regular expression is zero or more  branches,	separated
       by  ``|''.   It	matches	 anything that matches one of the
       branches.

       A branch is zero or more pieces, concatenated.  It matches
       a match for the first, followed by a match for the second,
       etc.
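
       Branches and concatenated pieces can be illustrated with a
       few invented strings:

```tcl
set r1 [regexp {cat|dog} "hotdog"]   ;# 1: the second branch matches "dog"
set r2 [regexp {cat|dog} "bird"]     ;# 0: neither branch matches
set r3 [regexp {abc} "xxabcxx"]      ;# 1: pieces a, b, c match in sequence
set r4 [regexp {abc} "xxacbxx"]      ;# 0: the pieces are out of order
puts "$r1 $r2 $r3 $r4"               ;# prints: 1 0 1 0
```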

       A piece is an atom possibly followed by ``*'',  ``+'',  or
       ``?''.	An atom followed by ``*'' matches a sequence of 0
       or more matches of the atom.  An atom  followed	by  ``+''
       matches	a  sequence of 1 or more matches of the atom.  An
       atom followed by ``?'' matches a match of the atom, or the
       null string.
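
       The three repetition operators behave as follows on some
       sample strings chosen for the example:

```tcl
set r1 [regexp {ab*c} "ac"]      ;# 1: b* matches zero b's
set r2 [regexp {ab+c} "ac"]      ;# 0: b+ needs at least one b
set r3 [regexp {ab+c} "abbbc"]   ;# 1: b+ matches three b's
set r4 [regexp {ab?c} "abc"]     ;# 1: b? matches one b
set r5 [regexp {ab?c} "abbc"]    ;# 0: b? cannot match two b's
puts "$r1 $r2 $r3 $r4 $r5"       ;# prints: 1 0 1 1 0
```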

       An atom is a regular expression in parentheses (matching a
       match for the regular expression), a  range  (see  below),
       ``.''   (matching  any  single character), ``^'' (matching
       the null string at the beginning	 of  the  input	 string),
       ``$''  (matching	 the  null string at the end of the input
       string), a ``\'' followed by a single character	(matching
       that  character), or a single character with no other sig-
       nificance (matching that character).
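
       A short sketch of the atom forms listed above, again with
       invented inputs:

```tcl
set r1 [regexp {^abc$} "abc"]    ;# 1: anchored at both ends
set r2 [regexp {^abc$} "xabc"]   ;# 0: ^ matches only at the beginning
set r3 [regexp {a.c} "a-c"]      ;# 1: . matches any single character
set r4 [regexp {a\.c} "a-c"]     ;# 0: \. matches only a literal "."
set r5 [regexp {(ab)+} "ababab" m]   ;# 1: parenthesized atom, repeated
puts "$r1 $r2 $r3 $r4 $r5 $m"    ;# prints: 1 0 1 0 1 ababab
```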

       A range is a sequence of characters  enclosed  in  ``[]''.
       It   normally   matches	any  single  character	from  the
       sequence.  If the sequence begins with ``^'',  it  matches
       any  single  character  not from the rest of the sequence.
       If two characters in the sequence are separated by  ``-'',
       this  is	 shorthand  for the full list of ASCII characters
       between them (e.g. ``[0-9]'' matches any	 decimal  digit).
       To  include  a  literal ``]'' in the sequence, make it the
       first character (following a possible ``^'').  To  include
       a literal ``-'', make it the first or last character.
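
       The range rules can be tried out as sketched below.  The
       patterns are braced, or held in a variable, so that Tcl does
       not treat the square brackets as command substitution; the
       names pat and text are illustrative only.

```tcl
set r1 [regexp {[0-9]+} "room 101" digits]   ;# 1: digits is "101"
set r2 [regexp {[^0-9]} "12345"]             ;# 0: every character is a digit

# A "]" placed first in the sequence is literal.
set pat {[]x]}
set text "a]b"
set r3 [regexp $pat $text]

# A "-" placed last in the sequence is literal.
set r4 [regexp {[a-]} "3-4"]

puts "$r1 $digits $r2 $r3 $r4"               ;# prints: 1 101 0 1 1
```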

CHOOSING AMONG ALTERNATIVE MATCHES
       In general there may be more than one way to match a regu-
       lar expression to an input string.  For example,	 consider
       the command
	      regexp  (a*)b*  aabaaabb	x  y
       Considering only the rules given so far, x and y could end
       up with the values aabb and aa, aaab and aaa, ab and a, or
       any of several other combinations.  To resolve this poten-
       tial ambiguity regexp chooses among alternatives using the
       rule ``first then longest''.  In other words, it considers
       the possible matches in order working from left	to  right
       across  the  input string and the pattern, and it attempts
       to match longer pieces of the input string before  shorter
       ones.   More  specifically,  the	 following rules apply in
       decreasing order of priority:

       [1]    If a regular expression could match  two	different
	      parts of an input string then it will match the one
	      that begins earliest.

       [2]    If a regular expression contains |  operators  then
	      the leftmost matching sub-expression is chosen.

       [3]    In  *, +, and ? constructs, longer matches are cho-
	      sen in preference to shorter ones.

       [4]    In sequences of expression  components  the  compo-
	      nents are considered from left to right.

       In  the	example from above, (a*)b* matches aab:	 the (a*)
       portion of the pattern is matched first	and  it	 consumes
       the  leading  aa;  then the b* portion of the pattern con-
       sumes the next b.  Or, consider the following example:
	      regexp  (ab|a)(b*)c  abc	x  y  z
       After this command x will be abc, y will be ab, and z will
       be  an  empty  string.	Rule 4 specifies that (ab|a) gets
       first shot at the input string and Rule 2  specifies  that
       the  ab sub-expression is checked before the a sub-expres-
       sion.  Thus the b has already been claimed before the (b*)
       component  is checked and (b*) must match an empty string.
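
       The rules can be checked with a short script.  The first
       command is an extra illustration of Rule 1 with an invented
       string; the other two repeat the examples from the text, with
       the variable names numbered so that both sets of results
       remain available.

```tcl
# Rule 1: the earliest match wins even though a longer one occurs later.
regexp {b+} "abbabbb" m
puts "m=$m"                    ;# prints: m=bb

# First example from the text: (a*) takes "aa", then b* takes "b".
regexp {(a*)b*} aabaaabb x1 y1
puts "x1=$x1 y1=$y1"           ;# prints: x1=aab y1=aa

# Second example: (ab|a) claims "ab", so (b*) matches an empty string.
regexp {(ab|a)(b*)c} abc x2 y2 z2
puts "x2=$x2 y2=$y2 z2=<$z2>"  ;# prints: x2=abc y2=ab z2=<>
```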

KEYWORDS
       match, regular expression, string

