awk Command
Purpose
Finds lines in files that match a pattern and performs specified actions
on those lines.
Syntax
awk [ -F Ere ] [ -v Assignment ] ... { -f ProgramFile | 'Program' }
[ [ File ... | Assignment ... ] ] ...
Description
The awk command uses a set of user-supplied instructions
to compare a set of files, one line at a time, to extended regular expressions
supplied by the user. It then performs actions on any line that matches
the extended regular expressions.
The pattern searching of the awk command is more
general than that of the grep command, and it allows
the user to perform multiple actions on input text lines. The awk command programming language requires no compiling, and allows the
user to use variables, numeric functions, string functions, and logical operators.
The awk command is affected by the LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_NUMERIC, NLSPATH, and PATH environment variables.
Input for the awk Command
The awk command takes two types of input: input text
files and program instructions.
Input Text Files
Searching and actions are performed on input text files. The files are
specified by:
- Specifying the File variable on the command line.
- Modifying the special variables ARGV and ARGC.
- Providing standard input in the absence of the File variable.
If multiple files are specified with the File variable,
the files are processed in the order specified.
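The following is a minimal sketch, assuming a POSIX shell with mktemp, of the three ways input files are supplied. The helper demo_input and the files a.txt and b.txt are hypothetical. The sketch shows File operands read in order, an ARGV member cleared so that its file is skipped, and standard input read when no File operand is given. FILENAME and FNR are standard awk special variables that this section does not describe.

```shell
# Hypothetical helper; a.txt and b.txt are throwaway sample files.
demo_input() {
  dir=$(mktemp -d)
  printf 'one\ntwo\n' > "$dir/a.txt"
  printf 'three\n' > "$dir/b.txt"
  (
    cd "$dir" || exit 1
    # File operands are read in command-line order; FILENAME names the
    # current file and FNR counts records within that file.
    awk '{ print FILENAME, FNR, $0 }' a.txt b.txt
    # Setting an ARGV member to the null string skips that file.
    awk 'BEGIN { ARGV[1] = "" } { print "after skip:", FILENAME, $0 }' a.txt b.txt
    # With no File operand, awk reads standard input.
    printf 'x\ny\n' | awk 'END { print "stdin records:", NR }'
  )
  rm -rf "$dir"
}

demo_input
```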
Program Instructions
Instructions provided by the user control the actions of the awk command. These instructions come from either the 'Program' variable on the command line or from a file specified by the -f flag together with the ProgramFile variable. If multiple program files are specified, the files are concatenated
in the order specified and the resultant order of instructions is used.
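This short sketch of program-file concatenation assumes a POSIX shell. The files one.awk and two.awk are hypothetical names. Because the two files are joined in the order given, the BEGIN action from one.awk runs before the one from two.awk.

```shell
# Hypothetical helper; one.awk and two.awk are throwaway program files.
demo_progfiles() {
  dir=$(mktemp -d)
  printf 'BEGIN { printf "first " }\n' > "$dir/one.awk"
  printf 'BEGIN { print "second" }\n' > "$dir/two.awk"
  # The -f files are concatenated in command-line order into one program.
  awk -f "$dir/one.awk" -f "$dir/two.awk"
  rm -rf "$dir"
}

demo_progfiles
```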
Output for the awk Command
The awk command produces three types of output from
the data within the input text file:
- Selected data can be printed to standard output, without alteration to
the input file.
- Selected portions of the input file can be altered.
- Selected data can be altered and printed to standard output, with or without
altering the contents of the input file.
All of these types of output can be performed on the same file. The programming
language recognized by the awk command allows the user
to redirect output.
File Processing with Records and Fields
Files are processed in the following way:
- The awk command scans its instructions and executes
any actions specified to occur before the input file is read.
The BEGIN statement in the awk programming language
allows the user to specify a set of instructions to be done before the first
record is read. This is particularly useful for initializing special variables.
- One record is read from the input file.
A record is a set of data separated
by a record separator. The default value for the record separator is the new-line
character, which makes each line in the file a separate record. The record
separator can be changed by setting the RS special variable.
- The record is compared against each pattern specified by the awk command's instructions.
The command instructions can specify that
a specific field within the record be compared. By default, fields are separated
by white space (blanks or tabs). Each field is referred to by a field variable.
The first field in a record is assigned the $1 variable,
the second field is assigned the $2 variable, and so
forth. The entire record is assigned to the $0 variable.
The field separator can be changed by using the -F flag
on the command line or by setting the FS special variable. The FS special variable
can be set to a blank, a single character, or an extended regular expression.
- If the record matches a pattern, any actions associated with that pattern
are performed on the record.
- After the record is compared to each pattern, and all specified actions
are performed, the next record is read from input; the process is repeated
until all records are read from the input file.
- If multiple input files have been specified, the next file is then opened
and the process repeated until all input files have been read.
- After the last record in the last file is read, the awk command executes any instructions specified to occur after the input
processing.
The END statement in the awk programming language allows the user to specify actions to be performed
after the last record is read. This is particularly useful for sending messages
about what work was accomplished by the awk command.
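The processing order above can be sketched as follows, assuming a POSIX shell and invented input data. The BEGIN action sets the RS and FS special variables before the first record is read. The main action then runs once per record, and the END action reports what was done.

```shell
demo_flow() {
  # No trailing new-line, so the last record is exactly "c,3".
  printf 'a,1;b,2;c,3' | awk '
    BEGIN { RS = ";"; FS = "," }   # runs before any record is read
    { sum += $2; last = $1 }       # runs once for each record
    END { print NR, sum, last }    # runs after the last record
  '
}

demo_flow
```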
The awk Command Programming Language
The awk command programming language consists of
statements in the form:
Pattern { Action }
If a record matches the specified pattern, or contains a field which matches
the pattern, the associated action is then performed. A pattern can be specified
without an action, in which case the entire line containing the pattern is
written to standard output. An action specified without a pattern is performed
for every input record.
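The three forms of a statement can be compared in a sketch like the following, using sample data invented for illustration. The forms are a pattern alone, an action alone, and a pattern with an action.

```shell
demo_forms() {
  data='apple 3
banana 5
cherry 7'
  printf '%s\n' "$data" | awk '/an/'                 # pattern only: matching lines are printed
  printf '%s\n' "$data" | awk '{ print $1 }'         # action only: runs for every record
  printf '%s\n' "$data" | awk '$2 > 4 { print $1 }'  # pattern and action
}

demo_forms
```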
Patterns
There are four types of patterns used in the awk command
language syntax:
Regular Expressions
The extended regular expressions used by the awk command
are similar to those used by the grep command.
The simplest form of an extended regular expression is a string of characters
enclosed in slashes. For example, suppose a file named testfile had the following contents:
smawley, andy
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
Entering the following command line:
awk '/smi/' testfile
would print to standard output all records that contain an occurrence
of the string smi. In this example, the program '/smi/' for the awk command
is a pattern with no action. The output is:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
The following special characters are used to form extended regular expressions:
+ (plus)
    Specifies that a string matches if one or more occurrences of the character or extended regular expression that precedes the + (plus) are within the string. The command line:
        awk '/smith+ern/' testfile
    prints to standard output any record that contains a string with the characters smit, followed by one or more h characters, and then ending with the characters ern. The output in this example is:
        smithern, harry
        smithhern, anne
? (question mark)
    Specifies that a string matches if zero or one occurrence of the character or extended regular expression that precedes the ? (question mark) is within the string. The command line:
        awk '/smith?/' testfile
    prints to standard output all records that contain the characters smit, followed by zero or one instance of the h character. The output in this example is:
        smith, alan
        smithern, harry
        smithhern, anne
        smitters, alexis
| (vertical line)
    Specifies that a string matches if either of the strings separated by the | (vertical line) is within the string. The command line:
        awk '/allen|alan/' testfile
    prints to standard output all records that contain the string allen or alan. The output in this example is:
        smiley, allen
        smith, alan
( ) (parentheses)
    Groups strings together in regular expressions. The command line:
        awk '/a(ll)?(nn)?e/' testfile
    prints to standard output all records with the string ae, alle, anne, or allnne. The output in this example is:
        smiley, allen
        smithhern, anne
{m}
    Specifies that a string matches if exactly m occurrences of the preceding character or extended regular expression are within the string. The command line:
        awk '/l{2}/' testfile
    prints to standard output:
        smiley, allen
{m,}
    Specifies that a string matches if at least m occurrences of the preceding character or extended regular expression are within the string. The command line:
        awk '/t{2,}/' testfile
    prints to standard output:
        smitters, alexis
{m,n}
    Specifies that a string matches if between m and n occurrences, inclusive, of the preceding character or extended regular expression are within the string (where m <= n). The command line:
        awk '/er{1,2}/' testfile
    prints to standard output:
        smithern, harry
        smithhern, anne
        smitters, alexis
[String]
    Signifies that the regular expression matches any one of the characters specified by the String variable within the square brackets. The command line:
        awk '/sm[a-h]/' testfile
    prints to standard output all records with the characters sm followed by any character in the range a through h. The output in this example is:
        smawley, andy
[^String]
    A ^ (caret) within the [ ] (square brackets) and at the beginning of the specified string indicates that the regular expression does not match any characters within the square brackets. Thus, the command line:
        awk '/sm[^a-h]/' testfile
    prints to standard output:
        smiley, allen
        smith, alan
        smithern, harry
        smithhern, anne
        smitters, alexis
~ and !~
    Signifies a conditional statement that a specified variable matches (tilde) or does not match (tilde, exclamation point) the regular expression. The command line:
        awk '$1 ~ /n/' testfile
    prints to standard output all records whose first field contains the character n. The output in this example is:
        smithern, harry
        smithhern, anne
^ (caret)
    Signifies the beginning of a field or record. The command line:
        awk '$2 ~ /^h/' testfile
    prints to standard output all records with the character h as the first character of the second field. The output in this example is:
        smithern, harry
$ (dollar sign)
    Signifies the end of a field or record. The command line:
        awk '$2 ~ /y$/' testfile
    prints to standard output all records with the character y as the last character of the second field. The output in this example is:
        smawley, andy
        smithern, harry
. (period)
    Signifies any one character except the new-line character that terminates a record. The command line:
        awk '/a..e/' testfile
    prints to standard output all records with the characters a and e separated by exactly two characters. The output in this example is:
        smawley, andy
        smiley, allen
        smithhern, anne
* (asterisk)
    Specifies that a string matches if zero or more occurrences of the preceding character or extended regular expression are within the string. The combination .* therefore matches any sequence of zero or more characters. The command line:
        awk '/a.*e/' testfile
    prints to standard output all records with the characters a and e separated by zero or more characters. The output in this example is:
        smawley, andy
        smiley, allen
        smithhern, anne
        smitters, alexis
\ (backslash)
    The escape character. When preceding any of the characters that have special meaning in extended regular expressions, the escape character removes any special meaning for the character. For example, the pattern:
        /a\/\//
    matches the string a//, since the backslashes negate the usual meaning of the slash as a delimiter of the regular expression. To specify the backslash itself as a character, use a double backslash. See the following section on escape sequences for more information on the backslash and its uses.
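Several of the table's examples can be reproduced with a sketch like the one below. It rebuilds the sample testfile in a temporary directory and assumes a POSIX shell. It also exercises !~, which the table describes but does not illustrate. The interval forms {m}, {m,}, and {m,n} are left out because support for them has varied across awk implementations.

```shell
demo_ere() {
  dir=$(mktemp -d)
  printf '%s\n' 'smawley, andy' 'smiley, allen' 'smith, alan' \
    'smithern, harry' 'smithhern, anne' 'smitters, alexis' > "$dir/testfile"
  awk '/smith+ern/' "$dir/testfile"    # one or more h characters
  awk '/allen|alan/' "$dir/testfile"   # alternation
  awk '$2 ~ /^h/' "$dir/testfile"      # second field begins with h
  awk '$1 !~ /h/' "$dir/testfile"      # first field contains no h
  printf 'a//b\n' | awk '/a\/\//'      # escaped slashes match literally
  rm -rf "$dir"
}

demo_ere
```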
Recognized Escape Sequences
The awk command recognizes most
of the escape sequences used in C language conventions, as well as several
that are used as special characters by the awk command
itself. The escape sequences are:
Escape Sequence    Character Represented
\"                 " (double quotation mark)
\/                 / (slash) character
\ddd               Character whose encoding is represented by a one-, two-, or three-digit octal integer, where d represents an octal digit
\\                 \ (backslash) character
\a                 Alert character
\b                 Backspace character
\f                 Form-feed character
\n                 New-line character (see the following note)
\r                 Carriage-return character
\t                 Tab character
\v                 Vertical-tab character
Note:
Except in the gsub, match, split, and sub built-in
functions, the matching of extended regular expressions is based on input
records. Record-separator characters (the new-line character by default) cannot
be embedded in the expression, and no expression matches the record-separator
character. If the record separator is not the new-line character, then the
new-line character can be matched. In the four built-in functions specified,
matching is based on text strings, and any character (including the record
separator) can be embedded in the pattern so that the pattern matches the
appropriate character. However, in all regular-expression matching with the awk command, the use of one or more NULL characters in the
pattern produces undefined results.
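This small sketch of escape sequences assumes a POSIX shell and an ASCII-compatible character set. In it, \t is a single tab character and \052 is the octal encoding of the asterisk. As the note says, split() can match the new-line character because it works on strings rather than on input records.

```shell
demo_escapes() {
  awk 'BEGIN {
    s = "a\tb"                       # \t is one tab character
    print length(s)
    print "\052"                     # octal 052 is the asterisk in ASCII
    n = split("x\ny", parts, "\n")   # split() can match the new-line character
    print n, parts[2]
  }'
}

demo_escapes
```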
Relational Expressions
The relational operators < (less than), > (greater
than), <= (less than or equal to), >= (greater than or equal to), == (equal
to), and != (not equal to) can be used to form patterns. For example, the
pattern:
$1 < $4
matches records where the first field is less than
the fourth field. The relational operators also work with string values. For
example:
$1 != "q"
matches all records where the first field is not a q. String values can also be matched on collation
values. For example:
$1 >= "d"
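matches all records where the first field collates at or after the string d,
such as fields that begin with a lowercase letter from d through z. Field values that look like numbers
are compared numerically, as in the $1 < $4 example; other field values are compared as strings.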
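The relational patterns can be tried with a sketch like this one, using invented input lines. In POSIX awk, field values that look like numbers compare numerically, so 3 < 10 holds for the first line. The string constants "10" and "9", by contrast, compare as strings.

```shell
demo_relational() {
  printf '3 x y 10\n12 x y 5\n' | awk '$1 < $4'       # numeric: only 3 < 10 is true
  awk 'BEGIN { s = ("10" < "9"); n = (10 < 9); print s, n }'
  printf 'q 1\nr 2\n' | awk '$1 != "q" { print $2 }'  # first field is not q
}

demo_relational
```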
Combinations of Patterns
Patterns can be combined using three options:
- Ranges are specified by two patterns separated
with a , (comma). Actions are performed on every record starting with the
record that matches the first pattern, and continuing through and including
the record that matches the second pattern. For example:
/begin/,/end/
matches the record containing the string begin, and every record between it and the record containing the string end, including the record containing the string end.
- Parentheses ( ) group patterns together.
- The boolean operators || (or), && (and), and ! (not) combine patterns
into expressions that match if they evaluate true, otherwise they do not match.
For example, the pattern:
$1 == "al" && $2 == "123"
matches records where the first field is al and the second field is 123.
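The three ways of combining patterns are sketched below with invented input. The sketch shows a range, && joining two comparisons, and ! with || grouped by parentheses.

```shell
demo_combine() {
  printf '%s\n' x begin a end y | awk '/begin/,/end/'   # range, inclusive
  printf '%s\n' 'al 123' 'al 456' 'bo 123' | awk '$1 == "al" && $2 == "123"'
  printf '%s\n' 'al 123' 'al 456' 'bo 123' | awk '!($1 == "al") || $2 == "456"'
}

demo_combine
```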
BEGIN and END Patterns
Actions specified with the BEGIN pattern
are performed before any input is read. Actions specified with the END pattern are performed after all input has been read. Multiple BEGIN and END patterns are allowed
and processed in the order specified. An END pattern
can precede a BEGIN pattern within the program statements.
If a program consists only of BEGIN statements, the
actions are performed and no input is read. If a program consists only of END statements, all the input is read prior to any actions
being taken.
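This sketch of the ordering rules assumes a POSIX shell. The two BEGIN actions run in program order even though an END action sits between them. The second command has only a BEGIN action, so it never reads its input.

```shell
demo_begin_end() {
  # /dev/null supplies an empty input for the END action to follow.
  awk 'BEGIN { printf "a" } END { print "end:", NR } BEGIN { print "b" }' /dev/null
  echo unused | awk 'BEGIN { print "only BEGIN" }'
}

demo_begin_end
```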
Actions
There are several types of action statements:
Action Statements
Action statements are enclosed in { } (braces). If the statements are specified
without a pattern, they are performed on every record. Multiple actions can
be specified within the braces, but must be separated by new-line characters
or ; (semicolons), and the statements are processed in the order they appear.
Action statements include:
- Arithmetical Statements
- The mathematical operators + (plus), - (minus), / (division), ^ (exponentiation),
* (multiplication), and % (modulus) are used in the form:
Expression Operator Expression
Thus, the statement:
$2 = $1 ^ 3
assigns the value of the first field raised to the third
power to the second field.
- Unary Statements
- The unary - (minus) and unary + (plus) operate as in the C programming language:
+Expression or -Expression
- Increment and Decrement Statements
- The pre-increment and pre-decrement statements operate as in the C programming
language:
++Variable or --Variable
The post-increment
and post-decrement statements operate as in the C programming language:
Variable++ or Variable--
- Assignment Statements
- The assignment operators += (addition), -= (subtraction), /= (division),
and *= (multiplication) operate as in the C programming language, with the
form:
Variable += Expression
Variable -= Expression
Variable /= Expression
Variable *= Expression
For
example, the statement:
$1 *= $2
multiplies the field
variable $1 by the field variable $2 and then assigns the new value to $1.
The assignment operators ^= (exponentiation) and %= (modulus) have the form:
Variable1^=Expression1
AND
Variable2%=Expression2
and they are equivalent to the C programming language statements:
Variable1=pow(Variable1, Expression1)
AND
Variable2=fmod(Variable2, Expression2)
where pow is the pow subroutine and fmod is the fmod subroutine.
- String Concatenation Statements
- String values can be concatenated by stating them side by side. For
example:
$3 = $1 $2
assigns the concatenation of the
strings in the field variables $1 and $2 to the field variable $3.
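The action statements above can be combined in one program. This illustrative sketch uses made-up input, and the variable names x, y, m, and n are arbitrary.

```shell
demo_actions() {
  echo '2 3' | awk '{
    $3 = $1 ^ 3     # exponentiation: 2 ^ 3 is stored in a new third field
    x = 7; x %= 3   # modulus assignment leaves 1 in x
    y = 5; y ^= 2   # exponentiation assignment leaves 25 in y
    n = 1; m = n++  # post-increment: m gets 1, then n becomes 2
    print $3, x, y, m, n, $1 $2, -$1   # $1 $2 concatenates to 23
  }'
}

demo_actions
```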
Built-In Functions
The awk command language uses arithmetic functions,
string functions, and general functions. The close function is
necessary if you intend to write a file and then read it later in the same program.
Arithmetic Functions
The following arithmetic functions perform the same
actions as the C language subroutines by the same name:
Function          Action
atan2( y, x )     Returns the arctangent of y/x.
cos( x )          Returns the cosine of x; x is in radians.
sin( x )          Returns the sine of x; x is in radians.
exp( x )          Returns the exponential function of x.
log( x )          Returns the natural logarithm of x.
sqrt( x )         Returns the square root of x.
int( x )          Returns the value of x truncated to an integer.
rand( )           Returns a random number n, with 0 <= n < 1.
srand( [Expr] )   Sets the seed value for the rand function to the value of the Expr parameter, or to the time of day if the Expr parameter is omitted. The previous seed value is returned.
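This sketch exercises the arithmetic functions in a BEGIN-only program. The seed value 42 is arbitrary. Reseeding rand with the same value reproduces the same sequence, and every value lies in the range 0 <= n < 1.

```shell
demo_math() {
  awk 'BEGIN {
    printf "%.4f\n", 4 * atan2(1, 1)   # pi, to four decimal places
    print int(-3.7), int(3.7)          # int() truncates toward zero
    print sqrt(16), exp(0), log(1)
    srand(42); a = rand()
    srand(42); b = rand()              # the same seed gives the same sequence
    same = (a == b); inrange = (a >= 0 && a < 1)
    print same, inrange
  }'
}

demo_math
```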
String Functions
The string functions are:
gsub( Ere, Repl, [ In ] )
    Performs exactly as the sub function, except that all occurrences of the regular expression are replaced.
sub( Ere, Repl, [ In ] )
    Replaces the first occurrence of the extended regular expression specified by the Ere parameter in the string specified by the In parameter with the string specified by the Repl parameter. The sub function returns the number of substitutions. An & (ampersand) appearing in the string specified by the Repl parameter is replaced by the string in the In parameter that matches the extended regular expression specified by the Ere parameter. If no In parameter is specified, the default value is the entire record (the $0 record variable).
index( String1, String2 )
    Returns the position, numbering from 1, within the string specified by the String1 parameter where the string specified by the String2 parameter occurs. If the String2 parameter does not occur in the String1 parameter, a 0 (zero) is returned.
length [(String)]
    Returns the length, in characters, of the string specified by the String parameter. If no String parameter is given, the length of the entire record (the $0 record variable) is returned.
blength [(String)]
    Returns the length, in bytes, of the string specified by the String parameter. If no String parameter is given, the length of the entire record (the $0 record variable) is returned.
substr( String, M, [ N ] )
    Returns a substring with the number of characters specified by the N parameter. The substring is taken from the string specified by the String parameter, starting with the character in the position specified by the M parameter. The M parameter is specified with the first character in the String parameter as number 1. If the N parameter is not specified, the substring extends from the position specified by the M parameter to the end of the String parameter.
match( String, Ere )
    Returns the position, in characters, numbering from 1, in the string specified by the String parameter where the extended regular expression specified by the Ere parameter occurs, or else returns a 0 (zero) if the Ere parameter does not occur. The RSTART special variable is set to the return value. The RLENGTH special variable is set to the length of the matched string, or to -1 (negative one) if no match is found.
split( String, A, [Ere] )
    Splits the string specified by the String parameter into array elements A[1], A[2], . . ., A[n], and returns the value of n. The separation is done with the extended regular expression specified by the Ere parameter or with the current field separator (the FS special variable) if the Ere parameter is not given. The elements in the A array are created with string values, unless context indicates a particular element should also have a numeric value.
tolower( String )
    Returns the string specified by the String parameter, with each uppercase character in the string changed to lowercase. The uppercase and lowercase mapping is defined by the LC_CTYPE category of the current locale.
toupper( String )
    Returns the string specified by the String parameter, with each lowercase character in the string changed to uppercase. The uppercase and lowercase mapping is defined by the LC_CTYPE category of the current locale.
sprintf( Format, Expr, Expr, . . . )
    Formats the expressions specified by the Expr parameters according to the printf subroutine format string specified by the Format parameter and returns the resulting string.
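The string functions can be tried together in a BEGIN-only sketch like this one. The sample strings reuse names from testfile. The AIX-specific blength function is left out so that the sketch also runs with other POSIX awk implementations.

```shell
demo_strings() {
  awk 'BEGIN {
    s = "smithern, harry"
    n = gsub(/r/, "R", s)               # replaces every r and counts them
    print n, s
    t = "alan"
    sub(/a/, "[&]", t)                  # & stands for the matched text
    print t
    print index("smithern", "the"), substr("smithern", 4, 3)
    m = match("smithern, harry", /h+/)  # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH
    k = split("a:b:c", parts, ":")
    print k, parts[2]
    print length("allen"), toupper("andy"), tolower("ANNE")
    print sprintf("%05.1f", 3.14159)
  }'
}

demo_strings
```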
General Functions
The general functions are:
close( Expression )
    Closes the file or pipe opened by a print or printf statement or a call to the getline function with the same string-valued Expression parameter. If the file or pipe is successfully closed, a 0 is returned; otherwise a non-zero value is returned. The close statement is necessary if you intend to write a file and then read it later in the same program.
system( Command )
    Executes the command specified by the Command parameter and returns its exit status. Equivalent to the system subroutine.
Expression | getline [ Variable ]
    Reads a record of input from a stream piped from the output of a command specified by the Expression parameter and assigns the value of the record to the variable specified by the Variable parameter. The stream is created if no stream is currently open with the value of the Expression parameter as its command name. The stream created is equivalent to one created by a call to the popen subroutine with the Command parameter taking the value of the Expression parameter and the Mode parameter set to a value of r. Each subsequent call to the getline function reads another record, as long as the stream remains open and the Expression parameter evaluates to the same string. If a Variable parameter is not specified, the $0 record variable is set to the record read from the stream, and the NF special variable is updated.
getline [ Variable ] < Expression
    Reads the next record of input from the file named by the Expression parameter and sets the variable specified by the Variable parameter to the value of the record. Each subsequent call to the getline function reads another record, as long as the stream remains open and the Expression parameter evaluates to the same string. If a Variable parameter is not specified, the $0 record variable is set to the record read from the stream, and the NF special variable is updated.
getline [ Variable ]
    Sets the variable specified by the Variable parameter to the next record of input from the current input file and updates the NR and FNR special variables. If no Variable parameter is specified, the $0 record variable is set to the value of the record, and the NF, NR, and FNR special variables are also set.
Note:
All forms of the getline function return
1 for successful input, zero for end of file, and -1 for an error.
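This sketch shows close, getline, and system working together, assuming a POSIX shell with mktemp. The program writes a temporary file and closes it so the data is flushed. It then reads the file back with getline, reads one record from the echo command through a pipe, and checks the exit status of true. The variable names f, line, and greeting are illustrative.

```shell
demo_general() {
  tmp=$(mktemp)
  awk -v f="$tmp" 'BEGIN {
    print "first" > f                  # the first write truncates the file
    print "second" > f                 # later writes in the same run append
    close(f)                           # close before reading the file back
    while ((getline line < f) > 0)     # 1 per record, 0 at end of file
      n++
    close(f)
    "echo hello" | getline greeting    # one record from a command
    close("echo hello")
    status = system("true")            # exit status of the command
    print n, line, greeting, status
  }'
  rm -f "$tmp"
}

demo_general
```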
User-Defined Functions
User-defined functions are declared in the following form:
function Name (Parameter, Parameter,...) { Statements }
A function can be referred to anywhere in an awk command
program, and its use can precede its definition. The scope of the function
is global.
Function parameters can be either scalars or arrays. Parameter names are
local to the function; all other variable names are global. The same name
should not be used for different entities; for example, a parameter name should
not be duplicated as a function name or special variable. Variables with
global scope should not share the name of a function. Scalars and arrays should
not have the same name in the same scope.
The number of parameters in the function definition does not have to match
the number of parameters used when the function is called. Excess formal parameters
can be used as local variables. Extra scalar parameters are initialized with
a string value equivalent to the empty string and a numeric value of 0 (zero);
extra array parameters are initialized as empty arrays.
When invoking a function, no white space is placed between the function
name and the opening parenthesis. Function calls can be nested and recursive.
Upon return from any nested or recursive function call, the values of all
the calling function's parameters remain unchanged, except for array parameters
passed by reference. The return statement can be used
to return a value.
Within a function definition, the new-line characters are optional before
the opening { (brace) and after the closing } (brace).
An example of a function definition is:
function average(g, n,    i, sum)
{
    for (i in g)
        sum = sum + g[i]
    return sum / n
}
The function average is passed an array, g,
and a variable, n, with the number of elements
in the array. The extra parameters i and sum serve as local variables, so sum starts at zero on every call. The function then obtains an average and returns it.
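This runnable sketch uses the average function and adds a recursive factorial to show that calls can be recursive. The name fact is hypothetical. Because i and sum are declared as extra parameters, they are local, so calling average twice gives the same result.

```shell
demo_function() {
  printf '%s\n' 2 4 6 | awk '
    function average(g, n,    i, sum) {   # i and sum are locals
      for (i in g)
        sum += g[i]
      return sum / n
    }
    function fact(k) {
      return k <= 1 ? 1 : k * fact(k - 1)
    }
    { vals[NR] = $1 }
    END {
      print average(vals, NR), average(vals, NR)
      print fact(5)
    }'
}

demo_function
```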
Conditional Statements
Most conditional statements in the awk command programming
language have the same syntax and function as conditional statements in the
C programming language. All of the conditional statements allow the use of
{ } (braces) to group together statements. An optional new-line can be used
between the expression portion and the statement portion of the conditional
statement, and new-lines or ; (semicolon) are used to separate multiple statements
in { } (braces). The following conditional statements follow C-language syntax:
if
    if ( Expression ) { Statement } [ else Action ]
while
    while ( Expression ) { Statement }
for
    for ( Expression ; Expression ; Expression ) { Statement }
break
    Causes the program loop to be exited when the break statement is used in either a while or for statement.
continue
    Causes the program loop to move to the next iteration when the continue statement is used in either a while or for statement.
The following statements in the awk command programming language do not follow C-language rules:
for...in
    for ( Variable in Array ) { Statement }
    The for...in statement sets the Variable parameter to each index value of the Array variable, one index at a time and in no particular order, and performs the action specified by the Statement parameter with each iteration. See the delete statement for an example of a for...in statement.
if...in
    if ( Variable in Array ) { Statement }
    The if...in statement checks whether the Array variable contains an element whose index is the value of the Variable parameter. The statement is performed if the element is found.
delete
    delete Array [ Expression ]
    The delete statement deletes both the array element specified by the Array parameter and the index specified by the Expression parameter. For example, the statements:
        for (i in g)
            delete g[i];
    would delete every element of the g[] array.
exit
    exit [ Expression ]
    The exit statement first invokes all END actions in the order they occur, then terminates the awk command with an exit status specified by the Expression parameter. No subsequent END actions are invoked if the exit statement occurs within an END action.
#
    # Comment
    The # statement places comments. Comments should always end with a new-line but can begin anywhere on a line.
next
    Stops the processing of the current input record and proceeds with the next input record.
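The awk-specific statements and the C-style loop controls can be combined in a sketch like this one, using input invented for the example. In it, next skips a record, in tests membership without creating an element, delete removes one element, and for...in counts what remains.

```shell
demo_control() {
  printf '%s\n' 'red 1' 'skip 0' 'green 2' 'red 3' | awk '
    $1 == "skip" { next }          # no later pattern is tried for this record
    { total[$1] += $2 }
    END {
      if ("red" in total) print "red", total["red"]
      delete total["red"]          # removes both the element and its index
      n = 0
      for (k in total) n++         # visits the remaining indexes in no set order
      has_skip = ("skip" in total)
      print n, has_skip
      s = ""
      for (i = 1; i <= 4; i++) {
        if (i % 2 == 0) continue   # C-style loop control
        s = s i
      }
      print "odd:", s
    }'
}

demo_control
```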
Output Statements
Two output statements in the awk command
programming language are:
print
    print [ ExpressionList ] [ Redirection ] [ Expression ]
    The print statement writes the value of each expression specified by the ExpressionList parameter to standard output. Each expression is separated by the current value of the OFS special variable, and each record is terminated by the current value of the ORS special variable.
    The output can be redirected using the Redirection parameter, which can specify the three output redirections with the > (greater than), >> (double greater than), and | (pipe). The Redirection parameter specifies how the output is redirected, and the Expression parameter is either a path name to a file (when the Redirection parameter is > or >>) or the name of a command (when the Redirection parameter is a |).
printf
    printf Format [ , ExpressionList ] [ Redirection ] [ Expression ]
    The printf statement writes to standard output the expressions specified by the ExpressionList parameter in the format specified by the Format parameter. The printf statement functions exactly like the printf command, except for the c conversion specification (%c). The Redirection and Expression parameters function the same as in the print statement.
    For the c conversion specification: if the argument has a numeric value, the character whose encoding is that value is output. If the value is zero or is not the encoding of any character in the character set, the behavior is undefined. If the argument does not have a numeric value, the first character of the string value is output; if the string does not contain any characters, the behavior is undefined.
Note:
If the Expression parameter specifies
a path name for the Redirection parameter, the Expression parameter should be enclosed in double quotes
to ensure that it is treated as a string.
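This sketch of output handling assumes that sort is available on the PATH. OFS joins the print arguments, and every print sends its line to the same sort process. The close call waits for sort to finish, and printf shows both cases of the %c conversion.

```shell
demo_output() {
  printf '%s\n' 'b 2' 'a 1' | awk '
    BEGIN { OFS = "-" }
    { print $1, $2 | "sort" }     # each print feeds the same sort process
    END {
      close("sort")               # wait for sort to finish writing
      printf "%c%c\n", 65, "xyz"  # number gives a character code, string gives its first character
    }'
}

demo_output
```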
Variables
Variables can be scalars, field variables, arrays, or special variables.
Variable names cannot begin with a digit.
Variables can be used just by referencing them. With the exception of function
parameters, they are not explicitly declared. Uninitialized scalar variables
and array elements have both a numeric value of 0 (zero) and a string value
of the null string ("").
Variables take on numeric or string values according to context. Each variable
can have a numeric value, a string value, or both. For example:
x = "4" + "8"
assigns the value of 12 to the variable x. For string constants, expressions should be enclosed
in " " (double quotation) marks.
There are no explicit conversions between numbers and strings. To force
an expression to be treated as a number, add 0 (zero) to it. To force an expression
to be treated as a string, append a null string ("").
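This sketch shows values taking on numeric or string meaning from context. The variable names are arbitrary, and u is deliberately never assigned. Adding 0 forces a numeric comparison, and appending the null string "" forces a string.

```shell
demo_values() {
  awk 'BEGIN {
    x = "4" + "8"                  # string constants used as numbers
    print x
    a = "10"; b = "9"
    as_strings = (a < b)           # "10" sorts before "9"
    as_numbers = (a + 0 < b + 0)   # adding 0 forces numeric comparison
    print as_strings, as_numbers
    print length(u), u + 0         # uninitialized: null string and 0
    print length(12345 "")         # appending "" forces a string
  }'
}

demo_values
```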
Field Variables
Field variables are designated by a $ (dollar sign)
followed by a number or numerical expression. The first field in a record
is assigned the $1 variable, the second field is assigned
to the $2 variable, and so forth. The $0 field variable is assigned to the entire record. New field variables
can be created by assigning a value to them. Assigning a value to a nonexistent
field, that is, any field numbered higher than the current value of the NF special variable, forces the creation of any intervening fields (set
to the null string), increases the value of the NF special
variable, and forces the value of the $0 record variable
to be recalculated. In the recalculated record, the fields are separated by the current output field separator
(the value of the OFS special variable). When records are read, blanks
and tabs are the default field separators. To change the field separator,
use the -F flag, or assign the FS special
variable a different value in the awk command program.
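This short sketch shows field creation and the -F flag. In the first command, OFS is set to a hyphen so that the empty intervening fields are visible in the rebuilt $0. In the second command, $NF refers to the last field.

```shell
demo_fields() {
  # Assigning $5 on a two-field record creates empty fields 3 and 4.
  echo 'a b' | awk 'BEGIN { OFS = "-" } { $5 = "e"; printf "%d %s\n", NF, $0 }'
  # -F sets the input field separator; $NF is the last field.
  echo 'x:y:z' | awk -F: '{ print $2, NF, $NF }'
}

demo_fields
```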
Arrays
Arrays are initially empty and their sizes change dynamically.
Arrays are represented by a variable with subscripts in [ ] (square brackets).
The subscripts, or element identifiers, can be numbers or strings, which provide
a type of associative array capability. For example, the program:
/red/ { x["red"]++ }
/green/ { y["green"]++ }
increments counts for both the red counter and the green counter.
Arrays can be indexed with more than one subscript,
similar to multidimensional arrays in some programming languages. Because
arrays in the awk command language are really one-dimensional,
the comma-separated subscripts are converted to a single string
by concatenating the string values of the separate expressions, with each
expression separated by the value of the SUBSEP special
variable. Therefore, the following two index operations are equivalent:
x[expr1, expr2, ..., exprn]
and
x[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
When using the in operator, a
multidimensional Index value should be contained within
parentheses. Except for the in operator, any reference
to a nonexistent array element automatically creates that element.
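The following sketch exercises a two-subscript array: the parenthesized index with the in operator, recovering the individual subscripts with split on SUBSEP, and the element creation that a plain reference causes. It assumes a POSIX-conforming awk; the function name grid_demo and the array name cell are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo function: multi-subscript arrays in awk.
grid_demo() {
    awk 'BEGIN {
        cell[1, 2] = "x"                  # stored under the key "1" SUBSEP "2"
        if ((1, 2) in cell) print "found 1,2"
        if ((2, 1) in cell) print "found 2,1"; else print "no 2,1"
        for (k in cell) {                 # only one element exists at this point
            split(k, part, SUBSEP)        # recover the original subscripts
            print "key parts:", part[1], part[2]
        }
        if (cell[3, 3] == "") print "cell[3,3] is empty"   # this reference creates it
        n = 0
        for (k in cell) n++
        print "elements:", n              # 2: the in test created nothing, the reference did
    }'
}
grid_demo
```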
Special Variables
The following variables have special meaning for the awk command:
| Special variable |
Description |
|
ARGC |
The number of elements in the ARGV array. This
value can be altered. |
|
ARGV |
The array of command-line arguments, taken in order from the command line and numbered from 0 (zero) to ARGC -1. ARGV[0] typically holds the command name (awk); the File and Assignment operands
follow in ARGV[1] through ARGV[ARGC -1]. As each input file is finished, the next member of the ARGV array provides the name of the next input file, unless:
-
The next member is an Assignment statement,
in which case the assignment is evaluated.
-
The next member has a null value, in which case the member
is skipped. Programs can skip selected input files by setting the member of
the ARGV array that contains that input file to a null
value.
-
The next member would lie beyond ARGV[ARGC -1] (using the current value of ARGC), in which case the awk command interprets this as the end of the input files.
|
|
CONVFMT |
The printf format for converting numbers to
strings (except for output statements, where the OFMT special
variable is used). The default is "%.6g". |
|
ENVIRON |
An array representing the environment under which the awk command operates. Each element of the array is of the form:
ENVIRON["EnvironmentVariableName"] = EnvironmentVariableValue
The values are set when the awk command
begins execution, and that environment is used until the end of execution,
regardless of any modification of the ENVIRON special
variable. |
|
FILENAME |
The path name of the current input file. During the execution of
a BEGIN action, the value of FILENAME is undefined. During the execution of an END action,
the value is the name of the last input file processed. |
|
FNR |
The number of the current input record in the current file. |
|
FS |
The input field separator. The default value is a blank. If the input
field separator is a blank, any number of locale-defined spaces can separate
fields. The FS special variable can take two additional
values:
-
With FS set to a single character,
fields are separated by each single occurrence of the character.
-
With FS set to an extended regular expression, each occurrence of a sequence matching the
extended regular expression separates fields.
|
|
NF |
The number of fields in the current record, with a limit of 99. Inside
a BEGIN action, the NF special
variable is undefined unless a getline function without
a Variable parameter has been issued previously. Inside
an END action, the NF special
variable retains the value it had for the last record read, unless a subsequent,
redirected, getline function without a Variable parameter is issued prior to entering the END action. |
|
NR |
The number of the current input record. Inside a BEGIN action the value of the NR special variable
is 0 (zero). Inside an END action, the value is the
number of the last record processed. |
|
OFMT |
The printf format for converting numbers to
strings in output statements. The default is "%.6g". |
|
OFS |
The output field separator (default is a space). |
|
ORS |
The output record separator (default is a new-line character). |
|
RLENGTH |
The length of the string matched by the match function. |
|
RS |
Input record separator (default is a new-line character). If the RS special variable is null, records are separated by sequences
of one or more blank lines; leading or trailing blank lines do not result
in empty records at the beginning or end of input; and the new-line character
is always a field separator, regardless of the value of the FS special variable. |
|
RSTART |
The starting position of the string matched by the match function, numbering from 1. Equivalent to the return value of the match function. |
|
SUBSEP |
Separates multiple subscripts. The default is \031. |
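Several of these variables can be compared side by side. This is a minimal sketch, assuming a POSIX-conforming awk, a Bourne-compatible shell, and an available mktemp utility; the function name specials_demo and the file names one and two are hypothetical. It shows NR counting across files while FNR restarts, a file skipped by setting its ARGV member to a null value, and the RSTART and RLENGTH values set by the match function.

```shell
#!/bin/sh
# Hypothetical demo function: NR, FNR, FILENAME, ARGV, RSTART, RLENGTH.
specials_demo() {
    dir=$(mktemp -d) || return 1
    printf 'a\nb\n' > "$dir/one"
    printf 'c\n' > "$dir/two"
    (
        cd "$dir" || exit 1
        # NR keeps counting across files; FNR restarts for each file.
        awk '{ print FILENAME, NR, FNR, $0 }' one two
        # A null ARGV member is skipped, so only "two" is read.
        awk 'BEGIN { ARGV[1] = "" } { print "read:", FILENAME, $0 }' one two
    )
    rm -rf "$dir"
    # match() sets RSTART (position, from 1) and RLENGTH (match length).
    awk 'BEGIN { if (match("foobar", /o+b/)) print RSTART, RLENGTH }'
}
specials_demo
```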
Flags
|
-f ProgramFile |
Obtains instructions for the awk command from
the file specified by the ProgramFile variable. If
the -f flag is specified multiple times, the concatenation
of the files, in the order specified, will be used as the set of instructions. |
|
-F Ere |
Uses the extended regular expression specified by the Ere variable as the field separator. The default field separator is a
blank. |
|
-v Assignment |
Assigns a value to a variable for the awk command's
programming language. The Assignment parameter is
in the form Name=Value. The Name portion specifies the name of the variable and can be any combination
of underscores, digits, and alphabetic characters, but it must start with
either an alphabetic character or an underscore. The Value portion is also composed of underscores, digits, and alphabetic characters,
and is treated as if it were enclosed in " (double quotation) characters, like a string constant. If the Value portion
is numeric, the variable is also assigned the numeric value.
The assignment specified by the -v flag occurs before
any portion of the awk command's program is executed,
including the BEGIN section. |
|
Assignment |
Assigns a value to a variable for the awk command's
programming language. It has the same form and function as the Assignment variable with the -v flag, except for
the time each is processed. The Assignment parameter
is processed just prior to the input file (specified by the File variable) that follows it on the command line. If the Assignment parameter is specified just prior to the first of multiple
input files, the assignments are processed just after the BEGIN sections (if any). If an Assignment parameter
occurs after the last file, the assignment is processed before the END sections (if any). If no input files are specified, the assignments
are processed before the standard input is read. |
|
File |
Specifies the name of the file that contains the input for processing.
If no File variable is specified, or if a - (minus) sign is specified, standard input is processed. |
|
'Program' |
Contains the instructions for the awk command.
If the -f flag is not specified, the Program variable should be the first operand after any flags on the command line. It should
be enclosed in ' (single quotation) marks to protect it from interpretation by the shell. |
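The difference in timing between a -v assignment and an Assignment operand can be checked with a one-line input file. A minimal sketch, assuming a POSIX-conforming awk and an available mktemp utility; the function name assign_demo and the variable names a and b are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo function: when -v and operand assignments take effect.
assign_demo() {
    tmp=$(mktemp) || return 1
    printf 'x\n' > "$tmp"
    awk -v a=1 '
        BEGIN { print "BEGIN: a=" a " b=" b }
              { print "record: a=" a " b=" b }
    ' b=2 "$tmp"
    rm -f "$tmp"
}
assign_demo
```

In the BEGIN section only a is set; b=2 takes effect just before the file that follows it is read, so the record line sees both values.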
Exit Status
This command returns the following exit values:
|
0 |
Successful completion. |
|
>0 |
An error occurred. |
Examples
- To display the lines of a file that are longer than 72 characters, enter:
awk 'length >72' chapter1
Because no Action is specified, the default action of printing the line is used,
so each line of the chapter1 file that is longer than 72 characters
is written to standard output. A tab character is counted as 1 byte.
- To display all lines between the words start and stop, including "start" and "stop", enter:
awk '/start/,/stop/' chapter1
- To run an awk command program, sum2.awk, that processes the file, chapter1,
enter:
awk -f sum2.awk chapter1
The following
program, sum2.awk, computes the sum and average
of the numbers in the second column of the input file, chapter1:
{
sum += $2
}
END {
print "Sum: ", sum;
print "Average:", sum/NR;
}
The first action adds the value of the second field of each line
to the variable sum. All variables are initialized
to the numeric value of 0 (zero) when first referenced. The END pattern before the second action causes that action to be performed after
all of the input file has been read. The NR special
variable, which is used to calculate the average, contains
the number of records that have been read.
- To print the first two fields in opposite order,
enter:
awk '{ print $2, $1 }' chapter1
- The following command:
awk -f sum3.awk chapter2
runs the program sum3.awk, which prints the first two fields of the file chapter2, with input fields separated either by a single comma or by a sequence of blanks and tabs, and then
adds up the first column and prints the sum and average:
BEGIN {FS = ",|[ \t]+"}
{print $1, $2}
{s += $1}
END {print "sum is",s,"average is", s/NR }
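Two details of sum3.awk may matter with real data. With FS = ",|[ \t]+", a comma followed by a blank is two separators, so the line 10, 20 splits into three fields (10, an empty field, and 20). Also, with empty input NR is 0, so s/NR divides by zero. The sketch below is a hypothetical variant, not part of the original example: it uses a separator that absorbs blanks around a comma and guards the average. It assumes a POSIX-conforming awk, and the function name sum3_variant is illustrative.

```shell
#!/bin/sh
# Hypothetical hardened variant of sum3.awk, wrapped in a shell function.
sum3_variant() {
    awk '
    BEGIN { FS = "[ \t]*,[ \t]*|[ \t]+" }   # a comma with optional blanks around it, or a run of blanks and tabs
    { print $1, $2; s += $1 }
    END {
        if (NR > 0)
            print "sum is", s, "average is", s / NR
        else
            print "no input records"        # avoids dividing by zero
    }'
}
printf '10, 20\n30 40\n' | sum3_variant
```

Because the regular expression matches the longest separator at each position, "10, 20" and "10 , 20" both yield the fields 10 and 20.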
Related Information
The grep command and the sed command.