NR | Number of records (lines) that have been read by this program so far. The current line number, accumulated across all files read so far. | |
FNR | Number of records (lines) that have been read so far from the current file. It restarts at 1 on the first line of each new file. | |
FILENAME | The name of the current file being read. | |
FS | Field separator. The string that separates fields from one another in the input file. Defaults to a single space (" "), which is a special case meaning "any run of whitespace" (roughly /[[:space:]]+/, with leading and trailing whitespace ignored). It can be changed to a single tab for tab delimited files, or any other string, such as "," for a comma delimited file, or "\",\"" for a comma delimited file with each field quoted. | |
OFS | The field separator used in write operations. This can be different from FS. | |
NF | Number of fields on the current line. | |
$0 | The line just read -- the current line. | |
$1, $2, $3, ... | Fields 1, 2, 3 and so on. Can be referenced programmatically as $i, like this: for(i=1; i <= NF; i++) {print $i} -- the preceding would print each field on its own line. |
|
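To see these variables in action, here is a small sketch. The file names one.txt and two.txt and their contents are made up for illustration, and the shell wrapper is only there to create the files and capture the output.

```shell
# Create two small throwaway input files (hypothetical names and data).
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/one.txt"
printf 'f g h i\n'    > "$tmpdir/two.txt"

# For every record, print the file name, the overall record number,
# the per-file record number, the field count and the first field.
out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF, $1 }' one.txt two.txt)
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

The third output line should read "two.txt 3 1 4 f": NR keeps counting across files while FNR starts over at 1.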
person_id lname fname job_id |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
$0 | The entire line | ||
NF | Number of fields. This is the number of fields on the line. | ||
$1, $2, $3... | The first, second, third fields, etc. You can iterate through fields like this: for(i=1; i<=NF; i++){act_on_field()} |
||
FS | Field separator. This is what separates the fields from each other. The default is " ", a single space character, which means "any combination of whitespace". For tab delimited lines you can change it to "\x09" (or "\t"), representing a single tab. On a comma delimited line with every field enclosed in doublequotes, it could be "\",\"", but only if ALL fields are quoted. "Intelligent" quoting, where a field is quoted only if it contains commas, would be a nightmare. The field separator can be a single character like "|", or a string of them like ":-:". | ||
OFS | Output field separator. Defaults to " ", but can be set. In a print statement like this: print $3, $4, $2 -- the fields would be separated by the output field separator. |
||
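A quick way to get a feel for FS is to count fields under different settings. The following is a minimal sketch using made-up input lines. It uses "\t" for the tab, and the variable names defaultfs, tabbed and multi are just shell variables for this illustration.

```shell
# Default FS: any run of spaces and tabs separates fields, and
# leading and trailing whitespace is ignored.
defaultfs=$(printf '  a   b\tc \n' | awk '{ print NF }')

# Tab as the separator: spaces now stay inside the fields.
tabbed=$(printf 'a b\tc d\te\n' | awk 'BEGIN { FS = "\t" } { print NF ": " $2 }')

# A multi-character separator string.
multi=$(printf 'x:-:y:-:z\n' | awk 'BEGIN { FS = ":-:" } { print NF, $3 }')

printf '%s\n%s\n%s\n' "$defaultfs" "$tabbed" "$multi"
```

This should print 3, then "3: c d", then "3 z".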
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
DANGER WILL ROBINSON
The OFS variable will take effect on $0 only if one of the fields is changed or at least meddled with, because awk rebuilds $0 (joining the fields with OFS) only when a field is assigned. A simple $i=$i will do, but you must meddle with it in some way. An exception to this inconvenience occurs when you use commas in a print statement such as this: print $3, $2, $4 -- the preceding print statement will honor OFS. |
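The three cases can be compared side by side. This is a minimal sketch with a made-up input line, and the shell variable names unchanged, rebuilt and commas are only for illustration.

```shell
line='1001 Smith Pat 3'   # hypothetical input record

# $0 printed untouched: OFS is ignored.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "|" } { print }')

# Assigning a field to itself makes awk rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "|" } { $1 = $1; print }')

# Commas in the print statement always use OFS.
commas=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "|" } { print $3, $2, $4 }')

printf '%s\n%s\n%s\n' "$unchanged" "$rebuilt" "$commas"
```

The first line of output keeps its spaces, the second reads 1001|Smith|Pat|3, and the third reads Pat|Smith|3.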
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat people.table | ./hello.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ ./hello.awk | sort > test.file |
#!/usr/local/bin/mawk -We |
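# Break logic: this rule fires whenever the key in $1 changes.
$1 != lastdollar1{
	if(NR > 1){	# nothing to report before the first line
		print "There were " fieldcount " lines with value " lastdollar1 "."
		print ""
	}
	lastdollar1 = $1	# remember the new key
	fieldcount = 0		# restart the count for the new key
}
# The counting rule and END rule are not shown in this fragment;
# a minimal version is assumed here so the counts are meaningful.
{fieldcount++}
END{
	if(NR > 0) print "There were " fieldcount " lines with value " lastdollar1 "."
}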
[slitt@mydesk awk]$ cat test.file | ./hello.awk |
#!/usr/local/bin/mawk -We |
addfields.awk | addheaders.awk | |
#!/usr/local/bin/mawk -We |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ cat test.file | ./addfields.awk | sort | ./addheaders.awk |
|
The code at the left is the first part of the merge version of a tealeaves algorithm. The only change to the BEGIN section is the addition of the name of the file that will hold each key group's totals (the merge file). The NR==1 rule writes the first header flag to the main file (stdout). The purpose of the header and footer flags is to simplify the algorithm in the next program in the pipeline: on seeing a header flag, the next program down the pipe can print a header, and nothing but a header, confident that all data lines will follow. The $1!=lastdollar1 rule prints the total for the previous key to the merge file, then prints the footer flag for the previous key to the main file, then prints the header flag for the new key. Lastly, it resets the break variable and zeros the total. The always true rule prints a data record. The END rule prints the final key's total to the merge file, and prints the footer flag for the final key to the main file. |
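The rules just described can be sketched roughly as follows. This is not the original listing: the record layout (key in $1, amount in $2), the bare 10/11/12 flag lines, and the merge file name merge.txt are all assumptions made for illustration.

```shell
tmpdir=$(mktemp -d)

# Hypothetical sorted input: key in $1, amount in $2.
main=$(printf 'apple 3\napple 4\npear 5\n' |
awk -v mergefile="$tmpdir/merge.txt" '
NR == 1 { print 10 }                      # first header flag
$1 != lastdollar1 {
    if (NR > 1) {
        print lastdollar1, total > mergefile  # total for the previous key
        print 12                              # footer flag for the previous key
        print 10                              # header flag for the new key
    }
    lastdollar1 = $1                      # reset the break variable
    total = 0                             # zero the total
}
{ total += $2; print 11, $0 }             # always true rule: a data record
END {
    if (NR > 0) {
        print lastdollar1, total > mergefile  # total for the final key
        print 12                              # footer flag for the final key
    }
}')
merge=$(cat "$tmpdir/merge.txt")
printf '%s\n--- merge file ---\n%s\n' "$main" "$merge"
rm -rf "$tmpdir"
```

With this input the merge file should hold "apple 7" and "pear 5", one line per key group, while stdout carries the flagged stream.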
STDOUT | MERGE FILE | |||
|
|
|
The code at the left identifies the merge file in the BEGIN section. The three rules correspond to the three types of records -- header flags (10), data lines (11) and footer flags (12). You can see all three in the output preceding this code. The three types are mutually exclusive, so there's never a need to drop through and execute anything else. This makes the algorithm incredibly simple. On encountering a header flag, the program reads the next line of the merge file and uses its data to write the top header. On encountering a data line, the line is simply printed. On encountering a footer flag line, a footer is printed. |
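Here is a rough sketch of that three-rule structure. As before, the layout is assumed rather than taken from the original listing: the record type sits in $1, the merge file holds one "key total" line per key group, and the header and footer wording is invented.

```shell
tmpdir=$(mktemp -d)
# Hypothetical merge file: one "key total" line per key group, in key order.
printf 'apple 7\npear 5\n' > "$tmpdir/merge.txt"

# Hypothetical main stream: 10 = header flag, 11 = data line, 12 = footer flag.
out=$(printf '10\n11 apple 3\n11 apple 4\n12\n10\n11 pear 5\n12\n' |
awk -v mergefile="$tmpdir/merge.txt" '
$1 == 10 {                               # header flag: read the next merge line
    if ((getline line < mergefile) > 0) {
        split(line, part, " ")
        print "== " part[1] " (total " part[2] ") =="
    }
    next
}
$1 == 11 { sub(/^11 /, ""); print "   " $0; next }   # data line: just print it
$1 == 12 { print "-- end --"; next }                 # footer flag: print a footer
')
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Because the header rule already knows the group total from the merge file, it can print the total above the data lines, which a single pass over the data could not do.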
[slitt@mydesk awk]$ cat test.file | ./addfields.awk | ./addheaders.awk |
[slitt@mydesk awk]$ cat people.config |
^Aperson_id ^Alname ^Afname ^Ajob_id |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ ./hello.awk people.config people.table |
|
The BEGIN section is business as usual, except that it sets filenumber to start the file number break logic. The FNR==1 rule increments the file number when FNR drops back to 1, meaning a new file has been encountered. Remember, according to gawk's documentation, gawk automatically provides ARGIND, which can take the place of filenumber. Processing of the first file does nothing but load an array called fields with the field names listed in that file. This configures the program to print those fields in that order. The reason fields[0] is set to FNR+1000 is so that an upper limit can be recorded without using another global variable. The reason 1000 is added to all subscripts is so that subscripts will be compared correctly, whether the comparison is a string or a numerical comparison (every subscript then has the same number of digits). |
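The file number break logic can be sketched in a simplified form. This variant is an assumption-laden illustration, not the original listing: it skips the +1000 subscript trick, uses hypothetical arrays want and col, and assumes a config file with one field name per line and a table whose first line names its columns.

```shell
tmpdir=$(mktemp -d)
# Hypothetical config file: the fields to print, in the order wanted.
printf 'lname\nperson_id\n' > "$tmpdir/people.config"
# Hypothetical table: the first line names the columns.
printf 'person_id lname fname job_id\n1001 Smith Pat 3\n1002 Jones Lee 1\n' > "$tmpdir/people.table"

out=$(awk '
FNR == 1 { filenumber++ }                     # FNR back at 1: a new file started
filenumber == 1 { want[++nwant] = $1; next }  # first file: remember wanted fields
FNR == 1 {                                    # header row of the table:
    for (i = 1; i <= NF; i++) col[$i] = i     # map column name to position
}
{
    line = ""
    for (k = 1; k <= nwant; k++) {
        sep = (k > 1) ? " " : ""
        line = line sep $(col[want[k]])
    }
    print line
}
' "$tmpdir/people.config" "$tmpdir/people.table")
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

The output should be the lname and person_id columns, in that order, including the header row. One caveat worth knowing: an empty first file never triggers FNR==1, so the file count would be off by one.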
[slitt@mydesk awk]$ ./hello.awk people.config people.table |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ ./hello.awk --mood=happy people.config --job="Awk Professor" people.table |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ ./args.awk one.txt --junk1=junk2 two.txt --junk3 three.txt --junk3=junk2 four.txt |
The preceding code produces the expected output:
|
The function definition starts with the word function followed by the function's name, followed by its arguments enclosed in parentheses. The body of the function's code is enclosed in curly braces. If desired, a return value is returned via a return statement. Variables used in the body of the function (other than its arguments) are global. They overwrite identically named variables in the program's actions (or other functions), and upon entry have the value of such identically named variables. Often this is not what you want... |
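A tiny sketch makes the global variable problem visible. The function names leaky() and tidy() are invented for this illustration. tidy() lists total as an extra name after its real argument, which is awk's way of making a variable local to the function.

```shell
out=$(awk '
function leaky(n) { total = n * 2; return total }           # total is global
function tidy(n,    total) { total = n * 2; return total }  # total is local
BEGIN {
    total = 100
    r = tidy(5);  print r, total    # the global total is untouched
    r = leaky(5); print r, total    # the global total gets overwritten
}')
printf '%s\n' "$out"
```

This should print "10 100" and then "10 10": both functions return the same value, but only leaky() clobbers the total variable belonging to the main program.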
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ ./hello.awk |
#include <stdio.h> |
#!/usr/local/bin/mawk -We |
|
[slitt@mydesk awk]$ gcc test.c |
[slitt@mydesk awk]$ ./test.awk |
#!/usr/local/bin/mawk -We |
[slitt@mydesk awk]$ ./test.awk |
|
Stack functions are accomplished by functions push() and pop(). push() simply appends its value argument to the end of the array identified by the stack argument. It increments stack["lastss"] so future pop(), push() and stacklook() calls will work on the right element. The pop() function deletes the last element from the stack and returns it via function return. Notice the locx "argument". It's not an argument at all -- it's a local variable. In awk, local variables can be declared only in the same parentheses as the arguments -- after the arguments.
The stacklook() function is a way to non-destructively observe the stack. stacklook(myarray,0) returns the element that would be returned by pop() if you were to call pop(). stacklook(myarray,1) returns the most deeply embedded element in the stack -- the last valid pop(). Another way to look at stacklook() is to see the stack as an array instead of a stack. Positive numerical arguments to stacklook() correspond to array subscripts. Negative numerical arguments indicate how far from the end of the array you want to look (the stack interpretation would be how many pops you'd need to do before popping that argument).
All three functions, push(), pop() and stacklook(), contain code to return NULL (an empty string) if a numerical argument points to something before or beyond the array comprising the stack, or if the stack has no elements, indicating a spent stack. HOWEVER, that can backfire if a NULL element was pushed -- how can you differentiate a deliberate NULL element from a spent stack or an out of bounds numerical argument? Two functions, stackoutofrange() and stackspent(), are included to check the actual stack rather than testing the return value. stackspent() returns 0 if the stack is not spent, a positive number otherwise. stackoutofrange() returns 0 if the stack is not spent and the numerical argument is within the stack's range. It returns 3 if the stack is spent, 1 if a positive numerical argument is too positive, and -1 if a negative numerical argument is too negative. 
When using loops, always test using stackoutofrange() and stackspent(), because you never know, especially during development, whether a NULL has accidentally been pushed onto the stack, or arrived on the stack otherwise (using array techniques for instance). |
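The following is a stripped-down sketch of the idea, not the full library described above: it implements only push(), pop() and stackspent(), keeps the element count in stack["lastss"], and leaves out stacklook() and stackoutofrange(). The function bodies are assumptions written to match the description.

```shell
out=$(awk '
function push(stack, value) { stack[++stack["lastss"]] = value }
function stackspent(stack) { return (stack["lastss"] + 0 < 1) ? 1 : 0 }
function pop(stack,    locx) {          # locx is a local variable
    if (stackspent(stack)) return ""
    locx = stack[stack["lastss"]]
    delete stack[stack["lastss"]]
    stack["lastss"]--
    return locx
}
BEGIN {
    s["lastss"] = 0
    push(s, "a"); push(s, ""); push(s, "c")   # note the empty element
    # Loop on stackspent(), not on the value pop() returns.
    while (!stackspent(s)) print "[" pop(s) "]"
    print "spent=" stackspent(s)
}')
printf '%s\n' "$out"
```

The loop prints [c], [] and [a] before reporting spent=1. Had the loop tested pop()'s return value instead, it would have stopped at the empty element and left "a" stranded on the stack.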
===== TESTING stackoutofrange() BELOW ======== |