Troubleshooters.Com and Code Corner Present

Litt's Lua Laboratory:
Lua Closures and Iterators
(With Snippets)

Copyright (C) 2011 by Steve Litt




Contents

  • Introduction
  • Closure Hello World
  • A Real Iterator
  • Fibonacci Iterator
  • A Practical Line Skipper Iterator
  • Conclusion

    Introduction

    Closures are hard to describe. But to paraphrase Supreme Court Justice Potter Stewart, "You know it when you see it." Basically, a closure is a function within a function, where the inner function can still see the outer function's local variables.
    You can use closures for a variety of powerful features, including the iterators built on this page.

    Closure Hello World

    When viewing the following code and its output, please remember what was said about closures in the Introduction. Now let's view the closure Hello World:
    #!/usr/bin/lua

    function maker()
        local n = 0
        local function iter()   -- local, so iter doesn't leak into the global namespace
            n = n + 1
            return n
        end
        return iter
    end

    iter_a = maker() -- Make an iterator
    iter_b = maker() -- Make a different iterator
    print(iter_a()) -- Should print 1
    print(iter_a()) -- Should print 2
    print(iter_b()) -- Should print 1
    print(iter_a()) -- Should print 3
    print(iter_b()) -- Should print 2
    In the preceding code, function iter() is the function within a function, and iter() can see the surrounding function maker()'s local variable, n. Now here's the thing: Because n is a local variable of maker(), every time maker() is run, a completely new and separate copy of n is instantiated and initialized (to 0). Also, every time maker() is run, it returns a new copy of iter(), which of course sees the newly initialized (to 0) n from maker(). iter() always increments n before returning it, so the originally instantiated 0 becomes 1, and iter() returns 1 the first time it's called. So maker() serves as sort of a factory (which is why it's called "maker") for copies of iter(). Since iter() increments the value of its copy of n every time it runs, it keeps returning higher values.
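
To make the "separate copy of n" idea concrete, here is a minimal sketch. The name counter_from() and its start argument are made up for this illustration and are not part of the code above; the sketch only assumes that each call to the outer function gets its own fresh local variable.

```lua
-- Hypothetical variant of maker(): the starting value is an argument.
-- Each call to counter_from() creates a brand new n for its own iter().
local function counter_from(start)
    local n = start
    local function iter()
        n = n + 1
        return n
    end
    return iter
end

local from_zero = counter_from(0)
local from_ten  = counter_from(10)
print(from_zero())  -- 1
print(from_ten())   -- 11
print(from_zero())  -- 2
print(from_ten())   -- 12
```

Neither counter disturbs the other, because each one increments the n that was created by its own call to counter_from().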

    Returning the Lua Way

    Naming the function you return:
    function maker()
        local n = 0
        local function iter()
            n = n + 1
            return n
        end
        return iter
    end

    Returning an anonymous function:
    function maker()
        local n = 0
        return function()
            n = n + 1
            return n
        end
    end

    Remember, functions are just data. You can assign a function to a variable. And when you get right down to it, in Lua a variable is just a name for a piece of data. So in the named version of function maker() above, a function is assigned to a variable called iter, and then iter is returned. In the anonymous version of maker(), we don't bother assigning to a variable, but instead simply pass the function back as the return. The calling function assigns the return to a variable, so there's no need for a variable inside maker() itself. Therefore, when returning a function, most Lua programmers return an anonymous function. It saves an otherwise useless variable name, it saves a line of code, and for those used to the convention, the intent of the programmer is clearer.
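
If "functions are just data" still feels abstract, the following small sketch may help. The names double, also_double and apply_twice are invented for this illustration only.

```lua
-- A function stored under one name, then copied to another name.
local function double(x) return x * 2 end
local also_double = double
print(also_double(21))         -- 42
print(also_double == double)   -- true: two names, one function

-- A function handed to another function without ever getting a name.
local function apply_twice(f, x) return f(f(x)) end
print(apply_twice(function(x) return x + 1 end, 5))   -- 7
```

Returning an anonymous function from maker() is the same idea in the other direction: the function travels as a value, and the caller decides what to call it.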

    Watch Out For Infinite Loops

    Don't use the function that maker() returns as the iterator of a generic for loop, because the loop will never stop -- the function never returns nil, so it will loop infinitely. But there are ways around that. Read on...
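
As a stopgap, and only as a sketch, the loop body itself can decide when to quit. This assumes the anonymous-function version of maker() shown earlier; the cutoff of 3 is arbitrary.

```lua
local function maker()
    local n = 0
    return function()
        n = n + 1
        return n
    end
end

-- The iterator never returns nil, so the loop body must break out itself.
for v in maker() do
    if v > 3 then break end
    print(v)   -- prints 1, 2 and 3 on separate lines
end
```

This works, but it puts the stopping rule in every loop that uses the iterator. Building the stopping rule into the iterator is usually cleaner.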

    A Real Iterator

    Iterators are made with closures very much like you saw in the Closure Hello World article. But iterators that can be used with generic for loops must iterate over a finite series of values. Sometimes the finite nature is defined by the number of elements -- for instance when iterating over a table. But in this case we'll just declare the upper limit with an argument to the function that in the Hello World article was called maker(). Except we're not calling it maker() this time, because the Lua convention is to call the iterator maker what the iterator produces -- pairs(), ipairs(), or in this case positive_integers(). See the following code:
    #!/usr/bin/lua

    function positive_integers(max)
        local n = 0
        return function()
            n = n + 1
            if n > max then
                return nil
            else
                return n
            end
        end
    end

    for v in positive_integers(3) do
        print(v)
    end
    print("================")
    for v in positive_integers(5) do
        print(v)
    end
    The preceding code yields the following output:
    slitt@mydesk:/d/websites/tjunct/codecorn/lua$ ./test.lua
    1
    2
    3
    ================
    1
    2
    3
    4
    5
    slitt@mydesk:/d/websites/tjunct/codecorn/lua$
    Cool! You tell it to iterate 3 times, it iterates 3. Tell it 5, it iterates 5. Let's see what happened:
    function positive_integers(max)
        local n = 0
        return function()
            n = n + 1
            if n > max then
                return nil
            else
                return n
            end
        end
    end
       
    The code above is pretty similar to the code from the Hello World article, except you're passing the maximum value in as the argument to the maker, positive_integers(). In the returned function you not only increment n, but also return nil if it's over the maximum. A generic for loop stops the first time its iterator returns nil as its first value. That's why this loop doesn't run forever. So any time you make an iterator meant to be used in a generic for loop, be sure that iterator returns nil after the last good data.
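
The other kind of finite series mentioned earlier, the elements of a table, can be handled the same way. The following is a sketch only; values() is a made-up name, and it assumes the table is a plain list with no nil holes.

```lua
-- Hypothetical iterator maker: walks a list-style table from element 1 up.
local function values(t)
    local i = 0
    return function()
        i = i + 1
        return t[i]   -- nil once i passes the last element, which ends the loop
    end
end

for v in values({"red", "green", "blue"}) do
    print(v)   -- red, green, blue on separate lines
end
```

Here no explicit "return nil" is needed, because indexing past the end of the table already yields nil.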

    Iterators and For Loops

    It's worth taking some time to discuss the exact syntax of a generic for loop so you can truly understand it when rolling your own iterators. The time you spend understanding this will be a blessing when writing Lua code, because generic for loops and iterators are a big part of what makes Lua so productive.

    Remember, an iterator is a function inside a function where the inner function sees the outer function's local variables. The inner function does something to increment or cycle through the local variable(s) in the outer function, returning the new value of the outer function's local variable, or something depending on that new value. The outer function passes the inner function back as a function return.

    Let's say you have an iterator maker called iter_maker(). You can do this:
    iterator_fcn = iter_maker()    -- Assign new iterator function to iterator_fcn
    for k,v in iterator_fcn do
        print(string.format("k=%s, v=%s", tostring(k), tostring(v)))
    end
    The for loop operates on the iterator function. Not on what the iterator function returns, and not on the maker function, but on the iterator function.  Notice that on the for line, iterator_fcn had no parentheses.

    Do you notice the possible shortcut? In the preceding code you assigned the return of iter_maker() to a variable called iterator_fcn, which you then put on the for line. Your only use of iterator_fcn is to transfer the return of iter_maker() to the for line. Why not just do it directly, like this:
    for k,v in iter_maker() do
        print(string.format("k=%s, v=%s", tostring(k), tostring(v)))
    end
    Look at the preceding code until you understand why it's the same as the code before it. The for line requires an iterator function, which is exactly what the maker function returns.

    Keep going over this subsection on iterators and for loops until you really understand it. Do experiments to bolster this understanding. Once you truly understand this, you'll have a much nicer life in Lua.
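
One experiment worth trying is to write out by hand roughly what the generic for loop does with a closure-style iterator. This is a simplified sketch: the real loop also passes two extra arguments to the iterator on every call, which closure iterators like these simply ignore.

```lua
local function positive_integers(max)
    local n = 0
    return function()
        n = n + 1
        if n > max then
            return nil
        else
            return n
        end
    end
end

-- Roughly equivalent to: for v in positive_integers(3) do print(v) end
local iterator_fcn = positive_integers(3)
while true do
    local v = iterator_fcn()      -- call the iterator function once per pass
    if v == nil then break end    -- a nil first value ends the loop
    print(v)                      -- the loop body: prints 1, 2 and 3
end
```

Seen this way, it's clear why the for line wants the iterator function itself: the loop is the one that calls it, over and over.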

    A few more points:

    The following are a few more points to consider:

    Fibonacci Iterator

    Now we'll make an iterator that does something a little more complex than counting, while at the same time demonstrating the usual iterator behavior of returning both a key and a value. We'll write an iterator to produce the Fibonacci series of integers.

    The Fibonacci series is a series starting with 0 and 1 as its first two elements, for which each successive element is the sum of the last two. So 0+1 produces another 1, and 1+1 produces 2, and 1+2 produces 3, and 2+3 produces 5, on and on. As in the iterator in the Real Iterator article, the upper limit is passed in as an argument to the iterator maker.
    Here's the code and the produced output:
    #!/usr/bin/lua

    local sf = string.format -- alias to shorten print lines

    function fibonaccis(max)
        local b4 = -1
        local now = 1
        local key = -1
        return function()
            local value = b4 + now   -- local, so it doesn't leak into the global namespace
            if value > max then
                return nil, nil
            else
                b4 = now
                now = value
                key = key + 1
                return key, value
            end
        end
    end

    for k,v in fibonaccis(100) do
        print(sf("k=%2d, v=%2d", k, v))
    end
    The preceding code produces the following output:
    slitt@mydesk:/d/websites/tjunct/codecorn/lua$ ./test.lua
    k= 0, v= 0
    k= 1, v= 1
    k= 2, v= 1
    k= 3, v= 2
    k= 4, v= 3
    k= 5, v= 5
    k= 6, v= 8
    k= 7, v=13
    k= 8, v=21
    k= 9, v=34
    k=10, v=55
    k=11, v=89
    slitt@mydesk:/d/websites/tjunct/codecorn/lua$
    Remember that k and v stand for "key" and "value", that they do not have to be declared, and that they're local to the loop. If you need either outside the loop, and it's entirely possible you will, you need to assign the one you need (or both) to a variable with scope outside the loop.
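
Here is a minimal sketch of carrying the loop variables out of the loop. The names last_key and last_value are invented for this illustration; fibonaccis() is the iterator maker from the listing above, repeated so the sketch runs on its own.

```lua
local function fibonaccis(max)
    local b4, now, key = -1, 1, -1
    return function()
        local value = b4 + now
        if value > max then
            return nil
        end
        b4, now, key = now, value, key + 1
        return key, value
    end
end

-- k and v vanish when the loop ends, so copy them to outer locals.
local last_key, last_value
for k, v in fibonaccis(100) do
    last_key, last_value = k, v
end
print(last_key, last_value)   -- 11 and 89, the last pair under 100
```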

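    Please go over the preceding code until you understand it. The only real differences from previous loops on this page are that the iterator computes something more complex than a simple count, and that it returns both a key and a value.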

    Dual Interpretation of Max

    Some people might want to see the Fibonaccis up to 100, and some might want to see the first 10 Fibonaccis. One of many ways that can be accommodated is to make it so if the argument max is negative, it means maximum key, while if it's positive it means maximum value. There are many ways of doing that, one of which follows:
    #!/usr/bin/lua

    local sf = string.format -- alias to shorten print lines

    function fibonaccis(max)
        local maxkey_flag = false
        local b4 = -1
        local now = 1
        local key = -1
        if max < 0 then       -- negative max means "maximum key"
            maxkey_flag = true
            max = max * -1
        end
        return function()
            key = key + 1
            local value = b4 + now   -- local, so it doesn't leak into the global namespace
            if maxkey_flag and key > max or
                    not maxkey_flag and value > max then
                return nil, nil
            else
                b4 = now
                now = value
                return key, value
            end
        end
    end

    for k,v in fibonaccis(13) do
        print(sf("k=%2d, v=%2d", k, v))
    end

    print("============")

    for k,v in fibonaccis(-8) do
        print(sf("k=%2d, v=%2d", k, v))
    end
    The preceding code predictably produces the following output:
    slitt@mydesk:/d/websites/tjunct/codecorn/lua$ ./test.lua
    k= 0, v= 0
    k= 1, v= 1
    k= 2, v= 1
    k= 3, v= 2
    k= 4, v= 3
    k= 5, v= 5
    k= 6, v= 8
    k= 7, v=13
    ============
    k= 0, v= 0
    k= 1, v= 1
    k= 2, v= 1
    k= 3, v= 2
    k= 4, v= 3
    k= 5, v= 5
    k= 6, v= 8
    k= 7, v=13
    k= 8, v=21
    slitt@mydesk:/d/websites/tjunct/codecorn/lua$
    When max was positive, it terminated based on value. When max was negative it terminated based on key. Study the preceding until you understand it. It's really not much different from the first Fibonacci iterator.
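
Since this is only one of many ways, here is a sketch of another: let the caller name the limit instead of encoding it in the sign. The field names max_key and max_value are invented for this illustration and are not part of the code above.

```lua
-- Hypothetical alternative: limits is a table such as {max_key = 8}
-- or {max_value = 13}. With neither field set, the iterator never stops.
local function fibonaccis(limits)
    local max_key, max_value = limits.max_key, limits.max_value
    local b4, now, key = -1, 1, -1
    return function()
        local value = b4 + now
        key = key + 1
        if (max_key and key > max_key) or (max_value and value > max_value) then
            return nil
        end
        b4, now = now, value
        return key, value
    end
end

for k, v in fibonaccis({max_key = 3}) do
    print(k, v)   -- keys 0 through 3, values 0, 1, 1, 2
end
```

The tradeoff is a slightly wordier call in exchange for a call site that says what it means.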

    A Practical Line Skipper Iterator

    Now let's make something practical. Have you noticed that Lua has no continue statement to skip back to the top of a loop? There are several practical and philosophical reasons for that lack, but sometimes you wish you had a continue statement.

    I use continue statements when looping through a file where certain lines should not be processed. At the top of the loop I must determine whether the line is one that should not be processed. I have two choices for avoiding processing:
    1. Set a process_this_line flag to false and wrap the processing in an if process_this_line statement, or
    2. Use a continue
    If you've been around for a while you know that continue statements (and their cousin break) are actually gotos that can produce spaghetti code. Used in long loops, or used wrong, they can turn your code into a logical mess. But if you've been around for a while you also know that properly used, continue saves an if statement and a level of indentation. After the last continue statement you typically put a comment that says "if you got this far, you should process the line". Very readable, very efficient, but Lua has no continue statement.

    Not to worry. Instead of determining line eligibility in the loop, you can determine it in the iterator. This makes the loop much simpler and more readable. What you'll see here is an iterator that can easily be modified, via a callback function as explained on the Lua Callbacks page, to do all sorts of eligibility screening.
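
Before looking at the full module, here is the idea in miniature. This sketch works on a table of strings rather than a file, and the names wanted_lines() and not_comment() are made up for the illustration.

```lua
-- Hypothetical helper: iterate over a list of strings, skipping every
-- string the callback rejects, so the loop body needs no "continue".
local function wanted_lines(lines, is_wanted)
    local i = 0
    return function()
        i = i + 1
        while lines[i] and not is_wanted(lines[i]) do
            i = i + 1                 -- skip lines the callback rejects
        end
        if lines[i] == nil then return nil end
        return i, lines[i]            -- original position and text
    end
end

local lines = {"one", "", "# comment", "two"}
local function not_comment(s)
    return s ~= "" and not s:match("^#")
end

for num, text in wanted_lines(lines, not_comment) do
    print(num, text)   -- 1 one, then 4 two
end
```

All the skipping happens inside the iterator, so the loop body sees only lines worth processing.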

    First, here's the closure itself:
    -- relevant_lines.lua Copyright (C) 2011 by Steve Litt, all rights reserved.
    --
    -- Permission is hereby granted, free of charge, to any person obtaining a copy
    -- of this software and associated documentation files (the "Software"), to deal
    -- in the Software without restriction, including without limitation the rights
    -- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    -- copies of the Software, and to permit persons to whom the Software is
    -- furnished to do so, subject to the following conditions:
    --
    -- The above copyright notice and this permission notice shall be included in
    -- all copies or substantial portions of the Software.
    --
    -- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    -- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    -- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    -- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    -- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    -- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
    -- THE SOFTWARE.

    -- Version 0.0.1, pre-alpha
    -- relevant_lines() is an iterator maker that takes a table as its one and only
    -- argument. At minimum this table must have a key called "file" whose value is
    -- either the filename of an input file or an open-for-read handle. Other possible
    -- elements to put in this table include:
    -- this_line_number: One less than the first key to be delivered, defaults to 0
    -- is_relevant: A callback function determining whether to bestow or skip this line
    -- is_relevant defaults to "pass back all non-blank lines"
    -- tweak: A callback to do other operations to relevant lines
    --
    -- Because of the table nature of the argument, you can put pretty much
    -- everything but the kitchen sink, and as long as either is_relevant() or
    -- tweak() calls it, it will form part of the algorithm.

    module(..., package.seeall);

    function relevant_lines(tab)
        local handle

        --### VALIDATE THE ARGUMENT BEFORE TOUCHING ITS FIELDS
        if tab == nil then
            io.stderr:write("ERROR: Function relevant_lines() must have a single argument, a table.\n")
            io.stderr:write("Aborting...\n\n")
            os.exit(1)
        end

        tab.this_line_text = nil
        tab.prev_line_text = nil
        tab.this_line_number = tab.this_line_number or 0
        tab.prev_line_number = -1

        --### GET FILE HANDLE UP AND RUNNING
        if type(tab.file) == "nil" then
            io.stderr:write("ERROR: Table argument to relevant_lines() must have an element called file.\n")
            io.stderr:write("Aborting...\n\n")
            os.exit(1)
        elseif type(tab.file) == "userdata" then
            handle = tab.file
        elseif type(tab.file) == "string" then
            handle = assert(io.open(tab.file, "r"))
        else
            io.stderr:write("ERROR: Function relevant_lines(): tab.file has weird type.\n")
            io.stderr:write("Aborting...\n\n")
            os.exit(1)
        end

        --### IF YOU GOT HERE, YOU OPENED THE FILE FOR INPUT

        --### DEFAULT THE CALLBACK IF NECESSARY. DEFAULT TO SKIP BLANK LINES
        if not tab.is_relevant then
            tab.is_relevant = function()
                return string.match(tab.this_line_text, "%S") -- skip blanks
            end
        end

        --### DEFINE THE ITERATOR TO RETURN
        return function()
            -- Read line and increment line count
            tab.this_line_text = handle:read("*line")
            tab.this_line_number = tab.this_line_number + 1

            -- Blow off any nonrelevant lines
            while tab.this_line_text and not tab.is_relevant() do
                tab.this_line_text = handle:read("*line")
                tab.this_line_number = tab.this_line_number + 1
            end

            -- Return nil if eof, and close handle if made from string
            if tab.this_line_text == nil then
                if type(tab.file) == "string" then
                    io.close(handle)
                end
                return nil, nil

            else -- Run tweak procedure and then return the line number and text
                if tab.tweak then tab.tweak() end
                return tab.this_line_number, tab.this_line_text
            end
        end
    end
    OK, let's discuss the preceding code. It's a Lua module. You know that because its first non-comment line looks like this:
    module(..., package.seeall);
    I'm not going to explain the preceding line other than to say that line is how you make a Lua file into a module that can be incorporated by the require() function.

    The module has filename relevant_lines.lua and is entirely comprised of one function, relevant_lines(), which is an iterator maker. You can see at the bottom of relevant_lines() that what it returns is a function, specifically an iterator function. relevant_lines() takes one argument, a table. I'll explain the reason for that a little later. That table contains all the information that defines what the iterator does and does not send to the generic for loop. The iterator's behavior is determined by the callback function called is_relevant() within the table. If there is no is_relevant() in the table, then relevant_lines() installs its own callback -- one that ignores all blank lines.

    The table can contain a second callback function called tweak(). Callback is_relevant() operates on every line in the file, whereas tweak() operates only on the lines that is_relevant() returns true on. With one callback operating on every line and another operating only on relevant lines, and an entire table within which to stash any variables, you can use those two callbacks to implement pretty much any break logic you desire. Personally I would do heavy duty break logic in the caller and use relevant_lines() only to screen out the most obviously unneeded lines, but the capability for complex break logic is there thanks to the single table argument design.

    The table argument is required to have only one element -- an element called file that contains either a filename for a file to open or a file handle already open for read. The first several lines of function relevant_lines() are devoted to determining whether it's a filename or a handle, and doing the right stuff to read from the file. Note that if it's a filename, relevant_lines() both opens and closes it, whereas if it's a handle relevant_lines() does neither. Observe also that it would be trivial to add a third alternative -- a table with numeric keys.
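
To give a feel for that third alternative, here is one possible sketch. It is not part of relevant_lines.lua; make_reader() is a hypothetical helper that turns a table of strings or an open handle into a "give me the next line" function, so the iterator would not need to care which it got.

```lua
-- Hypothetical helper, not part of relevant_lines.lua: returns a function
-- that yields the next line of the source, or nil when the source runs out.
local function make_reader(source)
    if type(source) == "table" then
        local i = 0
        return function()
            i = i + 1
            return source[i]              -- nil after the last element
        end
    elseif type(source) == "userdata" then
        return function()
            return source:read("*line")   -- nil at end of file
        end
    else
        error("make_reader(): unsupported source type " .. type(source))
    end
end

local read_line = make_reader({"one", "", "two"})
print(read_line())   -- one
print(read_line())   -- (an empty line)
print(read_line())   -- two
print(read_line())   -- nil
```

A filename branch could be added the same way, with the open and close handled as relevant_lines() already does.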

    The iterator returns the line number and the line contents. Note that the line number is the line number in the input file, so that any error messages can reference the actual line in the file, not the number of the relevant line.

    Using the Module

    In this subsection we'll build a program to exercise the relevant_lines() module in order to iterate through a file called test.txt containing the following:
    one
    two

    # three
    #four



    #five
    six

    seven
    As you can see, several lines are blank, either consisting only of a newline, or of a newline plus some spaces or tabs. Some lines are commented out with the traditional Unix comment character #. By default relevant_lines() screens out blank lines but leaves commented lines intact.

    At the top of your exercise program appear the following four lines:
    #!/usr/bin/lua
    require("relevant_lines")
    local sf = string.format
    local relevant_lines = relevant_lines.relevant_lines
    The first line is the standard Lua shebang indicating to run the code through Lua. The second line imports relevant_lines.lua, which as you remember you specially outfitted as a module with the module() command. The third line assigns function string.format to the shorter named variable sf. This makes print lines much shorter. I always do this. The fourth line assigns function relevant_lines within module relevant_lines to local variable relevant_lines so you needn't write relevant_lines.relevant_lines() everywhere. In general be careful naming a local function the same name as the module's name, because once you do that, the module is no longer reachable by that name in the rest of the file. That's OK here because the module contains only one function and you already assigned it to the local.

    So, by the end of the fourth line you have the module's one function set to local variable relevant_lines and string.format set to local variable sf. Now you iterate. Let's take a look at the next few lines:
    tab = {file = "test.txt"}
    for k, v in relevant_lines(tab) do
        print(sf("k=%d, v=%s", tab.this_line_number, tab.this_line_text))
    end
    The preceding iterates through test.txt using the iterator's default is_relevant(), which passes all lines containing a nonblank character. The next few lines are slightly different:
    tab.this_line_number = 0
    tab.is_relevant = function() return true end
    print("=====================")

    for k, v in relevant_lines(tab) do
        print(sf("k=%d, v=%s", tab.this_line_number, tab.this_line_text))
    end
    The first line sets the table's line counter back to zero, because the last iteration had advanced the line counter. The second line defines an is_relevant() function that passes all lines, even blank ones. The third line prints a line to separate this iteration from the last one, and then the generic for loop prints all the lines.

    Now let's take a look at the next few lines:
    tab.this_line_number = 0
    tab.is_relevant = function()
        return string.match(tab.this_line_text, "%S") and
            not string.match(tab.this_line_text, "^%s*#")
    end
    tab.tweak = function() tab.this_line_text = string.upper(tab.this_line_text) end
    print("=====================")

    for k, v in relevant_lines(tab) do
        print(sf("k=%d, v=%s", tab.this_line_number, tab.this_line_text))
    end
    In the preceding code, once again the first line resets the line counter. The next four lines define a more complex is_relevant() function, specifically one that passes a line only if the line contains at least one non-whitespace character and does not start with optional whitespace followed by a pound sign. In other words, it passes everything but blank lines and comment lines (in Bash or Perl). The next line defines a tweak() function that upper cases the line, the next one draws a line to separate from previous iterations, and then the generic for loop returns the few nonblank, non-comment lines, after turning them upper case.
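
The two patterns are easier to trust after trying them on a few strings. The following sketch pulls the same test into a standalone function that takes the line as an argument; that argument is an assumption made for the illustration, since the real callback reads tab.this_line_text instead.

```lua
-- Same two patterns as the callback above, applied to a plain string.
local function is_relevant(line)
    return string.match(line, "%S") and          -- has a non-whitespace character
        not string.match(line, "^%s*#")          -- and isn't a # comment
end

print(is_relevant("one"))         -- true
print(is_relevant(""))            -- nil (nothing but whitespace, so it fails)
print(is_relevant("  # three"))   -- false
print(is_relevant("#four"))       -- false
```

Both nil and false count as false in Lua, so either result makes the iterator skip the line.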

    The final iteration is a little bit different:
    tab.this_line_number = 1000
    tab.file = io.stdin

    print("=====================")

    for k, v in relevant_lines(tab) do
        print(sf("k=%d, v=%s", tab.this_line_number, tab.this_line_text))
    end
    The preceding code first sets the beginning line number to 1000 so the first returned line will be number 1001. Then it sets the file to stdin, so it works against text you type in or text that gets piped in. The is_relevant() and tweak() functions remain unchanged, so once again only nonblank non-Bash-comment lines come into the loop, and once again those lines are upper case.

    The following is the output of the exercise program:
    slitt@mydesk:~$ ./testrel.lua
    k=1, v=one
    k=2, v=two
    k=4, v=# three
    k=5, v=#four
    k=9, v= #five
    k=10, v= six
    k=12, v=seven
    =====================
    k=1, v=one
    k=2, v=two
    k=3, v=
    k=4, v=# three
    k=5, v=#four
    k=6, v=
    k=7, v=
    k=8, v=
    k=9, v= #five
    k=10, v= six
    k=11, v=
    k=12, v=seven
    =====================
    k=1, v=ONE
    k=2, v=TWO
    k=10, v= SIX
    k=12, v=SEVEN
    =====================
    Line one
    k=1001, v=LINE ONE
    # comment line two
    #comment line three

    Line five
    k=1005, v= LINE FIVE
    slitt@mydesk:~$
    As expected, the first loop showed only nonblank lines, the second showed all lines, the third showed only nonblank, non-comment lines and capitalized them, and the fourth took text I typed in and printed the nonblank, non-comment lines, capitalized.

    Conclusion

    Closures are functions within functions where the outer function's local variables are still in scope within the inner function. One, but certainly not the only, use of closures is in iterator maker functions that return an iterator function. You've seen several of those on this page.

    Some iterator makers, like ipairs() and pairs(), iterate over tables. Some, such as relevant_lines(), iterate over lines in a file. Some, like the Fibonacci iterator, iterate over a series of numbers and need a stopping point to be specified. Iterators can operate over other things too.
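
As one example of "other things", here is a sketch of an iterator over the characters of a string. The name chars() is made up for this illustration, and it treats the string as single-byte characters.

```lua
-- Hypothetical iterator maker: yields position and character, one byte at a time.
local function chars(s)
    local i = 0
    return function()
        i = i + 1
        if i > #s then return nil end   -- past the end: stop the loop
        return i, s:sub(i, i)
    end
end

for pos, ch in chars("Lua") do
    print(pos, ch)   -- 1 L, then 2 u, then 3 a
end
```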

    On the surface it may appear that Lua's generic for loop is like Perl's foreach loop, but it just seems that way. Because you can make almost any kind of iterator in Lua, you can have a generic for loop iterate over almost anything.

    Closures are amazingly powerful, and trivially easy in Lua. Learn the various ways to use them, and use them often.

