Troubleshooting Professional Magazine
Self-Documenting Code
Let's talk about readability. I'm often brought in to help make the enhance or rewrite decision. The big factor -- readability. The client doesn't want to pay mega-hours just to decide which way to go. So a timely decision must be made, based on the existing code and the needed enhancements and bug fixes. If I can't quickly read the code well enough to understand the original programmer's design intent, I'll probably recommend rewrite and comment on the lack of that programmer's readability.
In other words, a program can be very well designed from the machine point of view, and even from the human point of view. But if it does not clearly display that design, its usefulness is severely curtailed.
This issue of Troubleshooting Professional Magazine is devoted to code readability. With an ever greater proportion of functionality implemented in software, we Troubleshooters deserve readable code. So kick back, relax, and enjoy this issue of Troubleshooting Professional. Learn my thoughts on readability, and enhance them with your own habits and discoveries. And remember, if you're a Troubleshooter, Technologist, or free software user, this is your magazine. Enjoy!
I was taught the new methods from the start. Structured programming. Use comments often. Use modern languages like C whose compilers don't care what line or column the code appears in. Our instructors laughed at indentation dependent languages like RPG. Those were the way of the spaghetti specialist, not young turks like us.
Our instructors were split on the subject of Cobol. Yes, it had (at least in those days) no local variables, and it was indentation dependent, but it was "self documenting". What could be clearer than "move employee record to print line."? The school's elite looked down their nose at Cobol, that sissy language. I wasn't so sure. Self-documentation sure had its advantages. I became a professional C programmer in 1984 -- one of SMC's first students to do so. And the more I did C, the more I yearned for readability. I couldn't even work on stuff I'd written six months ago. But that's getting ahead of the story.
And we rewrote, rewrote, rewrote. All that creeping crud made by the spaghetti swarm, we couldn't read it -- we replaced it. And because our design methodologies were much more scalable than spaghetti producing flow charts, we got the job done quickly. But somehow, our stuff wasn't much more readable than that of the flow chart fellows in the unemployment lines. My reaction: Comment even more. The late 80's and early 90's saw me producing programs that were 30% comments. I never saw the problem. After all, I was jumping around so much I didn't maintain my own code. That changed in the early 90's.
When you maintain commented code, you must change the code and the comments to match each other. Well actually, "must" is too strong a word. In hot deadline situations, many changed the code and left the comments alone. So when the next maintenance programmer came along, he was misled. Kludges followed kludges. Good software gone bad. I started commenting less.
Sure, we displaced the spaghetti guys in the marketplace, but not before they subtly indoctrinated us. That indoctrination -- write efficient code. Oh sure, we knew to trade some efficiency to substitute subroutine calls for gotos, but we didn't want too many levels of subroutine calls. There's speed and stack to consider.
I told the junior programmer he'd need to be less an artist and more a productionist and efficiency expert if he wanted to make it in our shop. I soon moved on, never again seeing the junior programmer or his beautifully crafted code, which I (wrongly) associated with non-productivity. It would be years before I really appreciated what that programmer had done.
Hungarian was nice while I was in that shop, because everyone was using it. But I used it less and less after leaving that shop because it didn't deliver the promised readability. And you shouldn't use global vars (g) anyway. And the world was turning away from strongly typed languages...
I'd heard such great things about Python -- why had Guido van Rossum given it this syntax straight out of the Disco era? But I wanted to see what all my friends were raving about, so I took a few hours coding Python. And after 3 hours I realized, "Man, that's readable!".
Remember all those brace placement feuds that made Windows/Linux look like a friendly chat? Gone! Remember that turkey maintenance programmer who screwed up the indentation, followed by yet another turkey maintenance programmer who believed the design to be represented by the indentation, moved the braces and messed up the program? Gone. With the "compiler" enforcing indentation, what you see is what you get. (and where do you want to go today?:-).
If some foolish maintenance programmer messes up your Python indentation, chances are it will immediately error out, giving a line number. But even if he's unlucky enough to have exdented the final subordinate statement, the wrong answers will quickly let him know there's a problem, and he'll go back to the last area he worked on and find it. And if you're *really* concerned that someone will delete an indent and create a hard to find problem, a single comment saying
    # end of if x < 1

will alert all but the densest maintainers to what's going on. And if you *ULTIMATELY* want to enforce the intended indentation, have the final statement be a call to thisIsTheEndOfTheBlock(), a do-nothing routine taking one argument -- an argument declared in the block, so the call will abort with a NameError if that argument was not declared earlier within the block :-)
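Here is a minimal sketch of that sentinel trick, written for a current Python 3 interpreter rather than the Python of this article's day. The function names describe() and describe_with_slipped_indent(), and the variable inside_if_block, are made up for illustration. One detail worth knowing: inside a function Python raises UnboundLocalError, which is a subclass of NameError, so catching NameError covers both cases.

```python
def this_is_the_end_of_the_block(marker):
    """Do nothing. The call site is what matters, not the body."""


def describe(x):
    """Correctly indented: the sentinel call closes the if block."""
    description = "one or more"
    if x < 1:
        inside_if_block = True  # exists only when the block has run
        description = "less than one"
        this_is_the_end_of_the_block(inside_if_block)
    return description


def describe_with_slipped_indent(x):
    """The same code after a maintainer exdented the last two block lines."""
    description = "one or more"
    if x < 1:
        inside_if_block = True
    description = "less than one"  # now runs for every x, silently wrong
    this_is_the_end_of_the_block(inside_if_block)  # NameError when x >= 1
    return description


print(describe(0))
print(describe(5))
try:
    describe_with_slipped_indent(5)
except NameError as error:
    print("indentation damage detected:", error)
```

Note the limit of the trick: the sentinel only fires on a run where the block was skipped, so it turns a quiet wrong answer into a loud abort on that path and no other.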
Just like the beautiful program the junior programmer showed me so many years before. The one I criticized him for. The one where the guy had subroutines just to execute one statement. Nahh, I couldn't advocate that, could I?
Imagine a program taking a person's first and last names as the second and third command line arguments. Which statement is easier reading?
    print sys.argv[3]

or

    print getLastname()

Of course, to use the latter there must be code like this somewhere:

    def getLastname(): return sys.argv[3]

But by the time you get to that point, the source of the last name is a mere implementation detail. The point is, the programmer has documented his design intent.
But at what cost? Golly gee, the machine cycles, the stack!
If you're using Perl or Python, chances are performance isn't your priority anyway, so who cares. Performance people like C and C++. And in those languages you can make getLastname() a macro. Have your cake and eat it too. It compiles to argv[3], but the source reads getLastname().
This really starts gaining power when using objects to represent the state of a thing:
    print employee.getLastname(), employee.getEmployeeNumber()
    print lawyer.getLastname(), lawyer.getPhone(), lawyer.getDegree()
    print defendant.getLastname(), defendant.getLastConvictionDate()
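For the curious, here is one way the classes behind such calls might look. This is a sketch under assumptions, not code from any real system: the Person base class, the constructor arguments, and the sample data are all invented for illustration, and the syntax is Python 3 (print is a function), unlike the snippets above.

```python
class Person:
    """Anything with a last name."""

    def __init__(self, lastname):
        self._lastname = lastname

    def getLastname(self):
        return self._lastname


class Employee(Person):
    def __init__(self, lastname, employee_number):
        super().__init__(lastname)
        self._employee_number = employee_number

    def getEmployeeNumber(self):
        return self._employee_number


class Lawyer(Person):
    def __init__(self, lastname, phone, degree):
        super().__init__(lastname)
        self._phone = phone
        self._degree = degree

    def getPhone(self):
        return self._phone

    def getDegree(self):
        return self._degree


# Hypothetical sample data, only here to make the calls runnable.
employee = Employee("Garcia", 1042)
lawyer = Lawyer("Okafor", "555-0100", "JD")
print(employee.getLastname(), employee.getEmployeeNumber())
print(lawyer.getLastname(), lawyer.getPhone(), lawyer.getDegree())
```

Where each value actually lives (a list, a database row, a command line argument) stays hidden behind the method name, which is the whole point.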
A design methodology!
That's right. If you order our compiler enforced self documentation methodology, we'll throw in, absolutely free, a design methodology. Want a simple example?
We all know you should never sit down and just start typing a program. Right? You diagram it, or use an outline processor, or use case tools, or whatever. But what if your code looks like English:
    #!/usr/bin/python
    import sys

    def main():
        (age,firstname,lastname,gender) = getCommandLineArguments()
        if correctDemographics(age,gender):
            sendSalesLetter(age,firstname,lastname,gender)

    def getCommandLineArguments():
        # int() rather than eval(): the age is a number, and eval() would execute whatever was typed
        return (int(sys.argv[1]), sys.argv[2], sys.argv[3], sys.argv[4])

    def correctDemographics(age, gender):
        if 20 <= age < 30 and gender == "male":
            return(1)
        else:
            return(0)

    def sendSalesLetter(age,firstname,lastname,gender):
        printLetterhead()
        printSalutation(gender, lastname)
        printBody(gender, firstname, lastname)

Of course you need to code the three routines called in sendSalesLetter, and at the very bottom you need a call to main(). But you get the picture. To a certain extent you can type in your design document, which has English-like readability. The design doc is also your code.

But wait. What about OOP? Same thing. Each object is an actor in a script, so name all the leading men and ladies in the main routine. Name them descriptive names -- job titles if you will. Each of their method names exactly describes a single behavior. Then the main routine becomes a script narrating how they all interact with each other. Hardly any syntax -- anyone can understand it, and everyone knows the design intent. Supporting actors might be similarly declared in the bodies or methods of the main actors, and likewise their interactions are narrated, and so forth.
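To make the actors-in-a-script idea concrete, here is a small sketch in Python 3. Everything in it is hypothetical: the order-filling scenario, the three actors (Inventory, ShippingClerk, BillingClerk), and their method names were picked only because they read like job titles and single behaviors.

```python
class Inventory:
    """Actor: knows what is on the shelf."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def hasInStock(self, item):
        return self._stock.get(item, 0) > 0

    def removeOne(self, item):
        self._stock[item] -= 1


class ShippingClerk:
    """Actor: sends goods out the door."""

    def __init__(self):
        self.shipments = []

    def shipTo(self, customer, item):
        self.shipments.append((customer, item))


class BillingClerk:
    """Actor: sends the bill."""

    def __init__(self):
        self.invoices = []

    def invoice(self, customer, item):
        self.invoices.append((customer, item))


def fillOrder(inventory, shippingClerk, billingClerk, customer, item):
    """The script: narrates how the actors interact to fill one order."""
    if not inventory.hasInStock(item):
        return False
    inventory.removeOne(item)
    shippingClerk.shipTo(customer, item)
    billingClerk.invoice(customer, item)
    return True


def main():
    inventory = Inventory({"widget": 1})
    shippingClerk = ShippingClerk()
    billingClerk = BillingClerk()
    fillOrder(inventory, shippingClerk, billingClerk, "Acme Hardware", "widget")
    print(shippingClerk.shipments, billingClerk.invoices)


if __name__ == "__main__":
    main()
```

Read fillOrder() aloud and you have the design: check the shelf, take one off, ship it, bill it. The classes carry the details, and the narration carries the intent.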
Let's look at Cobol. It's self-documenting at a computer instruction level, but not *necessarily* at a conceptual level. That wordiness:
    move employee-record to printline

is really not a whole lot more descriptive, for somebody knowing both languages, than

    memcpy(printline, employee_record, sizeof(employee_record));
    printline[sizeof(employee_record)] = '\0';

They both describe what's occurring on an instruction level, but say nothing about what's happening on a purpose level. I've seen many Cobol programs composed of a bunch of highly readable instructions, which when taken as a whole, seem meaningless.
Orthodox use of Hungarian merely guarantees that wrong types won't be assigned or passed, but does nothing toward revealing the underlying design.
Gracie Slick is an old lady, Culture Club is a speck on music history's dustheap, and text format source code is now catching its second wind.
I still author the mag as a single web page. But when I'm all done, I can run Perl program splitmag.pl on the single file to create the individual ones, create the table of contents, and create all the forward and back links. I simply need to make sure that every article starts with an <H1> style title preceded by an anchor beginning with an underscore. The script takes care of everything else.
I whipped out this script in about 12 hours. I never intended it to be a showpiece. But I think it's a (small sized) example of the self-documentation methodologies I've previously discussed:
    #!/usr/bin/perl -w
    # by Steve Litt. Public domain. First published 7/26/1999.
    # NO WARRANTEE!

    use strict;

    sub read_source_file {
        my($fname) = "";
        $fname = $_[0];
        open(MYINPUTFILE, "<" . $fname) or die "Could not open " . $fname;
        my(@lines) = <MYINPUTFILE>;
        close(MYINPUTFILE);
        return(@lines);
    }

    sub make_contents_string {
        my(@tags) = @{$_[0]};
        my(@titles) = @{$_[1]};
        my($contents) = "<center><b><font size=+3>CONTENTS</font></b></center><p><b><ul>\n";
        my($title);
        my($tag);
        foreach $tag (@tags) {
            $title = shift(@titles);
            if($tag =~ m/^_/) {
                $contents = $contents . "<li><a href=\"$tag.htm\">" . $title ."</li>\n";
            }
        }
        $contents = $contents . "</ul></b><p>\n";
        return($contents);
    }

    sub get_issue_title {
        my($page) = $_[0];
        $page =~ m/<title>(.*?)<\/title>/si;
        my($rtrn) = $1;
        return($rtrn);
    }

    sub fill_article_lists {
        my($oldpage) = $_[0];
        my(@tags, @titles, @starts, @lengths);   # LOCAL VERSIONS OF PASSED BACK ARGS

        #### LOOP CONTROL AND STATE VARS ####
        my($prevStart) = 0;
        my($ss) = 0;
        my($start);

        #### PRIMING TAG PUSH, NOTE NO @lengths PRIME, IT HAPPENS AT BOTTOM ####
        push(@tags, "%startdoc");
        push(@titles, "STARTDOC");
        push(@starts, 0);

        #### FILL THE ARTICLE LISTS ####
        while($oldpage =~ m/(<h1>[\n\w]*?<a NAME=")([_%].+?)("><\/a>)(.+?)(<\/h1>)/sg) {
            if(!defined($1)) {last;}
            $start = pos($oldpage) - length($1) - length($2) - length($3) - length($4) - length($5);
            push(@tags, $2);
            push(@titles, $4);
            push(@starts, $start);
            push(@lengths, $start - $prevStart);
            $prevStart = $start;
            $ss++;
        }

        #### FINISH LAST ARTICLE'S LENGTH ####
        $oldpage =~ m/<\/body>/sig;
        $start = pos($oldpage);
        push(@lengths, $start - $prevStart);

        #### ADD LIST MEMBERS FOR TRAILING <\BODY><\HTML> ####
        push(@tags, "%enddoc");
        push(@titles, "ENDDOC");
        push(@starts, $start);
        push(@lengths, length($oldpage) + 1 - $start);

        #### RETURN LISTS THRU ARGS ####
        @{$_[1]} = @tags;
        @{$_[2]} = @titles;
        @{$_[3]} = @starts;
        @{$_[4]} = @lengths;
    }

    sub hdr_string {
        my($title, $issuetitle) = @_;
        return(
            "<!doctype html public \"-//w3c//dtd html 4.0 transitional//en\">\n" .
            "<html>\n" .
            "<head>\n" .
            " <meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">\n" .
            " <title>$title</title>\n" .
            "</head>\n" .
            "<body text=\"#000000\" bgcolor=\"#FFFFFF\" link=\"#0000EE\" vlink=\"#551A8B\" alink=\"#FF0000\">\n" .
            "\n" .
            "<center><font size=+2><b><a href=\"../../troubleshooters.htm\">Troubleshooters.Com</a> Presents</b></font></center><p>\n" .
            "<center><font size=+1><b><a href=\"index.htm\">$issuetitle</a></b></font>\n" .
            "<b><font size=-2></font></b>\n" .
            "<p><b><font size=-2>Copyright (C) 1999 by Steve Litt. All rights reserved.\n" .
            "Materials from guest authors copyrighted by them and licensed for perpetual\n" .
            "use to Troubleshooting Professional Magazine. All rights reserved to the\n" .
            "copyright holder, except for items specifically marked otherwise (certain\n" .
            "free software source code, GNU/GPL, etc.). All material herein provided\n" .
            "\"As-Is\". User assumes all risk and responsibility for any outcome.</font></b></center>\n" .
            "<p>\n"
        );
    }

    sub write_article {
        my($article) = $_[0];
        my($ss) = $_[1];
        my($issueTitle) = $_[2];
        my($tag) = $_[3];
        my(@tags) = @{$_[4]};
        my(@titles) = @{$_[5]};

        open(OUF, ">" . $tag . ".htm") or die "Can not write file for $tag";
        print OUF &hdr_string($titles[$ss], $issueTitle);

        ##### CREATE TOP OF PAGE PREVIOUS AND NEXT POINTERS #####
        print OUF "<center><table BORDER COLS=1 WIDTH=\"100%\" BGCOLOR=\"#FFCCCC\" ><tr><td><center><b><font size=-1>\n";
        if($ss <= 0) {
            print OUF "<--";
        } elsif(!($tags[$ss-1] =~ m/^_/)) {
            print OUF "<--";
        } else {
            print OUF "<a href=\"$tags[$ss-1].htm\"><--$titles[$ss-1]</a>";
        }
        print OUF " | ";
        print OUF "<a href=\"./index.htm\">Contents</a>";
        print OUF " | ";
        if($ss >= scalar(@tags)) {
            print OUF "-->";
        } elsif(!($tags[$ss+1] =~ m/^_/)) {
            print OUF "-->";
        } else {
            print OUF "<a href=\"$tags[$ss+1].htm\">$titles[$ss+1]--></a>";
        }
        print OUF "</font></b></center></td></tr></table></center>\n";

        ##### WRITE ARTICLE ITSELF #####
        print OUF "$article\n";
        print OUF "<hr WIDTH=\"100%\"><b><ul>\n";

        ##### CREATE BOTTOM OF PAGE NEXT, PREVIOUS AND CONTENTS POINTERS #####
        if($ss >= scalar(@tags)) {
            print OUF "<li>Next article:</li>\n";
        } elsif(!($tags[$ss+1] =~ m/^_/)) {
            print OUF "<li>Next article:</li>\n";
        } else {
            print OUF "<li><a href=\"$tags[$ss+1].htm\">Next article: $titles[$ss+1]</a></li>\n";
        }
        if($ss <= 0) {
            print OUF "<li>Previous article:</li>\n";
        } elsif(!($tags[$ss-1] =~ m/^_/)) {
            print OUF "<li>Previous article:</li>\n";
        } else {
            print OUF "<li><a href=\"$tags[$ss-1].htm\">Previous article: $titles[$ss-1]</a></li>\n";
        }
        print OUF "<li><a href=\"index.htm\">Magazine Contents</a></li>\n";
        print OUF "<li><font size=+1><a href=\"../../troubleshooters.htm\">Troubleshooters.Com</a></font></li>\n";
        print OUF "</ul></b>";
        print OUF "<\/body><\/html>\n";
        close(OUF);
    }

    sub writem {
        my($page) = $_[0];
        my($infname) = $_[1];
        my($issueTitle) = $_[2];
        my(@tags) = @{$_[3]};
        my(@titles) = @{$_[4]};
        my(@starts) = @{$_[5]};
        my(@lengths) = @{$_[6]};
        my($ss) = 0;
        open(MAIN, ">index.htm") or die "Cannot open index.htm";
        my($tag);
        foreach $tag (@tags) {
            my($article) = substr($page,$starts[$ss],$lengths[$ss]);
            if($tag =~ m/^_/) {
                write_article(
                    $article,
                    $ss,
                    $issueTitle,
                    $tag,
                    \@tags,
                    \@titles
                );
            } elsif($tag eq "%startdoc") {
                print MAIN $article;
            } elsif($tag eq "%contents") {
                print MAIN &make_contents_string(\@tags, \@titles);
            } elsif($tag eq "%enddoc") {
                print MAIN "</body></html>\n";
            } else {
                die "Cannot print undefined tag.";
            }
            $ss = $ss + 1;
        }
    }

    sub main() {
        my($srcfile) = $ARGV[0];
        my(@tags, @titles, @starts, @lengths);
        my(@sourcelines) = &read_source_file($srcfile);
        if(!@sourcelines){die "Must have source file as single argument";}  # was defined(@sourcelines), a fatal error in Perl 5.22+
        my($sourcepage) = join("", @sourcelines);
        &fill_article_lists($sourcepage, \@tags, \@titles, \@starts, \@lengths);
        my($issueTitle) = &get_issue_title($sourcepage);
        &writem(
            $sourcepage,
            $srcfile,
            $issueTitle,
            \@tags,
            \@titles,
            \@starts,
            \@lengths
        );
    }

    &main();
Let's examine splitmag.pl. The first and fourth lines of main() document the command syntax. Basically the main routine says read the source file and mold into a string called $sourcepage. Fill the four article lists, get the issue title, then write the article pages. Subroutine writem() should have been called write_web_pages(). All in all a fine, self documenting main routine.
Routines read_source_file(), get_issue_title() and fill_article_lists() are pretty obvious (though the latter needs a few comments for complete readability). Poorly named writem() starts out readable, identifying its arguments and an output file, but gets a little hairy in the foreach $tag (@tags) loop. But it's not too bad, and at least identifies @tags as the controlling entity. writem() calls write_article() with six arguments.
All pretense of readability is dropped in write_article(). It does an admirable job identifying the arguments and opening the output file, but then descends into a conglomeration of code, which comments tell us perform three sequential tasks: 1)Create page top pointers, 2)Create the page itself, 3)Create page bottom pointers. Note that #2 is trivial. What I should have done is had #1 and #3 be subroutine calls print_page_top() and print_page_bottom() respectively. The problem is I didn't want to pass all those darn arguments again, and all those lists and subscripts were getting confusing.
In retrospect, of course, the solution was to encapsulate lists @tags, @titles, @starts, @lengths and subscript $ss into an object with a name like $articles. $articles might have methods like next(), previous(), tag(), title(), start(), length(), getSubscript(), setSubscript(). The constructor would take the main routine's $sourcepage as an argument and basically do the same parsing as the existing fill_article_lists(). Such a construction would have cut arguments passed to routines, and encouraged further functional decomposition, resulting in vastly improved readability approaching true self-documentation.
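As a rough illustration of that object, here is a sketch in Python 3 rather than Perl. It is not a port of splitmag.pl: the heading pattern is a simplified stand-in for the one in fill_article_lists(), the text() helper is an extra not mentioned above, and having next() and previous() return a subscript (or None when the neighbor is not an article) is just one possible design.

```python
import re

# Simplified stand-in for the heading pattern in fill_article_lists():
# an <h1> holding a named anchor whose name starts with "_" or "%".
HEADING = re.compile(
    r'<h1>\s*<a name="([_%][^"]*)"></a>(.+?)</h1>',
    re.IGNORECASE | re.DOTALL,
)


class Articles:
    """Owns the tag, title, and start lists plus the current subscript."""

    def __init__(self, sourcepage):
        self._page = sourcepage
        self._tags = ["%startdoc"]
        self._titles = ["STARTDOC"]
        self._starts = [0]
        for match in HEADING.finditer(sourcepage):
            self._tags.append(match.group(1))
            self._titles.append(match.group(2))
            self._starts.append(match.start())
        body_close = re.search(r"</body>", sourcepage, re.IGNORECASE)
        end = body_close.start() if body_close else len(sourcepage)
        self._tags.append("%enddoc")
        self._titles.append("ENDDOC")
        self._starts.append(end)
        self._subscript = 0

    def getSubscript(self):
        return self._subscript

    def setSubscript(self, subscript):
        if not 0 <= subscript < len(self._tags):
            raise IndexError("no article at subscript %d" % subscript)
        self._subscript = subscript

    def tag(self):
        return self._tags[self._subscript]

    def title(self):
        return self._titles[self._subscript]

    def start(self):
        return self._starts[self._subscript]

    def length(self):
        # Each piece runs up to the start of the next one (or end of page).
        boundaries = self._starts + [len(self._page)]
        return boundaries[self._subscript + 1] - boundaries[self._subscript]

    def text(self):
        return self._page[self.start():self.start() + self.length()]

    def _isArticle(self, subscript):
        in_range = 0 <= subscript < len(self._tags)
        return in_range and self._tags[subscript].startswith("_")

    def next(self):
        """Subscript of the following article, or None if there is none."""
        candidate = self._subscript + 1
        return candidate if self._isArticle(candidate) else None

    def previous(self):
        """Subscript of the preceding article, or None if there is none."""
        candidate = self._subscript - 1
        return candidate if self._isArticle(candidate) else None
```

With something like this in hand, routines such as print_page_top() and print_page_bottom() would each take the one object instead of two lists and a subscript, and the "is there a real article before or after me" test lives in exactly one place.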
Everybody knows all real app dev is done with Delphi, Clarion, VB, Powerbuilder, JBuilder, Visual J++, PowerJ, Visual Fox and the like. These days we must have rapid development. Only the backward looking dinosaurs use text source code. These tips would have been useful in 1990, but they're a day late and a dollar short.
Well, except that they can be used in drag n drop RAD. Most of these RAD environments force you to design your program objects around screens. Ugh. (And thank you Powerbuilder and Clarion for not being like that). But even so, considering that every screen *should* have some function, correct naming of the screens and limitations of functionality will produce (drum roll please) self documenting code. And even in the most RADical RAD, you can design a complex object around a "hello world" style screen. So if you're a good designer, the tips outlined earlier in this Troubleshooting Professional Magazine issue produce self documenting RAD.
And then there's the fact that the fundamental premise of this article might be false. Is source code dead? Was Linux written in VB? Or even VC++ with MFC? Or even C++? Nope, C. C for the kernel, C for enhancements, C for drivers. But that's all systems programming. You can't do apps in code, right?
Try tcl/tk -- compare its productivity to VB. Don't like tcl's unusual syntax? Program in Python or Perl, and use their tk modules for screens. RAD code is not an oxymoron.
But certainly as Linux matures drag and drop environments will displace code. Right? Well, that's really two questions. Will drag and drop environments appear? Certainly. With free Python, Perl and GNU C, the d&d environment is a proprietorist's last bastion of price gouging. Will d&d displace code? The Linux marketplace has a proven track record of knowing when the king has no clothes. Only time will tell.
And all of this may be a moot argument. Because Linux, with its no-hassle licensing, modularity, and wealth of utilities, just may tip the scales toward the software-manufacturing model that has been deified but not achieved these last fifteen years. Read on...
His Latin Lover looks showed through the gas mask worn as protection from paint fumes. He knew maybe 200 words of English. Working with spray paint, stencils and rags on the sidewalk of Port Hueneme's tourist area, he painted the most beautiful ocean and sky scenes imaginable. Every one custom made for the customer. In 15 minutes. For $20.00. I saw it, I timed it. Every 15 minutes, another $20.00 check. $80.00/hr.
The women in the 100+ person crowd surrounding The Artist raved about the beautiful paintings, and how much they wanted one. The men, every one of whom had done the same timings and calculations as I, wondered aloud if they had chosen the right career. Being that wild and crazy process-oriented animal I am, I vowed to generalize his techniques to help me with my programming and tech-writing.
I watched him for hours. Bought one of his paintings. Dropped all other thoughts, and pondered what I'd seen. By nightfall I'd synthesized his production methodology down to goal, tools and riffs. Within a month I was using goal, tools and riffs to double my tech-writing productivity. Programming took much longer, and I'll talk more on that later in this article. But right now, let's talk about goal, tools and riffs.
Riff is a musical term for a sequence of notes so familiar to the musician's fingers as to be a reflex. That's part of the reason musicians play scales -- riffs come from scales. Go into your old record collection (or your parents' or grandparents') and pull out some Led Zeppelin records. Zep took a riff, beat it to death, and made a killer song. Listen and see if you don't agree. If you have no Led Zeppelin, listen to any Heavy Metal. Most Metal is riff based to a degree.
Not musically inclined? No problem. Do you touch type? If so, I bet you don't type the word "the" as three letters, do you? It's a quick reflex finger movement. A riff.
The Artist's riffs were richer and used varying tools. Need a school of multi-colored fish? Whip out a fish school stencil and can of spray paint to make the bodies. Center another stencil and grab a different color spray paint to paint the fishes' highlights. Elapsed time -- 10 seconds.
Need the sun? Spray a large blob of yellow. Place an empty spray can upright on the yellow, and spray sky color all around. Lift the can, and smear with a rag to get the sun's radiance. Reflex. Elapsed time -- 20 seconds.
Need a comet? Spray white into a round stencil, lift and smear upwards with a rag. Elapsed time -- 5 seconds.
He's used these riffs thousands of times. From these limited stock riffs he creates thousands of seemingly custom paintings.
The Artist created individual paintings. He didn't have to match an existing decor. The customer didn't tell him what tools he had to use. The customer didn't make The Artist paint in a way that could be easily modified. The Artist had freedoms the application developer just doesn't have.
Let's face it. Very few of us automate a business from scratch. They have a DBMS, so we have to hit that DBMS. They have an OS, so we have to be compatible with that OS. They have an existing software inventory written in language X, and programmers familiar with language X, so we have to write in language X. Of course using framework Y. Tens of languages, each with tens of frameworks. As we move from job to job, task to task, contract to contract, our knowledge of tools becomes broad, but seldom deep.
And speaking of Python, it comes with modules for html, xml, ftp, math, strings, database, complex numbers for you engineering types, system calls. And of course the Tkinter interface to tcl/tk, so you can build nice little gui apps with Python. Web app? No problem -- use Python-centric Zope or quick and simple PHP. The technologist fully conversant with these tools can punch out a sophisticated custom app in a day. An app built from pre-built, intensively tested, fully modularized (unlike most "guis" and "frameworks") components.
Perhaps most promising, open source vertical market components are appearing. There is now an open source project to develop medical practice management software. Can litigation support, retail and hospitality software be far behind?
He who forgets history is bound to repeat it. There have been more than ten complete economic cycles this century, and nothing's happened to change that trend. But the number of developers has doubled the past few years. And when the downturn comes, the newly minted VB developers, Java jockeys, Visual C++ gurus, Crystal Report savants and Y2K mechanics won't simply walk into the sunset. They'll compete. A brutal game of musical chairs, with large numbers of programmers indefinitely kicked out of the glass house.
Most dispossessed developers will become "starving artists". But a few will follow the footsteps of that long ago street artist. They'll look at the startup small businesses, so long denied automation due to bid-up developer costs, as a new market. Even in a recession, a healthy small business can afford $640 for a custom app. And since the programmer builds it from tried and true open source components, it takes a day. That's $80/hr, in a recession, with their less enterprising brethren on the unemployment rolls.
But it gets better. "The Enterprise", no longer able to afford the Microsoft tax or the $1500/seat development app costs, or the training costs of ever changing and buggy development environments, will take note of the small business successes. The goal/tools/riffs open source pioneers will be invited into the department, then the enterprise. And that's how Linux will finally win.
The 1980's PC revolution is an exact precedent! Sick to death of mainframe and mini vendors and their own glass house holding them over a barrel, departments hired PC pioneers to quickly implement software solutions for the department. As soon as the glass house gang had superior development competition, self preservation dictated they hire PC programmers. A skeptic might say the 80's revolution was about a new hardware platform. Not really. It was about price, reliability, development speed, training cost, and about a widespread uniform easily programmable operating system.
All submissions become the property of the publisher (Steve Litt), unless other arrangements are previously made in writing. We do not currently pay for articles. Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):