Troubleshooting Professional Magazine
Self-Documenting Code
Let's talk about readability. I'm often brought in to help make the enhance-or-rewrite decision. The big factor -- readability. The client doesn't want to pay mega-hours just to decide which way to go. So a timely decision must be made, based on the existing code and the needed enhancements and bug fixes. If I can't quickly read the code well enough to understand the original programmer's design intent, I'll probably recommend a rewrite and comment on that programmer's lack of readability.
In other words, a program can be very well designed from the machine point of view, and even from the human point of view. But if it does not clearly display that design, its usefulness is severely curtailed.
This issue of Troubleshooting Professional Magazine is devoted to code readability. With an ever greater proportion of functionality implemented in software, we Troubleshooters deserve readable code. So kick back, relax, and enjoy this issue of Troubleshooting Professional. Learn my thoughts on readability, and enhance them with your own habits and discoveries. And remember, if you're a Troubleshooter, Technologist, or free software user, this is your magazine. Enjoy!
I was taught the new methods from the start. Structured programming. Use comments often. Use modern languages like C whose compilers don't care what line or column the code appears in. Our instructors laughed at indentation dependent languages like RPG. Those were the way of the spaghetti specialist, not young turks like us.
Our instructors were split on the subject of Cobol. Yes, it had (at least in those days) no local variables, and it was indentation dependent, but it was "self documenting". What could be clearer than "move employee record to print line."? The school's elite looked down their nose at Cobol, that sissy language. I wasn't so sure. Self-documentation sure had its advantages. I became a professional C programmer in 1984 -- one of SMC's first students to do so. And the more I did C, the more I yearned for readability. I couldn't even work on stuff I'd written six months ago. But that's getting ahead of the story.
And we rewrote, rewrote, rewrote. All that creeping crud made by the spaghetti swarm, we couldn't read it -- we replaced it. And because our design methodologies were much more scalable than spaghetti producing flow charts, we got the job done quickly. But somehow, our stuff wasn't much more readable than that of the flow chart fellows in the unemployment lines. My reaction: Comment even more. The late 80's and early 90's saw me producing programs that were 30% comments. I never saw the problem. After all, I was jumping around so much I didn't maintain my own code. That changed in the early 90's.
When you maintain commented code, you must change the code and the comments to match each other. Well actually, "must" is too strong a word. In hot deadline situations, many changed the code and left the comments alone. So when the next maintenance programmer came along, he was misled. Kludges followed kludges. Good software gone bad. I started commenting less.
Sure, we displaced the spaghetti guys in the marketplace, but not before they subtly indoctrinated us. That indoctrination -- write efficient code. Oh sure, we knew to trade some efficiency to substitute subroutine calls for gotos, but we didn't want too many levels of subroutine calls. There's speed and stack to consider.
I told the junior programmer he'd need to be less an artist and more a productionist and efficiency expert if he wanted to make it in our shop. I soon moved on, never again seeing the junior programmer or his beautifully crafted code, which I (wrongly) associated with non-productivity. It would be years before I really appreciated what that programmer had done.
Hungarian was nice while I was in that shop, because everyone was using it. But I used it less and less after leaving that shop because it didn't deliver the promised readability. And you shouldn't use global vars (g) anyway. And the world was turning away from strongly typed languages...
I'd heard such great things about Python -- why had Guido van Rossum given it this syntax straight out of the Disco era? But I wanted to see what all my friends were raving about, so I took a few hours coding Python. And after 3 hours I realized, "Man, that's readable!".
Remember all those brace placement feuds that made Windows/Linux look like a friendly chat? Gone! Remember that turkey maintenance programmer who screwed up the indentation, followed by yet another turkey maintenance programmer who believed the design to be represented by the indentation, moved the braces and messed up the program? Gone. With the "compiler" enforcing indentation, what you see is what you get. (and where do you want to go today?:-).
If some foolish maintenance programmer messes up your Python indentation, chances are it will immediately error out, giving a line number. But even if he's unlucky enough to have exdented the final subordinate statement, the wrong answers will quickly let him know there's a problem, and he'll go back to the last area he worked on and find it. And if you're *really* concerned that someone will delete an indent and create a hard to find problem, a single comment saying

    # end of if x < 1

will alert all but the densest maintainers to what's going on. And if you *ULTIMATELY* want to enforce the intended indentation, have the final statement be a call to thisIsTheEndOfTheBlock(), a do-nothing routine taking one argument -- an argument declared in the block, so the call aborts with a NameError if that argument was not declared earlier within the block :-)
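Here is a minimal Python 3 sketch of that sentinel trick. The routine names describe() and describe_exdented() and the variable in_block are invented for illustration. Note that inside a function the failure shows up as UnboundLocalError, which is a subclass of NameError.

```python
def thisIsTheEndOfTheBlock(marker):
    """Do-nothing sentinel: its only job is to reference a name bound in the block."""


def describe(x):
    result = "large"
    if x < 1:
        in_block = True                     # bound only when the block runs
        result = "small"
        thisIsTheEndOfTheBlock(in_block)    # must stay the block's last statement
    return result


def describe_exdented(x):
    """The same routine after a maintainer exdented the sentinel by mistake."""
    result = "large"
    if x < 1:
        in_block = True
        result = "small"
    thisIsTheEndOfTheBlock(in_block)        # now runs even when the block was skipped
    return result


if __name__ == "__main__":
    print(describe(0), describe(5))
    try:
        describe_exdented(5)
    except NameError as err:                # UnboundLocalError subclasses NameError
        print("caught:", err)
```

The sentinel only complains on a run where the block is skipped, so it is a tripwire rather than a guarantee. Exdenting any earlier statement while the sentinel stays indented is caught sooner: Python rejects the file with an IndentationError before anything runs.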
Just like the beautiful program the junior programmer showed me so many years before. The one I criticized him for. The one where the guy had subroutines just to execute one statement. Nahh, I couldn't advocate that, could I?
Imagine a program taking a person's first and last names as the second and third command line arguments. Which statement is easier reading?

    print sys.argv[3]

or

    print getLastname()

Of course, to use the latter there must be code like this somewhere:

    def getLastname(): return sys.argv[3]

But by the time you get to that point, the source of the last name is a mere implementation detail. The point is, the programmer has documented his design intent.
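For readers on current Python, here is a minimal Python 3 sketch of the same idea. The optional argv parameter and the sample command line are additions made here so the accessors can be tried without a real command line; they are not part of the original example.

```python
import sys


def getFirstname(argv=None):
    """The first name is the second command line argument."""
    argv = sys.argv if argv is None else argv
    return argv[2]


def getLastname(argv=None):
    """The last name is the third command line argument."""
    argv = sys.argv if argv is None else argv
    return argv[3]


# Stand-in for a real command line (hypothetical values).
sample_argv = ["letter.py", "25", "Pat", "Jones"]
print(getLastname(sample_argv), getFirstname(sample_argv))
```

If the last name ever moves to a different argument, or to a config file, only getLastname() changes. Every caller still reads the same.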
But at what cost? Golly gee, the machine cycles, the stack!
If you're using Perl or Python, chances are performance isn't your priority anyway, so who cares. Performance people like C and C++. And in those languages you can make getLastname() a macro. Have your cake and eat it too. It compiles to argv[3], but the source reads getLastname().
This really starts gaining power when using objects to represent the state of a thing:
    print employee.getLastname(), employee.getEmployeeNumber()
    print lawyer.getLastname(), lawyer.getPhone(), lawyer.getDegree()
    print defendent.getLastname(), defendent.getLastConvictionDate()
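Classes behind calls like these might look like the following Python 3 sketch. The Person base class, the constructor arguments and the sample data are all assumptions made for illustration; only the accessor names come from the lines above.

```python
class Person:
    """Anything with a last name (hypothetical base class)."""

    def __init__(self, lastname):
        self._lastname = lastname

    def getLastname(self):
        return self._lastname


class Employee(Person):
    def __init__(self, lastname, employee_number):
        super().__init__(lastname)
        self._employee_number = employee_number

    def getEmployeeNumber(self):
        return self._employee_number


class Lawyer(Person):
    def __init__(self, lastname, phone, degree):
        super().__init__(lastname)
        self._phone = phone
        self._degree = degree

    def getPhone(self):
        return self._phone

    def getDegree(self):
        return self._degree


# Made-up sample data.
employee = Employee("Smith", 1042)
lawyer = Lawyer("Jones", "555-0100", "JD")
print(employee.getLastname(), employee.getEmployeeNumber())
print(lawyer.getLastname(), lawyer.getPhone(), lawyer.getDegree())
```

Each accessor names exactly one fact about the object, so the calling lines document themselves no matter how the data is stored.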
A design methodology!
That's right. If you order our compiler enforced self documentation methodology, we'll throw in, absolutely free, a design methodology. Want a simple example?
We all know you should never sit down and just start typing a program. Right? You diagram it, or use an outline processor, or use case tools, or whatever. But what if your code looks like English:
#!/usr/bin/python
import sys

def main():
    (age,firstname,lastname,gender) = getCommandLineArguments()
    if correctDemographics(age,gender):
        sendSalesLetter(age,firstname,lastname,gender)

def getCommandLineArguments():
    # int() rather than eval(): never evaluate raw command line text
    return (int(sys.argv[1]), sys.argv[2], sys.argv[3], sys.argv[4])

def correctDemographics(age, gender):
    if 20 <= age < 30 and gender == "male":
        return(1)
    else:
        return(0)

def sendSalesLetter(age,firstname,lastname,gender):
    printLetterhead()
    printSalutation(gender, lastname)
    printBody(gender, firstname, lastname)

Of course you need to code the three routines called in sendSalesLetter, and at the very bottom you need a call to main(). But you get the picture. To a certain extent you can type in your design document, which has English like readability. The design doc is also your code.

But wait. What about OOP? Same thing. Each object is an actor in a script, so name all the leading men and ladies in the main routine. Name them descriptive names -- job titles if you will. Each of their method names exactly describes a single behavior. Then the main routine becomes a script narrating how they all interact with each other. Hardly any syntax -- anyone can understand it, and everyone knows the design intent. Supporting actors might be similarly declared in the bodies or methods of the main actors, and likewise their interactions are narrated, and so forth.
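The actors-in-a-script idea can be sketched for the same sales letter job, here in Python 3. The class names Prospect, LetterWriter and Mailroom, their methods, and the letter text are all invented for illustration. Treat it as one possible casting, not the only design.

```python
class Prospect:
    """The person who might receive a letter."""

    def __init__(self, age, firstname, lastname, gender):
        self.age = age
        self.firstname = firstname
        self.lastname = lastname
        self.gender = gender

    def isInTargetDemographic(self):
        return 20 <= self.age < 30 and self.gender == "male"


class LetterWriter:
    """Composes the text of a sales letter."""

    def letterhead(self):
        return "Example Company"              # placeholder text

    def salutationFor(self, prospect):
        honorific = "Mr." if prospect.gender == "male" else "Ms."
        return "Dear %s %s," % (honorific, prospect.lastname)

    def bodyFor(self, prospect):
        return "%s, we have an offer for you." % prospect.firstname

    def composeLetterFor(self, prospect):
        return "\n".join([self.letterhead(),
                          self.salutationFor(prospect),
                          self.bodyFor(prospect)])


class Mailroom:
    """Collects outgoing letters; a real one would print or mail them."""

    def __init__(self):
        self.outbox = []

    def send(self, letter):
        self.outbox.append(letter)


def main(argv):
    # The main routine only narrates how the actors interact.
    prospect = Prospect(int(argv[1]), argv[2], argv[3], argv[4])
    letterWriter = LetterWriter()
    mailroom = Mailroom()
    if prospect.isInTargetDemographic():
        mailroom.send(letterWriter.composeLetterFor(prospect))
    return mailroom


# Hypothetical command line.
mailroom = main(["letter.py", "25", "Pat", "Jones", "male"])
print(mailroom.outbox[0])
```

Read main() aloud and you have the design: if the prospect is in the target demographic, the mailroom sends the letter the letter writer composes for him.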
Let's look at Cobol. It's self-documenting at a computer instruction level, but not *necessarily* at a conceptual level. That wordiness:

    move employee-record to printline

is really not a whole lot more descriptive, for somebody knowing both languages, than

    memcpy(printline, employee_record, sizeof(employee_record));
    printline[sizeof(employee_record)] = '\0';

They both describe what's occurring on an instruction level, but say nothing about what's happening on a purpose level. I've seen many Cobol programs comprised of a bunch of highly readable instructions, which when taken as a whole, seem meaningless.
Orthodox use of Hungarian merely guarantees that wrong types won't be assigned or passed, but does nothing toward revealing the underlying design.
Gracie Slick is an old lady, Culture Club is a speck on music history's dustheap, and text format source code is now catching its second wind.
I still author the mag as a single web page. But when I'm all done, I can run Perl program splitmag.pl on the single file to create the individual ones, create the table of contents, and create all the forward and back links. I simply need to make sure that every article starts with an <H1> style title preceded by an anchor beginning with an underscore. The script takes care of everything else.
I whipped out this script in about 12 hours. I never intended it to be a showpiece. But I think it's a (small sized) example of the self-documentation methodologies I've previously discussed:
#!/usr/bin/perl -w
# by Steve Litt. Public domain. First published 7/26/1999.
# NO WARRANTEE!
use strict;

sub read_source_file
{
    my($fname) = "";
    $fname = $_[0];
    open(MYINPUTFILE, "<" . $fname) or die "Could not open " . $fname;
    my(@lines) = <MYINPUTFILE>;
    close(MYINPUTFILE);
    return(@lines);
}

sub make_contents_string
{
    my(@tags) = @{$_[0]};
    my(@titles) = @{$_[1]};
    my($contents) = "<center><b><font size=+3>CONTENTS</font></b></center><p><b><ul>\n";
    my($title);
    my($tag);
    foreach $tag (@tags)
    {
        $title = shift(@titles);
        if($tag =~ m/^_/)
        {
            $contents = $contents . "<li><a href=\"$tag.htm\">" . $title . "</a></li>\n";
        }
    }
    $contents = $contents . "</ul></b><p>\n";
    return($contents);
}

sub get_issue_title
{
    my($page) = $_[0];
    $page =~ m/<title>(.*?)<\/title>/si;
    my($rtrn) = $1;
    return($rtrn);
}

sub fill_article_lists
{
    my($oldpage) = $_[0];
    my(@tags, @titles, @starts, @lengths);   # LOCAL VERSIONS OF PASSED BACK ARGS

    #### LOOP CONTROL AND STATE VARS ####
    my($prevStart) = 0;
    my($ss) = 0;
    my($start);

    #### PRIMING TAG PUSH, NOTE NO @lengths PRIME, IT HAPPENS AT BOTTOM ####
    push(@tags, "%startdoc");
    push(@titles, "STARTDOC");
    push(@starts, 0);

    #### FILL THE ARTICLE LISTS ####
    while($oldpage =~ m/(<h1>[\n\w]*?<a NAME=")([_%].+?)("><\/a>)(.+?)(<\/h1>)/sg)
    {
        if(!defined($1)) {last;}
        $start = pos($oldpage) - length($1) - length($2) - length($3) - length($4) - length($5);
        push(@tags, $2);
        push(@titles, $4);
        push(@starts, $start);
        push(@lengths, $start - $prevStart);
        $prevStart = $start;
        $ss++;
    }

    #### FINISH LAST ARTICLE'S LENGTH ####
    $oldpage =~ m/<\/body>/sig;
    $start = pos($oldpage);
    push(@lengths, $start - $prevStart);

    #### ADD LIST MEMBERS FOR TRAILING </BODY></HTML> ####
    push(@tags, "%enddoc");
    push(@titles, "ENDDOC");
    push(@starts, $start);
    push(@lengths, length($oldpage) + 1 - $start);

    #### RETURN LISTS THRU ARGS ####
    @{$_[1]} = @tags;
    @{$_[2]} = @titles;
    @{$_[3]} = @starts;
    @{$_[4]} = @lengths;
}

sub hdr_string
{
    my($title, $issuetitle) = @_;
    return(
        "<!doctype html public \"-//w3c//dtd html 4.0 transitional//en\">\n" .
        "<html>\n" .
        "<head>\n" .
        " <meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">\n" .
        " <title>$title</title>\n" .
        "</head>\n" .
        "<body text=\"#000000\" bgcolor=\"#FFFFFF\" link=\"#0000EE\" vlink=\"#551A8B\" alink=\"#FF0000\">\n" .
        "\n" .
        "<center><font size=+2><b><a href=\"../../troubleshooters.htm\">Troubleshooters.Com</a> Presents</b></font></center><p>\n" .
        "<center><font size=+1><b><a href=\"index.htm\">$issuetitle</a></b></font>\n" .
        "<b><font size=-2></font></b>\n" .
        "<p><b><font size=-2>Copyright (C) 1999 by Steve Litt. All rights reserved.\n" .
        "Materials from guest authors copyrighted by them and licensed for perpetual\n" .
        "use to Troubleshooting Professional Magazine. All rights reserved to the\n" .
        "copyright holder, except for items specifically marked otherwise (certain\n" .
        "free software source code, GNU/GPL, etc.). All material herein provided\n" .
        "\"As-Is\". User assumes all risk and responsibility for any outcome.</font></b></center>\n" .
        "<p>\n"
    );
}

sub write_article
{
    my($article) = $_[0];
    my($ss) = $_[1];
    my($issueTitle) = $_[2];
    my($tag) = $_[3];
    my(@tags) = @{$_[4]};
    my(@titles) = @{$_[5]};

    open(OUF, ">" . $tag . ".htm") or die "Can not write file for $tag";
    print OUF &hdr_string($titles[$ss], $issueTitle);

    ##### CREATE TOP OF PAGE PREVIOUS AND NEXT POINTERS #####
    print OUF "<center><table BORDER COLS=1 WIDTH=\"100%\" BGCOLOR=\"#FFCCCC\" ><tr><td><center><b><font size=-1>\n";
    if($ss <= 0)
    {
        print OUF "<--";
    }
    elsif(!($tags[$ss-1] =~ m/^_/))
    {
        print OUF "<--";
    }
    else
    {
        print OUF "<a href=\"$tags[$ss-1].htm\"><--$titles[$ss-1]</a>";
    }
    print OUF " | ";
    print OUF "<a href=\"./index.htm\">Contents</a>";
    print OUF " | ";
    if($ss >= scalar(@tags))
    {
        print OUF "-->";
    }
    elsif(!($tags[$ss+1] =~ m/^_/))
    {
        print OUF "-->";
    }
    else
    {
        print OUF "<a href=\"$tags[$ss+1].htm\">$titles[$ss+1]--></a>";
    }
    print OUF "</font></b></center></td></tr></table></center>\n";

    ##### WRITE ARTICLE ITSELF #####
    print OUF "$article\n";
    print OUF "<hr WIDTH=\"100%\"><b><ul>\n";

    ##### CREATE BOTTOM OF PAGE NEXT, PREVIOUS AND CONTENTS POINTERS #####
    if($ss >= scalar(@tags))
    {
        print OUF "<li>Next article:</li>\n";
    }
    elsif(!($tags[$ss+1] =~ m/^_/))
    {
        print OUF "<li>Next article:</li>\n";
    }
    else
    {
        print OUF "<li><a href=\"$tags[$ss+1].htm\">Next article: $titles[$ss+1]</a></li>\n";
    }
    if($ss <= 0)
    {
        print OUF "<li>Previous article:</li>\n";
    }
    elsif(!($tags[$ss-1] =~ m/^_/))
    {
        print OUF "<li>Previous article:</li>\n";
    }
    else
    {
        print OUF "<li><a href=\"$tags[$ss-1].htm\">Previous article: $titles[$ss-1]</a></li>\n";
    }
    print OUF "<li><a href=\"index.htm\">Magazine Contents</a></li>\n";
    print OUF "<li><font size=+1><a href=\"../../troubleshooters.htm\">Troubleshooters.Com</a></font></li>\n";
    print OUF "</ul></b>";
    print OUF "<\/body><\/html>\n";
    close(OUF);
}

sub writem
{
    my($page) = $_[0];
    my($infname) = $_[1];
    my($issueTitle) = $_[2];
    my(@tags) = @{$_[3]};
    my(@titles) = @{$_[4]};
    my(@starts) = @{$_[5]};
    my(@lengths) = @{$_[6]};
    my($ss) = 0;
    open(MAIN, ">index.htm") or die "Cannot open index.htm";
    my($tag);
    foreach $tag (@tags)
    {
        my($article) = substr($page,$starts[$ss],$lengths[$ss]);
        if($tag =~ m/^_/)
        {
            write_article(
                $article,
                $ss,
                $issueTitle,
                $tag,
                \@tags,
                \@titles
            );
        }
        elsif($tag eq "%startdoc")
        {
            print MAIN $article;
        }
        elsif($tag eq "%contents")
        {
            print MAIN &make_contents_string(\@tags, \@titles);
        }
        elsif($tag eq "%enddoc")
        {
            print MAIN "</body></html>\n";
        }
        else
        {
            die "Cannot print undefined tag.";
        }
        $ss = $ss + 1;
    }
}

sub main()
{
    my($srcfile) = $ARGV[0];
    my(@tags, @titles, @starts, @lengths);
    my(@sourcelines) = &read_source_file($srcfile);
    if(!@sourcelines){die "Must have source file as single argument";}   # defined(@array) is a fatal error in current Perl
    my($sourcepage) = join("", @sourcelines);
    &fill_article_lists($sourcepage, \@tags, \@titles, \@starts, \@lengths);
    my($issueTitle) = &get_issue_title($sourcepage);
    &writem(
        $sourcepage,
        $srcfile,
        $issueTitle,
        \@tags,
        \@titles,
        \@starts,
        \@lengths
    );
}

&main();
Let's examine splitmag.pl. The first and fourth lines of main() document the command syntax. Basically the main routine says read the source file and mold it into a string called $sourcepage. Fill the four article lists, get the issue title, then write the article pages. Subroutine writem() should have been called write_web_pages(). All in all a fine, self-documenting main routine.
Routines read_source_file(), get_issue_title() and fill_article_lists() are pretty obvious (though the latter needs a few comments for complete readability). Poorly named writem() starts out readable, identifying its arguments and an output file, but gets a little hairy in the foreach $tag (@tags) loop. But it's not too bad, and at least identifies @tags as the controlling entity. writem() calls write_article() with six arguments.
All pretense of readability is dropped in write_article(). It does an admirable job identifying the arguments and opening the output file, but then descends into a conglomeration of code which, the comments tell us, performs three sequential tasks: 1) create page top pointers, 2) create the page itself, 3) create page bottom pointers. Note that #2 is trivial. What I should have done is make #1 and #3 subroutine calls, print_page_top() and print_page_bottom() respectively. The problem is I didn't want to pass all those darn arguments again, and all those lists and subscripts were getting confusing.
In retrospect, of course, the solution was to encapsulate lists @tags, @titles, @starts, @lengths and subscript $ss into an object with a name like $articles. $articles might have methods like next(), previous(), tag(), title(), start(), length(), getSubscript(), setSubscript(). The constructor would take the main routine's $sourcepage as an argument and basically do the same parsing as the existing fill_article_lists(). Such a construction would have cut arguments passed to routines, and encouraged further functional decomposition, resulting in vastly improved readability approaching true self-documentation.
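Here is a rough sketch of such an object, written in Python 3 rather than Perl to match the earlier Python examples. The method names follow the list above. Everything else is an assumption: the simplified heading pattern, the choice to end the last article where the closing body tag starts, and the choice to have next() and previous() return a true/false flag.

```python
import re

# Assumed heading shape: <h1><a name="_tag"></a>Title</h1>
ARTICLE_HEADING = re.compile(
    r'<h1>\s*<a name="([_%][^"]+)"></a>(.+?)</h1>', re.IGNORECASE | re.DOTALL)


class Articles:
    """Bundles the tag, title and start lists with the current subscript."""

    def __init__(self, sourcepage):
        self._page = sourcepage
        self._tags = ["%startdoc"]
        self._titles = ["STARTDOC"]
        self._starts = [0]
        for match in ARTICLE_HEADING.finditer(sourcepage):
            self._tags.append(match.group(1))
            self._titles.append(match.group(2).strip())
            self._starts.append(match.start())
        body_end = re.search(r"</body>", sourcepage, re.IGNORECASE)
        self._tags.append("%enddoc")
        self._titles.append("ENDDOC")
        self._starts.append(body_end.start() if body_end else len(sourcepage))
        self._subscript = 0

    def getSubscript(self):
        return self._subscript

    def setSubscript(self, subscript):
        if not 0 <= subscript < len(self._tags):
            raise IndexError("no article at subscript %d" % subscript)
        self._subscript = subscript

    def tag(self):
        return self._tags[self._subscript]

    def title(self):
        return self._titles[self._subscript]

    def start(self):
        return self._starts[self._subscript]

    def length(self):
        following = self._subscript + 1
        if following < len(self._starts):
            return self._starts[following] - self.start()
        return len(self._page) - self.start()

    def next(self):
        """Move to the following entry; return False if already at the last one."""
        if self._subscript + 1 >= len(self._tags):
            return False
        self._subscript += 1
        return True

    def previous(self):
        """Move to the preceding entry; return False if already at the first one."""
        if self._subscript == 0:
            return False
        self._subscript -= 1
        return True


# Made-up two-article page.
page = ('<html><body>intro'
        '<h1><a name="_first"></a>First Article</h1>text one'
        '<h1><a name="_second"></a>Second Article</h1>text two'
        '</body></html>')
articles = Articles(page)
articles.next()
print(articles.tag(), articles.title())
print(page[articles.start():articles.start() + articles.length()])
```

With an object like this, a routine such as print_page_top() would need one argument instead of six.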
Everybody knows all real app dev is done with Delphi, Clarion, VB, Powerbuilder, JBuilder, Visual J++, PowerJ, Visual Fox and the like. These days we must have rapid development. Only the backward looking dinosaurs use text source code. These tips would have been useful in 1990, but they're a day late and a dollar short.
Well, except that they can be used in drag n drop RAD. Most of these RAD environments force you to design your program objects around screens. Ugh. (And thank you Powerbuilder and Clarion for not being like that). But even so, considering that every screen *should* have some function, correct naming of the screens and limitations of functionality will produce (drum roll please) self documenting code. And even in the most RADical RAD, you can design a complex object around a "hello world" style screen. So if you're a good designer, the tips outlined earlier in this Troubleshooting Professional Magazine issue produce self documenting RAD.
And then there's the fact that the fundamental premise of this article might be false. Is source code dead? Was Linux written in VB? Or even VC++ with MFC? Or even C++? Nope, C. C for the kernel, C for enhancements, C for drivers. But that's all systems programming. You can't do apps in code, right?
Try tcl/tk -- compare its productivity to VB. Don't like tcl's unusual syntax? Program in Python or Perl, and use their tk modules for screens. RAD code is not an oxymoron.
But certainly as Linux matures drag and drop environments will displace code. Right? Well, that's really two questions. Will drag and drop environments appear? Certainly. With free Python, Perl and GNU C, the d&d environment is a proprietorist's last bastion of price gouging. Will d&d displace code? The Linux marketplace has a proven track record of knowing when the king has no clothes. Only time will tell.
And all of this may be a moot argument. Because Linux, with its no-hassle licensing, modularity, and wealth of utilities, just may tip the scales toward the software-manufacturing model that has been deified but not achieved these last fifteen years. Read on...
All submissions become the property of the publisher (Steve Litt), unless other arrangements are previously made in writing. We do not currently pay for articles. Troubleshooters.Com reserves the right to edit any submission for clarity or brevity. Any published article will include a two sentence description of the author, a hypertext link to his or her email, and a phone number if desired. Upon request, we will include a hypertext link, at the end of the magazine issue, to the author's website, providing that website meets the Troubleshooters.Com criteria for links and that the author's website first links to Troubleshooters.Com.
Submissions should be emailed to Steve Litt's email address, with subject line Article Submission. The first paragraph of your message should read as follows (unless other arrangements are previously made in writing):