diff options
| author | Craig Jennings <c@cjennings.net> | 2025-10-12 11:47:26 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2025-10-12 11:47:26 -0500 |
| commit | 092304d9e0ccc37cc0ddaa9b136457e56a1cac20 (patch) | |
| tree | ea81999b8442246c978b364dd90e8c752af50db5 /assets/info/pcl.info | |
changing repositories
Diffstat (limited to 'assets/info/pcl.info')
| -rw-r--r-- | assets/info/pcl.info | 29024 |
1 files changed, 29024 insertions, 0 deletions
diff --git a/assets/info/pcl.info b/assets/info/pcl.info new file mode 100644 index 00000000..9a77790c --- /dev/null +++ b/assets/info/pcl.info @@ -0,0 +1,29024 @@ +This is pcl.info, produced by makeinfo version 5.2 from pcl.texi. + + +File: pcl.info, Node: Top, Next: Letter, Prev: (dir), Up: (dir) + +Practical Common Lisp +===================== + +Unofficial Texinfo Format (C) Peter Seibel + + “that book is dead sexy” — ‘Xach on #lisp’ + (*Note Blurbs::.) + +This page, and the pages it links to, contain text of the Common Lisp +book Practical Common Lisp published by Apress +(https://www.apress.com/us/book/9781590592397). These pages now contain +the final text as it appears in the book. If you find errors in these +pages, please send email to <book@gigamonkeys.com>. These pages will +remain online in perpetuity -— I hope they will serve as a useful +introduction to Common Lisp for folks who are curious about Lisp but +maybe not yet curious enough to shell out big bucks for a dead-tree book +and a good Common Lisp tutorial for folks who want to get down to real +coding right away. However, don't let that stop you from buying the +printed version available from Apress at your favorite local or online +bookseller. For the complete bookstore browsing experience, you can +read the letter to the reader (*Note Letter::.) that appears on the +back cover of the treeware edition of the book. + +* Menu: + +* Letter:: Letter to the Reader +* Blurbs:: Blurbs +* Chapter 1:: Introduction: Why Lisp? +* Chapter 2:: Lather, Rinse, Repeat: A Tour of the REPL +* Chapter 3:: Practical: A Simple Database +* Chapter 4:: Syntax and Semantics +* Chapter 5:: Functions +* Chapter 6:: Variables +* Chapter 7:: Macros: Standard Control Constructs +* Chapter 8:: Macros: Defining Your Own +* Chapter 9:: Practical: Building a Unit Test Framework +* Chapter 10:: Numbers, Characters, and Strings +* Chapter 11:: Collections +* Chapter 12:: They Called It LISP for a Reason: List Processing +* Chapter 13:: Beyond Lists: Other Uses for Cons Cells +* Chapter 14:: Files and File I/O +* Chapter 15:: Practical: A Portable Pathname Library +* Chapter 16:: Object Reorientation: Generic Functions +* Chapter 17:: Object Reorientation: Classes +* Chapter 18:: A Few FORMAT Recipes +* Chapter 19:: Beyond Exception Handling: Conditions and Restarts +* Chapter 20:: The Special Operators +* Chapter 21:: Programming in the Large: Packages and Symbols +* Chapter 22:: LOOP for Black Belts +* Chapter 23:: Practical: A Spam Filter +* Chapter 24:: Practical: Parsing Binary Files +* Chapter 25:: Practical: An ID3 Parser +* Chapter 26:: Practical: Web Programming with AllegroServe +* Chapter 27:: Practical: An MP3 Database +* Chapter 28:: Practical: A Shoutcast Server +* Chapter 29:: Practical: An MP3 Browser +* Chapter 30:: Practical: An HTML Generation Library, the Interpreter +* Chapter 31:: Practical: An HTML Generation Library, the Compiler +* Chapter 32:: Conclusion: What's Next? + + -- The Detailed Node Listing -- + +1. Introduction: Why Lisp? + +* 1-1:: Why Lisp? +* 1-2:: Where It Began +* 1-2-1:: But I learned Lisp Once, And IT Wasn't Like what you're describing +* 1-3:: Who This Book Is For + +2. Lather, Rinse, Repeat: A Tour of the REPL + +* 2-1:: Choosing a Lisp Implementation +* 2-2:: Getting Up and Running with Lisp in a Box +* 2-3:: Free Your Mind: Interactive Programming +* 2-4:: Experimenting in the REPL +* 2-5:: "Hello, World," Lisp Style +* 2-6:: Saving Your Work + +3. Practical: A Simple Database + +* 3-1:: CDs and Records +* 3-2:: Filing CDs +* 3-3:: Looking at the Database Contents +* 3-4:: Improving the User Interaction +* 3-5:: Saving and Loading the Database +* 3-6:: Querying the Database +* 3-7:: Updating Existing Records-Another Use for WHERE +* 3-8:: Removing Duplication and Winning Big +* 3-9:: Wrapping Up + +4. Syntax and Semantics + +* 4-1:: What's with All the Parentheses? +* 4-2:: Breaking Open the Black Box +* 4-3:: S-expressions +* 4-4:: S-expressions As Lisp Forms +* 4-5:: Function Calls +* 4-6:: Special Operators +* 4-7:: Macros +* 4-8:: Truth, Falsehood, and Equality +* 4-9:: Formatting Lisp Code + +5. Functions + +* 5-1:: Defining New Functions +* 5-2:: Function Parameter Lists +* 5-3:: Optional Parameters +* 5-4:: Rest Parameters +* 5-5:: Keyword Parameters +* 5-6:: Mixing Different Parameter Types +* 5-7:: Function Return Values +* 5-8:: Functions As Data, a.k.a. Higher-Order Functions +* 5-9:: Anonymous Functions + +6. Variables + +* 6-1:: Variable Basics +* 6-2:: Lexical Variables and Closures +* 6-3:: Dynamic, a.k.a. Special, Variables +* 6-4:: Constants +* 6-5:: Assignment +* 6-6:: Generalized Assignment +* 6-7:: Other Ways to Modify Places + +7. Macros: Standard Control Constructs + +* 7-1:: WHEN and UNLESS +* 7-2:: COND +* 7-3:: AND, OR, and NOT +* 7-4:: Looping +* 7-5:: DOLIST and DOTIMES +* 7-6:: DO +* 7-7:: The Mighty LOOP + +8. Macros: Defining Your Own + +* 8-1:: The Story of Mac: A Just-So Story +* 8-2:: Macro Expansion Time vs. Runtime +* 8-3:: DEF MACRO +* 8-4:: A Sample Macro: do-primes +* 8-5:: Macro Parameters +* 8-6:: Generating the Expansion +* 8-7:: Plugging the Leaks +* 8-8:: Macro-Writing Macros +* 8-8-1:: Another classic macro-writing MACRO: ONCE-ONLY +* 8-9:: Beyond Simple Macros + +9. Practical: Building a Unit Test Framework + +* 9-1:: Two First Tries +* 9-2:: Refactoring +* 9-3:: Fixing the Return Value +* 9-4:: Better Result Reporting +* 9-5:: An Abstraction Emerges +* 9-6:: A Hierarchy of Tests +* 9-7:: Wrapping Up + +10. Numbers, Characters, and Strings + +* 10-1:: Numbers +* 10-2:: Numeric Literals +* 10-3:: Basic Math +* 10-4:: Numeric Comparisons +* 10-5:: Higher Math +* 10-6:: Characters +* 10-7:: Character Comparisons +* 10-8:: Strings +* 10-9:: String Comparisons + +11. Collections + +* 11-1:: Vectors +* 11-2:: Subtypes of Vector +* 11-3:: Vectors As Sequences +* 11-4:: Sequence Iterating Functions +* 11-5:: Higher-Order Function Variants +* 11-6:: Whole Sequence Manipulations +* 11-7:: Sorting and Merging +* 11-8:: Subsequence Manipulations +* 11-9:: Sequence Predicates +* 11-10:: Sequence Mapping Functions +* 11-11:: Hash Tables +* 11-12:: Hash Table Iteration + +12. They Called It LISP for a Reason: List Processing + +* 12-1:: "There Is No List" +* 12-2:: Functional Programming and Lists +* 12-3:: "Destructive" Operations +* 12-4:: Combining Recycling with Shared Structure +* 12-5:: List-Manipulation Functions +* 12-6:: Mapping +* 12-7:: Other Structures + +13. Beyond Lists: Other Uses for Cons Cells + +* 13-1:: Trees +* 13-2:: Sets +* 13-3:: Lookup Tables: Alists and Plists +* 13-4:: DESTRUCTURING-BIND + +14. Files and File I/O + +* 14-1:: Reading File Data +* 14-2:: Reading Binary Data +* 14-3:: Bulk Reads +* 14-4:: File Output +* 14-5:: Closing Files +* 14-6:: Filenames +* 14-6-1:: How We Got Here +* 14-7:: How Pathnames Represent Filenames +* 14-8:: Constructing New Pathnames +* 14-9:: Two Representations of Directory Names +* 14-10:: Interacting with the File System +* 14-11:: Other Kinds of I/O + +15. Practical: A Portable Pathname Library + +* 15-1:: The API +* 15-2:: *FEATURES* and Read-Time Conditionalization +* 15-2-1:: Packaging the Library +* 15-3:: Listing a Directory +* 15-4:: Testing a File's Existence +* 15-5:: Walking a Directory Tree + +16. Object Reorientation: Generic Functions + +* 16-1:: Generic Functions and Classes +* 16-2:: Generic Functions and Methods +* 16-3:: DEFGENERIC +* 16-4:: DEFMETHOD +* 16-5:: Method Combination +* 16-6:: The Standard Method Combination +* 16-7:: Other Method Combinations +* 16-8:: Multimethods +* 16-8-1:: Multimethods vs. Method Overloading +* 16-9:: To Be Continued . . . + +17. Object Reorientation: Classes + +* 17-1:: DEFCLASS +* 17-1-1:: What Are "User-Defined Classes"? +* 17-2:: Slot Specifiers +* 17-3:: Object Initialization +* 17-4:: Accessor Functions +* 17-5:: WITH-SLOTS and WITH-ACCESSORS +* 17-6:: Class-Allocated Slots +* 17-7:: Slots and Inheritance +* 17-8:: Multiple Inheritance +* 17-9:: Good Object-Oriented Design + +18. A Few FORMAT Recipes + +* 18-1:: The FORMAT Function +* 18-2:: FORMAT Directives +* 18-3:: Basic Formatting +* 18-4:: Character and Integer Directives +* 18-5:: Floating-Point Directives +* 18-6:: English-Language Directives +* 18-7:: Conditional Formatting +* 18-8:: Iteration +* 18-9:: Hop, Skip, Jump +* 18-10:: And More . . . + +19. Beyond Exception Handling: Conditions and Restarts + +* 19-1:: The Lisp Way +* 19-2:: Conditions +* 19-3:: Condition Handlers +* 19-3-1:: JAVA-STYLE EXCEPTON HANDLING +* 19-4:: Restarts +* 19-5:: Providing Multiple Restarts +* 19-6:: Other Uses for Conditions +* 19-6-1:: Writing Robust Software + +20. The Special Operators + +* 20-1:: Controlling Evaluation +* 20-2:: Manipulating the Lexical Environment +* 20-3:: Local Flow of Control +* 20-4:: Unwinding the Stack +* 20-5:: Multiple Values +* 20-6:: EVAL-WHEN +* 20-7:: Other Special Operators + +21. Programming in the Large: Packages and Symbols + +* 21-1:: How the Reader Uses Packages +* 21-2:: A Bit of Package and Symbol Vocabulary +* 21-3:: Three Standard Packages +* 21-4:: Defining Your Own Packages +* 21-5:: Packaging Reusable Libraries +* 21-6:: Importing Individual Names +* 21-7:: Packaging Mechanics +* 21-8:: Package Gotchas + +22. LOOP for Black Belts + +* 22-1:: The Parts of a LOOP +* 22-2:: Iteration Control +* 22-3:: Counting Loops +* 22-4:: Looping Over Collections and Packages +* 22-5:: Equals-Then Iteration +* 22-6:: Local Variables +* 22-7:: Destructuring Variables +* 22-8:: Value Accumulation +* 22-9:: Unconditional Execution +* 22-10:: Conditional Execution +* 22-11:: Setting Up and Tearing Down +* 22-12:: Termination Tests +* 22-13:: Putting It All Together + +23. Practical: A Spam Filter + +* 23-1:: The Heart of a Spam Filter +* 23-2:: Training the Filter +* 23-3:: Per-Word Statistics +* 23-4:: Combining Probabilities +* 23-5:: Inverse Chi Square +* 23-6:: Training the Filter +* 23-7:: Testing the Filter +* 23-8:: A Couple of Utility Functions +* 23-9:: Analyzing the Results +* 23-10:: What's Next + +24. Practical: Parsing Binary Files + +* 24-1:: Binary Files +* 24-2:: Binary Format Basics +* 24-3:: Strings in Binary Files +* 24-4:: Composite Structures +* 24-5:: Designing the Macros +* 24-6:: Making the Dream a Reality +* 24-7:: Reading Binary Objects +* 24-8:: Writing Binary Objects +* 24-9:: Adding Inheritance and Tagged Structures +* 24-10:: Keeping Track of Inherited Slots +* 24-11:: Tagged Structures +* 24-12:: Primitive Binary Types +* 24-13:: The Current Object Stack + +25. Practical: An ID3 Parser + +* 25-1:: Structure of an ID3v2 Tag +* 25-2:: Defining a Package +* 25-3:: Integer Types +* 25-4:: String Types +* 25-5:: ID3 Tag Header +* 25-6:: ID3 Frames +* 25-7:: Detecting Tag Padding +* 25-8:: Supporting Multiple Versions of ID3 +* 25-9:: Versioned Frame Base Classes +* 25-10:: Versioned Concrete Frame Classes +* 25-11:: What Frames Do You Actually Need? +* 25-12:: Text Information Frames +* 25-13:: Comment Frames +* 25-14:: Extracting Information from an ID3 Tag + +26. Practical: Web Programming with AllegroServe + +* 26-1:: A 30-Second Intro to Server-Side Web Programming +* 26-2:: AllegroServe +* 26-3:: Generating Dynamic Content with AllegroServe +* 26-4:: Generating HTML +* 26-5:: HTML Macros +* 26-6:: Query Parameters +* 26-7:: Cookies +* 26-8:: A Small Application Framework +* 26-9:: The Implementation + +27. Practical: An MP3 Database + +* 27-1:: The Database +* 27-1-1:: The Package +* 27-2:: Defining a Schema +* 27-3:: Inserting Values +* 27-4:: Querying the Database +* 27-5:: Matching Functions +* 27-6:: Getting at the Results +* 27-7:: Other Database Operations + +28. Practical: A Shoutcast Server + +* 28-1:: The Shoutcast Protocol +* 28-2:: Song Sources +* 28-2-1:: The Package +* 28-3:: Implementing Shoutcast + +29. Practical: An MP3 Browser + +* 29-1:: Playlists +* 29-1-1:: The Package +* 29-2:: Playlists As Song Sources +* 29-3:: Manipulating the Playlist +* 29-4:: Query Parameter Types +* 29-5:: Boilerplate HTML +* 29-6:: The Browse Page +* 29-7:: The Playlist +* 29-8:: Finding a Playlist +* 29-9:: Running the App + +30. Practical: An HTML Generation Library, the Interpreter + +* 30-1:: Designing a Domain-Specific Language +* 30-2:: The FOO Language +* 30-3:: Character Escaping +* 30-3-1:: The Package +* 30-4:: Indenting Printer +* 30-5:: HTML Processor Interface +* 30-6:: The Pretty Printer Backend +* 30-6-1:: Using Conditions to Have Your Cake and Eat It Too +* 30-7:: The Basic Evaluation Rule +* 30-8:: What's Next? + +31. Practical: An HTML Generation Library, the Compiler + +* 31-1:: The Compiler +* 31-2:: FOO Special Operators +* 31-3:: FOO Macros +* 31-4:: The Public API +* 31-5:: The End of the Line + +32. Conclusion: What's Next? + +* 32-1:: Finding Lisp Libraries +* 32-2:: Interfacing with Other Languages +* 32-3:: Make It Work, Make It Right, Make It Fast +* 32-4:: Delivering Applications +* 32-5:: Where to Go Next + + + + +File: pcl.info, Node: Letter, Next: Blurbs, Prev: Top, Up: Top + +Letter to the Reader +******************** + +Dear Reader, + +‘Practical Common Lisp’ ... isn't that an oxymoron? If you're like +most programmers, you probably know something about Lisp -— from a +comp sci course in college or from learning enough Elisp to customize +Emacs a bit. Or maybe you just know someone who won't shut up about +Lisp, the greatest language ever. But you probably never figured you'd +see ‘practical’ and ‘Lisp’ in the same book title. + +Yet, you're reading this; you must want to know more. Maybe you believe +learning Lisp will make you a better programmer in any language. Or +maybe you just want to know what those Lisp fanatics are yammering about +all the time. Or maybe you have learned some Lisp but haven't quite +made the leap to using it to write interesting software. + +If any of those is true, this book is for you. Using Common Lisp, an +ANSI standardized, industrial-strength dialect of Lisp, I show you how +to write software that goes well beyond silly academic exercises or +trivial editor customizations. And I show you how Lisp -— even with +many of its features adopted by other languages—still has a few tricks +up its sleeve. + +But unlike many Lisp books, this one doesn't just touch on a few of +Lisp's greatest features and then leave you on your own to actually use +them. I cover all the language features you'll need to write real +programs and devote well over a third of the book to developing +nontrivial software —- a statistical spam filter, a library for +parsing binary files, and a server for streaming MP3s over a network +complete with an online MP3 database and Web interface. + +So turn the book over, open it up, and see for yourself how eminently +practical using the greatest language ever invented can be. + +Sincerely, + +Peter Seibel + + +File: pcl.info, Node: Blurbs, Next: Chapter 1, Prev: Letter, Up: Top + + “I have been complimented many times and they always embarrass + me; I always feel that they have not said enough.” —Mark Twain + +Blurbs +****** + + “that book is dead sexy” — ‘Xach on #lisp’ + + ------------------------------------------------------------- + + “Peter Seibel offers a fresh view of Lisp and its possibilities + for elegantly solving problems. In Practical Common Lisp, he gives + enough basic information to let you quickly see the power of the + functional language paradigm. He then dazzles you with examples + that seem almost magical in their simplicity and power. This read + is pure fun from start to finish.” - ‘Gary Pollice’, from Dr. + Dobb's Portal May-17-2006 article on the 2006 Jolt Awards + (http://www.drdobbs.com/joltawards/2006-jolt-awards/187900423?pgno=4) + + ------------------------------------------------------------- + + “Peter Seibel's ‘Practical Common Lisp’ is just what the + title implies: an excellent introduction to Common Lisp for someone + who wants to dive in and start using the language for real work. + The book is very well written and is fun to read—at least for + those of us whose idea of fun extends to learning new programming + languages. + + Rather than spending a lot of time on abstract discussion of Lisp's + place in the universe of programming lnaguages, Seibel dives right + in, guiding the reader through a series of programming examples of + increasing complexity. This approach places the most emphasis on + those parts of Common Lisp that skilled programmers use the most, + without getting bogged down in the odd corners of Common Lisp that + even the experts must look up in the manual. The result of + Seibel's example-driven approach is to give the reader an excellent + appreciation of the power of Common Lisp in building complex, + evolving software systems with a minimum of effort. + + There are already many good books on Common Lisp that offer a more + abstract and comparative approach, but a good 'Here's how you do + it—and why' book, aimed at the working programmer, is a valuable + contribution, both to current Common Lisp users and those who + should be.” — ‘Scott E. Fahlman, Research Professor of + Computer Science, Carnegie Mellon Universit’ + + ------------------------------------------------------------- + + “This book shows the power of Lisp not only in the areas that it + has traditionally been noted for—such as developing a complete + unit test framework in only 26 lines of code—but also in new + areas such as parsing binary MP3 files, building a web application + for browsing a collection of songs, and streaming audio over the + web. Many readers will be surprised that Lisp allows you to do all + this with conciseness similar to scripting languages such as + Python, efficiency similar to C++, and unparalleled flexibility in + designing your own language extensions.” - ‘Peter Norvig, + Director of Search Quality, Google Inc; author of Paradigms of + Artificial Intelligence Programming: Case Studies in Common Lisp’ + + ------------------------------------------------------------- + + “I wish this book had already existed when I started learning + Lisp. It's not that there aren't other good books about (Common) + Lisp out there, but none of them has such a pragmatic, up-to-date + approach. And let's not forget that Peter covers topics like + pathnames or conditions and restarts which are completely ignored + in the rest of the Lisp literature. + + If you're new to Lisp and want to dive right in don't hesitate to + buy this book. Once you've read it and worked with it you can + continue with the ‘classics’ like Graham, Norvig, Keene, or + Steele.” — ‘Edi Weitz, maintainer of the Common Lisp Cookbook + and author of CL-PPCRE regular expression library.’ + + ------------------------------------------------------------- + + “Two prehensile toes up!” - ‘Kenny Tilton, comp.lang.lisp + demon, reporting on behalf of his development team.’ + + ------------------------------------------------------------- + + “Experienced programmers learn best from examples and it is + delightful to see that Lisp is finally being served with Seibel's + example-rich tutorial text. Especially delightful is the fact that + this book includes so many examples that fall within the realm of + problems today's programmers might be called upon to tackle, such + as Web development and streaming media. - ‘Philip Greenspun, + author of Software Engineering for Internet Applications, MIT + Department of Electrical Engineering and Computer Science’ + + ------------------------------------------------------------- + + “‘Practical Common Lisp’ is an excellent book that covers the + breadth of the Common Lisp language and also demonstrates the + unique features of Common Lisp with real-world applications that + the reader can run and extend. This book not only shows what + Common Lisp is but also why every programmer should be familiar + with Lisp.” - ‘John Foderaro, Senior Scientist, Franz Inc.’ + + ------------------------------------------------------------- + + “The Maxima Project frequently gets queries from potential new + contributors who would like to learn Common Lisp. I am pleased to + finally have a book that I can recommend to them without + reservation. Peter Seibel's clear, direct style allows the reader + to quickly appreciate the power of Common Lisp. His many included + examples, which focus on contemporary programming problems, + demonstrate that Lisp is much more than an academic programming + language. ‘Practical Common Lisp’ is a welcome addition to the + literature.” — ‘James Amundson, Maxima Project Leader’ + + ------------------------------------------------------------- + + “I like the interspersed Practical chapters on 'real' and useful + programs. We need books of this kind telling the world that + crunching strings and numbers into trees or graphs is easily done + in Lisp. — ‘Professor Christian Queinnec, Universite Paris 6 + (Pierre et Marie Curie)’ + + ------------------------------------------------------------- + + “One of the most important parts of learning a programming + language is learning its proper programming style. This is hard to + teach, but it can be painlessly absorbed from ‘Practical Common + Lisp’. Just reading the practical examples made me a better + programmer in any language.” - ‘Peter Scott, Lisp programmer’ + + ------------------------------------------------------------- + + “Finally, a Lisp book for the rest of us. If you want to learn + how to write a factorial function, this is not your book. Seibel + writes for the practical programmer, emphasizing the + engineer/artist over the scientist, subtly and gracefully implying + the power of the language while solving understandable real-world + problems. + + In most chapters, the reading of the chapter feels just like the + experience of writing a program, starting with a little + understanding, then having that understanding grow, like building + the shoulders upon which you can then stand. When Seibel + introduced macros as an aside while building a test framework, I + was shocked at how such a simple example made me really 'get' them. + Narrative context is extremely powerful and the technical books + that use it are a cut above. Congrats!” — ‘Keith Irwin, Lisp + Programmer’ + + ------------------------------------------------------------- + + “While learning Lisp, one is often refered to the CL HyperSpec if + they do not know what a particular function does, however, I found + that I often did not 'get it' just reading the HyperSpec. When I + had a problem of this manner, I turned to ‘Practical Common + Lisp’ every single time—it is by far the most readable source + on the subject that shows you how to program, not just tell you.” + — ‘Philip Haddad, Lisp Programmer’ + + ------------------------------------------------------------- + + “With the IT world evolving at an ever increasing pace, + professionals need the most powerful tools available. This is why + Common Lisp—the most powerful, flexible, and stable programming + language ever—is seeing such a rise in popularity. ‘Practical + Common Lisp’ is the long-awaited book that will help you harness + the power of Common Lisp to tackle today's complex real world + problems.” — ‘Marc Battyani, author of CL-PDF, + CL-TYPESETTING, and mod_lisp.’ + + ------------------------------------------------------------- + + “Please don't assume Common Lisp is only useful for Databases, + Unit Test Frameworks, Spam Filters, ID3 Parsers, Web Programming, + Shoutcast Servers, HTML Generation Interpreters, and HTML + Generation Compilers just because these are the only things + happened to be implemented in the book ‘Practical Common Lisp’. + — ‘Tobias C. Rittweiler, Lisp Programmer’ + + ------------------------------------------------------------- + + “When I met Peter, who just started writing this book, I asked to + myself (not to him, of course) ‘why yet another book on Common + Lisp, when there are many nice introductory books?’ One year + later, I found a draft of the new book and recognized I was wrong. + This book is not ‘yet another’ one. The author focuses on + practical aspects rather than on technical details of the language. + When I first studied Lisp by reading an introductory book, I felt I + understood the language, but I also had an impression ‘so + what?’, meaning I had no idea about how to use it. In contrast, + this book leaps into a ‘PRACTICAL’ chapter after the first few + chapters that explain the very basic notions of the language. Then + the readers are expected to learn more about the language while + they are following the PRACTICAL projects, which are combined + together to form a product of a significant size. After reading + this book, the readers will feel themselves expert programmers on + Common Lisp since they have ‘finished’ a big project already. + I think Lisp is the only language that allows this type of + practical introduction. Peter makes use of this feature of the + language in building up a fancy introduction on Common Lisp.” — + ‘Taiichi Yuasa, Professor, Department of Communications and + Computer Engineering, Kyoto University’ + + ------------------------------------------------------------- + + Have something to say about this book? Something nice? Want to + see it here? Send it along to <book@gigamonkeys.com>. + + +File: pcl.info, Node: Chapter 1, Next: Chapter 2, Prev: Blurbs, Up: Top + +1. Introduction: Why Lisp? +========================== + +If you think the greatest pleasure in programming comes from getting a +lot done with code that simply and clearly expresses your intention, +then programming in Common Lisp is likely to be about the most fun you +can have with a computer. You'll get more done, faster, using it than +you would using pretty much any other language. + +That's a bold claim. Can I justify it? Not in a just a few pages in +this chapter-you're going to have to learn some Lisp and see for +yourself-thus the rest of this book. For now, let me start with some +anecdotal evidence, the story of my own road to Lisp. Then, in the next +section, I'll explain the payoff I think you'll get from learning Common +Lisp. + +I'm one of what must be a fairly small number of second-generation Lisp +hackers. My father got his start in computers writing an operating +system in assembly for the machine he used to gather data for his +doctoral dissertation in physics. After running computer systems at +various physics labs, by the 1980s he had left physics altogether and +was working at a large pharmaceutical company. That company had a +project under way to develop software to model production processes in +its chemical plants-if you increase the size of this vessel, how does it +affect annual production? The original team, writing in FORTRAN, had +burned through half the money and almost all the time allotted to the +project with nothing to show for their efforts. This being the 1980s +and the middle of the artificial intelligence (AI) boom, Lisp was in the +air. So my dad-at that point not a Lisper-went to Carnegie Mellon +University (CMU) to talk to some of the folks working on what was to +become Common Lisp about whether Lisp might be a good language for this +project. + +The CMU folks showed him some demos of stuff they were working on, and +he was convinced. He in turn convinced his bosses to let his team take +over the failing project and do it in Lisp. A year later, and using +only what was left of the original budget, his team delivered a working +application with features that the original team had given up any hope +of delivering. My dad credits his team's success to their decision to +use Lisp. + +Now, that's just one anecdote. And maybe my dad is wrong about why they +succeeded. Or maybe Lisp was better only in comparison to other +languages of the day. These days we have lots of fancy new languages, +many of which have incorporated features from Lisp. Am I really saying +Lisp can offer you the same benefits today as it offered my dad in the +1980s? Read on. + +Despite my father's best efforts, I didn't learn any Lisp in high +school. After a college career that didn't involve much programming in +any language, I was seduced by the Web and back into computers. I +worked first in Perl, learning enough to be dangerous while building an +online discussion forum for ‘Mother Jones’ magazine's Web site and +then moving to a Web shop, Organic Online, where I worked on big-for the +time-Web sites such as the one Nike put up during the 1996 Olympics. +Later I moved onto Java as an early developer at WebLogic, now part of +BEA. After WebLogic, I joined another startup where I was the lead +programmer on a team building a transactional messaging system in Java. +Along the way, my general interest in programming languages led me to +explore such mainstream languages as C, C++, and Python, as well as less +well-known ones such as Smalltalk, Eiffel, and Beta. + +So I knew two languages inside and out and was familiar with another +half dozen. Eventually, however, I realized my interest in programming +languages was really rooted in the idea planted by my father's tales of +Lisp-that different languages really are different, and that, despite +the formal Turing equivalence of all programming languages, you really +can get more done more quickly in some languages than others and have +more fun doing it. Yet, ironically, I had never spent that much time +with Lisp itself. So, I started doing some Lisp hacking in my free +time. And whenever I did, it was exhilarating how quickly I was able to +go from idea to working code. + +For example, one vacation, having a week or so to hack Lisp, I decided +to try writing a version of a program-a system for breeding genetic +algorithms to play the game of Go-that I had written early in my career +as a Java programmer. Even handicapped by my then rudimentary knowledge +of Common Lisp and having to look up even basic functions, it still felt +more productive than it would have been to rewrite the same program in +Java, even with several extra years of Java experience acquired since +writing the first version. + +A similar experiment led to the library I'll discuss in Chapter 24. +Early in my time at WebLogic I had written a library, in Java, for +taking apart Java class files. It worked, but the code was a bit of a +mess and hard to modify or extend. I had tried several times, over the +years, to rewrite that library, thinking that with my ever-improving +Java chops I'd find some way to do it that didn't bog down in piles of +duplicated code. I never found a way. But when I tried to do it in +Common Lisp, it took me only two days, and I ended up not only with a +Java class file parser but with a general-purpose library for taking +apart any kind of binary file. You'll see how that library works in +Chapter 24 and use it in Chapter 25 to write a parser for the ID3 tags +embedded in MP3 files. + +* Menu: + +* 1-1:: Why Lisp? +* 1-2:: Where It Began +* 1-3:: Who This Book Is For + + +File: pcl.info, Node: 1-1, Next: 1-2, Prev: Chapter 1, Up: Chapter 1 + +Why Lisp? +========= + +It's hard, in only a few pages of an introductory chapter, to explain +why users of a language like it, and it's even harder to make the case +for why you should invest your time in learning a certain language. +Personal history only gets us so far. Perhaps I like Lisp because of +some quirk in the way my brain is wired. It could even be genetic, +since my dad has it too. So before you dive into learning Lisp, it's +reasonable to want to know what the payoff is going to be. + +For some languages, the payoff is relatively obvious. For instance, if +you want to write low-level code on Unix, you should learn C. Or if you +want to write certain kinds of cross-platform applications, you should +learn Java. And any of a number companies still use a lot of C++, so if +you want to get a job at one of them, you should learn C++. + +For most languages, however, the payoff isn't so easily categorized; it +has to do with subjective criteria such as how it feels to use the +language. Perl advocates like to say that Perl "makes easy things easy +and hard things possible" and revel in the fact that, as the Perl motto +has it, "There's more than one way to do it." (1) Python's fans, on the +other hand, think Python is clean and simple and think Python code is +easier to understand because, as ‘their’ motto says, "There's only +one way to do it." + +So, why Common Lisp? There's no immediately obvious payoff for adopting +Common Lisp the way there is for C, Java, and C++ (unless, of course, +you happen to own a Lisp Machine). The benefits of using Lisp have much +more to do with the experience of using it. I'll spend the rest of this +book showing you the specific features of Common Lisp and how to use +them so you can see for yourself what it's like. For now I'll try to +give you a sense of Lisp's philosophy. + +The nearest thing Common Lisp has to a motto is the koan-like +description, "the programmable programming language." While cryptic, +that description gets at the root of the biggest advantage Common Lisp +still has over other languages. More than any other language, Common +Lisp follows the philosophy that what's good for the language's designer +is good for the language's users. Thus, when you're programming in +Common Lisp, you almost never find yourself wishing the language +supported some feature that would make your program easier to write, +because, as you'll see throughout this book, you can just add the +feature yourself. + +Consequently, a Common Lisp program tends to provide a much clearer +mapping between your ideas about how the program works and the code you +actually write. Your ideas aren't obscured by boilerplate code and +endlessly repeated idioms. This makes your code easier to maintain +because you don't have to wade through reams of code every time you need +to make a change. Even systemic changes to a program's behavior can +often be achieved with relatively small changes to the actual code. +This also means you'll develop code more quickly; there's less code to +write, and you don't waste time thrashing around trying to find a clean +way to express yourself within the limitations of the language. (2) + +Common Lisp is also an excellent language for exploratory programming-if +you don't know exactly how your program is going to work when you first +sit down to write it, Common Lisp provides several features to help you +develop your code incrementally and interactively. + +For starters, the interactive read-eval-print loop, which I'll introduce +in the next chapter, lets you continually interact with your program as +you develop it. Write a new function. Test it. Change it. Try a +different approach. You never have to stop for a lengthy compilation +cycle. (3) + +Other features that support a flowing, interactive programming style are +Lisp's dynamic typing and the Common Lisp condition system. Because of +the former, you spend less time convincing the compiler you should be +allowed to run your code and more time actually running it and working +on it, (4) and the latter lets you develop even your error handling code +interactively. + +Another consequence of being "a programmable programming language" is +that Common Lisp, in addition to incorporating small changes that make +particular programs easier to write, can easily adopt big new ideas +about how programming languages should work. For instance, the original +implementation of the Common Lisp Object System (CLOS), Common Lisp's +powerful object system, was as a library written in portable Common +Lisp. This allowed Lisp programmers to gain actual experience with the +facilities it provided before it was officially incorporated into the +language. + +Whatever new paradigm comes down the pike next, it's extremely likely +that Common Lisp will be able to absorb it without requiring any changes +to the core language. For example, a Lisper has recently written a +library, AspectL, that adds support for aspect-oriented programming +(AOP) to Common Lisp. (5) If AOP turns out to be the next big thing, +Common Lisp will be able to support it without any changes to the base +language and without extra preprocessors and extra compilers. (6) + + ---------- Footnotes ---------- + + (1) Perl is also worth learning as "the duct tape of the Internet." + + (2) Unfortunately, there's little actual research on the productivity +of different languages. One report that shows Lisp coming out well +compared to C++ and Java in the combination of programmer and program +efficiency is discussed at http://www.norvig.com/java-lisp.html. + + (3) Psychologists have identified a state of mind called flow in +which we're capable of incredible concentration and productivity. The +importance of flow to programming has been recognized for nearly two +decades since it was discussed in the classic book about human factors +in programming ‘Peopleware: Productive Projects and Teams’ by Tom +DeMarco and Timothy Lister (Dorset House, 1987). The two key facts +about flow are that it takes around 15 minutes to get into a state of +flow and that even brief interruptions can break you right out of it, +requiring another 15-minute immersion to reenter. DeMarco and Lister, +like most subsequent authors, concerned themselves mostly with +flow-destroying interruptions such as ringing telephones and inopportune +visits from the boss. Less frequently considered but probably just as +important to programmers are the interruptions caused by our tools. +Languages that require, for instance, a lengthy compilation before you +can try your latest code can be just as inimical to flow as a noisy +phone or a nosy boss. So, one way to look at Lisp is as a language +designed to keep you in a state of flow. + + (4) This point is bound to be somewhat controversial, at least with +some folks. Static versus dynamic typing is one of the classic +religious wars in programming. If you're coming from C++ and Java (or +from statically typed functional languages such as Haskel and ML) and +refuse to consider living without static type checks, you might as well +put this book down now. However, before you do, you might first want to +check out what self-described "statically typed bigot" Robert Martin +(author of ‘Designing Object Oriented C++ Applications Using the Booch +Method’ [Prentice Hall, 1995]) and C++ and Java author Bruce Eckel +(author of ‘Thinking in C++’ [Prentice Hall, 1995] and ‘Thinking +in Java’ [Prentice Hall, 1998]) have had to say about dynamic typing +on their weblogs (http://www.artima.com/weblogs/viewpost.jsp?thread=4639 +and http://www.mindview.net/WebLog/log-0025). On the other hand, folks +coming from Smalltalk, Python, Perl, or Ruby should feel right at home +with this aspect of Common Lisp. + + (5) AspectL is an interesting project insofar as AspectJ, its +Java-based predecessor, was written by Gregor Kiczales, one of the +designers of Common Lisp's object and metaobject systems. To many +Lispers, AspectJ seems like Kiczales's attempt to backport his ideas +from Common Lisp into Java. However, Pascal Costanza, the author of +AspectL, thinks there are interesting ideas in AOP that could be useful +in Common Lisp. Of course, the reason he's able to implement AspectL as +a library is because of the incredible flexibility of the Common Lisp +Meta Object Protocol Kiczales designed. To implement AspectJ, Kiczales +had to write what was essentially a separate compiler that compiles a +new language into Java source code. The AspectL project page is at +http://common-lisp.net/ project/aspectl/. + + (6) Or to look at it another, more technically accurate, way, Common +Lisp comes with a built-in facility for integrating compilers for +embedded languages. + + +File: pcl.info, Node: 1-2, Next: 1-3, Prev: 1-1, Up: Chapter 1 + +Where It Began +============== + +Common Lisp is the modern descendant of the Lisp language first +conceived by John McCarthy in 1956. Lisp circa 1956 was designed for +"symbolic data processing" (1) and derived its name from one of the +things it was quite good at: LISt Processing. We've come a long way +since then: Common Lisp sports as fine an array of modern data types as +you can ask for: a condition system that, as you'll see in Chapter 19, +provides a whole level of flexibility missing from the exception systems +of languages such as Java, Python, and C++; powerful facilities for +doing object-oriented programming; and several language facilities that +just don't exist in other programming languages. How is this possible? +What on Earth would provoke the evolution of such a well-equipped +language? + +Well, McCarthy was (and still is) an artificial intelligence (AI) +researcher, and many of the features he built into his initial version +of the language made it an excellent language for AI programming. +During the AI boom of the 1980s, Lisp remained a favorite tool for +programmers writing software to solve hard problems such as automated +theorem proving, planning and scheduling, and computer vision. These +were problems that required a lot of hard-to-write software; to make a +dent in them, AI programmers needed a powerful language, and they grew +Lisp into the language they needed. And the Cold War helped-as the +Pentagon poured money into the Defense Advanced Research Projects Agency +(DARPA), a lot of it went to folks working on problems such as +large-scale battlefield simulations, automated planning, and natural +language interfaces. These folks also used Lisp and continued pushing +it to do what they needed. + +The same forces that drove Lisp's feature evolution also pushed the +envelope along other dimensions-big AI problems eat up a lot of +computing resources however you code them, and if you run Moore's law in +reverse for 20 years, you can imagine how scarce computing resources +were on circa-80s hardware. The Lisp guys had to find all kinds of ways +to squeeze performance out of their implementations. Modern Common Lisp +implementations are the heirs to those early efforts and often include +quite sophisticated, native machine code-generating compilers. While +today, thanks to Moore's law, it's possible to get usable performance +from a purely interpreted language, that's no longer an issue for Common +Lisp. As I'll show in Chapter 32, with proper (optional) declarations, +a good Lisp compiler can generate machine code quite similar to what +might be generated by a C compiler. + +The 1980s were also the era of the Lisp Machines, with several +companies, most famously Symbolics, producing computers that ran Lisp +natively from the chips up. Thus, Lisp became a systems programming +language, used for writing the operating system, editors, compilers, and +pretty much everything else that ran on the Lisp Machines. + +In fact, by the early 1980s, with various AI labs and the Lisp machine +vendors all providing their own Lisp implementations, there was such a +proliferation of Lisp systems and dialects that the folks at DARPA began +to express concern about the Lisp community splintering. To address +this concern, a grassroots group of Lisp hackers got together in 1981 +and began the process of standardizing a new language called Common Lisp +that combined the best features from the existing Lisp dialects. Their +work was documented in the book ‘Common Lisp the Language’ by Guy +Steele (Digital Press, 1984)-CLtL to the Lisp-cognoscenti. + +By 1986 the first Common Lisp implementations were available, and the +writing was on the wall for the dialects it was intended to replace. In +1996, the American National Standards Institute (ANSI) released a +standard for Common Lisp that built on and extended the language +specified in CLtL, adding some major new features such as the CLOS and +the condition system. And even that wasn't the last word: like CLtL +before it, the ANSI standard intentionally leaves room for implementers +to experiment with the best way to do things: a full Lisp implementation +provides a rich runtime environment with access to GUI widgets, multiple +threads of control, TCP/IP sockets, and more. These days Common Lisp is +evolving much like other open-source languages-the folks who use it +write the libraries they need and often make them available to others. +In the last few years, in particular, there has been a spurt of activity +in open-source Lisp libraries. + +So, on one hand, Lisp is one of computer science's "classical" +languages, based on ideas that have stood the test of time. (2) On the +other, it's a thoroughly modern, general-purpose language whose design +reflects a deeply pragmatic approach to solving real problems as +efficiently and robustly as possible. The only downside of Lisp's +"classical" heritage is that lots of folks are still walking around with +ideas about Lisp based on some particular flavor of Lisp they were +exposed to at some particular time in the nearly half century since +McCarthy invented Lisp. If someone tells you Lisp is only interpreted, +that it's slow, or that you have to use recursion for everything, ask +them what dialect of Lisp they're talking about and whether people were +wearing bell-bottoms when they learned it. (3) + +* Menu: + +* 1-2-1:: But I learned Lisp Once, And IT Wasn't Like what you're describing + + ---------- Footnotes ---------- + + (1) ‘Lisp 1.5 Programmer's Manual’ (M.I.T. Press, 1962) + + (2) Ideas first introduced in Lisp include the if/then/else +construct, recursive function calls, dynamic memory allocation, garbage +collection, first-class functions, lexical closures, interactive +programming, incremental compilation, and dynamic typing. + + (3) One of the most commonly repeated myths about Lisp is that it's +"dead." While it's true that Common Lisp isn't as widely used as, say, +Visual Basic or Java, it seems strange to describe a language that +continues to be used for new development and that continues to attract +new users as "dead." Some recent Lisp success stories include Paul +Graham's Viaweb, which became Yahoo Store when Yahoo bought his company; +ITA Software's airfare pricing and shopping system, QPX, used by the +online ticket seller Orbitz and others; Naughty Dog's game for the +PlayStation 2, Jak and Daxter, which is largely written in a +domain-specific Lisp dialect Naughty Dog invented called GOAL, whose +compiler is itself written in Common Lisp; and the Roomba, the +autonomous robotic vacuum cleaner, whose software is written in L, a +downwardly compatible subset of Common Lisp. Perhaps even more telling +is the growth of the Common-Lisp.net Web site, which hosts open-source +Common Lisp projects, and the number of local Lisp user groups that have +sprung up in the past couple of years. + + +File: pcl.info, Node: 1-2-1, Next: 1-3, Prev: 1-2, Up: 1-2 + +But I learned Lisp Once, And IT Wasn't Like what you're describing +================================================================== + +If you've used Lisp in the past, you may have ideas about what "Lisp" is +that have little to do with Common Lisp. While Common Lisp supplanted +most of the dialects it's descended from, it isn't the only remaining +Lisp dialect, and depending on where and when you were exposed to Lisp, +you may very well have learned one of these other dialects. + +Other than Common Lisp, the one general-purpose Lisp dialect that still +has an active user community is Scheme. Common Lisp borrowed a few +important features from Scheme but never intended to replace it. + +Originally designed at M.I.T., where it was quickly put to use as a +teaching language for undergraduate computer science courses, Scheme has +always been aimed at a different language niche than Common Lisp. In +particular, Scheme's designers have focused on keeping the core language +as small and as simple as possible. This has obvious benefits for a +teaching language and also for programming language researchers who like +to be able to formally prove things about languages. + +It also has the benefit of making it relatively easy to understand the +whole language as specified in the standard. But, it does so at the +cost of omitting many useful features that are standardized in Common +Lisp. Individual Scheme implementations may provide these features in +implementation-specific ways, but their omission from the standard makes +it harder to write portable Scheme code than to write portable Common +Lisp code. + +Scheme also emphasizes a functional programming style and the use of +recursion much more than Common Lisp does. If you studied Lisp in +college and came away with the impression that it was only an academic +language with no real-world application, chances are you learned Scheme. +This isn't to say that's a particularly fair characterization of Scheme, +but it's even less applicable to Common Lisp, which was expressly +designed to be a real-world engineering language rather than a +theoretically "pure" language. + +If you've learned Scheme, you should also be aware that a number of +subtle differences between Scheme and Common Lisp may trip you up. +These differences are also the basis for several perennial religious +wars between the hotheads in the Common Lisp and Scheme communities. +I'll try to point out some of the more important differences as we go +along. + +Two other Lisp dialects still in widespread use are Elisp, the extension +language for the Emacs editor, and Autolisp, the extension language for +Autodesk's AutoCAD computer-aided design tool. Although it's possible +more lines of Elisp and Autolisp have been written than of any other +dialect of Lisp, neither can be used outside their host application, and +both are quite old-fashioned Lisps compared to either Scheme or Common +Lisp. If you've used one of these dialects, prepare to hop in the Lisp +time machine and jump forward several decades. + + +File: pcl.info, Node: 1-3, Next: Chapter 2, Prev: 1-2, Up: Chapter 1 + +Who This Book Is For +==================== + +This book is for you if you're curious about Common Lisp, regardless of +whether you're already convinced you want to use it or if you just want +to know what all the fuss is about. + +If you've learned some Lisp already but have had trouble making the leap +from academic exercises to real programs, this book should get you on +your way. On the other hand, you don't have to be already convinced +that you want to use Lisp to get something out of this book. + +If you're a hard-nosed pragmatist who wants to know what advantages +Common Lisp has over languages such as Perl, Python, Java, C, or C#, +this book should give you some ideas. Or maybe you don't even care +about using Lisp-maybe you're already sure Lisp isn't really any better +than other languages you know but are annoyed by some Lisper telling you +that's because you just don't "get it." If so, this book will give you +a straight-to-the-point introduction to Common Lisp. If, after reading +this book, you still think Common Lisp is no better than your current +favorite languages, you'll be in an excellent position to explain +exactly why. + +I cover not only the syntax and semantics of the language but also how +you can use it to write software that does useful stuff. In the first +part of the book, I'll cover the language itself, mixing in a few +"practical" chapters, where I'll show you how to write real code. Then, +after I've covered most of the language, including several parts that +other books leave for you to figure out on your own, the remainder of +the book consists of nine more practical chapters where I'll help you +write several medium-sized programs that actually do things you might +find useful: filter spam, parse binary files, catalog MP3s, stream MP3s +over a network, and provide a Web interface for the MP3 catalog and +server. + +After you finish this book, you'll be familiar with all the most +important features of the language and how they fit together, you'll +have used Common Lisp to write several nontrivial programs, and you'll +be well prepared to continue exploring the language on your own. While +everyone's road to Lisp is different, I hope this book will help smooth +the way for you. So, let's begin. + + +File: pcl.info, Node: Chapter 2, Prev: Chapter 1, Up: Top + +2. Lather, Rinse, Repeat: A Tour of the REPL +============================================ + +In this chapter you'll set up your programming environment and write +your first Common Lisp programs. We'll use the easy-to-install Lisp in +a Box developed by Matthew Danish and Mikel Evins, which packages a +Common Lisp implementation with Emacs, a powerful Lisp-aware text +editor, and SLIME (1) , a Common Lisp development environment built on +top of Emacs. + +This combo provides a state-of-the-art Common Lisp development +environment that supports the incremental, interactive development style +that characterizes Lisp programming. The SLIME environment has the +added advantage of providing a fairly uniform user interface regardless +of the operating system and Common Lisp implementation you choose. I'll +use the Lisp in a Box environment in order to have a specific +development environment to talk about; folks who want to explore other +development environments such as the graphical integrated development +environments (IDEs) provided by some of the commercial Lisp vendors or +environments based on other editors shouldn't have too much trouble +translating the basics (2) . + +* Menu: + +* 2-1:: Choosing a Lisp Implementation +* 2-2:: Getting Up and Running with Lisp in a Box +* 2-3:: Free Your Mind: Interactive Programming +* 2-4:: Experimenting in the REPL +* 2-5:: "Hello, World," Lisp Style +* 2-6:: Saving Your Work + + ---------- Footnotes ---------- + + (1) Superior Lisp Interaction Mode for Emacs + + (2) If you've had a bad experience with Emacs previously, you should +treat Lisp in a Box as an IDE that happens to use an Emacs-like editor +as its text editor; there will be no need to become an Emacs guru to +program Lisp. It is, however, orders of magnitude more enjoyable to +program Lisp with an editor that has some basic Lisp awareness. At a +minimum, you'll want an editor that can automatically match ()s for you +and knows how to automatically indent Lisp code. Because Emacs is +itself largely written in a Lisp dialect, Elisp, it has quite a bit of +support for editing Lisp code. Emacs is also deeply embedded into the +history of Lisp and the culture of Lisp hackers: the original Emacs and +its immediate predecessors, TECMACS and TMACS, were written by Lispers +at the Massachusetts Institute of Technology (MIT). The editors on the +Lisp Machines were versions of Emacs written entirely in Lisp. The +first two Lisp Machine Emacs, following the hacker tradition of +recursive acronyms, were EINE and ZWEI, which stood for EINE Is Not +Emacs and ZWEI Was EINE Initially. Later ones used a descendant of +ZWEI, named, more prosaically, ZMACS. + + +File: pcl.info, Node: 2-1, Next: 2-2, Prev: Chapter 2, Up: Chapter 2 + +Choosing a Lisp Implementation +============================== + +The first thing you have to do is to choose a Lisp implementation. This +may seem like a strange thing to have to do for folks used to languages +such as Perl, Python, Visual Basic (VB), C#, and Java. The difference +between Common Lisp and these languages is that Common Lisp is defined +by its standard-there is neither a single implementation controlled by a +benevolent dictator, as with Perl and Python, nor a canonical +implementation controlled by a single company, as with VB, C#, and Java. +Anyone who wants to read the standard and implement the language is free +to do so. Furthermore, changes to the standard have to be made in +accordance with a process controlled by the standards body American +National Standards Institute (ANSI). That process is designed to keep +any one entity, such as a single vendor, from being able to arbitrarily +change the standard (1) . Thus, the Common Lisp standard is a contract +between any Common Lisp vendor and Common Lisp programmers. The +contract tells you that if you write a program that uses the features of +the language the way they're described in the standard, you can count on +your program behaving the same in any conforming implementation. + +On the other hand, the standard may not cover everything you may want to +do in your programs-some things were intentionally left unspecified in +order to allow continuing experimentation by implementers in areas where +there wasn't consensus about the best way for the language to support +certain features. So every implementation offers some features above +and beyond what's specified in the standard. Depending on what kind of +programming you're going to be doing, it may make sense to just pick one +implementation that has the extra features you need and use that. On +the other hand, if we're delivering Lisp source to be used by others, +such as libraries, you'll want-as far as possible-to write portable +Common Lisp. For writing code that should be mostly portable but that +needs facilities not defined by the standard, Common Lisp provides a +flexible way to write code "conditionalized" on the features available +in a particular implementation. You'll see an example of this kind of +code in Chapter 15 when we develop a simple library that smoothes over +some differences between how different Lisp implementations deal with +filenames. + +For the moment, however, the most important characteristic of an +implementation is whether it runs on our favorite operating system. The +folks at Franz, makers of Allegro Common Lisp, are making available a +trial version of their product for use with this book that runs on +Linux, Windows, and OS X. Folks looking for an open-source +implementation have several options. SBCL (2) is a high-quality +open-source implementation that compiles to native code and runs on a +wide variety of Unixes, including Linux and OS X. SBCL is derived from +CMUCL (3) , which is a Common Lisp developed at Carnegie Mellon +University, and, like CMUCL, is largely in the public domain, except a +few sections licensed under Berkeley Software Distribution (BSD) style +licenses. CMUCL itself is another fine choice, though SBCL tends to be +easier to install and now supports 21-bit Unicode (4) . For OS X users, +OpenMCL is an excellent choice-it compiles to machine code, supports +threads, and has quite good integration with OS X's Carbon and Cocoa +toolkits. Other open-source and commercial implementations are +available. See Chapter 32 for resources from which you can get more +information. + +All the Lisp code in this book should work in any conforming Common Lisp +implementation unless otherwise noted, and SLIME will smooth out some of +the differences between implementations by providing us with a common +interface for interacting with Lisp. The output shown in this book is +from Allegro running on GNU/Linux; in some cases, other Lisp's may +generate slightly different error messages or debugger output. + + ---------- Footnotes ---------- + + (1) Practically speaking, there's very little likelihood of the +language standard itself being revised-while there are a small handful +of warts that folks might like to clean up, the ANSI process isn't +amenable to opening an existing standard for minor tweaks, and none of +the warts that might be cleaned up actually cause anyone any serious +difficulty. The future of Common Lisp standardization is likely to +proceed via de facto standards, much like the "standardization" of Perl +and Python-as different implementers experiment with application +programming interfaces (APIs) and libraries for doing things not +specified in the language standard, other implementers may adopt them or +people will develop portability libraries to smooth over the differences +between implementations for features not specified in the language +standard. + + (2) Steel Bank Common Lisp + + (3) CMU Common Lisp + + (4) SBCL forked from CMUCL in order to focus on cleaning up the +internals and making it easier to maintain. But the fork has been +amiable; bug fixes tend to propagate between the two projects, and +there's talk that someday they will merge back together. + + +File: pcl.info, Node: 2-2, Next: 2-3, Prev: 2-1, Up: Chapter 2 + +Getting Up and Running with Lisp in a Box +========================================= + +Since the Lisp in a Box packaging is designed to get new Lispers up and +running in a first-rate Lisp development environment with minimum +hassle, all you need to do to get it running is to grab the appropriate +package for your operating system and the preferred Lisp from the Lisp +in a Box Web site listed in Chapter 32 and then follow the installation +instructions. + +Since Lisp in a Box uses Emacs as its editor, you'll need to know at +least a bit about how to use it. Perhaps the best way to get started +with Emacs is to work through its built-in tutorial. To start the +tutorial, select the first item of the Help menu, Emacs tutorial. Or +press the Ctrl key, type h, release the Ctrl key, and then press t. +Most Emacs commands are accessible via such key combinations; because +key combinations are so common, Emacs users have a notation for +describing key combinations that avoids having to constantly write out +combinations such as "Press the Ctrl key, type h, release the Ctrl key, +and then press t." Keys to be pressed together-a so-called key +chord-are written together and separated by a hyphen. Keys, or key +chords, to be pressed in sequence are separated by spaces. In a key +chord, C represents the Ctrl key and M represents the Meta key (also +known as Alt). Thus, we could write the key combination we just +described that starts the tutorial like so: C-h t. + +The tutorial describes other useful commands and the key combinations +that invoke them. Emacs also comes with extensive online documentation +using its own built-in hypertext documentation browser, Info. To read +the manual, type C-h i. The Info system comes with its own tutorial, +accessible simply by pressing h while reading the manual. Finally, +Emacs provides quite a few ways to get help, all bound to key combos +starting with C-h. Typing C-h ? brings up a complete list. Two of the +most useful, besides the tutorial, are C-h k, which lets us type any key +combo and tells us what command it invokes, and C-h w, which lets us +enter the name of a command and tells us what key combination invokes +it. + +The other crucial bit of Emacs terminology, for folks who refuse to work +through the tutorial, is the notion of a ‘buffer’. While working in +Emacs, each file you edit will be represented by a different buffer, +only one of which is "current" at any given time. The current buffer +receives all input-whatever you type and any commands you invoke. +Buffers are also used to represent interactions with programs such as +Common Lisp. Thus, one common action you'll take is to "switch +buffers," which means to make a different buffer the current buffer so +you can edit a particular file or interact with a particular program. +The command ‘switch-to-buffer’, bound to the key combination C-x b, +prompts for the name of a buffer in the area at the bottom of the Emacs +frame. When entering a buffer name, hitting Tab will complete the name +based on the characters typed so far or will show a list of possible +completions. The prompt also suggests a default buffer, which you can +accept just by hitting Return. You can also switch buffers by selecting +a buffer from the Buffers menu. + +In certain contexts, other key combinations may be available for +switching to certain buffers. For instance, when editing Lisp source +files, the key combo C-c C-z switches to the buffer where you interact +with Lisp. + + +File: pcl.info, Node: 2-3, Next: 2-4, Prev: 2-2, Up: Chapter 2 + +Free Your Mind: Interactive Programming +======================================= + +When you start Lisp in a Box, you should see a buffer containing a +prompt that looks like this: + + CL-USER> + +This is the Lisp prompt. Like a Unix or DOS shell prompt, the Lisp +prompt is a place where you can type expressions that will cause things +to happen. However, instead of reading and interpreting a line of shell +commands, Lisp reads Lisp expressions, evaluates them according to the +rules of Lisp, and prints the result. Then it does it again with the +next expression you type. That endless cycle of reading, evaluating, +and printing is why it's called the ‘read-eval-print loop’, or REPL +for short. It's also referred to as the ‘top-level’, the +‘top-level listener’, or the ‘Lisp listener’. + +From within the environment provided by the REPL, you can define and +redefine program elements such as variables, functions, classes, and +methods; evaluate any Lisp expression; load files containing Lisp source +code or compiled code; compile whole files or individual functions; +enter the debugger; step through code; and inspect the state of +individual Lisp objects. + +All those facilities are built into the language, accessible via +functions defined in the language standard. If you had to, you could +build a pretty reasonable programming environment out of just the REPL +and any text editor that knows how to properly indent Lisp code. But +for the true Lisp programming experience, you need an environment, such +as SLIME, that lets you interact with Lisp both via the REPL and while +editing source files. For instance, you don't want to have to cut and +paste a function definition from a source file to the REPL or have to +load a whole file just because you changed one function; your Lisp +environment should let us evaluate or compile both individual +expressions and whole files directly from your editor. + + +File: pcl.info, Node: 2-4, Next: 2-5, Prev: 2-3, Up: Chapter 2 + +Experimenting in the REPL +========================= + +To try the REPL, you need a Lisp expression that can be read, evaluated, +and printed. One of the simplest kinds of Lisp expressions is a number. +At the Lisp prompt, you can type 10 followed by Return and should see +something like this: + + CL-USER> 10 + 10 + +The first 10 is the one you typed. The Lisp reader, the R in REPL, +reads the text "10" and creates a Lisp object representing the number +10. This object is a ‘self-evaluating object’, which means that +when given to the evaluator, the E in REPL, it evaluates to itself. +This value is then given to the printer, which prints the 10 on the line +by itself. While that may seem like a lot of work just to get back to +where you started, things get a bit more interesting when you give Lisp +something meatier to chew on. For instance, you can type (+ 2 3) at the +Lisp prompt. + + CL-USER> (+ 2 3) + 5 + +Anything in parentheses is a list, in this case a list of three +elements, the symbol +, and the numbers 2 and 3. Lisp, in general, +evaluates lists by treating the first element as the name of a function +and the rest of the elements as expressions to be evaluated to yield the +arguments to the function. In this case, the symbol + names a function +that performs addition. 2 and 3 evaluate to themselves and are then +passed to the addition function, which returns 5. The value 5 is passed +to the printer, which prints it. Lisp can evaluate a list expression in +other ways, but we needn't get into them right away. First we have to +write. . . + + +File: pcl.info, Node: 2-5, Next: 2-6, Prev: 2-4, Up: Chapter 2 + +"Hello, World," Lisp Style +========================== + +No programming book is complete without a "hello, world" (1) program. +As it turns out, it's trivially easy to get the REPL to print "hello, +world." + + CL-USER> "hello, world" + "hello, world" + +This works because strings, like numbers, have a literal syntax that's +understood by the Lisp reader and are self-evaluating objects: Lisp +reads the double-quoted string and instantiates a string object in +memory that, when evaluated, evaluates to itself and is then printed in +the same literal syntax. The quotation marks aren't part of the string +object in memory-they're just the syntax that tells the reader to read a +string. The printer puts them back on when it prints the string because +it tries to print objects in the same syntax the reader understands. + +However, this may not really qualify as a "hello, world" ‘program’. +It's more like the "hello, world" ‘value’. + +You can take a step toward a real program by writing some code that as a +side effect prints the string "hello, world" to standard output. Common +Lisp provides a couple ways to emit output, but the most flexible is the +‘FORMAT’ function. ‘FORMAT’ takes a variable number of +arguments, but the only two required arguments are the place to send the +output and a string. You'll see in the next chapter how the string can +contain embedded directives that allow you to interpolate subsequent +arguments into the string, à la ‘printf’ or Python's string-%. As +long as the string doesn't contain an ~, it will be emitted as-is. If +you pass t as its first argument, it sends its output to standard +output. So a ‘FORMAT’ expression that will print "hello, world" +looks like this (2) : + + CL-USER> (format t "hello, world") + hello, world + NIL + +One thing to note about the result of the ‘FORMAT’ expression is the +‘NIL’ on the line after the "hello, world" output. That ‘NIL’ +is the result of evaluating the ‘FORMAT’ expression, printed by the +REPL. (‘NIL’ is Lisp's version of false and/or null. More on that +in Chapter 4.) Unlike the other expressions we've seen so far, a +‘FORMAT’ expression is more interesting for its side effect-printing +to standard output in this case-than for its return value. But every +expression in Lisp evaluates to some result (3) . + +However, it's still arguable whether you've yet written a true +"program." But you're getting there. And you're seeing the bottom-up +style of programming supported by the REPL: you can experiment with +different approaches and build a solution from parts you've already +tested. Now that you have a simple expression that does what you want, +you just need to package it in a function. Functions are one of the +basic program building blocks in Lisp and can be defined with a +‘DEFUN’ expression such as this: + + CL-USER> (defun hello-world () (format t "hello, world")) + HELLO-WORLD + +The ‘hello-world’ after the ‘DEFUN’ is the name of the function. +In Chapter 4 we'll look at exactly what characters can be used in a +name, but for now suffice it to say that lots of characters, such as -, +that are illegal in names in other languages are legal in Common Lisp. +It's standard Lisp style-not to mention more in line with normal English +typography-to form compound names with hyphens, such as +‘hello-world’, rather than with underscores, as in +‘hello_world’, or with inner caps such as ‘helloWorld’. The ()s +after the name delimit the parameter list, which is empty in this case +because the function takes no arguments. The rest is the body of the +function. + +At one level, this expression, like all the others you've seen, is just +another expression to be read, evaluated, and printed by the REPL. The +return value in this case is the name of the function you just defined +(4) . But like the ‘FORMAT’ expression, this expression is more +interesting for the side effects it has than for its return value. +Unlike the ‘FORMAT’ expression, however, the side effects are +invisible: when this expression is evaluated, a new function that takes +no arguments and with the body (format t "hello, world") is created and +given the name HELLO-WORLD. + +Once you've defined the function, you can call it like this: + + CL-USER> (hello-world) + hello, world + NIL + +You can see that the output is just the same as when you evaluated the +‘FORMAT’ expression directly, including the ‘NIL’ value printed +by the REPL. Functions in Common Lisp automatically return the value of +the last expression evaluated. + + ---------- Footnotes ---------- + + (1) The venerable "hello, world" predates even the classic Kernighan +and Ritchie C book that played a big role in its popularization. The +original "hello, world" seems to have come from Brian Kernighan's "A +Tutorial Introduction to the Language B" that was part of the ‘Bell +Laboratories Computing Science Technical Report #8: The Programming +Language B’ published in January 1973. (It's available online at +http://cm.bell-labs.com/cm/cs/who/dmr/bintro.html.) + + (2) These are some other expressions that also print the string +"hello, world": + + (3) Well, as you'll see when I discuss returning multiple values, +it's technically possible to write expressions that evaluate to no +value, but even such expressions are treated as returning NIL when +evaluated in a context that expects a value. + + (4) I'll discuss in Chapter 4 why the name has been converted to all +uppercase. + + +File: pcl.info, Node: 2-6, Next: Chapter 3, Prev: 2-5, Up: Chapter 2 + +Saving Your Work +================ + +You could argue that this is a complete "hello, world" program of sorts. +However, it still has a problem. If you exit Lisp and restart, the +function definition will be gone. Having written such a fine function, +you'll want to save your work. + +Easy enough. You just need to create a file in which to save the +definition. In Emacs you can create a new file by typing C-x C-f and +then, when Emacs prompts you, entering the name of the file you want to +create. It doesn't matter particularly where you put the file. It's +customary to name Common Lisp source files with a .lisp extension, +though some folks use .cl instead. + +Once you've created the file, you can type the definition you previously +entered at the REPL. Some things to note are that after you type the +opening parenthesis and the word ‘DEFUN’, at the bottom of the Emacs +window, SLIME will tell you the arguments expected. The exact form will +depend somewhat on what Common Lisp implementation you're using, but +it'll probably look something like this: + + (defun name varlist &rest body) + +The message will disappear as you start to type each new element but +will reappear each time you enter a space. When you're entering the +definition in the file, you might choose to break the definition across +two lines after the parameter list. If you hit Return and then Tab, +SLIME will automatically indent the second line appropriately, like this +(1) : + + (defun hello-world () + (format t "hello, world")) + +SLIME will also help match up the parentheses-as you type a closing +parenthesis, it will flash the corresponding opening parenthesis. Or +you can just type C-c C-q to invoke the command +‘slime-close-parens-at-point’, which will insert as many closing +parentheses as necessary to match all the currently open parentheses. + +Now you can get this definition into your Lisp environment in several +ways. The easiest is to type C-c C-c with the cursor anywhere in or +immediately after the ‘DEFUN’ form, which runs the command +‘slime-compile-defun’, which in turn sends the definition to Lisp to +be evaluated and compiled. To make sure this is working, you can make +some change to hello-world, recompile it, and then go back to the REPL, +using C-c C-z or C-x b, and call it again. For instance, you could make +it a bit more grammatical. + + (defun hello-world () + (format t "Hello, world!")) + +Next, recompile with C-c C-c and then type C-c C-z to switch to the REPL +to try the new version. + + CL-USER> (hello-world) + Hello, world! + NIL + +You'll also probably want to save the file you've been working on; in +the hello.lisp buffer, type C-x C-s to invoke the Emacs command +‘save-buffer’. + +Now to try reloading this function from the source file, you'll need to +quit Lisp and restart. To quit you can use a SLIME shortcut: at the +REPL, type a comma. At the bottom of the Emacs window, you will be +prompted for a command. Type ‘quit’ (or ‘sayoonara’), and then +hit Enter. This will quit Lisp and close all the buffers created by +SLIME such as the REPL buffer (2) . Now restart SLIME by typing ‘M-x +slime’. + +Just for grins, you can try to invoke ‘hello-world’. + + CL-USER> (hello-world) + +At that point SLIME will pop up a new buffer that starts with something +that looks like this: + + attempt to call `HELLO-WORLD' which is an undefined function. + [Condition of type UNDEFINED-FUNCTION] + + Restarts: + 0: [TRY-AGAIN] Try calling HELLO-WORLD again. + 1: [RETURN-VALUE] Return a value instead of calling HELLO-WORLD. + 2: [USE-VALUE] Try calling a function other than HELLO-WORLD. + 3: [STORE-VALUE] Setf the symbol-function of HELLO-WORLD and call it again. + 4: [ABORT] Abort handling SLIME request. + 5: [ABORT] Abort entirely from this process. + + Backtrace: + 0: (SWANK::DEBUG-IN-EMACS #<UNDEFINED-FUNCTION #x716b082a>) + 1: ((FLET SWANK:SWANK-DEBUGGER-HOOK SWANK::DEBUG-IT)) + 2: (SWANK:SWANK-DEBUGGER-HOOK #<UNDEFINED-FUNCTION #x716b082a> #<Function SWANK-DEBUGGER-HOOK>) + 3: (ERROR #<UNDEFINED-FUNCTION #x716b082a>) + 4: (EVAL (HELLO-WORLD)) + 5: (SWANK::EVAL-REGION "(hello-world) + " T) + +Blammo! What happened? Well, you tried to invoke a function that +doesn't exist. But despite the burst of output, Lisp is actually +handling this situation gracefully. Unlike Java or Python, Common Lisp +doesn't just bail-throwing an exception and unwinding the stack. And it +definitely doesn't dump core just because you tried to invoke a missing +function. Instead Lisp drops you into the debugger. + +While you're in the debugger you still have full access to Lisp, so you +can evaluate expressions to examine the state of our program and maybe +even fix things. For now don't worry about that; just type q to exit +the debugger and get back to the REPL. The debugger buffer will go away, +and the REPL will show this: + + CL-USER> (hello-world) + ; Evaluation aborted + CL-USER> + +There's obviously more that can be done from within the debugger than +just abort-we'll see, for instance, in Chapter 19 how the debugger +integrates with the error handling system. For now, however, the +important thing to know is that you can always get out of it, and back +to the REPL, by typing ‘q’. + +Back at the REPL you can try again. Things blew up because Lisp didn't +know the definition of hello-world. So you need to let Lisp know about +the definition we saved in the file hello.lisp. You have several ways +you could do this. You could switch back to the buffer containing the +file (type C-x b and then enter hello.lisp when prompted) and recompile +the definition as you did before with C-c C-c. Or you can load the +whole file, which would be a more convenient approach if the file +contained a bunch of definitions, using the LOAD function at the REPL +like this: + + CL-USER> (load "hello.lisp") + ; Loading /home/peter/my-lisp-programs/hello.lisp + T + +The ‘T’ means everything loaded correctly (3) . Loading a file with +LOAD is essentially equivalent to typing each of the expressions in the +file at the REPL in the order they appear in the file, so after the call +to ‘LOAD’, hello-world should be defined: + + CL-USER> (hello-world) + Hello, world! + NIL + +Another way to load a file's worth of definitions is to compile the file +first with ‘COMPILE-FILE’ and then ‘LOAD’ the resulting compiled +file, called a ‘FASL file’, which is short for ‘fast-load file’. +‘COMPILE-FILE’ returns the name of the FASL file, so we can compile +and load from the REPL like this: + + CL-USER> (load (compile-file "hello.lisp")) + ;;; Compiling file hello.lisp + ;;; Writing fasl file hello.fasl + ;;; Fasl write complete + v; Fast loading /home/peter/my-lisp-programs/hello.fasl + T + +SLIME also provides support for loading and compiling files without +using the REPL. When you're in a source code buffer, you can use C-c C-l +to load the file with ‘slime-load-file’. Emacs will prompt for the +name of a file to load with the name of the current file already filled +in; you can just hit Enter. Or you can type C-c C-k to compile and load +the file represented by the current buffer. In some Common Lisp +implementations, compiling code this way will make it quite a bit +faster; in others, it won't, typically because they always compile +everything. + +This should be enough to give you a flavor of how Lisp programming +works. Of course I haven't covered all the tricks and techniques yet, +but you've seen the essential elements-interacting with the REPL trying +things out, loading and testing new code, tweaking and debugging. +Serious Lisp hackers often keep a Lisp image running for days on end, +adding, redefining, and testing bits of their program incrementally. + +Also, even when the Lisp app is deployed, there's often still a way to +get to a REPL. You'll see in Chapter 26 how you can use the REPL and +SLIME to interact with the Lisp that's running a Web server at the same +time as it's serving up Web pages. It's even possible to use SLIME to +connect to a Lisp running on a different machine, allowing you-for +instance-to debug a remote server just like a local one. + +An even more impressive instance of remote debugging occurred on NASA's +1998 Deep Space 1 mission. A half year after the space craft launched, +a bit of Lisp code was going to control the spacecraft for two days +while conducting a sequence of experiments. Unfortunately, a subtle +race condition in the code had escaped detection during ground testing +and was already in space. When the bug manifested in the wild-100 +million miles away from Earth-the team was able to diagnose and fix the +running code, allowing the experiments to complete (4) . One of the +programmers described it as follows: + + Debugging a program running on a $100M piece of hardware that is + 100 million miles away is an interesting experience. Having a + read-eval-print loop running on the spacecraft proved invaluable in + finding and fixing the problem. + +You're not quite ready to send any Lisp code into deep space, but in the +next chapter you'll take a crack at writing a program a bit more +interesting than "hello, world." + + ---------- Footnotes ---------- + + (1) You could also have entered the definition as two lines at the +REPL, as the REPL reads whole expressions, not lines. + + (2) SLIME shortcuts aren't part of Common Lisp-they're commands to +SLIME. + + (3) If for some reason the ‘LOAD’ doesn't go cleanly, you'll get +another error and drop back into the debugger. If this happens, the +most likely reason is that Lisp can't find the file, probably because +its idea of the current working directory isn't the same as where the +file is located. In that case, you can quit the debugger by typing q +and then use the SLIME shortcut cd to change Lisp's idea of the current +directory-type a comma and then cd when prompted for a command and then +the name of the directory where hello.lisp was saved. + + (4) http://www.flownet.com/gat/jpl-lisp.html + + +File: pcl.info, Node: Chapter 3, Next: Chapter 4, Prev: Chapter 2, Up: Top + +3. Practical: A Simple Database +=============================== + +Obviously, before you can start building real software in Lisp, you'll +have to learn the language. But let's face it-you may be thinking, +"'Practical Common Lisp,' isn't that an oxymoron? Why should you be +expected to bother learning all the details of a language unless it's +actually good for something you care about?" So I'll start by giving +you a small example of what you can do with Common Lisp. In this +chapter you'll write a simple database for keeping track of CDs. You'll +use similar techniques in Chapter 27 when you build a database of MP3s +for our streaming MP3 server. In fact, you could think of this as part +of the MP3 software project-after all, in order to have a bunch of MP3s +to listen to, it might be helpful to be able to keep track of which CDs +you have and which ones you need to rip. + + In this chapter, I'll cover just enough Lisp as we go along for you +to understand how the code works. But I'll gloss over quite a few +details. For now you needn't sweat the small stuff-the next several +chapters will cover all the Common Lisp constructs used here, and more, +in a much more systematic way. + + One terminology note: I'll discuss a handful of Lisp operators in +this chapter. In Chapter 4, you'll learn that Common Lisp provides +three distinct kinds of operators: functions, macros, and special +operators. For the purposes of this chapter, you don't really need to +know the difference. I will, however, refer to different operators as +functions or macros or special operators as appropriate, rather than +trying to hide the details behind the word operator. For now you can +treat function, macro, and special operator as all more or less +equivalent. (1) + + Also, keep in mind that I won't bust out all the most sophisticated +Common Lisp techniques for your very first post-"hello, world" program. +The point of this chapter isn't that this is how you would write a +database in Lisp; rather, the point is for you to get an idea of what +programming in Lisp is like and to see how even a relatively simple Lisp +program can be quite featureful. + +* Menu: + +* 3-1:: CDs and Records +* 3-2:: Filing CDs +* 3-3:: Looking at the Database Contents +* 3-4:: Improving the User Interaction +* 3-5:: Saving and Loading the Database +* 3-6:: Querying the Database +* 3-7:: Updating Existing Records-Another Use for WHERE +* 3-8:: Removing Duplication and Winning Big +* 3-9:: Wrapping Up + + ---------- Footnotes ---------- + + (1) Before I proceed, however, it's crucially important that you +forget anything you may know about #define-style "macros" as implemented +in the C pre-processor. Lisp macros are a totally different beast. + + +File: pcl.info, Node: 3-1, Next: 3-2, Prev: Chapter 3, Up: Chapter 3 + +CDs and Records +=============== + +To keep track of CDs that need to be ripped to MP3s and which CDs should +be ripped first, each record in the database will contain the title and +artist of the CD, a rating of how much the user likes it, and a flag +saying whether it has been ripped. So, to start with, you'll need a way +to represent a single database record (in other words, one CD). Common +Lisp gives you lots of choices of data structures from a simple +four-item list to a user-defined class, using the Common Lisp Object +System (CLOS). + + For now you can stay at the simple end of the spectrum and use a +list. You can make a list with the LIST function, which, appropriately +enough, returns a list of its arguments. + + CL-USER> (list 1 2 3) + (1 2 3) + + You could use a four-item list, mapping a given position in the list +to a given field in the record. However, another flavor of list-called +a property list, or plist for short-is even more convenient. A plist is +a list where every other element, starting with the first, is a symbol +that describes what the next element in the list is. I won't get into +all the details of exactly what a symbol is right now; basically it's a +name. For the symbols that name the fields in the CD database, you can +use a particular kind of symbol, called a keyword symbol. A keyword is +any name that starts with a colon (:), for instance, :foo. Here's an +example of a plist using the keyword symbols :a, :b, and :c as property +names: + + CL-USER> (list :a 1 :b 2 :c 3) + (:A 1 :B 2 :C 3) + + Note that you can create a property list with the same LIST function +as you use to create other lists; it's the contents that make it a +plist. + + The thing that makes plists a convenient way to represent the records +in a database is the function GETF, which takes a plist and a symbol and +returns the value in the plist following the symbol, making a plist a +sort of poor man's hash table. Lisp has real hash tables too, but +plists are sufficient for your needs here and can more easily be saved +to a file, which will come in handy later. + + CL-USER> (getf (list :a 1 :b 2 :c 3) :a) + 1 + CL-USER> (getf (list :a 1 :b 2 :c 3) :c) + 3 + + Given all that, you can easily enough write a function make-cd that +will take the four fields as arguments and return a plist representing +that CD. + + (defun make-cd (title artist rating ripped) + (list :title title :artist artist :rating rating :ripped ripped)) + + The word DEFUN tells us that this form is defining a new function. +The name of the function is make-cd. After the name comes the parameter +list. This function has four parameters: title, artist, rating, and +ripped. Everything after the parameter list is the body of the +function. In this case the body is just one form, a call to LIST. When +make-cd is called, the arguments passed to the call will be bound to the +variables in the parameter list. For instance, to make a record for the +CD Roses by Kathy Mattea, you might call make-cd like this: + + CL-USER> (make-cd "Roses" "Kathy Mattea" 7 t) + (:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T) + + +File: pcl.info, Node: 3-2, Next: 3-3, Prev: 3-1, Up: Chapter 3 + +Filing CDs +========== + +A single record, however, does not a database make. You need some +larger construct to hold the records. Again, for simplicity's sake, a +list seems like a good choice. Also for simplicity you can use a global +variable, *db*, which you can define with the DEFVAR macro. The +asterisks (*) in the name are a Lisp naming convention for global +variables. (1) + + (defvar *db* nil) + + You can use the PUSH macro to add items to *db*. But it's probably a +good idea to abstract things a tiny bit, so you should define a function +add-record that adds a record to the database. + + (defun add-record (cd) (push cd *db*)) + + Now you can use add-record and make-cd together to add CDs to the +database. + + CL-USER> (add-record (make-cd "Roses" "Kathy Mattea" 7 t)) + ((:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T)) + CL-USER> (add-record (make-cd "Fly" "Dixie Chicks" 8 t)) + ((:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T) + (:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T)) + CL-USER> (add-record (make-cd "Home" "Dixie Chicks" 9 t)) + ((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T) + (:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T)) + + The stuff printed by the REPL after each call to add-record is the +return value, which is the value returned by the last expression in the +function body, the PUSH. And PUSH returns the new value of the variable +it's modifying. So what you're actually seeing is the value of the +database after the record has been added. + + ---------- Footnotes ---------- + + (1) Using a global variable also has some drawbacks-for instance, you +can have only one database at a time. In Chapter 27, with more of the +language under your belt, you'll be ready to build a more flexible +database. You'll also see, in Chapter 6, how even using a global +variable is more flexible in Common Lisp than it may be in other +languages. + + +File: pcl.info, Node: 3-3, Next: 3-4, Prev: 3-2, Up: Chapter 3 + +Looking at the Database Contents +================================ + +You can also see the current value of *db* whenever you want by typing +*db* at the REPL. + + CL-USER> *db* + ((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T) + (:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T)) + + However, that's not a very satisfying way of looking at the output. +You can write a dump-db function that dumps out the database in a more +human-readable format, like this: + + TITLE: Home + ARTIST: Dixie Chicks + RATING: 9 + RIPPED: T + + TITLE: Fly + ARTIST: Dixie Chicks + RATING: 8 + RIPPED: T + + TITLE: Roses + ARTIST: Kathy Mattea + RATING: 7 + RIPPED: T + + The function looks like this: + + (defun dump-db () + (dolist (cd *db*) + (format t "~{~a:~10t~a~%~}~%" cd))) + + This function works by looping over all the elements of *db* with the +DOLIST macro, binding each element to the variable cd in turn. For each +value of cd, you use the FORMAT function to print it. + + Admittedly, the FORMAT call is a little cryptic. However, FORMAT +isn't particularly more complicated than C or Perl's printf function or +Python's string-% operator. In Chapter 18 I'll discuss FORMAT in +greater detail. For now we can take this call bit by bit. As you saw +in Chapter 2, FORMAT takes at least two arguments, the first being the +stream where it sends its output; t is shorthand for the stream +*standard-output*. + + The second argument to FORMAT is a format string that can contain +both literal text and directives telling FORMAT things such as how to +interpolate the rest of its arguments. Format directives start with ~ +(much the way printf's directives start with %). FORMAT understands +dozens of directives, each with their own set of options. (1) However, +for now I'll just focus on the ones you need to write dump-db. + + The ~a directive is the aesthetic directive; it means to consume one +argument and output it in a human-readable form. This will render +keywords without the leading : and strings without quotation marks. For +instance: + + CL-USER> (format t "~a" "Dixie Chicks") + Dixie Chicks + NIL + + or: + + CL-USER> (format t "~a" :title) + TITLE + NIL + + The ~t directive is for tabulating. The ~10t tells FORMAT to emit +enough spaces to move to the tenth column before processing the next ~a. +A ~t doesn't consume any arguments. + + CL-USER> (format t "~a:~10t~a" :artist "Dixie Chicks") + ARTIST: Dixie Chicks + NIL + + Now things get slightly more complicated. When FORMAT sees ~{ the +next argument to be consumed must be a list. FORMAT loops over that +list, processing the directives between the ~{ and ~}, consuming as many +elements of the list as needed each time through the list. In dump-db, +the FORMAT loop will consume one keyword and one value from the list +each time through the loop. The ~% directive doesn't consume any +arguments but tells FORMAT to emit a newline. Then after the ~} ends +the loop, the last ~% tells FORMAT to emit one more newline to put a +blank line between each CD. + + Technically, you could have also used FORMAT to loop over the +database itself, turning our dump-db function into a one-liner. + + (defun dump-db () + (format t "~{~{~a:~10t~a~%~}~%~}" *db*)) + + That's either very cool or very scary depending on your point of +view. + + ---------- Footnotes ---------- + + (1) One of the coolest FORMAT directives is the ~R directive. Ever +want to know how to say a really big number in English words? Lisp +knows. Evaluate this: + + (format nil "~r" 1606938044258990275541962092) + + and you should get back (wrapped for legibility): + + +File: pcl.info, Node: 3-4, Next: 3-5, Prev: 3-3, Up: Chapter 3 + +Improving the User Interaction +============================== + +While our add-record function works fine for adding records, it's a bit +Lispy for the casual user. And if they want to add a bunch of records, +it's not very convenient. So you may want to write a function to prompt +the user for information about a set of CDs. Right away you know you'll +need some way to prompt the user for a piece of information and read it. +So let's write that. + + (defun prompt-read (prompt) + (format *query-io* "~a: " prompt) + (force-output *query-io*) + (read-line *query-io*)) + + You use your old friend FORMAT to emit a prompt. Note that there's +no ~% in the format string, so the cursor will stay on the same line. +The call to FORCE-OUTPUT is necessary in some implementations to ensure +that Lisp doesn't wait for a newline before it prints the prompt. + + Then you can read a single line of text with the aptly named +READ-LINE function. The variable *query-io* is a global variable (which +you can tell because of the * naming convention for global variables) +that contains the input stream connected to the terminal. The return +value of prompt-read will be the value of the last form, the call to +READ-LINE, which returns the string it read (without the trailing +newline.) + + You can combine your existing make-cd function with prompt-read to +build a function that makes a new CD record from data it gets by +prompting for each value in turn. + + (defun prompt-for-cd () + (make-cd + (prompt-read "Title") + (prompt-read "Artist") + (prompt-read "Rating") + (prompt-read "Ripped [y/n]"))) + + That's almost right. Except prompt-read returns a string, which, +while fine for the Title and Artist fields, isn't so great for the +Rating and Ripped fields, which should be a number and a boolean. +Depending on how sophisticated a user interface you want, you can go to +arbitrary lengths to validate the data the user enters. For now let's +lean toward the quick and dirty: you can wrap the prompt-read for the +rating in a call to Lisp's PARSE-INTEGER function, like this: + + (parse-integer (prompt-read "Rating")) + + Unfortunately, the default behavior of PARSE-INTEGER is to signal an +error if it can't parse an integer out of the string or if there's any +non-numeric junk in the string. However, it takes an optional keyword +argument :junk-allowed, which tells it to relax a bit. + + (parse-integer (prompt-read "Rating") :junk-allowed t) + + But there's still one problem: if it can't find an integer amidst all +the junk, PARSE-INTEGER will return NIL rather than a number. In +keeping with the quick-and-dirty approach, you may just want to call +that 0 and continue. Lisp's OR macro is just the thing you need here. +It's similar to the "short-circuiting" || in Perl, Python, Java, and C; +it takes a series of expressions, evaluates them one at a time, and +returns the first non-nil value (or NIL if they're all NIL). So you can +use the following: + + (or (parse-integer (prompt-read "Rating") :junk-allowed t) 0) + + to get a default value of 0. + + Fixing the code to prompt for Ripped is quite a bit simpler. You can +just use the Common Lisp function Y-OR-N-P. + + (y-or-n-p "Ripped [y/n]: ") + + In fact, this will be the most robust part of prompt-for-cd, as +Y-OR-N-P will reprompt the user if they enter something that doesn't +start with y, Y, n, or N. + + Putting those pieces together you get a reasonably robust +prompt-for-cd function. + + (defun prompt-for-cd () + (make-cd + (prompt-read "Title") + (prompt-read "Artist") + (or (parse-integer (prompt-read "Rating") :junk-allowed t) 0) + (y-or-n-p "Ripped [y/n]: "))) + + Finally, you can finish the "add a bunch of CDs" interface by +wrapping prompt-for-cd in a function that loops until the user is done. +You can use the simple form of the LOOP macro, which repeatedly executes +a body of expressions until it's exited by a call to RETURN. For +example: + + (defun add-cds () + (loop (add-record (prompt-for-cd)) + (if (not (y-or-n-p "Another? [y/n]: ")) (return)))) + + Now you can use add-cds to add some more CDs to the database. + + CL-USER> (add-cds) + Title: Rockin' the Suburbs + Artist: Ben Folds + Rating: 6 + Ripped [y/n]: y + Another? [y/n]: y + Title: Give Us a Break + Artist: Limpopo + Rating: 10 + Ripped [y/n]: y + Another? [y/n]: y + Title: Lyle Lovett + Artist: Lyle Lovett + Rating: 9 + Ripped [y/n]: y + Another? [y/n]: n + NIL + + +File: pcl.info, Node: 3-5, Next: 3-6, Prev: 3-4, Up: Chapter 3 + +Saving and Loading the Database +=============================== + +Having a convenient way to add records to the database is nice. But +it's not so nice that the user is going to be very happy if they have to +reenter all the records every time they quit and restart Lisp. Luckily, +with the data structures you're using to represent the data, it's +trivially easy to save the data to a file and reload it later. Here's a +save-db function that takes a filename as an argument and saves the +current state of the database: + + (defun save-db (filename) + (with-open-file (out filename + :direction :output + :if-exists :supersede) + (with-standard-io-syntax + (print *db* out)))) + + The WITH-OPEN-FILE macro opens a file, binds the stream to a +variable, executes a set of expressions, and then closes the file. It +also makes sure the file is closed even if something goes wrong while +evaluating the body. The list directly after WITH-OPEN-FILE isn't a +function call but rather part of the syntax defined by WITH-OPEN-FILE. +It contains the name of the variable that will hold the file stream to +which you'll write within the body of WITH-OPEN-FILE, a value that must +be a file name, and then some options that control how the file is +opened. Here you specify that you're opening the file for writing with +:direction :output and that you want to overwrite an existing file of +the same name if it exists with :if-exists :supersede. + + Once you have the file open, all you have to do is print the contents +of the database with (print *db* out). Unlike FORMAT, PRINT prints Lisp +objects in a form that can be read back in by the Lisp reader. The +macro WITH-STANDARD-IO-SYNTAX ensures that certain variables that affect +the behavior of PRINT are set to their standard values. You'll use the +same macro when you read the data back in to make sure the Lisp reader +and printer are operating compatibly. + + The argument to save-db should be a string containing the name of the +file where the user wants to save the database. The exact form of the +string will depend on what operating system they're using. For +instance, on a Unix box they should be able to call save-db like this: + + CL-USER> (save-db "~/my-cds.db") + ((:TITLE "Lyle Lovett" :ARTIST "Lyle Lovett" :RATING 9 :RIPPED T) + (:TITLE "Give Us a Break" :ARTIST "Limpopo" :RATING 10 :RIPPED T) + (:TITLE "Rockin' the Suburbs" :ARTIST "Ben Folds" :RATING 6 :RIPPED + T) + (:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T) + (:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 9 :RIPPED T)) + + On Windows, the filename might be something like "c:/my-cds.db" or +"c:\\my-cds.db." (1) + + You can open this file in any text editor to see what it looks like. +You should see something a lot like what the REPL prints if you type +*db*. + + The function to load the database back in is similar. + + (defun load-db (filename) + (with-open-file (in filename) + (with-standard-io-syntax + (setf *db* (read in))))) + + This time you don't need to specify :direction in the options to +WITH-OPEN-FILE, since you want the default of :input. And instead of +printing, you use the function READ to read from the stream in. This is +the same reader used by the REPL and can read any Lisp expression you +could type at the REPL prompt. However, in this case, you're just +reading and saving the expression, not evaluating it. Again, the +WITH-STANDARD-IO-SYNTAX macro ensures that READ is using the same basic +syntax that save-db did when it PRINTed the data. + + The SETF macro is Common Lisp's main assignment operator. It sets +its first argument to the result of evaluating its second argument. So +in load-db the *db* variable will contain the object read from the file, +namely, the list of lists written by save-db. You do need to be careful +about one thing-load-db clobbers whatever was in *db* before the call. +So if you've added records with add-record or add-cds that haven't been +saved with save-db, you'll lose them. + + ---------- Footnotes ---------- + + (1) Windows actually understands forward slashes in filenames even +though it normally uses a backslash as the directory separator. This is +convenient since otherwise you have to write double backslashes because +backslash is the escape character in Lisp strings. + + +File: pcl.info, Node: 3-6, Next: 3-7, Prev: 3-5, Up: Chapter 3 + +Querying the Database +===================== + +Now that you have a way to save and reload the database to go along with +a convenient user interface for adding new records, you soon may have +enough records that you won't want to be dumping out the whole database +just to look at what's in it. What you need is a way to query the +database. You might like, for instance, to be able to write something +like this: + + (select :artist "Dixie Chicks") + + and get a list of all the records where the artist is the Dixie +Chicks. Again, it turns out that the choice of saving the records in a +list will pay off. + + The function REMOVE-IF-NOT takes a predicate and a list and returns a +list containing only the elements of the original list that match the +predicate. In other words, it has removed all the elements that don't +match the predicate. However, REMOVE-IF-NOT doesn't really remove +anything-it creates a new list, leaving the original list untouched. +It's like running grep over a file. The predicate argument can be any +function that accepts a single argument and returns a boolean value-NIL +for false and anything else for true. + + For instance, if you wanted to extract all the even elements from a +list of numbers, you could use REMOVE-IF-NOT as follows: + + CL-USER> (remove-if-not #'evenp '(1 2 3 4 5 6 7 8 9 10)) + (2 4 6 8 10) + + In this case, the predicate is the function EVENP, which returns true +if its argument is an even number. The funny notation #' is shorthand +for "Get me the function with the following name." Without the #', Lisp +would treat evenp as the name of a variable and look up the value of the +variable, not the function. + + You can also pass REMOVE-IF-NOT an anonymous function. For instance, +if EVENP didn't exist, you could write the previous expression as the +following: + + CL-USER> (remove-if-not #'(lambda (x) (= 0 (mod x 2))) '(1 2 3 4 5 6 7 8 9 10)) + (2 4 6 8 10) + + In this case, the predicate is this anonymous function: + + (lambda (x) (= 0 (mod x 2))) + + which checks that its argument is equal to 0 modulus 2 (in other +words, is even). If you wanted to extract only the odd numbers using an +anonymous function, you'd write this: + + CL-USER> (remove-if-not #'(lambda (x) (= 1 (mod x 2))) '(1 2 3 4 5 6 7 8 9 10)) + (1 3 5 7 9) + + Note that lambda isn't the name of the function-it's the indicator +you're defining an anonymous function. (1) Other than the lack of a +name, however, a LAMBDA expression looks a lot like a DEFUN: the word +lambda is followed by a parameter list, which is followed by the body of +the function. + + To select all the Dixie Chicks' albums in the database using +REMOVE-IF-NOT, you need a function that returns true when the artist +field of a record is "Dixie Chicks". Remember that we chose the plist +representation for the database records because the function GETF can +extract named fields from a plist. So assuming cd is the name of a +variable holding a single database record, you can use the expression +(getf cd :artist) to extract the name of the artist. The function +EQUAL, when given string arguments, compares them character by +character. So (equal (getf cd :artist) "Dixie Chicks") will test +whether the artist field of a given CD is equal to "Dixie Chicks". All +you need to do is wrap that expression in a LAMBDA form to make an +anonymous function and pass it to REMOVE-IF-NOT. + + CL-USER> (remove-if-not + #'(lambda (cd) (equal (getf cd :artist) "Dixie Chicks")) *db*) + ((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)) + + Now suppose you want to wrap that whole expression in a function that +takes the name of the artist as an argument. You can write that like +this: + + (defun select-by-artist (artist) + (remove-if-not + #'(lambda (cd) (equal (getf cd :artist) artist)) + *db*)) + + Note how the anonymous function, which contains code that won't run +until it's invoked in REMOVE-IF-NOT, can nonetheless refer to the +variable artist. In this case the anonymous function doesn't just save +you from having to write a regular function-it lets you write a function +that derives part of its meaning-the value of artist-from the context in +which it's embedded. + + So that's select-by-artist. However, selecting by artist is only one +of the kinds of queries you might like to support. You could write +several more functions, such as select-by-title, select-by-rating, +select-by-title-and-artist, and so on. But they'd all be about the same +except for the contents of the anonymous function. You can instead make +a more general select function that takes a function as an argument. + + (defun select (selector-fn) + (remove-if-not selector-fn *db*)) + + So what happened to the #'? Well, in this case you don't want +REMOVE-IF-NOT to use the function named selector-fn. You want it to use +the anonymous function that was passed as an argument to select in the +variable selector-fn. Though, the #' comes back in the call to select. + + CL-USER> (select #'(lambda (cd) (equal (getf cd :artist) "Dixie Chicks"))) + ((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)) + + But that's really quite gross-looking. Luckily, you can wrap up the +creation of the anonymous function. + + (defun artist-selector (artist) + #'(lambda (cd) (equal (getf cd :artist) artist))) + + This is a function that returns a function and one that references a +variable that-it seems-won't exist after artist-selector returns. (2) +It may seem odd now, but it actually works just the way you'd want-if +you call artist-selector with an argument of "Dixie Chicks", you get an +anonymous function that matches CDs whose :artist field is "Dixie +Chicks", and if you call it with "Lyle Lovett", you get a different +function that will match against an :artist field of "Lyle Lovett". So +now you can rewrite the call to select like this: + + CL-USER> (select (artist-selector "Dixie Chicks")) + ((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)) + + Now you just need some more functions to generate selectors. But +just as you don't want to have to write select-by-title, +select-by-rating, and so on, because they would all be quite similar, +you're not going to want to write a bunch of nearly identical +selector-function generators, one for each field. Why not write one +general-purpose selector-function generator, a function that, depending +on what arguments you pass it, will generate a selector function for +different fields or maybe even a combination of fields? You can write +such a function, but first you need a crash course in a feature called +keyword parameters. + + In the functions you've written so far, you've specified a simple +list of parameters, which are bound to the corresponding arguments in +the call to the function. For instance, the following function: + + (defun foo (a b c) (list a b c)) + + has three parameters, a, b, and c, and must be called with three +arguments. But sometimes you may want to write a function that can be +called with varying numbers of arguments. Keyword parameters are one +way to achieve this. A version of foo that uses keyword parameters +might look like this: + + (defun foo (&key a b c) (list a b c)) + + The only difference is the &key at the beginning of the argument +list. However, the calls to this new foo will look quite different. +These are all legal calls with the result to the right of the ==>: + + (foo :a 1 :b 2 :c 3) ==> (1 2 3) + (foo :c 3 :b 2 :a 1) ==> (1 2 3) + (foo :a 1 :c 3) ==> (1 NIL 3) + (foo) ==> (NIL NIL NIL) + + As these examples show, the value of the variables a, b, and c are +bound to the values that follow the corresponding keyword. And if a +particular keyword isn't present in the call, the corresponding variable +is set to NIL. I'm glossing over a bunch of details of how keyword +parameters are specified and how they relate to other kinds of +parameters, but you need to know one more detail. + + Normally if a function is called with no argument for a particular +keyword parameter, the parameter will have the value NIL. However, +sometimes you'll want to be able to distinguish between a NIL that was +explicitly passed as the argument to a keyword parameter and the default +value NIL. To allow this, when you specify a keyword parameter you can +replace the simple name with a list consisting of the name of the +parameter, a default value, and another parameter name, called a +supplied-p parameter. The supplied-p parameter will be set to true or +false depending on whether an argument was actually passed for that +keyword parameter in a particular call to the function. Here's a +version of foo that uses this feature: + + (defun foo (&key a (b 20) (c 30 c-p)) (list a b c c-p)) + + Now the same calls from earlier yield these results: + + (foo :a 1 :b 2 :c 3) ==> (1 2 3 T) + (foo :c 3 :b 2 :a 1) ==> (1 2 3 T) + (foo :a 1 :c 3) ==> (1 20 3 T) + (foo) ==> (NIL 20 30 NIL) + + The general selector-function generator, which you can call where for +reasons that will soon become apparent if you're familiar with SQL +databases, is a function that takes four keyword parameters +corresponding to the fields in our CD records and generates a selector +function that selects any CDs that match all the values given to where. +For instance, it will let you say things like this: + + (select (where :artist "Dixie Chicks")) + + or this: + + (select (where :rating 10 :ripped nil)) + + The function looks like this: + + (defun where (&key title artist rating (ripped nil ripped-p)) + #'(lambda (cd) + (and + (if title (equal (getf cd :title) title) t) + (if artist (equal (getf cd :artist) artist) t) + (if rating (equal (getf cd :rating) rating) t) + (if ripped-p (equal (getf cd :ripped) ripped) t)))) + + This function returns an anonymous function that returns the logical +AND of one clause per field in our CD records. Each clause checks if +the appropriate argument was passed in and then either compares it to +the value in the corresponding field in the CD record or returns t, +Lisp's version of truth, if the parameter wasn't passed in. Thus, the +selector function will return t only for CDs that match all the +arguments passed to where. (3) Note that you need to use a three-item +list to specify the keyword parameter ripped because you need to know +whether the caller actually passed :ripped nil, meaning, "Select CDs +whose ripped field is nil," or whether they left out :ripped altogether, +meaning "I don't care what the value of the ripped field is." + + ---------- Footnotes ---------- + + (1) The word lambda is used in Lisp because of an early connection to +the lambda calculus, a mathematical formalism invented for studying +mathematical functions. + + (2) The technical term for a function that references a variable in +its enclosing scope is a closure because the function "closes over" the +variable. I'll discuss closures in more detail in Chapter 6. + + (3) Note that in Lisp, an IF form, like everything else, is an +expression that returns a value. It's actually more like the ternary +operator (?:) in Perl, Java, and C in that this is legal in those +languages: + + some_var = some_boolean ? value1 : value2; + + while this isn't: + + some_var = if (some_boolean) value1; else value2; + + because in those languages, if is a statement, not an expression. + + +File: pcl.info, Node: 3-7, Next: 3-8, Prev: 3-6, Up: Chapter 3 + +Updating Existing Records-Another Use for WHERE +=============================================== + +Now that you've got nice generalized select and where functions, you're +in a good position to write the next feature that every database needs-a +way to update particular records. In SQL the update command is used to +update a set of records matching a particular where clause. That seems +like a good model, especially since you've already got a where-clause +generator. In fact, the update function is mostly just the application +of a few ideas you've already seen: using a passed-in selector function +to choose the records to update and using keyword arguments to specify +the values to change. The main new bit is the use of a function MAPCAR +that maps over a list, *db* in this case, and returns a new list +containing the results of calling a function on each item in the +original list. + + (defun update (selector-fn &key title artist rating (ripped nil ripped-p)) + (setf *db* + (mapcar + #'(lambda (row) + (when (funcall selector-fn row) + (if title (setf (getf row :title) title)) + (if artist (setf (getf row :artist) artist)) + (if rating (setf (getf row :rating) rating)) + (if ripped-p (setf (getf row :ripped) ripped))) + row) *db*))) + + One other new bit here is the use of SETF on a complex form such as +(getf row :title). I'll discuss SETF in greater detail in Chapter 6, +but for now you just need to know that it's a general assignment +operator that can be used to assign lots of "places" other than just +variables. (It's a coincidence that SETF and GETF have such similar +names-they don't have any special relationship.) For now it's enough to +know that after (setf (getf row :title) title), the plist referenced by +row will have the value of the variable title following the property +name :title. With this update function if you decide that you really +dig the Dixie Chicks and that all their albums should go to 11, you can +evaluate the following form: + + CL-USER> (update (where :artist "Dixie Chicks") :rating 11) + NIL + + And it is so. + + CL-USER> (select (where :artist "Dixie Chicks")) + ((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 11 :RIPPED T) + (:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 11 :RIPPED T)) + + You can even more easily add a function to delete rows from the +database. + + (defun delete-rows (selector-fn) + (setf *db* (remove-if selector-fn *db*))) + + The function REMOVE-IF is the complement of REMOVE-IF-NOT; it returns +a list with all the elements that do match the predicate removed. Like +REMOVE-IF-NOT, it doesn't actually affect the list it's passed but by +saving the result back into *db*, delete-rows (1) actually changes the +contents of the database. (2) + + ---------- Footnotes ---------- + + (1) You need to use the name delete-rows rather than the more obvious +delete because there's already a function in Common Lisp called DELETE. +The Lisp package system gives you a way to deal with such naming +conflicts, so you could have a function named delete if you wanted. But +I'm not ready to explain packages just yet. + + (2) If you're worried that this code creates a memory leak, rest +assured: Lisp was the language that invented garbage collection (and +heap allocation for that matter). The memory used by the old value of +*db* will be automatically reclaimed, assuming no one else is holding on +to a reference to it, which none of this code is. + + +File: pcl.info, Node: 3-8, Next: 3-9, Prev: 3-7, Up: Chapter 3 + +Removing Duplication and Winning Big +==================================== + +So far all the database code supporting insert, select, update, and +delete, not to mention a command-line user interface for adding new +records and dumping out the contents, is just a little more than 50 +lines. Total. (1) + + Yet there's still some annoying code duplication. And it turns out +you can remove the duplication and make the code more flexible at the +same time. The duplication I'm thinking of is in the where function. +The body of the where function is a bunch of clauses like this, one per +field: + + (if title (equal (getf cd :title) title) t) + + Right now it's not so bad, but like all code duplication it has the +same cost: if you want to change how it works, you have to change +multiple copies. And if you change the fields in a CD, you'll have to +add or remove clauses to where. And update suffers from the same kind +of duplication. It's doubly annoying since the whole point of the where +function is to dynamically generate a bit of code that checks the values +you care about; why should it have to do work at runtime checking +whether title was even passed in? + + Imagine that you were trying to optimize this code and discovered +that it was spending too much time checking whether title and the rest +of the keyword parameters to where were even set? (2) If you really +wanted to remove all those runtime checks, you could go through a +program and find all the places you call where and look at exactly what +arguments you're passing. Then you could replace each call to where +with an anonymous function that does only the computation necessary. +For instance, if you found this snippet of code: + + (select (where :title "Give Us a Break" :ripped t)) + + you could change it to this: + + (select + #'(lambda (cd) + (and (equal (getf cd :title) "Give Us a Break") + (equal (getf cd :ripped) t)))) + + Note that the anonymous function is different from the one that where +would have returned; you're not trying to save the call to where but +rather to provide a more efficient selector function. This anonymous +function has clauses only for the fields that you actually care about at +this call site, so it doesn't do any extra work the way a function +returned by where might. + + You can probably imagine going through all your source code and +fixing up all the calls to where in this way. But you can probably also +imagine that it would be a huge pain. If there were enough of them, and +it was important enough, it might even be worthwhile to write some kind +of preprocessor that converts where calls to the code you'd write by +hand. + + The Lisp feature that makes this trivially easy is its macro system. +I can't emphasize enough that the Common Lisp macro shares essentially +nothing but the name with the text-based macros found in C and C++. +Where the C pre-processor operates by textual substitution and +understands almost nothing of the structure of C and C++, a Lisp macro +is essentially a code generator that gets run for you automatically by +the compiler. (3) When a Lisp expression contains a call to a macro, +instead of evaluating the arguments and passing them to the function, +the Lisp compiler passes the arguments, unevaluated, to the macro code, +which returns a new Lisp expression that is then evaluated in place of +the original macro call. + + I'll start with a simple, and silly, example and then show how you +can replace the where function with a where macro. Before I can write +this example macro, I need to quickly introduce one new function: +REVERSE takes a list as an argument and returns a new list that is its +reverse. So (reverse '(1 2 3)) evaluates to (3 2 1). Now let's create +a macro. + + (defmacro backwards (expr) (reverse expr)) + + The main syntactic difference between a function and a macro is that +you define a macro with DEFMACRO instead of DEFUN. After that a macro +definition consists of a name, just like a function, a parameter list, +and a body of expressions, both also like a function. However, a macro +has a totally different effect. You can use this macro as follows: + + CL-USER> (backwards ("hello, world" t format)) + hello, world + NIL + + How did that work? When the REPL started to evaluate the backwards +expression, it recognized that backwards is the name of a macro. So it +left the expression ("hello, world" t format) unevaluated, which is good +because it isn't a legal Lisp form. It then passed that list to the +backwards code. The code in backwards passed the list to REVERSE, which +returned the list (format t "hello, world"). backwards then passed that +value back out to the REPL, which then evaluated it in place of the +original expression. + + The backwards macro thus defines a new language that's a lot like +Lisp-just backward-that you can drop into anytime simply by wrapping a +backward Lisp expression in a call to the backwards macro. And, in a +compiled Lisp program, that new language is just as efficient as normal +Lisp because all the macro code-the code that generates the new +expression-runs at compile time. In other words, the compiler will +generate exactly the same code whether you write (backwards ("hello, +world" t format)) or (format t "hello, world"). + + So how does that help with the code duplication in where? Well, you +can write a macro that generates exactly the code you need for each +particular call to where. Again, the best approach is to build our code +bottom up. In the hand-optimized selector function, you had an +expression of the following form for each actual field referred to in +the original call to where: + + (equal (getf cd field) value) + + So let's write a function that, given the name of a field and a +value, returns such an expression. Since an expression is just a list, +you might think you could write something like this: + + (defun make-comparison-expr (field value) ; wrong + (list equal (list getf cd field) value)) + + However, there's one trick here: as you know, when Lisp sees a simple +name such as field or value other than as the first element of a list, +it assumes it's the name of a variable and looks up its value. That's +fine for field and value; it's exactly what you want. But it will treat +equal, getf, and cd the same way, which isn't what you want. However, +you also know how to stop Lisp from evaluating a form: stick a single +forward quote (') in front of it. So if you write make-comparison-expr +like this, it will do what you want: + + (defun make-comparison-expr (field value) + (list 'equal (list 'getf 'cd field) value)) + + You can test it out in the REPL. + + CL-USER> (make-comparison-expr :rating 10) + (EQUAL (GETF CD :RATING) 10) + CL-USER> (make-comparison-expr :title "Give Us a Break") + (EQUAL (GETF CD :TITLE) "Give Us a Break") + + It turns out that there's an even better way to do it. What you'd +really like is a way to write an expression that's mostly not evaluated +and then have some way to pick out a few expressions that you do want +evaluated. And, of course, there's just such a mechanism. A back quote +(') before an expression stops evaluation just like a forward quote. + + CL-USER> `(1 2 3) + (1 2 3) + CL-USER> '(1 2 3) + (1 2 3) + + However, in a back-quoted expression, any subexpression that's +preceded by a comma is evaluated. Notice the effect of the comma in the +second expression: + + `(1 2 (+ 1 2)) ==> (1 2 (+ 1 2)) + `(1 2 ,(+ 1 2)) ==> (1 2 3) + + Using a back quote, you can write make-comparison-expr like this: + + (defun make-comparison-expr (field value) + `(equal (getf cd ,field) ,value)) + + Now if you look back to the hand-optimized selector function, you can +see that the body of the function consisted of one comparison expression +per field/value pair, all wrapped in an AND expression. Assume for the +moment that you'll arrange for the arguments to the where macro to be +passed as a single list. You'll need a function that can take the +elements of such a list pairwise and collect the results of calling +make-comparison-expr on each pair. To implement that function, you can +dip into the bag of advanced Lisp tricks and pull out the mighty and +powerful LOOP macro. + + (defun make-comparisons-list (fields) + (loop while fields + collecting (make-comparison-expr (pop fields) (pop fields)))) + + A full discussion of LOOP will have to wait until Chapter 22; for now +just note that this LOOP expression does exactly what you need: it loops +while there are elements left in the fields list, popping off two at a +time, passing them to make-comparison-expr, and collecting the results +to be returned at the end of the loop. The POP macro performs the +inverse operation of the PUSH macro you used to add records to *db*. + + Now you just need to wrap up the list returned by +make-comparison-list in an AND and an anonymous function, which you can +do in the where macro itself. Using a back quote to make a template +that you fill in by interpolating the value of make-comparisons-list, +it's trivial. + + (defmacro where (&rest clauses) + `#'(lambda (cd) (and ,@(make-comparisons-list clauses)))) + + This macro uses a variant of , (namely, the ,@) before the call to +make-comparisons-list. The ,@ "splices" the value of the following +expression-which must evaluate to a list-into the enclosing list. You +can see the difference between , and , in the following two expressions: + + `(and ,(list 1 2 3)) ==> (AND (1 2 3)) + `(and ,@(list 1 2 3)) ==> (AND 1 2 3) + + You can also use , to splice into the middle of a list. + + `(and ,@(list 1 2 3) 4) ==> (AND 1 2 3 4) + + The other important feature of the where macro is the use of &rest in +the argument list. Like &key, &rest modifies the way arguments are +parsed. With a &rest in its parameter list, a function or macro can +take an arbitrary number of arguments, which are collected into a single +list that becomes the value of the variable whose name follows the +&rest. So if you call where like this: + + (where :title "Give Us a Break" :ripped t) + + the variable clauses will contain the list. + + (:title "Give Us a Break" :ripped t) + + This list is passed to make-comparisons-list, which returns a list of +comparison expressions. You can see exactly what code a call to where +will generate using the function MACROEXPAND-1. If you pass +MACROEXPAND-1, a form representing a macro call, it will call the macro +code with appropriate arguments and return the expansion. So you can +check out the previous where call like this: + + CL-USER> (macroexpand-1 '(where :title "Give Us a Break" :ripped t)) + #'(LAMBDA (CD) + (AND (EQUAL (GETF CD :TITLE) "Give Us a Break") + (EQUAL (GETF CD :RIPPED) T))) + T + + Looks good. Let's try it for real. + + CL-USER> (select (where :title "Give Us a Break" :ripped t)) + ((:TITLE "Give Us a Break" :ARTIST "Limpopo" :RATING 10 :RIPPED T)) + + It works. And the where macro with its two helper functions is +actually one line shorter than the old where function. And it's more +general in that it's no longer tied to the specific fields in our CD +records. + + ---------- Footnotes ---------- + + (1) A friend of mine was once interviewing an engineer for a +programming job and asked him a typical interview question: how do you +know when a function or method is too big? Well, said the candidate, I +don't like any method to be bigger than my head. You mean you can't +keep all the details in your head? No, I mean I put my head up against +my monitor, and the code shouldn't be bigger than my head. + + (2) It's unlikely that the cost of checking whether keyword +parameters had been passed would be a detectible drag on performance +since checking whether a variable is NIL is going to be pretty cheap. +On the other hand, these functions returned by where are going to be +right in the middle of the inner loop of any select, update, or +delete-rows call, as they have to be called once per entry in the +database. Anyway, for illustrative purposes, this will have to do. + + (3) Macros are also run by the interpreter-however, it's easier to +understand the point of macros when you think about compiled code. As +with everything else in this chapter, I'll cover this in greater detail +in future chapters. + + +File: pcl.info, Node: 3-9, Next: Chapter 4, Prev: 3-7, Up: Chapter 3 + +Wrapping Up +=========== + +Now, an interesting thing has happened. You removed duplication and +made the code more efficient and more general at the same time. That's +often the way it goes with a well-chosen macro. This makes sense +because a macro is just another mechanism for creating +abstractions-abstraction at the syntactic level, and abstractions are by +definition more concise ways of expressing underlying generalities. Now +the only code in the mini-database that's specific to CDs and the fields +in them is in the make-cd, prompt-for-cd, and add-cd functions. In +fact, our new where macro would work with any plist-based database. + + However, this is still far from being a complete database. You can +probably think of plenty of features to add, such as supporting multiple +tables or more elaborate queries. In Chapter 27 we'll build an MP3 +database that incorporates some of those features. + + The point of this chapter was to give you a quick introduction to +just a handful of Lisp's features and show how they're used to write +code that's a bit more interesting than "hello, world." In the next +chapter we'll begin a more systematic overview of Lisp. + + +File: pcl.info, Node: Chapter 4, Next: Chapter 5, Prev: Chapter 3, Up: Top + +4. Syntax and Semantics +======================= + +After that whirlwind tour, we'll settle down for a few chapters to take +a more systematic look at the features you've used so far. I'll start +with an overview of the basic elements of Lisp's syntax and semantics, +which means, of course, that I must first address that burning question. +. . + +* Menu: + +* 4-1:: What's with All the Parentheses? +* 4-2:: Breaking Open the Black Box +* 4-3:: S-expressions +* 4-4:: S-expressions As Lisp Forms +* 4-5:: Function Calls +* 4-6:: Special Operators +* 4-7:: Macros +* 4-8:: Truth, Falsehood, and Equality +* 4-9:: Formatting Lisp Code + + +File: pcl.info, Node: 4-1, Next: 4-2, Prev: Chapter 4, Up: Chapter 4 + +What's with All the Parentheses? +================================ + +Lisp's syntax is quite a bit different from the syntax of languages +descended from Algol. The two most immediately obvious characteristics +are the extensive use of parentheses and prefix notation. For whatever +reason, a lot of folks are put off by this syntax. Lisp's detractors +tend to describe the syntax as "weird" and "annoying." Lisp, they say, +must stand for Lots of Irritating Superfluous Parentheses. Lisp folks, +on the other hand, tend to consider Lisp's syntax one of its great +virtues. How is it that what's so off-putting to one group is a source +of delight to another? + + I can't really make the complete case for Lisp's syntax until I've +explained Lisp's macros a bit more thoroughly, but I can start with an +historical tidbit that suggests it may be worth keeping an open mind: +when John McCarthy first invented Lisp, he intended to implement a more +Algol-like syntax, which he called M-expressions. However, he never got +around to it. He explained why not in his article "History of Lisp." +(1) + + The project of defining M-expressions precisely and compiling them + or at least translating them into S-expressions was neither + finalized nor explicitly abandoned. It just receded into the + indefinite future, and a new generation of programmers appeared who + preferred [S-expressions] to any FORTRAN-like or ALGOL-like + notation that could be devised. + + In other words, the people who have actually used Lisp over the past +45 years have liked the syntax and have found that it makes the language +more powerful. In the next few chapters, you'll begin to see why. + + ---------- Footnotes ---------- + + (1) http://www-formal.stanford.edu/jmc/history/lisp/node3.html + + +File: pcl.info, Node: 4-2, Next: 4-3, Prev: 4-1, Up: Chapter 4 + +Breaking Open the Black Box +=========================== + +Before we look at the specifics of Lisp's syntax and semantics, it's +worth taking a moment to look at how they're defined and how this +differs from many other languages. + + In most programming languages, the language processor-whether an +interpreter or a compiler-operates as a black box: you shove a sequence +of characters representing the text of a program into the black box, and +it-depending on whether it's an interpreter or a compiler-either +executes the behaviors indicated or produces a compiled version of the +program that will execute the behaviors when it's run. + + Inside the black box, of course, language processors are usually +divided into subsystems that are each responsible for one part of the +task of translating a program text into behavior or object code. A +typical division is to split the processor into three phases, each of +which feeds into the next: a lexical analyzer breaks up the stream of +characters into tokens and feeds them to a parser that builds a tree +representing the expressions in the program, according to the language's +grammar. This tree-called an abstract syntax tree-is then fed to an +evaluator that either interprets it directly or compiles it into some +other language such as machine code. Because the language processor is +a black box, the data structures used by the processor, such as the +tokens and abstract syntax trees, are of interest only to the language +implementer. + + In Common Lisp things are sliced up a bit differently, with +consequences for both the implementer and for how the language is +defined. Instead of a single black box that goes from text to program +behavior in one step, Common Lisp defines two black boxes, one that +translates text into Lisp objects and another that implements the +semantics of the language in terms of those objects. The first box is +called the reader, and the second is called the evaluator. (1) + + Each black box defines one level of syntax. The reader defines how +strings of characters can be translated into Lisp objects called +s-expressions. (2) Since the s-expression syntax includes syntax for +lists of arbitrary objects, including other lists, s-expressions can +represent arbitrary tree expressions, much like the abstract syntax tree +generated by the parsers for non-Lisp languages. + + The evaluator then defines a syntax of Lisp forms that can be built +out of s-expressions. Not all s-expressions are legal Lisp forms any +more than all sequences of characters are legal s-expressions. For +instance, both (foo 1 2) and ("foo" 1 2) are s-expressions, but only the +former can be a Lisp form since a list that starts with a string has no +meaning as a Lisp form. + + This split of the black box has a couple of consequences. One is +that you can use s-expressions, as you saw in Chapter 3, as an +externalizable data format for data other than source code, using READ +to read it and PRINT to print it. (3) The other consequence is that +since the semantics of the language are defined in terms of trees of +objects rather than strings of characters, it's easier to generate code +within the language than it would be if you had to generate code as +text. Generating code completely from scratch is only marginally +easier-building up lists vs. building up strings is about the same +amount of work. The real win, however, is that you can generate code by +manipulating existing data. This is the basis for Lisp's macros, which +I'll discuss in much more detail in future chapters. For now I'll focus +on the two levels of syntax defined by Common Lisp: the syntax of +s-expressions understood by the reader and the syntax of Lisp forms +understood by the evaluator. + + ---------- Footnotes ---------- + + (1) Lisp implementers, like implementers of any language, have many +ways they can implement an evaluator, ranging from a "pure" interpreter +that interprets the objects given to the evaluator directly to a +compiler that translates the objects into machine code that it then +runs. In the middle are implementations that compile the input into an +intermediate form such as bytecodes for a virtual machine and then +interprets the bytecodes. Most Common Lisp implementations these days +use some form of compilation even when evaluating code at run time. + + (2) Sometimes the phrase s-expression refers to the textual +representation and sometimes to the objects that result from reading the +textual representation. Usually either it's clear from context which is +meant or the distinction isn't that important. + + (3) Not all Lisp objects can be written out in a way that can be read +back in. But anything you can READ can be printed back out "readably" +with PRINT. + + +File: pcl.info, Node: 4-3, Next: 4-4, Prev: 4-2, Up: Chapter 4 + +S-expressions +============= + +The basic elements of s-expressions are lists and atoms. Lists are +delimited by parentheses and can contain any number of +whitespace-separated elements. Atoms are everything else. (1) The +elements of lists are themselves s-expressions (in other words, atoms or +nested lists). Comments-which aren't, technically speaking, +s-expressions-start with a semicolon, extend to the end of a line, and +are treated essentially like whitespace. + + And that's pretty much it. Since lists are syntactically so trivial, +the only remaining syntactic rules you need to know are those governing +the form of different kinds of atoms. In this section I'll describe the +rules for the most commonly used kinds of atoms: numbers, strings, and +names. After that, I'll cover how s-expressions composed of these +elements can be evaluated as Lisp forms. + + Numbers are fairly straightforward: any sequence of digits-possibly +prefaced with a sign (+ or -), containing a decimal point (.) or a +solidus (/), or ending with an exponent marker-is read as a number. For +example: + + 123 ; the integer one hundred twenty-three + 3/7 ; the ratio three-sevenths + 1.0 ; the floating-point number one in default precision + 1.0e0 ; another way to write the same floating-point number + 1.0d0 ; the floating-point number one in "double" precision + 1.0e-4 ; the floating-point equivalent to one-ten-thousandth + +42 ; the integer forty-two + -42 ; the integer negative forty-two + -1/4 ; the ratio negative one-quarter + -2/8 ; another way to write negative one-quarter + 246/2 ; another way to write the integer one hundred twenty-three + + These different forms represent different kinds of numbers: integers, +ratios, and floating point. Lisp also supports complex numbers, which +have their own notation and which I'll discuss in Chapter 10. + + As some of these examples suggest, you can notate the same number in +many ways. But regardless of how you write them, all rationals-integers +and ratios-are represented internally in "simplified" form. In other +words, the objects that represent -2/8 or 246/2 aren't distinct from the +objects that represent -1/4 and 123. Similarly, 1.0 and 1.0e0 are just +different ways of writing the same number. On the other hand, 1.0, +1.0d0, and 1 can all denote different objects because the different +floating-point representations and integers are different types. We'll +save the details about the characteristics of different kinds of numbers +for Chapter 10. + + Strings literals, as you saw in the previous chapter, are enclosed in +double quotes. Within a string a backslash (\) escapes the next +character, causing it to be included in the string regardless of what it +is. The only two characters that must be escaped within a string are +double quotes and the backslash itself. All other characters can be +included in a string literal without escaping, regardless of their +meaning outside a string. Some example string literals are as follows: + + "foo" ; the string containing the characters f, o, and o. + "fo\o" ; the same string + "fo\\o" ; the string containing the characters f, o, \, and o. + "fo\"o" ; the string containing the characters f, o, ", and o. + + Names used in Lisp programs, such as FORMAT and hello-world, and *db* +are represented by objects called symbols. The reader knows nothing +about how a given name is going to be used-whether it's the name of a +variable, a function, or something else. It just reads a sequence of +characters and builds an object to represent the name. (2) Almost any +character can appear in a name. Whitespace characters can't, though, +because the elements of lists are separated by whitespace. Digits can +appear in names as long as the name as a whole can't be interpreted as a +number. Similarly, names can contain periods, but the reader can't read +a name that consists only of periods. Ten characters that serve other +syntactic purposes can't appear in names: open and close parentheses, +double and single quotes, backtick, comma, colon, semicolon, backslash, +and vertical bar. And even those characters can, if you're willing to +escape them by preceding the character to be escaped with a backslash or +by surrounding the part of the name containing characters that need +escaping with vertical bars. + + Two important characteristics of the way the reader translates names +to symbol objects have to do with how it treats the case of letters in +names and how it ensures that the same name is always read as the same +symbol. While reading names, the reader converts all unescaped +characters in a name to their uppercase equivalents. Thus, the reader +will read foo, Foo, and FOO as the same symbol: FOO. However, \f\o\o and +|foo| will both be read as foo, which is a different object than the +symbol FOO. This is why when you define a function at the REPL and it +prints the name of the function, it's been converted to uppercase. +Standard style, these days, is to write code in all lowercase and let +the reader change names to uppercase. (3) + + To ensure that the same textual name is always read as the same +symbol, the reader interns symbols-after it has read the name and +converted it to all uppercase, the reader looks in a table called a +package for an existing symbol with the same name. If it can't find +one, it creates a new symbol and adds it to the table. Otherwise, it +returns the symbol already in the table. Thus, anywhere the same name +appears in any s-expression, the same object will be used to represent +it. (4) + + Because names can contain many more characters in Lisp than they can +in Algol-derived languages, certain naming conventions are distinct to +Lisp, such as the use of hyphenated names like hello-world. Another +important convention is that global variables are given names that start +and end with *. Similarly, constants are given names starting and +ending in +. And some programmers will name particularly low-level +functions with names that start with % or even %%. The names defined in +the language standard use only the alphabetic characters (A-Z) plus *, ++, -, /, 1, 2, <, =, >, and &. + + The syntax for lists, numbers, strings, and symbols can describe a +good percentage of Lisp programs. Other rules describe notations for +literal vectors, individual characters, and arrays, which I'll cover +when I talk about the associated data types in Chapters 10 and 11. For +now the key thing to understand is how you can combine numbers, strings, +and symbols with parentheses-delimited lists to build s-expressions +representing arbitrary trees of objects. Some simple examples look like +this: + + x ; the symbol X + () ; the empty list + (1 2 3) ; a list of three numbers + ("foo" "bar") ; a list of two strings + (x y z) ; a list of three symbols + (x 1 "foo") ; a list of a symbol, a number, and a string + (+ (* 2 3) 4) ; a list of a symbol, a list, and a number. + + An only slightly more complex example is the following four-item list +that contains two symbols, the empty list, and another list, itself +containing two symbols and a string: + + (defun hello-world () + (format t "hello, world")) + + ---------- Footnotes ---------- + + (1) The empty list, (), which can also be written NIL, is both an +atom and a list. + + (2) In fact, as you'll see later, names aren't intrinsically tied to +any one kind of thing. You can use the same name, depending on context, +to refer to both a variable and a function, not to mention several other +possibilities. + + (3) The case-converting behavior of the reader can, in fact, be +customized, but understanding when and how to change it requires a much +deeper discussion of the relation between names, symbols, and other +program elements than I'm ready to get into just yet. + + (4) I'll discuss the relation between symbols and packages in more +detail in Chapter 21. + + +File: pcl.info, Node: 4-4, Next: 4-5, Prev: 4-3, Up: Chapter 4 + +S-expressions As Lisp Forms +=========================== + +After the reader has translated a bunch of text into s-expressions, the +s-expressions can then be evaluated as Lisp code. Or some of them +can-not every s-expressions that the reader can read can necessarily be +evaluated as Lisp code. Common Lisp's evaluation rule defines a second +level of syntax that determines which s-expressions can be treated as +Lisp forms. (1) The syntactic rules at this level are quite simple. +Any atom-any nonlist or the empty list-is a legal Lisp form as is any +list that has a symbol as its first element. (2) + + Of course, the interesting thing about Lisp forms isn't their syntax +but how they're evaluated. For purposes of discussion, you can think of +the evaluator as a function that takes as an argument a syntactically +well-formed Lisp form and returns a value, which we can call the value +of the form. Of course, when the evaluator is a compiler, this is a bit +of a simplification-in that case, the evaluator is given an expression +and generates code that will compute the appropriate value when it's +run. But this simplification lets me describe the semantics of Common +Lisp in terms of how the different kinds of Lisp forms are evaluated by +this notional function. + + The simplest Lisp forms, atoms, can be divided into two categories: +symbols and everything else. A symbol, evaluated as a form, is +considered the name of a variable and evaluates to the current value of +the variable. (3) I'll discuss in Chapter 6 how variables get their +values in the first place. You should also note that certain +"variables" are that old oxymoron of programming: "constant variables." +For instance, the symbol PI names a constant variable whose value is the +best possible floating-point approximation to the mathematical constant +pi. + + All other atoms-numbers and strings are the kinds you've seen so +far-are self-evaluating objects. This means when such an expression is +passed to the notional evaluation function, it's simply returned. You +saw examples of self-evaluating objects in Chapter 2 when you typed 10 +and "hello, world" at the REPL. + + It's also possible for symbols to be self-evaluating in the sense +that the variables they name can be assigned the value of the symbol +itself. Two important constants that are defined this way are T and +NIL, the canonical true and false values. I'll discuss their role as +booleans in the section "Truth, Falsehood, and Equality." + + Another class of self-evaluating symbols are the keyword +symbols-symbols whose names start with :. When the reader interns such +a name, it automatically defines a constant variable with the name and +with the symbol as the value. + + Things get more interesting when we consider how lists are evaluated. +All legal list forms start with a symbol, but three kinds of list forms +are evaluated in three quite different ways. To determine what kind of +form a given list is, the evaluator must determine whether the symbol +that starts the list is the name of a function, a macro, or a special +operator. If the symbol hasn't been defined yet-as may be the case if +you're compiling code that contains references to functions that will be +defined later-it's assumed to be a function name. (4) I'll refer to the +three kinds of forms as function call forms, macro forms, and special +forms. + + ---------- Footnotes ---------- + + (1) Of course, other levels of correctness exist in Lisp, as in other +languages. For instance, the s-expression that results from reading +(foo 1 2) is syntactically well-formed but can be evaluated only if foo +is the name of a function or macro. + + (2) One other rarely used kind of Lisp form is a list whose first +element is a lambda form. I'll discuss this kind of form in Chapter 5. + + (3) One other possibility exists-it's possible to define symbol +macros that are evaluated slightly differently. We won't worry about +them. + + (4) In Common Lisp a symbol can name both an operator-function, +macro, or special operator-and a variable. This is one of the major +differences between Common Lisp and Scheme. The difference is sometimes +described as Common Lisp being a Lisp-2 vs. Scheme being a Lisp-1-a +Lisp-2 has two namespaces, one for operators and one for variables, but +a Lisp-1 uses a single namespace. Both choices have advantages, and +partisans can debate endlessly which is better. + + +File: pcl.info, Node: 4-5, Next: 4-6, Prev: 4-4, Up: Chapter 4 + +Function Calls +============== + +The evaluation rule for function call forms is simple: evaluate the +remaining elements of the list as Lisp forms and pass the resulting +values to the named function. This rule obviously places some +additional syntactic constraints on a function call form: all the +elements of the list after the first must themselves be well-formed Lisp +forms. In other words, the basic syntax of a function call form is as +follows, where each of the arguments is itself a Lisp form: + + (function-name argument*) + + Thus, the following expression is evaluated by first evaluating 1, +then evaluating 2, and then passing the resulting values to the + +function, which returns 3: + + (+ 1 2) + + A more complex expression such as the following is evaluated in +similar fashion except that evaluating the arguments (+ 1 2) and (- 3 4) +entails first evaluating their arguments and applying the appropriate +functions to them: + + (* (+ 1 2) (- 3 4)) + + Eventually, the values 3 and -1 are passed to the * function, which +returns -3. + + As these examples show, functions are used for many of the things +that require special syntax in other languages. This helps keep Lisp's +syntax regular. + + +File: pcl.info, Node: 4-6, Next: 4-7, Prev: 4-5, Up: Chapter 4 + +Special Operators +================= + +That said, not all operations can be defined as functions. Because all +the arguments to a function are evaluated before the function is called, +there's no way to write a function that behaves like the IF operator you +used in Chapter 3. To see why, consider this form: + + (if x (format t "yes") (format t "no")) + + If IF were a function, the evaluator would evaluate the argument +expressions from left to right. The symbol x would be evaluated as a +variable yielding some value; then (format t "yes") would be evaluated +as a function call, yielding NIL after printing "yes" to standard +output. Then (format t "no") would be evaluated, printing "no" and also +yielding NIL. Only after all three expressions were evaluated would the +resulting values be passed to IF, too late for it to control which of +the two FORMAT expressions gets evaluated. + + To solve this problem, Common Lisp defines a couple dozen so-called +special operators, IF being one, that do things that functions can't do. +There are 25 in all, but only a small handful are used directly in +day-to-day programming. (1) + + When the first element of a list is a symbol naming a special +operator, the rest of the expressions are evaluated according to the +rule for that operator. + + The rule for IF is pretty easy: evaluate the first expression. If it +evaluates to non-NIL, then evaluate the next expression and return its +value. Otherwise, return the value of evaluating the third expression +or NIL if the third expression is omitted. In other words, the basic +form of an IF expression is as follows: + + (if test-form then-form [ else-form ]) + + The test-form will always be evaluated and then one or the other of +the then-form or else-form. + + An even simpler special operator is QUOTE, which takes a single +expression as its "argument" and simply returns it, unevaluated. For +instance, the following evaluates to the list (+ 1 2), not the value 3: + + (quote (+ 1 2)) + + There's nothing special about this list; you can manipulate it just +like any list you could create with the LIST function. (2) + + QUOTE is used commonly enough that a special syntax for it is built +into the reader. Instead of writing the following: + + (quote (+ 1 2)) + you can write this: + + '(+ 1 2) + + This syntax is a small extension of the s-expression syntax +understood by the reader. From the point of view of the evaluator, both +those expressions will look the same: a list whose first element is the +symbol QUOTE and whose second element is the list (+ 1 2). (3) + + In general, the special operators implement features of the language +that require some special processing by the evaluator. For instance, +several special operators manipulate the environment in which other +forms will be evaluated. One of these, which I'll discuss in detail in +Chapter 6, is LET, which is used to create new variable bindings. The +following form evaluates to 10 because the second x is evaluated in an +environment where it's the name of a variable established by the LET +with the value 10: + + (let ((x 10)) x) + + ---------- Footnotes ---------- + + (1) The others provide useful, but somewhat esoteric, features. I'll +discuss them as the features they support come up. + + (2) Well, one difference exists-literal objects such as quoted lists, +but also including double-quoted strings, literal arrays, and vectors +(whose syntax you'll see later), must not be modified. Consequently, +any lists you plan to manipulate you should create with LIST. + + (3) This syntax is an example of a reader macro. Reader macros +modify the syntax the reader uses to translate text into Lisp objects. +It is, in fact, possible to define your own reader macros, but that's a +rarely used facility of the language. When most Lispers talk about +"extending the syntax" of the language, they're talking about regular +macros, as I'll discuss in a moment. + + +File: pcl.info, Node: 4-7, Next: 4-8, Prev: 4-6, Up: Chapter 4 + +Macros +====== + +While special operators extend the syntax of Common Lisp beyond what can +be expressed with just function calls, the set of special operators is +fixed by the language standard. Macros, on the other hand, give users +of the language a way to extend its syntax. As you saw in Chapter 3, a +macro is a function that takes s-expressions as arguments and returns a +Lisp form that's then evaluated in place of the macro form. The +evaluation of a macro form proceeds in two phases: First, the elements +of the macro form are passed, unevaluated, to the macro function. +Second, the form returned by the macro function-called its expansion-is +evaluated according to the normal evaluation rules. + + It's important to keep the two phases of evaluating a macro form +clear in your mind. It's easy to lose track when you're typing +expressions at the REPL because the two phases happen one after another +and the value of the second phase is immediately returned. But when +Lisp code is compiled, the two phases happen at completely different +times, so it's important to keep clear what's happening when. For +instance, when you compile a whole file of source code with the function +COMPILE-FILE, all the macro forms in the file are recursively expanded +until the code consists of nothing but function call forms and special +forms. This macroless code is then compiled into a FASL file that the +LOAD function knows how to load. The compiled code, however, isn't +executed until the file is loaded. Because macros generate their +expansion at compile time, they can do relatively large amounts of work +generating their expansion without having to pay for it when the file is +loaded or the functions defined in the file are called. + + Since the evaluator doesn't evaluate the elements of the macro form +before passing them to the macro function, they don't need to be +well-formed Lisp forms. Each macro assigns a meaning to the +s-expressions in the macro form by virtue of how it uses them to +generate its expansion. In other words, each macro defines its own +local syntax. For instance, the backwards macro from Chapter 3 defines +a syntax in which an expression is a legal backwards form if it's a list +that's the reverse of a legal Lisp form. + + I'll talk quite a bit more about macros throughout this book. For +now the important thing for you to realize is that macros-while +syntactically similar to function calls-serve quite a different purpose, +providing a hook into the compiler. (1) + + ---------- Footnotes ---------- + + (1) People without experience using Lisp's macros or, worse yet, +bearing the scars of C preprocessor-inflicted wounds, tend to get +nervous when they realize that macro calls look like regular function +calls. This turns out not to be a problem in practice for several +reasons. One is that macro forms are usually formatted differently than +function calls. For instance, you write the following: + + (dolist (x foo) + (print x)) + + rather than this: + + (dolist (x foo) (print x)) + + or + + (dolist (x foo) + (print x)) + + the way you would if DOLIST was a function. A good Lisp environment +will automatically format macro calls correctly, even for user-defined +macros. + + And even if a DOLIST form was written on a single line, there are +several clues that it's a macro: For one, the expression (x foo) is +meaningful by itself only if x is the name of a function or macro. +Combine that with the later occurrence of x as a variable, and it's +pretty suggestive that DOLIST is a macro that's creating a binding for a +variable named x. Naming conventions also help-looping constructs, +which are invariably macros-are frequently given names starting with do. + + +File: pcl.info, Node: 4-8, Next: 4-9, Prev: 4-7, Up: Chapter 4 + +Truth, Falsehood, and Equality +============================== + +Two last bits of basic knowledge you need to get under your belt are +Common Lisp's notion of truth and falsehood and what it means for two +Lisp objects to be "equal." Truth and falsehood are-in this +realm-straightforward: the symbol NIL is the only false value, and +everything else is true. The symbol T is the canonical true value and +can be used when you need to return a non-NIL value and don't have +anything else handy. The only tricky thing about NIL is that it's the +only object that's both an atom and a list: in addition to falsehood, +it's also used to represent the empty list. (1) This equivalence +between NIL and the empty list is built into the reader: if the reader +sees (), it reads it as the symbol NIL. They're completely +interchangeable. And because NIL, as I mentioned previously, is the +name of a constant variable with the symbol NIL as its value, the +expressions nil, (), 'nil, and '() all evaluate to the same thing-the +unquoted forms are evaluated as a reference to the constant variable +whose value is the symbol NIL, but in the quoted forms the QUOTE special +operator evaluates to the symbol directly. For the same reason, both t +and 't will evaluate to the same thing: the symbol T. + + Using phrases such as "the same thing" of course begs the question of +what it means for two values to be "the same." As you'll see in future +chapters, Common Lisp provides a number of type-specific equality +predicates: = is used to compare numbers, CHAR= to compare characters, +and so on. In this section I'll discuss the four "generic" equality +predicates-functions that can be passed any two Lisp objects and will +return true if they're equivalent and false otherwise. They are, in +order of discrimination, EQ, EQL, EQUAL, and EQUALP. + + EQ tests for "object identity"-two objects are EQ if they're +identical. Unfortunately, the object identity of numbers and characters +depends on how those data types are implemented in a particular Lisp. +Thus, EQ may consider two numbers or two characters with the same value +to be equivalent, or it may not. Implementations have enough leeway +that the expression (eq 3 3) can legally evaluate to either true or +false. More to the point, (eq x x) can evaluate to either true or false +if the value of x happens to be a number or character. + + Thus, you should never use EQ to compare values that may be numbers +or characters. It may seem to work in a predictable way for certain +values in a particular implementation, but you have no guarantee that it +will work the same way if you switch implementations. And switching +implementations may mean simply upgrading your implementation to a new +version-if your Lisp implementer changes how they represent numbers or +characters, the behavior of EQ could very well change as well. + + Thus, Common Lisp defines EQL to behave like EQ except that it also +is guaranteed to consider two objects of the same class representing the +same numeric or character value to be equivalent. Thus, (eql 1 1) is +guaranteed to be true. And (eql 1 1.0) is guaranteed to be false since +the integer value 1 and the floating-point value are instances of +different classes. + + There are two schools of thought about when to use EQ and when to use +EQL: The "use EQ when possible" camp argues you should use EQ when you +know you aren't going to be com-paring numbers or characters because (a) +it's a way to indicate that you aren't going to be comparing numbers or +characters and (b) it will be marginally more efficient since EQ doesn't +have to check whether its arguments are numbers or characters. + + The "always use EQL" camp says you should never use EQ because (a) +the potential gain in clarity is lost because every time someone reading +your code-including you-sees an EQ, they have to stop and check whether +it's being used correctly (in other words, that it's never going to be +called upon to compare numbers or characters) and (b) that the +efficiency difference between EQ and EQL is in the noise compared to +real performance bottlenecks. + + The code in this book is written in the "always use EQL" style. (2) + + The other two equality predicates, EQUAL and EQUALP, are general in +the sense that they can operate on all types of objects, but they're +much less fundamental than EQ or EQL. They each define a slightly less +discriminating notion of equivalence than EQL, allowing different +objects to be considered equivalent. There's nothing special about the +particular notions of equivalence these functions implement except that +they've been found to be handy by Lisp programmers in the past. If +these predicates don't suit your needs, you can always define your own +predicate function that compares different types of objects in the way +you need. + + EQUAL loosens the discrimination of EQL to consider lists equivalent +if they have the same structure and contents, recursively, according to +EQUAL. EQUAL also considers strings equivalent if they contain the same +characters. It also defines a looser definition of equivalence than EQL +for bit vectors and pathnames, two data types I'll discuss in future +chapters. For all other types, it falls back on EQL. + + EQUALP is similar to EQUAL except it's even less discriminating. It +considers two strings equivalent if they contain the same characters, +ignoring differences in case. It also considers two characters +equivalent if they differ only in case. Numbers are equivalent under +EQUALP if they represent the same mathematical value. Thus, (equalp 1 +1.0) is true. Lists with EQUALP elements are EQUALP; likewise, arrays +with EQUALP elements are EQUALP. As with EQUAL, there are a few other +data types that I haven't covered yet for which EQUALP can consider two +objects equivalent that neither EQL nor EQUAL will. For all other data +types, EQUALP falls back on EQL. + + ---------- Footnotes ---------- + + (1) Using the empty list as false is a reflection of Lisp's heritage +as a list-processing language much as the use of the integer 0 as false +in C is a reflection of its heritage as a bit-twiddling language. Not +all Lisps handle boolean values the same way. Another of the many +subtle differences upon which a good Common Lisp vs. Scheme flame war +can rage for days is Scheme's use of a distinct false value #f, which +isn't the same value as either the symbol nil or the empty list, which +are also distinct from each other. + + (2) Even the language standard is a bit ambivalent about which of EQ +or EQL should be preferred. Object identity is defined by EQ, but the +standard defines the phrase the same when talking about objects to mean +EQL unless another predicate is explicitly mentioned. Thus, if you want +to be 100 percent technically correct, you can say that (- 3 2) and (- 4 +3) evaluate to "the same" object but not that they evaluate to +"identical" objects. This is, admittedly, a bit of an +angels-on-pinheads kind of issue. + + +File: pcl.info, Node: 4-9, Next: Chapter 5, Prev: 4-7, Up: Chapter 4 + +Formatting Lisp Code +==================== + +While code formatting is, strictly speaking, neither a syntactic nor a +semantic matter, proper formatting is important to reading and writing +code fluently and idiomatically. The key to formatting Lisp code is to +indent it properly. The indentation should reflect the structure of the +code so that you don't need to count parentheses to see what goes with +what. In general, each new level of nesting gets indented a bit more, +and, if line breaks are necessary, items at the same level of nesting +are lined up. Thus, a function call that needs to be broken up across +multiple lines might be written like this: + + (some-function arg-with-a-long-name + another-arg-with-an-even-longer-name) + + Macro and special forms that implement control constructs are +typically indented a little differently: the "body" elements are +indented two spaces relative to the opening parenthesis of the form. +Thus: + + (defun print-list (list) + (dolist (i list) + (format t "item: ~a~%" i))) + + However, you don't need to worry too much about these rules because a +proper Lisp environment such as SLIME will take care of it for you. In +fact, one of the advantages of Lisp's regular syntax is that it's fairly +easy for software such as editors to know how to indent it. Since the +indentation is supposed to reflect the structure of the code and the +structure is marked by parentheses, it's easy to let the editor indent +your code for you. + + In SLIME, hitting Tab at the beginning of each line will cause it to +be indented appropriately, or you can re-indent a whole expression by +positioning the cursor on the opening parenthesis and typing C-M-q. Or +you can re-indent the whole body of a function from anywhere within it +by typing C-c M-q. + + Indeed, experienced Lisp programmers tend to rely on their editor +handling indenting automatically, not just to make their code look nice +but to detect typos: once you get used to how code is supposed to be +indented, a misplaced parenthesis will be instantly recognizable by the +weird indentation your editor gives you. For example, suppose you were +writing a function that was supposed to look like this: + + (defun foo () + (if (test) + (do-one-thing) + (do-another-thing))) + + Now suppose you accidentally left off the closing parenthesis after +test. Because you don't bother counting parentheses, you quite likely +would have added an extra parenthesis at the end of the DEFUN form, +giving you this code: + + (defun foo () + (if (test + (do-one-thing) + (do-another-thing)))) + + However, if you had been indenting by hitting Tab at the beginning of +each line, you wouldn't have code like that. Instead you'd have this: + + (defun foo () + (if (test + (do-one-thing) + (do-another-thing)))) + + Seeing the then and else clauses indented way out under the condition +rather than just indented slightly relative to the IF shows you +immediately that something is awry. + + Another important formatting rule is that closing parentheses are +always put on the same line as the last element of the list they're +closing. That is, don't write this: + + (defun foo () + (dotimes (i 10) + (format t "~d. hello~%" i) + ) + ) + + but instead write this: + + (defun foo () + (dotimes (i 10) + (format t "~d. hello~%" i))) + + The string of )))s at the end may seem forbidding, but as long your +code is properly indented the parentheses should fade away-no need to +give them undue prominence by spreading them across several lines. + + Finally, comments should be prefaced with one to four semicolons +depending on the scope of the comment as follows: + + ;;;; Four semicolons are used for a file header comment. + + ;;; A comment with three semicolons will usually be a paragraph + ;;; comment that applies to a large section of code that follows, + + (defun foo (x) + (dotimes (i x) + ;; Two semicolons indicate this comment applies to the code + ;; that follows. Note that this comment is indented the same + ;; as the code that follows. + (some-function-call) + (another i) ; this comment applies to this line only + (and-another) ; and this is for this line + (baz))) + + Now you're ready to start looking in greater detail at the major +building blocks of Lisp programs, functions, variables, and macros. Up +next: functions. + + +File: pcl.info, Node: Chapter 5, Next: Chapter 6, Prev: Chapter 4, Up: Top + +5. Functions +============ + +After the rules of syntax and semantics, the three most basic components +of all Lisp programs are functions, variables and macros. You used all +three while building the database in Chapter 3, but I glossed over a lot +of the details of how they work and how to best use them. I'll devote +the next few chapters to these three topics, starting with functions, +which-like their counterparts in other languages-provide the basic +mechanism for abstracting, well, functionality. + + The bulk of Lisp itself consists of functions. More than three +quarters of the names defined in the language standard name functions. +All the built-in data types are defined purely in terms of what +functions operate on them. Even Lisp's powerful object system is built +upon a conceptual extension to functions, generic functions, which I'll +cover in Chapter 16. + + And, despite the importance of macros to The Lisp Way, in the end all +real functionality is provided by functions. Macros run at compile +time, so the code they generate-the code that will actually make up the +program after all the macros are expanded-will consist entirely of calls +to functions and special operators. Not to mention, macros themselves +are also functions, albeit functions that are used to generate code +rather than to perform the actions of the program. (1) + +* Menu: + +* 5-1:: Defining New Functions +* 5-2:: Function Parameter Lists +* 5-3:: Optional Parameters +* 5-4:: Rest Parameters +* 5-5:: Keyword Parameters +* 5-6:: Mixing Different Parameter Types +* 5-7:: Function Return Values +* 5-8:: Functions As Data, a.k.a. Higher-Order Functions +* 5-9:: Anonymous Functions + + ---------- Footnotes ---------- + + (1) Despite the importance of functions in Common Lisp, it isn't +really accurate to describe it as a functional language. It's true some +of Common Lisp's features, such as its list manipulation functions, are +designed to be used in a body-form* style and that Lisp has a prominent +place in the history of functional programming-McCarthy introduced many +ideas that are now considered important in functional programming-but +Common Lisp was intentionally designed to support many different styles +of programming. In the Lisp family, Scheme is the nearest thing to a +"pure" functional language, and even it has several features that +disqualify it from absolute purity compared to languages such as Haskell +and ML. + + +File: pcl.info, Node: 5-1, Next: 5-2, Prev: Chapter 5, Up: Chapter 5 + +Defining New Functions +====================== + +Normally functions are defined using the DEFUN macro. The basic +skeleton of a DEFUN looks like this: + + (defun name (parameter*) + "Optional documentation string." + body-form*) + + Any symbol can be used as a function name. (1) Usually function +names contain only alphabetic characters and hyphens, but other +characters are allowed and are used in certain naming conventions. For +instance, functions that convert one kind of value to another sometimes +use -> in the name. For example, a function to convert strings to +widgets might be called string->widget. The most important naming +convention is the one mentioned in Chapter 2, which is that you +construct compound names with hyphens rather than underscores or inner +caps. Thus, frob-widget is better Lisp style than either frob_widget or +frobWidget. + + A function's parameter list defines the variables that will be used +to hold the arguments passed to the function when it's called. (2) If +the function takes no arguments, the list is empty, written as (). +Different flavors of parameters handle required, optional, multiple, and +keyword arguments. I'll discuss the details in the next section. + + If a string literal follows the parameter list, it's a documentation +string that should describe the purpose of the function. When the +function is defined, the documentation string will be associated with +the name of the function and can later be obtained using the +DOCUMENTATION function. (3) + + Finally, the body of a DEFUN consists of any number of Lisp +expressions. They will be evaluated in order when the function is +called and the value of the last expression is returned as the value of +the function. Or the RETURN-FROM special operator can be used to return +immediately from anywhere in a function, as I'll discuss in a moment. + + In Chapter 2 we wrote a hello-world function, which looked like this: + + (defun hello-world () (format t "hello, world")) + + You can now analyze the parts of this function. Its name is +hello-world, its parameter list is empty so it takes no arguments, it +has no documentation string, and its body consists of one expression. + + (format t "hello, world") + + The following is a slightly more complex function: + + (defun verbose-sum (x y) + "Sum any two numbers after printing a message." + (format t "Summing ~d and ~d.~%" x y) + (+ x y)) + + This function is named verbose-sum, takes two arguments that will be +bound to the parameters x and y, has a documentation string, and has a +body consisting of two expressions. The value returned by the call to + +becomes the return value of verbose-sum. + + ---------- Footnotes ---------- + + (1) Well, almost any symbol. It's undefined what happens if you use +any of the names defined in the language standard as a name for one of +your own functions. However, as you'll see in Chapter 21, the Lisp +package system allows you to create names in different namespaces, so +this isn't really an issue. + + (2) Parameter lists are sometimes also called lambda lists because of +the historical relationship between Lisp's notion of functions and the +lambda calculus. + + (3) For example, the following: + + (documentation 'foo 'function) + + returns the documentation string for the function foo. Note, +however, that documentation strings are intended for human consumption, +not programmatic access. A Lisp implementation isn't required to store +them and is allowed to discard them at any time, so portable programs +shouldn't depend on their presence. In some implementations an +implementation-defined variable needs to be set before it will store +documentation strings. + + +File: pcl.info, Node: 5-2, Next: 5-3, Prev: 5-1, Up: Chapter 5 + +Function Parameter Lists +======================== + +There's not a lot more to say about function names or documentation +strings, and it will take a good portion of the rest of this book to +describe all the things you can do in the body of a function, which +leaves us with the parameter list. + + The basic purpose of a parameter list is, of course, to declare the +variables that will receive the arguments passed to the function. When +a parameter list is a simple list of variable names-as in +verbose-sum-the parameters are called required parameters. When a +function is called, it must be supplied with one argument for every +required parameter. Each parameter is bound to the corresponding +argument. If a function is called with too few or too many arguments, +Lisp will signal an error. + + However, Common Lisp's parameter lists also give you more flexible +ways of mapping the arguments in a function call to the function's +parameters. In addition to required parameters, a function can have +optional parameters. Or a function can have a single parameter that's +bound to a list containing any extra arguments. And, finally, arguments +can be mapped to parameters using keywords rather than position. Thus, +Common Lisp's parameter lists provide a convenient solution to several +common coding problems. + + +File: pcl.info, Node: 5-3, Next: 5-4, Prev: 5-2, Up: Chapter 5 + +Optional Parameters +=================== + +While many functions, like verbose-sum, need only required parameters, +not all functions are quite so simple. Sometimes a function will have a +parameter that only certain callers will care about, perhaps because +there's a reasonable default value. An example is a function that +creates a data structure that can grow as needed. Since the data +structure can grow, it doesn't matter-from a correctness point of +view-what the initial size is. But callers who have a good idea how +many items they're going to put into the data structure may be able to +improve performance by specifying a specific initial size. Most +callers, though, would probably rather let the code that implements the +data structure pick a good general-purpose value. In Common Lisp you +can accommodate both kinds of callers by using an optional parameter; +callers who don't care will get a reasonable default, and other callers +can provide a specific value. (1) + + To define a function with optional parameters, after the names of any +required parameters, place the symbol &optional followed by the names of +the optional parameters. A simple example looks like this: + + (defun foo (a b &optional c d) (list a b c d)) + + When the function is called, arguments are first bound to the +required parameters. After all the required parameters have been given +values, if there are any arguments left, their values are assigned to +the optional parameters. If the arguments run out before the optional +parameters do, the remaining optional parameters are bound to the value +NIL. Thus, the function defined previously gives the following results: + + (foo 1 2) ==> (1 2 NIL NIL) + (foo 1 2 3) ==> (1 2 3 NIL) + (foo 1 2 3 4) ==> (1 2 3 4) + + Lisp will still check that an appropriate number of arguments are +passed to the function-in this case between two and four, inclusive-and +will signal an error if the function is called with too few or too many. + + Of course, you'll often want a different default value than NIL. You +can specify the default value by replacing the parameter name with a +list containing a name and an expression. The expression will be +evaluated only if the caller doesn't pass enough arguments to provide a +value for the optional parameter. The common case is simply to provide +a value as the expression. + + (defun foo (a &optional (b 10)) (list a b)) + + This function requires one argument that will be bound to the +parameter a. The second parameter, b, will take either the value of the +second argument, if there is one, or 10. + + (foo 1 2) ==> (1 2) + (foo 1) ==> (1 10) + + Sometimes, however, you may need more flexibility in choosing the +default value. You may want to compute a default value based on other +parameters. And you can-the default-value expression can refer to +parameters that occur earlier in the parameter list. If you were +writing a function that returned some sort of representation of a +rectangle and you wanted to make it especially convenient to make +squares, you might use an argument list like this: + + (defun make-rectangle (width &optional (height width)) ...) + + which would cause the height parameter to take the same value as the +width parameter unless explicitly specified. + + Occasionally, it's useful to know whether the value of an optional +argument was supplied by the caller or is the default value. Rather +than writing code to check whether the value of the parameter is the +default (which doesn't work anyway, if the caller happens to explicitly +pass the default value), you can add another variable name to the +parameter specifier after the default-value expression. This variable +will be bound to true if the caller actually supplied an argument for +this parameter and NIL otherwise. By convention, these variables are +usually named the same as the actual parameter with a "-supplied-p" on +the end. For example: + + (defun foo (a b &optional (c 3 c-supplied-p)) + (list a b c c-supplied-p)) + + This gives results like this: + + (foo 1 2) ==> (1 2 3 NIL) + (foo 1 2 3) ==> (1 2 3 T) + (foo 1 2 4) ==> (1 2 4 T) + + ---------- Footnotes ---------- + + (1) In languages that don't support optional parameters directly, +programmers typically find ways to simulate them. One technique is to +use distinguished "no-value" values that the caller can pass to indicate +they want the default value of a given parameter. In C, for example, +it's common to use NULL as such a distinguished value. However, such a +protocol between the function and its callers is ad hoc-in some +functions or for some arguments NULL may be the distinguished value +while in other functions or for other arguments the magic value may be +-1 or some #defined constant. + + +File: pcl.info, Node: 5-4, Next: 5-5, Prev: 5-3, Up: Chapter 5 + +Rest Parameters +=============== + +Optional parameters are just the thing when you have discrete parameters +for which the caller may or may not want to provide values. But some +functions need to take a variable number of arguments. Several of the +built-in functions you've seen already work this way. FORMAT has two +required arguments, the stream and the control string. But after that +it needs a variable number of arguments depending on how many values +need to be interpolated into the control string. The + function also +takes a variable number of arguments-there's no particular reason to +limit it to summing just two numbers; it will sum any number of values. +(It even works with zero arguments, returning 0, the identity under +addition.) The following are all legal calls of those two functions: + + (format t "hello, world") + (format t "hello, ~a" name) + (format t "x: ~d y: ~d" x y) + (+) + (+ 1) + (+ 1 2) + (+ 1 2 3) + + Obviously, you could write functions taking a variable number of +arguments by simply giving them a lot of optional parameters. But that +would be incredibly painful-just writing the parameter list would be bad +enough, and that doesn't get into dealing with all the parameters in the +body of the function. To do it properly, you'd have to have as many +optional parameters as the number of arguments that can legally be +passed in a function call. This number is implementation dependent but +guaranteed to be at least 50. And in current implementations it ranges +from 4,096 to 536,870,911. (1) Blech. That kind of mind-bending tedium +is definitely not The Lisp Way. + + Instead, Lisp lets you include a catchall parameter after the symbol +&rest. If a function includes a &rest parameter, any arguments +remaining after values have been doled out to all the required and +optional parameters are gathered up into a list that becomes the value +of the &rest parameter. Thus, the parameter lists for FORMAT and + +probably look something like this: + + (defun format (stream string &rest values) ...) + (defun + (&rest numbers) ...) + + ---------- Footnotes ---------- + + (1) The constant CALL-ARGUMENTS-LIMIT tells you the +implementation-specific value. + + +File: pcl.info, Node: 5-5, Next: 5-6, Prev: 5-4, Up: Chapter 5 + +Keyword Parameters +================== + +Optional and rest parameters give you quite a bit of flexibility, but +neither is going to help you out much in the following situation: +Suppose you have a function that takes four optional parameters. Now +suppose that most of the places the function is called, the caller wants +to provide a value for only one of the four parameters and, further, +that the callers are evenly divided as to which parameter they will use. + + The callers who want to provide a value for the first parameter are +fine-they just pass the one optional argument and leave off the rest. +But all the other callers have to pass some value for between one and +three arguments they don't care about. Isn't that exactly the problem +optional parameters were designed to solve? + + Of course it is. The problem is that optional parameters are still +positional-if the caller wants to pass an explicit value for the fourth +optional parameter, it turns the first three optional parameters into +required parameters for that caller. Luckily, another parameter flavor, +keyword parameters, allow the caller to specify which values go with +which parameters. + + To give a function keyword parameters, after any required, &optional, +and &rest parameters you include the symbol &key and then any number of +keyword parameter specifiers, which work like optional parameter +specifiers. Here's a function that has only keyword parameters: + + (defun foo (&key a b c) (list a b c)) + + When this function is called, each keyword parameters is bound to the +value immediately following a keyword of the same name. Recall from +Chapter 4 that keywords are names that start with a colon and that +they're automatically defined as self-evaluating constants. + + If a given keyword doesn't appear in the argument list, then the +corresponding parameter is assigned its default value, just like an +optional parameter. Because the keyword arguments are labeled, they can +be passed in any order as long as they follow any required arguments. +For instance, foo can be invoked as follows: + + (foo) ==> (NIL NIL NIL) + (foo :a 1) ==> (1 NIL NIL) + (foo :b 1) ==> (NIL 1 NIL) + (foo :c 1) ==> (NIL NIL 1) + (foo :a 1 :c 3) ==> (1 NIL 3) + (foo :a 1 :b 2 :c 3) ==> (1 2 3) + (foo :a 1 :c 3 :b 2) ==> (1 2 3) + + As with optional parameters, keyword parameters can provide a default +value form and the name of a supplied-p variable. In both keyword and +optional parameters, the default value form can refer to parameters that +appear earlier in the parameter list. + + (defun foo (&key (a 0) (b 0 b-supplied-p) (c (+ a b))) + (list a b c b-supplied-p)) + + (foo :a 1) ==> (1 0 1 NIL) + (foo :b 1) ==> (0 1 1 T) + (foo :b 1 :c 4) ==> (0 1 4 T) + (foo :a 2 :b 1 :c 4) ==> (2 1 4 T) + + Also, if for some reason you want the keyword the caller uses to +specify the parameter to be different from the name of the actual +parameter, you can replace the parameter name with another list +containing the keyword to use when calling the function and the name to +be used for the parameter. The following definition of foo: + + (defun foo (&key ((:apple a)) ((:box b) 0) ((:charlie c) 0 c-supplied-p)) + (list a b c c-supplied-p)) + + lets the caller call it like this: + + (foo :apple 10 :box 20 :charlie 30) ==> (10 20 30 T) + + This style is mostly useful if you want to completely decouple the +public API of the function from the internal details, usually because +you want to use short variable names internally but descriptive keywords +in the API. It's not, however, very frequently used. + + +File: pcl.info, Node: 5-6, Next: 5-7, Prev: 5-5, Up: Chapter 5 + +Mixing Different Parameter Types +================================ + +It's possible, but rare, to use all four flavors of parameters in a +single function. Whenever more than one flavor of parameter is used, +they must be declared in the order I've discussed them: first the names +of the required parameters, then the optional parameters, then the rest +parameter, and finally the keyword parameters. Typically, however, in +functions that use multiple flavors of parameters, you'll combine +required parameters with one other flavor or possibly combine &optional +and &rest parameters. The other two combinations, either &optional or +&rest parameters combined with &key parameters, can lead to somewhat +surprising behavior. + + Combining &optional and &key parameters yields surprising enough +results that you should probably avoid it altogether. The problem is +that if a caller doesn't supply values for all the optional parameters, +then those parameters will eat up the keywords and values intended for +the keyword parameters. For instance, this function unwisely mixes +&optional and &key parameters: + + (defun foo (x &optional y &key z) (list x y z)) + + If called like this, it works fine: + + (foo 1 2 :z 3) ==> (1 2 3) + + And this is also fine: + + (foo 1) ==> (1 nil nil) + + But this will signal an error: + + (foo 1 :z 3) ==> ERROR + + This is because the keyword :z is taken as a value to fill the +optional y parameter, leaving only the argument 3 to be processed. At +that point, Lisp will be expecting either a keyword/value pair or +nothing and will complain. Perhaps even worse, if the function had had +two &optional parameters, this last call would have resulted in the +values :z and 3 being bound to the two &optional parameters and the &key +parameter z getting the default value NIL with no indication that +anything was amiss. + + In general, if you find yourself writing a function that uses both +&optional and &key parameters, you should probably just change it to use +all &key parameters-they're more flexible, and you can always add new +keyword parameters without disturbing existing callers of the function. +You can also remove keyword parameters, as long as no one is using them. +(1) In general, using keyword parameters helps make code much easier to +maintain and evolve-if you need to add some new behavior to a function +that requires new parameters, you can add keyword parameters without +having to touch, or even recompile, any existing code that calls the +function. + + You can safely combine &rest and &key parameters, but the behavior +may be a bit surprising initially. Normally the presence of either +&rest or &key in a parameter list causes all the values remaining after +the required and &optional parameters have been filled in to be +processed in a particular way-either gathered into a list for a &rest +parameter or assigned to the appropriate &key parameters based on the +keywords. If both &rest and &key appear in a parameter list, then both +things happen-all the remaining values, which include the keywords +themselves, are gathered into a list that's bound to the &rest +parameter, and the appropriate values are also bound to the &key +parameters. So, given this function: + + (defun foo (&rest rest &key a b c) (list rest a b c)) + + you get this result: + + (foo :a 1 :b 2 :c 3) ==> ((:A 1 :B 2 :C 3) 1 2 3) + + ---------- Footnotes ---------- + + (1) Four standard functions take both &optional and &key +arguments-READ-FROM-STRING, PARSE-NAMESTRING, WRITE-LINE, and +WRITE-STRING. They were left that way during standardization for +backward compatibility with earlier Lisp dialects. READ-FROM-STRING +tends to be the one that catches new Lisp programmers most frequently-a +call such as (read-from-string s :start 10) seems to ignore the :start +keyword argument, reading from index 0 instead of 10. That's because +READ-FROM-STRING also has two &optional parameters that swallowed up the +arguments :start and 10. + + +File: pcl.info, Node: 5-7, Next: 5-8, Prev: 5-6, Up: Chapter 5 + +Function Return Values +====================== + +All the functions you've written so far have used the default behavior +of returning the value of the last expression evaluated as their own +return value. This is the most common way to return a value from a +function. + + However, sometimes it's convenient to be able to return from the +middle of a function such as when you want to break out of nested +control constructs. In such cases you can use the RETURN-FROM special +operator to immediately return any value from the function. + + You'll see in Chapter 20 that RETURN-FROM is actually not tied to +functions at all; it's used to return from a block of code defined with +the BLOCK special operator. However, DEFUN automatically wraps the +whole function body in a block with the same name as the function. So, +evaluating a RETURN-FROM with the name of the function and the value you +want to return will cause the function to immediately exit with that +value. RETURN-FROM is a special operator whose first "argument" is the +name of the block from which to return. This name isn't evaluated and +thus isn't quoted. + + The following function uses nested loops to find the first pair of +numbers, each less than 10, whose product is greater than the argument, +and it uses RETURN-FROM to return the pair as soon as it finds it: + + (defun foo (n) + (dotimes (i 10) + (dotimes (j 10) + (when (> (* i j) n) + (return-from foo (list i j)))))) + + Admittedly, having to specify the name of the function you're +returning from is a bit of a pain-for one thing, if you change the +function's name, you'll need to change the name used in the RETURN-FROM +as well. (1) But it's also the case that explicit RETURN-FROMs are used +much less frequently in Lisp than return statements in C-derived +languages, because all Lisp expressions, including control constructs +such as loops and conditionals, evaluate to a value. So it's not much +of a problem in practice. + + ---------- Footnotes ---------- + + (1) Another macro, RETURN, doesn't require a name. However, you +can't use it instead of RETURN-FROM to avoid having to specify the +function name; it's syntactic sugar for returning from a block named +NIL. I'll cover it, along with the details of BLOCK and RETURN-FROM, in +Chapter 20. + + +File: pcl.info, Node: 5-8, Next: 5-9, Prev: 5-7, Up: Chapter 5 + +Functions As Data, a.k.a. Higher-Order Functions +================================================ + +While the main way you use functions is to call them by name, a number +of situations exist where it's useful to be able treat functions as +data. For instance, if you can pass one function as an argument to +another, you can write a general-purpose sorting function while allowing +the caller to provide a function that's responsible for comparing any +two elements. Then the same underlying algorithm can be used with many +different comparison functions. Similarly, callbacks and hooks depend +on being able to store references to code in order to run it later. +Since functions are already the standard way to abstract bits of code, +it makes sense to allow functions to be treated as data. (1) + + In Lisp, functions are just another kind of object. When you define +a function with DEFUN, you're really doing two things: creating a new +function object and giving it a name. It's also possible, as you saw in +Chapter 3, to use LAMBDA expressions to create a function without giving +it a name. The actual representation of a function object, whether +named or anonymous, is opaque-in a native-compiling Lisp, it probably +consists mostly of machine code. The only things you need to know are +how to get hold of it and how to invoke it once you've got it. + + The special operator FUNCTION provides the mechanism for getting at a +function object. It takes a single argument and returns the function +with that name. The name isn't quoted. Thus, if you've defined a +function foo, like so: + + CL-USER> (defun foo (x) (* 2 x)) + FOO + + you can get the function object like this: (2) + + CL-USER> (function foo) + #<Interpreted Function FOO> + + In fact, you've already used FUNCTION, but it was in disguise. The +syntax #', which you used in Chapter 3, is syntactic sugar for FUNCTION, +just the way ' is syntactic sugar for QUOTE. (3) Thus, you can also get +the function object for foo like this: + + CL-USER> #'foo + #<Interpreted Function FOO> + + Once you've got the function object, there's really only one thing +you can do with it-invoke it. Common Lisp provides two functions for +invoking a function through a function object: FUNCALL and APPLY. (4) +They differ only in how they obtain the arguments to pass to the +function. + + FUNCALL is the one to use when you know the number of arguments +you're going to pass to the function at the time you write the code. +The first argument to FUNCALL is the function object to be invoked, and +the rest of the arguments are passed onto that function. Thus, the +following two expressions are equivalent: + + (foo 1 2 3) === (funcall #'foo 1 2 3) + + However, there's little point in using FUNCALL to call a function +whose name you know when you write the code. In fact, the previous two +expressions will quite likely compile to exactly the same machine +instructions. + + The following function demonstrates a more apt use of FUNCALL. It +accepts a function object as an argument and plots a simple ASCII-art +histogram of the values returned by the argument function when it's +invoked on the values from min to max, stepping by step. + + (defun plot (fn min max step) + (loop for i from min to max by step do + (loop repeat (funcall fn i) do (format t "*")) + (format t "~%"))) + + The FUNCALL expression computes the value of the function for each +value of i. The inner LOOP uses that computed value to determine how +many times to print an asterisk to standard output. + + Note that you don't use FUNCTION or #' to get the function value of +fn; you want it to be interpreted as a variable because it's the +variable's value that will be the function object. You can call plot +with any function that takes a single numeric argument, such as the +built-in function EXP that returns the value of e raised to the power of +its argument. + + CL-USER> (plot #'exp 0 4 1/2) + * + * + ** + **** + ******* + ************ + ******************** + ********************************* + ****************************************************** + NIL + + FUNCALL, however, doesn't do you any good when the argument list is +known only at runtime. For instance, to stick with the plot function +for another moment, suppose you've obtained a list containing a function +object, a minimum and maximum value, and a step value. In other words, +the list contains the values you want to pass as arguments to plot. +Suppose this list is in the variable plot-data. You could invoke plot +on the values in that list like this: + + (plot (first plot-data) (second plot-data) (third plot-data) (fourth plot-data)) + + This works fine, but it's pretty annoying to have to explicitly +unpack the arguments just so you can pass them to plot. + + That's where APPLY comes in. Like FUNCALL, the first argument to +APPLY is a function object. But after the function object, instead of +individual arguments, it expects a list. It then applies the function +to the values in the list. This allows you to write the following +instead: + + (apply #'plot plot-data) + + As a further convenience, APPLY can also accept "loose" arguments as +long as the last argument is a list. Thus, if plot-data contained just +the min, max, and step values, you could still use APPLY like this to +plot the EXP function over that range: + + (apply #'plot #'exp plot-data) + + APPLY doesn't care about whether the function being applied takes +&optional, &rest, or &key arguments-the argument list produced by +combining any loose arguments with the final list must be a legal +argument list for the function with enough arguments for all the +required parameters and only appropriate keyword parameters. + + ---------- Footnotes ---------- + + (1) Lisp, of course, isn't the only language to treat functions as +data. C uses function pointers, Perl uses subroutine references, Python +uses a scheme similar to Lisp, and C# introduces delegates, essentially +typed function pointers, as an improvement over Java's rather clunky +reflection and anonymous class mechanisms. + + (2) The exact printed representation of a function object will differ +from implementation to implementation. + + (3) The best way to think of FUNCTION is as a special kind of +quotation. QUOTEing a symbol prevents it from being evaluated at all, +resulting in the symbol itself rather than the value of the variable +named by that symbol. FUNCTION also circumvents the normal evaluation +rule but, instead of preventing the symbol from being evaluated at all, +causes it to be evaluated as the name of a function, just the way it +would if it were used as the function name in a function call +expression. + + (4) There's actually a third, the special operator +MULTIPLE-VALUE-CALL, but I'll save that for when I discuss expressions +that return multiple values in Chapter 20. + + +File: pcl.info, Node: 5-9, Next: Chapter 6, Prev: 5-7, Up: Chapter 5 + +Anonymous Functions +=================== + +Once you start writing, or even simply using, functions that accept +other functions as arguments, you're bound to discover that sometimes +it's annoying to have to define and name a whole separate function +that's used in only one place, especially when you never call it by +name. + + When it seems like overkill to define a new function with DEFUN, you +can create an "anonymous" function using a LAMBDA expression. As +discussed in Chapter 3, a LAMBDA expression looks like this: + + (lambda (parameters) body) + + One way to think of LAMBDA expressions is as a special kind of +function name where the name itself directly describes what the function +does. This explains why you can use a LAMBDA expression in the place of +a function name with #'. + + (funcall #'(lambda (x y) (+ x y)) 2 3) ==> 5 + + You can even use a LAMBDA expression as the "name" of a function in a +function call expression. If you wanted, you could write the previous +FUNCALL expression more concisely. + + ((lambda (x y) (+ x y)) 2 3) ==> 5 + + But this is almost never done; it's merely worth noting that it's +legal in order to emphasize that LAMBDA expressions can be used anywhere +a normal function name can be. (1) + + Anonymous functions can be useful when you need to pass a function as +an argument to another function and the function you need to pass is +simple enough to express inline. For instance, suppose you wanted to +plot the function 2x. You could define the following function: + + (defun double (x) (* 2 x)) + + which you could then pass to plot. + + CL-USER> (plot #'double 0 10 1) + + ** + **** + ****** + ******** + ********** + ************ + ************** + **************** + ****************** + ******************** + NIL + + But it's easier, and arguably clearer, to write this: + + CL-USER> (plot #'(lambda (x) (* 2 x)) 0 10 1) + + ** + **** + ****** + ******** + ********** + ************ + ************** + **************** + ****************** + ******************** + NIL + + The other important use of LAMBDA expressions is in making closures, +functions that capture part of the environment where they're created. +You used closures a bit in Chapter 3, but the details of how closures +work and what they're used for is really more about how variables work +than functions, so I'll save that discussion for the next chapter. + + ---------- Footnotes ---------- + + (1) In Common Lisp it's also possible to use a LAMBDA expression as +an argument to FUNCALL (or some other function that takes a function +argument such as SORT or MAPCAR) with no #' before it, like this: + + (funcall (lambda (x y) (+ x y)) 2 3) + + This is legal and is equivalent to the version with the #' but for a +tricky reason. Historically LAMBDA expressions by themselves weren't +expressions that could be evaluated. That is LAMBDA wasn't the name of +a function, macro, or special operator. Rather, a list starting with +the symbol LAMBDA was a special syntactic construct that Lisp recognized +as a kind of function name. + + But if that were still true, then (funcall (lambda (...) ...)) +would be illegal because FUNCALL is a function and the normal evaluation +rule for a function call would require that the LAMBDA expression be +evaluated. However, late in the ANSI standardization process, in order +to make it possible to implement ISLISP, another Lisp dialect being +standardized at the same time, strictly as a user-level compatibility +layer on top of Common Lisp, a LAMBDA macro was defined that expands +into a call to FUNCTION wrapped around the LAMBDA expression. In other +words, the following LAMBDA expression: + + (lambda () 42) + + expands into the following when it occurs in a context where it +evaluated: + + (function (lambda () 42)) ; or #'(lambda () 42) + + This makes its use in a value position, such as an argument to +FUNCALL, legal. In other words, it's pure syntactic sugar. Most folks +either always use #' before LAMBDA expressions in value positions or +never do. In this book, I always use #'. + + +File: pcl.info, Node: Chapter 6, Next: Chapter 7, Prev: Chapter 5, Up: Top + +6. Variables +============ + +The next basic building block we need to look at are variables. Common +Lisp supports two kinds of variables: lexical and dynamic. (1) These +two types correspond roughly to "local" and "global" variables in other +languages. However, the correspondence is only approximate. On one +hand, some languages' "local" variables are in fact much like Common +Lisp's dynamic variables. (2) And on the other, some languages' local +variables are lexically scoped without providing all the capabilities +provided by Common Lisp's lexical variables. In particular, not all +languages that provide lexically scoped variables support closures. + + To make matters a bit more confusing, many of the forms that deal +with variables can be used with both lexical and dynamic variables. So +I'll start by discussing a few aspects of Lisp's variables that apply to +both kinds and then cover the specific characteristics of lexical and +dynamic variables. Then I'll discuss Common Lisp's general-purpose +assignment operator, SETF, which is used to assign new values to +variables and just about every other place that can hold a value. + +* Menu: + +* 6-1:: Variable Basics +* 6-2:: Lexical Variables and Closures +* 6-3:: Dynamic, a.k.a. Special, Variables +* 6-4:: Constants +* 6-5:: Assignment +* 6-6:: Generalized Assignment +* 6-7:: Other Ways to Modify Places + + ---------- Footnotes ---------- + + (1) Dynamic variables are also sometimes called special variables for +reasons you'll see later in this chapter. It's important to be aware of +this synonym, as some folks (and Lisp implementations) use one term +while others use the other. + + (2) Early Lisps tended to use dynamic variables for local variables, +at least when interpreted. Elisp, the Lisp dialect used in Emacs, is a +bit of a throwback in this respect, continuing to support only dynamic +variables. Other languages have recapitulated this transition from +dynamic to lexical variables-Perl's local variables, for instance, are +dynamic while its my variables, introduced in Perl 5, are lexical. +Python never had true dynamic variables but only introduced true lexical +scoping in version 2.2. (Python's lexical variables are still somewhat +limited compared to Lisp's because of the conflation of assignment and +binding in the language's syntax.) + + +File: pcl.info, Node: 6-1, Next: 6-2, Prev: Chapter 6, Up: Chapter 6 + +Variable Basics +=============== + +As in other languages, in Common Lisp variables are named places that +can hold a value. However, in Common Lisp, variables aren't typed the +way they are in languages such as Java or C++. That is, you don't need +to declare the type of object that each variable can hold. Instead, a +variable can hold values of any type and the values carry type +information that can be used to check types at runtime. Thus, Common +Lisp is dynamically typed-type errors are detected dynamically. For +instance, if you pass something other than a number to the + function, +Common Lisp will signal a type error. On the other hand, Common Lisp is +a strongly typed language in the sense that all type errors will be +detected-there's no way to treat an object as an instance of a class +that it's not. (1) + + All values in Common Lisp are, conceptually at least, references to +objects. (2) Consequently, assigning a variable a new value changes +what object the variable refers to but has no effect on the previously +referenced object. However, if a variable holds a reference to a +mutable object, you can use that reference to modify the object, and the +modification will be visible to any code that has a reference to the +same object. + + One way to introduce new variables you've already used is to define +function parameters. As you saw in the previous chapter, when you +define a function with DEFUN, the parameter list defines the variables +that will hold the function's arguments when it's called. For example, +this function defines three variables-x, y, and z-to hold its arguments. + + (defun foo (x y z) (+ x y z)) + + Each time a function is called, Lisp creates new bindings to hold the +arguments passed by the function's caller. A binding is the runtime +manifestation of a variable. A single variable-the thing you can point +to in the program's source code-can have many different bindings during +a run of the program. A single variable can even have multiple bindings +at the same time; parameters to a recursive function, for example, are +rebound for each call to the function. + + As with all Common Lisp variables, function parameters hold object +references. (3) Thus, you can assign a new value to a function +parameter within the body of the function, and it will not affect the +bindings created for another call to the same function. But if the +object passed to a function is mutable and you change it in the +function, the changes will be visible to the caller since both the +caller and the callee will be referencing the same object. + + Another form that introduces new variables is the LET special +operator. The skeleton of a LET form looks like this: + + (let (variable*) + body-form*) + + where each variable is a variable initialization form. Each +initialization form is either a list containing a variable name and an +initial value form or-as a shorthand for initializing the variable to +NIL-a plain variable name. The following LET form, for example, binds +the three variables x, y, and z with initial values 10, 20, and NIL: + + (let ((x 10) (y 20) z) + ...) + + When the LET form is evaluated, all the initial value forms are first +evaluated. Then new bindings are created and initialized to the +appropriate initial values before the body forms are executed. Within +the body of the LET, the variable names refer to the newly created +bindings. After the LET, the names refer to whatever, if anything, they +referred to before the LET. + + The value of the last expression in the body is returned as the value +of the LET expression. Like function parameters, variables introduced +with LET are rebound each time the LET is entered. (4) + + The scope of function parameters and LET variables-the area of the +program where the variable name can be used to refer to the variable's +binding-is delimited by the form that introduces the variable. This +form-the function definition or the LET-is called the binding form. As +you'll see in a bit, the two types of variables-lexical and dynamic-use +two slightly different scoping mechanisms, but in both cases the scope +is delimited by the binding form. + + If you nest binding forms that introduce variables with the same +name, then the bindings of the innermost variable shadows the outer +bindings. For instance, when the following function is called, a +binding is created for the parameter x to hold the function's argument. +Then the first LET creates a new binding with the initial value 2, and +the inner LET creates yet another binding, this one with the initial +value 3. The bars on the right mark the scope of each binding. + + (defun foo (x) + (format t "Parameter: ~a~%" x) ; |<------ x is argument + (let ((x 2)) ; | + (format t "Outer LET: ~a~%" x) ; | |<---- x is 2 + (let ((x 3)) ; | | + (format t "Inner LET: ~a~%" x)) ; | | |<-- x is 3 + (format t "Outer LET: ~a~%" x)) ; | | + (format t "Parameter: ~a~%" x)) ; | + + Each reference to x will refer to the binding with the smallest +enclosing scope. Once control leaves the scope of one binding form, the +binding from the immediately enclosing scope is unshadowed and x refers +to it instead. Thus, calling foo results in this output: + + CL-USER> (foo 1) + Parameter: 1 + Outer LET: 2 + Inner LET: 3 + Outer LET: 2 + Parameter: 1 + NIL + + In future chapters I'll discuss other constructs that also serve as +binding forms-any construct that introduces a new variable name that's +usable only within the construct is a binding form. + + For instance, in Chapter 7 you'll meet the DOTIMES loop, a basic +counting loop. It introduces a variable that holds the value of a +counter that's incremented each time through the loop. The following +loop, for example, which prints the numbers from 0 to 9, binds the +variable x: + + (dotimes (x 10) (format t "~d " x)) + + Another binding form is a variant of LET, LET*. The difference is +that in a LET, the variable names can be used only in the body of the +LET-the part of the LET after the variables list-but in a LET*, the +initial value forms for each variable can refer to variables introduced +earlier in the variables list. Thus, you can write the following: + + (let* ((x 10) + (y (+ x 10))) + (list x y)) + + but not this: + + (let ((x 10) + (y (+ x 10))) + (list x y)) + + However, you could achieve the same result with nested LETs. + + (let ((x 10)) + (let ((y (+ x 10))) + (list x y))) + + ---------- Footnotes ---------- + + (1) Actually, it's not quite true to say that all type errors will +always be detected-it's possible to use optional declarations to tell +the compiler that certain variables will always contain objects of a +particular type and to turn off runtime type checking in certain regions +of code. However, declarations of this sort are used to optimize code +after it has been developed and debugged, not during normal development. + + (2) As an optimization certain kinds of objects, such as integers +below a certain size and characters, may be represented directly in +memory where other objects would be represented by a pointer to the +actual object. However, since integers and characters are immutable, it +doesn't matter that there may be multiple copies of "the same" object in +different variables. This is the root of the difference between EQ and +EQL discussed in Chapter 4. + + (3) In compiler-writer terms Common Lisp functions are +"pass-by-value." However, the values that are passed are references to +objects. This is similar to how Java and Python work. + + (4) The variables in LET forms and function parameters are created by +exactly the same mechanism. In fact, in some Lisp dialects-though not +Common Lisp-LET is simply a macro that expands into a call to an +anonymous function. That is, in those dialects, the following: + + (let ((x 10)) (format t "~a" x)) + + is a macro form that expands into this: + + ((lambda (x) (format t "~a" x)) 10) + + +File: pcl.info, Node: 6-2, Next: 6-3, Prev: 6-1, Up: Chapter 6 + +Lexical Variables and Closures +============================== + +By default all binding forms in Common Lisp introduce lexically scoped +variables. Lexically scoped variables can be referred to only by code +that's textually within the binding form. Lexical scoping should be +familiar to anyone who has programmed in Java, C, Perl, or Python since +they all provide lexically scoped "local" variables. For that matter, +Algol programmers should also feel right at home, as Algol first +introduced lexical scoping in the 1960s. + + However, Common Lisp's lexical variables are lexical variables with a +twist, at least compared to the original Algol model. The twist is +provided by the combination of lexical scoping with nested functions. +By the rules of lexical scoping, only code textually within the binding +form can refer to a lexical variable. But what happens when an +anonymous function contains a reference to a lexical variable from an +enclosing scope? For instance, in this expression: + + (let ((count 0)) #'(lambda () (setf count (1+ count)))) + + the reference to count inside the LAMBDA form should be legal +according to the rules of lexical scoping. Yet the anonymous function +containing the reference will be returned as the value of the LET form +and can be invoked, via FUNCALL, by code that's not in the scope of the +LET. So what happens? As it turns out, when count is a lexical +variable, it just works. The binding of count created when the flow of +control entered the LET form will stick around for as long as needed, in +this case for as long as someone holds onto a reference to the function +object returned by the LET form. The anonymous function is called a +closure because it "closes over" the binding created by the LET. + + The key thing to understand about closures is that it's the binding, +not the value of the variable, that's captured. Thus, a closure can not +only access the value of the variables it closes over but can also +assign new values that will persist between calls to the closure. For +instance, you can capture the closure created by the previous expression +in a global variable like this: + + (defparameter *fn* (let ((count 0)) #'(lambda () (setf count (1+ count))))) + + Then each time you invoke it, the value of count will increase by +one. + + CL-USER> (funcall *fn*) + 1 + CL-USER> (funcall *fn*) + 2 + CL-USER> (funcall *fn*) + 3 + + A single closure can close over many variable bindings simply by +referring to them. Or multiple closures can capture the same binding. +For instance, the following expression returns a list of three closures, +one that increments the value of the closed over count binding, one that +decrements it, and one that returns the current value: + + (let ((count 0)) + (list + #'(lambda () (incf count)) + #'(lambda () (decf count)) + #'(lambda () count))) + + +File: pcl.info, Node: 6-3, Next: 6-4, Prev: 6-2, Up: Chapter 6 + +Dynamic, a.k.a. Special, Variables +================================== + +Lexically scoped bindings help keep code understandable by limiting the +scope, literally, in which a given name has meaning. This is why most +modern languages use lexical scoping for local variables. Sometimes, +however, you really want a global variable-a variable that you can refer +to from anywhere in your program. While it's true that indiscriminate +use of global variables can turn code into spaghetti nearly as quickly +as unrestrained use of goto, global variables do have legitimate uses +and exist in one form or another in almost every programming language. +(1) And as you'll see in a moment, Lisp's version of global variables, +dynamic variables, are both more useful and more manageable. + + Common Lisp provides two ways to create global variables: DEFVAR and +DEFPARAMETER. Both forms take a variable name, an initial value, and an +optional documentation string. After it has been DEFVARed or +DEFPARAMETERed, the name can be used anywhere to refer to the current +binding of the global variable. As you've seen in previous chapters, +global variables are conventionally named with names that start and end +with *. You'll see later in this section why it's quite important to +follow that naming convention. Examples of DEFVAR and DEFPARAMETER look +like this: + + (defvar *count* 0 + "Count of widgets made so far.") + + (defparameter *gap-tolerance* 0.001 + "Tolerance to be allowed in widget gaps.") + + The difference between the two forms is that DEFPARAMETER always +assigns the initial value to the named variable while DEFVAR does so +only if the variable is undefined. A DEFVAR form can also be used with +no initial value to define a global variable without giving it a value. +Such a variable is said to be unbound. + + Practically speaking, you should use DEFVAR to define variables that +will contain data you'd want to keep even if you made a change to the +source code that uses the variable. For instance, suppose the two +variables defined previously are part of an application for controlling +a widget factory. It's appropriate to define the *count* variable with +DEFVAR because the number of widgets made so far isn't invalidated just +because you make some changes to the widget-making code. (2) + + On the other hand, the variable *gap-tolerance* presumably has some +effect on the behavior of the widget-making code itself. If you decide +you need a tighter or looser tolerance and change the value in the +DEFPARAMETER form, you'd like the change to take effect when you +recompile and reload the file. + + After defining a variable with DEFVAR or DEFPARAMETER, you can refer +to it from anywhere. For instance, you might define this function to +increment the count of widgets made: + + (defun increment-widget-count () (incf *count*)) + + The advantage of global variables is that you don't have to pass them +around. Most languages store the standard input and output streams in +global variables for exactly this reason-you never know when you're +going to want to print something to standard out, and you don't want +every function to have to accept and pass on arguments containing those +streams just in case someone further down the line needs them. + + However, once a value, such as the standard output stream, is stored +in a global variable and you have written code that references that +global variable, it's tempting to try to temporarily modify the behavior +of that code by changing the variable's value. + + For instance, suppose you're working on a program that contains some +low-level logging functions that print to the stream in the global +variable *standard-output*. Now suppose that in part of the program you +want to capture all the output generated by those functions into a file. +You might open a file and assign the resulting stream to +*standard-output*. Now the low-level functions will send their output +to the file. + + This works fine until you forget to set *standard-output* back to the +original stream when you're done. If you forget to reset +*standard-output*, all the other code in the program that uses +*standard-output* will also send its output to the file. (3) + + What you really want, it seems, is a way to wrap a piece of code in +something that says, "All code below here-all the functions it calls, +all the functions they call, and so on, down to the lowest-level +functions-should use this value for the global variable +*standard-output*." Then when the high-level function returns, the old +value of *standard-output* should be automatically restored. + + It turns out that that's exactly what Common Lisp's other kind of +variable-dynamic variables-let you do. When you bind a dynamic +variable-for example, with a LET variable or a function parameter-the +binding that's created on entry to the binding form replaces the global +binding for the duration of the binding form. Unlike a lexical binding, +which can be referenced by code only within the lexical scope of the +binding form, a dynamic binding can be referenced by any code that's +invoked during the execution of the binding form. (4) And it turns out +that all global variables are, in fact, dynamic variables. + + Thus, if you want to temporarily redefine *standard-output*, the way +to do it is simply to rebind it, say, with a LET. + + (let ((*standard-output* *some-other-stream*)) + (stuff)) + + In any code that runs as a result of the call to stuff, references to +*standard-output* will use the binding established by the LET. And when +stuff returns and control leaves the LET, the new binding of +*standard-output* will go away and subsequent references to +*standard-output* will see the binding that was current before the LET. +At any given time, the most recently established binding shadows all +other bindings. Conceptually, each new binding for a given dynamic +variable is pushed onto a stack of bindings for that variable, and +references to the variable always use the most recent binding. As +binding forms return, the bindings they created are popped off the +stack, exposing previous bindings. (5) + + A simple example shows how this works. + + (defvar *x* 10) + (defun foo () (format t "X: ~d~%" *x*)) + + The DEFVAR creates a global binding for the variable *x* with the +value 10. The reference to *x* in foo will look up the current binding +dynamically. If you call foo from the top level, the global binding +created by the DEFVAR is the only binding available, so it prints 10. + + CL-USER> (foo) + X: 10 + NIL + + But you can use LET to create a new binding that temporarily shadows +the global binding, and foo will print a different value. + + CL-USER> (let ((*x* 20)) (foo)) + X: 20 + NIL + + Now call foo again, with no LET, and it again sees the global +binding. + + CL-USER> (foo) + X: 10 + NIL + + Now define another function. + + (defun bar () + (foo) + (let ((*x* 20)) (foo)) + (foo)) + + Note that the middle call to foo is wrapped in a LET that binds *x* +to the new value 20. When you run bar, you get this result: + + CL-USER> (bar) + X: 10 + X: 20 + X: 10 + NIL + + As you can see, the first call to foo sees the global binding, with +its value of 10. The middle call, however, sees the new binding, with +the value 20. But after the LET, foo once again sees the global +binding. + + As with lexical bindings, assigning a new value affects only the +current binding. To see this, you can redefine foo to include an +assignment to *x*. + + (defun foo () + (format t "Before assignment~18tX: ~d~%" *x*) + (setf *x* (+ 1 *x*)) + (format t "After assignment~18tX: ~d~%" *x*)) + + Now foo prints the value of *x*, increments it, and prints it again. +If you just run foo, you'll see this: + + CL-USER> (foo) + Before assignment X: 10 + After assignment X: 11 + NIL + + Not too surprising. Now run bar. + + CL-USER> (bar) + Before assignment X: 11 + After assignment X: 12 + Before assignment X: 20 + After assignment X: 21 + Before assignment X: 12 + After assignment X: 13 + NIL + + Notice that *x* started at 11-the earlier call to foo really did +change the global value. The first call to foo from bar increments the +global binding to 12. The middle call doesn't see the global binding +because of the LET. Then the last call can see the global binding again +and increments it from 12 to 13. + + So how does this work? How does LET know that when it binds *x* it's +supposed to create a dynamic binding rather than a normal lexical +binding? It knows because the name has been declared special. (6) The +name of every variable defined with DEFVAR and DEFPARAMETER is +automatically declared globally special. This means whenever you use +such a name in a binding form-in a LET or as a function parameter or any +other construct that creates a new variable binding-the binding that's +created will be a dynamic binding. This is why the *naming* +*convention* is so important-it'd be bad news if you used a name for +what you thought was a lexical variable and that variable happened to be +globally special. On the one hand, code you call could change the value +of the binding out from under you; on the other, you might be shadowing +a binding established by code higher up on the stack. If you always +name global variables according to the * naming convention, you'll never +accidentally use a dynamic binding where you intend to establish a +lexical binding. + + It's also possible to declare a name locally special. If, in a +binding form, you declare a name special, then the binding created for +that variable will be dynamic rather than lexical. Other code can +locally declare a name special in order to refer to the dynamic binding. +However, locally special variables are relatively rare, so you needn't +worry about them. (7) + + Dynamic bindings make global variables much more manageable, but it's +important to notice they still allow action at a distance. Binding a +global variable has two at a distance effects-it can change the behavior +of downstream code, and it also opens the possibility that downstream +code will assign a new value to a binding established higher up on the +stack. You should use dynamic variables only when you need to take +advantage of one or both of these characteristics. + + ---------- Footnotes ---------- + + (1) Java disguises global variables as public static fields, C uses +extern variables, and Python's module-level and Perl's package-level +variables can likewise be accessed from anywhere. + + (2) If you specifically want to reset a DEFVARed variable, you can +either set it directly with SETF or make it unbound using MAKUNBOUND and +then reevaluate the DEFVAR form. + + (3) The strategy of temporarily reassigning *standard-output* also +breaks if the system is multithreaded-if there are multiple threads of +control trying to print to different streams at the same time, they'll +all try to set the global variable to the stream they want to use, +stomping all over each other. You could use a lock to control access to +the global variable, but then you're not really getting the benefit of +multiple concurrent threads, since whatever thread is printing has to +lock out all the other threads until it's done even if they want to +print to a different stream. + + (4) The technical term for the interval during which references may +be made to a binding is its extent. Thus, scope and extent are +complementary notions-scope refers to space while extent refers to time. +Lexical variables have lexical scope but indefinite extent, meaning they +stick around for an indefinite interval, determined by how long they're +needed. Dynamic variables, by contrast, have indefinite scope since +they can be referred to from anywhere but dynamic extent. To further +confuse matters, the combination of indefinite scope and dynamic extent +is frequently referred to by the misnomer dynamic scope. + + (5) Though the standard doesn't specify how to incorporate +multithreading into Common Lisp, implementations that provide +multithreading follow the practice established on the Lisp machines and +create dynamic bindings on a per-thread basis. A reference to a global +variable will find the binding most recently established in the current +thread, or the global binding. + + (6) This is why dynamic variables are also sometimes called special +variables. + + (7) If you must know, you can look up DECLARE, SPECIAL, and LOCALLY +in the HyperSpec. + + +File: pcl.info, Node: 6-4, Next: 6-5, Prev: 6-3, Up: Chapter 6 + +Constants +========= + +One other kind of variable I haven't mentioned at all is the oxymoronic +"constant variable." All constants are global and are defined with +DEFCONSTANT. The basic form of DEFCONSTANT is like DEFPARAMETER. + + (defconstant name initial-value-form [ documentation-string ]) + + As with DEFVAR and DEFPARAMETER, DEFCONSTANT has a global effect on +the name used-thereafter the name can be used only to refer to the +constant; it can't be used as a function parameter or rebound with any +other binding form. Thus, many Lisp programmers follow a naming +convention of using names starting and ending with + for constants. +This convention is somewhat less universally followed than the *-naming +convention for globally special names but is a good idea for the same +reason. (1) + + Another thing to note about DEFCONSTANT is that while the language +allows you to redefine a constant by reevaluating a DEFCONSTANT with a +different initial-value-form, what exactly happens after the +redefinition isn't defined. In practice, most implementations will +require you to reevaluate any code that refers to the constant in order +to see the new value since the old value may well have been inlined. +Consequently, it's a good idea to use DEFCONSTANT only to define things +that are really constant, such as the value of NIL. For things you might +ever want to change, you should use DEFPARAMETER instead. + + ---------- Footnotes ---------- + + (1) Several key constants defined by the language itself don't follow +this convention-not least of which are T and NIL. This is occasionally +annoying when one wants to use t as a local variable name. Another is +PI, which holds the best long-float approximation of the mathematical +constant pi. + + +File: pcl.info, Node: 6-5, Next: 6-6, Prev: 6-4, Up: Chapter 6 + +Assignment +========== + +Once you've created a binding, you can do two things with it: get the +current value and set it to a new value. As you saw in Chapter 4, a +symbol evaluates to the value of the variable it names, so you can get +the current value simply by referring to the variable. To assign a new +value to a binding, you use the SETF macro, Common Lisp's +general-purpose assignment operator. The basic form of SETF is as +follows: + + (setf place value) + + Because SETF is a macro, it can examine the form of the place it's +assigning to and expand into appropriate lower-level operations to +manipulate that place. When the place is a variable, it expands into a +call to the special operator SETQ, which, as a special operator, has +access to both lexical and dynamic bindings. (1) For instance, to +assign the value 10 to the variable x, you can write this: + + (setf x 10) + + As I discussed earlier, assigning a new value to a binding has no +effect on any other bindings of that variable. And it doesn't have any +effect on the value that was stored in the binding prior to the +assignment. Thus, the SETF in this function: + + (defun foo (x) (setf x 10)) + + will have no effect on any value outside of foo. The binding that +was created when foo was called is set to 10, immediately replacing +whatever value was passed as an argument. In particular, a form such as +the following: + + (let ((y 20)) + (foo y) + (print y)) + + will print 20, not 10, as it's the value of y that's passed to foo +where it's briefly the value of the variable x before the SETF gives x a +new value. + + SETF can also assign to multiple places in sequence. For instance, +instead of the following: + + (setf x 1) + (setf y 2) + + you can write this: + + (setf x 1 y 2) + + SETF returns the newly assigned value, so you can also nest calls to +SETF as in the following expression, which assigns both x and y the same +random value: + + (setf x (setf y (random 10))) + + ---------- Footnotes ---------- + + (1) Some old-school Lispers prefer to use SETQ with variables, but +modern style tends to use SETF for all assignments. + + +File: pcl.info, Node: 6-6, Next: 6-7, Prev: 6-5, Up: Chapter 6 + +Generalized Assignment +====================== + +Variable bindings, of course, aren't the only places that can hold +values. Common Lisp supports composite data structures such as arrays, +hash tables, and lists, as well as user-defined data structures, all of +which consist of multiple places that can each hold a value. + + I'll cover those data structures in future chapters, but while we're +on the topic of assignment, you should note that SETF can assign any +place a value. As I cover the different composite data structures, I'll +point out which functions can serve as "SETFable places." The short +version, however, is if you need to assign a value to a place, SETF is +almost certainly the tool to use. It's even possible to extend SETF to +allow it to assign to user-defined places though I won't cover that. +(1) + + In this regard SETF is no different from the = assignment operator in +most C-derived languages. In those languages, the = operator assigns +new values to variables, array elements, and fields of classes. In +languages such as Perl and Python that support hash tables as a built-in +data type, = can also set the values of individual hash table entries. +Table 6-1 summarizes the various ways = is used in those languages. + + Table 6-1. Assignment with = in Other Languages + + + + SETF works the same way-the first "argument" to SETF is a place to +store the value, and the second argument provides the value. As with +the = operator in these languages, you use the same form to express the +place as you'd normally use to fetch the value. (2) Thus, the Lisp +equivalents of the assignments in Table 6-1-given that AREF is the array +access function, GETHASH does a hash table lookup, and field might be a +function that accesses a slot named field of a user-defined object-are +as follows: + + Simple variable: (setf x 10) + Array: (setf (aref a 0) 10) + Hash table: (setf (gethash 'key hash) 10) + Slot named 'field': (setf (field o) 10) + + Note that SETFing a place that's part of a larger object has the same +semantics as SETFing a variable: the place is modified without any +effect on the object that was previously stored in the place. Again, +this is similar to how = behaves in Java, Perl, and Python. (3) + + ---------- Footnotes ---------- + + (1) Look up DEFSETF, DEFINE-SETF-EXPANDER for more information. + + (2) The prevalence of Algol-derived syntax for assignment with the +"place" on the left side of the = and the new value on the right side +has spawned the terminology lvalue, short for "left value," meaning +something that can be assigned to, and rvalue, meaning something that +provides a value. A compiler hacker would say, "SETF treats its first +argument as an lvalue." + + (3) C programmers may want to think of variables and other places as +holding a pointer to the real object; assigning to a variable simply +changes what object it points to while assigning to a part of a +composite object is similar to indirecting through the pointer to the +actual object. C++ programmers should note that the behavior of = in +C++ when dealing with objects-namely, a memberwise copy-is quite +idiosyncratic. + + +File: pcl.info, Node: 6-7, Next: Chapter 7, Prev: 6-6, Up: Chapter 6 + +Other Ways to Modify Places +=========================== + +While all assignments can be expressed with SETF, certain patterns +involving assigning a new value based on the current value are +sufficiently common to warrant their own operators. For instance, while +you could increment a number with SETF, like this: + + (setf x (+ x 1)) + + or decrement it with this: + + (setf x (- x 1)) + + it's a bit tedious, compared to the C-style ++x and -x. Instead, you +can use the macros INCF and DECF, which increment and decrement a place +by a certain amount that defaults to 1. + + (incf x) === (setf x (+ x 1)) + (decf x) === (setf x (- x 1)) + (incf x 10) === (setf x (+ x 10)) + + INCF and DECF are examples of a kind of macro called modify macros. +Modify macros are macros built on top of SETF that modify places by +assigning a new value based on the current value of the place. The main +benefit of modify macros is that they're more concise than the same +modification written out using SETF. Additionally, modify macros are +defined in a way that makes them safe to use with places where the place +expression must be evaluated only once. A silly example is this +expression, which increments the value of an arbitrary element of an +array: + + (incf (aref *array* (random (length *array*)))) + + A naive translation of that into a SETF expression might look like +this: + + (setf (aref *array* (random (length *array*))) + (1+ (aref *array* (random (length *array*))))) + + However, that doesn't work because the two calls to RANDOM won't +necessarily return the same value-this expression will likely grab the +value of one element of the array, increment it, and then store it back +as the new value of a different element. The INCF expression, however, +does the right thing because it knows how to take apart this expression: + + (aref *array* (random (length *array*))) + + to pull out the parts that could possibly have side effects to make +sure they're evaluated only once. In this case, it would probably +expand into something more or less equivalent to this: + + (let ((tmp (random (length *array*)))) + (setf (aref *array* tmp) (1+ (aref *array* tmp)))) + + In general, modify macros are guaranteed to evaluate both their +arguments and the subforms of the place form exactly once each, in +left-to-right order. + + The macro PUSH, which you used in the mini-database to add elements +to the *db* variable, is another modify macro. You'll take a closer +look at how it and its counterparts POP and PUSHNEW work in Chapter 12 +when I talk about how lists are represented in Lisp. + + Finally, two slightly esoteric but useful modify macros are ROTATEF +and SHIFTF. ROTATEF rotates values between places. For instance, if you +have two variables, a and b, this call: + + (rotatef a b) + + swaps the values of the two variables and returns NIL. Since a and b +are variables and you don't have to worry about side effects, the +previous ROTATEF expression is equivalent to this: + + (let ((tmp a)) (setf a b b tmp) nil) + + With other kinds of places, the equivalent expression using SETF +would be quite a bit more complex. + + SHIFTF is similar except instead of rotating values it shifts them to +the left-the last argument provides a value that's moved to the +second-to-last argument while the rest of the values are moved one to +the left. The original value of the first argument is simply returned. +Thus, the following: + + (shiftf a b 10) + + is equivalent-again, since you don't have to worry about side +effects-to this: + + (let ((tmp a)) (setf a b b 10) tmp) + + Both ROTATEF and SHIFTF can be used with any number of arguments and, +like all modify macros, are guaranteed to evaluate them exactly once, in +left to right order. + + With the basics of Common Lisp's functions and variables under your +belt, now you're ready to move onto the feature that continues to +differentiate Lisp from other languages: macros. + + +File: pcl.info, Node: Chapter 7, Next: Chapter 8, Prev: Chapter 6, Up: Top + +7. Macros: Standard Control Constructs +====================================== + +While many of the ideas that originated in Lisp, from the conditional +expression to garbage collection, have been incorporated into other +languages, the one language feature that continues to set Common Lisp +apart is its macro system. Unfortunately, the word macro describes a +lot of things in computing to which Common Lisp's macros bear only a +vague and metaphorical similarity. This causes no end of +misunderstanding when Lispers try to explain to non-Lispers what a great +feature macros are. (1) To understand Lisp's macros, you really need to +come at them fresh, without preconceptions based on other things that +also happen to be called macros. So let's start our discussion of +Lisp's macros by taking a step back and looking at various ways +languages support extensibility. + + All programmers should be used to the idea that the definition of a +language can include a standard library of functionality that's +implemented in terms of the "core" language-functionality that could +have been implemented by any programmer on top of the language if it +hadn't been defined as part of the standard library. C's standard +library, for instance, can be implemented almost entirely in portable C. +Similarly, most of the ever-growing set of classes and interfaces that +ship with Java's standard Java Development Kit (JDK) are written in +"pure" Java. + + One advantage of defining languages in terms of a core plus a +standard library is it makes them easier to understand and implement. +But the real benefit is in terms of expressiveness-since much of what +you think of as "the language" is really just a library-the language is +easy to extend. If C doesn't have a function to do some thing or +another that you need, you can write that function, and now you have a +slightly richer version of C. Similarly, in a language such as Java or +Smalltalk where almost all the interesting parts of the "language" are +defined in terms of classes, by defining new classes you extend the +language, making it more suited for writing programs to do whatever it +is you're trying to do. + + While Common Lisp supports both these methods of extending the +language, macros give Common Lisp yet another way. As I discussed +briefly in Chapter 4, each macro defines its own syntax, determining how +the s-expressions it's passed are turned into Lisp forms. With macros +as part of the core language it's possible to build new syntax-control +constructs such as WHEN, DOLIST, and LOOP as well as definitional forms +such as DEFUN and DEFPARAMETER-as part of the "standard library" rather +than having to hardwire them into the core. This has implications for +how the language itself is implemented, but as a Lisp programmer you'll +care more that it gives you another way to extend the language, making +it a better language for expressing solutions to your particular +programming problems. + + Now, it may seem that the benefits of having another way to extend +the language would be easy to recognize. But for some reason a lot of +folks who haven't actually used Lisp macros-folks who think nothing of +spending their days creating new functional abstractions or defining +hierarchies of classes to solve their programming problems-get spooked +by the idea of being able to define new syntactic abstractions. The +most common cause of macrophobia seems to be bad experiences with other +"macro" systems. Simple fear of the unknown no doubt plays a role, too. +To avoid triggering any macrophobic reactions, I'll ease into the +subject by discussing several of the standard control-construct macros +defined by Common Lisp. These are some of the things that, if Lisp +didn't have macros, would have to be built into the language core. When +you use them, you don't have to care that they're implemented as macros, +but they provide a good example of some of the things you can do with +macros. (2) In the next chapter, I'll show you how you can define your +own macros. + +* Menu: + +* 7-1:: WHEN and UNLESS +* 7-2:: COND +* 7-3:: AND, OR, and NOT +* 7-4:: Looping +* 7-5:: DOLIST and DOTIMES +* 7-6:: DO +* 7-7:: The Mighty LOOP + + ---------- Footnotes ---------- + + (1) To see what this misunderstanding looks like, find any longish +Usenet thread cross-posted between comp.lang.lisp and any other +comp.lang.* group with macro in the subject. A rough paraphrase goes +like this: + + Lispnik: "Lisp is the best because of its macros!"; + + Othernik: "You think Lisp is good because of macros?! But macros are +horrible and evil; Lisp must be horrible and evil." + + (2) Another important class of language constructs that are defined +using macros are all the definitional constructs such as DEFUN, +DEFPARAMETER, DEFVAR, and others. In Chapter 24 you'll define your own +definitional macros that will allow you to concisely write code for +reading and writing binary data. + + +File: pcl.info, Node: 7-1, Next: 7-2, Prev: Chapter 7, Up: Chapter 7 + +WHEN and UNLESS +=============== + +As you've already seen, the most basic form of conditional execution-if +x, do y; otherwise do z-is provided by the IF special operator, which +has this basic form: + + (if condition then-form [else-form]) + + The condition is evaluated and, if its value is non-NIL, the +then-form is evaluated and the resulting value returned. Otherwise, the +else-form, if any, is evaluated and its value returned. If condition is +NIL and there's no else-form, then the IF returns NIL. + + (if (> 2 3) "Yup" "Nope") ==> "Nope" + (if (> 2 3) "Yup") ==> NIL + (if (> 3 2) "Yup" "Nope") ==> "Yup" + + However, IF isn't actually such a great syntactic construct because +the then-form and else-form are each restricted to being a single Lisp +form. This means if you want to perform a sequence of actions in either +clause, you need to wrap them in some other syntax. For instance, +suppose in the middle of a spam-filtering program you wanted to both +file a message as spam and update the spam database when a message is +spam. You can't write this: + + (if (spam-p current-message) + (file-in-spam-folder current-message) + (update-spam-database current-message)) + + because the call to update-spam-database will be treated as the else +clause, not as part of the then clause. Another special operator, +PROGN, executes any number of forms in order and returns the value of +the last form. So you could get the desired behavior by writing the +following: + + (if (spam-p current-message) + (progn + (file-in-spam-folder current-message) + (update-spam-database current-message))) + + That's not too horrible. But given the number of times you'll likely +have to use this idiom, it's not hard to imagine that you'd get tired of +it after a while. "Why," you might ask yourself, "doesn't Lisp provide +a way to say what I really want, namely, 'When x is true, do this, that, +and the other thing'?" In other words, after a while you'd notice the +pattern of an IF plus a PROGN and wish for a way to abstract away the +details rather than writing them out every time. + + This is exactly what macros provide. In this case, Common Lisp comes +with a standard macro, WHEN, which lets you write this: + + (when (spam-p current-message) + (file-in-spam-folder current-message) + (update-spam-database current-message)) + + But if it wasn't built into the standard library, you could define +WHEN yourself with a macro such as this, using the backquote notation I +discussed in Chapter 3: (1) + + (defmacro when (condition &rest body) + `(if ,condition (progn ,@body))) + + A counterpart to the WHEN macro is UNLESS, which reverses the +condition, evaluating its body forms only if the condition is false. In +other words: + + (defmacro unless (condition &rest body) + `(if (not ,condition) (progn ,@body))) + + Admittedly, these are pretty trivial macros. There's no deep black +magic here; they just abstract away a few language-level bookkeeping +details, allowing you to express your true intent a bit more clearly. +But their very triviality makes an important point: because the macro +system is built right into the language, you can write trivial macros +like WHEN and UNLESS that give you small but real gains in clarity that +are then multiplied by the thousands of times you use them. In Chapters +24, 26, and 31 you'll see how macros can also be used on a larger scale, +creating whole domain-specific embedded languages. But first let's +finish our discussion of the standard control-construct macros. + + ---------- Footnotes ---------- + + (1) You can't actually feed this definition to Lisp because it's +illegal to redefine names in the COMMON-LISP package where WHEN comes +from. If you really want to try writing such a macro, you'd need to +change the name to something else, such as my-when. + + +File: pcl.info, Node: 7-2, Next: 7-3, Prev: 7-1, Up: Chapter 7 + +COND +==== + +Another time raw IF expressions can get ugly is when you have a +multibranch conditional: if a do x, else if b do y; else do z. There's +no logical problem writing such a chain of conditional expressions with +just IF, but it's not pretty. + + (if a + (do-x) + (if b + (do-y) + (do-z))) + + And it would be even worse if you needed to include multiple forms in +the then clauses, requiring PROGNs. So, not surprisingly, Common Lisp +provides a macro for expressing multibranch conditionals: COND. This is +the basic skeleton: + + (cond + (test-1 form*) + . + . + . + (test-N form*)) + + Each element of the body represents one branch of the conditional and +consists of a list containing a condition form and zero or more forms to +be evaluated if that branch is chosen. The conditions are evaluated in +the order the branches appear in the body until one of them evaluates to +true. At that point, the remaining forms in that branch are evaluated, +and the value of the last form in the branch is returned as the value of +the COND as a whole. If the branch contains no forms after the +condition, the value of the condition is returned instead. By +convention, the branch representing the final else clause in an +if/else-if chain is written with a condition of T. Any non-NIL value +will work, but a T serves as a useful landmark when reading the code. +Thus, you can write the previous nested IF expression using COND like +this: + + (cond (a (do-x)) + (b (do-y)) + (t (do-z))) + + +File: pcl.info, Node: 7-3, Next: 7-4, Prev: 7-2, Up: Chapter 7 + +AND, OR, and NOT +================ + +When writing the conditions in IF, WHEN, UNLESS, and COND forms, three +operators that will come in handy are the boolean logic operators, AND, +OR, and NOT. + + NOT is a function so strictly speaking doesn't belong in this +chapter, but it's closely tied to AND and OR. It takes a single argument +and inverts its truth value, returning T if the argument is NIL and NIL +otherwise. + + AND and OR, however, are macros. They implement logical conjunction +and disjunction of any number of subforms and are defined as macros so +they can short-circuit. That is, they evaluate only as many of their +subforms-in left-to-right order-as necessary to determine the overall +truth value. Thus, AND stops and returns NIL as soon as one of its +subforms evaluates to NIL. If all the subforms evaluate to non-NIL, it +returns the value of the last subform. OR, on the other hand, stops as +soon as one of its subforms evaluates to non-NIL and returns the +resulting value. If none of the subforms evaluate to true, OR returns +NIL. Here are some examples: + + (not nil) ==> T + (not (= 1 1)) ==> NIL + (and (= 1 2) (= 3 3)) ==> NIL + (or (= 1 2) (= 3 3)) ==> T + + +File: pcl.info, Node: 7-4, Next: 7-5, Prev: 7-3, Up: Chapter 7 + +Looping +======= + +Control constructs are the other main kind of looping constructs. +Common Lisp's looping facilities are-in addition to being quite powerful +and flexible-an interesting lesson in the have-your-cake-and-eat-it-too +style of programming that macros provide. + + As it turns out, none of Lisp's 25 special operators directly support +structured looping. All of Lisp's looping control constructs are macros +built on top of a pair of special operators that provide a primitive +goto facility. (1) Like many good abstractions, syntactic or otherwise, +Lisp's looping macros are built as a set of layered abstractions +starting from the base provided by those two special operators. + + At the bottom (leaving aside the special operators) is a very general +looping construct, DO. While very powerful, DO suffers, as do many +general-purpose abstractions, from being overkill for simple situations. +So Lisp also provides two other macros, DOLIST and DOTIMES, that are +less flexible than DO but provide convenient support for the common +cases of looping over the elements of a list and counting loops. While +an implementation can implement these macros however it wants, they're +typically implemented as macros that expand into an equivalent DO loop. +Thus, DO provides a basic structured looping construct on top of the +underlying primitives provided by Common Lisp's special operators, and +DOLIST and DOTIMES provide two easier-to-use, if less general, +constructs. And, as you'll see in the next chapter, you can build your +own looping constructs on top of DO for situations where DOLIST and +DOTIMES don't meet your needs. + + Finally, the LOOP macro provides a full-blown mini-language for +expressing looping constructs in a non-Lispy, English-like (or at least +Algol-like) language. Some Lisp hackers love LOOP; others hate it. +LOOP's fans like it because it provides a concise way to express certain +commonly needed looping constructs. Its detractors dislike it because +it's not Lispy enough. But whichever side one comes down on, it's a +remarkable example of the power of macros to add new constructs to the +language. + + ---------- Footnotes ---------- + + (1) The special operators, if you must know, are TAGBODY and GO. +There's no need to discuss them now, but I'll cover them in Chapter 20. + + +File: pcl.info, Node: 7-5, Next: 7-6, Prev: 7-4, Up: Chapter 7 + +DOLIST and DOTIMES +================== + +I'll start with the easy-to-use DOLIST and DOTIMES macros. + + DOLIST loops across the items of a list, executing the loop body with +a variable holding the successive items of the list. (1) This is the +basic skeleton (leaving out some of the more esoteric options): + + (dolist (var list-form) + body-form*) + + When the loop starts, the list-form is evaluated once to produce a +list. Then the body of the loop is evaluated once for each item in the +list with the variable var holding the value of the item. For instance: + + CL-USER> (dolist (x '(1 2 3)) (print x)) + 1 + 2 + 3 + NIL + + Used this way, the DOLIST form as a whole evaluates to NIL. + + If you want to break out of a DOLIST loop before the end of the list, +you can use RETURN. + + CL-USER> (dolist (x '(1 2 3)) (print x) (if (evenp x) (return))) + 1 + 2 + NIL + + DOTIMES is the high-level looping construct for counting loops. The +basic template is much the same as DOLIST's. + + (dotimes (var count-form) + body-form*) + + The count-form must evaluate to an integer. Each time through the +loop var holds successive integers from 0 to one less than that number. +For instance: + + CL-USER> (dotimes (i 4) (print i)) + 0 + 1 + 2 + 3 + NIL + + As with DOLIST, you can use RETURN to break out of the loop early. + + Because the body of both DOLIST and DOTIMES loops can contain any +kind of expressions, you can also nest loops. For example, to print out +the times tables from 1 × 1 = 1 to 20 × 20 = 400, you can write this +pair of nested DOTIMES loops: + + (dotimes (x 20) + (dotimes (y 20) + (format t "~3d " (* (1+ x) (1+ y)))) + (format t "~%")) + + ---------- Footnotes ---------- + + (1) DOLIST is similar to Perl's foreach or Python's for. Java added +a similar kind of loop construct with the "enhanced" for loop in Java +1.5, as part of JSR-201. Notice what a difference macros make. A Lisp +programmer who notices a common pattern in their code can write a macro +to give themselves a source-level abstraction of that pattern. A Java +programmer who notices the same pattern has to convince Sun that this +particular abstraction is worth adding to the language. Then Sun has to +publish a JSR and convene an industry-wide "expert group" to hash +everything out. That process-according to Sun-takes an average of 18 +months. After that, the compiler writers all have to go upgrade their +compilers to support the new feature. And even once the Java +programmer's favorite compiler supports the new version of Java, they +probably still can't use the new feature until they're allowed to break +source compatibility with older versions of Java. So an annoyance that +Common Lisp programmers can resolve for themselves within five minutes +plagues Java programmers for years. + + +File: pcl.info, Node: 7-6, Next: 7-7, Prev: 7-5, Up: Chapter 7 + +DO +== + +While DOLIST and DOTIMES are convenient and easy to use, they aren't +flexible enough to use for all loops. For instance, what if you want to +step multiple variables in parallel? Or use an arbitrary expression to +test for the end of the loop? If neither DOLIST nor DOTIMES meet your +needs, you still have access to the more general DO loop. + + Where DOLIST and DOTIMES provide only one loop variable, DO lets you +bind any number of variables and gives you complete control over how +they change on each step through the loop. You also get to define the +test that determines when to end the loop and can provide a form to +evaluate at the end of the loop to generate a return value for the DO +expression as a whole. The basic template looks like this: + + (do (variable-definition*) + (end-test-form result-form*) + statement*) + + Each variable-definition introduces a variable that will be in scope +in the body of the loop. The full form of a single variable definition +is a list containing three elements. + + (var init-form step-form) + + The init-form will be evaluated at the beginning of the loop and the +resulting values bound to the variable var. Before each subsequent +iteration of the loop, the step-form will be evaluated and the new value +assigned to var. The step-form is optional; if it's left out, the +variable will keep its value from iteration to iteration unless you +explicitly assign it a new value in the loop body. As with the variable +definitions in a LET, if the init-form is left out, the variable is +bound to NIL. Also as with LET, you can use a plain variable name as +shorthand for a list containing just the name. + + At the beginning of each iteration, after all the loop variables have +been given their new values, the end-test-form is evaluated. As long as +it evaluates to NIL, the iteration proceeds, evaluating the statements +in order. + + When the end-test-form evaluates to true, the result-forms are +evaluated, and the value of the last result form is returned as the +value of the DO expression. + + At each step of the iteration the step forms for all the variables +are evaluated before assigning any of the values to the variables. This +means you can refer to any of the other loop variables in the step +forms. (1) That is, in a loop like this: + + (do ((n 0 (1+ n)) + (cur 0 next) + (next 1 (+ cur next))) + ((= 10 n) cur)) + + the step forms (1+ n), next, and (+ cur next) are all evaluated using +the old values of n, cur, and next. Only after all the step forms have +been evaluated are the variables given their new values. +(Mathematically inclined readers may notice that this is a particularly +efficient way of computing the eleventh Fibonacci number.) + + This example also illustrates another characteristic of DO-because +you can step multiple variables, you often don't need a body at all. +Other times, you may leave out the result form, particularly if you're +just using the loop as a control construct. This flexibility, however, +is the reason that DO expressions can be a bit cryptic. Where exactly +do all the parentheses go? The best way to understand a DO expression +is to keep in mind the basic template. + + (do (variable-definition*) + (end-test-form result-form*) + statement*) + + The six parentheses in that template are the only ones required by +the DO itself. You need one pair to enclose the variable declarations, +one pair to enclose the end test and result forms, and one pair to +enclose the whole expression. Other forms within the DO may require +their own parentheses-variable definitions are usually lists, for +instance. And the test form is often a function call. But the skeleton +of a DO loop will always be the same. Here are some example DO loops +with the skeleton in bold: + + (do ((i 0 (1+ i))) + ((>= i 4)) + (print i)) + + Notice that the result form has been omitted. This is, however, not +a particularly idiomatic use of DO, as this loop is much more simply +written using DOTIMES. (2) + + (dotimes (i 4) (print i)) + + As another example, here's the bodiless Fibonacci-computing loop: + + (do ((n 0 (1+ n)) + (cur 0 next) + (next 1 (+ cur next))) + ((= 10 n) cur)) + + Finally, the next loop demonstrates a DO loop that binds no +variables. It loops while the current time is less than the value of a +global variable, printing "Waiting" once a minute. Note that even with +no loop variables, you still need the empty variables list. + + (do () + ((> (get-universal-time) *some-future-date*)) + (format t "Waiting~%") + (sleep 60)) + + ---------- Footnotes ---------- + + (1) A variant of DO, DO*, assigns each variable its value before +evaluating the step form for subsequent variables. For more details, +consult your favorite Common Lisp reference. + + (2) The DOTIMES is also preferred because the macro expansion will +likely include declarations that allow the compiler to generate more +efficient code. + + +File: pcl.info, Node: 7-7, Next: Chapter 8, Prev: 7-6, Up: Chapter 7 + +The Mighty LOOP +=============== + +For the simple cases you have DOLIST and DOTIMES. And if they don't suit +your needs, you can fall back on the completely general DO. What more +could you want? + + Well, it turns out a handful of looping idioms come up over and over +again, such as looping over various data structures: lists, vectors, +hash tables, and packages. Or accumulating values in various ways while +looping: collecting, counting, summing, minimizing, or maximizing. If +you need a loop to do one of these things (or several at the same time), +the LOOP macro may give you an easier way to express it. + + The LOOP macro actually comes in two flavors-simple and extended. +The simple version is as simple as can be-an infinite loop that doesn't +bind any variables. The skeleton looks like this: + + (loop + body-form*) + + The forms in body are evaluated each time through the loop, which +will iterate forever unless you use RETURN to break out. For example, +you could write the previous DO loop with a simple LOOP. + + (loop + (when (> (get-universal-time) *some-future-date*) + (return)) + (format t "Waiting~%") + (sleep 60)) + + The extended LOOP is quite a different beast. It's distinguished by +the use of certain loop keywords that implement a special-purpose +language for expressing looping idioms. It's worth noting that not all +Lispers love the extended LOOP language. At least one of Common Lisp's +original designers hated it. LOOP's detractors complain that its syntax +is totally un-Lispy (in other words, not enough parentheses). LOOP's +fans counter that that's the point: complicated looping constructs are +hard enough to understand without wrapping them up in DO's cryptic +syntax. It's better, they say, to have a slightly more verbose syntax +that gives you some clues what the heck is going on. + + For instance, here's an idiomatic DO loop that collects the numbers +from 1 to 10 into a list: + + (do ((nums nil) (i 1 (1+ i))) + ((> i 10) (nreverse nums)) + (push i nums)) ==> (1 2 3 4 5 6 7 8 9 10) + + A seasoned Lisper won't have any trouble understanding that code-it's +just a matter of understanding the basic form of a DO loop and +recognizing the PUSH/NREVERSE idiom for building up a list. But it's +not exactly transparent. The LOOP version, on the other hand, is almost +understandable as an English sentence. + + (loop for i from 1 to 10 collecting i) ==> (1 2 3 4 5 6 7 8 9 10) + + The following are some more examples of simple uses of LOOP. This +sums the first ten squares: + + (loop for x from 1 to 10 summing (expt x 2)) ==> 385 + + This counts the number of vowels in a string: + + (loop for x across "the quick brown fox jumps over the lazy dog" + counting (find x "aeiou")) ==> 11 + + This computes the eleventh Fibonacci number, similar to the DO loop +used earlier: + + (loop for i below 10 + and a = 0 then b + and b = 1 then (+ b a) + finally (return a)) + + The symbols across, and, below, collecting, counting, finally, for, +from, summing, then, and to are some of the loop keywords whose presence +identifies these as instances of the extended LOOP. (1) + + I'll save the details of LOOP for Chapter 22, but it's worth noting +here as another example of the way macros can be used to extend the base +language. While LOOP provides its own language for expressing looping +constructs, it doesn't cut you off from the rest of Lisp. The loop +keywords are parsed according to loop's grammar, but the rest of the +code in a LOOP is regular Lisp code. + + And it's worth pointing out one more time that while the LOOP macro +is quite a bit more complicated than macros such as WHEN or UNLESS, it +is just another macro. If it hadn't been included in the standard +library, you could implement it yourself or get a third-party library +that does. + + With that I'll conclude our tour of the basic control-construct +macros. Now you're ready to take a closer look at how to define your +own macros. + + ---------- Footnotes ---------- + + (1) Loop keywords is a bit of a misnomer since they aren't keyword +symbols. In fact, LOOP doesn't care what package the symbols are from. +When the LOOP macro parses its body, it considers any appropriately +named symbols equivalent. You could even use true keywords if you +wanted-:for, :across, and so on-because they also have the correct name. +But most folks just use plain symbols. Because the loop keywords are +used only as syntactic markers, it doesn't matter if they're used for +other purposes-as function or variable names. + + +File: pcl.info, Node: Chapter 8, Next: Chapter 9, Prev: Chapter 7, Up: Top + +8. Macros: Defining Your Own +============================ + +Now it's time to start writing your own macros. The standard macros I +covered in the previous chapter hint at some of the things you can do +with macros, but that's just the beginning. Common Lisp doesn't support +macros so every Lisp programmer can create their own variants of +standard control constructs any more than C supports functions so every +C programmer can write trivial variants of the functions in the C +standard library. Macros are part of the language to allow you to +create abstractions on top of the core language and standard library +that move you closer toward being able to directly express the things +you want to express. + + Perhaps the biggest barrier to a proper understanding of macros is, +ironically, that they're so well integrated into the language. In many +ways they seem like just a funny kind of function-they're written in +Lisp, they take arguments and return results, and they allow you to +abstract away distracting details. Yet despite these many similarities, +macros operate at a different level than functions and create a totally +different kind of abstraction. + + Once you understand the difference between macros and functions, the +tight integration of macros in the language will be a huge benefit. But +in the meantime, it's a frequent source of confusion for new Lispers. +The following story, while not true in a historical or technical sense, +tries to alleviate the confusion by giving you a way to think about how +macros work. + +* Menu: + +* 8-1:: The Story of Mac: A Just-So Story +* 8-2:: Macro Expansion Time vs. Runtime +* 8-3:: DEF MACRO +* 8-4:: A Sample Macro: do-primes +* 8-5:: Macro Parameters +* 8-6:: Generating the Expansion +* 8-7:: Plugging the Leaks +* 8-8:: Macro-Writing Macros +* 8-9:: Beyond Simple Macros + + +File: pcl.info, Node: 8-1, Next: 8-2, Prev: Chapter 8, Up: Chapter 8 + +The Story of Mac: A Just-So Story +================================= + +Once upon a time, long ago, there was a company of Lisp programmers. It +was so long ago, in fact, that Lisp had no macros. Anything that +couldn't be defined with a function or done with a special operator had +to be written in full every time, which was rather a drag. +Unfortunately, the programmers in this company-though brilliant-were +also quite lazy. Often in the middle of their programs-when the tedium +of writing a bunch of code got to be too much-they would instead write a +note describing the code they needed to write at that place in the +program. Even more unfortunately, because they were lazy, the +programmers also hated to go back and actually write the code described +by the notes. Soon the company had a big stack of programs that nobody +could run because they were full of notes about code that still needed +to be written. + + In desperation, the big bosses hired a junior programmer, Mac, whose +job was to find the notes, write the required code, and insert it into +the program in place of the notes. Mac never ran the programs-they +weren't done yet, of course, so he couldn't. But even if they had been +completed, Mac wouldn't have known what inputs to feed them. So he just +wrote his code based on the contents of the notes and sent it back to +the original programmer. + + With Mac's help, all the programs were soon completed, and the +company made a ton of money selling them-so much money that the company +could double the size of its programming staff. But for some reason no +one thought to hire anyone to help Mac; soon he was single- handedly +assisting several dozen programmers. To avoid spending all his time +searching for notes in source code, Mac made a small modification to the +compiler the programmers used. Thereafter, whenever the compiler hit a +note, it would e-mail him the note and wait for him to e-mail back the +replacement code. Unfortunately, even with this change, Mac had a hard +time keeping up with the programmers. He worked as carefully as he +could, but sometimes- especially when the notes weren't clear-he would +make mistakes. + + The programmers noticed, however, that the more precisely they wrote +their notes, the more likely it was that Mac would send back correct +code. One day, one of the programmers, having a hard time describing in +words the code he wanted, included in one of his notes a Lisp program +that would generate the code he wanted. That was fine by Mac; he just +ran the program and sent the result to the compiler. + + The next innovation came when a programmer put a note at the top of +one of his programs containing a function definition and a comment that +said, "Mac, don't write any code here, but keep this function for later; +I'm going to use it in some of my other notes." Other notes in the same +program said things such as, "Mac, replace this note with the result of +running that other function with the symbols x and y as arguments." + + This technique caught on so quickly that within a few days, most +programs contained dozens of notes defining functions that were only +used by code in other notes. To make it easy for Mac to pick out the +notes containing only definitions that didn't require any immediate +response, the programmers tagged them with the standard preface: +"Definition for Mac, Read Only." This-as the programmers were still +quite lazy-was quickly shortened to "DEF. MAC. R/O" and then "DEFMACRO." + + Pretty soon, there was no actual English left in the notes for Mac. +All he did all day was read and respond to e-mails from the compiler +containing DEFMACRO notes and calls to the functions defined in the +DEFMACROs. Since the Lisp programs in the notes did all the real work, +keeping up with the e-mails was no problem. Mac suddenly had a lot of +time on his hands and would sit in his office daydreaming about +white-sand beaches, clear blue ocean water, and drinks with little paper +umbrellas in them. + + Several months later the programmers realized nobody had seen Mac for +quite some time. When they went to his office, they found a thin layer +of dust over everything, a desk littered with travel brochures for +various tropical locations, and the computer off. But the compiler +still worked-how could it be? It turned out Mac had made one last +change to the compiler: instead of e-mailing notes to Mac, the compiler +now saved the functions defined by DEFMACRO notes and ran them when +called for by the other notes. The programmers decided there was no +reason to tell the big bosses Mac wasn't coming to the office anymore. +So to this day, Mac draws a salary and from time to time sends the +programmers a postcard from one tropical locale or another. + + +File: pcl.info, Node: 8-2, Next: 8-3, Prev: 8-1, Up: Chapter 8 + +Macro Expansion Time vs. Runtime +================================ + +The key to understanding macros is to be quite clear about the +distinction between the code that generates code (macros) and the code +that eventually makes up the program (everything else). When you write +macros, you're writing programs that will be used by the compiler to +generate the code that will then be compiled. Only after all the macros +have been fully expanded and the resulting code compiled can the program +actually be run. The time when macros run is called macro expansion +time; this is distinct from runtime, when regular code, including the +code generated by macros, runs. + + It's important to keep this distinction firmly in mind because code +running at macro expansion time runs in a very different environment +than code running at runtime. Namely, at macro expansion time, there's +no way to access the data that will exist at runtime. Like Mac, who +couldn't run the programs he was working on because he didn't know what +the correct inputs were, code running at macro expansion time can deal +only with the data that's inherent in the source code. For instance, +suppose the following source code appears somewhere in a program: + + (defun foo (x) + (when (> x 10) (print 'big))) + + Normally you'd think of x as a variable that will hold the argument +passed in a call to foo. But at macro expansion time, such as when the +compiler is running the WHEN macro, the only data available is the +source code. Since the program isn't running yet, there's no call to +foo and thus no value associated with x. Instead, the values the +compiler passes to WHEN are the Lisp lists representing the source code, +namely, (> x 10) and (print 'big). Suppose that WHEN is defined, as you +saw in the previous chapter, with something like the following macro: + + (defmacro when (condition &rest body) + `(if ,condition (progn ,@body))) + + When the code in foo is compiled, the WHEN macro will be run with +those two forms as arguments. The parameter condition will be bound to +the form (> x 10), and the form (print 'big) will be collected into a +list that will become the value of the &rest body parameter. The +backquote expression will then generate this code: + + (if (> x 10) (progn (print 'big))) + + by interpolating in the value of condition and splicing the value of +body into the PROGN. + + When Lisp is interpreted, rather than compiled, the distinction +between macro expansion time and runtime is less clear because they're +temporally intertwined. Also, the language standard doesn't specify +exactly how an interpreter must handle macros-it could expand all the +macros in the form being interpreted and then interpret the resulting +code, or it could start right in on interpreting the form and expand +macros when it hits them. In either case, macros are always passed the +unevaluated Lisp objects representing the subforms of the macro form, +and the job of the macro is still to produce code that will do something +rather than to do anything directly. + + +File: pcl.info, Node: 8-3, Next: 8-4, Prev: 8-2, Up: Chapter 8 + +DEF MACRO +========= + +As you saw in Chapter 3, macros really are defined with DEFMACRO forms, +though it stands-of course-for DEFine MACRO, not Definition for Mac. +The basic skeleton of a DEFMACRO is quite similar to the skeleton of a +DEFUN. + + (defmacro name (parameter*) + "Optional documentation string." + body-form*) + + Like a function, a macro consists of a name, a parameter list, an +optional documentation string, and a body of Lisp expressions. (1) +However, as I just discussed, the job of a macro isn't to do anything +directly-its job is to generate code that will later do what you want. + + Macros can use the full power of Lisp to generate their expansion, +which means in this chapter I can only scratch the surface of what you +can do with macros. I can, however, describe a general process for +writing macros that works for all macros from the simplest to the most +complex. + + The job of a macro is to translate a macro form-in other words, a +Lisp form whose first element is the name of the macro-into code that +does a particular thing. Sometimes you write a macro starting with the +code you'd like to be able to write, that is, with an example macro +form. Other times you decide to write a macro after you've written the +same pattern of code several times and realize you can make your code +clearer by abstracting the pattern. + + Regardless of which end you start from, you need to figure out the +other end before you can start writing a macro: you need to know both +where you're coming from and where you're going before you can hope to +write code to do it automatically. Thus, the first step of writing a +macro is to write at least one example of a call to the macro and the +code into which that call should expand. + + Once you have an example call and the desired expansion, you're ready +for the second step: writing the actual macro code. For simple macros +this will be a trivial matter of writing a backquoted template with the +macro parameters plugged into the right places. Complex macros will be +significant programs in their own right, complete with helper functions +and data structures. + + After you've written code to translate the example call to the +appropriate expansion, you need to make sure the abstraction the macro +provides doesn't "leak" details of its implementation. Leaky macro +abstractions will work fine for certain arguments but not others or will +interact with code in the calling environment in undesirable ways. As +it turns out, macros can leak in a small handful of ways, all of which +are easily avoided as long as you know to check for them. I'll discuss +how in the section "Plugging the Leaks." + + To sum up, the steps to writing a macro are as follows: + + * Write a sample call to the macro and the code it should expand + into, or vice versa. + * Write code that generates the handwritten expansion from the + arguments in the sample call. + * Make sure the macro abstraction doesn't "leak." + + ---------- Footnotes ---------- + + (1) As with functions, macros can also contain declarations, but you +don't need to worry about those for now. + + +File: pcl.info, Node: 8-4, Next: 8-5, Prev: 8-3, Up: Chapter 8 + +A Sample Macro: do-primes +========================= + +To see how this three-step process works, you'll write a macro do-primes +that provides a looping construct similar to DOTIMES and DOLIST except +that instead of iterating over integers or elements of a list, it +iterates over successive prime numbers. This isn't meant to be an +example of a particularly useful macro-it's just a vehicle for +demonstrating the process. + + First, you'll need two utility functions, one to test whether a given +number is prime and another that returns the next prime number greater +or equal to its argument. In both cases you can use a simple, but +inefficient, brute-force approach. + + (defun primep (number) + (when (> number 1) + (loop for fac from 2 to (isqrt number) never (zerop (mod number fac))))) + + (defun next-prime (number) + (loop for n from number when (primep n) return n)) + + Now you can write the macro. Following the procedure outlined +previously, you need at least one example of a call to the macro and the +code into which it should expand. Suppose you start with the idea that +you want to be able to write this: + + (do-primes (p 0 19) + (format t "~d " p)) + + to express a loop that executes the body once each for each prime +number greater or equal to 0 and less than or equal to 19, with the +variable p holding the prime number. It makes sense to model this macro +on the form of the standard DOLIST and DOTIMES macros; macros that +follow the pattern of existing macros are easier to understand and use +than macros that introduce gratuitously novel syntax. + + Without the do-primes macro, you could write such a loop with DO (and +the two utility functions defined previously) like this: + + (do ((p (next-prime 0) (next-prime (1+ p)))) + ((> p 19)) + (format t "~d " p)) + + Now you're ready to start writing the macro code that will translate +from the former to the latter. + + +File: pcl.info, Node: 8-5, Next: 8-6, Prev: 8-4, Up: Chapter 8 + +Macro Parameters +================ + +Since the arguments passed to a macro are Lisp objects representing the +source code of the macro call, the first step in any macro is to extract +whatever parts of those objects are needed to compute the expansion. +For macros that simply interpolate their arguments directly into a +template, this step is trivial: simply defining the right parameters to +hold the different arguments is sufficient. + + But this approach, it seems, will not suffice for do-primes. The +first argument to the do-primes call is a list containing the name of +the loop variable, p; the lower bound, 0; and the upper bound, 19. But +if you look at the expansion, the list as a whole doesn't appear in the +expansion; the three element are split up and put in different places. + + You could define do-primes with two parameters, one to hold the list +and a &rest parameter to hold the body forms, and then take apart the +list by hand, something like this: + + (defmacro do-primes (var-and-range &rest body) + (let ((var (first var-and-range)) + (start (second var-and-range)) + (end (third var-and-range))) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))) + ((> ,var ,end)) + ,@body))) + + In a moment I'll explain how the body generates the correct +expansion; for now you can just note that the variables var, start, and +end each hold a value, extracted from var-and-range, that's then +interpolated into the backquote expression that generates do-primes's +expansion. + + However, you don't need to take apart var-and-range "by hand" because +macro parameter lists are what are called destructuring parameter lists. +Destructuring, as the name suggests, involves taking apart a +structure-in this case the list structure of the forms passed to a +macro. + + Within a destructuring parameter list, a simple parameter name can be +replaced with a nested parameter list. The parameters in the nested +parameter list will take their values from the elements of the +expression that would have been bound to the parameter the list +replaced. For instance, you can replace var-and-range with a list (var +start end), and the three elements of the list will automatically be +destructured into those three parameters. + + Another special feature of macro parameter lists is that you can use +&body as a synonym for &rest. Semantically &body and &rest are +equivalent, but many development environments will use the presence of a +&body parameter to modify how they indent uses of the macro-typically +&body parameters are used to hold a list of forms that make up the body +of the macro. + + So you can streamline the definition of do-primes and give a hint to +both human readers and your development tools about its intended use by +defining it like this: + + (defmacro do-primes ((var start end) &body body) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))) + ((> ,var ,end)) + ,@body)) + + In addition to being more concise, destructuring parameter lists also +give you automatic error checking-with do-primes defined this way, Lisp +will be able to detect a call whose first argument isn't a three-element +list and will give you a meaningful error message just as if you had +called a function with too few or too many arguments. Also, in +development environments such as SLIME that indicate what arguments are +expected as soon as you type the name of a function or macro, if you use +a destructuring parameter list, the environment will be able to tell you +more specifically the syntax of the macro call. With the original +definition, SLIME would tell you do-primes is called like this: + + (do-primes var-and-range &rest body) + + But with the new definition, it can tell you that a call should look +like this: + + (do-primes (var start end) &body body) + + Destructuring parameter lists can contain &optional, &key, and &rest +parameters and can contain nested destructuring lists. However, you +don't need any of those options to write do-primes. + + +File: pcl.info, Node: 8-6, Next: 8-7, Prev: 8-5, Up: Chapter 8 + +Generating the Expansion +======================== + +Because do-primes is a fairly simple macro, after you've destructured +the arguments, all that's left is to interpolate them into a template to +get the expansion. + + For simple macros like do-primes, the special backquote syntax is +perfect. To review, a backquoted expression is similar to a quoted +expression except you can "unquote" particular subexpressions by +preceding them with a comma, possibly followed by an at (@) sign. +Without an at sign, the comma causes the value of the subexpression to +be included as is. With an at sign, the value-which must be a list-is +"spliced" into the enclosing list. + + Another useful way to think about the backquote syntax is as a +particularly concise way of writing code that generates lists. This way +of thinking about it has the benefit of being pretty much exactly what's +happening under the covers-when the reader reads a backquoted +expression, it translates it into code that generates the appropriate +list structure. For instance, '(,a b) might be read as (list a 'b). +The language standard doesn't specify exactly what code the reader must +produce as long as it generates the right list structure. + + Table 8-1 shows some examples of backquoted expressions along with +equivalent list-building code and the result you'd get if you evaluated +either the backquoted expression or the equivalent code. (1) + + Table 8-1. Backquote Examples + + + + It's important to note that backquote is just a convenience. But +it's a big convenience. To appreciate how big, compare the backquoted +version of do-primes to the following version, which uses explicit +list-building code: + + (defmacro do-primes-a ((var start end) &body body) + (append '(do) + (list (list (list var + (list 'next-prime start) + (list 'next-prime (list '1+ var))))) + (list (list (list '> var end))) + body)) + + As you'll see in a moment, the current implementation of do-primes +doesn't handle certain edge cases correctly. But first you should +verify that it at least works for the original example. You can test it +in two ways. You can test it indirectly by simply using it-presumably, +if the resulting behavior is correct, the expansion is correct. For +instance, you can type the original example's use of do-primes to the +REPL and see that it indeed prints the right series of prime numbers. + + CL-USER> (do-primes (p 0 19) (format t "~d " p)) + 2 3 5 7 11 13 17 19 + NIL + + Or you can check the macro directly by looking at the expansion of a +particular call. The function MACROEXPAND-1 takes any Lisp expression +as an argument and returns the result of doing one level of macro +expansion. (2) Because MACROEXPAND-1 is a function, to pass it a +literal macro form you must quote it. You can use it to see the +expansion of the previous call. (3) + + CL-USER> (macroexpand-1 '(do-primes (p 0 19) (format t "~d " p))) + (DO ((P (NEXT-PRIME 0) (NEXT-PRIME (1+ P)))) + ((> P 19)) + (FORMAT T "~d " P)) + T + + Or, more conveniently, in SLIME you can check a macro's expansion by +placing the cursor on the opening parenthesis of a macro form in your +source code and typing C-c RET to invoke the Emacs function +slime-macroexpand-1, which will pass the macro form to MACROEXPAND-1 and +"pretty print" the result in a temporary buffer. + + However you get to it, you can see that the result of macro expansion +is the same as the original handwritten expansion, so it seems that +do-primes works. + + ---------- Footnotes ---------- + + (1) APPEND, which I haven't discussed yet, is a function that takes +any number of list arguments and returns the result of splicing them +together into a single list. + + (2) Another function, MACROEXPAND, keeps expanding the result as long +as the first element of the resulting expansion is the name of the +macro. However, this will often show you a much lower-level view of +what the code is doing than you want, since basic control constructs +such as DO are also implemented as macros. In other words, while it can +be educational to see what your macro ultimately expands into, it isn't +a very useful view into what your own macros are doing. + + (3) If the macro expansion is shown all on one line, it's probably +because the variable *PRINT-PRETTY* is NIL. If it is, evaluating (setf +*print-pretty* t) should make the macro expansion easier to read. + + +File: pcl.info, Node: 8-7, Next: 8-8, Prev: 8-6, Up: Chapter 8 + +Plugging the Leaks +================== + +In his essay "The Law of Leaky Abstractions," Joel Spolsky coined the +term leaky abstraction to describe an abstraction that "leaks" details +it's supposed to be abstracting away. Since writing a macro is a way of +creating an abstraction, you need to make sure your macros don't leak +needlessly. (1) + + As it turns out, a macro can leak details of its inner workings in +three ways. Luckily, it's pretty easy to tell whether a given macro +suffers from any of those leaks and to fix them. + + The current definition suffers from one of the three possible macro +leaks: namely, it evaluates the end subform too many times. Suppose you +were to call do-primes with, instead of a literal number such as 19, an +expression such as (random 100) in the end position. + + (do-primes (p 0 (random 100)) + (format t "~d " p)) + + Presumably the intent here is to loop over the primes from zero to +whatever random number is returned by (random 100). However, this isn't +what the current implementation does, as MACROEXPAND-1 shows. + + CL-USER> (macroexpand-1 '(do-primes (p 0 (random 100)) (format t "~d " p))) + (DO ((P (NEXT-PRIME 0) (NEXT-PRIME (1+ P)))) + ((> P (RANDOM 100))) + (FORMAT T "~d " P)) + T + + When this expansion code is run, RANDOM will be called each time the +end test for the loop is evaluated. Thus, instead of looping until p is +greater than an initially chosen random number, this loop will iterate +until it happens to draw a random number less than or equal to the +current value of p. While the total number of iterations will still be +random, it will be drawn from a much different distribution than the +uniform distribution RANDOM returns. + + This is a leak in the abstraction because, to use the macro +correctly, the caller needs to be aware that the end form is going to be +evaluated more than once. One way to plug this leak would be to simply +define this as the behavior of do-primes. But that's not very +satisfactory-you should try to observe the Principle of Least +Astonishment when implementing macros. And programmers will typically +expect the forms they pass to macros to be evaluated no more times than +absolutely necessary. (2) Furthermore, since do-primes is built on the +model of the standard macros, DOTIMES and DOLIST, neither of which +causes any of the forms except those in the body to be evaluated more +than once, most programmers will expect do-primes to behave similarly. + + You can fix the multiple evaluation easily enough; you just need to +generate code that evaluates end once and saves the value in a variable +to be used later. Recall that in a DO loop, variables defined with an +initialization form and no step form don't change from iteration to +iteration. So you can fix the multiple evaluation problem with this +definition: + + (defmacro do-primes ((var start end) &body body) + `(do ((ending-value ,end) + (,var (next-prime ,start) (next-prime (1+ ,var)))) + ((> ,var ending-value)) + ,@body)) + + Unfortunately, this fix introduces two new leaks to the macro +abstraction. + + One new leak is similar to the multiple-evaluation leak you just +fixed. Because the initialization forms for variables in a DO loop are +evaluated in the order the variables are defined, when the macro +expansion is evaluated, the expression passed as end will be evaluated +before the expression passed as start, opposite to the order they appear +in the macro call. This leak doesn't cause any problems when start and +end are literal values like 0 and 19. But when they're forms that can +have side effects, evaluating them out of order can once again run afoul +of the Principle of Least Astonishment. + + This leak is trivially plugged by swapping the order of the two +variable definitions. + + (defmacro do-primes ((var start end) &body body) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var))) + (ending-value ,end)) + ((> ,var ending-value)) + ,@body)) + + The last leak you need to plug was created by using the variable name +ending-value. The problem is that the name, which ought to be a purely +internal detail of the macro implementation, can end up interacting with +code passed to the macro or in the context where the macro is called. +The following seemingly innocent call to do-primes doesn't work +correctly because of this leak: + + (do-primes (ending-value 0 10) + (print ending-value)) + + Neither does this one: + + (let ((ending-value 0)) + (do-primes (p 0 10) + (incf ending-value p)) + ending-value) + + Again, MACROEXPAND-1 can show you the problem. The first call +expands to this: + + (do ((ending-value (next-prime 0) (next-prime (1+ ending-value))) + (ending-value 10)) + ((> ending-value ending-value)) + (print ending-value)) + + Some Lisps may reject this code because ending-value is used twice as +a variable name in the same DO loop. If not rejected outright, the code +will loop forever since ending-value will never be greater than itself. + + The second problem call expands to the following: + + (let ((ending-value 0)) + (do ((p (next-prime 0) (next-prime (1+ p))) + (ending-value 10)) + ((> p ending-value)) + (incf ending-value p)) + ending-value) + + In this case the generated code is perfectly legal, but the behavior +isn't at all what you want. Because the binding of ending-value +established by the LET outside the loop is shadowed by the variable with +the same name inside the DO, the form (incf ending-value p) increments +the loop variable ending-value instead of the outer variable with the +same name, creating another infinite loop. (3) + + Clearly, what you need to patch this leak is a symbol that will never +be used outside the code generated by the macro. You could try using a +really unlikely name, but that's no guarantee. You could also protect +yourself to some extent by using packages, as described in Chapter 21. +But there's a better solution. + + The function GENSYM returns a unique symbol each time it's called. +This is a symbol that has never been read by the Lisp reader and never +will be because it isn't interned in any package. Thus, instead of +using a literal name like ending-value, you can generate a new symbol +each time do-primes is expanded. + + (defmacro do-primes ((var start end) &body body) + (let ((ending-value-name (gensym))) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var))) + (,ending-value-name ,end)) + ((> ,var ,ending-value-name)) + ,@body))) + + Note that the code that calls GENSYM isn't part of the expansion; it +runs as part of the macro expander and thus creates a new symbol each +time the macro is expanded. This may seem a bit strange at +first-ending-value-name is a variable whose value is the name of another +variable. But really it's no different from the parameter var whose +value is the name of a variable-the difference is the value of var was +created by the reader when the macro form was read, and the value of +ending-value-name is generated programmatically when the macro code +runs. + + With this definition the two previously problematic forms expand into +code that works the way you want. The first form: + + (do-primes (ending-value 0 10) + (print ending-value)) + + expands into the following: + + (do ((ending-value (next-prime 0) (next-prime (1+ ending-value))) + (#:g2141 10)) + ((> ending-value #:g2141)) + (print ending-value)) + + Now the variable used to hold the ending value is the gensymed +symbol, #:g2141. The name of the symbol, G2141, was generated by GENSYM +but isn't significant; the thing that matters is the object identity of +the symbol. Gensymed symbols are printed in the normal syntax for +uninterned symbols, with a leading #:. + + The other previously problematic form: + + (let ((ending-value 0)) + (do-primes (p 0 10) + (incf ending-value p)) + ending-value) + + looks like this if you replace the do-primes form with its expansion: + + (let ((ending-value 0)) + (do ((p (next-prime 0) (next-prime (1+ p))) + (#:g2140 10)) + ((> p #:g2140)) + (incf ending-value p)) + ending-value) + + Again, there's no leak since the ending-value variable bound by the +LET surrounding the do-primes loop is no longer shadowed by any +variables introduced in the expanded code. + + Not all literal names used in a macro expansion will necessarily +cause a problem-as you get more experience with the various binding +forms, you'll be able to determine whether a given name is being used in +a position that could cause a leak in a macro abstraction. But there's +no real downside to using a gensymed name just to be safe. + + With that fix, you've plugged all the leaks in the implementation of +do-primes. Once you've gotten a bit of macro-writing experience under +your belt, you'll learn to write macros with these kinds of leaks +preplugged. It's actually fairly simple if you follow these rules of +thumb: + + * Unless there's a particular reason to do otherwise, include any + subforms in the expansion in positions that will be evaluated in + the same order as the subforms appear in the macro call. + * Unless there's a particular reason to do otherwise, make sure + subforms are evaluated only once by creating a variable in the + expansion to hold the value of evaluating the argument form and + then using that variable anywhere else the value is needed in the + expansion. + * Use GENSYM at macro expansion time to create variable names used in + the expansion. + + ---------- Footnotes ---------- + + (1) This is from Joel on Software by Joel Spolsky, also available at +http://www.joelonsoftware.com/ articles/LeakyAbstractions.html. +Spolsky's point in the essay is that all abstractions leak to some +extent; that is, there are no perfect abstractions. But that doesn't +mean you should tolerate leaks you can easily plug. + + (2) Of course, certain forms are supposed to be evaluated more than +once, such as the forms in the body of a do-primes loop. + + (3) It may not be obvious that this loop is necessarily infinite +given the nonuniform occurrences of prime numbers. The starting point +for a proof that it is in fact infinite is Bertrand's postulate, which +says for any n > 1, there exists a prime p, n < p < 2n. From there you +can prove that for any prime number, P less than the sum of the +preceding prime numbers, the next prime, P', is also smaller than the +original sum plus P. + + +File: pcl.info, Node: 8-8, Next: 8-9, Prev: 8-7, Up: Chapter 8 + +Macro-Writing Macros +==================== + +Of course, there's no reason you should be able to take advantage of +macros only when writing functions. The job of macros is to abstract +away common syntactic patterns, and certain patterns come up again and +again in writing macros that can also benefit from being abstracted +away. + + In fact, you've already seen one such pattern-many macros will, like +the last version of do-primes, start with a LET that introduces a few +variables holding gensymed symbols to be used in the macro's expansion. +Since this is such a common pattern, why not abstract it away with its +own macro? + + In this section you'll write a macro, with-gensyms, that does just +that. In other words, you'll write a macro-writing macro: a macro that +generates code that generates code. While complex macro-writing macros +can be a bit confusing until you get used to keeping the various levels +of code clear in your mind, with-gensyms is fairly straightforward and +will serve as a useful but not too strenuous mental limbering exercise. + + You want to be able to write something like this: + + (defmacro do-primes ((var start end) &body body) + (with-gensyms (ending-value-name) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var))) + (,ending-value-name ,end)) + ((> ,var ,ending-value-name)) + ,@body))) + + and have it be equivalent to the previous version of do-primes. In +other words, the with-gensyms needs to expand into a LET that binds each +named variable, ending-value-name in this case, to a gensymed symbol. +That's easy enough to write with a simple backquote template. + + (defmacro with-gensyms ((&rest names) &body body) + `(let ,(loop for n in names collect `(,n (gensym))) + ,@body)) + + Note how you can use a comma to interpolate the value of the LOOP +expression. The loop generates a list of binding forms where each +binding form consists of a list containing one of the names given to +with-gensyms and the literal code (gensym). You can test what code the +LOOP expression would generate at the REPL by replacing names with a +list of symbols. + + CL-USER> (loop for n in '(a b c) collect `(,n (gensym))) + ((A (GENSYM)) (B (GENSYM)) (C (GENSYM))) + + After the list of binding forms, the body argument to with-gensyms is +spliced in as the body of the LET. Thus, in the code you wrap in a +with-gensyms you can refer to any of the variables named in the list of +variables passed to with-gensyms. + + If you macro-expand the with-gensyms form in the new definition of +do-primes, you should see something like this: + + (let ((ending-value-name (gensym))) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var))) + (,ending-value-name ,end)) + ((> ,var ,ending-value-name)) + ,@body)) + + Looks good. While this macro is fairly trivial, it's important to +keep clear about when the different macros are expanded: when you +compile the DEFMACRO of do-primes, the with-gensyms form is expanded +into the code just shown and compiled. Thus, the compiled version of +do-primes is just the same as if you had written the outer LET by hand. +When you compile a function that uses do-primes, the code generated by +with-gensyms runs generating the do-primes expansion, but with-gensyms +itself isn't needed to compile a do-primes form since it has already +been expanded, back when do-primes was compiled. + +* Menu: + +* 8-8-1:: Another classic macro-writing MACRO: ONCE-ONLY + + +File: pcl.info, Node: 8-8-1, Next: 8-8, Prev: 8-8, Up: 8-8 + +Another classic macro-writing MACRO: ONCE-ONLY +============================================== + +Another classic macro-writing macro is once-only, which is used to +generate code that evaluates certain macro arguments once only and in a +particular order. Using once-only, you could write do-primes almost as +simply as the original leaky version, like this: + + (defmacro do-primes ((var start end) &body body) + (once-only (start end) + `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))) + ((> ,var ,end)) + ,@body))) + + However, the implementation of once-only is a bit too involved for a +blow-by-blow explanation, as it relies on multiple levels of backquoting +and unquoting. If you really want to sharpen your macro chops, you can +try to figure out how it works. It looks like this: + + (defmacro once-only ((&rest names) &body body) + (let ((gensyms (loop for n in names collect (gensym)))) + `(let (,@(loop for g in gensyms collect `(,g (gensym)))) + `(let (,,@(loop for g in gensyms for n in names collect ``(,,g ,,n))) + ,(let (,@(loop for n in names for g in gensyms collect `(,n ,g))) + ,@body))))) + + +File: pcl.info, Node: 8-9, Next: Chapter 9, Prev: 8-8, Up: Chapter 8 + +Beyond Simple Macros +==================== + +I could, of course, say a lot more about macros. All the macros you've +seen so far have been fairly simple examples that save you a bit of +typing but don't provide radical new ways of expressing things. In +upcoming chapters you'll see examples of macros that allow you to +express things in ways that would be virtually impossible without +macros. You'll start in the very next chapter, in which you'll build a +simple but effective unit test framework. + + +File: pcl.info, Node: Chapter 9, Next: Chapter 10, Prev: Chapter 8, Up: Top + +9. Practical: Building a Unit Test Framework +============================================ + +In this chapter you'll return to cutting code and develop a simple unit +testing framework for Lisp. This will give you a chance to use some of +the features you've learned about since Chapter 3, including macros and +dynamic variables, in real code. + + The main design goal of the test framework will be to make it as easy +as possible to add new tests, to run various suites of tests, and to +track down test failures. For now you'll focus on designing a framework +you can use during interactive development. + + The key feature of an automated testing framework is that the +framework is responsible for telling you whether all the tests passed. +You don't want to spend your time slogging through test output checking +answers when the computer can do it much more quickly and accurately. +Consequently, each test case must be an expression that yields a boolean +value-true or false, pass or fail. For instance, if you were writing +tests for the built-in + function, these might be reasonable test cases: +(1) + + (= (+ 1 2) 3) + (= (+ 1 2 3) 6) + (= (+ -1 -3) -4) + + Functions that have side effects will be tested slightly +differently-you'll have to call the function and then check for evidence +of the expected side effects. (2) But in the end, every test case has +to boil down to a boolean expression, thumbs up or thumbs down. + +* Menu: + +* 9-1:: Two First Tries +* 9-2:: Refactoring +* 9-3:: Fixing the Return Value +* 9-4:: Better Result Reporting +* 9-5:: An Abstraction Emerges +* 9-6:: A Hierarchy of Tests +* 9-7:: Wrapping Up + + ---------- Footnotes ---------- + + (1) This is for illustrative purposes only-obviously, writing test +cases for built-in functions such as + is a bit silly, since if such +basic things aren't working, the chances the tests will be running the +way you expect is pretty slim. On the other hand, most Common Lisps are +implemented largely in Common Lisp, so it's not crazy to imagine writing +test suites in Common Lisp to test the standard library functions. + + (2) Side effects can include such things as signaling errors; I'll +discuss Common Lisp's error handling system in Chapter 19. You may, +after reading that chapter, want to think about how to incorporate tests +that check whether a function does or does not signal a particular error +in certain situations. + + +File: pcl.info, Node: 9-1, Next: 9-2, Prev: Chapter 9, Up: Chapter 9 + +Two First Tries +=============== + +If you were doing ad hoc testing, you could enter these expressions at +the REPL and check that they return T. But you want a framework that +makes it easy to organize and run these test cases whenever you want. +If you want to start with the simplest thing that could possibly work, +you can just write a function that evaluates the test cases and ANDs the +results together. + + (defun test-+ () + (and + (= (+ 1 2) 3) + (= (+ 1 2 3) 6) + (= (+ -1 -3) -4))) + + Whenever you want to run this set of test cases, you can call test-+. + + CL-USER> (test-+) + T + + As long as it returns T, you know the test cases are passing. This +way of organizing tests is also pleasantly concise-you don't have to +write a bunch of test bookkeeping code. However, as you'll discover the +first time a test case fails, the result reporting leaves something to +be desired. When test-+ returns NIL, you'll know something failed, but +you'll have no idea which test case it was. + + So let's try another simple-even simpleminded-approach. To find out +what happens to each test case, you could write something like this: + + (defun test-+ () + (format t "~:[FAIL~;pass~] ... ~a~%" (= (+ 1 2) 3) '(= (+ 1 2) 3)) + (format t "~:[FAIL~;pass~] ... ~a~%" (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6)) + (format t "~:[FAIL~;pass~] ... ~a~%" (= (+ -1 -3) -4) '(= (+ -1 -3) -4))) + + Now each test case will be reported individually. The +~:[FAIL~;pass~] part of the FORMAT directive causes FORMAT to print +"FAIL" if the first format argument is false and "pass" otherwise. (1) +Then you label the result with the test expression itself. Now running +test-+ shows you exactly what's going on. + + CL-USER> (test-+) + pass ... (= (+ 1 2) 3) + pass ... (= (+ 1 2 3) 6) + pass ... (= (+ -1 -3) -4) + NIL + + This time the result reporting is more like what you want, but the +code itself is pretty gross. The repeated calls to FORMAT as well as +the tedious duplication of the test expression cry out to be refactored. +The duplication of the test expression is particularly grating because +if you mistype it, the test results will be mislabeled. + + Another problem is that you don't get a single indicator whether all +the test cases passed. It's easy enough, with only three test cases, to +scan the output looking for "FAIL"; however, when you have hundreds of +test cases, it'll be more of a hassle. + + ---------- Footnotes ---------- + + (1) I'll discuss this and other FORMAT directives in more detail in +Chapter 18. + + +File: pcl.info, Node: 9-2, Next: 9-3, Prev: 9-1, Up: Chapter 9 + +Refactoring +=========== + +What you'd really like is a way to write test functions as streamlined +as the first test-+ that return a single T or NIL value but that also +report on the results of individual test cases like the second version. +Since the second version is close to what you want in terms of +functionality, your best bet is to see if you can factor out some of the +annoying duplication. + + The simplest way to get rid of the repeated similar calls to FORMAT +is to create a new function. + + (defun report-result (result form) + (format t "~:[FAIL~;pass~] ... ~a~%" result form)) + + Now you can write test-+ with calls to report-result instead of +FORMAT. It's not a huge improvement, but at least now if you decide to +change the way you report results, there's only one place you have to +change. + + (defun test-+ () + (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3)) + (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6)) + (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4))) + + Next you need to get rid of the duplication of the test case +expression, with its attendant risk of mislabeling of results. What +you'd really like is to be able to treat the expression as both code (to +get the result) and data (to use as the label). Whenever you want to +treat code as data, that's a sure sign you need a macro. Or, to look at +it another way, what you need is a way to automate writing the +error-prone report-result calls. You'd like to be able to say something +like this: + + (check (= (+ 1 2) 3)) + + and have it mean the following: + + (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3)) + + Writing a macro to do this translation is trivial. + + (defmacro check (form) + `(report-result ,form ',form)) + + Now you can change test-+ to use check. + + (defun test-+ () + (check (= (+ 1 2) 3)) + (check (= (+ 1 2 3) 6)) + (check (= (+ -1 -3) -4))) + + Since you're on the hunt for duplication, why not get rid of those +repeated calls to check? You can define check to take an arbitrary +number of forms and wrap them each in a call to report-result. + + (defmacro check (&body forms) + `(progn + ,@(loop for f in forms collect `(report-result ,f ',f)))) + + This definition uses a common macro idiom of wrapping a PROGN around +a series of forms in order to turn them into a single form. Notice also +how you can use , to splice in the result of an expression that returns +a list of expressions that are themselves generated with a backquote +template. + + With the new version of check you can write a new version of test-+ +like this: + + (defun test-+ () + (check + (= (+ 1 2) 3) + (= (+ 1 2 3) 6) + (= (+ -1 -3) -4))) + + that is equivalent to the following code: + + (defun test-+ () + (progn + (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3)) + (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6)) + (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4)))) + + Thanks to check, this version is as concise as the first version of +test-+ but expands into code that does the same thing as the second +version. And now any changes you want to make to how test-+ behaves, +you can make by changing check. + + +File: pcl.info, Node: 9-3, Next: 9-4, Prev: 9-2, Up: Chapter 9 + +Fixing the Return Value +======================= + +You can start with fixing test-+ so its return value indicates whether +all the test cases passed. Since check is responsible for generating +the code that ultimately runs the test cases, you just need to change it +to generate code that also keeps track of the results. + + As a first step, you can make a small change to report-result so it +returns the result of the test case it's reporting. + + (defun report-result (result form) + (format t "~:[FAIL~;pass~] ... ~a~%" result form) + result) + + Now that report-result returns the result of its test case, it might +seem you could just change the PROGN to an AND to combine the results. +Unfortunately, AND doesn't do quite what you want in this case because +of its short-circuiting behavior: as soon as one test case fails, AND +will skip the rest. On the other hand, if you had a construct that +worked like AND without the short-circuiting, you could use it in the +place of PROGN, and you'd be done. Common Lisp doesn't provide such a +construct, but that's no reason you can't use it: it's a trivial matter +to write a macro to provide it yourself. + + Leaving test cases aside for a moment, what you want is a macro-let's +call it combine-results-that will let you say this: + + (combine-results + (foo) + (bar) + (baz)) + + and have it mean something like this: + + (let ((result t)) + (unless (foo) (setf result nil)) + (unless (bar) (setf result nil)) + (unless (baz) (setf result nil)) + result) + + The only tricky bit to writing this macro is that you need to +introduce a variable-result in the previous code-in the expansion. As +you saw in the previous chapter, using a literal name for variables in +macro expansions can introduce a leak in your macro abstraction, so +you'll need to create a unique name. This is a job for with-gensyms. +You can define combine-results like this: + + (defmacro combine-results (&body forms) + (with-gensyms (result) + `(let ((,result t)) + ,@(loop for f in forms collect `(unless ,f (setf ,result nil))) + ,result))) + + Now you can fix check by simply changing the expansion to use +combine-results instead of PROGN. + + (defmacro check (&body forms) + `(combine-results + ,@(loop for f in forms collect `(report-result ,f ',f)))) + + With that version of check, test-+ should emit the results of its +three test expressions and then return T to indicate that everything +passed. (1) + + CL-USER> (test-+) + pass ... (= (+ 1 2) 3) + pass ... (= (+ 1 2 3) 6) + pass ... (= (+ -1 -3) -4) + T + + And if you change one of the test cases so it fails, (2) the final +return value changes to NIL. + + CL-USER> (test-+) + pass ... (= (+ 1 2) 3) + pass ... (= (+ 1 2 3) 6) + FAIL ... (= (+ -1 -3) -5) + NIL + + ---------- Footnotes ---------- + + (1) If test-+ has been compiled-which may happen implicitly in +certain Lisp implementations-you may need to reevaluate the definition +of test-+ to get the changed definition of check to affect the behavior +of test-+. Interpreted code, on the other hand, typically expands +macros anew each time the code is interpreted, allowing the effects of +macro redefinitions to be seen immediately. + + (2) You have to change the test to make it fail since you can't +change the behavior of +. + + +File: pcl.info, Node: 9-4, Next: 9-5, Prev: 9-3, Up: Chapter 9 + +Better Result Reporting +======================= + +As long as you have only one test function, the current result reporting +is pretty clear. If a particular test case fails, all you have to do is +find the test case in the check form and figure out why it's failing. +But if you write a lot of tests, you'll probably want to organize them +somehow, rather than shoving them all into one function. For instance, +suppose you wanted to add some test cases for the * function. You might +write a new test function. + + (defun test-* () + (check + (= (* 2 2) 4) + (= (* 3 5) 15))) + + Now that you have two test functions, you'll probably want another +function that runs all the tests. That's easy enough. + + (defun test-arithmetic () + (combine-results + (test-+) + (test-*))) + + In this function you use combine-results instead of check since both +test-+ and test-* will take care of reporting their own results. When +you run test-arithmetic, you'll get the following results: + + CL-USER> (test-arithmetic) + pass ... (= (+ 1 2) 3) + pass ... (= (+ 1 2 3) 6) + pass ... (= (+ -1 -3) -4) + pass ... (= (* 2 2) 4) + pass ... (= (* 3 5) 15) + T + + Now imagine that one of the test cases failed and you need to track +down the problem. With only five test cases and two test functions, it +won't be too hard to find the code of the failing test case. But +suppose you had 500 test cases spread across 20 functions. It might be +nice if the results told you what function each test case came from. + + Since the code that prints the results is centralized in +report-result, you need a way to pass information about what test +function you're in to report-result. You could add a parameter to +report-result to pass this information, but check, which generates the +calls to report-result, doesn't know what function it's being called +from, which means you'd also have to change the way you call check, +passing it an argument that it simply passes onto report-result. + + This is exactly the kind of problem dynamic variables were designed +to solve. If you create a dynamic variable that each test function +binds to the name of the function before calling check, then +report-result can use it without check having to know anything about it. + + Step one is to declare the variable at the top level. + + (defvar *test-name* nil) + + Now you need to make another tiny change to report-result to include +*test-name* in the FORMAT output. + + (format t "~:[FAIL~;pass~] ... ~a: ~a~%" result *test-name* form) + + With those changes, the test functions will still work but will +produce the following output because *test-name* is never rebound: + + CL-USER> (test-arithmetic) + pass ... NIL: (= (+ 1 2) 3) + pass ... NIL: (= (+ 1 2 3) 6) + pass ... NIL: (= (+ -1 -3) -4) + pass ... NIL: (= (* 2 2) 4) + pass ... NIL: (= (* 3 5) 15) + T + + For the name to be reported properly, you need to change the two test +functions. + + (defun test-+ () + (let ((*test-name* 'test-+)) + (check + (= (+ 1 2) 3) + (= (+ 1 2 3) 6) + (= (+ -1 -3) -4)))) + + (defun test-* () + (let ((*test-name* 'test-*)) + (check + (= (* 2 2) 4) + (= (* 3 5) 15)))) + + Now the results are properly labeled. + + CL-USER> (test-arithmetic) + pass ... TEST-+: (= (+ 1 2) 3) + pass ... TEST-+: (= (+ 1 2 3) 6) + pass ... TEST-+: (= (+ -1 -3) -4) + pass ... TEST-*: (= (* 2 2) 4) + pass ... TEST-*: (= (* 3 5) 15) + T + + +File: pcl.info, Node: 9-5, Next: 9-6, Prev: 9-4, Up: Chapter 9 + +An Abstraction Emerges +====================== + +In fixing the test functions, you've introduced several new bits of +duplication. Not only does each function have to include the name of +the function twice-once as the name in the DEFUN and once in the binding +of *test-name*-but the same three-line code pattern is duplicated +between the two functions. You could remove the duplication simply on +the grounds that duplication is bad. But if you look more closely at +the root cause of the duplication, you can learn an important lesson +about how to use macros. + + The reason both these functions start the same way is because they're +both test functions. The duplication arises because, at the moment, +test function is only half an abstraction. The abstraction exists in +your mind, but in the code there's no way to express "this is a test +function" other than to write code that follows a particular pattern. + + Unfortunately, partial abstractions are a crummy tool for building +software. Because a half abstraction is expressed in code by a +manifestation of the pattern, you're guaranteed to have massive code +duplication with all the normal bad consequences that implies for +maintainability. More subtly, because the abstraction exists only in +the minds of programmers, there's no mechanism to make sure different +programmers (or even the same programmer working at different times) +actually understand the abstraction the same way. To make a complete +abstraction, you need a way to express "this is a test function" and +have all the code required by the pattern be generated for you. In +other words, you need a macro. + + Because the pattern you're trying to capture is a DEFUN plus some +boilerplate code, you need to write a macro that will expand into a +DEFUN. You'll then use this macro, instead of a plain DEFUN to define +test functions, so it makes sense to call it deftest. + + (defmacro deftest (name parameters &body body) + `(defun ,name ,parameters + (let ((*test-name* ',name)) + ,@body))) + + With this macro you can rewrite test-+ as follows: + + (deftest test-+ () + (check + (= (+ 1 2) 3) + (= (+ 1 2 3) 6) + (= (+ -1 -3) -4))) + + +File: pcl.info, Node: 9-6, Next: 9-7, Prev: 9-5, Up: Chapter 9 + +A Hierarchy of Tests +==================== + +Now that you've established test functions as first-class citizens, the +question might arise, should test-arithmetic be a test function? As +things stand, it doesn't really matter-if you did define it with +deftest, its binding of *test-name* would be shadowed by the bindings in +test-+ and test-* before any results are reported. + + But now imagine you've got thousands of test cases to organize. The +first level of organization is provided by test functions such as test-+ +and test-* that directly call check. But with thousands of test cases, +you'll likely need other levels of organization. Functions such as +test-arithmetic can group related test functions into test suites. Now +suppose some low-level test functions are called from multiple test +suites. It's not unheard of for a test case to pass in one context but +fail in another. If that happens, you'll probably want to know more +than just what low-level test function contains the test case. + + If you define the test suite functions such as test-arithmetic with +deftest and make a small change to the *test-name* bookkeeping, you can +have results reported with a "fully qualified" path to the test case, +something like this: + + pass ... (TEST-ARITHMETIC TEST-+): (= (+ 1 2) 3) + + Because you've already abstracted the process of defining a test +function, you can change the bookkeeping details without modifying the +code of the test functions. (1) To make *test-name* hold a list of test +function names instead of just the name of the most recently entered +test function, you just need to change this binding form: + + (let ((*test-name* ',name)) + + to the following: + + (let ((*test-name* (append *test-name* (list ',name)))) + + Since APPEND returns a new list made up of the elements of its +arguments, this version will bind *test-name* to a list containing the +old contents of *test-name* with the new name tacked onto the end. (2) +When each test function returns, the old value of *test-name* will be +restored. + + Now you can redefine test-arithmetic with deftest instead of DEFUN. + + (deftest test-arithmetic () + (combine-results + (test-+) + (test-*))) + + The results now show exactly how you got to each test expression. + + CL-USER> (test-arithmetic) + pass ... (TEST-ARITHMETIC TEST-+): (= (+ 1 2) 3) + pass ... (TEST-ARITHMETIC TEST-+): (= (+ 1 2 3) 6) + pass ... (TEST-ARITHMETIC TEST-+): (= (+ -1 -3) -4) + pass ... (TEST-ARITHMETIC TEST-*): (= (* 2 2) 4) + pass ... (TEST-ARITHMETIC TEST-*): (= (* 3 5) 15) + T + + As your test suite grows, you can add new layers of test functions; +as long as they're defined with deftest, the results will be reported +correctly. For instance, the following: + + (deftest test-math () + (test-arithmetic)) + + would generate these results: + + CL-USER> (test-math) + pass ... (TEST-MATH TEST-ARITHMETIC TEST-+): (= (+ 1 2) 3) + pass ... (TEST-MATH TEST-ARITHMETIC TEST-+): (= (+ 1 2 3) 6) + pass ... (TEST-MATH TEST-ARITHMETIC TEST-+): (= (+ -1 -3) -4) + pass ... (TEST-MATH TEST-ARITHMETIC TEST-*): (= (* 2 2) 4) + pass ... (TEST-MATH TEST-ARITHMETIC TEST-*): (= (* 3 5) 15) + T + + ---------- Footnotes ---------- + + (1) Though, again, if the test functions have been compiled, you'll +have to recompile them after changing the macro. + + (2) As you'll see in Chapter 12, APPENDing to the end of a list isn't +the most efficient way to build a list. But for now this is +sufficient-as long as the test hierarchies aren't too deep, it should be +fine. And if it becomes a problem, all you'll have to do is change the +definition of deftest. + + +File: pcl.info, Node: 9-7, Next: Chapter 10, Prev: 9-6, Up: Chapter 9 + +Wrapping Up +=========== + +You could keep going, adding more features to this test framework. But +as a framework for writing tests with a minimum of busywork and easily +running them from the REPL, this is a reasonable start. Here's the +complete code, all 26 lines of it: + + (defvar *test-name* nil) + + (defmacro deftest (name parameters &body body) + "Define a test function. Within a test function we can call + other test functions or use 'check' to run individual test + cases." + `(defun ,name ,parameters + (let ((*test-name* (append *test-name* (list ',name)))) + ,@body))) + + (defmacro check (&body forms) + "Run each expression in 'forms' as a test case." + `(combine-results + ,@(loop for f in forms collect `(report-result ,f ',f)))) + + (defmacro combine-results (&body forms) + "Combine the results (as booleans) of evaluating 'forms' in order." + (with-gensyms (result) + `(let ((,result t)) + ,@(loop for f in forms collect `(unless ,f (setf ,result nil))) + ,result))) + + (defun report-result (result form) + "Report the results of a single test case. Called by 'check'." + (format t "~:[FAIL~;pass~] ... ~a: ~a~%" result *test-name* form) + result) + + It's worth reviewing how you got here because it's illustrative of +how programming in Lisp often goes. + + You started by defining a simple version of your problem-how to +evaluate a bunch of boolean expressions and find out if they all +returned true. Just ANDing them together worked and was syntactically +clean but revealed the need for better result reporting. So you wrote +some really simpleminded code, chock-full of duplication and error-prone +idioms that reported the results the way you wanted. + + The next step was to see if you could refactor the second version +into something as clean as the former. You started with a standard +refactoring technique of extracting some code into a function, +report-result. Unfortunately, you could see that using report-result +was going to be tedious and error-prone since you had to pass the test +expression twice, once for the value and once as quoted data. So you +wrote the check macro to automate the details of calling report-result +correctly. + + While writing check, you realized as long as you were generating +code, you could make a single call to check to generate multiple calls +to report-result, getting you back to a version of test-+ about as +concise as the original AND version. + + At that point you had the check API nailed down, which allowed you to +start mucking with how it worked on the inside. The next task was to +fix check so the code it generated would return a boolean indicating +whether all the test cases had passed. Rather than immediately hacking +away at check, you paused to indulge in a little language design by +fantasy. What if-you fantasized-there was already a +non-short-circuiting AND construct. Then fixing check would be trivial. +Returning from fantasyland you realized there was no such construct but +that you could write one in a few lines. After writing combine-results, +the fix to check was indeed trivial. + + At that point all that was left was to make a few more improvements +to the way you reported test results. Once you started making changes +to the test functions, you realized those functions represented a +special category of function that deserved its own abstraction. So you +wrote deftest to abstract the pattern of code that turns a regular +function into a test function. + + With deftest providing an abstraction barrier between the test +definitions and the underlying machinery, you were able to enhance the +result reporting without touching the test functions. + + Now, with the basics of functions, variables, and macros mastered, +and a little practical experience using them, you're ready to start +exploring Common Lisp's rich standard library of functions and data +types. + + +File: pcl.info, Node: Chapter 10, Next: Chapter 11, Prev: Chapter 9, Up: Top + +10. Numbers, Characters, and Strings +==================================== + +While functions, variables, macros, and 25 special operators provide the +basic building blocks of the language itself, the building blocks of +your programs will be the data structures you use. As Fred Brooks +observed in The Mythical Man-Month, "Representation is the essence of +programming." (1) + + Common Lisp provides built-in support for most of the data types +typically found in modern languages: numbers (integer, floating point, +and complex), characters, strings, arrays (including multidimensional +arrays), lists, hash tables, input and output streams, and an +abstraction for portably representing filenames. Functions are also a +first-class data type in Lisp-they can be stored in variables, passed as +arguments, returned as return values, and created at runtime. + + And these built-in types are just the beginning. They're defined in +the language standard so programmers can count on them being available +and because they tend to be easier to implement efficiently when tightly +integrated with the rest of the implementation. But, as you'll see in +later chapters, Common Lisp also provides several ways for you to define +new data types, define operations on them, and integrate them with the +built-in data types. + + For now, however, you can start with the built-in data types. +Because Lisp is a high-level language, the details of exactly how +different data types are implemented are largely hidden. From your +point of view as a user of the language, the built-in data types are +defined by the functions that operate on them. So to learn a data type, +you just have to learn about the functions you can use with it. +Additionally, most of the built-in data types have a special syntax that +the Lisp reader understands and that the Lisp printer uses. That's why, +for instance, you can write strings as "foo"; numbers as 123, 1/23, and +1.23; and lists as (a b c). I'll describe the syntax for different +kinds of objects when I describe the functions for manipulating them. + + In this chapter, I'll cover the built-in "scalar" data types: +numbers, characters, and strings. Technically, strings aren't true +scalars-a string is a sequence of characters, and you can access +individual characters and manipulate strings with a function that +operates on sequences. But I'll discuss strings here because most of +the string-specific functions manipulate them as single values and also +because of the close relation between several of the string functions +and their character counterparts. + +* Menu: + +* 10-1:: Numbers +* 10-2:: Numeric Literals +* 10-3:: Basic Math +* 10-4:: Numeric Comparisons +* 10-5:: Higher Math +* 10-6:: Characters +* 10-7:: Character Comparisons +* 10-8:: Strings +* 10-9:: String Comparisons + + ---------- Footnotes ---------- + + (1) Fred Brooks, The Mythical Man-Month, 20th Anniversary Edition +(Boston: Addison-Wesley, 1995), p. 103. Emphasis in original. + + +File: pcl.info, Node: 10-1, Next: 10-2, Prev: Chapter 10, Up: Chapter 10 + +Numbers +======= + +Math, as Barbie says, is hard. (1) Common Lisp can't make the math part +any easier, but it does tend to get in the way a lot less than other +programming languages. That's not surprising given its mathematical +heritage. Lisp was originally designed by a mathematician as a tool for +studying mathematical functions. And one of the main projects of the +MAC project at MIT was the Macsyma symbolic algebra system, written in +Maclisp, one of Common Lisp's immediate predecessors. Additionally, +Lisp has been used as a teaching language at places such as MIT where +even the computer science professors cringe at the thought of telling +their students that 10/4 = 2, leading to Lisp's support for exact +ratios. And at various times Lisp has been called upon to compete with +FORTRAN in the high-performance numeric computing arena. + + One of the reasons Lisp is a nice language for math is its numbers +behave more like true mathematical numbers than the approximations of +numbers that are easy to implement in finite computer hardware. For +instance, integers in Common Lisp can be almost arbitrarily large rather +than being limited by the size of a machine word. (2) And dividing two +integers results in an exact ratio, not a truncated value. And since +ratios are represented as pairs of arbitrarily sized integers, ratios +can represent arbitrarily precise fractions. (3) + + On the other hand, for high-performance numeric programming, you may +be willing to trade the exactitude of rationals for the speed offered by +using the hardware's underlying floating-point operations. So, Common +Lisp also offers several types of floating-point numbers, which are +mapped by the implementation to the appropriate hardware-supported +floating-point representations. (4) Floats are also used to represent +the results of a computation whose true mathematical value would be an +irrational number. + + Finally, Common Lisp supports complex numbers-the numbers that result +from doing things such as taking square roots and logarithms of negative +numbers. The Common Lisp standard even goes so far as to specify the +principal values and branch cuts for irrational and transcendental +functions on the complex domain. + + ---------- Footnotes ---------- + + (1) Mattel's Teen Talk Barbie + + (2) Obviously, the size of a number that can be represented on a +computer with finite memory is still limited in practice; furthermore, +the actual representation of bignums used in a particular Common Lisp +implementation may place other limits on the size of number that can be +represented. But these limits are going to be well beyond +"astronomically" large numbers. For instance, the number of atoms in +the universe is estimated to be less than 2^269; current Common Lisp +implementations can easily handle numbers up to and beyond 2^262144. + + (3) Folks interested in using Common Lisp for intensive numeric +computation should note that a naive comparison of the performance of +numeric code in Common Lisp and languages such as C or FORTRAN will +probably show Common Lisp to be much slower. This is because something +as simple as (+ a b) in Common Lisp is doing a lot more than the +seemingly equivalent a + b in one of those languages. Because of Lisp's +dynamic typing and support for things such as arbitrary precision +rationals and complex numbers, a seemingly simple addition is doing a +lot more than an addition of two numbers that are known to be +represented by machine words. However, you can use declarations to give +Common Lisp information about the types of numbers you're using that +will enable it to generate code that does only as much work as the code +that would be generated by a C or FORTRAN compiler. Tuning numeric code +for this kind of performance is beyond the scope of this book, but it's +certainly possible. + + (4) While the standard doesn't require it, many Common Lisp +implementations support the IEEE standard for floating-point arithmetic, +IEEE Standard for Binary Floating-Point Arithmetic, ANSI/ IEEE Std +754-1985 (Institute of Electrical and Electronics Engineers, 1985). + + +File: pcl.info, Node: 10-2, Next: 10-3, Prev: 10-1, Up: Chapter 10 + +Numeric Literals +================ + +You can write numeric literals in a variety of ways; you saw a few +examples in Chapter 4. However, it's important to keep in mind the +division of labor between the Lisp reader and the Lisp evaluator-the +reader is responsible for translating text into Lisp objects, and the +Lisp evaluator then deals only with those objects. For a given number +of a given type, there can be many different textual representations, +all of which will be translated to the same object representation by the +Lisp reader. For instance, you can write the integer 10 as 10, 20/2, +#xA, or any of a number of other ways, but the reader will translate all +these to the same object. When numbers are printed back out-say, at the +REPL-they're printed in a canonical textual syntax that may be different +from the syntax used to enter the number. For example: + + CL-USER> 10 + 10 + CL-USER> 20/2 + 10 + CL-USER> #xa + 10 + + The syntax for integer values is an optional sign (+ or -) followed +by one or more digits. Ratios are written as an optional sign and a +sequence of digits, representing the numerator, a slash (/), and another +sequence of digits representing the denominator. All rational numbers +are "canonicalized" as they're read-that's why 10 and 20/2 are both read +as the same number, as are 3/4 and 6/8. Rationals are printed in +"reduced" form-integer values are printed in integer syntax and ratios +with the numerator and denominator reduced to lowest terms. + + It's also possible to write rationals in bases other than 10. If +preceded by #B or #b, a rational literal is read as a binary number with +0 and 1 as the only legal digits. An #O or #o indicates an octal number +(legal digits 0-7), and #X or #x indicates hexadecimal (legal digits 0-F +or 0-f). You can write rationals in other bases from 2 to 36 with #nR +where n is the base (always written in decimal). Additional "digits" +beyond 9 are taken from the letters A-Z or a-z. Note that these radix +indicators apply to the whole rational-it's not possible to write a +ratio with the numerator in one base and denominator in another. Also, +you can write integer values, but not ratios, as decimal digits +terminated with a decimal point. (1) Some examples of rationals, with +their canonical, decimal representation are as follows: + + 123 ==> 123 + +123 ==> 123 + -123 ==> -123 + 123. ==> 123 + 2/3 ==> 2/3 + -2/3 ==> -2/3 + 4/6 ==> 2/3 + 6/3 ==> 2 + #b10101 ==> 21 + #b1010/1011 ==> 10/11 + #o777 ==> 511 + #xDADA ==> 56026 + #36rABCDEFGHIJKLMNOPQRSTUVWXYZ ==> 8337503854730415241050377135811259267835 + + You can also write floating-point numbers in a variety of ways. +Unlike rational numbers, the syntax used to notate a floating-point +number can affect the actual type of number read. Common Lisp defines +four subtypes of floating-point number: short, single, double, and long. +Each subtype can use a different number of bits in its representation, +which means each subtype can represent values spanning a different range +and with different precision. More bits gives a wider range and more +precision. (2) + + The basic format for floating-point numbers is an optional sign +followed by a nonempty sequence of decimal digits possibly with an +embedded decimal point. This sequence can be followed by an exponent +marker for "computerized scientific notation." (3) The exponent marker +consists of a single letter followed by an optional sign and a sequence +of digits, which are interpreted as the power of ten by which the number +before the exponent marker should be multiplied. The letter does double +duty: it marks the beginning of the exponent and indicates what +floating- point representation should be used for the number. The +exponent markers s, f, d, l (and their uppercase equivalents) indicate +short, single, double, and long floats, respectively. The letter e +indicates that the default representation (initially single-float) +should be used. + + Numbers with no exponent marker are read in the default +representation and must contain a decimal point followed by at least one +digit to distinguish them from integers. The digits in a floating-point +number are always treated as base 10 digits-the #B, #X, #O, and #R +syntaxes work only with rationals. The following are some example +floating-point numbers along with their canonical representation: + + 1.0 ==> 1.0 + 1e0 ==> 1.0 + 1d0 ==> 1.0d0 + 123.0 ==> 123.0 + 123e0 ==> 123.0 + 0.123 ==> 0.123 + .123 ==> 0.123 + 123e-3 ==> 0.123 + 123E-3 ==> 0.123 + 0.123e20 ==> 1.23e+19 + 123d23 ==> 1.23d+25 + + Finally, complex numbers are written in their own syntax, namely, #C +or #c followed by a list of two real numbers representing the real and +imaginary part of the complex number. There are actually five kinds of +complex numbers because the real and imaginary parts must either both be +rational or both be the same kind of floating-point number. + + But you can write them however you want-if a complex is written with +one rational and one floating-point part, the rational is converted to a +float of the appropriate representation. Similarly, if the real and +imaginary parts are both floats of different representations, the one in +the smaller representation will be upgraded. + + However, no complex numbers have a rational real component and a zero +imaginary part-since such values are, mathematically speaking, rational, +they're represented by the appropriate rational value. The same +mathematical argument could be made for complex numbers with +floating-point components, but for those complex types a number with a +zero imaginary part is always a different object than the floating-point +number representing the real component. Here are some examples of +numbers written the complex number syntax: + + #c(2 1) ==> #c(2 1) + #c(2/3 3/4) ==> #c(2/3 3/4) + #c(2 1.0) ==> #c(2.0 1.0) + #c(2.0 1.0d0) ==> #c(2.0d0 1.0d0) + #c(1/2 1.0) ==> #c(0.5 1.0) + #c(3 0) ==> 3 + #c(3.0 0.0) ==> #c(3.0 0.0) + #c(1/2 0) ==> 1/2 + #c(-6/3 0) ==> -2 + + ---------- Footnotes ---------- + + (1) It's also possible to change the default base the reader uses for +numbers without a specific radix marker by changing the value of the +global variable *READ-BASE*. However, it's not clear that's the path to +anything other than complete insanity. + + (2) Since the purpose of floating-point numbers is to make efficient +use of floating-point hardware, each Lisp implementation is allowed to +map these four subtypes onto the native floating-point types as +appropriate. If the hardware supports fewer than four distinct +representations, one or more of the types may be equivalent. + + (3) "Computerized scientific notation" is in scare quotes because, +while commonly used in computer languages since the days of FORTRAN, +it's actually quite different from real scientific notation. In +particular, something like 1.0e4 means 10000.0, but in true scientific +notation that would be written as 1.0 x 10^4. And to further confuse +matters, in true scientific notation the letter e stands for the base of +the natural logarithm, so something like 1.0 x e^4, while superficially +similar to 1.0e4, is a completely different value, approximately 54.6. + + +File: pcl.info, Node: 10-3, Next: 10-4, Prev: 10-2, Up: Chapter 10 + +Basic Math +========== + +The basic arithmetic operations-addition, subtraction, multiplication, +and division-are supported for all the different kinds of Lisp numbers +with the functions +, -, *, and /. Calling any of these functions with +more than two arguments is equivalent to calling the same function on +the first two arguments and then calling it again on the resulting value +and the rest of the arguments. For example, (+ 1 2 3) is equivalent to +(+ (+ 1 2) 3). With only one argument, + and * return the value; - +returns its negation and / its reciprocal. (1) + + (+ 1 2) ==> 3 + (+ 1 2 3) ==> 6 + (+ 10.0 3.0) ==> 13.0 + (+ #c(1 2) #c(3 4)) ==> #c(4 6) + (- 5 4) ==> 1 + (- 2) ==> -2 + (- 10 3 5) ==> 2 + (* 2 3) ==> 6 + (* 2 3 4) ==> 24 + (/ 10 5) ==> 2 + (/ 10 5 2) ==> 1 + (/ 2 3) ==> 2/3 + (/ 4) ==> 1/4 + + If all the arguments are the same type of number (rational, floating +point, or complex), the result will be the same type except in the case +where the result of an operation on complex numbers with rational +components yields a number with a zero imaginary part, in which case the +result will be a rational. However, floating-point and complex numbers +are contagious-if all the arguments are reals but one or more are +floating-point numbers, the other arguments are converted to the nearest +floating-point value in a "largest" floating-point representation of the +actual floating-point arguments. Floating-point numbers in a "smaller" +representation are also converted to the larger representation. +Similarly, if any of the arguments are complex, any real arguments are +converted to the complex equivalents. + + (+ 1 2.0) ==> 3.0 + (/ 2 3.0) ==> 0.6666667 + (+ #c(1 2) 3) ==> #c(4 2) + (+ #c(1 2) 3/2) ==> #c(5/2 2) + (+ #c(1 1) #c(2 -1)) ==> 3 + + Because / doesn't truncate, Common Lisp provides four flavors of +truncating and rounding for converting a real number (rational or +floating point) to an integer: FLOOR truncates toward negative infinity, +returning the largest integer less than or equal to the argument. +CEILING truncates toward positive infinity, returning the smallest +integer greater than or equal to the argument. TRUNCATE truncates +toward zero, making it equivalent to FLOOR for positive arguments and to +CEILING for negative arguments. And ROUND rounds to the nearest +integer. If the argument is exactly halfway between two integers, it +rounds to the nearest even integer. + + Two related functions are MOD and REM, which return the modulus and +remainder of a truncating division on real numbers. These two functions +are related to the FLOOR and TRUNCATE functions as follows: + + (+ (* (floor (/ x y)) y) (mod x y)) === x + (+ (* (truncate (/ x y)) y) (rem x y)) === x + + Thus, for positive quotients they're equivalent, but for negative +quotients they produce different results. (2) + + The functions 1+ and 1- provide a shorthand way to express adding and +subtracting one from a number. Note that these are different from the +macros INCF and DECF. 1+ and 1- are just functions that return a new +value, but INCF and DECF modify a place. The following equivalences +show the relation between INCF/DECF, 1+/1-, and +/-: + + (incf x) === (setf x (1+ x)) === (setf x (+ x 1)) + (decf x) === (setf x (1- x)) === (setf x (- x 1)) + (incf x 10) === (setf x (+ x 10)) + (decf x 10) === (setf x (- x 10)) + + ---------- Footnotes ---------- + + (1) For mathematical consistency, + and * can also be called with no +arguments, in which case they return the appropriate identity: 0 for + +and 1 for *. + + (2) Roughly speaking, MOD is equivalent to the % operator in Perl and +Python, and REM is equivalent to the % in C and Java. (Technically, the +exact behavior of % in C wasn't specified until the C99 standard.) + + +File: pcl.info, Node: 10-4, Next: 10-5, Prev: 10-3, Up: Chapter 10 + +Numeric Comparisons +=================== + +The function = is the numeric equality predicate. It compares numbers +by mathematical value, ignoring differences in type. Thus, = will +consider mathematically equivalent values of different types equivalent +while the generic equality predicate EQL would consider them +inequivalent because of the difference in type. (The generic equality +predicate EQUALP, however, uses = to compare numbers.) If it's called +with more than two arguments, it returns true only if they all have the +same value. Thus: + + (= 1 1) ==> T + (= 10 20/2) ==> T + (= 1 1.0 #c(1.0 0.0) #c(1 0)) ==> T + + The /= function, conversely, returns true only if all its arguments +are different values. + + (/= 1 1) ==> NIL + (/= 1 2) ==> T + (/= 1 2 3) ==> T + (/= 1 2 3 1) ==> NIL + (/= 1 2 3 1.0) ==> NIL + + The functions <, >, <=, and >= order rationals and floating-point +numbers (in other words, the real numbers.) Like = and /=, these +functions can be called with more than two arguments, in which case each +argument is compared to the argument to its right. + + (< 2 3) ==> T + (> 2 3) ==> NIL + (> 3 2) ==> T + (< 2 3 4) ==> T + (< 2 3 3) ==> NIL + (<= 2 3 3) ==> T + (<= 2 3 3 4) ==> T + (<= 2 3 4 3) ==> NIL + + To pick out the smallest or largest of several numbers, you can use +the function MIN or MAX, which takes any number of real number arguments +and returns the minimum or maximum value. + + (max 10 11) ==> 11 + (min -12 -10) ==> -12 + (max -1 2 -3) ==> 2 + + Some other handy functions are ZEROP, MINUSP, and PLUSP, which test +whether a single real number is equal to, less than, or greater than +zero. Two other predicates, EVENP and ODDP, test whether a single +integer argument is even or odd. The P suffix on the names of these +functions is a standard naming convention for predicate functions, +functions that test some condition and return a boolean. + + +File: pcl.info, Node: 10-5, Next: 10-6, Prev: 10-4, Up: Chapter 10 + +Higher Math +=========== + +The functions you've seen so far are the beginning of the built-in +mathematical functions. Lisp also supports logarithms: LOG; +exponentiation: EXP and EXPT; the basic trigonometric functions: SIN, +COS, and TAN; their inverses: ASIN, ACOS, and ATAN; hyperbolic +functions: SINH, COSH, and TANH; and their inverses: ASINH, ACOSH, and +ATANH. It also provides functions to get at the individual bits of an +integer and to extract the parts of a ratio or a complex number. For a +complete list, see any Common Lisp reference. + + +File: pcl.info, Node: 10-6, Next: 10-7, Prev: 10-5, Up: Chapter 10 + +Characters +========== + +Common Lisp characters are a distinct type of object from numbers. +That's as it should be-characters are not numbers, and languages that +treat them as if they are tend to run into problems when character +encodings change, say, from 8-bit ASCII to 21-bit Unicode. (1) Because +the Common Lisp standard didn't mandate a particular representation for +characters, today several Lisp implementations use Unicode as their +"native" character encoding despite Unicode being only a gleam in a +standards body's eye at the time Common Lisp's own standardization was +being wrapped up. + + The read syntax for characters objects is simple: #\ followed by the +desired character. Thus, #\x is the character x. Any character can be +used after the #\, including otherwise special characters such as ", (, +and whitespace. However, writing whitespace characters this way isn't +very (human) readable; an alternative syntax for certain characters is +#\ followed by the character's name. Exactly what names are supported +depends on the character set and on the Lisp implementation, but all +implementations support the names Space and Newline. Thus, you should +write #\Space instead of #\ , though the latter is technically legal. +Other semistandard names (that implementations must use if the character +set has the appropriate characters) are Tab, Page, Rubout, Linefeed, +Return, and Backspace. + + ---------- Footnotes ---------- + + (1) Even Java, which was designed from the beginning to use Unicode +characters on the theory that Unicode was the going to be the character +encoding of the future, has run into trouble since Java characters are +defined to be a 16-bit quantity and the Unicode 3.1 standard extended +the range of the Unicode character set to require a 21-bit +representation. Ooops. + + +File: pcl.info, Node: 10-7, Next: 10-8, Prev: 10-6, Up: Chapter 10 + +Character Comparisons +===================== + +The main thing you can do with characters, other than putting them into +strings (which I'll get to later in this chapter), is to compare them +with other characters. Since characters aren't numbers, you can't use +the numeric comparison functions, such as < and >. Instead, two sets of +functions provide character-specific analogs to the numeric comparators; +one set is case-sensitive and the other case-insensitive. + + The case-sensitive analog to the numeric = is the function CHAR=. +Like =, CHAR= can take any number of arguments and returns true only if +they're all the same character. The case- insensitive version is +CHAR-EQUAL. + + The rest of the character comparators follow this same naming scheme: +the case-sensitive comparators are named by prepending the analogous +numeric comparator with CHAR; the case-insensitive versions spell out +the comparator name, separated from the CHAR with a hyphen. Note, +however, that <= and >= are "spelled out" with the logical equivalents +NOT-GREATERP and NOT-LESSP rather than the more verbose LESSP-OR-EQUALP +and GREATERP-OR-EQUALP. Like their numeric counterparts, all these +functions can take one or more arguments. Table 10-1 summarizes the +relation between the numeric and character comparison functions. + + Table 10-1. Character Comparison Functions + + + + Other functions that deal with characters provide functions for, +among other things, testing whether a given character is alphabetic or a +digit character, testing the case of a character, obtaining a +corresponding character in a different case, and translating between +numeric values representing character codes and actual character +objects. Again, for complete details, see your favorite Common Lisp +reference. + + +File: pcl.info, Node: 10-8, Next: 10-9, Prev: 10-7, Up: Chapter 10 + +Strings +======= + +As mentioned earlier, strings in Common Lisp are really a composite data +type, namely, a one-dimensional array of characters. Consequently, I'll +cover many of the things you can do with strings in the next chapter +when I discuss the many functions for manipulating sequences, of which +strings are just one type. But strings also have their own literal +syntax and a library of functions for performing string-specific +operations. I'll discuss these aspects of strings in this chapter and +leave the others for Chapter 11. + + As you've seen, literal strings are written enclosed in double +quotes. You can include any character supported by the character set in +a literal string except double quote (") and backslash (\). And you can +include these two as well if you escape them with a backslash. In fact, +backslash always escapes the next character, whatever it is, though this +isn't necessary for any character except for " and itself. Table 10-2 +shows how various literal strings will be read by the Lisp reader. + + Table 10-2. Literal Strings + + + + Note that the REPL will ordinarily print strings in readable form, +adding the enclosing quotation marks and any necessary escaping +backslashes, so if you want to see the actual contents of a string, you +need to use function such as FORMAT designed to print human-readable +output. For example, here's what you see if you type a string +containing an embedded quotation mark at the REPL: + + CL-USER> "foo\"bar" + "foo\"bar" + + FORMAT, on the other hand, will show you the actual string contents: +(1) + + CL-USER> (format t "foo\"bar") + foo"bar + NIL + + ---------- Footnotes ---------- + + (1) Note, however, that not all literal strings can be printed by +passing them as the second argument to FORMAT since certain sequences of +characters have a special meaning to FORMAT. To safely print an +arbitrary string-say, the value of a variable s-with FORMAT you should +write (format t "~a" s). + + +File: pcl.info, Node: 10-9, Next: Chapter 11, Prev: 10-8, Up: Chapter 10 + +String Comparisons +================== + +You can compare strings using a set of functions that follow the same +naming convention as the character comparison functions except with +STRING as the prefix rather than CHAR (see Table 10-3). + + Table 10-3. String Comparison Functions + + + + However, unlike the character and number comparators, the string +comparators can compare only two strings. That's because they also take +keyword arguments that allow you to restrict the comparison to a +substring of either or both strings. The arguments-:start1, :end1, +:start2, and :end2-specify the starting (inclusive) and ending +(exclusive) indices of substrings in the first and second string +arguments. Thus, the following: + + (string= "foobarbaz" "quuxbarfoo" :start1 3 :end1 6 :start2 4 :end2 7) + + compares the substring "bar" in the two arguments and returns true. +The :end1 and :end2 arguments can be NIL (or the keyword argument +omitted altogether) to indicate that the corresponding substring extends +to the end of the string. + + The comparators that return true when their arguments differ-that is, +all of them except STRING= and STRING-EQUAL-return the index in the +first string where the mismatch was detected. + + (string/= "lisp" "lissome") ==> 3 + + If the first string is a prefix of the second, the return value will +be the length of the first string, that is, one greater than the largest +valid index into the string. + + (string< "lisp" "lisper") ==> 4 + + When comparing substrings, the resulting value is still an index into +the string as a whole. For instance, the following compares the +substrings "bar" and "baz" but returns 5 because that's the index of the +r in the first string: + + (string< "foobar" "abaz" :start1 3 :start2 1) ==> 5 ; N.B. not 2 + + Other string functions allow you to convert the case of strings and +trim characters from one or both ends of a string. And, as I mentioned +previously, since strings are really a kind of sequence, all the +sequence functions I'll discuss in the next chapter can be used with +strings. For instance, you can discover the length of a string with the +LENGTH function and can get and set individual characters of a string +with the generic sequence element accessor function, ELT, or the generic +array element accessor function, AREF. Or you can use the +string-specific accessor, CHAR. But those functions, and others, are the +topic of the next chapter, so let's move on. + + +File: pcl.info, Node: Chapter 11, Next: Chapter 12, Prev: Chapter 10, Up: Top + +11. Collections +=============== + +Like most programming languages, Common Lisp provides standard data +types that collect multiple values into a single object. Every language +slices up the collection problem a little bit differently, but the basic +collection types usually boil down to an integer-indexed array type and +a table type that can be used to map more or less arbitrary keys to +values. The former are variously called ‘arrays’, ‘lists’, or +‘tuples’; the latter go by the names ‘hash tables’, +‘associative arrays’, ‘maps’, and ‘dictionaries’. + +Lisp is, of course, famous for its list data structure, and most Lisp +books, following the ontogeny-recapitulates-phylogeny principle of +language instruction, start their discussion of Lisp's collections with +lists. However, that approach often leads readers to the mistaken +conclusion that lists are Lisp's ‘only’ collection type. To make +matters worse, because Lisp's lists are such a flexible data structure, +it ‘is’ possible to use them for many of the things arrays and hash +tables are used for in other languages. But it's a mistake to focus too +much on lists; while they're a crucial data structure for representing +Lisp code as Lisp data, in many situations other data structures are +more appropriate. + +To keep lists from stealing the show, in this chapter I'll focus on +Common Lisp's other collection types: vectors and hash tables. (1) +However, vectors and lists share enough characteristics that Common Lisp +treats them both as subtypes of a more general abstraction, the +sequence. Thus, you can use many of the functions I'll discuss in this +chapter with both vectors and lists. + +* Menu: + +* 11-1:: Vectors +* 11-2:: Subtypes of Vector +* 11-3:: Vectors As Sequences +* 11-4:: Sequence Iterating Functions +* 11-5:: Higher-Order Function Variants +* 11-6:: Whole Sequence Manipulations +* 11-7:: Sorting and Merging +* 11-8:: Subsequence Manipulations +* 11-9:: Sequence Predicates +* 11-10:: Sequence Mapping Functions +* 11-11:: Hash Tables +* 11-12:: Hash Table Iteration + + ---------- Footnotes ---------- + + (1) Once you're familiar with all the data types Common Lisp offers, +you'll also see that lists can be useful for prototyping data structures +that will later be replaced with something more efficient once it +becomes clear how exactly the data is to be used. + + +File: pcl.info, Node: 11-1, Next: 11-2, Prev: Chapter 11, Up: Chapter 11 + +Vectors +======= + +Vectors are Common Lisp's basic integer-indexed collection, and they +come in two flavors. Fixed-size vectors are a lot like arrays in a +language such as Java: a thin veneer over a chunk of contiguous memory +that holds the vector's elements. (1) Resizable vectors, on the other +hand, are more like arrays in Perl or Ruby, lists in Python, or the +ArrayList class in Java: they abstract the actual storage, allowing the +vector to grow and shrink as elements are added and removed. + +You can make fixed-size vectors containing specific values with the +function ‘VECTOR’, which takes any number of arguments and returns a +freshly allocated fixed-size vector containing those arguments. + + (vector) ==> #() + (vector 1) ==> #(1) + (vector 1 2) ==> #(1 2) + +The #(...) syntax is the literal notation for vectors used by the Lisp +printer and reader. This syntax allows you to save and restore vectors +by ‘PRINT’ing them out and ‘READ’ing them back in. You can use +the #(...) syntax to include literal vectors in your code, but as the +effects of modifying literal objects aren't defined, you should always +use ‘VECTOR’ or the more general function ‘MAKE-ARRAY’ to create +vectors you plan to modify. + +‘MAKE-ARRAY’ is more general than ‘VECTOR’ since you can use it +to create arrays of any dimensionality as well as both fixed-size and +resizable vectors. The one required argument to ‘MAKE-ARRAY’ is a +list containing the dimensions of the array. Since a vector is a +one-dimensional array, this list will contain one number, the size of +the vector. As a convenience, ‘MAKE-ARRAY’ will also accept a plain +number in the place of a one-item list. With no other arguments, +‘MAKE-ARRAY’ will create a vector with uninitialized elements that +must be set before they can be accessed. (2) To create a vector with +the elements all set to a particular value, you can pass an +‘:initial-element’ argument. Thus, to make a five-element vector +with its elements initialized to ‘NIL’, you can write the following: + + (make-array 5 :initial-element nil) ==> #(NIL NIL NIL NIL NIL) + +‘MAKE-ARRAY’ is also the function to use to make a resizable vector. +A resizable vector is a slightly more complicated object than a +fixed-size vector; in addition to keeping track of the memory used to +hold the elements and the number of slots available, a resizable vector +also keeps track of the number of elements actually stored in the +vector. This number is stored in the vector's ‘fill pointer’, so +called because it's the index of the next position to be filled when you +add an element to the vector. + +To make a vector with a fill pointer, you pass ‘MAKE-ARRAY’ a +‘:fill-pointer’ argument. For instance, the following call to +‘MAKE-ARRAY’ makes a vector with room for five elements; but it +looks empty because the fill pointer is zero: + + (make-array 5 :fill-pointer 0) ==> #() + +To add an element to the end of a resizable vector, you can use the +function ‘VECTOR-PUSH’. It adds the element at the current value of +the fill pointer and then increments the fill pointer by one, returning +the index where the new element was added. The function +‘VECTOR-POP’ returns the most recently pushed item, decrementing the +fill pointer in the process. + + (defparameter *x* (make-array 5 :fill-pointer 0)) + + (vector-push 'a *x*) ==> 0 + *x* ==> #(A) + (vector-push 'b *x*) ==> 1 + *x* ==> #(A B) + (vector-push 'c *x*) ==> 2 + *x* ==> #(A B C) + (vector-pop *x*) ==> C + *x* ==> #(A B) + (vector-pop *x*) ==> B + *x* ==> #(A) + (vector-pop *x*) ==> A + *x* ==> #() + +However, even a vector with a fill pointer isn't completely resizable. +The vector ‘*x*’ can hold at most five elements. To make an +arbitrarily resizable vector, you need to pass ‘MAKE-ARRAY’ another +keyword argument: ‘:adjustable’. + + (make-array 5 :fill-pointer 0 :adjustable t) ==> #() + +This call makes an ‘adjustable’ vector whose underlying memory can +be resized as needed. To add elements to an adjustable vector, you use +‘VECTOR-PUSH-EXTEND’, which works just like ‘VECTOR-PUSH’ except +it will automatically expand the array if you try to push an element +onto a full vector-one whose fill pointer is equal to the size of the +underlying storage. (3) + + ---------- Footnotes ---------- + + (1) Vectors are called ‘vectors’, not ‘arrays’ as their +analogs in other languages are, because Common Lisp supports true +multidimensional arrays. It's equally correct, though more cumbersome, +to refer to them as ‘one-dimensional arrays’. + + (2) Array elements "must" be set before they're accessed in the sense +that the behavior is undefined; Lisp won't necessarily stop you. + + (3) While frequently used together, the ‘:fill-pointer’ and +‘:adjustable’ arguments are independent-you can make an adjustable +array without a fill pointer. However, you can use ‘VECTOR-PUSH’ +and ‘VECTOR-POP’ only with vectors that have a fill pointer and +‘VECTOR-PUSH-EXTEND’ only with vectors that have a fill pointer and +are adjustable. You can also use the function ‘ADJUST-ARRAY’ to +modify adjustable arrays in a variety of ways beyond just extending the +length of a vector. + + +File: pcl.info, Node: 11-2, Next: 11-3, Prev: 11-1, Up: Chapter 11 + +Subtypes of Vector +================== + +All the vectors you've dealt with so far have been ‘general’ vectors +that can hold any type of object. It's also possible to create +‘specialized’ vectors that are restricted to holding certain types +of elements. One reason to use specialized vectors is they may be +stored more compactly and can provide slightly faster access to their +elements than general vectors. However, for the moment let's focus on a +couple kinds of specialized vectors that are important data types in +their own right. + +One of these you've seen already-strings are vectors specialized to hold +characters. Strings are important enough to get their own read/print +syntax (double quotes) and the set of string-specific functions I +discussed in the previous chapter. But because they're also vectors, +all the functions I'll discuss in the next few sections that take vector +arguments can also be used with strings. These functions will fill out +the string library with functions for things such as searching a string +for a substring, finding occurrences of a character within a string, and +more. + +Literal strings, such as "foo", are like literal vectors written with +the #() syntax-their size is fixed, and they must not be modified. +However, you can use ‘MAKE-ARRAY’ to make resizable strings by +adding another keyword argument, ‘:element-type’. This argument +takes a ‘type’ descriptor. I won't discuss all the possible type +descriptors you can use here; for now it's enough to know you can create +a string by passing the symbol ‘CHARACTER’ as the +‘:element-type’ argument. Note that you need to quote the symbol to +prevent it from being treated as a variable name. For example, to make +an initially empty but resizable string, you can write this: + + (make-array 5 :fill-pointer 0 :adjustable t :element-type 'character) "" + +Bit vectors-vectors whose elements are all zeros or ones-also get some +special treatment. They have a special read/print syntax that looks +like #*00001111 and a fairly large library of functions, which I won't +discuss, for performing bit-twiddling operations such as "anding" +together two bit arrays. The type descriptor to pass as the +‘:element-type’ to create a bit vector is the symbol ‘BIT’. + + +File: pcl.info, Node: 11-3, Next: 11-4, Prev: 11-2, Up: Chapter 11 + +Vectors As Sequences +==================== + +As mentioned earlier, vectors and lists are the two concrete subtypes of +the abstract type ‘sequence’. All the functions I'll discuss in the +next few sections are sequence functions; in addition to being +applicable to vectors-both general and specialized-they can also be used +with lists. + +The two most basic sequence functions are ‘LENGTH’, which returns +the length of a sequence, and ‘ELT’, which allows you to access +individual elements via an integer index. ‘LENGTH’ takes a sequence +as its only argument and returns the number of elements it contains. +For vectors with a fill pointer, this will be the value of the fill +pointer. ‘ELT’, short for ‘element’, takes a sequence and an +integer index between zero (inclusive) and the length of the sequence +(exclusive) and returns the corresponding element. ‘ELT’ will +signal an error if the index is out of bounds. Like ‘LENGTH’, +‘ELT’ treats a vector with a fill pointer as having the length +specified by the fill pointer. + + (defparameter *x* (vector 1 2 3)) + + (length *x*) ==> 3 + (elt *x* 0) ==> 1 + (elt *x* 1) ==> 2 + (elt *x* 2) ==> 3 + (elt *x* 3) ==> error + +‘ELT’ is also a ‘SETF’able place, so you can set the value of a +particular element like this: + + (setf (elt *x* 0) 10) + + *x* ==> #(10 2 3) + + +File: pcl.info, Node: 11-4, Next: 11-5, Prev: 11-3, Up: Chapter 11 + +Sequence Iterating Functions +============================ + +While in theory all operations on sequences boil down to some +combination of ‘LENGTH’, ‘ELT’, and ‘SETF’ of ‘ELT’ +operations, Common Lisp provides a large library of sequence functions. + +One group of sequence functions allows you to express certain operations +on sequences such as finding or filtering specific elements without +writing explicit loops. Table 11-1 summarizes them. + +‘Table 11-1. Basic Sequence Functions’ + + + +Here are some simple examples of how to use these functions: + + (count 1 #(1 2 1 2 3 1 2 3 4)) ==> 3 + (remove 1 #(1 2 1 2 3 1 2 3 4)) ==> #(2 2 3 2 3 4) + (remove 1 '(1 2 1 2 3 1 2 3 4)) ==> (2 2 3 2 3 4) + (remove #\a "foobarbaz") ==> "foobrbz" + (substitute 10 1 #(1 2 1 2 3 1 2 3 4)) ==> #(10 2 10 2 3 10 2 3 4) + (substitute 10 1 '(1 2 1 2 3 1 2 3 4)) ==> (10 2 10 2 3 10 2 3 4) + (substitute #\x #\b "foobarbaz") ==> "fooxarxaz" + (find 1 #(1 2 1 2 3 1 2 3 4)) ==> 1 + (find 10 #(1 2 1 2 3 1 2 3 4)) ==> NIL + (position 1 #(1 2 1 2 3 1 2 3 4)) ==> 0 + +Note how ‘REMOVE’ and ‘SUBSTITUTE’ always return a sequence of +the same type as their sequence argument. + +You can modify the behavior of these five functions in a variety of ways +using keyword arguments. For instance, these functions, by default, +look for elements in the sequence that are the same object as the item +argument. You can change this in two ways: First, you can use the +‘:test’ keyword to pass a function that accepts two arguments and +returns a boolean. If provided, it will be used to compare ‘item’ +to each element instead of the default object equality test, ‘EQL’. +(1) Second, with the ‘:key’ keyword you can pass a one-argument +function to be called on each element of the sequence to extract a +‘key’ value, which will then be compared to the item in the place of +the element itself. Note, however, that functions such as ‘FIND’ +that return elements of the sequence continue to return the actual +element, not just the extracted key. + + (count "foo" #("foo" "bar" "baz") :test #'string=) ==> 1 + (find 'c #((a 10) (b 20) (c 30) (d 40)) :key #'first) ==> (C 30) + +To limit the effects of these functions to a particular subsequence of +the sequence argument, you can provide bounding indices with +‘:start’ and ‘:end’ arguments. Passing ‘NIL’ for ‘:end’ +or omitting it is the same as specifying the length of the sequence. +(2) + +If a non-‘NIL’ ‘:from-end’ argument is provided, then the +elements of the sequence will be examined in reverse order. By itself +‘:from-end’ can affect the results of only ‘FIND’ and +‘POSITION’. For instance: + + (find 'a #((a 10) (b 20) (a 30) (b 40)) :key #'first) ==> (A 10) + (find 'a #((a 10) (b 20) (a 30) (b 40)) :key #'first :from-end t) ==> (A 30) + +However, the ‘:from-end’ argument can affect ‘REMOVE’ and +‘SUBSTITUTE’ in conjunction with another keyword parameter, +‘:count’, that's used to specify how many elements to remove or +substitute. If you specify a ‘:count’ lower than the number of +matching elements, then it obviously matters which end you start from: + + (remove #\a "foobarbaz" :count 1) ==> "foobrbaz" + (remove #\a "foobarbaz" :count 1 :from-end t) ==> "foobarbz" + +And while ‘:from-end’ can't change the results of the ‘COUNT’ +function, it does affect the order the elements are passed to any +‘:test’ and ‘:key’ functions, which could possibly have side +effects. For example: + + CL-USER> (defparameter *v* #((a 10) (b 20) (a 30) (b 40))) + *V* + CL-USER> (defun verbose-first (x) (format t "Looking at ~s~%" x) (first x)) + VERBOSE-FIRST + CL-USER> (count 'a *v* :key #'verbose-first) + Looking at (A 10) + Looking at (B 20) + Looking at (A 30) + Looking at (B 40) + 2 + CL-USER> (count 'a *v* :key #'verbose-first :from-end t) + Looking at (B 40) + Looking at (A 30) + Looking at (B 20) + Looking at (A 10) + 2 + +Table 11-2 summarizes these arguments. + +‘Table 11-2. Standard Sequence Function Keyword Arguments’ + + + + ---------- Footnotes ---------- + + (1) Another parameter, ‘:test-not’ parameter, specifies a +two-argument predicate to be used like a ‘:test’ argument except +with the boolean result logically reversed. This parameter is +deprecated, however, in preference for using the ‘COMPLEMENT’ +function. ‘COMPLEMENT’ takes a function argu-ment and returns a +function that takes the same number of arguments as the original and +returns the logical complement of the original function. Thus, you can, +and should, write this: + + (count x sequence :test (complement #'some-test)) + + rather than the following: + + (count x sequence :test-not #'some-test) + + (2) Note, however, that the effect of ‘:start’ and ‘:end’ on +‘REMOVE’ and ‘SUBSTITUTE’ is only to limit the elements they +consider for removal or substitution; elements before ‘:start’ and +after ‘:end’ will be passed through untouched. + + +File: pcl.info, Node: 11-5, Next: 11-6, Prev: 11-4, Up: Chapter 11 + +Higher-Order Function Variants +============================== + +For each of the functions just discussed, Common Lisp provides two +‘higher-order function’ variants that, in the place of the item +argument, take a function to be called on each element of the sequence. +One set of variants are named the same as the basic function with an -IF +appended. These functions count, find, remove, and substitute elements +of the sequence for which the function argument returns true. The other +set of variants are named with an -IF-NOT suffix and count, find, +remove, and substitute elements for which the function argument does +‘not’ return true. + + (count-if #'evenp #(1 2 3 4 5)) ==> 2 + + (count-if-not #'evenp #(1 2 3 4 5)) ==> 3 + + (position-if #'digit-char-p "abcd0001") ==> 4 + + (remove-if-not #'(lambda (x) (char= (elt x 0) #\f)) + #("foo" "bar" "baz" "foom")) ==> #("foo" "foom") + +According to the language standard, the ‘-IF-NOT’ variants are +deprecated. However, that deprecation is generally considered to have +itself been ill-advised. If the standard is ever revised, it's more +likely the deprecation will be removed than the ‘-IF-NOT’ functions. +For one thing, the ‘REMOVE-IF-NOT’ variant is probably used more +often than ‘REMOVE-IF’. Despite its negative-sounding name, +‘REMOVE-IF-NOT’ is actually the positive variant-it returns the +elements that ‘do’ satisfy the predicate. (1) + +The -IF and -IF-NOT variants accept all the same keyword arguments as +their vanilla counterparts except for ‘:test’, which isn't needed +since the main argument is already a function. (2) With a ‘:key’ +argument, the value extracted by the ‘:key’ function is passed to +the function instead of the actual element. + + (count-if #'evenp #((1 a) (2 b) (3 c) (4 d) (5 e)) :key #'first) ==> 2 + + (count-if-not #'evenp #((1 a) (2 b) (3 c) (4 d) (5 e)) :key #'first) ==> 3 + + (remove-if-not #'alpha-char-p + #("foo" "bar" "1baz") :key #'(lambda (x) (elt x 0))) ==> #("foo" "bar") + +The ‘REMOVE’ family of functions also support a fourth variant, +‘REMOVE-DUPLICATES’, that has only one required argument, a +sequence, from which it removes all but one instance of each duplicated +element. It takes the same keyword arguments as ‘REMOVE’, except +for ‘:count’, since it always removes all duplicates. + + (remove-duplicates #(1 2 1 2 3 1 2 3 4)) ==> #(1 2 3 4) + + ---------- Footnotes ---------- + + (1) This same functionality goes by the name ‘grep’ in Perl and +‘filter’ in Python. + + (2) The difference between the predicates passed as ‘:test’ +arguments and as the function arguments to the -IF and -IF-NOT functions +is that the ‘:test’ predicates are two-argument predicates used to +compare the elements of the sequence to the specific item while the -IF +and -IF-NOT predicates are one-argument functions that simply test the +individual elements of the sequence. If the vanilla variants didn't +exist, you could implement them in terms of the -IF versions by +embedding a specific item in the test function. + + (count char string) === + (count-if #'(lambda (c) (eql char c)) string) + + (count char string :test #'CHAR-EQUAL) === + (count-if #'(lambda (c) (char-equal char c)) string) + + +File: pcl.info, Node: 11-6, Next: 11-7, Prev: 11-5, Up: Chapter 11 + +Whole Sequence Manipulations +============================ + +A handful of functions perform operations on a whole sequence (or +sequences) at a time. These tend to be simpler than the other functions +I've described so far. For instance, ‘COPY-SEQ’ and ‘REVERSE’ +each take a single argument, a sequence, and each returns a new sequence +of the same type. The sequence returned by ‘COPY-SEQ’ contains the +same elements as its argument while the sequence returned by +‘REVERSE’ contains the same elements but in reverse order. Note +that neither function copies the elements themselves-only the returned +sequence is a new object. + +The ‘CONCATENATE’ function creates a new sequence containing the +concatenation of any number of sequences. However, unlike ‘REVERSE’ +and ‘COPY-SEQ’, which simply return a sequence of the same type as +their single argument, ‘CONCATENATE’ must be told explicitly what +kind of sequence to produce in case the arguments are of different +types. Its first argument is a type descriptor, like the +‘:element-type’ argument to ‘MAKE-ARRAY’. In this case, the +type descriptors you'll most likely use are the symbols ‘VECTOR’, +‘LIST’, or ‘STRING’. (1) For example: + + (concatenate 'vector #(1 2 3) '(4 5 6)) ==> #(1 2 3 4 5 6) + (concatenate 'list #(1 2 3) '(4 5 6)) ==> (1 2 3 4 5 6) + (concatenate 'string "abc" '(#\d #\e #\f)) ==> "abcdef" + + ---------- Footnotes ---------- + + (1) If you tell ‘CONCATENATE’ to return a specialized vector, +such as a string, all the elements of the argument sequences must be +instances of the vector's element type. + + +File: pcl.info, Node: 11-7, Next: 11-8, Prev: 11-6, Up: Chapter 11 + +Sorting and Merging +=================== + +The functions ‘SORT’ and ‘STABLE-SORT’ provide two ways of +sorting a sequence. They both take a sequence and a two-argument +predicate and return a sorted version of the sequence. + + (sort (vector "foo" "bar" "baz") #'string<) ==> #("bar" "baz" "foo") + +The difference is that ‘STABLE-SORT’ is guaranteed to not reorder +any elements considered equivalent by the predicate while ‘SORT’ +guarantees only that the result is sorted and may reorder equivalent +elements. + +Both these functions are examples of what are called ‘destructive’ +functions. Destructive functions are allowed-typically for reasons of +efficiency-to modify their arguments in more or less arbitrary ways. +This has two implications: one, you should always do something with the +return value of these functions (such as assign it to a variable or pass +it to another function), and, two, unless you're done with the object +you're passing to the destructive function, you should pass a copy +instead. I'll say more about destructive functions in the next chapter. + +Typically you won't care about the unsorted version of a sequence after +you've sorted it, so it makes sense to allow ‘SORT’ and +‘STABLE-SORT’ to destroy the sequence in the course of sorting it. +But it does mean you need to remember to write the following: (1) + + (setf my-sequence (sort my-sequence #'string<)) + +rather than just this: + + (sort my-sequence #'string<) + +Both these functions also take a keyword argument, ‘:key’, which, +like the ‘:key’ argument in other sequence functions, should be a +function and will be used to extract the values to be passed to the +sorting predicate in the place of the actual elements. The extracted +keys are used only to determine the ordering of elements; the sequence +returned will contain the actual elements of the argument sequence. + +The ‘MERGE’ function takes two sequences and a predicate and returns +a sequence produced by merging the two sequences, according to the +predicate. It's related to the two sorting functions in that if each +sequence is already sorted by the same predicate, then the sequence +returned by ‘MERGE’ will also be sorted. Like the sorting +functions, ‘MERGE’ takes a ‘:key’ argument. Like +‘CONCATENATE’, and for the same reason, the first argument to +‘MERGE’ must be a type descriptor specifying the type of sequence to +produce. + + (merge 'vector #(1 3 5) #(2 4 6) #'<) ==> #(1 2 3 4 5 6) + (merge 'list #(1 3 5) #(2 4 6) #'<) ==> (1 2 3 4 5 6) + + ---------- Footnotes ---------- + + (1) When the sequence passed to the sorting functions is a vector, +the "destruction" is actually guaranteed to entail permuting the +elements in place, so you could get away without saving the returned +value. However, it's good style to always do something with the return +value since the sorting functions can modify lists in much more +arbitrary ways. + + +File: pcl.info, Node: 11-8, Next: 11-9, Prev: 11-7, Up: Chapter 11 + +Subsequence Manipulations +========================= + +Another set of functions allows you to manipulate subsequences of +existing sequences. The most basic of these is ‘SUBSEQ’, which +extracts a subsequence starting at a particular index and continuing to +a particular ending index or the end of the sequence. For instance: + + (subseq "foobarbaz" 3) ==> "barbaz" + (subseq "foobarbaz" 3 6) ==> "bar" + +‘SUBSEQ’ is also ‘SETF’able, but it won't extend or shrink a +sequence; if the new value and the subsequence to be replaced are +different lengths, the shorter of the two determines how many characters +are actually changed. + + (defparameter *x* (copy-seq "foobarbaz")) + + (setf (subseq *x* 3 6) "xxx") ; subsequence and new value are same length + *x* ==> "fooxxxbaz" + + (setf (subseq *x* 3 6) "abcd") ; new value too long, extra character ignored. + *x* ==> "fooabcbaz" + + (setf (subseq *x* 3 6) "xx") ; new value too short, only two characters changed + *x* ==> "fooxxcbaz" + +You can use the ‘FILL’ function to set multiple elements of a +sequence to a single value. The required arguments are a sequence and +the value with which to fill it. By default every element of the +sequence is set to the value; ‘:start’ and ‘:end’ keyword +arguments can limit the effects to a given subsequence. + +If you need to find a subsequence within a sequence, the ‘SEARCH’ +function works like ‘POSITION’ except the first argument is a +sequence rather than a single item. + + (position #\b "foobarbaz") ==> 3 + (search "bar" "foobarbaz") ==> 3 + +On the other hand, to find where two sequences with a common prefix +first diverge, you can use the ‘MISMATCH’ function. It takes two +sequences and returns the index of the first pair of mismatched +elements. + + (mismatch "foobarbaz" "foom") ==> 3 + +It returns ‘NIL’ if the strings match. ‘MISMATCH’ also takes +many of the standard keyword arguments: a ‘:key’ argument for +specifying a function to use to extract the values to be compared; a +‘:test’ argument to specify the comparison function; and +‘:start1’, ‘:end1’, ‘:start2’, and ‘:end2’ arguments to +specify subsequences within the two sequences. And a ‘:from-end’ +argument of ‘T’ specifies the sequences should be searched in +reverse order, causing ‘MISMATCH’ to return the index, in the first +sequence, where whatever common suffix the two sequences share begins. + + (mismatch "foobar" "bar" :from-end t) ==> 3 + + +File: pcl.info, Node: 11-9, Next: 11-10, Prev: 11-8, Up: Chapter 11 + +Sequence Predicates +=================== + +Four other handy functions are ‘EVERY’, ‘SOME’, ‘NOTANY’, +and ‘NOTEVERY’, which iterate over sequences testing a boolean +predicate. The first argument to all these functions is the predicate, +and the remaining arguments are sequences. The predicate should take as +many arguments as the number of sequences passed. The elements of the +sequences are passed to the predicate-one element from each +sequence-until one of the sequences runs out of elements or the overall +termination test is met: ‘EVERY’ terminates, returning false, as +soon as the predicate fails. If the predicate is always satisfied, it +returns true. ‘SOME’ returns the first non-‘NIL’ value returned +by the predicate or returns false if the predicate is never satisfied. +‘NOTANY’ returns false as soon as the predicate is satisfied or true +if it never is. And ‘NOTEVERY’ returns true as soon as the +predicate fails or false if the predicate is always satisfied. Here are +some examples of testing just one sequence: + + (every #'evenp #(1 2 3 4 5)) ==> NIL + (some #'evenp #(1 2 3 4 5)) ==> T + (notany #'evenp #(1 2 3 4 5)) ==> NIL + (notevery #'evenp #(1 2 3 4 5)) ==> T + +These calls compare elements of two sequences pairwise: + + (every #'> #(1 2 3 4) #(5 4 3 2)) ==> NIL + (some #'> #(1 2 3 4) #(5 4 3 2)) ==> T + (notany #'> #(1 2 3 4) #(5 4 3 2)) ==> NIL + (notevery #'> #(1 2 3 4) #(5 4 3 2)) ==> T + + +File: pcl.info, Node: 11-10, Next: 11-11, Prev: 11-9, Up: Chapter 11 + +Sequence Mapping Functions +========================== + +Finally, the last of the sequence functions are the generic mapping +functions. ‘MAP’, like the sequence predicate functions, takes a +‘n’-argument function and ‘n’ sequences. But instead of a +boolean value, ‘MAP’ returns a new sequence containing the result of +applying the function to subsequent elements of the sequences. Like +‘CONCATENATE’ and ‘MERGE’, ‘MAP’ needs to be told what kind +of sequence to create. + + (map 'vector #'* #(1 2 3 4 5) #(10 9 8 7 6)) ==> #(10 18 24 28 30) + +‘MAP-INTO’ is like ‘MAP’ except instead of producing a new +sequence of a given type, it places the results into a sequence passed +as the first argument. This sequence can be the same as one of the +sequences providing values for the function. For instance, to sum +several vectors-‘a’, ‘b’, and ‘c’-into one, you could write +this: + + (map-into a #'+ a b c) + +If the sequences are different lengths, ‘MAP-INTO’ affects only as +many elements as are present in the shortest sequence, including the +sequence being mapped into. However, if the sequence being mapped into +is a vector with a fill pointer, the number of elements affected isn't +limited by the fill pointer but rather by the actual size of the vector. +After a call to ‘MAP-INTO’, the fill pointer will be set to the +number of elements mapped. ‘MAP-INTO’ won't, however, extend an +adjustable vector. + +The last sequence function is ‘REDUCE’, which does another kind of +mapping: it maps over a single sequence, applying a two-argument +function first to the first two elements of the sequence and then to the +value returned by the function and subsequent elements of the sequence. +Thus, the following expression sums the numbers from one to ten: + + (reduce #'+ #(1 2 3 4 5 6 7 8 9 10)) ==> 55 + +‘REDUCE’ is a surprisingly useful function-whenever you need to +distill a sequence down to a single value, chances are you can write it +with ‘REDUCE’, and it will often be quite a concise way to express +what you want. For instance, to find the maximum value in a sequence of +numbers, you can write ‘(reduce #'max numbers)’. ‘REDUCE’ also +takes a full complement of keyword arguments (‘:key’, +‘:from-end’, ‘:start’, and ‘:end’) and one unique to +‘REDUCE’ (‘:initial-value’). The latter specifies a value +that's logically placed before the first element of the sequence (or +after the last if you also specify a true ‘:from-end’ argument). + + +File: pcl.info, Node: 11-11, Next: 11-12, Prev: 11-10, Up: Chapter 11 + +Hash Tables +=========== + +The other general-purpose collection provided by Common Lisp is the hash +table. Where vectors provide an integer-indexed data structure, hash +tables allow you to use arbitrary objects as the indexes, or keys. When +you add a value to a hash table, you store it under a particular key. +Later you can use the same key to retrieve the value. Or you can +associate a new value with the same key-each key maps to a single value. + +With no arguments ‘MAKE-HASH-TABLE’ makes a hash table that +considers two keys equivalent if they're the same object according to +‘EQL’. This is a good default unless you want to use strings as +keys, since two strings with the same contents aren't necessarily +‘EQL’. In that case you'll want a so-called ‘EQUAL’ hash table, +which you can get by passing the symbol ‘EQUAL’ as the ‘:test’ +keyword argument to ‘MAKE-HASH-TABLE’. Two other possible values +for the ‘:test’ argument are the symbols ‘EQ’ and ‘EQUALP’. +These are, of course, the names of the standard object comparison +functions, which I discussed in Chapter 4. However, unlike the +‘:test’ argument passed to sequence functions, +‘MAKE-HASH-TABLE’'s ‘:test’ can't be used to specify an +arbitrary function-only the values ‘EQ’, ‘EQL’, ‘EQUAL’, and +‘EQUALP’. This is because hash tables actually need two functions, +an equivalence function and a ‘hash’ function that computes a +numerical hash code from the key in a way compatible with how the +equivalence function will ultimately compare two keys. However, +although the language standard provides only for hash tables that use +the standard equivalence functions, most implementations provide some +mechanism for defining custom hash tables. + +The ‘GETHASH’ function provides access to the elements of a hash +table. It takes two arguments-a key and the hash table-and returns the +value, if any, stored in the hash table under that key or ‘NIL’. +(1) For example: + + (defparameter *h* (make-hash-table)) + + (gethash 'foo *h*) ==> NIL + + (setf (gethash 'foo *h*) 'quux) + + (gethash 'foo *h*) ==> QUUX + +Since ‘GETHASH’ returns ‘NIL’ if the key isn't present in the +table, there's no way to tell from the return value the difference +between a key not being in a hash table at all and being in the table +with the value ‘NIL’. ‘GETHASH’ solves this problem with a +feature I haven't discussed yet-multiple return values. ‘GETHASH’ +actually returns two values; the primary value is the value stored under +the given key or ‘NIL’. The secondary value is a boolean indicating +whether the key is present in the hash table. Because of the way +multiple values work, the extra return value is silently discarded +unless the caller explicitly handles it with a form that can "see" +multiple values. + +I'll discuss multiple return values in greater detail in Chapter 20, but +for now I'll give you a sneak preview of how to use the +‘MULTIPLE-VALUE-BIND’ macro to take advantage of ‘GETHASH’'s +extra return value. ‘MULTIPLE-VALUE-BIND’ creates variable bindings +like ‘LET’ does, filling them with the multiple values returned by a +form. + +The following function shows how you might use +‘MULTIPLE-VALUE-BIND’; the variables it binds are ‘value’ and +‘present’: + + (defun show-value (key hash-table) + (multiple-value-bind (value present) (gethash key hash-table) + (if present + (format nil "Value ~a actually present." value) + (format nil "Value ~a because key not found." value)))) + + (setf (gethash 'bar *h*) nil) ; provide an explicit value of NIL + + (show-value 'foo *h*) ==> "Value QUUX actually present." + (show-value 'bar *h*) ==> "Value NIL actually present." + (show-value 'baz *h*) ==> "Value NIL because key not found." + +Since setting the value under a key to ‘NIL’ leaves the key in the +table, you'll need another function to completely remove a key/value +pair. ‘REMHASH’ takes the same arguments as ‘GETHASH’ and +removes the specified entry. You can also completely clear a hash table +of all its key/value pairs with ‘CLRHASH’. + + ---------- Footnotes ---------- + + (1) By an accident of history, the order of arguments to +‘GETHASH’ is the opposite of ‘ELT’-‘ELT’ takes the +collection first and then the index while ‘GETHASH’ takes the key +first and then the collection. + + +File: pcl.info, Node: 11-12, Next: Chapter 12, Prev: 11-11, Up: Chapter 11 + +Hash Table Iteration +==================== + +Common Lisp provides a couple ways to iterate over the entries in a hash +table. The simplest of these is via the function ‘MAPHASH’. +Analogous to the ‘MAP’ function, ‘MAPHASH’ takes a two-argument +function and a hash table and invokes the function once for each +key/value pair in the hash table. For instance, to print all the +key/value pairs in a hash table, you could use ‘MAPHASH’ like this: + + (maphash #'(lambda (k v) (format t "~a => ~a~%" k v)) *h*) + +The consequences of adding or removing elements from a hash table while +iterating over it aren't specified (and are likely to be bad) with two +exceptions: you can use ‘SETF’ with ‘GETHASH’ to change the +value of the current entry, and you can use ‘REMHASH’ to remove the +current entry. For instance, to remove all the entries whose value is +less than ten, you could write this: + + (maphash #'(lambda (k v) (when (< v 10) (remhash k *h*))) *h*) + +The other way to iterate over a hash table is with the extended +‘LOOP’ macro, which I'll discuss in Chapter 22. (1) The ‘LOOP’ +equivalent of the first ‘MAPHASH’ expression would look like this: + + (loop for k being the hash-keys in *h* using (hash-value v) + do (format t "~a => ~a~%" k v)) + +I could say a lot more about the nonlist collections supported by Common +Lisp. For instance, I haven't discussed multidimensional arrays at all +or the library of functions for manipulating bit arrays. However, what +I've covered in this chapter should suffice for most of your +general-purpose programming needs. Now it's finally time to look at +Lisp's eponymous data structure: lists. + + ---------- Footnotes ---------- + + (1) ‘LOOP’'s hash table iteration is typically implemented on top +of a more primitive form, ‘WITH-HASH-TABLE-ITERATOR’, that you don't +need to worry about; it was added to the language specifically to +support implementing things such as ‘LOOP’ and is of little use +unless you need to write completely new control constructs for iterating +over hash tables. + + +File: pcl.info, Node: Chapter 12, Next: Chapter 13, Prev: Chapter 11, Up: Top + +12. They Called It LISP for a Reason: List Processing +===================================================== + +Lists play an important role in Lisp-for reasons both historical and +practical. Historically, lists were Lisp's original composite data +type, though it has been decades since they were its only such data +type. These days, a Common Lisp programmer is as likely to use a +vector, a hash table, or a user-defined class or structure as to use a +list. + + Practically speaking, lists remain in the language because they're an +excellent solution to certain problems. One such problem-how to +represent code as data in order to support code-transforming and +code-generating macros-is particular to Lisp, which may explain why +other languages don't feel the lack of Lisp-style lists. More +generally, lists are an excellent data structure for representing any +kind of heterogeneous and/or hierarchical data. They're also quite +lightweight and support a functional style of programming that's another +important part of Lisp's heritage. + + Thus, you need to understand lists on their own terms; as you gain a +better understanding of how lists work, you'll be in a better position +to appreciate when you should and shouldn't use them. + +* Menu: + +* 12-1:: "There Is No List" +* 12-2:: Functional Programming and Lists +* 12-3:: "Destructive" Operations +* 12-4:: Combining Recycling with Shared Structure +* 12-5:: List-Manipulation Functions +* 12-6:: Mapping +* 12-7:: Other Structures + + +File: pcl.info, Node: 12-1, Next: 12-2, Prev: Chapter 12, Up: Chapter 12 + +"There Is No List" +================== + + Spoon Boy: Do not try and bend the list. That's impossible. + Instead . . . only try to realize the truth. Neo: What truth? + Spoon Boy: There is no list. Neo: There is no list? Spoon Boy: + Then you'll see that it is not the list that bends; it is only + yourself. (1) + The key to understanding lists is to understand that they're largely +an illusion built on top of objects that are instances of a more +primitive data type. Those simpler objects are pairs of values called +cons cells, after the function CONS used to create them. + + CONS takes two arguments and returns a new cons cell containing the +two values. (2) These values can be references to any kind of object. +Unless the second value is NIL or another cons cell, a cons is printed +as the two values in parentheses separated by a dot, a so-called dotted +pair. + + (cons 1 2) ==> (1 . 2) + + The two values in a cons cell are called the CAR and the CDR after +the names of the functions used to access them. At the dawn of time, +these names were mnemonic, at least to the folks implementing the first +Lisp on an IBM 704. But even then they were just lifted from the +assembly mnemonics used to implement the operations. However, it's not +all bad that these names are somewhat meaningless-when considering +individual cons cells, it's best to think of them simply as an arbitrary +pair of values without any particular semantics. Thus: + + (car (cons 1 2)) ==> 1 + (cdr (cons 1 2)) ==> 2 + + Both CAR and CDR are also SETFable places-given an existing cons +cell, it's possible to assign a new value to either of its values. (3) + + (defparameter *cons* (cons 1 2)) + *cons* ==> (1 . 2) + (setf (car *cons*) 10) ==> 10 + *cons* ==> (10 . 2) + (setf (cdr *cons*) 20) ==> 20 + *cons* ==> (10 . 20) + + Because the values in a cons cell can be references to any kind of +object, you can build larger structures out of cons cells by linking +them together. Lists are built by linking together cons cells in a +chain. The elements of the list are held in the CARs of the cons cells +while the links to subsequent cons cells are held in the CDRs. The last +cell in the chain has a CDR of NIL, which-as I mentioned in Chapter +4-represents the empty list as well as the boolean value false. + + This arrangement is by no means unique to Lisp; it's called a singly +linked list. However, few languages outside the Lisp family provide +such extensive support for this humble data type. + + So when I say a particular value is a list, what I really mean is +it's either NIL or a reference to a cons cell. The CAR of the cons cell +is the first item of the list, and the CDR is a reference to another +list, that is, another cons cell or NIL, containing the remaining +elements. The Lisp printer understands this convention and prints such +chains of cons cells as parenthesized lists rather than as dotted pairs. + + (cons 1 nil) ==> (1) + (cons 1 (cons 2 nil)) ==> (1 2) + (cons 1 (cons 2 (cons 3 nil))) ==> (1 2 3) + + When talking about structures built out of cons cells, a few diagrams +can be a big help. Box-and-arrow diagrams represent cons cells as a +pair of boxes like this: + + + + The box on the left represents the CAR, and the box on the right is +the CDR. The values stored in a particular cons cell are either drawn in +the appropriate box or represented by an arrow from the box to a +representation of the referenced value. (4) For instance, the list (1 2 +3), which consists of three cons cells linked together by their CDRs, +would be diagrammed like this: + + + + However, most of the time you work with lists you won't have to deal +with individual cons cells-the functions that create and manipulate +lists take care of that for you. For example, the LIST function builds +a cons cells under the covers for you and links them together; the +following LIST expressions are equivalent to the previous CONS +expressions: + + (list 1) ==> (1) + (list 1 2) ==> (1 2) + (list 1 2 3) ==> (1 2 3) + + Similarly, when you're thinking in terms of lists, you don't have to +use the meaningless names CAR and CDR; FIRST and REST are synonyms for +CAR and CDR that you should use when you're dealing with cons cells as +lists. + + (defparameter *list* (list 1 2 3 4)) + (first *list*) ==> 1 + (rest *list*) ==> (2 3 4) + (first (rest *list*)) ==> 2 + + Because cons cells can hold any kind of values, so can lists. And a +single list can hold objects of different types. + + (list "foo" (list 1 2) 10) ==> ("foo" (1 2) 10) + + The structure of that list would look like this: + + + + Because lists can have other lists as elements, you can also use them +to represent trees of arbitrary depth and complexity. As such, they +make excellent representations for any heterogeneous, hierarchical data. +Lisp-based XML processors, for instance, usually represent XML documents +internally as lists. Another obvious example of tree-structured data is +Lisp code itself. In Chapters 30 and 31 you'll write an HTML generation +library that uses lists of lists to represent the HTML to be generated. +I'll talk more next chapter about using cons cells to represent other +data structures. + + Common Lisp provides quite a large library of functions for +manipulating lists. In the sections "List-Manipulation Functions" and +"Mapping," you'll look at some of the more important of these functions. +However, they will be easier to understand in the context of a few ideas +borrowed from functional programming. + + ---------- Footnotes ---------- + + (1) Adapted from The Matrix (http://us.imdb.com/Quotes?0133093) + + (2) CONS was originally short for the verb construct. + + (3) When the place given to SETF is a CAR or CDR, it expands into a +call to the function RPLACA or RPLACD; some old-school Lispers-the same +ones who still use SETQ-will still use RPLACA and RPLACD directly, but +modern style is to use SETF of CAR or CDR. + + (4) Typically, simple objects such as numbers are drawn within the +appropriate box, and more complex objects will be drawn outside the box +with an arrow from the box indicating the reference. This actually +corresponds well with how many Common Lisp implementations work-although +all objects are conceptually stored by reference, certain simple +immutable objects can be stored directly in a cons cell. + + +File: pcl.info, Node: 12-2, Next: 12-3, Prev: 12-1, Up: Chapter 12 + +Functional Programming and Lists +================================ + +The essence of functional programming is that programs are built +entirely of functions with no side effects that compute their results +based solely on the values of their arguments. The advantage of the +functional style is that it makes programs easier to understand. +Eliminating side effects eliminates almost all possibilities for action +at a distance. And since the result of a function is determined only by +the values of its arguments, its behavior is easier to understand and +test. For instance, when you see an expression such as (+ 3 4), you +know the result is uniquely determined by the definition of the + +function and the values 3 and 4. You don't have to worry about what may +have happened earlier in the execution of the program since there's +nothing that can change the result of evaluating that expression. + + Functions that deal with numbers are naturally functional since +numbers are immutable. A list, on the other hand, can be mutated, as +you've just seen, by SETFing the CARs and CDRs of the cons cells that +make up its backbone. However, lists can be treated as a functional +data type if you consider their value to be determined by the elements +they contain. Thus, any list of the form (1 2 3 4) is functionally +equivalent to any other list containing those four values, regardless of +what cons cells are actually used to represent the list. And any +function that takes a list as an argument and returns a value based +solely on the contents of the list can likewise be considered +functional. For instance, the REVERSE sequence function, given the list +(1 2 3 4), always returns a list (4 3 2 1). Different calls to REVERSE +with functionally equivalent lists as the argument will return +functionally equivalent result lists. Another aspect of functional +programming, which I'll discuss in the section "Mapping," is the use of +higher-order functions: functions that treat other functions as data, +taking them as arguments or returning them as results. + + Most of Common Lisp's list-manipulation functions are written in a +functional style. I'll discuss later how to mix functional and other +coding styles, but first you should understand a few subtleties of the +functional style as applied to lists. + + The reason most list functions are written functionally is it allows +them to return results that share cons cells with their arguments. To +take a concrete example, the function APPEND takes any number of list +arguments and returns a new list containing the elements of all its +arguments. For instance: + + (append (list 1 2) (list 3 4)) ==> (1 2 3 4) + + From a functional point of view, APPEND's job is to return the list +(1 2 3 4) without modifying any of the cons cells in the lists (1 2) and +(3 4). One obvious way to achieve that goal is to create a completely +new list consisting of four new cons cells. However, that's more work +than is necessary. Instead, APPEND actually makes only two new cons +cells to hold the values 1 and 2, linking them together and pointing the +CDR of the second cons cell at the head of the last argument, the list +(3 4). It then returns the cons cell containing the 1. None of the +original cons cells has been modified, and the result is indeed the list +(1 2 3 4). The only wrinkle is that the list returned by APPEND shares +some cons cells with the list (3 4). The resulting structure looks like +this: + + + + In general, APPEND must copy all but its last argument, but it can +always return a result that shares structure with the last argument. + + Other functions take similar advantage of lists' ability to share +structure. Some, like APPEND, are specified to always return results +that share structure in a particular way. Others are simply allowed to +return shared structure at the discretion of the implementation. + + +File: pcl.info, Node: 12-3, Next: 12-4, Prev: 12-2, Up: Chapter 12 + +"Destructive" Operations +======================== + +If Common Lisp were a purely functional language, that would be the end +of the story. However, because it's possible to modify a cons cell +after it has been created by SETFing its CAR or CDR, you need to think a +bit about how side effects and structure sharing mix. + + Because of Lisp's functional heritage, operations that modify +existing objects are called destructive-in functional programming, +changing an object's state "destroys" it since it no longer represents +the same value. However, using the same term to describe all +state-modifying operations leads to a certain amount of confusion since +there are two very different kinds of destructive operations, +for-side-effect operations and recycling operations. (1) + + For-side-effect operations are those used specifically for their side +effects. All uses of SETF are destructive in this sense, as are +functions that use SETF under the covers to change the state of an +existing object such as VECTOR-PUSH or VECTOR-POP. But it's a bit unfair +to describe these operations as destructive-they're not intended to be +used in code written in a functional style, so they shouldn't be +described using functional terminology. However, if you mix +nonfunctional, for-side-effect operations with functions that return +structure-sharing results, then you need to be careful not to +inadvertently modify the shared structure. For instance, consider these +three definitions: + + (defparameter *list-1* (list 1 2)) + (defparameter *list-2* (list 3 4)) + (defparameter *list-3* (append *list-1* *list-2*)) + + After evaluating these forms, you have three lists, but *list-3* and +*list-2* share structure just like the lists in the previous diagram. + + *list-1* ==> (1 2) + *list-2* ==> (3 4) + *list-3* ==> (1 2 3 4) + + Now consider what happens when you modify *list-2*. + + (setf (first *list-2*) 0) ==> 0 + *list-2* ==> (0 4) ; as expected + *list-3* ==> (1 2 0 4) ; maybe not what you wanted + + The change to *list-2* also changes *list-3* because of the shared +structure: the first cons cell in *list-2* is also the third cons cell +in *list-3*. SETFing the FIRST of *list-2* changes the value in the CAR +of that cons cell, affecting both lists. + + On the other hand, the other kind of destructive operations, +recycling operations, are intended to be used in functional code. They +use side effects only as an optimization. In particular, they reuse +certain cons cells from their arguments when building their result. +However, unlike functions such as APPEND that reuse cons cells by +including them, unmodified, in the list they return, recycling functions +reuse cons cells as raw material, modifying the CAR and CDR as necessary +to build the desired result. Thus, recycling functions can be used +safely only when the original lists aren't going to be needed after the +call to the recycling function. + + To see how a recycling function works, let's compare REVERSE, the +nondestructive function that returns a reversed version of a sequence, +to NREVERSE, a recycling version of the same function. Because REVERSE +doesn't modify its argument, it must allocate a new cons cell for each +element in the list being reversed. But suppose you write something +like this: + + (setf *list* (reverse *list*)) + + By assigning the result of REVERSE back to *list*, you've removed the +reference to the original value of *list*. Assuming the cons cells in +the original list aren't referenced anywhere else, they're now eligible +to be garbage collected. However, in many Lisp implementations it'd be +more efficient to immediately reuse the existing cons cells rather than +allocating new ones and letting the old ones become garbage. + + NREVERSE allows you to do exactly that. The N stands for +non-consing, meaning it doesn't need to allocate any new cons cells. +The exact side effects of NREVERSE are intentionally not specified-it's +allowed to modify any CAR or CDR of any cons cell in the list-but a +typical implementation might walk down the list changing the CDR of each +cons cell to point to the previous cons cell, eventually returning the +cons cell that was previously the last cons cell in the old list and is +now the head of the reversed list. No new cons cells need to be +allocated, and no garbage is created. + + Most recycling functions, like NREVERSE, have nondestructive +counterparts that compute the same result. In general, the recycling +functions have names that are the same as their non-destructive +counterparts except with a leading N. However, not all do, including +several of the more commonly used recycling functions such as NCONC, the +recycling version of APPEND, and DELETE, DELETE-IF, DELETE-IF-NOT, and +DELETE-DUPLICATES, the recycling versions of the REMOVE family of +sequence functions. + + In general, you use recycling functions in the same way you use their +nondestructive counterparts except it's safe to use them only when you +know the arguments aren't going to be used after the function returns. +The side effects of most recycling functions aren't specified tightly +enough to be relied upon. + + However, the waters are further muddied by a handful of recycling +functions with specified side effects that can be relied upon. They are +NCONC, the recycling version of APPEND, and NSUBSTITUTE and its -IF and +-IF-NOT variants, the recycling versions of the sequence functions +SUBSTITUTE and friends. + + Like APPEND, NCONC returns a concatenation of its list arguments, but +it builds its result in the following way: for each nonempty list it's +passed, NCONC sets the CDR of the list's last cons cell to point to the +first cons cell of the next nonempty list. It then returns the first +list, which is now the head of the spliced-together result. Thus: + + (defparameter *x* (list 1 2 3)) + + (nconc *x* (list 4 5 6)) ==> (1 2 3 4 5 6) + + *x* ==> (1 2 3 4 5 6) + + NSUBSTITUTE and variants can be relied on to walk down the list +structure of the list argument and to SETF the CARs of any cons cells +holding the old value to the new value and to otherwise leave the list +intact. It then returns the original list, which now has the same value +as would've been computed by SUBSTITUTE. (2) + + The key thing to remember about NCONC and NSUBSTITUTE is that they're +the exceptions to the rule that you can't rely on the side effects of +recycling functions. It's perfectly acceptable-and arguably good +style-to ignore the reliability of their side effects and use them, like +any other recycling function, only for the value they return. + + ---------- Footnotes ---------- + + (1) The phrase for-side-effect is used in the language standard, but +recycling is my own invention; most Lisp literature simply uses the term +destructive for both kinds of operations, leading to the confusion I'm +trying to dispel. + + (2) The string functions NSTRING-CAPITALIZE, NSTRING-DOWNCASE, and +NSTRING-UPCASE are similar-they return the same results as their N-less +counterparts but are specified to modify their string argument in place. + + +File: pcl.info, Node: 12-4, Next: 12-5, Prev: 12-3, Up: Chapter 12 + +Combining Recycling with Shared Structure +========================================= + +Although you can use recycling functions whenever the arguments to the +recycling function won't be used after the function call, it's worth +noting that each recycling function is a loaded gun pointed footward: if +you accidentally use a recycling function on an argument that is used +later, you're liable to lose some toes. + + To make matters worse, shared structure and recycling functions tend +to work at cross-purposes. Nondestructive list functions return lists +that share structure under the assumption that cons cells are never +modified, but recycling functions work by violating that assumption. +Or, put another way, sharing structure is based on the premise that you +don't care exactly what cons cells make up a list while using recycling +functions requires that you know exactly what cons cells are referenced +from where. + + In practice, recycling functions tend to be used in a few idiomatic +ways. By far the most common recycling idiom is to build up a list to +be returned from a function by "consing" onto the front of a list, +usually by PUSHing elements onto a list stored in a local variable and +then returning the result of NREVERSEing it. (1) + + This is an efficient way to build a list because each PUSH has to +create only one cons cell and modify a local variable and the NREVERSE +just has to zip down the list reassigning the CDRs. Because the list is +created entirely within the function, there's no danger any code outside +the function has a reference to any of its cons cells. Here's a +function that uses this idiom to build a list of the first n numbers, +starting at zero: (2) + + (defun upto (max) + (let ((result nil)) + (dotimes (i max) + (push i result)) + (nreverse result))) + + (upto 10) ==> (0 1 2 3 4 5 6 7 8 9) + + The next most common recycling idiom (3) is to immediately reassign +the value returned by the recycling function back to the place +containing the potentially recycled value. For instance, you'll often +see expressions like the following, using DELETE, the recycling version +of REMOVE: + + (setf foo (delete nil foo)) + + This sets the value of foo to its old value except with all the NILs +removed. However, even this idiom must be used with some care-if foo +shares structure with lists referenced elsewhere, using DELETE instead +of REMOVE can destroy the structure of those other lists. For example, +consider the two lists *list-2* and *list-3* from earlier that share +their last two cons cells. + + *list-2* ==> (0 4) + *list-3* ==> (1 2 0 4) + + You can delete 4 from *list-3* like this: + + (setf *list-3* (delete 4 *list-3*)) ==> (1 2 0) + + However, DELETE will likely perform the necessary deletion by setting +the CDR of the third cons cell to NIL, disconnecting the fourth cons +cell, the one holding the 4, from the list. Because the third cons cell +of *list-3* is also the first cons cell in *list-2*, the following +modifies *list-2* as well: + + *list-2* ==> (0) + + If you had used REMOVE instead of DELETE, it would've built a list +containing the values 1, 2, and 0, creating new cons cells as necessary +rather than modifying any of the cons cells in *list-3*. In that case, +*list-2* wouldn't have been affected. + + The PUSH/NREVERSE and SETF/DELETE idioms probably account for 80 +percent of the uses of recycling functions. Other uses are possible but +require keeping careful track of which functions return shared structure +and which do not. + + In general, when manipulating lists, it's best to write your own code +in a functional style-your functions should depend only on the contents +of their list arguments and shouldn't modify them. Following that rule +will, of course, rule out using any destructive functions, recycling or +otherwise. Once you have your code working, if profiling shows you need +to optimize, you can replace nondestructive list operations with their +recycling counterparts but only if you're certain the argument lists +aren't referenced from anywhere else. + + One last gotcha to watch out for is that the sorting functions SORT, +STABLE-SORT, and MERGE mentioned in Chapter 11 are also recycling +functions when applied to lists. (4) However, these functions don't +have nondestructive counterparts, so if you need to sort a list without +destroying it, you need to pass the sorting function a copy made with +COPY-LIST. In either case you need to be sure to save the result of the +sorting function because the original argument is likely to be in +tatters. For instance: + + CL-USER> (defparameter *list* (list 4 3 2 1)) + *LIST* + CL-USER> (sort *list* #'<) + (1 2 3 4) ; looks good + CL-USER> *list* + (4) ; whoops! + + ---------- Footnotes ---------- + + (1) For example, in an examination of all uses of recycling functions +in the Common Lisp Open Code Collection (CLOCC), a diverse set of +libraries written by various authors, instances of the PUSH/NREVERSE +idiom accounted for nearly half of all uses of recycling functions. + + (2) There are, of course, other ways to do this same thing. The +extended LOOP macro, for instance, makes it particularly easy and likely +generates code that's even more efficient than the PUSH/ NREVERSE +version. + + (3) This idiom accounts for 30 percent of uses of recycling in the +CLOCC code base. + + (4) SORT and STABLE-SORT can be used as for-side-effect operations on +vectors, but since they still return the sorted vector, you should +ignore that fact and use them for return values for the sake of +consistency. + + +File: pcl.info, Node: 12-5, Next: 12-6, Prev: 12-4, Up: Chapter 12 + +List-Manipulation Functions +=========================== + +With that background out of the way, you're ready to look at the library +of functions Common Lisp provides for manipulating lists. + + You've already seen the basic functions for getting at the elements +of a list: FIRST and REST. Although you can get at any element of a list +by combining enough calls to REST (to move down the list) with a FIRST +(to extract the element), that can be a bit tedious. So Common Lisp +provides functions named for the other ordinals from SECOND to TENTH +that return the appropriate element. More generally, the function NTH +takes two arguments, an index and a list, and returns the nth +(zero-based) element of the list. Similarly, NTHCDR takes an index and +a list and returns the result of calling CDR n times. (Thus, (nthcdr 0 +...) simply returns the original list, and (nthcdr 1 ...) is +equivalent to REST.) Note, however, that none of these functions is any +more efficient, in terms of work done by the computer, than the +equivalent combinations of FIRSTs and RESTs-there's no way to get to the +nth element of a list without following n CDR references. (1) + + The 28 composite CAR/CDR functions are another family of functions +you may see used from time to time. Each function is named by placing a +sequence of up to four As and Ds between a C and R, with each A +representing a call to CAR and each D a call to CDR. Thus: + + (caar list) === (car (car list)) + (cadr list) === (car (cdr list)) + (cadadr list) === (car (cdr (car (cdr list)))) + + Note, however, that many of these functions make sense only when +applied to lists that contain other lists. For instance, CAAR extracts +the CAR of the CAR of the list it's given; thus, the list it's passed +must contain another list as its first element. In other words, these +are really functions on trees rather than lists: + + (caar (list 1 2 3)) ==> error + (caar (list (list 1 2) 3)) ==> 1 + (cadr (list (list 1 2) (list 3 4))) ==> (3 4) + (caadr (list (list 1 2) (list 3 4))) ==> 3 + + These functions aren't used as often now as in the old days. And +even the most die-hard old-school Lisp hackers tend to avoid the longer +combinations. However, they're used quite a bit in older Lisp code, so +it's worth at least understanding how they work. (2) + + The FIRST-TENTH and CAR, CADR, and so on, functions can also be used +as SETFable places if you're using lists nonfunctionally. + + Table 12-1 summarizes some other list functions that I won't cover in +detail. + + Table 12-1. Other List Functions + + + + ---------- Footnotes ---------- + + (1) NTH is roughly equivalent to the sequence function ELT but works +only with lists. Also, confusingly, NTH takes the index as the first +argument, the opposite of ELT. Another difference is that ELT will +signal an error if you try to access an element at an index greater than +or equal to the length of the list, but NTH will return NIL. + + (2) In particular, they used to be used to extract the various parts +of expressions passed to macros before the invention of destructuring +parameter lists. For example, you could take apart the following +expression: + + (when (> x 10) (print x)) + + Like this: + + ;; the condition + (cadr '(when (> x 10) (print x))) ==> (> X 10) + ;; the body, as a list + (cddr '(when (> x 10) (print x))) ==> ((PRINT X)) + + +File: pcl.info, Node: 12-6, Next: 12-7, Prev: 12-5, Up: Chapter 12 + +Mapping +======= + +Another important aspect of the functional style is the use of +higher-order functions, functions that take other functions as arguments +or return functions as values. You saw several examples of higher-order +functions, such as MAP, in the previous chapter. Although MAP can be +used with both lists and vectors (that is, with any kind of sequence), +Common Lisp also provides six mapping functions specifically for lists. +The differences between the six functions have to do with how they build +up their result and whether they apply the function to the elements of +the list or to the cons cells of the list structure. + + MAPCAR is the function most like MAP. Because it always returns a +list, it doesn't require the result-type argument MAP does. Instead, +its first argument is the function to apply, and subsequent arguments +are the lists whose elements will provide the arguments to the function. +Otherwise, it behaves like MAP: the function is applied to successive +elements of the list arguments, taking one element from each list per +application of the function. The results of each function call are +collected into a new list. For example: + + (mapcar #'(lambda (x) (* 2 x)) (list 1 2 3)) ==> (2 4 6) + (mapcar #'+ (list 1 2 3) (list 10 20 30)) ==> (11 22 33) + + MAPLIST is just like MAPCAR except instead of passing the elements of +the list to the function, it passes the actual cons cells. (1) Thus, +the function has access not only to the value of each element of the +list (via the CAR of the cons cell) but also to the rest of the list +(via the CDR). + + MAPCAN and MAPCON work like MAPCAR and MAPLIST except for the way +they build up their result. While MAPCAR and MAPLIST build a completely +new list to hold the results of the function calls, MAPCAN and MAPCON +build their result by splicing together the results-which must be +lists-as if by NCONC. Thus, each function invocation can provide any +number of elements to be included in the result. (2) MAPCAN, like +MAPCAR, passes the elements of the list to the mapped function while +MAPCON, like MAPLIST, passes the cons cells. + + Finally, the functions MAPC and MAPL are control constructs disguised +as functions-they simply return their first list argument, so they're +useful only when the side effects of the mapped function do something +interesting. MAPC is the cousin of MAPCAR and MAPCAN while MAPL is in +the MAPLIST/MAPCON family. + + ---------- Footnotes ---------- + + (1) Thus, MAPLIST is the more primitive of the two functions-if you +had only MAPLIST, you could build MAPCAR on top of it, but you couldn't +build MAPLIST on top of MAPCAR. + + (2) In Lisp dialects that didn't have filtering functions like +REMOVE, the idiomatic way to filter a list was with MAPCAN. + + (mapcan #'(lambda (x) (if (= x 10) nil (list x))) list) === (remove 10 list) + + +File: pcl.info, Node: 12-7, Next: Chapter 13, Prev: 12-6, Up: Chapter 12 + +Other Structures +================ + +While cons cells and lists are typically considered to be synonymous, +that's not quite right-as I mentioned earlier, you can use lists of +lists to represent trees. Just as the functions discussed in this +chapter allow you to treat structures built out of cons cells as lists, +other functions allow you to use cons cells to represent trees, sets, +and two kinds of key/value maps. I'll discuss some of those functions +in the next chapter. + + +File: pcl.info, Node: Chapter 13, Next: Chapter 14, Prev: Chapter 12, Up: Top + +13. Beyond Lists: Other Uses for Cons Cells +=========================================== + +As you saw in the previous chapter, the list data type is an illusion +created by a set of functions that manipulate cons cells. Common Lisp +also provides functions that let you treat data structures built out of +cons cells as trees, sets, and lookup tables. In this chapter I'll give +you a quick tour of some of these other data structures and the +functions for manipulating them. As with the list-manipulation +functions, many of these functions will be useful when you start writing +more complicated macros and need to manipulate Lisp code as data. + +* Menu: + +* 13-1:: Trees +* 13-2:: Sets +* 13-3:: Lookup Tables: Alists and Plists +* 13-4:: DESTRUCTURING-BIND + + +File: pcl.info, Node: 13-1, Next: 13-2, Prev: Chapter 13, Up: Chapter 13 + +Trees +===== + +Treating structures built from cons cells as trees is just about as +natural as treating them as lists. What is a list of lists, after all, +but another way of thinking of a tree? The difference between a +function that treats a bunch of cons cells as a list and a function that +treats the same bunch of cons cells as a tree has to do with which cons +cells the functions traverse to find the values of the list or tree. +The cons cells traversed by a list function, called the list structure, +are found by starting at the first cons cell and following CDR +references until reaching a NIL. The elements of the list are the +objects referenced by the CARs of the cons cells in the list structure. +If a cons cell in the list structure has a CAR that references another +cons cell, the referenced cons cell is considered to be the head of a +list that's an element of the outer list. (1) Tree structure, on the +other hand, is traversed by following both CAR and CDR references for as +long as they point to other cons cells. The values in a tree are thus +the atomic-non-cons-cell-values referenced by either the CARs or the +CDRs of the cons cells in the tree structure. + + For instance, the following box-and-arrow diagram shows the cons +cells that make up the list of lists: ((1 2) (3 4) (5 6)). The list +structure includes only the three cons cells inside the dashed box while +the tree structure includes all the cons cells. + + + + To see the difference between a list function and a tree function, +you can consider how the functions COPY-LIST and COPY-TREE will copy +this bunch of cons cells. COPY-LIST, as a list function, copies the +cons cells that make up the list structure. That is, it makes a new +cons cell corresponding to each of the cons cells inside the dashed box. +The CARs of each of these new cons cells reference the same object as +the CARs of the original cons cells in the list structure. Thus, +COPY-LIST doesn't copy the sublists (1 2), (3 4), or (5 6), as shown in +this diagram: + + + + COPY-TREE, on the other hand, makes a new cons cell for each of the +cons cells in the diagram and links them together in the same structure, +as shown in this diagram: + + + + Where a cons cell in the original referenced an atomic value, the +corresponding cons cell in the copy will reference the same value. +Thus, the only objects referenced in common by the original tree and the +copy produced by COPY-TREE are the numbers 1-6, and the symbol NIL. + + Another function that walks both the CARs and the CDRs of a tree of +cons cells is TREE-EQUAL, which compares two trees, considering them +equal if the tree structure is the same shape and if the leaves are EQL +(or if they satisfy the test supplied with the :test keyword argument). + + Some other tree-centric functions are the tree analogs to the +SUBSTITUTE and NSUBSTITUTE sequence functions and their -IF and -IF-NOT +variants. The function SUBST, like SUBSTITUTE, takes a new item, an old +item, and a tree (as opposed to a sequence), along with :key and :test +keyword arguments, and it returns a new tree with the same shape as the +original tree but with all instances of the old item replaced with the +new item. For example: + + CL-USER> (subst 10 1 '(1 2 (3 2 1) ((1 1) (2 2)))) + (10 2 (3 2 10) ((10 10) (2 2))) + + SUBST-IF is analogous to SUBSTITUTE-IF. Instead of an old item, it +takes a one-argument function-the function is called with each atomic +value in the tree, and whenever it returns true, the position in the new +tree is filled with the new value. SUBST-IF-NOT is the same except the +values where the test returns NIL are replaced. NSUBST, NSUBST-IF, and +NSUBST-IF-NOT are the recycling versions of the SUBST functions. As +with most other recycling functions, you should use these functions only +as drop-in replacements for their nondestructive counterparts in +situations where you know there's no danger of modifying a shared +structure. In particular, you must continue to save the return value of +these functions since you have no guarantee that the result will be EQ +to the original tree. (2) + + ---------- Footnotes ---------- + + (1) It's possible to build a chain of cons cells where the CDR of the +last cons cell isn't NIL but some other atom. This is called a dotted +list because the last cons is a dotted pair. + + (2) It may seem that the NSUBST family of functions can and in fact +does modify the tree in place. However, there's one edge case: when the +"tree" passed is, in fact, an atom, it can't be modified in place, so +the result of NSUBST will be a different object than the argument: +(nsubst 'x 'y 'y) X. + + +File: pcl.info, Node: 13-2, Next: 13-3, Prev: 13-1, Up: Chapter 13 + +Sets +==== + +Sets can also be implemented in terms of cons cells. In fact, you can +treat any list as a set-Common Lisp provides several functions for +performing set-theoretic operations on lists. However, you should bear +in mind that because of the way lists are structured, these operations +get less and less efficient the bigger the sets get. + + That said, using the built-in set functions makes it easy to write +set-manipulation code. And for small sets they may well be more +efficient than the alternatives. If profiling shows you that these +functions are a performance bottleneck in your code, you can always +replace the lists with sets built on top of hash tables or bit vectors. + + To build up a set, you can use the function ADJOIN. ADJOIN takes an +item and a list representing a set and returns a list representing the +set containing the item and all the items in the original set. To +determine whether the item is present, it must scan the list; if the +item isn't found, ADJOIN creates a new cons cell holding the item and +pointing to the original list and returns it. Otherwise, it returns the +original list. + + ADJOIN also takes :key and :test keyword arguments, which are used +when determining whether the item is present in the original list. Like +CONS, ADJOIN has no effect on the original list-if you want to modify a +particular list, you need to assign the value returned by ADJOIN to the +place where the list came from. The modify macro PUSHNEW does this for +you automatically. + + CL-USER> (defparameter *set* ()) + *SET* + CL-USER> (adjoin 1 *set*) + (1) + CL-USER> *set* + NIL + CL-USER> (setf *set* (adjoin 1 *set*)) + (1) + CL-USER> (pushnew 2 *set*) + (2 1) + CL-USER> *set* + (2 1) + CL-USER> (pushnew 2 *set*) + (2 1) + + You can test whether a given item is in a set with MEMBER and the +related functions MEMBER-IF and MEMBER-IF-NOT. These functions are +similar to the sequence functions FIND, FIND-IF, and FIND-IF-NOT except +they can be used only with lists. And instead of returning the item +when it's present, they return the cons cell containing the item-in +other words, the sublist starting with the desired item. When the +desired item isn't present in the list, all three functions return NIL. + + The remaining set-theoretic functions provide bulk operations: +INTERSECTION, UNION, SET-DIFFERENCE, and SET-EXCLUSIVE-OR. Each of these +functions takes two lists and :key and :test keyword arguments and +returns a new list representing the set resulting from performing the +appropriate set-theoretic operation on the two lists: INTERSECTION +returns a list containing all the elements found in both arguments. +UNION returns a list containing one instance of each unique element from +the two arguments. (1) SET-DIFFERENCE returns a list containing all the +elements from the first argument that don't appear in the second +argument. And SET-EXCLUSIVE-OR returns a list containing those elements +appearing in only one or the other of the two argument lists but not in +both. Each of these functions also has a recycling counterpart whose +name is the same except with an N prefix. + + Finally, the function SUBSETP takes two lists and the usual :key and +:test keyword arguments and returns true if the first list is a subset +of the second-if every element in the first list is also present in the +second list. The order of the elements in the lists doesn't matter. + + CL-USER> (subsetp '(3 2 1) '(1 2 3 4)) + T + CL-USER> (subsetp '(1 2 3 4) '(3 2 1)) + NIL + + ---------- Footnotes ---------- + + (1) UNION takes only one element from each list, but if either list +contains duplicate elements, the result may also contain duplicates. + + +File: pcl.info, Node: 13-3, Next: 13-4, Prev: 13-2, Up: Chapter 13 + +Lookup Tables: Alists and Plists +================================ + +In addition to trees and sets, you can build tables that map keys to +values out of cons cells. Two flavors of cons-based lookup tables are +commonly used, both of which I've mentioned in passing in previous +chapters. They're association lists, also called alists, and property +lists, also known as plists. While you wouldn't use either alists or +plists for large tables-for that you'd use a hash table-it's worth +knowing how to work with them both because for small tables they can be +more efficient than hash tables and because they have some useful +properties of their own. + + An alist is a data structure that maps keys to values and also +supports reverse lookups, finding the key when given a value. Alists +also support adding key/value mappings that shadow existing mappings in +such a way that the shadowing mapping can later be removed and the +original mappings exposed again. + + Under the covers, an alist is essentially a list whose elements are +themselves cons cells. Each element can be thought of as a key/value +pair with the key in the cons cell's CAR and the value in the CDR. For +instance, the following is a box-and-arrow diagram of an alist mapping +the symbol A to the number 1, B to 2, and C to 3: + + + + Unless the value in the CDR is a list, cons cells representing the +key/value pairs will be dotted pairs in s-expression notation. The +alist diagramed in the previous figure, for instance, is printed like +this: + + ((A . 1) (B . 2) (C . 3)) + + The main lookup function for alists is ASSOC, which takes a key and +an alist and returns the first cons cell whose CAR matches the key or +NIL if no match is found. + + CL-USER> (assoc 'a '((a . 1) (b . 2) (c . 3))) + (A . 1) + CL-USER> (assoc 'c '((a . 1) (b . 2) (c . 3))) + (C . 3) + CL-USER> (assoc 'd '((a . 1) (b . 2) (c . 3))) + NIL + + To get the value corresponding to a given key, you simply pass the +result of ASSOC to CDR. + + CL-USER> (cdr (assoc 'a '((a . 1) (b . 2) (c . 3)))) + 1 + + By default the key given is compared to the keys in the alist using +EQL, but you can change that with the standard combination of :key and +:test keyword arguments. For instance, if you wanted to use string +keys, you might write this: + + CL-USER> (assoc "a" '(("a" . 1) ("b" . 2) ("c" . 3)) :test #'string=) + ("a" . 1) + + Without specifying :test to be STRING=, that ASSOC would probably +return NIL because two strings with the same contents aren't necessarily +EQL. + + CL-USER> (assoc "a" '(("a" . 1) ("b" . 2) ("c" . 3))) + NIL + + Because ASSOC searches the list by scanning from the front of the +list, one key/value pair in an alist can shadow other pairs with the +same key later in the list. + + CL-USER> (assoc 'a '((a . 10) (a . 1) (b . 2) (c . 3))) + (A . 10) + + You can add a pair to the front of an alist with CONS like this: + + (cons (cons 'new-key 'new-value) alist) + + However, as a convenience, Common Lisp provides the function ACONS, +which lets you write this: + + (acons 'new-key 'new-value alist) + + Like CONS, ACONS is a function and thus can't modify the place +holding the alist it's passed. If you want to modify an alist, you need +to write either this: + + (setf alist (acons 'new-key 'new-value alist)) + + or this: + + (push (cons 'new-key 'new-value) alist) + + Obviously, the time it takes to search an alist with ASSOC is a +function of how deep in the list the matching pair is found. In the +worst case, determining that no pair matches requires ASSOC to scan +every element of the alist. However, since the basic mechanism for +alists is so lightweight, for small tables an alist can outperform a +hash table. Also, alists give you more flexibility in how you do the +lookup. I already mentioned that ASSOC takes :key and :test keyword +arguments. When those don't suit your needs, you may be able to use the +ASSOC-IF and ASSOC-IF-NOT functions, which return the first key/value +pair whose CAR satisfies (or not, in the case of ASSOC-IF-NOT) the test +function passed in the place of a specific item. And three +functions-RASSOC, RASSOC-IF, and RASSOC-IF-NOT-work just like the +corresponding ASSOC functions except they use the value in the CDR of +each element as the key, performing a reverse lookup. + + The function COPY-ALIST is similar to COPY-TREE except, instead of +copying the whole tree structure, it copies only the cons cells that +make up the list structure, plus the cons cells directly referenced from +the CARs of those cells. In other words, the original alist and the +copy will both contain the same objects as the keys and values, even if +those keys or values happen to be made up of cons cells. + + Finally, you can build an alist from two separate lists of keys and +values with the function PAIRLIS. The resulting alist may contain the +pairs either in the same order as the original lists or in reverse +order. For example, you may get this result: + + CL-USER> (pairlis '(a b c) '(1 2 3)) + ((C . 3) (B . 2) (A . 1)) + + Or you could just as well get this: + + CL-USER> (pairlis '(a b c) '(1 2 3)) + ((A . 1) (B . 2) (C . 3)) + + The other kind of lookup table is the property list, or plist, which +you used to represent the rows in the database in Chapter 3. +Structurally a plist is just a regular list with the keys and values as +alternating values. For instance, a plist mapping A, B, and C, to 1, 2, +and 3 is simply the list (A 1 B 2 C 3). In boxes-and-arrows form, it +looks like this: + + + + However, plists are less flexible than alists. In fact, plists +support only one fundamental lookup operation, the function GETF, which +takes a plist and a key and returns the associated value or NIL if the +key isn't found. GETF also takes an optional third argument, which will +be returned in place of NIL if the key isn't found. + + Unlike ASSOC, which uses EQL as its default test and allows a +different test function to be supplied with a :test argument, GETF +always uses EQ to test whether the provided key matches the keys in the +plist. Consequently, you should never use numbers or characters as keys +in a plist; as you saw in Chapter 4, the behavior of EQ for those types +is essentially undefined. Practically speaking, the keys in a plist are +almost always symbols, which makes sense since plists were first +invented to implement symbolic "properties," arbitrary mappings between +names and values. + + You can use SETF with GETF to set the value associated with a given +key. SETF also treats GETF a bit specially in that the first argument +to GETF is treated as the place to modify. Thus, you can use SETF of +GETF to add a new key/value pair to an existing plist. + + CL-USER> (defparameter *plist* ()) + *PLIST* + CL-USER> *plist* + NIL + CL-USER> (setf (getf *plist* :a) 1) + 1 + CL-USER> *plist* + (:A 1) + CL-USER> (setf (getf *plist* :a) 2) + 2 + CL-USER> *plist* + (:A 2) + + To remove a key/value pair from a plist, you use the macro REMF, +which sets the place given as its first argument to a plist containing +all the key/value pairs except the one specified. It returns true if +the given key was actually found. + + CL-USER> (remf *plist* :a) + T + CL-USER> *plist* + NIL + + Like GETF, REMF always uses EQ to compare the given key to the keys +in the plist. + + Since plists are often used in situations where you want to extract +several properties from the same plist, Common Lisp provides a function, +GET-PROPERTIES, that makes it more efficient to extract multiple values +from a single plist. It takes a plist and a list of keys to search for +and returns, as multiple values, the first key found, the corresponding +value, and the head of the list starting with the found key. This +allows you to process a property list, extracting the desired +properties, without continually rescanning from the front of the list. +For instance, the following function efficiently processes-using the +hypothetical function process-property-all the key/value pairs in a +plist for a given list of keys: + + (defun process-properties (plist keys) + (loop while plist do + (multiple-value-bind (key value tail) (get-properties plist keys) + (when key (process-property key value)) + (setf plist (cddr tail))))) + + The last special thing about plists is the relationship they have +with symbols: every symbol object has an associated plist that can be +used to store information about the symbol. The plist can be obtained +via the function SYMBOL-PLIST. However, you rarely care about the whole +plist; more often you'll use the functions GET, which takes a symbol and +a key and is shorthand for a GETF of the same key in the symbols +SYMBOL-PLIST. + + (get 'symbol 'key) === (getf (symbol-plist 'symbol) 'key) + Like GETF, GET is SETFable, so you can attach arbitrary information +to a symbol like this: + + (setf (get 'some-symbol 'my-key) "information") + + To remove a property from a symbol's plist, you can use either REMF +of SYMBOL-PLIST or the convenience function REMPROP. (1) + + (remprop 'symbol 'key) === (remf (symbol-plist 'symbol key)) + + Being able to attach arbitrary information to names is quite handy +when doing any kind of symbolic programming. For instance, one of the +macros you'll write in Chapter 24 will attach information to names that +other instances of the same macros will extract and use when generating +their expansions. + + ---------- Footnotes ---------- + + (1) It's also possible to directly SETF SYMBOL-PLIST. However, that's +a bad idea, as different code may have added different properties to the +symbol's plist for different reasons. If one piece of code clobbers the +symbol's whole plist, it may break other code that added its own +properties to the plist. + + +File: pcl.info, Node: 13-4, Next: Chapter 14, Prev: 13-3, Up: Chapter 13 + +DESTRUCTURING-BIND +================== + +One last tool for slicing and dicing lists that I need to cover since +you'll need it in later chapters is the DESTRUCTURING-BIND macro. This +macro provides a way to destructure arbitrary lists, similar to the way +macro parameter lists can take apart their argument list. The basic +skeleton of a DESTRUCTURING-BIND is as follows: + + (destructuring-bind (parameter*) list + body-form*) + + The parameter list can include any of the types of parameters +supported in macro parameter lists such as &optional, &rest, and &key +parameters. (1) And, as in macro parameter lists, any parameter can be +replaced with a nested destructuring parameter list, which takes apart +the list that would otherwise have been bound to the replaced parameter. +The list form is evaluated once and should return a list, which is then +destructured and the appropriate values are bound to the variables in +the parameter list. Then the body-forms are evaluated in order with +those bindings in effect. Some simple examples follow: + + (destructuring-bind (x y z) (list 1 2 3) + (list :x x :y y :z z)) ==> (:X 1 :Y 2 :Z 3) + + (destructuring-bind (x y z) (list 1 (list 2 20) 3) + (list :x x :y y :z z)) ==> (:X 1 :Y (2 20) :Z 3) + + (destructuring-bind (x (y1 y2) z) (list 1 (list 2 20) 3) + (list :x x :y1 y1 :y2 y2 :z z)) ==> (:X 1 :Y1 2 :Y2 20 :Z 3) + + (destructuring-bind (x (y1 &optional y2) z) (list 1 (list 2 20) 3) + (list :x x :y1 y1 :y2 y2 :z z)) ==> (:X 1 :Y1 2 :Y2 20 :Z 3) + + (destructuring-bind (x (y1 &optional y2) z) (list 1 (list 2) 3) + (list :x x :y1 y1 :y2 y2 :z z)) ==> (:X 1 :Y1 2 :Y2 NIL :Z 3) + + (destructuring-bind (&key x y z) (list :x 1 :y 2 :z 3) + (list :x x :y y :z z)) ==> (:X 1 :Y 2 :Z 3) + + (destructuring-bind (&key x y z) (list :z 1 :y 2 :x 3) + (list :x x :y y :z z)) ==> (:X 3 :Y 2 :Z 1) + + One kind of parameter you can use with DESTRUCTURING-BIND and also in +macro parameter lists, though I didn't mention it in Chapter 8, is a +&whole parameter. If specified, it must be the first parameter in a +parameter list, and it's bound to the whole list form. (2) After a +&whole parameter, other parameters can appear as usual and will extract +specific parts of the list just as they would if the &whole parameter +weren't there. An example of using &whole with DESTRUCTURING-BIND looks +like this: + + (destructuring-bind (&whole whole &key x y z) (list :z 1 :y 2 :x 3) + (list :x x :y y :z z :whole whole)) + ==> (:X 3 :Y 2 :Z 1 :WHOLE (:Z 1 :Y 2 :X 3)) + + You'll use a &whole parameter in one of the macros that's part of the +HTML generation library you'll develop in Chapter 31. However, I have a +few more topics to cover before you can get to that. After two chapters +on the rather Lispy topic of cons cells, you can now turn to the more +prosaic matter of how to deal with files and filenames. + + ---------- Footnotes ---------- + + (1) Macro parameter lists do support one parameter type, &environment +parameters, which DESTRUCTURING-BIND doesn't. However, I didn't discuss +that parameter type in Chapter 8, and you don't need to worry about it +now either. + + (2) When a &whole parameter is used in a macro parameter list, the +form it's bound to is the whole macro form, including the name of the +macro. + + +File: pcl.info, Node: Chapter 14, Next: Chapter 15, Prev: Chapter 13, Up: Top + +14. Files and File I/O +====================== + +Common Lisp provides a rich library of functionality for dealing with +files. In this chapter I'll focus on a few basic file-related tasks: +reading and writing files and listing files in the file system. For +these basic tasks, Common Lisp's I/O facilities are similar to those in +other languages. Common Lisp provides a stream abstraction for reading +and writing data and an abstraction, called pathnames, for manipulating +filenames in an operating system-independent way. Additionally, Common +Lisp provides other bits of functionality unique to Lisp such as the +ability to read and write s-expressions. + +* Menu: + +* 14-1:: Reading File Data +* 14-2:: Reading Binary Data +* 14-3:: Bulk Reads +* 14-4:: File Output +* 14-5:: Closing Files +* 14-6:: Filenames +* 14-7:: How Pathnames Represent Filenames +* 14-8:: Constructing New Pathnames +* 14-9:: Two Representations of Directory Names +* 14-10:: Interacting with the File System +* 14-11:: Other Kinds of I/O + + +File: pcl.info, Node: 14-1, Next: 14-2, Prev: Chapter 14, Up: Chapter 14 + +Reading File Data +================= + +The most basic file I/O task is to read the contents of a file. You +obtain a stream from which you can read a file's contents with the OPEN +function. By default OPEN returns a character-based input stream you +can pass to a variety of functions that read one or more characters of +text: READ-CHAR reads a single character; READ-LINE reads a line of +text, returning it as a string with the end-of-line character(s) +removed; and READ reads a single s-expression, returning a Lisp object. +When you're done with the stream, you can close it with the CLOSE +function. + + The only required argument to OPEN is the name of the file to read. +As you'll see in the section "Filenames," Common Lisp provides a couple +of ways to represent a filename, but the simplest is to use a string +containing the name in the local file-naming syntax. So assuming that +/some/file/name.txt is a file, you can open it like this: + + (open "/some/file/name.txt") + + You can use the object returned as the first argument to any of the +read functions. For instance, to print the first line of the file, you +can combine OPEN, READ-LINE, and CLOSE as follows: + + (let ((in (open "/some/file/name.txt"))) + (format t "~a~%" (read-line in)) + (close in)) + + Of course, a number of things can go wrong while trying to open and +read from a file. The file may not exist. Or you may unexpectedly hit +the end of the file while reading. By default OPEN and the READ-* +functions will signal an error in these situations. In Chapter 19, I'll +discuss how to recover from such errors. For now, however, there's a +lighter-weight solution: each of these functions accepts arguments that +modify its behavior in these exceptional situations. + + If you want to open a possibly nonexistent file without OPEN +signaling an error, you can use the keyword argument :if-does-not-exist +to specify a different behavior. The three possible values are :error, +the default; :create, which tells it to go ahead and create the file and +then proceed as if it had already existed; and NIL, which tells it to +return NIL instead of a stream. Thus, you can change the previous +example to deal with the possibility that the file may not exist. + + (let ((in (open "/some/file/name.txt" :if-does-not-exist nil))) + (when in + (format t "~a~%" (read-line in)) + (close in))) + + The reading functions-READ-CHAR, READ-LINE, and READ-all take an +optional argument, which defaults to true, that specifies whether they +should signal an error if they're called at the end of the file. If +that argument is NIL, they instead return the value of their third +argument, which defaults to NIL. Thus, you could print all the lines in +a file like this: + + (let ((in (open "/some/file/name.txt" :if-does-not-exist nil))) + (when in + (loop for line = (read-line in nil) + while line do (format t "~a~%" line)) + (close in))) + + Of the three text-reading functions, READ is unique to Lisp. This is +the same function that provides the R in the REPL and that's used to +read Lisp source code. Each time it's called, it reads a single +s-expression, skipping whitespace and comments, and returns the Lisp +object denoted by the s-expression. For instance, suppose +/some/file/name.txt has the following contents: + + (1 2 3) + 456 + "a string" ; this is a comment + ((a b) + (c d)) + + In other words, it contains four s-expressions: a list of numbers, a +number, a string, and a list of lists. You can read those expressions +like this: + + CL-USER> (defparameter *s* (open "/some/file/name.txt")) + *S* + CL-USER> (read *s*) + (1 2 3) + CL-USER> (read *s*) + 456 + CL-USER> (read *s*) + "a string" + CL-USER> (read *s*) + ((A B) (C D)) + CL-USER> (close *s*) + T + + As you saw in Chapter 3, you can use PRINT to print Lisp objects in +"readable" form. Thus, whenever you need to store a bit of data in a +file, PRINT and READ provide an easy way to do it without having to +design a data format or write a parser. They even-as the previous +example demonstrated-give you comments for free. And because +s-expressions were designed to be human editable, it's also a fine +format for things like configuration files. (1) + + ---------- Footnotes ---------- + + (1) Note, however, that while the Lisp reader knows how to skip +comments, it completely skips them. Thus, if you use READ to read in a +configuration file containing comments and then use PRINT to save +changes to the data, you'll lose the comments. + + +File: pcl.info, Node: 14-2, Next: 14-3, Prev: 14-1, Up: Chapter 14 + +Reading Binary Data +=================== + +By default OPEN returns character streams, which translate the +underlying bytes to characters according to a particular +character-encoding scheme. (1) To read the raw bytes, you need to pass +OPEN an :element-type argument of '(unsigned-byte 8). (2) You can pass +the resulting stream to the function READ-BYTE, which will return an +integer between 0 and 255 each time it's called. READ-BYTE, like the +character-reading functions, also accepts optional arguments to specify +whether it should signal an error if called at the end of the file and +what value to return if not. In Chapter 24 you'll build a library that +allows you to conveniently read structured binary data using READ-BYTE. +(3) + + ---------- Footnotes ---------- + + (1) By default OPEN uses the default character encoding for the +operating system, but it also accepts a keyword parameter, +:external-format, that can pass implementation-defined values that +specify a different encoding. Character streams also translate the +platform-specific end-of-line sequence to the single character +#\Newline. + + (2) The type (unsigned-byte 8) indicates an 8-bit byte; Common Lisp +"byte" types aren't a fixed size since Lisp has run at various times on +architectures with byte sizes from 6 to 9 bits, to say nothing of the +PDP-10, which had individually addressable variable-length bit fields of +1 to 36 bits. + + (3) In general, a stream is either a character stream or a binary +stream, so you can't mix calls to READ-BYTE and READ-CHAR or other +character-based read functions. However, some implementations, such as +Allegro, support so-called bivalent streams, which support both +character and binary I/O. + + +File: pcl.info, Node: 14-3, Next: 14-4, Prev: 14-2, Up: Chapter 14 + +Bulk Reads +========== + +One last reading function, READ-SEQUENCE, works with both character and +binary streams. You pass it a sequence (typically a vector) and a +stream, and it attempts to fill the sequence with data from the stream. +It returns the index of the first element of the sequence that wasn't +filled or the length of the sequence if it was able to completely fill +it. You can also pass :start and :end keyword arguments to specify a +subsequence that should be filled instead. The sequence argument must +be a type that can hold elements of the stream's element type. Since +most operating systems support some form of block I/O, READ-SEQUENCE is +likely to be quite a bit more efficient than filling a sequence by +repeatedly calling READ-BYTE or READ-CHAR. + + +File: pcl.info, Node: 14-4, Next: 14-5, Prev: 14-3, Up: Chapter 14 + +File Output +=========== + +To write data to a file, you need an output stream, which you obtain by +calling OPEN with a :direction keyword argument of :output. When +opening a file for output, OPEN assumes the file shouldn't already exist +and will signal an error if it does. However, you can change that +behavior with the :if-exists keyword argument. Passing the value +:supersede tells OPEN to replace the existing file. Passing :append +causes OPEN to open the existing file such that new data will be written +at the end of the file, while :overwrite returns a stream that will +overwrite existing data starting from the beginning of the file. And +passing NIL will cause OPEN to return NIL instead of a stream if the +file already exists. A typical use of OPEN for output looks like this: + + (open "/some/file/name.txt" :direction :output :if-exists :supersede) + + Common Lisp also provides several functions for writing data: +WRITE-CHAR writes a single character to the stream. WRITE-LINE writes a +string followed by a newline, which will be output as the appropriate +end-of-line character or characters for the platform. Another function, +WRITE-STRING, writes a string without adding any end-of-line characters. +Two different functions can print just a newline: TERPRI-short for +"terminate print"-unconditionally prints a newline character, and +FRESH-LINE prints a newline character unless the stream is at the +beginning of a line. FRESH-LINE is handy when you want to avoid +spurious blank lines in textual output generated by different functions +called in sequence. For example, suppose you have one function that +generates output that should always be followed by a line break and +another that should start on a new line. But assume that if the +functions are called one after the other, you don't want a blank line +between the two bits of output. If you use FRESH-LINE at the beginning +of the second function, its output will always start on a new line, but +if it's called right after the first, it won't emit an extra line break. + + Several functions output Lisp data as s-expressions: PRINT prints an +s-expression preceded by an end-of-line and followed by a space. PRIN1 +prints just the s-expression. And the function PPRINT prints +s-expressions like PRINT and PRIN1 but using the "pretty printer," which +tries to print its output in an aesthetically pleasing way. + + However, not all objects can be printed in a form that READ will +understand. The variable *PRINT-READABLY* controls what happens if you +try to print such an object with PRINT, PRIN1, or PPRINT. When it's NIL, +these functions will print the object in a special syntax that's +guaranteed to cause READ to signal an error if it tries to read it; +otherwise they will signal an error rather than print the object. + + Another function, PRINC, also prints Lisp objects, but in a way +designed for human consumption. For instance, PRINC prints strings +without quotation marks. You can generate more elaborate text output +with the incredibly flexible if somewhat arcane FORMAT function. I'll +discuss some of the more important details of FORMAT, which essentially +defines a mini-language for emitting formatted output, in Chapter 18. + + To write binary data to a file, you have to OPEN the file with the +same :element-type argument as you did to read it: '(unsigned-byte 8). +You can then write individual bytes to the stream with WRITE-BYTE. + + The bulk output function WRITE-SEQUENCE accepts both binary and +character streams as long as all the elements of the sequence are of an +appropriate type for the stream, either characters or bytes. As with +READ-SEQUENCE, this function is likely to be quite a bit more efficient +than writing the elements of the sequence one at a time. + + +File: pcl.info, Node: 14-5, Next: 14-6, Prev: 14-4, Up: Chapter 14 + +Closing Files +============= + +As anyone who has written code that deals with lots of files knows, it's +important to close files when you're done with them, because file +handles tend to be a scarce resource. If you open files and don't close +them, you'll soon discover you can't open any more files. (1) It might +seem straightforward enough to just be sure every OPEN has a matching +CLOSE. For instance, you could always structure your file using code +like this: + + (let ((stream (open "/some/file/name.txt"))) + ;; do stuff with stream + (close stream)) + + However, this approach suffers from two problems. One is simply that +it's error prone-if you forget the CLOSE, the code will leak a file +handle every time it runs. The other-and more significant-problem is +that there's no guarantee you'll get to the CLOSE. For instance, if the +code prior to the CLOSE contains a RETURN or RETURN-FROM, you could +leave the LET without closing the stream. Or, as you'll see in Chapter +19, if any of the code before the CLOSE signals an error, control may +jump out of the LET to an error handler and never come back to close the +stream. + + Common Lisp provides a general solution to the problem of how to +ensure that certain code always runs: the special operator +UNWIND-PROTECT, which I'll discuss in Chapter 20. However, because the +pattern of opening a file, doing something with the resulting stream, +and then closing the stream is so common, Common Lisp provides a macro, +WITH-OPEN-FILE, built on top of UNWIND-PROTECT, to encapsulate this +pattern. This is the basic form: + + (with-open-file (stream-var open-argument*) + body-form*) + + The forms in body-forms are evaluated with stream-var bound to a file +stream opened by a call to OPEN with open-arguments as its arguments. +WITH-OPEN-FILE then ensures the stream in stream-var is closed before +the WITH-OPEN-FILE form returns. Thus, you can write this to read a +line from a file: + + (with-open-file (stream "/some/file/name.txt") + (format t "~a~%" (read-line stream))) + + To create a new file, you can write something like this: + + (with-open-file (stream "/some/file/name.txt" :direction :output) + (format stream "Some text.")) + + You'll probably use WITH-OPEN-FILE for 90-99 percent of the file I/O +you do-the only time you need to use raw OPEN and CLOSE calls is if you +need to open a file in a function and keep the stream around after the +function returns. In that case, you must take care to eventually close +the stream yourself, or you'll leak file descriptors and may eventually +end up unable to open any more files. + + ---------- Footnotes ---------- + + (1) Some folks expect this wouldn't be a problem in a +garbage-collected language such as Lisp. It is the case in most Lisp +implementations that a stream that becomes garbage will automatically be +closed. However, this isn't something to rely on-the problem is that +garbage collectors usually run only when memory is low; they don't know +about other scarce resources such as file handles. If there's plenty of +memory available, it's easy to run out of file handles long before the +garbage collector runs. + + +File: pcl.info, Node: 14-6, Next: 14-7, Prev: 14-5, Up: Chapter 14 + +Filenames +========= + +So far you've used strings to represent filenames. However, using +strings as filenames ties your code to a particular operating system and +file system. Likewise, if you programmatically construct names +according to the rules of a particular naming scheme (separating +directories with /, say), you also tie your code to a particular file +system. + + To avoid this kind of nonportability, Common Lisp provides another +representation of filenames: pathname objects. Pathnames represent +filenames in a structured way that makes them easy to manipulate without +tying them to a particular filename syntax. And the burden of +translating back and forth between strings in the local syntax-called +namestrings-and pathnames is placed on the Lisp implementation. + + Unfortunately, as with many abstractions designed to hide the details +of fundamentally different underlying systems, the pathname abstraction +introduces its own complications. When pathnames were designed, the set +of file systems in general use was quite a bit more variegated than +those in common use today. Consequently, some nooks and crannies of the +pathname abstraction make little sense if all you're concerned about is +representing Unix or Windows filenames. However, once you understand +which parts of the pathname abstraction you can ignore as artifacts of +pathnames' evolutionary history, they do provide a convenient way to +manipulate filenames. (1) + + Most places a filename is called for, you can use either a namestring +or a pathname. Which to use depends mostly on where the name +originated. Filenames provided by the user-for example, as arguments or +as values in configuration files-will typically be namestrings, since +the user knows what operating system they're running on and shouldn't be +expected to care about the details of how Lisp represents filenames. +But programmatically generated filenames will be pathnames because you +can create them portably. A stream returned by OPEN also represents a +filename, namely, the filename that was originally used to open the +stream. Together these three types are collectively referred to as +pathname designators. All the built-in functions that expect a filename +argument accept all three types of pathname designator. For instance, +all the places in the previous section where you used a string to +represent a filename, you could also have passed a pathname object or a +stream. + +* Menu: + +* 14-6-1:: How We Got Here + + ---------- Footnotes ---------- + + (1) Another reason the pathname system is considered somewhat baroque +is because of the inclusion of logical pathnames. However, you can use +the rest of the pathname system perfectly well without knowing anything +more about logical pathnames than that you can safely ignore them. +Briefly, logical pathnames allow Common Lisp programs to contain +references to pathnames without naming specific files. Logical +pathnames could then be mapped to specific locations in an actual file +system when the program was installed by defining a "logical pathname +translation" that translates logical pathnames matching certain +wildcards to pathnames representing files in the file system, so-called +physical pathnames. They have their uses in certain situations, but you +can get pretty far without worrying about them. + + +File: pcl.info, Node: 14-6-1, Next: 14-6, Prev: 14-6, Up: 14-6 + +How We Got Here +=============== + +The historical diversity of file systems in existence during the 70s and +80s can be easy to forget. Kent Pitman, one of the principal technical +editors of the Common Lisp standard, described the situation once in +comp.lang.lisp (Message-ID: sfwzo74np6w.fsf@world.std.com) thusly: + + The dominant file systems at the time the design [of Common Lisp] + was done were TOPS-10, TENEX, TOPS-20, VAX VMS, AT&T Unix, MIT + Multics, MIT ITS, not to mention a bunch of mainframe [OSs]. Some + were uppercase only, some mixed, some were case-sensitive but case- + translating (like CL). Some had dirs as files, some not. Some had + quote chars for funny file chars, some not. Some had wildcards, + some didn't. Some had :up in relative pathnames, some didn't. + Some had namable root dirs, some didn't. There were file systems + with no directories, file systems with non-hierarchical + directories, file systems with no file types, file systems with no + versions, file systems with no devices, and so on. + + If you look at the pathname abstraction from the point of view of any +single file system, it seems baroque. However, if you take even two +such similar file systems as Windows and Unix, you can already begin to +see differences the pathname system can help abstract away-Windows +filenames contain a drive letter, for instance, while Unix filenames +don't. The other advantage of having the pathname abstraction designed +to handle the wide variety of file systems that existed in the past is +that it's more likely to be able to handle file systems that may exist +in the future. If, say, versioning file systems come back into vogue, +Common Lisp will be ready. + + +File: pcl.info, Node: 14-7, Next: 14-8, Prev: 14-6, Up: Chapter 14 + +How Pathnames Represent Filenames +================================= + +A pathname is a structured object that represents a filename using six +components: host, device, directory, name, type, and version. Most of +these components take on atomic values, usually strings; only the +directory component is further structured, containing a list of +directory names (as strings) prefaced with the keyword :absolute or +:relative. However, not all pathname components are needed on all +platforms-this is one of the reasons pathnames strike many new Lispers +as gratuitously complex. On the other hand, you don't really need to +worry about which components may or may not be used to represent names +on a particular file system unless you need to create a new pathname +object from scratch, which you'll almost never need to do. Instead, +you'll usually get hold of pathname objects either by letting the +implementation parse a file system-specific namestring into a pathname +object or by creating a new pathname that takes most of its components +from an existing pathname. + + For instance, to translate a namestring to a pathname, you use the +PATHNAME function. It takes a pathname designator and returns an +equivalent pathname object. When the designator is already a pathname, +it's simply returned. When it's a stream, the original filename is +extracted and returned. When the designator is a namestring, however, +it's parsed according to the local filename syntax. The language +standard, as a platform-neutral document, doesn't specify any particular +mapping from namestring to pathname, but most implementations follow the +same conventions on a given operating system. + + On Unix file systems, only the directory, name, and type components +are typically used. On Windows, one more component-usually the device +or host-holds the drive letter. On these platforms, a namestring is +parsed by first splitting it into elements on the path separator-a slash +on Unix and a slash or backslash on Windows. The drive letter on +Windows will be placed into either the device or the host component. +All but the last of the other name elements are placed in a list +starting with :absolute or :relative depending on whether the name +(ignoring the drive letter, if any) began with a path separator. This +list becomes the directory component of the pathname. The last element +is then split on the rightmost dot, if any, and the two parts put into +the name and type components of the pathname. (1) + + You can examine these individual components of a pathname with the +functions PATHNAME-DIRECTORY, PATHNAME-NAME, and PATHNAME-TYPE. + + (pathname-directory (pathname "/foo/bar/baz.txt")) ==> (:ABSOLUTE "foo" "bar") + (pathname-name (pathname "/foo/bar/baz.txt")) ==> "baz" + (pathname-type (pathname "/foo/bar/baz.txt")) ==> "txt" + + Three other functions-PATHNAME-HOST, PATHNAME-DEVICE, and +PATHNAME-VERSION-allow you to get at the other three pathname +components, though they're unlikely to have interesting values on Unix. +On Windows either PATHNAME-HOST or PATHNAME-DEVICE will return the drive +letter. + + Like many other built-in objects, pathnames have their own read +syntax, #p followed by a double-quoted string. This allows you to print +and read back s-expressions containing pathname objects, but because the +syntax depends on the namestring parsing algorithm, such data isn't +necessarily portable between operating systems. + + (pathname "/foo/bar/baz.txt") ==> #p"/foo/bar/baz.txt" + + To translate a pathname back to a namestring-for instance, to present +to the user-you can use the function NAMESTRING, which takes a pathname +designator and returns a namestring. Two other functions, +DIRECTORY-NAMESTRING and FILE-NAMESTRING, return a partial namestring. +DIRECTORY-NAMESTRING combines the elements of the directory component +into a local directory name, and FILE-NAMESTRING combines the name and +type components. (2) + + (namestring #p"/foo/bar/baz.txt") ==> "/foo/bar/baz.txt" + (directory-namestring #p"/foo/bar/baz.txt") ==> "/foo/bar/" + (file-namestring #p"/foo/bar/baz.txt") ==> "baz.txt" + + ---------- Footnotes ---------- + + (1) Many Unix-based implementations treat filenames whose last +element starts with a dot and don't contain any other dots specially, +putting the whole element, with the dot, in the name component and +leaving the type component NIL. + + (pathname-name (pathname "/foo/.emacs")) ==> ".emacs" + (pathname-type (pathname "/foo/.emacs")) ==> NIL + + However, not all implementations follow this convention; some will +create a pathname with "" as the name and emacs as the type. + + (2) The name returned by FILE-NAMESTRING also includes the version +component on file systems that use it. + + +File: pcl.info, Node: 14-8, Next: 14-9, Prev: 14-7, Up: Chapter 14 + +Constructing New Pathnames +========================== + +You can construct arbitrary pathnames using the MAKE-PATHNAME function. +It takes one keyword argument for each pathname component and returns a +pathname with any supplied components filled in and the rest NIL. (1) + + (make-pathname + :directory '(:absolute "foo" "bar") + :name "baz" + :type "txt") ==> #p"/foo/bar/baz.txt" + + However, if you want your programs to be portable, you probably don't +want to make pathnames completely from scratch: even though the pathname +abstraction protects you from unportable filename syntax, filenames can +be unportable in other ways. For instance, the filename +/home/peter/foo.txt is no good on an OS X box where /home/ is called +/Users/. + + Another reason not to make pathnames completely from scratch is that +different implementations use the pathname components slightly +differently. For instance, as mentioned previously, some Windows-based +Lisp implementations store the drive letter in the device component +while others store it in the host component. If you write code like +this: + + (make-pathname :device "c" :directory '(:absolute "foo" "bar") :name "baz") + + it will be correct on some implementations but not on others. + + Rather than making names from scratch, you can build a new pathname +based on an existing pathname with MAKE-PATHNAME's keyword parameter +:defaults. With this parameter you can provide a pathname designator, +which will supply the values for any components not specified by other +arguments. For example, the following expression creates a pathname +with an .html extension and all other components the same as the +pathname in the variable input-file: + + (make-pathname :type "html" :defaults input-file) + + Assuming the value in input-file was a user-provided name, this code +will be robust in the face of operating system and implementation +differences such as whether filenames have drive letters in them and +where they're stored in a pathname if they do. (2) + + You can use the same technique to create a pathname with a different +directory component. + + (make-pathname :directory '(:relative "backups") :defaults input-file) + + However, this will create a pathname whose whole directory component +is the relative directory backups/, regardless of any directory +component input-file may have had. For example: + + (make-pathname :directory '(:relative "backups") + :defaults #p"/foo/bar/baz.txt") ==> #p"backups/baz.txt" + + Sometimes, though, you want to combine two pathnames, at least one of +which has a relative directory component, by combining their directory +components. For instance, suppose you have a relative pathname such as +#p"foo/bar.html" that you want to combine with an absolute pathname such +as #p"/www/html/" to get #p"/www/html/foo/bar.html". In that case, +MAKE-PATHNAME won't do; instead, you want MERGE-PATHNAMES. + + MERGE-PATHNAMES takes two pathnames and merges them, filling in any +NIL components in the first pathname with the corresponding value from +the second pathname, much like MAKE-PATHNAME fills in any unspecified +components with components from the :defaults argument. However, +MERGE-PATHNAMES treats the directory component specially: if the first +pathname's directory is relative, the directory component of the +resulting pathname will be the first pathname's directory relative to +the second pathname's directory. Thus: + + (merge-pathnames #p"foo/bar.html" #p"/www/html/") ==> #p"/www/html/foo/bar.html" + + The second pathname can also be relative, in which case the resulting +pathname will also be relative. + + (merge-pathnames #p"foo/bar.html" #p"html/") ==> #p"html/foo/bar.html" + + To reverse this process and obtain a filename relative to a +particular root directory, you can use the handy function +ENOUGH-NAMESTRING. + + (enough-namestring #p"/www/html/foo/bar.html" #p"/www/") ==> "html/foo/bar.html" + + You can then combine ENOUGH-NAMESTRING with MERGE-PATHNAMES to create +a pathname representing the same name but in a different root. + + (merge-pathnames + (enough-namestring #p"/www/html/foo/bar/baz.html" #p"/www/") + #p"/www-backups/") ==> #p"/www-backups/html/foo/bar/baz.html" + + MERGE-PATHNAMES is also used internally by the standard functions +that actually access files in the file system to fill in incomplete +pathnames. For instance, suppose you make a pathname with just a name +and a type. + + (make-pathname :name "foo" :type "txt") ==> #p"foo.txt" + + If you try to use this pathname as an argument to OPEN, the missing +components, such as the directory, must be filled in before Lisp will be +able to translate the pathname to an actual filename. Common Lisp will +obtain values for the missing components by merging the given pathname +with the value of the variable *DEFAULT-PATHNAME-DEFAULTS*. The initial +value of this variable is determined by the implementation but is +usually a pathname with a directory component representing the directory +where Lisp was started and appropriate values for the host and device +components, if needed. If invoked with just one argument, +MERGE-PATHNAMES will merge the argument with the value of +*DEFAULT-PATHNAME-DEFAULTS*. For instance, if +*DEFAULT-PATHNAME-DEFAULTS* is #p"/home/peter/", then you'd get the +following: + + (merge-pathnames #p"foo.txt") ==> #p"/home/peter/foo.txt" + + ---------- Footnotes ---------- + + (1) The host component may not default to NIL, but if not, it will be +an opaque implementation-defined value. + + (2) For absolutely maximum portability, you should really write this: + + (make-pathname :type "html" :version :newest :defaults input-file) + + Without the :version argument, on a file system with built-in +versioning, the output pathname would inherit its version number from +the input file which isn't likely to be right-if the input file has been +saved many times it will have a much higher version number than the +generated HTML file. On implementations without file versioning, the +:version argument should be ignored. It's up to you if you care that +much about portability. + + +File: pcl.info, Node: 14-9, Next: 14-10, Prev: 14-8, Up: Chapter 14 + +Two Representations of Directory Names +====================================== + +When dealing with pathnames that name directories, you need to be aware +of one wrinkle. Pathnames separate the directory and name components, +but Unix and Windows consider directories just another kind of file. +Thus, on those systems, every directory has two different pathname +representations. + + One representation, which I'll call file form, treats a directory +like any other file and puts the last element of the namestring into the +name and type components. The other representation, directory form, +places all the elements of the name in the directory component, leaving +the name and type components NIL. If /foo/bar/ is a directory, then both +of the following pathnames name it. + + (make-pathname :directory '(:absolute "foo") :name "bar") ; file form + (make-pathname :directory '(:absolute "foo" "bar")) ; directory form + + When you create pathnames with MAKE-PATHNAME, you can control which +form you get, but you need to be careful when dealing with namestrings. +All current implementations create file form pathnames unless the +namestring ends with a path separator. But you can't rely on +user-supplied namestrings necessarily being in one form or another. For +instance, suppose you've prompted the user for a directory to save a +file in and they entered "/home/peter". If you pass that value as the +:defaults argument of MAKE-PATHNAME like this: + + (make-pathname :name "foo" :type "txt" :defaults user-supplied-name) + + you'll end up saving the file in /home/foo.txt rather than the +intended /home/peter/foo.txt because the "peter" in the namestring will +be placed in the name component when user-supplied-name is converted to +a pathname. In the pathname portability library I'll discuss in the +next chapter, you'll write a function called pathname-as-directory that +converts a pathname to directory form. With that function you can +reliably save the file in the directory indicated by the user. + + (make-pathname + :name "foo" :type "txt" :defaults (pathname-as-directory user-supplied-name)) + + +File: pcl.info, Node: 14-10, Next: 14-11, Prev: 14-9, Up: Chapter 14 + +Interacting with the File System +================================ + +While the most common interaction with the file system is probably +OPENing files for reading and writing, you'll also occasionally want to +test whether a file exists, list the contents of a directory, delete and +rename files, create directories, and get information about a file such +as who owns it, when it was last modified, and its length. This is +where the generality of the pathname abstraction begins to cause a bit +of pain: because the language standard doesn't specify how functions +that interact with the file system map to any specific file system, +implementers are left with a fair bit of leeway. + + That said, most of the functions that interact with the file system +are still pretty straightforward. I'll discuss the standard functions +here and point out the ones that suffer from nonportability between +implementations. In the next chapter you'll develop a pathname +portability library to smooth over some of those nonportability issues. + + To test whether a file exists in the file system corresponding to a +pathname designator-a pathname, namestring, or file stream-you can use +the function PROBE-FILE. If the file named by the pathname designator +exists, PROBE-FILE returns the file's truename, a pathname with any file +system-level translations such as resolving symbolic links performed. +Otherwise, it returns NIL. However, not all implementations support +using this function to test whether a directory exists. Also, Common +Lisp doesn't provide a portable way to test whether a given file that +exists is a regular file or a directory. In the next chapter you'll +wrap PROBE-FILE with a new function, file-exists-p, that can both test +whether a directory exists and tell you whether a given name is the name +of a file or directory. + + Similarly, the standard function for listing files in the file +system, DIRECTORY, works fine for simple cases, but the differences +between implementations make it tricky to use portably. In the next +chapter you'll define a list-directory function that smoothes over some +of these differences. + + DELETE-FILE and RENAME-FILE do what their names suggest. DELETE-FILE +takes a pathname designator and deletes the named file, returning true +if it succeeds. Otherwise it signals a FILE-ERROR. (1) + + RENAME-FILE takes two pathname designators and renames the file named +by the first name to the second name. + + You can create directories with the function +ENSURE-DIRECTORIES-EXIST. It takes a pathname designator and ensures +that all the elements of the directory component exist and are +directories, creating them as necessary. It returns the pathname it was +passed, which makes it convenient to use inline. + + (with-open-file (out (ensure-directories-exist name) :direction :output) + ... + ) + + Note that if you pass ENSURE-DIRECTORIES-EXIST a directory name, it +should be in directory form, or the leaf directory won't be created. + + The functions FILE-WRITE-DATE and FILE-AUTHOR both take a pathname +designator. FILE-WRITE-DATE returns the time in number of seconds since +midnight January 1, 1900, Greenwich mean time (GMT), that the file was +last written, and FILE-AUTHOR returns, on Unix and Windows, the file +owner. (2) + + To find the length of a file, you can use the function FILE-LENGTH. +For historical reasons FILE-LENGTH takes a stream as an argument rather +than a pathname. In theory this allows FILE-LENGTH to return the length +in terms of the element type of the stream. However, since on most +present-day operating systems, the only information available about the +length of a file, short of actually reading the whole file to measure +it, is its length in bytes, that's what most implementations return, +even when FILE-LENGTH is passed a character stream. However, the +standard doesn't require this behavior, so for predictable results, the +best way to get the length of a file is to use a binary stream. (3) + + (with-open-file (in filename :element-type '(unsigned-byte 8)) + (file-length in)) + + A related function that also takes an open file stream as its +argument is FILE-POSITION. When called with just a stream, this function +returns the current position in the file-the number of elements that +have been read from or written to the stream. When called with two +arguments, the stream and a position designator, it sets the position of +the stream to the designated position. The position designator must be +the keyword :start, the keyword :end, or a non-negative integer. The +two keywords set the position of the stream to the start or end of the +file while an integer moves to the indicated position in the file. With +a binary stream the position is simply a byte offset into the file. +However, for character streams things are a bit more complicated because +of character-encoding issues. Your best bet, if you need to jump around +within a file of textual data, is to only ever pass, as a second +argument to the two-argument version of FILE-POSITION, a value +previously returned by the one-argument version of FILE-POSITION with +the same stream argument. + + ---------- Footnotes ---------- + + (1) See Chapter 19 for more on handling errors. + + (2) For applications that need access to other file attributes on a +particular operating system or file system, libraries provide bindings +to underlying C system calls. The Osicat library at +http://common-lisp.net/project/osicat/ provides a simple API built using +the Universal Foreign Function Interface (UFFI), which should run on +most Common Lisps that run on a POSIX operating system. + + (3) The number of bytes and characters in a file can differ even if +you're not using a multibyte character encoding. Because character +streams also translate platform-specific line endings to a single +#\Newline character, on Windows (which uses CRLF as its line ending) the +number of characters will typically be smaller than the number of bytes. +If you really have to know the number of characters in a file, you have +to bite the bullet and write something like this: + + (with-open-file (in filename) + (loop while (read-char in nil) count t)) + + or maybe something more efficient like this: + + (with-open-file (in filename) + (let ((scratch (make-string 4096))) + (loop for read = (read-sequence scratch in) + while (plusp read) sum read))) + + +File: pcl.info, Node: 14-11, Next: Chapter 15, Prev: 14-10, Up: Chapter 14 + +Other Kinds of I/O +================== + +In addition to file streams, Common Lisp supports other kinds of +streams, which can also be used with the various reading, writing, and +printing I/O functions. For instance, you can read data from, or write +data to, a string using STRING-STREAMs, which you can create with the +functions MAKE-STRING-INPUT-STREAM and MAKE-STRING-OUTPUT-STREAM. + + MAKE-STRING-INPUT-STREAM takes a string and optional start and end +indices to bound the area of the string from which data should be read +and returns a character stream that you can pass to any of the +character-based input functions such as READ-CHAR, READ-LINE, or READ. +For example, if you have a string containing a floating-point literal in +Common Lisp's syntax, you can convert it to a float like this: + + (let ((s (make-string-input-stream "1.23"))) + (unwind-protect (read s) + (close s))) + + Similarly, MAKE-STRING-OUTPUT-STREAM creates a stream you can use +with FORMAT, PRINT, WRITE-CHAR, WRITE-LINE, and so on. It takes no +arguments. Whatever you write, a string output stream will be +accumulated into a string that can then be obtained with the function +GET-OUTPUT-STREAM-STRING. Each time you call GET-OUTPUT-STREAM-STRING, +the stream's internal string is cleared so you can reuse an existing +string output stream. + + However, you'll rarely use these functions directly, because the +macros WITH-INPUT-FROM-STRING and WITH-OUTPUT-TO-STRING provide a more +convenient interface. WITH-INPUT-FROM-STRING is similar to +WITH-OPEN-FILE-it creates a string input stream from a given string and +then executes the forms in its body with the stream bound to the +variable you provide. For instance, instead of the LET form with the +explicit UNWIND-PROTECT, you'd probably write this: + + (with-input-from-string (s "1.23") + (read s)) + + The WITH-OUTPUT-TO-STRING macro is similar: it binds a newly created +string output stream to a variable you name and then executes its body. +After all the body forms have been executed, WITH-OUTPUT-TO-STRING +returns the value that would be returned by GET-OUTPUT-STREAM-STRING. + + CL-USER> (with-output-to-string (out) + (format out "hello, world ") + (format out "~s" (list 1 2 3))) + "hello, world (1 2 3)" + + The other kinds of streams defined in the language standard provide +various kinds of stream "plumbing," allowing you to plug together +streams in almost any configuration. A BROADCAST-STREAM is an output +stream that sends any data written to it to a set of output streams +provided as arguments to its constructor function, +MAKE-BROADCAST-STREAM. (1) Conversely, a CONCATENATED-STREAM is an input +stream that takes its input from a set of input streams, moving from +stream to stream as it hits the end of each stream. +CONCATENATED-STREAMs are constructed with the function +MAKE-CONCATENATED-STREAM, which takes any number of input streams as +arguments. + + Two kinds of bidirectional streams that can plug together streams in +a couple ways are TWO-WAY-STREAM and ECHO-STREAM. Their constructor +functions, MAKE-TWO-WAY-STREAM and MAKE-ECHO-STREAM, both take two +arguments, an input stream and an output stream, and return a stream of +the appropriate type, which you can use with both input and output +functions. + + In a TWO-WAY-STREAM every read you perform will return data read from +the underlying input stream, and every write will send data to the +underlying output stream. An ECHO-STREAM works essentially the same way +except that all the data read from the underlying input stream is also +echoed to the output stream. Thus, the output stream of an ECHO-STREAM +stream will contain a transcript of both sides of the conversation. + + Using these five kinds of streams, you can build almost any topology +of stream plumbing you want. + + Finally, although the Common Lisp standard doesn't say anything about +networking APIs, most implementations support socket programming and +typically implement sockets as another kind of stream, so you can use +all the regular I/O functions with them. (2) + + Now you're ready to move on to building a library that smoothes over +some of the differences between how the basic pathname functions behave +in different Common Lisp implementations. + + ---------- Footnotes ---------- + + (1) MAKE-BROADCAST-STREAM can make a data black hole by calling it +with no arguments. + + (2) The biggest missing piece in Common Lisp's standard I/O +facilities is a way for users to define new stream classes. There are, +however, two de facto standards for user-defined streams. During the +Common Lisp standardization, David Gray of Texas Instruments wrote a +draft proposal for an API to allow users to define new stream classes. +Unfortunately, there wasn't time to work out all the issues raised by +his draft to include it in the language standard. However, many +implementations support some form of so-called Gray Streams, basing +their API on Gray's draft proposal. Another, newer API, called Simple +Streams, has been developed by Franz and included in Allegro Common +Lisp. It was designed to improve the performance of user-defined +streams relative to Gray Streams and has been adopted by some of the +open-source Common Lisp implementations. + + +File: pcl.info, Node: Chapter 15, Next: Chapter 16, Prev: Chapter 14, Up: Top + +15. Practical: A Portable Pathname Library +========================================== + +As I discussed in the previous chapter, Common Lisp provides an +abstraction, the pathname, that's supposed to insulate you from the +details of how different operating systems and file systems name files. +Pathnames provide a useful API for manipulating names as names, but when +it comes to the functions that actually interact with the file system, +things get a bit hairy. + + The root of the problem, as I mentioned, is that the pathname +abstraction was designed to represent filenames on a much wider variety +of file systems than are commonly used now. Unfortunately, by making +pathnames abstract enough to account for a wide variety of file systems, +Common Lisp's designers left implementers with a fair number of choices +to make about how exactly to map the pathname abstraction onto any +particular file system. Consequently, different implementers, each +implementing the pathname abstraction for the same file system, just by +making different choices at a few key junctions, could end up with +conforming implementations that nonetheless provide different behavior +for several of the main pathname-related functions. + + However, one way or another, all implementations provide the same +basic functionality, so it's not too hard to write a library that +provides a consistent interface for the most common operations across +different implementations. That's your task for this chapter. In +addition to giving you several useful functions that you'll use in +future chapters, writing this library will give you a chance to learn +how to write code that deals with differences between implementations. + +* Menu: + +* 15-1:: The API +* 15-2:: *FEATURES* and Read-Time Conditionalization +* 15-3:: Listing a Directory +* 15-4:: Testing a File's Existence +* 15-5:: Walking a Directory Tree + + +File: pcl.info, Node: 15-1, Next: 15-2, Prev: Chapter 15, Up: Chapter 15 + +The API +======= + +The basic operations the library will support will be getting a list of +files in a directory and determining whether a file or directory with a +given name exists. You'll also write a function for recursively walking +a directory hierarchy, calling a given function for each pathname in the +tree. + + In theory, these directory listing and file existence operations are +already provided by the standard functions DIRECTORY and PROBE-FILE. +However, as you'll see, there are enough different ways to implement +these functions-all within the bounds of valid interpretations of the +language standard-that you'll want to write new functions that provide a +consistent behavior across implementations. + + +File: pcl.info, Node: 15-2, Next: 15-3, Prev: 15-1, Up: Chapter 15 + +*FEATURES* and Read-Time Conditionalization +=========================================== + +Before you can implement this API in a library that will run correctly +on multiple Common Lisp implementations, I need to show you the +mechanism for writing implementation-specific code. + + While most of the code you write can be "portable" in the sense that +it will run the same on any conforming Common Lisp implementation, you +may occasionally need to rely on implementation-specific functionality +or to write slightly different bits of code for different +implementations. To allow you to do so without totally destroying the +portability of your code, Common Lisp provides a mechanism, called +read-time conditionalization, that allows you to conditionally include +code based on various features such as what implementation it's being +run in. + + The mechanism consists of a variable *FEATURES* and two extra bits of +syntax understood by the Lisp reader. *FEATURES* is a list of symbols; +each symbol represents a "feature" that's present in the implementation +or on the underlying platform. These symbols are then used in feature +expressions that evaluate to true or false depending on whether the +symbols in the expression are present in *FEATURES*. The simplest +feature expression is a single symbol; the expression is true if the +symbol is in *FEATURES* and false if it isn't. Other feature +expressions are boolean expressions built out of NOT, AND, and OR +operators. For instance, if you wanted to conditionalize some code to +be included only if the features foo and bar were present, you could +write the feature expression (and foo bar). + + The reader uses feature expressions in conjunction with two bits of +syntax, #+ and #-. When the reader sees either of these bits of syntax, +it first reads a feature expression and then evaluates it as I just +described. When a feature expression following a #+ is true, the reader +reads the next expression normally. Otherwise it skips the next +expression, treating it as whitespace. #- works the same way except it +reads the form if the feature expression is false and skips it if it's +true. + + The initial value of *FEATURES* is implementation dependent, and what +functionality is implied by the presence of any given symbol is likewise +defined by the implementation. However, all implementations include at +least one symbol that indicates what implementation it is. For +instance, Allegro Common Lisp includes the symbol :allegro, CLISP +includes :clisp, SBCL includes :sbcl, and CMUCL includes :cmu. To avoid +dependencies on packages that may or may not exist in different +implementations, the symbols in *FEATURES* are usually keywords, and the +reader binds *PACKAGE* to the KEYWORD package while reading feature +expressions. Thus, a name with no package qualification will be read as +a keyword symbol. So, you could write a function that behaves slightly +differently in each of the implementations just mentioned like this: + + (defun foo () + #+allegro (do-one-thing) + #+sbcl (do-another-thing) + #+clisp (something-else) + #+cmu (yet-another-version) + #-(or allegro sbcl clisp cmu) (error "Not implemented")) + + In Allegro that code will be read as if it had been written like +this: + + (defun foo () + (do-one-thing)) + + while in SBCL the reader will read this: + + (defun foo () + (do-another-thing)) + + while in an implementation other than one of the ones specifically +conditionalized, it will read this: + + (defun foo () + (error "Not implemented")) + + Because the conditionalization happens in the reader, the compiler +doesn't even see expressions that are skipped. (1) This means you pay +no runtime cost for having different versions for different +implementations. Also, when the reader skips conditionalized +expressions, it doesn't bother interning symbols, so the skipped +expressions can safely contain symbols from packages that may not exist +in other implementations. + +* Menu: + +* 15-2-1:: Packaging the Library + + ---------- Footnotes ---------- + + (1) One slightly annoying consequence of the way read-time +conditionalization works is that there's no easy way to write a +fall-through case. For example, if you add support for another +implementation to foo by adding another expression guarded with #+, you +need to remember to also add the same feature to the or feature +expression after the #- or the ERROR form will be evaluated after your +new code runs. + + +File: pcl.info, Node: 15-2-1, Next: 15-2, Prev: 15-2, Up: 15-2 + +Packaging the Library +===================== + +Speaking of packages, if you download the complete code for this +library, you'll see that it's defined in a new package, +com.gigamonkeys.pathnames. I'll discuss the details of defining and +using packages in Chapter 21. For now you should note that some +implementations provide their own packages that contain functions with +some of the same names as the ones you'll define in this chapter and +make those names available in the CL-USER package. Thus, if you try to +define the functions from this library while in the CL-USER package, you +may get errors or warnings about clobbering existing definitions. To +avoid this possibility, you can create a file called packages.lisp with +the following contents: + + (in-package :cl-user) + + (defpackage :com.gigamonkeys.pathnames + (:use :common-lisp) + (:export + :list-directory + :file-exists-p + :directory-pathname-p + :file-pathname-p + :pathname-as-directory + :pathname-as-file + :walk-directory + :directory-p + :file-p)) + + and LOAD it. Then at the REPL or at the top of the file where you +type the definitions from this chapter, type the following expression: + + (in-package :com.gigamonkeys.pathnames) + + In addition to avoiding name conflicts with symbols already available +in CL-USER, packaging the library this way also makes it easier to use +in other code, as you'll see in several future chapters. + + +File: pcl.info, Node: 15-3, Next: 15-4, Prev: 15-2, Up: Chapter 15 + +Listing a Directory +=================== + +You can implement the function for listing a single directory, +list-directory, as a thin wrapper around the standard function +DIRECTORY. DIRECTORY takes a special kind of pathname, called a wild +pathname, that has one or more components containing the special value +:wild and returns a list of pathnames representing files in the file +system that match the wild pathname. (1) The matching algorithm-like +most things having to do with the interaction between Lisp and a +particular file system-isn't defined by the language standard, but most +implementations on Unix and Windows follow the same basic scheme. + + The DIRECTORY function has two problems that you need to address with +list-directory. The main one is that certain aspects of its behavior +differ fairly significantly between different Common Lisp +implementations, even on the same operating system. The other is that +while DIRECTORY provides a powerful interface for listing files, to use +it properly requires understanding some rather subtle points about the +pathname abstraction. Between these subtleties and the idiosyncrasies +of different implementations, actually writing portable code that uses +DIRECTORY to do something as simple as listing all the files and +subdirectories in a single directory can be a frustrating experience. +You can deal with those subtleties and idiosyncrasies once and for all, +by writing list-directory, and forget them thereafter. + + One subtlety I discussed in Chapter 14 is the two ways to represent +the name of a directory as a pathname: directory form and file form. + + To get DIRECTORY to return a list of files in /home/peter/, you need +to pass it a wild pathname whose directory component is the directory +you want to list and whose name and type components are :wild. Thus, to +get a listing of the files in /home/peter/, it might seem you could +write this: + + (directory (make-pathname :name :wild :type :wild :defaults home-dir)) + + where home-dir is a pathname representing /home/peter/. This would +work if home-dir were in directory form. But if it were in file +form-for example, if it had been created by parsing the namestring +"/home/peter"-then that same expression would list all the files in +/home since the name component "peter" would be replaced with :wild. + + To avoid having to worry about explicitly converting between +representations, you can define list-directory to accept a nonwild +pathname in either form, which it will then convert to the appropriate +wild pathname. + + To help with this, you should define a few helper functions. One, +component-present-p, will test whether a given component of a pathname +is "present," meaning neither NIL nor the special value :unspecific. +(2) Another, directory-pathname-p, tests whether a pathname is already +in directory form, and the third, pathname-as-directory, converts any +pathname to a directory form pathname. + + (defun component-present-p (value) + (and value (not (eql value :unspecific)))) + + (defun directory-pathname-p (p) + (and + (not (component-present-p (pathname-name p))) + (not (component-present-p (pathname-type p))) + p)) + + (defun pathname-as-directory (name) + (let ((pathname (pathname name))) + (when (wild-pathname-p pathname) + (error "Can't reliably convert wild pathnames.")) + (if (not (directory-pathname-p name)) + (make-pathname + :directory (append (or (pathname-directory pathname) (list :relative)) + (list (file-namestring pathname))) + :name nil + :type nil + :defaults pathname) + pathname))) + + Now it seems you could generate a wild pathname to pass to DIRECTORY +by calling MAKE-PATHNAME with a directory form name returned by +pathname-as-directory. Unfortunately, it's not quite that simple, +thanks to a quirk in CLISP's implementation of DIRECTORY. In CLISP, +DIRECTORY won't return files with no extension unless the type component +of the wildcard is NIL rather than :wild. So you can define a function, +directory-wildcard, that takes a pathname in either directory or file +form and returns a proper wildcard for the given implementation using +read-time conditionalization to make a pathname with a :wild type +component in all implementations except for CLISP and NIL in CLISP. + + (defun directory-wildcard (dirname) + (make-pathname + :name :wild + :type #-clisp :wild #+clisp nil + :defaults (pathname-as-directory dirname))) + + Note how each read-time conditional operates at the level of a single +expression After #-clisp, the expression :wild is either read or +skipped; likewise, after #+clisp, the NIL is read or skipped. + + Now you can take a first crack at the list-directory function. + + (defun list-directory (dirname) + (when (wild-pathname-p dirname) + (error "Can only list concrete directory names.")) + (directory (directory-wildcard dirname))) + + As it stands, this function would work in SBCL, CMUCL, and LispWorks. +Unfortunately, a couple more implementation differences remain to be +smoothed over. One is that not all implementations will return +subdirectories of the given directory. Allegro, SBCL, CMUCL, and +LispWorks do. OpenMCL doesn't by default but will if you pass DIRECTORY +a true value via the implementation-specific keyword argument +:directories. CLISP's DIRECTORY returns subdirectories only when it's +passed a wildcard pathname with :wild as the last element of the +directory component and NIL name and type components. In this case, it +returns only subdirectories, so you'll need to call DIRECTORY twice with +different wildcards and combine the results. + + Once you get all the implementations returning directories, you'll +discover they can also differ in whether they return the names of +directories in directory or file form. You want list-directory to +always return directory names in directory form so you can differentiate +subdirectories from regular files based on just the name. Except for +Allegro, all the implementations this library will support do that. +Allegro, on the other hand, requires you to pass DIRECTORY the +implementation-specific keyword argument :directories-are-files NIL to +get it to return directories in file form. + + Once you know how to make each implementation do what you want, +actually writing list-directory is simply a matter of combining the +different versions using read-time conditionals. + + (defun list-directory (dirname) + (when (wild-pathname-p dirname) + (error "Can only list concrete directory names.")) + (let ((wildcard (directory-wildcard dirname))) + + #+(or sbcl cmu lispworks) + (directory wildcard) + + #+openmcl + (directory wildcard :directories t) + + #+allegro + (directory wildcard :directories-are-files nil) + + #+clisp + (nconc + (directory wildcard) + (directory (clisp-subdirectories-wildcard wildcard))) + + #-(or sbcl cmu lispworks openmcl allegro clisp) + (error "list-directory not implemented"))) + + The function clisp-subdirectories-wildcard isn't actually specific to +CLISP, but since it isn't needed by any other implementation, you can +guard its definition with a read-time conditional. In this case, since +the expression following the #+ is the whole DEFUN, the whole function +definition will be included or not, depending on whether clisp is +present in *FEATURES*. + + #+clisp + (defun clisp-subdirectories-wildcard (wildcard) + (make-pathname + :directory (append (pathname-directory wildcard) (list :wild)) + :name nil + :type nil + :defaults wildcard)) + + ---------- Footnotes ---------- + + (1) Another special value, :wild-inferiors, can appear as part of the +directory component of a wild pathname, but you won't need it in this +chapter. + + (2) Implementations are allowed to return :unspecific instead of NIL +as the value of pathname components in certain situations such as when +the component isn't used by that implementation. + + +File: pcl.info, Node: 15-4, Next: 15-5, Prev: 15-3, Up: Chapter 15 + +Testing a File's Existence +========================== + +To replace PROBE-FILE, you can define a function called file-exists-p. +It should accept a pathname and return an equivalent pathname if the +file exists and NIL if it doesn't. It should be able to accept the name +of a directory in either directory or file form but should always return +a directory form pathname if the file exists and is a directory. This +will allow you to use file-exists-p, along with directory-pathname-p, to +test whether an arbitrary name is the name of a file or directory. + + In theory, file-exists-p is quite similar to the standard function +PROBE-FILE; indeed, in several implementations-SBCL, LispWorks, and +OpenMCL-PROBE-FILE already gives you the behavior you want for +file-exists-p. But not all implementations of PROBE-FILE behave quite +the same. + + Allegro and CMUCL's PROBE-FILE functions are close to what you +need-they will accept the name of a directory in either form but, +instead of returning a directory form name, simply return the name in +the same form as the argument it was passed. Luckily, if passed the +name of a nondirectory in directory form, they return NIL. So with those +implementations you can get the behavior you want by first passing the +name to PROBE-FILE in directory form-if the file exists and is a +directory, it will return the directory form name. If that call returns +NIL, then you try again with a file form name. + + CLISP, on the other hand, once again has its own way of doing things. +Its PROBE-FILE immediately signals an error if passed a name in +directory form, regardless of whether a file or directory exists with +that name. It also signals an error if passed a name in file form +that's actually the name of a directory. For testing whether a +directory exists, CLISP provides its own function: probe-directory (in +the ext package). This is almost the mirror image of PROBE-FILE: it +signals an error if passed a name in file form or if passed a name in +directory form that happens to name a file. The only difference is it +returns T rather than a pathname when the named directory exists. + + But even in CLISP you can implement the desired semantics by wrapping +the calls to PROBE-FILE and probe-directory in IGNORE-ERRORS. (1) + + (defun file-exists-p (pathname) + #+(or sbcl lispworks openmcl) + (probe-file pathname) + + #+(or allegro cmu) + (or (probe-file (pathname-as-directory pathname)) + (probe-file pathname)) + + #+clisp + (or (ignore-errors + (probe-file (pathname-as-file pathname))) + (ignore-errors + (let ((directory-form (pathname-as-directory pathname))) + (when (ext:probe-directory directory-form) + directory-form)))) + + #-(or sbcl cmu lispworks openmcl allegro clisp) + (error "file-exists-p not implemented")) + + The function pathname-as-file that you need for the CLISP +implementation of file-exists-p is the inverse of the previously defined +pathname-as-directory, returning a pathname that's the file form +equivalent of its argument. This function, despite being needed here +only by CLISP, is generally useful, so define it for all implementations +and make it part of the library. + + (defun pathname-as-file (name) + (let ((pathname (pathname name))) + (when (wild-pathname-p pathname) + (error "Can't reliably convert wild pathnames.")) + (if (directory-pathname-p name) + (let* ((directory (pathname-directory pathname)) + (name-and-type (pathname (first (last directory))))) + (make-pathname + :directory (butlast directory) + :name (pathname-name name-and-type) + :type (pathname-type name-and-type) + :defaults pathname)) + pathname))) + + ---------- Footnotes ---------- + + (1) This is slightly broken in the sense that if PROBE-FILE signals +an error for some other reason, this code will interpret it incorrectly. +Unfortunately, the CLISP documentation doesn't specify what errors might +be signaled by PROBE-FILE and probe-directory, and experimentation seems +to show that they signal simple-file-errors in most erroneous +situations. + + +File: pcl.info, Node: 15-5, Next: Chapter 16, Prev: 15-4, Up: Chapter 15 + +Walking a Directory Tree +======================== + +Finally, to round out this library, you can implement a function called +walk-directory. Unlike the functions defined previously, this function +doesn't need to do much of anything to smooth over implementation +differences; it just needs to use the functions you've already defined. +However, it's quite handy, and you'll use it several times in subsequent +chapters. It will take the name of a directory and a function and call +the function on the pathnames of all the files under the directory, +recursively. It will also take two keyword arguments: :directories and +:test. When :directories is true, it will call the function on the +pathnames of directories as well as regular files. The :test argument, +if provided, specifies another function that's invoked on each pathname +before the main function is; the main function will be called only if +the test function returns true. + + (defun walk-directory (dirname fn &key directories (test (constantly t))) + (labels + ((walk (name) + (cond + ((directory-pathname-p name) + (when (and directories (funcall test name)) + (funcall fn name)) + (dolist (x (list-directory name)) (walk x))) + ((funcall test name) (funcall fn name))))) + (walk (pathname-as-directory dirname)))) + + Now you have a useful library of functions for dealing with +pathnames. As I mentioned, these functions will come in handy in later +chapters, particularly Chapters 23 and 27, where you'll use +walk-directory to crawl through directory trees containing spam messages +and MP3 files. But before we get to that, though, I need to talk about +object orientation, the topic of the next two chapters. + + +File: pcl.info, Node: Chapter 16, Next: Chapter 17, Prev: Chapter 15, Up: Top + +16. Object Reorientation: Generic Functions +=========================================== + +Because the invention of Lisp predated the rise of object-oriented +programming by a couple decades, (1) new Lispers are sometimes surprised +to discover what a thoroughly object-oriented language Common Lisp is. +Common Lisp's immediate predecessors were developed at a time when +object orientation was an exciting new idea and there were many +experiments with ways to incorporate the ideas of object orientation, +especially as manifested in Smalltalk, into Lisp. As part of the Common +Lisp standardization, a synthesis of several of these experiments +emerged under the name Common Lisp Object System, or CLOS. The ANSI +standard incorporated CLOS into the language, so it no longer really +makes sense to speak of CLOS as a separate entity. + + The features CLOS contributed to Common Lisp range from those that +can hardly be avoided to relatively esoteric manifestations of Lisp's +language-as-language-building-tool philosophy. Complete coverage of all +these features is beyond the scope of this book, but in this chapter and +the next I'll describe the bread-and-butter features and give an +overview of Common Lisp's approach to objects. + + You should note at the outset that Common Lisp's object system offers +a fairly different embodiment of the principles of object orientation +than many other languages. If you have a deep understanding of the +fundamental ideas behind object orientation, you'll likely appreciate +the particularly powerful and general way Common Lisp manifests those +ideas. On the other hand, if your experience with object orientation +has been largely with a single language, you may find Common Lisp's +approach somewhat foreign; you should try to avoid assuming that there's +only one way for a language to support object orientation. (2) If you +have little object-oriented programming experience, you should have no +trouble understanding the explanations here, though it may help to +ignore the occasional comparisons to the way other languages do things. + +* Menu: + +* 16-1:: Generic Functions and Classes +* 16-2:: Generic Functions and Methods +* 16-3:: DEFGENERIC +* 16-4:: DEFMETHOD +* 16-5:: Method Combination +* 16-6:: The Standard Method Combination +* 16-7:: Other Method Combinations +* 16-8:: Multimethods +* 16-9:: To Be Continued . . . + + ---------- Footnotes ---------- + + (1) 1 + + (2) 2 + + +File: pcl.info, Node: 16-1, Next: 16-2, Prev: Chapter 16, Up: Chapter 16 + +Generic Functions and Classes +============================= + +The fundamental idea of object orientation is that a powerful way to +organize a program is to define data types and then associate operations +with those data types. In particular, you want to be able to invoke an +operation and have the exact behavior determined by the type of the +object or objects on which the operation was invoked. The classic +example used, seemingly by all introductions to object orientation, is +an operation draw that can be applied to objects representing various +geometric shapes. Different implementations of the draw operation can +be provided for drawing circles, triangles, and squares, and a call to +draw will actually result in drawing a circle, triangle, or square, +depending on the type of the object to which the draw operation is +applied. The different implementations of draw are defined separately, +and new versions can be defined that draw other shapes without having to +change the code of either the caller or any of the other draw +implementations. This feature of object orientation goes by the fancy +Greek name polymorphism, meaning "many forms," because a single +conceptual operation, such as drawing an object, can take many different +concrete forms. + + Common Lisp, like most object-oriented languages today, is +class-based; all objects are instances of a particular class. (1) The +class of an object determines its representation-built-in classes such +as NUMBER and STRING have opaque representations accessible only via the +standard functions for manipulating those types, while instances of +user-defined classes, as you'll see in the next chapter, consist of +named parts called slots. + + Classes are arranged in a hierarchy, a taxonomy for all objects. A +class can be defined as a subclass of other classes, called its +superclasses. A class inherits part of its definition from its +superclasses and instances of a class are also considered instances of +the superclasses. In Common Lisp, the hierarchy of classes has a single +root, the class T, which is a direct or indirect superclass of every +other class. Thus, every datum in Common Lisp is an instance of T. (2) +Common Lisp also supports multiple inheritance-a single class can have +multiple direct superclasses. + + Outside the Lisp family, almost all object-oriented languages follow +the basic pattern established by Simula of having behavior associated +with classes through methods or member functions that belong to a +particular class. In these languages, a method is invoked on a +particular object, and the class of that object determines what code +runs. This model of method invocation is called-after the Smalltalk +terminology-message passing. Conceptually, method invocation in a +message-passing system starts by sending a message containing the name +of the method to run and any arguments to the object on which the method +is being invoked. The object then uses its class to look up the method +associated with the name in the message and runs it. Because each class +can have its own method for a given name, the same message, sent to +different objects, can invoke different methods. + + Early Lisp object systems worked in a similar way, providing a +special function SEND that could be used to send a message to a +particular object. However, this wasn't entirely satisfactory, as it +made method invocations different from normal function calls. +Syntactically method invocations were written like this: + + (send object 'foo) + + rather than like this: + + (foo object) + + More significantly, because methods weren't functions, they couldn't +be passed as arguments to higher-order functions such as MAPCAR; if one +wanted to call a method on all the elements of a list with MAPCAR, one +had to write this: + + (mapcar #'(lambda (object) (send object 'foo)) objects) + + rather than this: + + (mapcar #'foo objects) + + Eventually the folks working on Lisp object systems unified methods +with functions by creating a new kind of function called a generic +function. In addition to solving the problems just described, generic +functions opened up new possibilities for the object system, including +many features that simply don't make sense in a message-passing object +system. + + Generic functions are the heart of Common Lisp's object system and +the topic of the rest of this chapter. While I can't talk about generic +functions without some mention of classes, for now I'll focus on how to +define and use generic functions. In the next chapter I'll show you how +to define your own classes. + + ---------- Footnotes ---------- + + (1) 3 + + (2) 4 + + +File: pcl.info, Node: 16-2, Next: 16-3, Prev: 16-1, Up: Chapter 16 + +Generic Functions and Methods +============================= + +A generic function defines an abstract operation, specifying its name +and a parameter list but no implementation. Here, for example, is how +you might define a generic function, draw, that will be used to draw +different kinds of shapes on the screen: + + (defgeneric draw (shape) + (:documentation "Draw the given shape on the screen.")) + + I'll discuss the syntax of DEFGENERIC in the next section; for now +just note that this definition doesn't contain any actual code. + + A generic function is generic in the sense that it can-at least in +theory-accept any objects as arguments. (1) However, by itself a +generic function can't actually do anything; if you just define a +generic function, no matter what arguments you call it with, it will +signal an error. The actual implementation of a generic function is +provided by methods. Each method provides an implementation of the +generic function for particular classes of arguments. Perhaps the +biggest difference between a generic function-based system and a +message-passing system is that methods don't belong to classes; they +belong to the generic function, which is responsible for determining +what method or methods to run in response to a particular invocation. + + Methods indicate what kinds of arguments they can handle by +specializing the required parameters defined by the generic function. +For instance, on the generic function draw, you might define one method +that specializes the shape parameter for objects that are instances of +the class circle while another method specializes shape for objects that +are instances of the class triangle. They would look like this, eliding +the actual drawing code: + + (defmethod draw ((shape circle)) + ...) + + (defmethod draw ((shape triangle)) + ...) + + When a generic function is invoked, it compares the actual arguments +it was passed with the specializers of each of its methods to find the +applicable methods-those methods whose specializers are compatible with +the actual arguments. If you invoke draw, passing an instance of +circle, the method that specialized shape on the class circle is +applicable, while if you pass it a triangle, then the method that +specializes shape on the class triangle applies. In simple cases, only +one method will be applicable, and it will handle the invocation. In +more complex cases, there may be multiple methods that apply; they're +then combined, as I'll discuss in the section "Method Combination," into +a single effective method that handles the invocation. + + You can specialize a parameter in two ways-usually you'll specify a +class that the argument must be an instance of. Because instances of a +class are also considered instances of that class's superclasses, a +method with a parameter specialized on a particular class can be +applicable whenever the corresponding argument is a direct instance of +the specializing class or of any of its subclasses. The other kind of +specializer is a so-called EQL specializer, which specifies a particular +object to which the method applies. + + When a generic function has only methods specialized on a single +parameter and all the specializers are class specializers, the result of +invoking a generic function is quite similar to the result of invoking a +method in a message-passing system-the combination of the name of the +operation and the class of the object on which it's invoked determines +what method to run. + + However, reversing the order of lookup opens up possibilities not +found in message-passing systems. Generic functions support methods +that specialize on multiple parameters, provide a framework that makes +multiple inheritance much more manageable, and let you use declarative +constructs to control how methods are combined into an effective method, +supporting several common usage patterns without a lot of boilerplate +code. I'll discuss those topics in a moment. But first you need to +look at the basics of the two macros used to define the generic +functions DEFGENERIC and DEFMETHOD. + + ---------- Footnotes ---------- + + (1) 5 + + +File: pcl.info, Node: 16-3, Next: 16-4, Prev: 16-2, Up: Chapter 16 + +DEFGENERIC +========== + +To give you a feel for these macros and the various facilities they +support, I'll show you some code you might write as part of a banking +application-or, rather, a toy banking application; the point is to look +at a few language features, not to learn how to really write banking +software. For instance, this code doesn't even pretend to deal with +such issues as multiple currencies let alone audit trails and +transactional integrity. + + Because I'm not going to discuss how to define new classes until the +next chapter, for now you can just assume that certain classes already +exist: for starters, assume there's a class bank-account and that it has +two subclasses, checking-account and savings-account. The class +hierarchy looks like this: + + + + The first generic function will be withdraw, which decreases the +account balance by a specified amount. If the balance is less than the +amount, it should signal an error and leave the balance unchanged. You +can start by defining the generic function with DEFGENERIC. + + The basic form of DEFGENERIC is similar to DEFUN except with no body. +The parameter list of DEFGENERIC specifies the parameters that must be +accepted by all the methods that will be defined on the generic +function. In the place of the body, a DEFGENERIC can contain various +options. One option you should always include is :documentation, which +you use to provide a string describing the purpose of the generic +function. Because a generic function is purely abstract, it's important +to be clear to both users and implementers what it's for. Thus, you +might define withdraw like this: + + (defgeneric withdraw (account amount) + (:documentation "Withdraw the specified amount from the account. + + Signal an error if the current balance is less than amount.")) + + +File: pcl.info, Node: 16-4, Next: 16-5, Prev: 16-3, Up: Chapter 16 + +DEFMETHOD +========= + +Now you're ready to use DEFMETHOD to define methods that implement +withdraw. (1) + + A method's parameter list must be congruent with its generic +function's. In this case, that means all methods defined on withdraw +must have exactly two required parameters. More generally, methods must +have the same number of required and optional parameters and must be +capable of accepting any arguments corresponding to any &rest or &key +parameters specified by the generic function. (2) + + Since the basics of withdrawing are the same for all accounts, you +can define a method that specializes the account parameter on the +bank-account class. You can assume the function balance returns the +current balance of the account and can be used with SETF-and thus with +DECF-to set the balance. The function ERROR is a standard function used +to signal an error, which I'll discuss in greater detail in Chapter 19. +Using those two functions, you can define a basic withdraw method that +looks like this: + + (defmethod withdraw ((account bank-account) amount) + (when (< (balance account) amount) + (error "Account overdrawn.")) + (decf (balance account) amount)) + + As this code suggests, the form of DEFMETHOD is even more like that +of DEFUN than DEFGENERIC's is. The only difference is that the required +parameters can be specialized by replacing the parameter name with a +two-element list. The first element is the name of the parameter, and +the second element is the specializer, either the name of a class or an +EQL specializer, the form of which I'll discuss in a moment. The +parameter name can be anything-it doesn't have to match the name used in +the generic function, though it often will. + + This method will apply whenever the first argument to withdraw is an +instance of bank-account. The second parameter, amount, is implicitly +specialized on T, and since all objects are instances of T, it doesn't +affect the applicability of the method. + + Now suppose all checking accounts have overdraft protection. That +is, each checking account is linked to another bank account that's drawn +upon when the balance of the checking account itself can't cover a +withdrawal. You can assume that the function overdraft-account takes a +checking-account object and returns a bank-account object representing +the linked account. + + Thus, withdrawing from a checking-account object requires a few extra +steps compared to withdrawing from a standard bank-account object. You +must first check whether the amount being withdrawn is greater than the +account's current balance and, if it is, transfer the difference from +the overdraft account. Then you can proceed as with a standard +bank-account object. + + So what you'd like to do is define a method on withdraw that +specializes on checking-account to handle the transfer and then lets the +method specialized on bank-account take control. Such a method might +look like this: + + (defmethod withdraw ((account checking-account) amount) + (let ((overdraft (- amount (balance account)))) + (when (plusp overdraft) + (withdraw (overdraft-account account) overdraft) + (incf (balance account) overdraft))) + (call-next-method)) + + The function CALL-NEXT-METHOD is part of the generic function +machinery used to combine applicable methods. It indicates that control +should be passed from this method to the method specialized on +bank-account. (3) When it's called with no arguments, as it is here, +the next method is invoked with whatever arguments were originally +passed to the generic function. It can also be called with arguments, +which will then be passed onto the next method. + + You aren't required to invoke CALL-NEXT-METHOD in every method. +However, if you don't, the new method is then responsible for completely +implementing the desired behavior of the generic function. For example, +if you had a subclass of bank-account, proxy-account, that didn't +actually keep track of its own balance but instead delegated withdrawals +to another account, you might write a method like this (assuming a +function, proxied-account, that returns the proxied account): + + (defmethod withdraw ((proxy proxy-account) amount) + (withdraw (proxied-account proxy) amount)) + + Finally, DEFMETHOD also allows you to create methods specialized on a +particular object with an EQL specializer. For example, suppose the +banking app is going to be deployed in a particularly corrupt bank. +Suppose the variable *account-of-bank-president* holds a reference to a +particular bank account that belongs-as the name suggests-to the bank's +president. Further suppose the variable *bank* represents the bank as a +whole, and the function embezzle steals money from the bank. The bank +president might ask you to "fix" withdraw to handle his account +specially. + + (defmethod withdraw ((account (eql *account-of-bank-president*)) amount) + (let ((overdraft (- amount (balance account)))) + (when (plusp overdraft) + (incf (balance account) (embezzle *bank* overdraft))) + (call-next-method))) + + Note, however, that the form in the EQL specializer that provides the +object to specialize on-*account-of-bank-president* in this case-is +evaluated once, when the DEFMETHOD is evaluated. This method will be +specialized on the value of *account-of-bank-president* at the time the +method is defined; changing the variable later won't change the method. + + ---------- Footnotes ---------- + + (1) 6 + + (2) 7 + + (3) 8 + + +File: pcl.info, Node: 16-5, Next: 16-6, Prev: 16-4, Up: Chapter 16 + +Method Combination +================== + +Outside the body of a method, CALL-NEXT-METHOD has no meaning. Within a +method, it's given a meaning by the generic function machinery that +builds an effective method each time the generic function is invoked +using all the methods applicable to that particular invocation. This +notion of building an effective method by combining applicable methods +is the heart of the generic function concept and is the thing that +allows generic functions to support facilities not found in +message-passing systems. So it's worth taking a closer look at what's +really happening. Folks with the message-passing model deeply ingrained +in their consciousness should pay particular attention because generic +functions turn method dispatching inside out compared to message +passing, making the generic function, rather than the class, the prime +mover. + + Conceptually, the effective method is built in three steps: First, +the generic function builds a list of applicable methods based on the +actual arguments it was passed. Second, the list of applicable methods +is sorted according to the specificity of their parameter specializers. +Finally, methods are taken in order from the sorted list and their code +combined to produce the effective method. (1) + + To find applicable methods, the generic function compares the actual +arguments with the corresponding parameter specializers in each of its +methods. A method is applicable if, and only if, all the specializers +are compatible with the corresponding arguments. + + When the specializer is the name of a class, it's compatible if it +names the actual class of the argument or one of its superclasses. +(Recall that parameters without explicit specializers are implicitly +specialized on the class T so will be compatible with any argument.) An +EQL specializer is compatible only when the argument is the same object +as was specified in the specializer. + + Because all the arguments are checked against the corresponding +specializers, they all affect whether a method is applicable. Methods +that explicitly specialize more than one parameter are called +multimethods; I'll discuss them in the section "Multimethods." + + After the applicable methods have been found, the generic function +machinery needs to sort them before it can combine them into an +effective method. To order two applicable methods, the generic function +compares their parameter specializers from left to right, (2) and the +first specializer that's different between the two methods determines +their ordering, with the method with the more specific specializer +coming first. + + Because only applicable methods are being sorted, you know all class +specializers will name classes that the corresponding argument is +actually an instance of. In the typical case, if two class specializers +differ, one will be a subclass of the other. In that case, the +specializer naming the subclass is considered more specific. This is +why the method that specialized account on checking-account was +considered more specific than the method that specialized it on +bank-account. + + Multiple inheritance slightly complicates the notion of specificity +since the actual argument may be an instance of two classes, neither of +which is a subclass of the other. If such classes are used as parameter +specializers, the generic function can't order them using only the rule +that subclasses are more specific than their superclasses. In the next +chapter I'll discuss how the notion of specificity is extended to deal +with multiple inheritance. For now, suffice it to say that there's a +deterministic algorithm for ordering class specializers. + + Finally, an EQL specializer is always more specific than any class +specializer, and because only applicable methods are being considered, +if more than one method has an EQL specializer for a particular +parameter, they must all have the same EQL specializer. The comparison +of those methods will thus be decided based on other parameters. + + ---------- Footnotes ---------- + + (1) 9 + + (2) 10 + + +File: pcl.info, Node: 16-6, Next: 16-7, Prev: 16-5, Up: Chapter 16 + +The Standard Method Combination +=============================== + +Now that you understand how the applicable methods are found and sorted, +you're ready to take a closer look at the last step-how the sorted list +of methods is combined into a single effective method. By default, +generic functions use what's called the standard method combination. +The standard method combination combines methods so that +CALL-NEXT-METHOD works as you've already seen-the most specific method +runs first, and each method can pass control to the next most specific +method via CALL-NEXT-METHOD. + + However, there's a bit more to it than that. The methods I've been +discussing so far are called primary methods. Primary methods, as their +name suggests, are responsible for providing the primary implementation +of a generic function. The standard method combination also supports +three kinds of auxiliary methods: :before, :after, and :around methods. +An auxiliary method definition is written with DEFMETHOD like a primary +method but with a method qualifier, which names the type of method, +between the name of the method and the parameter list. For instance, a +:before method on withdraw that specializes the account parameter on the +class bank-account would start like this: + + (defmethod withdraw :before ((account bank-account) amount) ...) + + Each kind of auxiliary method is combined into the effective method +in a different way. All the applicable :before methods-not just the +most specific-are run as part of the effective method. They run, as +their name suggests, before the most specific primary method and are run +in most-specific-first order. Thus, :before methods can be used to do +any preparation needed to ensure that the primary method can run. For +instance, you could've used a :before method specialized on +checking-account to implement the overdraft protection on checking +accounts like this: + + (defmethod withdraw :before ((account checking-account) amount) + (let ((overdraft (- amount (balance account)))) + (when (plusp overdraft) + (withdraw (overdraft-account account) overdraft) + (incf (balance account) overdraft)))) + + This :before method has three advantages over a primary method. One +is that it makes it immediately obvious how the method changes the +overall behavior of the withdraw function-it's not going to interfere +with the main behavior or change the result returned. + + The next advantage is that a primary method specialized on a class +more specific than checking-account won't interfere with this :before +method, making it easier for an author of a subclass of checking-account +to extend the behavior of withdraw while keeping part of the old +behavior. + + Lastly, since a :before method doesn't have to call CALL-NEXT-METHOD +to pass control to the remaining methods, it's impossible to introduce a +bug by forgetting to. + + The other auxiliary methods also fit into the effective method in +ways suggested by their names. All the :after methods run after the +primary methods in most-specific-last order, that is, the reverse of the +:before methods. Thus, the :before and :after methods combine to create +a sort of nested wrapping around the core functionality provided by the +primary methods-each more-specific :before method will get a chance to +set things up so the less-specific :before methods and primary methods +can run successfully, and each more-specific :after method will get a +chance to clean up after all the primary methods and less-specific +:after methods. + + Finally, :around methods are combined much like primary methods +except they're run "around" all the other methods. That is, the code +from the most specific :around method is run before anything else. +Within the body of an :around method, CALL-NEXT-METHOD will lead to the +code of the next most specific :around method or, in the least specific +:around method, to the complex of :before, primary, and :after methods. +Almost all :around methods will contain such a call to CALL-NEXT-METHOD +because an :around method that doesn't will completely hijack the +implementation of the generic function from all the methods except for +more-specific :around methods. + + Occasionally that kind of hijacking is called for, but typically +:around methods are used to establish some dynamic context in which the +rest of the methods will run-to bind a dynamic variable, for example, or +to establish an error handler (as I'll discuss in Chapter 19). About +the only time it's appropriate for an :around method to not call +CALL-NEXT-METHOD is when it returns a result cached from a previous call +to CALL-NEXT-METHOD. At any rate, an :around method that doesn't call +CALL-NEXT-METHOD is responsible for correctly implementing the semantics +of the generic function for all classes of arguments to which the method +may apply, including future subclasses. + + Auxiliary methods are just a convenient way to express certain common +patterns more concisely and concretely. They don't actually allow you +to do anything you couldn't do by combining primary methods with +diligent adherence to a few coding conventions and some extra typing. +Perhaps their biggest benefit is that they provide a uniform framework +for extending generic functions. Often a library will define a generic +function and provide a default primary method, allowing users of the +library to customize its behavior by defining appropriate auxiliary +methods. + + +File: pcl.info, Node: 16-7, Next: 16-8, Prev: 16-6, Up: Chapter 16 + +Other Method Combinations +========================= + +In addition to the standard method combination, the language specifies +nine other built-in method combinations known as the simple built-in +method combinations. You can also define custom method combinations, +though that's a fairly esoteric feature and beyond the scope of this +book. I'll briefly cover how to use the simple built-in combinations to +give you a sense of the possibilities. + + All the simple combinations follow the same pattern: instead of +invoking the most specific primary method and letting it invoke +less-specific primary methods via CALL-NEXT-METHOD, the simple method +combinations produce an effective method that contains the code of all +the primary methods, one after another, all wrapped in a call to the +function, macro, or special operator that gives the method combination +its name. The nine combinations are named for the operators: +, AND, +OR, LIST, APPEND, NCONC, MIN, MAX, and PROGN. The simple combinations +also support only two kinds of methods, primary methods, which are +combined as just described, and :around methods, which work like :around +methods in the standard method combination. + + For example, a generic function that uses the + method combination +will return the sum of all the results returned by its primary methods. +Note that the AND and OR method combinations won't necessarily run all +the primary methods because of those macros' short-circuiting behavior-a +generic function using the AND combination will return NIL as soon as +one of the methods does and will return the value of the last method +otherwise. Similarly, the OR combination will return the first non-NIL +value returned by any of the methods. + + To define a generic function that uses a particular method +combination, you include a :method-combination option in the DEFGENERIC +form. The value supplied with this option is the name of the method +combination you want to use. For example, to define a generic function, +priority, that returns the sum of values returned by individual methods +using the + method combination, you might write this: + + (defgeneric priority (job) + (:documentation "Return the priority at which the job should be run.") + (:method-combination +)) + + By default all these method combinations combine the primary methods +in most-specific-first order. However, you can reverse the order by +including the keyword :most-specific-last after the name of the method +combination in the DEFGENERIC form. The order probably doesn't matter +if you're using the + combination unless the methods have side effects, +but for demonstration purposes you can change priority to use +most-specific-last order like this: + + (defgeneric priority (job) + (:documentation "Return the priority at which the job should be run.") + (:method-combination + :most-specific-last)) + + The primary methods on a generic function that uses one of these +combinations must be qualified with the name of the method combination. +Thus, a primary method defined on priority might look like this: + + (defmethod priority + ((job express-job)) 10) + + This makes it obvious when you see a method definition that it's part +of a particular kind of generic function. + + All the simple built-in method combinations also support :around +methods that work like :around methods in the standard method +combination: the most specific :around method runs before any other +methods, and CALL-NEXT-METHOD is used to pass control to +less-and-less-specific :around methods until it reaches the combined +primary methods. The :most-specific-last option doesn't affect the +order of :around methods. And, as I mentioned before, the built-in +method combinations don't support :before or :after methods. + + Like the standard method combination, these method combinations don't +allow you to do anything you couldn't do "by hand." Rather, they allow +you to express what you want and let the language take care of wiring +everything together for you, making your code both more concise and more +expressive. + + That said, probably 99 percent of the time, the standard method +combination will be exactly what you want. Of the remaining 1 percent, +probably 99 percent of them will be handled by one of the simple +built-in method combinations. If you run into one of the 1 percent of 1 +percent of cases where none of the built-in combinations suffices, you +can look up DEFINE-METHOD-COMBINATION in your favorite Common Lisp +reference. + + +File: pcl.info, Node: 16-8, Next: 16-9, Prev: 16-7, Up: Chapter 16 + +Multimethods +============ + +Methods that explicitly specialize more than one of the generic +function's required parameters are called multimethods. Multimethods +are where generic functions and message passing really part ways. +Multimethods don't fit into message-passing languages because they don't +belong to a particular class; instead, each multimethod defines a part +of the implementations of a given generic function that applies when the +generic function is invoked with arguments that match all the method's +specialized parameters. + +* Menu: + +* 16-8-1:: Multimethods vs. Method Overloading + + Multimethods are perfect for all those situations where, in a +message-passing language, you struggle to decide to which class a +certain behavior ought to belong. Is the sound a drum makes when it's +hit with a drumstick a function of what kind of drum it is or what kind +of stick you use to hit it? Both, of course. To model this situation +in Common Lisp, you simply define a generic function beat that takes two +arguments. + + (defgeneric beat (drum stick) + (:documentation + "Produce a sound by hitting the given drum with the given stick.")) + + Then you can define various multimethods to implement beat for the +combinations you care about. For example: + + (defmethod beat ((drum snare-drum) (stick wooden-drumstick)) ...) + (defmethod beat ((drum snare-drum) (stick brush)) ...) + (defmethod beat ((drum snare-drum) (stick soft-mallet)) ...) + (defmethod beat ((drum tom-tom) (stick wooden-drumstick)) ...) + (defmethod beat ((drum tom-tom) (stick brush)) ...) + (defmethod beat ((drum tom-tom) (stick soft-mallet)) ...) + + Multimethods don't help with the combinatorial explosion-if you need +to model five kinds of drums and six kinds of sticks, and every +combination makes a different sound, there's no way around it; you need +thirty different methods to implement all the combinations, with or +without multimethods. What multimethods do save you from is having to +write a bunch of dispatching code by letting you use the same built-in +polymorphic dispatching that's so useful when dealing with methods +specialized on a single parameter. (1) + + Multimethods also save you from having to tightly couple one set of +classes with the other. In the drum/stick example, nothing requires the +implementation of the drum classes to know about the various classes of +drumstick, and nothing requires the drumstick classes to know anything +about the various classes of drum. The multimethods connect the +otherwise independent classes to describe their joint behavior without +requiring any cooperation from the classes themselves. + + ---------- Footnotes ---------- + + (1) 11 + + +File: pcl.info, Node: 16-8-1, Next: 16-8, Prev: 16-8, Up: 16-8 + +Multimethods vs. Method Overloading +=================================== + +Programmers used to statically typed message-passing languages such as +Java and C++ may think multimethods sound similar to a feature of those +languages called method overloading. However, these two language +features are actually quite different since overloaded methods are +chosen at compile time, based on the compile-time type of the arguments, +not at runtime. To see how this works, consider the following two Java +classes: + + public class A { + public void foo(A a) { System.out.println("A/A"); } + public void foo(B b) { System.out.println("A/B"); } + } + public class B extends A { + public void foo(A a) { System.out.println("B/A"); } + public void foo(B b) { System.out.println("B/B"); } + } + + Now consider what happens when you run the main method from this +class. + + public class Main { + public static void main(String[] argv) { + A obj = argv[0].equals("A") ? new A() : new B(); + obj.foo(obj); + } + } + + When you tell Main to instantiate an A, it prints "A/A" as you'd +probably expect. + + bash$ java com.gigamonkeys.Main A + A/A + + However, if you tell Main to instantiate a B, then the true type of +obj is taken into account for only half the dispatching. + + bash$ java com.gigamonkeys.Main B + B/A + + If overloaded methods worked like Common Lisp's multimethods, then +that would print "B/B" instead. It is possible to implement multiple +dispatch by hand in message-passing languages, but this runs against the +grain of the message-passing model since the code in a multiply +dispatched method doesn't belong to any one class. + + +File: pcl.info, Node: 16-9, Next: Chapter 17, Prev: 16-8, Up: Chapter 16 + +To Be Continued . . . +===================== + +I've covered the basics-and a bit beyond-of generic functions, the verbs +of Common Lisp's object system. In the next chapter I'll show you how +to define your own classes. + + +File: pcl.info, Node: Chapter 17, Next: Chapter 18, Prev: Chapter 16, Up: Top + +17. Object Reorientation: Classes +================================= + +If generic functions are the verbs of the object system, classes are the +nouns. As I mentioned in the previous chapter, all values in a Common +Lisp program are instances of some class. Furthermore, all classes are +organized into a single hierarchy rooted at the class T. + + The class hierarchy consists of two major families of classes, +built-in and user-defined classes. Classes that represent the data +types you've been learning about up until now, classes such as INTEGER, +STRING, and LIST, are all built-in. They live in their own section of +the class hierarchy, arranged into appropriate sub- and superclass +relationships, and are manipulated by the functions I've been discussing +for much of the book up until now. You can't subclass these classes, +but, as you saw in the previous chapter, you can define methods that +specialize on them, effectively extending the behavior of those classes. +(1) + + But when you want to create new nouns-for instance, the classes used +in the previous chapter for representing bank accounts-you need to +define your own classes. That's the subject of this chapter. + +* Menu: + +* 17-1:: DEFCLASS +* 17-2:: Slot Specifiers +* 17-3:: Object Initialization +* 17-4:: Accessor Functions +* 17-5:: WITH-SLOTS and WITH-ACCESSORS +* 17-6:: Class-Allocated Slots +* 17-7:: Slots and Inheritance +* 17-8:: Multiple Inheritance +* 17-9:: Good Object-Oriented Design + + ---------- Footnotes ---------- + + (1) Defining new methods for an existing class may seem strange to +folks used to statically typed languages such as C++ and Java in which +all the methods of a class must be defined as part of the class +definition. But programmers with experience in dynamically typed +object-oriented languages such as Smalltalk and Objective C will find +nothing strange about adding new behaviors to existing classes. + + +File: pcl.info, Node: 17-1, Next: 17-2, Prev: Chapter 17, Up: Chapter 17 + +DEFCLASS +======== + +You create user-defined classes with the DEFCLASS macro. Because +behaviors are associated with a class by defining generic functions and +methods specialized on the class, DEFCLASS is responsible only for +defining the class as a data type. + + The three facets of the class as a data type are its name, its +relation to other classes, and the names of the slots that make up +instances of the class. (1) The basic form of a DEFCLASS is quite +simple. + + (defclass name (direct-superclass-name*) + (slot-specifier*)) + +* Menu: + +* 17-1-1:: What Are "User-Defined Classes"? + + As with functions and variables, you can use any symbol as the name +of a new class. (2) Class names are in a separate namespace from both +functions and variables, so you can have a class, function, and variable +all with the same name. You'll use the class name as the argument to +MAKE-INSTANCE, the function that creates new instances of user-defined +classes. + + The direct-superclass-names specify the classes of which the new +class is a subclass. If no superclasses are listed, the new class will +directly subclass STANDARD-OBJECT. Any classes listed must be other +user-defined classes, which ensures that each new class is ultimately +descended from STANDARD-OBJECT. STANDARD-OBJECT in turn subclasses T, so +all user-defined classes are part of the single class hierarchy that +also contains all the built-in classes. + + Eliding the slot specifiers for a moment, the DEFCLASS forms of some +of the classes you used in the previous chapter might look like this: + + (defclass bank-account () ...) + + (defclass checking-account (bank-account) ...) + + (defclass savings-account (bank-account) ...) + + I'll discuss in the section "Multiple Inheritance" what it means to +list more than one direct superclass in direct-superclass-names. + + ---------- Footnotes ---------- + + (1) In other object-oriented languages, slots might be called fields, +member variables, or attributes. + + (2) As when naming functions and variables, it's not quite true that +you can use any symbol as a class name-you can't use names defined by +the language standard. You'll see in Chapter 21 how to avoid such name +conflicts. + + +File: pcl.info, Node: 17-1-1, Next: 17-1, Prev: 17-1, Up: 17-1 + +What Are "User-Defined Classes"? +================================ + +The term user-defined classes isn't a term from the language +standard-technically what I'm talking about when I say user-defined +classes are classes that subclass STANDARD-OBJECT and whose metaclass is +STANDARD-CLASS. But since I'm not going to talk about the ways you can +define classes that don't subclass STANDARD-OBJECT and whose metaclass +isn't STANDARD-CLASS, you don't really have to worry about that. +User-defined isn't a perfect term for these classes since the +implementation may define certain classes the same way. However, to +call them standard classes would be even more confusing since the +built-in classes, such as INTEGER and STRING, are just as standard, if +not more so, because they're defined by the language standard but they +don't extend STANDARD-OBJECT. To further complicate matters, it's also +possible for users to define new classes that don't subclass +STANDARD-OBJECT. In particular, the macro DEFSTRUCT also defines new +classes. But that's largely for backward compatibility-DEFSTRUCT +predated CLOS and was retrofitted to define classes when CLOS was +integrated into the language. But the classes it creates are fairly +limited compared to DEFCLASSed classes. So in this chapter I'll be +discussing only classes defined with DEFCLASS that use the default +metaclass of STANDARD-CLASS, and I'll refer to them as user-defined for +lack of a better term. + + +File: pcl.info, Node: 17-2, Next: 17-3, Prev: 17-1, Up: Chapter 17 + +Slot Specifiers +=============== + +The bulk of a DEFCLASS form consists of the list of slot specifiers. +Each slot specifier defines a slot that will be part of each instance of +the class. Each slot in an instance is a place that can hold a value, +which can be accessed using the SLOT-VALUE function. SLOT-VALUE takes +an object and the name of a slot as arguments and returns the value of +the named slot in the given object. It can be used with SETF to set the +value of a slot in an object. + + A class also inherits slot specifiers from its superclasses, so the +set of slots actually present in any object is the union of all the +slots specified in a class's DEFCLASS form and those specified in all +its superclasses. + + At the minimum, a slot specifier names the slot, in which case the +slot specifier can be just a name. For instance, you could define a +bank-account class with two slots, customer-name and balance, like this: + + (defclass bank-account () + (customer-name + balance)) + + Each instance of this class will contain two slots, one to hold the +name of the customer the account belongs to and another to hold the +current balance. With this definition, you can create new bank-account +objects using MAKE-INSTANCE. + + (make-instance 'bank-account) ==> #<BANK-ACCOUNT #x724b93ba> + + The argument to MAKE-INSTANCE is the name of the class to +instantiate, and the value returned is the new object. (1) The printed +representation of an object is determined by the generic function +PRINT-OBJECT. In this case, the applicable method will be one provided +by the implementation, specialized on STANDARD-OBJECT. Since not every +object can be printed so that it can be read back, the STANDARD-OBJECT +print method uses the #<> syntax, which will cause the reader to signal +an error if it tries to read it. The rest of the representation is +implementation-defined but will typically be something like the output +just shown, including the name of the class and some distinguishing +value such as the address of the object in memory. In Chapter 23 you'll +see an example of how to define a method on PRINT-OBJECT to make objects +of a certain class be printed in a more informative form. + + Using the definition of bank-account just given, new objects will be +created with their slots unbound. Any attempt to get the value of an +unbound slot signals an error, so you must set a slot before you can +read it. + + (defparameter *account* (make-instance 'bank-account)) ==> *ACCOUNT* + (setf (slot-value *account* 'customer-name) "John Doe") ==> "John Doe" + (setf (slot-value *account* 'balance) 1000) ==> 1000 + + Now you can access the value of the slots. + + (slot-value *account* 'customer-name) ==> "John Doe" + (slot-value *account* 'balance) ==> 1000 + + ---------- Footnotes ---------- + + (1) The argument to MAKE-INSTANCE can actually be either the name of +the class or a class object returned by the function CLASS-OF or +FIND-CLASS. + + +File: pcl.info, Node: 17-3, Next: 17-4, Prev: 17-2, Up: Chapter 17 + +Object Initialization +===================== + +Since you can't do much with an object with unbound slots, it'd be nice +to be able to create objects with their slots already initialized. +Common Lisp provides three ways to control the initial value of slots. +The first two involve adding options to the slot specifier in the +DEFCLASS form: with the :initarg option, you can specify a name that can +then be used as a keyword parameter to MAKE-INSTANCE and whose argument +will be stored in the slot. A second option, :initform, lets you +specify a Lisp expression that will be used to compute a value for the +slot if no :initarg argument is passed to MAKE-INSTANCE. Finally, for +complete control over the initialization, you can define a method on the +generic function INITIALIZE-INSTANCE, which is called by MAKE-INSTANCE. +(1) + + A slot specifier that includes options such as :initarg or :initform +is written as a list starting with the name of the slot followed by the +options. For example, if you want to modify the definition of +bank-account to allow callers of MAKE-INSTANCE to pass the customer name +and the initial balance and to provide a default value of zero dollars +for the balance, you'd write this: + + (defclass bank-account () + ((customer-name + :initarg :customer-name) + (balance + :initarg :balance + :initform 0))) + + Now you can create an account and specify the slot values at the same +time. + + (defparameter *account* + (make-instance 'bank-account :customer-name "John Doe" :balance 1000)) + + (slot-value *account* 'customer-name) ==> "John Doe" + (slot-value *account* 'balance) ==> 1000 + + If you don't supply a :balance argument to MAKE-INSTANCE, the +SLOT-VALUE of balance will be computed by evaluating the form specified +with the :initform option. But if you don't supply a :customer-name +argument, the customer-name slot will be unbound, and an attempt to read +it before you set it will signal an error. + + (slot-value (make-instance 'bank-account) 'balance) ==> 0 + (slot-value (make-instance 'bank-account) 'customer-name) ==> error + + If you want to ensure that the customer name is supplied when the +account is created, you can signal an error in the initform since it +will be evaluated only if an initarg isn't supplied. You can also use +initforms that generate a different value each time they're +evaluated-the initform is evaluated anew for each object. To experiment +with these techniques, you can modify the customer-name slot specifier +and add a new slot, account-number, that's initialized with the value of +an ever-increasing counter. + + (defvar *account-numbers* 0) + + (defclass bank-account () + ((customer-name + :initarg :customer-name + :initform (error "Must supply a customer name.")) + (balance + :initarg :balance + :initform 0) + (account-number + :initform (incf *account-numbers*)))) + + Most of the time the combination of :initarg and :initform options +will be sufficient to properly initialize an object. However, while an +initform can be any Lisp expression, it has no access to the object +being initialized, so it can't initialize one slot based on the value of +another. For that you need to define a method on the generic function +INITIALIZE-INSTANCE. + + The primary method on INITIALIZE-INSTANCE specialized on +STANDARD-OBJECT takes care of initializing slots based on their :initarg +and :initform options. Since you don't want to disturb that, the most +common way to add custom initialization code is to define an :after +method specialized on your class. (2) For instance, suppose you want to +add a slot account-type that needs to be set to one of the values :gold, +:silver, or :bronze based on the account's initial balance. You might +change your class definition to this, adding the account-type slot with +no options: + + (defclass bank-account () + ((customer-name + :initarg :customer-name + :initform (error "Must supply a customer name.")) + (balance + :initarg :balance + :initform 0) + (account-number + :initform (incf *account-numbers*)) + account-type)) + + Then you can define an :after method on INITIALIZE-INSTANCE that sets +the account-type slot based on the value that has been stored in the +balance slot. (3) + + (defmethod initialize-instance :after ((account bank-account) &key) + (let ((balance (slot-value account 'balance))) + (setf (slot-value account 'account-type) + (cond + ((>= balance 100000) :gold) + ((>= balance 50000) :silver) + (t :bronze))))) + + The &key in the parameter list is required to keep the method's +parameter list congruent with the generic function's-the parameter list +specified for the INITIALIZE-INSTANCE generic function includes &key in +order to allow individual methods to supply their own keyword parameters +but doesn't require any particular ones. Thus, every method must +specify &key even if it doesn't specify any &key parameters. + + On the other hand, if an INITIALIZE-INSTANCE method specialized on a +particular class does specify a &key parameter, that parameter becomes a +legal parameter to MAKE-INSTANCE when creating an instance of that +class. For instance, if the bank sometimes pays a percentage of the +initial balance as a bonus when an account is opened, you could +implement that using a method on INITIALIZE-INSTANCE that takes a +keyword argument to specify the percentage of the bonus like this: + + (defmethod initialize-instance :after ((account bank-account) + &key opening-bonus-percentage) + (when opening-bonus-percentage + (incf (slot-value account 'balance) + (* (slot-value account 'balance) (/ opening-bonus-percentage 100))))) + + By defining this INITIALIZE-INSTANCE method, you make +:opening-bonus-percentage a legal argument to MAKE-INSTANCE when +creating a bank-account object. + + CL-USER> (defparameter *acct* (make-instance + 'bank-account + :customer-name "Sally Sue" + :balance 1000 + :opening-bonus-percentage 5)) + *ACCT* + CL-USER> (slot-value *acct* 'balance) + 1050 + + ---------- Footnotes ---------- + + (1) Another way to affect the values of slots is with the +:default-initargs option to DEFCLASS. This option is used to specify +forms that will be evaluated to provide arguments for specific +initialization parameters that aren't given a value in a particular call +to MAKE-INSTANCE. You don't need to worry about :default-initargs for +now. + + (2) Adding an :after method to INITIALIZE-INSTANCE is the Common Lisp +analog to defining a constructor in Java or C++ or an __init__ method in +Python. + + (3) One mistake you might make until you get used to using auxiliary +methods is to define a method on INITIALIZE-INSTANCE but without the +:after qualifier. If you do that, you'll get a new primary method that +shadows the default one. You can remove the unwanted primary method +using the functions REMOVE-METHOD and FIND-METHOD. Certain development +environments may provide a graphical user interface to do the same +thing. + + (remove-method #'initialize-instance + (find-method #'initialize-instance () (list (find-class 'bank-account)))) + + +File: pcl.info, Node: 17-4, Next: 17-5, Prev: 17-3, Up: Chapter 17 + +Accessor Functions +================== + +Between MAKE-INSTANCE and SLOT-VALUE, you have all the tools you need +for creating and manipulating instances of your classes. Everything +else you might want to do can be implemented in terms of those two +functions. However, as anyone familiar with the principles of good +object-oriented programming practices knows, directly accessing the +slots (or fields or member variables) of an object can lead to fragile +code. The problem is that directly accessing slots ties your code too +tightly to the concrete structure of your class. For example, suppose +you decide to change the definition of bank-account so that, instead of +storing the current balance as a number, you store a list of +time-stamped withdrawals and deposits. Code that directly accesses the +balance slot will likely break if you change the class definition to +remove the slot or to store the new list in the old slot. On the other +hand, if you define a function, balance, that accesses the slot, you can +redefine it later to preserve its behavior even if the internal +representation changes. And code that uses such a function will +continue to work without modification. + + Another advantage to using accessor functions rather than direct +access to slots via SLOT-VALUE is that they let you limit the ways +outside code can modify a slot. (1) It may be fine for users of the +bank-account class to get the current balance, but you may want all +modifications to the balance to go through other functions you'll +provide, such as deposit and withdraw. If clients know they're supposed +to manipulate objects only through the published functional API, you can +provide a balance function but not make it SETFable if you want the +balance to be read-only. + + Finally, using accessor functions makes your code tidier since it +helps you avoid lots of uses of the rather verbose SLOT-VALUE function. + + It's trivial to define a function that reads the value of the balance +slot. + + (defun balance (account) + (slot-value account 'balance)) + + However, if you know you're going to define subclasses of +bank-account, it might be a good idea to define balance as a generic +function. That way, you can provide different methods on balance for +those subclasses or extend its definition with auxiliary methods. So +you might write this instead: + + (defgeneric balance (account)) + + (defmethod balance ((account bank-account)) + (slot-value account 'balance)) + + As I just discussed, you don't want callers to be able to directly +set the balance, but for other slots, such as customer-name, you may +also want to provide a function to set them. The cleanest way to define +such a function is as a SETF function. + + A SETF function is a way to extend SETF, defining a new kind of place +that it knows how to set. The name of a SETF function is a two-item +list whose first element is the symbol setf and whose second element is +a symbol, typically the name of a function used to access the place the +SETF function will set. A SETF function can take any number of +arguments, but the first argument is always the value to be assigned to +the place. (2) You could, for instance, define a SETF function to set +the customer-name slot in a bank-account like this: + + (defun (setf customer-name) (name account) + (setf (slot-value account 'customer-name) name)) + + After evaluating that definition, an expression like the following +one: + + (setf (customer-name my-account) "Sally Sue") + + will be compiled as a call to the SETF function you just defined with +"Sally Sue" as the first argument and the value of my-account as the +second argument. + + Of course, as with reader functions, you'll probably want your SETF +function to be generic, so you'd actually define it like this: + + (defgeneric (setf customer-name) (value account)) + + (defmethod (setf customer-name) (value (account bank-account)) + (setf (slot-value account 'customer-name) value)) + + And of course you'll also want to define a reader function for +customer-name. + + (defgeneric customer-name (account)) + + (defmethod customer-name ((account bank-account)) + (slot-value account 'customer-name)) + + This allows you to write the following: + + (setf (customer-name *account*) "Sally Sue") ==> "Sally Sue" + + (customer-name *account*) ==> "Sally Sue" + + There's nothing hard about writing these accessor functions, but it +wouldn't be in keeping with The Lisp Way to have to write them all by +hand. Thus, DEFCLASS supports three slot options that allow you to +automatically create reader and writer functions for a specific slot. + + The :reader option specifies a name to be used as the name of a +generic function that accepts an object as its single argument. When +the DEFCLASS is evaluated, the generic function is created, if it +doesn't already exist. Then a method specializing its single argument +on the new class and returning the value of the slot is added to the +generic function. The name can be anything, but it's typical to name it +the same as the slot itself. Thus, instead of explicitly writing the +balance generic function and method as shown previously, you could +change the slot specifier for the balance slot in the definition of +bank-account to this: + + (balance + :initarg :balance + :initform 0 + :reader balance) + + The :writer option is used to create a generic function and method +for setting the value of a slot. The function and method created follow +the requirements for a SETF function, taking the new value as the first +argument and returning it as the result, so you can define a SETF +function by providing a name such as (setf customer-name). For +instance, you could provide reader and writer methods for customer-name +equivalent to the ones you just wrote by changing the slot specifier to +this: + + (customer-name + :initarg :customer-name + :initform (error "Must supply a customer name.") + :reader customer-name + :writer (setf customer-name)) + + Since it's quite common to want both reader and writer functions, +DEFCLASS also provides an option, :accessor, that creates both a reader +function and the corresponding SETF function. So instead of the slot +specifier just shown, you'd typically write this: + + (customer-name + :initarg :customer-name + :initform (error "Must supply a customer name.") + :accessor customer-name) + + Finally, one last slot option you should know about is the +:documentation option, which you can use to provide a string that +documents the purpose of the slot. Putting it all together and adding a +reader method for the account-number and account-type slots, the +DEFCLASS form for the bank-account class would look like this: + + (defclass bank-account () + ((customer-name + :initarg :customer-name + :initform (error "Must supply a customer name.") + :accessor customer-name + :documentation "Customer's name") + (balance + :initarg :balance + :initform 0 + :reader balance + :documentation "Current account balance") + (account-number + :initform (incf *account-numbers*) + :reader account-number + :documentation "Account number, unique within a bank.") + (account-type + :reader account-type + :documentation "Type of account, one of :gold, :silver, or :bronze."))) + + ---------- Footnotes ---------- + + (1) Of course, providing an accessor function doesn't really limit +anything since other code can still use SLOT-VALUE to get at slots +directly. Common Lisp doesn't provide strict encapsulation of slots the +way some languages such as C++ and Java do; however, if the author of a +class provides accessor functions and you ignore them, using SLOT-VALUE +instead, you had better know what you're doing. It's also possible to +use the package system, which I'll discuss in Chapter 21, to make it +even more obvious that certain slots aren't to be accessed directly, by +not exporting the names of the slots. + + (2) One consequence of defining a SETF function-say, (setf foo)-is +that if you also define the corresponding accessor function, foo in this +case, you can use all the modify macros built upon SETF, such as INCF, +DECF, PUSH, and POP, on the new kind of place. + + +File: pcl.info, Node: 17-5, Next: 17-6, Prev: 17-4, Up: Chapter 17 + +WITH-SLOTS and WITH-ACCESSORS +============================= + +While using accessor functions will make your code easier to maintain, +they can still be a bit verbose. And there will be times, when writing +methods that implement the low-level behaviors of a class, that you may +specifically want to access slots directly to set a slot that has no +writer function or to get at the slot value without causing any +auxiliary methods defined on the reader function to run. + + This is what SLOT-VALUE is for; however, it's still quite verbose. +To make matters worse, a function or method that accesses the same slot +several times can become clogged with calls to accessor functions and +SLOT-VALUE. For example, even a fairly simple method such as the +following, which assesses a penalty on a bank-account if its balance +falls below a certain minimum, is cluttered with calls to balance and +SLOT-VALUE: + + (defmethod assess-low-balance-penalty ((account bank-account)) + (when (< (balance account) *minimum-balance*) + (decf (slot-value account 'balance) (* (balance account) .01)))) + + And if you decide you want to directly access the slot value in order +to avoid running auxiliary methods, it gets even more cluttered. + + (defmethod assess-low-balance-penalty ((account bank-account)) + (when (< (slot-value account 'balance) *minimum-balance*) + (decf (slot-value account 'balance) (* (slot-value account 'balance) .01)))) + + Two standard macros, WITH-SLOTS and WITH-ACCESSORS, can help tidy up +this clutter. Both macros create a block of code in which simple +variable names can be used to refer to slots on a particular object. +WITH-SLOTS provides direct access to the slots, as if by SLOT-VALUE, +while WITH-ACCESSORS provides a shorthand for accessor methods. + + The basic form of WITH-SLOTS is as follows: + + (with-slots (slot*) instance-form + body-form*) + + Each element of slots can be either the name of a slot, which is also +used as a variable name, or a two-item list where the first item is a +name to use as a variable and the second is the name of the slot. The +instance-form is evaluated once to produce the object whose slots will +be accessed. Within the body, each occurrence of one of the variable +names is translated to a call to SLOT-VALUE with the object and the +appropriate slot name as arguments. (1) Thus, you can write +assess-low-balance-penalty like this: + + (defmethod assess-low-balance-penalty ((account bank-account)) + (with-slots (balance) account + (when (< balance *minimum-balance*) + (decf balance (* balance .01))))) + + or, using the two-item list form, like this: + + (defmethod assess-low-balance-penalty ((account bank-account)) + (with-slots ((bal balance)) account + (when (< bal *minimum-balance*) + (decf bal (* bal .01))))) + + If you had defined balance with an :accessor rather than just a +:reader, then you could also use WITH-ACCESSORS. The form of +WITH-ACCESSORS is the same as WITH-SLOTS except each element of the slot +list is a two-item list containing a variable name and the name of an +accessor function. Within the body of WITH-ACCESSORS, a reference to +one of the variables is equivalent to a call to the corresponding +accessor function. If the accessor function is SETFable, then so is the +variable. + + (defmethod assess-low-balance-penalty ((account bank-account)) + (with-accessors ((balance balance)) account + (when (< balance *minimum-balance*) + (decf balance (* balance .01))))) + + The first balance is the name of the variable, and the second is the +name of the accessor function; they don't have to be the same. You +could, for instance, write a method to merge two accounts using two +calls to WITH-ACCESSORS, one for each account. + + (defmethod merge-accounts ((account1 bank-account) (account2 bank-account)) + (with-accessors ((balance1 balance)) account1 + (with-accessors ((balance2 balance)) account2 + (incf balance1 balance2) + (setf balance2 0)))) + + The choice of whether to use WITH-SLOTS versus WITH-ACCESSORS is the +same as the choice between SLOT-VALUE and an accessor function: +low-level code that provides the basic functionality of a class may use +SLOT-VALUE or WITH-SLOTS to directly manipulate slots in ways not +supported by accessor functions or to explicitly avoid the effects of +auxiliary methods that may have been defined on the accessor functions. +But you should generally use accessor functions or WITH-ACCESSORS unless +you have a specific reason not to. + + ---------- Footnotes ---------- + + (1) The "variable" names provided by WITH-SLOTS and WITH-ACCESSORS +aren't true variables; they're implemented using a special kind of +macro, called a symbol macro, that allows a simple name to expand into +arbitrary code. Symbol macros were introduced into the language to +support WITH-SLOTS and WITH-ACCESSORS, but you can also use them for +your own purposes. I'll discuss them in a bit more detail in Chapter +20. + + +File: pcl.info, Node: 17-6, Next: 17-7, Prev: 17-5, Up: Chapter 17 + +Class-Allocated Slots +===================== + +The last slot option you need to know about is :allocation. The value +of :allocation can be either :instance or :class and defaults to +:instance if not specified. When a slot has :class allocation, the slot +has only a single value, which is stored in the class and shared by all +instances. + + However, :class slots are accessed the same as :instance +slots-they're accessed with SLOT-VALUE or an accessor function, which +means you can access the slot value only through an instance of the +class even though it isn't actually stored in the instance. The +:initform and :initarg options have essentially the same effect except +the initform is evaluated once when the class is defined rather than +each time an instance is created. On the other hand, passing an initarg +to MAKE-INSTANCE will set the value, affecting all instances of the +class. + + Because you can't get at a class-allocated slot without an instance +of the class, class-allocated slots aren't really equivalent to static +or class fields in languages such as Java, C++, and Python. (1) Rather, +class-allocated slots are used primarily to save space; if you're going +to create many instances of a class and all instances are going to have +a reference to the same object-say, a pool of shared resources-you can +save the cost of each instance having its own reference by making the +slot class-allocated. + + ---------- Footnotes ---------- + + (1) The Meta Object Protocol (MOP), which isn't part of the language +standard but is supported by most Common Lisp implementations, provides +a function, class-prototype, that returns an instance of a class that +can be used to access class slots. If you're using an implementation +that supports the MOP and happen to be translating some code from +another language that makes heavy use of static or class fields, this +may give you a way to ease the translation. But it's not all that +idiomatic. + + +File: pcl.info, Node: 17-7, Next: 17-8, Prev: 17-6, Up: Chapter 17 + +Slots and Inheritance +===================== + +As I discussed in the previous chapter, classes inherit behavior from +their superclasses thanks to the generic function machinery-a method +specialized on class A is applicable not only to direct instances of A +but also to instances of A's subclasses. Classes also inherit slots +from their superclasses, but the mechanism is slightly different. + + In Common Lisp a given object can have only one slot with a +particular name. However, it's possible that more than one class in the +inheritance hierarchy of a given class will specify a slot with a +particular name. This can happen either because a subclass includes a +slot specifier with the same name as a slot specified in a superclass or +because multiple superclasses specify slots with the same name. + + Common Lisp resolves these situations by merging all the specifiers +with the same name from the new class and all its superclasses to create +a single specifier for each unique slot name. When merging specifiers, +different slot options are treated differently. For instance, since a +slot can have only a single default value, if multiple classes specify +an :initform, the new class uses the one from the most specific class. +This allows a subclass to specify a different default value than the one +it would otherwise inherit. + + On the other hand, :initargs needn't be exclusive-each :initarg +option in a slot specifier creates a keyword parameter that can be used +to initialize the slot; multiple parameters don't create a conflict, so +the new slot specifier contains all the :initargs. Callers of +MAKE-INSTANCE can use any of the :initargs to initialize the slot. If a +caller passes multiple keyword arguments that initialize the same slot, +then the leftmost argument in the call to MAKE-INSTANCE is used. + + Inherited :reader, :writer, and :accessor options aren't included in +the merged slot specifier since the methods created by the superclass's +DEFCLASS will already apply to the new class. The new class can, +however, create its own accessor functions by supplying its own :reader, +:writer, or :accessor options. + + Finally, the :allocation option is, like :initform, determined by the +most specific class that specifies the slot. Thus, it's possible for +all instances of one class to share a :class slot while instances of a +subclass may each have their own :instance slot of the same name. And a +sub-subclass may then redefine it back to :class slot, so all instances +of that class will again share a single slot. In the latter case, the +slot shared by instances of the sub-subclass is different than the slot +shared by the original superclass. + + For instance, suppose you have these classes: + + (defclass foo () + ((a :initarg :a :initform "A" :accessor a) + (b :initarg :b :initform "B" :accessor b))) + + (defclass bar (foo) + ((a :initform (error "Must supply a value for a")) + (b :initarg :the-b :accessor the-b :allocation :class))) + + When instantiating the class bar, you can use the inherited initarg, +:a, to specify a value for the slot a and, in fact, must do so to avoid +an error, since the :initform supplied by bar supersedes the one +inherited from foo. To initialize the b slot, you can use either the +inherited initarg :b or the new initarg :the-b. However, because of the +:allocation option on the b slot in bar, the value specified will be +stored in the slot shared by all instances of bar. That same slot can +be accessed either with the method on the generic function b that +specializes on foo or with the new method on the generic function the-b +that specializes directly on bar. To access the a slot on either a foo +or a bar, you'll continue to use the generic function a. + + Usually merging slot definitions works quite nicely. However, it's +important to be aware when using multiple inheritance that two unrelated +slots that happen to have the same name can be merged into a single slot +in the new class. Thus, methods specialized on different classes could +end up manipulating the same slot when applied to a class that extends +those classes. This isn't much of a problem in practice since, as +you'll see in Chapter 21, you can use the package system to avoid +collisions between names in independently developed pieces of code. + + +File: pcl.info, Node: 17-8, Next: 17-9, Prev: 17-7, Up: Chapter 17 + +Multiple Inheritance +==================== + +All the classes you've seen so far have had only a single direct +superclass. Common Lisp also supports multiple inheritance-a class can +have multiple direct superclasses, inheriting applicable methods and +slot specifiers from all of them. + + Multiple inheritance doesn't dramatically change any of the +mechanisms of inheritance I've discussed so far-every user-defined class +already has multiple superclasses since they all extend STANDARD-OBJECT, +which extends T, and so have at least two superclasses. The wrinkle +that multiple inheritance adds is that a class can have more than one +direct superclass. This complicates the notion of class specificity +that's used both when building the effective methods for a generic +function and when merging inherited slot specifiers. + + That is, if classes could have only a single direct superclass, +ordering classes by specificity would be trivial-a class and all its +superclasses could be ordered in a straight line starting from the class +itself, followed by its single direct superclass, followed by its direct +superclass, all the way up to T. But when a class has multiple direct +superclasses, those superclasses are typically not related to each +other-indeed, if one was a subclass of another, you wouldn't need to +subclass both directly. In that case, the rule that subclasses are more +specific than their superclasses isn't enough to order all the +superclasses. So Common Lisp uses a second rule that sorts unrelated +superclasses according to the order they're listed in the DEFCLASS's +direct superclass list-classes earlier in the list are considered more +specific than classes later in the list. This rule is admittedly +somewhat arbitrary but does allow every class to have a linear class +precedence list, which can be used to determine which superclasses +should be considered more specific than others. Note, however, there's +no global ordering of classes-each class has its own class precedence +list, and the same classes can appear in different orders in different +classes' class precedence lists. + + To see how this works, let's add a class to the banking app: +money-market-account. A money market account combines the +characteristics of a checking account and a savings account: a customer +can write checks against it, but it also earns interest. You might +define it like this: + + (defclass money-market-account (checking-account savings-account) ()) + + The class precedence list for money-market-account will be as +follows: + + (money-market-account + checking-account + savings-account + bank-account + standard-object + t) + + Note how this list satisfies both rules: every class appears before +all its superclasses, and checking-account and savings-account appear in +the order specified in DEFCLASS. + + This class defines no slots of its own but will inherit slots from +both of its direct superclasses, including the slots they inherit from +their superclasses. Likewise, any method that's applicable to any class +in the class precedence list will be applicable to a +money-market-account object. Because all slot specifiers for the same +slot are merged, it doesn't matter that money-market-account inherits +the same slot specifiers from bank-account twice. (1) + + Multiple inheritance is easiest to understand when the different +superclasses provide completely independent slots and behaviors. For +instance, money-market-account will inherit slots and behaviors for +dealing with checks from checking-account and slots and behaviors for +computing interest from savings-account. You don't have to worry about +the class precedence list for methods and slots inherited from only one +superclass or another. + + However, it's also possible to inherit different methods for the same +generic function from different superclasses. In that case, the class +precedence list does come into play. For instance, suppose the banking +application defined a generic function print-statement used to generate +monthly statements. Presumably there would already be methods for +print-statement specialized on both checking-account and +savings-account. Both of these methods will be applicable to instances +of money-market-account, but the one specialized on checking-account +will be considered more specific than the one on savings-account because +checking-account precedes savings-account in money-market-account's +class precedence list. + + Assuming the inherited methods are all primary methods and you +haven't defined any other methods, the method specialized on +checking-account will be used if you invoke print-statement on +money-market-account. However, that won't necessarily give you the +behavior you want since you probably want a money market account's +statement to contain elements of both a checking account and a savings +account statement. + + You can modify the behavior of print-statement for +money-market-accounts in a couple ways. One straightforward way is to +define a new primary method specialized on money-market-account. This +gives you the most control over the new behavior but will probably +require more new code than some other options I'll discuss in a moment. +The problem is that while you can use CALL-NEXT-METHOD to call "up" to +the next most specific method, namely, the one specialized on +checking-account, there's no way to invoke a particular less-specific +method, such as the one specialized on savings-account. Thus, if you +want to be able to reuse the code that prints the savings-account part +of the statement, you'll need to break that code into a separate +function, which you can then call directly from both the +money-market-account and savings-account print-statement methods. + + Another possibility is to write the primary methods of all three +classes to call CALL-NEXT-METHOD. Then the method specialized on +money-market-account will use CALL-NEXT-METHOD to invoke the method +specialized on checking-account. When that method calls +CALL-NEXT-METHOD, it will result in running the savings-account method +since it will be the next most specific method according to +money-market-account's class precedence list. + + Of course, if you're going to rely on a coding convention-that every +method calls CALL-NEXT-METHOD-to ensure all the applicable methods run +at some point, you should think about using auxiliary methods instead. +In this case, instead of defining primary methods on print-statement for +checking-account and savings-account, you can define those methods as +:after methods, defining a single primary method on bank-account. Then, +print-statement, called on a money-market-account, will print a basic +account statement, output by the primary method specialized on +bank-account, followed by details output by the :after methods +specialized on savings-account and checking-account. And if you want to +add details specific to money-market-accounts, you can define an :after +method specialized on money-market-account, which will run last of all. + + The advantage of using auxiliary methods is that it makes it quite +clear which methods are primarily responsible for implementing the +generic function and which ones are only contributing additional bits of +functionality. The disadvantage is that you don't get fine-grained +control over the order in which the auxiliary methods run-if you wanted +the checking-account part of the statement to print before the +savings-account part, you'd have to change the order in which the +money-market-account subclasses those classes. But that's a fairly +dramatic change that could affect other methods and inherited slots. In +general, if you find yourself twiddling the order of the direct +superclass list as a way of fine-tuning the behavior of specific +methods, you probably need to step back and rethink your approach. + + On the other hand, if you don't care exactly what the order is but +want it to be consistent across several generic functions, then using +auxiliary methods may be just the thing. For example, if in addition to +print-statement you have a print-detailed-statement generic function, +you can implement both functions using :after methods on the various +subclasses of bank-account, and the order of the parts of both a regular +and a detailed statement will be the same. + + ---------- Footnotes ---------- + + (1) In other words, Common Lisp doesn't suffer from the diamond +inheritance problem the way, say, C++ does. In C++, when one class +subclasses two classes that both inherit a member variable from a common +superclass, the bottom class inherits the member variable twice, leading +to no end of confusion. + + +File: pcl.info, Node: 17-9, Next: Chapter 18, Prev: 17-8, Up: Chapter 17 + +Good Object-Oriented Design +=========================== + +That's about it for the main features of Common Lisp's object system. +If you have lots of experience with object-oriented programming, you can +probably see how Common Lisp's features can be used to implement good +object-oriented designs. However, if you have less experience with +object orientation, you may need to spend some time absorbing the +object-oriented way of thinking. Unfortunately, that's a fairly large +topic and beyond the scope of this book. Or, as the man page for Perl's +object system puts it, "Now you need just to go off and buy a book about +object-oriented design methodology and bang your forehead with it for +the next six months or so." Or you can wait for some of the practical +chapters, later in this book, where you'll see several examples of how +these features are used in practice. For now, however, you're ready to +take a break from all this theory of object orientation and turn to the +rather different topic of how to make good use of Common Lisp's +powerful, but sometimes cryptic, FORMAT function. + + +File: pcl.info, Node: Chapter 18, Next: Chapter 19, Prev: Chapter 17, Up: Top + +18. A Few FORMAT Recipes +======================== + +Common Lisp's FORMAT function is-along with the extended LOOP macro-one +of the two Common Lisp features that inspires a strong emotional +response in a lot of Common Lisp users. Some love it; others hate it. +(1) + + FORMAT's fans love it for its great power and concision, while its +detractors hate it because of the potential for misuse and its opacity. +Complex FORMAT control strings sometimes bear a suspicious resemblance +to line noise, but FORMAT remains popular with Common Lispers who like +to be able to generate little bits of human-readable output without +having to clutter their code with lots of output-generating code. While +FORMAT's control strings can be cryptic, at least a single FORMAT +expression doesn't clutter things up too badly. For instance, suppose +you want to print the values in a list delimited with commas. You could +write this: + + (loop for cons on list + do (format t "~a" (car cons)) + when (cdr cons) do (format t ", ")) + + That's not too bad, but anyone reading this code has to mentally +parse it just to figure out that all it's doing is printing the contents +of list to standard output. On the other hand, you can tell at a glance +that the following expression is printing list, in some form, to +standard output: + + (format t "~{~a~^, ~}" list) + + If you care exactly what form the output will take, then you'll have +to examine the control string, but if all you want is a first-order +approximation of what this line of code is doing, that's immediately +available. + + At any rate, you should have at least a reading knowledge of FORMAT, +and it's worth getting a sense of what it can do before you affiliate +yourself with the pro- or anti-FORMAT camp. It's also important to +understand at least the basics of FORMAT because other standard +functions, such as the condition-signaling functions discussed in the +next chapter, use FORMAT-style control strings to generate output. + + To further complicate matters, FORMAT supports three quite different +kinds of formatting: printing tables of data, pretty-printing +s-expressions, and generating human-readable messages with interpolated +values. Printing tables of data as text is a bit passé these days; +it's one of those reminders that Lisp is nearly as old as FORTRAN. In +fact, several of the directives you can use to print floating-point +values in fixed-width fields were based quite directly on FORTRAN edit +descriptors, which are used in FORTRAN to read and print columns of data +arranged in fixed-width fields. However, using Common Lisp as a FORTRAN +replacement is beyond the scope of this book, so I won't discuss those +aspects of FORMAT. + + Pretty-printing is likewise beyond the scope of this book-not because +it's passé but just because it's too big a topic. Briefly, the Common +Lisp pretty printer is a customizable system for printing +block-structured data such as-but not limited to-s-expressions while +varying indentation and dynamically adding line breaks as needed. It's +a great thing when you need it, but it's not often needed in day-to-day +programming. (2) + + Instead, I'll focus on the parts of FORMAT you can use to generate +human-readable strings with interpolated values. Even limiting the +scope in that way, there's still a fair bit to cover. You shouldn't +feel obliged to remember every detail described in this chapter. You +can get quite far with just a few FORMAT idioms. I'll describe the most +important features of FORMAT first; it's up to you how much of a FORMAT +wizard you want to become. + +* Menu: + +* 18-1:: The FORMAT Function +* 18-2:: FORMAT Directives +* 18-3:: Basic Formatting +* 18-4:: Character and Integer Directives +* 18-5:: Floating-Point Directives +* 18-6:: English-Language Directives +* 18-7:: Conditional Formatting +* 18-8:: Iteration +* 18-9:: Hop, Skip, Jump +* 18-10:: And More . . . + + ---------- Footnotes ---------- + + (1) Of course, most folks realize it's not worth getting that worked +up over anything in a programming language and use it or not without a +lot of angst. On the other hand, it's interesting that these two +features are the two features in Common Lisp that implement what are +essentially domain-specific languages using a syntax not based on +s-expressions. The syntax of FORMAT's control strings is character +based, while the extended LOOP macro can be understood only in terms of +the grammar of the LOOP keywords. That one of the common knocks on both +FORMAT and LOOP is that they "aren't Lispy enough" is evidence that +Lispers really do like the s-expression syntax. + + (2) Readers interested in the pretty printer may want to read the +paper "XP: A Common Lisp Pretty Printing System" by Richard Waters. +It's a description of the pretty printer that was eventually +incorporated into Common Lisp. You can download it from +ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1102a.pdf. + + +File: pcl.info, Node: 18-1, Next: 18-2, Prev: Chapter 18, Up: Chapter 18 + +The FORMAT Function +=================== + +As you've seen in previous chapters, the FORMAT function takes two +required arguments: a destination for its output and a control string +that contains literal text and embedded directives. Any additional +arguments provide the values used by the directives in the control +string that interpolate values into the output. I'll refer to these +arguments as format arguments. + + The first argument to FORMAT, the destination for the output, can be +T, NIL, a stream, or a string with a fill pointer. T is shorthand for +the stream *STANDARD-OUTPUT*, while NIL causes FORMAT to generate its +output to a string, which it then returns. (1) If the destination is a +stream, the output is written to the stream. And if the destination is +a string with a fill pointer, the formatted output is added to the end +of the string and the fill pointer is adjusted appropriately. Except +when the destination is NIL and it returns a string, FORMAT returns NIL. + + The second argument, the control string, is, in essence, a program in +the FORMAT language. The FORMAT language isn't Lispy at all-its basic +syntax is based on characters, not s-expressions, and it's optimized for +compactness rather than easy comprehension. This is why a complex +FORMAT control string can end up looking like line noise. + + Most of FORMAT's directives simply interpolate an argument into the +output in one form or another. Some directives, such as ~%, which +causes FORMAT to emit a newline, don't consume any arguments. And +others, as you'll see, can consume more than one argument. One +directive even allows you to jump around in the list of arguments in +order to process the same argument more than once or to skip certain +arguments in certain situations. But before I discuss specific +directives, let's look at the general syntax of a directive. + + ---------- Footnotes ---------- + + (1) To slightly confuse matters, most other I/O functions also accept +T and NIL as stream designators but with a different meaning: as a +stream designator, T designates the bidirectional stream *TERMINAL-IO*, +while NIL designates *STANDARD-OUTPUT* as an output stream and +*STANDARD-INPUT* as an input stream. + + +File: pcl.info, Node: 18-2, Next: 18-3, Prev: 18-1, Up: Chapter 18 + +FORMAT Directives +================= + +All directives start with a tilde (~) and end with a single character +that identifies the directive. You can write the character in either +upper- or lowercase. Some directives take prefix parameters, which are +written immediately following the tilde, separated by commas, and used +to control things such as how many digits to print after the decimal +point when printing a floating-point number. For example, the ~$ +directive, one of the directives used to print floating-point values, by +default prints two digits following the decimal point. + + CL-USER> (format t "~$" pi) + 3.14 + NIL + + However, with a prefix parameter, you can specify that it should +print its argument to, say, five decimal places like this: + + CL-USER> (format t "~5$" pi) + 3.14159 + NIL + + The values of prefix parameters are either numbers, written in +decimal, or characters, written as a single quote followed by the +desired character. The value of a prefix parameter can also be derived +from the format arguments in two ways: A prefix parameter of v causes +FORMAT to consume one format argument and use its value for the prefix +parameter. And a prefix parameter of # will be evaluated as the number +of remaining format arguments. For example: + + CL-USER> (format t "~v$" 3 pi) + 3.142 + NIL + CL-USER> (format t "~#$" pi) + 3.1 + NIL + + I'll give some more realistic examples of how you can use the # +argument in the section "Conditional Formatting." + + You can also omit prefix parameters altogether. However, if you want +to specify one parameter but not the ones before it, you must include a +comma for each unspecified parameter. For instance, the ~F directive, +another directive for printing floating-point values, also takes a +parameter to control the number of decimal places to print, but it's the +second parameter rather than the first. If you want to use ~F to print +a number to five decimal places, you can write this: + + CL-USER> (format t "~,5f" pi) + 3.14159 + NIL + + You can also modify the behavior of some directives with colon and +at-sign modifiers, which are placed after any prefix parameters and +before the directive's identifying character. These modifiers change +the behavior of the directive in small ways. For instance, with a colon +modifier, the ~D directive used to output integers in decimal emits the +number with commas separating every three digits, while the at-sign +modifier causes ~D to include a plus sign when the number is positive. + + CL-USER> (format t "~d" 1000000) + 1000000 + NIL + CL-USER> (format t "~:d" 1000000) + 1,000,000 + NIL + CL-USER> (format t "~@d" 1000000) + +1000000 + NIL + + When it makes sense, you can combine the colon and at-sign modifiers +to get both modifications. + + CL-USER> (format t "~:@d" 1000000) + +1,000,000 + NIL + + In directives where the two modified behaviors can't be meaningfully +combined, using both modifiers is either undefined or given a third +meaning. + + +File: pcl.info, Node: 18-3, Next: 18-4, Prev: 18-2, Up: Chapter 18 + +Basic Formatting +================ + +Now you're ready to look at specific directives. I'll start with +several of the most commonly used directives, including some you've seen +in previous chapters. + + The most general-purpose directive is ~A, which consumes one format +argument of any type and outputs it in aesthetic (human-readable) form. +For example, strings are output without quotation marks or escape +characters, and numbers are output in a natural way for the type of +number. If you just want to emit a value for human consumption, this +directive is your best bet. + + (format nil "The value is: ~a" 10) ==> "The value is: 10" + (format nil "The value is: ~a" "foo") ==> "The value is: foo" + (format nil "The value is: ~a" (list 1 2 3)) ==> "The value is: (1 2 3)" + + A closely related directive, ~S, likewise consumes one format +argument of any type and outputs it. However, ~S tries to generate +output that can be read back in with READ. Thus, strings will be +enclosed in quotation marks, symbols will be package-qualified when +necessary, and so on. Objects that don't have a READable representation +are printed with the unreadable object syntax, #<>. With a colon +modifier, both the ~A and ~S directives emit NIL as () rather than NIL. +Both the ~A and ~S directives also take up to four prefix parameters, +which can be used to control whether padding is added after (or before +with the at-sign modifier) the value, but those parameters are only +really useful for generating tabular data. + + The other two most frequently used directives are ~%, which emits a +newline, and ~&, which emits a fresh line. The difference between the +two is that ~% always emits a newline, while ~& emits one only if it's +not already at the beginning of a line. This is handy when writing +loosely coupled functions that each generate a piece of output and that +need to be combined in different ways. For instance, if one function +generates output that ends with a newline (~%) and another function +generates some output that starts with a fresh line (~&), you don't have +to worry about getting an extra blank line if you call them one after +the other. Both of these directives can take a single prefix parameter +that specifies the number of newlines to emit. The ~% directive will +simply emit that many newline characters, while the ~& directive will +emit either n - 1 or n newlines, depending on whether it starts at the +beginning of a line. + + Less frequently used is the related ~~ directive, which causes FORMAT +to emit a literal tilde. Like the ~% and ~& directives, it can be +parameterized with a number that controls how many tildes to emit. + + +File: pcl.info, Node: 18-4, Next: 18-5, Prev: 18-3, Up: Chapter 18 + +Character and Integer Directives +================================ + +In addition to the general-purpose directives, ~A and ~S, FORMAT +supports several directives that can be used to emit values of specific +types in particular ways. One of the simplest of these is the ~C +directive, which is used to emit characters. It takes no prefix +arguments but can be modified with the colon and at-sign modifiers. +Unmodified, its behavior is no different from ~A except that it works +only with characters. The modified versions are more useful. With a +colon modifier, ~:C outputs nonprinting characters such as space, tab, +and newline by name. This is useful if you want to emit a message to +the user about some character. For instance, the following: + + (format t "Syntax error. Unexpected character: ~:c" char) + + can emit messages like this: + + Syntax error. Unexpected character: a + + but also like the following: + + Syntax error. Unexpected character: Space + + With the at-sign modifier, ~@C will emit the character in Lisp's +literal character syntax. + + CL-USER> (format t "~#\a + NIL + + With both the colon and at-sign modifiers, the ~C directive can print +extra information about how to enter the character at the keyboard if it +requires special key combinations. For instance, on the Macintosh, in +certain applications you can enter a null character (character code 0 in +ASCII or in any ASCII superset such as ISO-8859-1 or Unicode) by +pressing the Control key and typing . In OpenMCL, if you print the null +character with the ~:C directive, it tells you this: + + (format nil "~: + + However, not all Lisps implement this aspect of the ~C directive. +And even if they do, it may or may not be accurate-for instance, if +you're running OpenMCL in SLIME, the C- key chord is intercepted by +Emacs, invoking set-mark-command. (1) + + Format directives dedicated to emitting numbers are another important +category. While you can use the ~A and ~S directives to emit numbers, +if you want fine control over how they're printed, you need to use one +of the number-specific directives. The numeric directives can be +divided into two subcategories: directives for formatting integer values +and directives for formatting floating-point values. + + Five closely related directives format integer values: ~D, ~X, ~O, +~B, and ~R. The most frequently used is the ~D directive, which outputs +integers in base 10. + + (format nil "~d" 1000000) ==> "1000000" + + As I mentioned previously, with a colon modifier it adds commas. + + (format nil "~:d" 1000000) ==> "1,000,000" + + And with an at-sign modifier, it always prints a sign. + + (format nil "~@d" 1000000) ==> "+1000000" + + And the two modifiers can be combined. + + (format nil "~:@d" 1000000) ==> "+1,000,000" + + The first prefix parameter can specify a minimum width for the +output, and the second parameter can specify a padding character to use. +The default padding character is space, and padding is always inserted +before the number itself. + + (format nil "~12d" 1000000) ==> " 1000000" + (format nil "~12,'0d" 1000000) ==> "000001000000" + + These parameters are handy for formatting things such as dates in a +fixed-width format. + + (format nil "~4,'0d-~2,'0d-~2,'0d" 2005 6 10) ==> "2005-06-10" + + The third and fourth parameters are used in conjunction with the +colon modifier: the third parameter specifies the character to use as +the separator between groups and digits, and the fourth parameter +specifies the number of digits per group. These parameters default to a +comma and the number 3. Thus, you can use the directive ~:D without +parameters to output large integers in standard format for the United +States but can change the comma to a period and the grouping from 3 to 4 +with ~,,'.,4D. + + (format nil "~:d" 100000000) ==> "100,000,000" + (format nil "~,,'.,4:d" 100000000) ==> "1.0000.0000" + + Note that you must use commas to hold the places of the unspecified +width and padding character parameters, allowing them to keep their +default values. + + The ~X, ~O, and ~B directives work just like the ~D directive except +they emit numbers in hexadecimal (base 16), octal (base 8), and binary +(base 2). + + (format nil "~x" 1000000) ==> "f4240" + (format nil "~o" 1000000) ==> "3641100" + (format nil "~b" 1000000) ==> "11110100001001000000" + + Finally, the ~R directive is the general radix directive. Its first +parameter is a number between 2 and 36 (inclusive) that indicates what +base to use. The remaining parameters are the same as the four +parameters accepted by the ~D, ~X, ~O, and ~B directives, and the colon +and at-sign modifiers modify its behavior in the same way. The ~R +directive also has some special behavior when used with no prefix +parameters, which I'll discuss in the section "English-Language +Directives." + + ---------- Footnotes ---------- + + (1) This variant on the ~C directive makes more sense on platforms +like the Lisp Machines where key press events were represented by Lisp +characters. + + +File: pcl.info, Node: 18-5, Next: 18-6, Prev: 18-4, Up: Chapter 18 + +Floating-Point Directives +========================= + +Four directives format floating-point values: ~F, ~E, ~G, and ~$. The +first three of these are the directives based on FORTRAN's edit +descriptors. I'll skip most of the details of those directives since +they mostly have to do with formatting floating-point values for use in +tabular form. However, you can use the ~F, ~E, and ~$ directives to +interpolate floating-point values into text. The ~G, or general, +floating-point directive, on the other hand, combines aspects of the ~F +and ~E directives in a way that only really makes sense for generating +tabular output. + + The ~F directive emits its argument, which should be a number, (1) in +decimal format, possibly controlling the number of digits after the +decimal point. The ~F directive is, however, allowed to use +computerized scientific notation if the number is sufficiently large or +small. The ~E directive, on the other hand, always emits numbers in +computerized scientific notation. Both of these directives take a +number of prefix parameters, but you need to worry only about the +second, which controls the number of digits to print after the decimal +point. + + (format nil "~f" pi) ==> "3.141592653589793d0" + (format nil "~,4f" pi) ==> "3.1416" + (format nil "~e" pi) ==> "3.141592653589793d+0" + (format nil "~,4e" pi) ==> "3.1416d+0" + + The ~$, or monetary, directive is similar to ~F but a bit simpler. +As its name suggests, it's intended for emitting monetary units. With +no parameters, it's basically equivalent to ~,2F. To modify the number +of digits printed after the decimal point, you use the first parameter, +while the second parameter controls the minimum number of digits to +print before the decimal point. + + (format nil "~$" pi) ==> "3.14" + (format nil "~2,4$" pi) ==> "0003.14" + + All three directives, ~F, ~E, and ~$, can be made to always print a +sign, plus or minus, with the at-sign modifier. (2) + + ---------- Footnotes ---------- + + (1) Technically, if the argument isn't a real number, ~F is supposed +to format it as if by the ~D directive, which in turn behaves like the +~A directive if the argument isn't a number, but not all implementations +get this right. + + (2) Well, that's what the language standard says. For some reason, +perhaps rooted in a common ancestral code base, several Common Lisp +implementations don't implement this aspect of the ~F directive +correctly. + + +File: pcl.info, Node: 18-6, Next: 18-7, Prev: 18-5, Up: Chapter 18 + +English-Language Directives +=========================== + +Some of the handiest FORMAT directives for generating human-readable +messages are the ones for emitting English text. These directives allow +you to emit numbers as English words, to emit plural markers based on +the value of a format argument, and to apply case conversions to +sections of FORMAT's output. + + The ~R directive, which I discussed in "Character and Integer +Directives," when used with no base specified, prints numbers as English +words or Roman numerals. When used with no prefix parameter and no +modifiers, it emits the number in words as a cardinal number. + + (format nil "~r" 1234) ==> "one thousand two hundred thirty-four" + + With the colon modifier, it emits the number as an ordinal. + + (format nil "~:r" 1234) ==> "one thousand two hundred thirty-fourth" + + And with an at-sign modifier, it emits the number as a Roman numeral; +with both an at-sign and a colon, it emits "old-style" Roman numerals in +which fours and nines are written as IIII and VIIII instead of IV and +IX. + + (format nil "~@r" 1234) ==> "MCCXXXIV" + (format nil "~:@r" 1234) ==> "MCCXXXIIII" + + For numbers too large to be represented in the given form, ~R behaves +like ~D. + + To help you generate messages with words properly pluralized, FORMAT +provides the ~P directive, which simply emits an s unless the +corresponding argument is 1. + + (format nil "file~p" 1) ==> "file" + (format nil "file~p" 10) ==> "files" + (format nil "file~p" 0) ==> "files" + + Typically, however, you'll use ~P with the colon modifier, which +causes it to reprocess the previous format argument. + + (format nil "~r file~:p" 1) ==> "one file" + (format nil "~r file~:p" 10) ==> "ten files" + (format nil "~r file~:p" 0) ==> "zero files" + + With the at-sign modifier, which can be combined with the colon +modifier, ~P emits either y or ies. + + (format nil "~r famil~:@p" 1) ==> "one family" + (format nil "~r famil~:@p" 10) ==> "ten families" + (format nil "~r famil~:@p" 0) ==> "zero families" + + Obviously, ~P can't solve all pluralization problems and is no help +for generating messages in other languages, but it's handy for the cases +it does handle. And the ~[ directive, which I'll discuss in a moment, +gives you a more flexible way to conditionalize parts of FORMAT's +output. + + The last directive for dealing with emitting English text is ~(, +which allows you to control the case of text in the output. Each ~( is +paired with a ~), and all the output generated by the portion of the +control string between the two markers will be converted to all +lowercase. + + (format nil "~(~a~)" "FOO") ==> "foo" + (format nil "~(~@r~)" 124) ==> "cxxiv" + + You can modify ~( with an at sign to make it capitalize the first +word in a section of text, with a colon to make it to capitalize all +words, and with both modifiers to convert all text to uppercase. (A +word for the purpose of this directive is a sequence of alphanumeric +characters delimited by nonalphanumeric characters or the ends of the +text.) + + (format nil "~(~a~)" "tHe Quick BROWN foX") ==> "the quick brown fox" + (format nil "~@(~a~)" "tHe Quick BROWN foX") ==> "The quick brown fox" + (format nil "~:(~a~)" "tHe Quick BROWN foX") ==> "The Quick Brown Fox" + (format nil "~:@(~a~)" "tHe Quick BROWN foX") ==> "THE QUICK BROWN FOX" + + +File: pcl.info, Node: 18-7, Next: 18-8, Prev: 18-6, Up: Chapter 18 + +Conditional Formatting +====================== + +In addition to directives that interpolate arguments and modify other +output, FORMAT provides several directives that implement simple control +constructs within the control string. One of these, which you used in +Chapter 9, is the conditional directive ~[. This directive is closed by +a corresponding ~], and in between are a number of clauses separated by +~;. The job of the ~[ directive is to pick one of the clauses, which is +then processed by FORMAT. With no modifiers or parameters, the clause is +selected by numeric index; the ~[ directive consumes a format argument, +which should be a number, and takes the nth (zero-based) clause where N +is the value of the argument. + + (format nil "~[cero~;uno~;dos~]" 0) ==> "cero" + (format nil "~[cero~;uno~;dos~]" 1) ==> "uno" + (format nil "~[cero~;uno~;dos~]" 2) ==> "dos" + + If the value of the argument is greater than the number of clauses, +nothing is printed. + + (format nil "~[cero~;uno~;dos~]" 3) ==> "" + + However, if the last clause separator is ~:; instead of ~;, then the +last clause serves as a default clause. + + (format nil "~[cero~;uno~;dos~:;mucho~]" 3) ==> "mucho" + (format nil "~[cero~;uno~;dos~:;mucho~]" 100) ==> "mucho" + + It's also possible to specify the clause to be selected using a +prefix parameter. While it'd be silly to use a literal value in the +control string, recall that # used as a prefix parameter means the +number of arguments remaining to be processed. Thus, you can define a +format string such as the following: + + (defparameter *list-etc* + "~#[NONE~;~a~;~a and ~a~:;~a, ~a~]~#[~; and ~a~:;, ~a, etc~].") + + and then use it like this: + + (format nil *list-etc*) ==> "NONE." + (format nil *list-etc* 'a) ==> "A." + (format nil *list-etc* 'a 'b) ==> "A and B." + (format nil *list-etc* 'a 'b 'c) ==> "A, B and C." + (format nil *list-etc* 'a 'b 'c 'd) ==> "A, B, C, etc." + (format nil *list-etc* 'a 'b 'c 'd 'e) ==> "A, B, C, etc." + + Note that the control string actually contains two ~[~] +directives-both of which use # to select the clause to use. The first +consumes between zero and two arguments, while the second consumes one +more, if available. FORMAT will silently ignore any arguments not +consumed while processing the control string. + + With a colon modifier, the ~[ can contain only two clauses; the +directive consumes a single argument and processes the first clause if +the argument is NIL and the second clause is otherwise. You used this +variant of ~[ in Chapter 9 to generate pass/fail messages, like this: + + (format t "~:[FAIL~;pass~]" test-result) + + Note that either clause can be empty, but the directive must contain +a ~;. + + Finally, with an at-sign modifier, the ~[ directive can have only one +clause. The directive consumes one argument and, if it's non-NIL, +processes the clause after backing up to make the argument available to +be consumed again. + + (format nil "~@[x = ~a ~]~@[y = ~a~]" 10 20) ==> "x = 10 y = 20" + (format nil "~@[x = ~a ~]~@[y = ~a~]" 10 nil) ==> "x = 10 " + (format nil "~@[x = ~a ~]~@[y = ~a~]" nil 20) ==> "y = 20" + (format nil "~@[x = ~a ~]~@[y = ~a~]" nil nil) ==> "" + + +File: pcl.info, Node: 18-8, Next: 18-9, Prev: 18-7, Up: Chapter 18 + +Iteration +========= + +Another FORMAT directive that you've seen already, in passing, is the +iteration directive ~{. This directive tells FORMAT to iterate over the +elements of a list or over the implicit list of the format arguments. + + With no modifiers, ~{ consumes one format argument, which must be a +list. Like the ~[ directive, which is always paired with a ~] +directive, the ~{ directive is always paired with a closing ~}. The +text between the two markers is processed as a control string, which +draws its arguments from the list consumed by the ~{ directive. FORMAT +will repeatedly process this control string for as long as the list +being iterated over has elements left. In the following example, the ~{ +consumes the single format argument, the list (1 2 3), and then +processes the control string "~a, ", repeating until all the elements of +the list have been consumed. + + (format nil "~{~a, ~}" (list 1 2 3)) ==> "1, 2, 3, " + + However, it's annoying that in the output the last element of the +list is followed by a comma and a space. You can fix that with the ~^ +directive; within the body of a ~{ directive, the ~^ causes the +iteration to stop immediately, without processing the rest of the +control string, when no elements remain in the list. Thus, to avoid +printing the comma and space after the last element of a list, you can +precede them with a ~^. + + (format nil "~{~a~^, ~}" (list 1 2 3)) ==> "1, 2, 3" + + The first two times through the iteration, there are still +unprocessed elements in the list when the ~^ is processed. The third +time through, however, after the ~a directive consumes the 3, the ~^ +will cause FORMAT to break out of the iteration without printing the +comma and space. + + With an at-sign modifier, ~{ processes the remaining format arguments +as a list. + + (format nil "~@{~a~^, ~}" 1 2 3) ==> "1, 2, 3" + + Within the body of a ~{...~}, the special prefix parameter # refers +to the number of items remaining to be processed in the list rather than +the number of remaining format arguments. You can use that, along with +the ~[ directive, to print a comma-separated list with an "and" before +the last item like this: + + (format nil "~{~a~#[~;, and ~:;, ~]~}" (list 1 2 3)) ==> "1, 2, and 3" + + However, that doesn't really work right if the list is two items long +because it adds an extra comma. + + (format nil "~{~a~#[~;, and ~:;, ~]~}" (list 1 2)) ==> "1, and 2" + + You could fix that in a bunch of ways. The following takes advantage +of the behavior of ~@{ when nested inside another ~{ or ~@{ directive-it +iterates over whatever items remain in the list being iterated over by +the outer ~{. You can combine that with a ~#[ directive to make the +following control string for formatting lists according to English +grammar: + + (defparameter *english-list* + "~{~#[~;~a~;~a and ~a~:;~@{~a~#[~;, and ~:;, ~]~}~]~}") + + (format nil *english-list* '()) ==> "" + (format nil *english-list* '(1)) ==> "1" + (format nil *english-list* '(1 2)) ==> "1 and 2" + (format nil *english-list* '(1 2 3)) ==> "1, 2, and 3" + (format nil *english-list* '(1 2 3 4)) ==> "1, 2, 3, and 4" + + While that control string verges on being "write-only" code, it's not +too hard to understand if you take it a bit at a time. The outer +~{...~} will consume and iterate over a list. The whole body of the +iteration then consists of a ~#[...~]; the output generated each time +through the iteration will thus depend on the number of items left to be +processed from the list. Splitting apart the ~#[...~] directive on the +~; clause separators, you can see that it's made up of four clauses, the +last of which is a default clause because it's preceded by a ~:; rather +than a plain ~;. The first clause, for when there are zero elements to +be processed, is empty, which makes sense-if there are no more elements +to be processed, the iteration would've stopped already. The second +clause handles the case of one element with a simple ~a directive. Two +elements are handled with "~a and ~a". And the default clause, which +handles three or more elements, consists of another iteration directive, +this time using ~@{ to iterate over the remaining elements of the list +being processed by the outer ~{. And the body of that iteration is the +control string that can handle a list of three or more elements +correctly, which is fine in this context. Because the ~@{ loop consumes +all the remaining list items, the outer loop iterates only once. + + If you wanted to print something special such as "<empty>" when the +list was empty, you have a couple ways to do it. Perhaps the easiest is +to put the text you want into the first (zeroth) clause of the outer ~#[ +and then add a colon modifier to the closing ~} of the outer +iteration-the colon forces the iteration to be run at least once, even +if the list is empty, at which point FORMAT processes the zeroth clause +of the conditional directive. + + (defparameter *english-list* + "~{~#[<empty>~;~a~;~a and ~a~:;~@{~a~#[~;, and ~:;, ~]~}~]~:}") + + (format nil *english-list* '()) ==> "<empty>" + + Amazingly, the ~{ directive provides even more variations with +different combinations of prefix parameters and modifiers. I won't +discuss them other than to say you can use an integer prefix parameter +to limit the maximum number of iterations and that, with a colon +modifier, each element of the list (either an actual list or the list +constructed by the ~@{ directive) must itself be a list whose elements +will then be used as arguments to the control string in the ~:{...~} +directive. + + +File: pcl.info, Node: 18-9, Next: 18-10, Prev: 18-8, Up: Chapter 18 + +Hop, Skip, Jump +=============== + +A much simpler directive is the ~* directive, which allows you to jump +around in the list of format arguments. In its basic form, without +modifiers, it simply skips the next argument, consuming it without +emitting anything. More often, however, it's used with a colon +modifier, which causes it to move backward, allowing the same argument +to be used a second time. For instance, you can use ~:* to print a +numeric argument once as a word and once in numerals like this: + + (format nil "~r ~:*(~d)" 1) ==> "one (1)" + + Or you could implement a directive similar to ~:P for an irregular +plural by combing ~:* with ~[. + + (format nil "I saw ~r el~:*~[ves~;f~:;ves~]." 0) ==> "I saw zero elves." + (format nil "I saw ~r el~:*~[ves~;f~:;ves~]." 1) ==> "I saw one elf." + (format nil "I saw ~r el~:*~[ves~;f~:;ves~]." 2) ==> "I saw two elves." + + In this control string, the ~R prints the format argument as a +cardinal number. Then the ~:* directive backs up so the number is also +used as the argument to the ~[ directive, selecting between the clauses +for when the number is zero, one, or anything else. (1) + + Within an ~{ directive, ~* skips or backs up over the items in the +list. For instance, you could print only the keys of a plist like this: + + (format nil "~{~s~*~^ ~}" '(:a 10 :b 20)) ==> ":A :B" + + The ~* directive can also be given a prefix parameter. With no +modifiers or with the colon modifier, this parameter specifies the +number of arguments to move forward or backward and defaults to one. +With an at-sign modifier, the prefix parameter specifies an absolute, +zero-based index of the argument to jump to, defaulting to zero. The +at-sign variant of ~* can be useful if you want to use different control +strings to generate different messages for the same arguments and if +different messages need to use the arguments in different orders. (2) + + ---------- Footnotes ---------- + + (1) If you find "I saw zero elves" to be a bit clunky, you could use +a slightly more elaborate format string that makes another use of ~:* +like this: + + (format nil "I saw ~[no~:;~:*~r~] el~:*~[ves~;f~:;ves~]." 0) ==> "I saw no elves." + (format nil "I saw ~[no~:;~:*~r~] el~:*~[ves~;f~:;ves~]." 1) ==> "I saw one elf." + (format nil "I saw ~[no~:;~:*~r~] el~:*~[ves~;f~:;ves~]." 2) ==> "I saw two elves." + + (2) This kind of problem can arise when trying to localize an +application and translate human-readable messages into different +languages. FORMAT can help with some of these problems but is by no +means a full-blown localization system. + + +File: pcl.info, Node: 18-10, Next: Chapter 19, Prev: 18-9, Up: Chapter 18 + +And More . . . +============== + +And there's more-I haven't mentioned the ~? directive, which can take +snippets of control strings from the format arguments or the ~/ +directive, which allows you to call an arbitrary function to handle the +next format argument. And then there are all the directives for +generating tabular and pretty-printed output. But the directives +discussed in this chapter should be plenty for the time being. + + In the next chapter, you'll move onto Common Lisp's condition system, +the Common Lisp analog to other languages' exception and error handling +systems. + + +File: pcl.info, Node: Chapter 19, Next: Chapter 20, Prev: Chapter 18, Up: Top + +19. Beyond Exception Handling: Conditions and Restarts +====================================================== + +One of Lisp's great features is its condition system. It serves a +similar purpose to the exception handling systems in Java, Python, and +C++ but is more flexible. In fact, its flexibility extends beyond error +handling-conditions are more general than exceptions in that a condition +can represent any occurrence during a program's execution that may be of +interest to code at different levels on the call stack. For example, in +the section "Other Uses for Conditions," you'll see that conditions can +be used to emit warnings without disrupting execution of the code that +emits the warning while allowing code higher on the call stack to +control whether the warning message is printed. For the time being, +however, I'll focus on error handling. + + The condition system is more flexible than exception systems because +instead of providing a two-part division between the code that signals +an error (1) and the code that handles it, (2) the condition system +splits the responsibilities into three parts-signaling a condition, +handling it, and restarting. In this chapter, I'll describe how you +could use conditions in part of a hypothetical application for analyzing +log files. You'll see how you could use the condition system to allow a +low-level function to detect a problem while parsing a log file and +signal an error, to allow mid-level code to provide several possible +ways of recovering from such an error, and to allow code at the highest +level of the application to define a policy for choosing which recovery +strategy to use. + + To start, I'll introduce some terminology: errors, as I'll use the +term, are the consequences of Murphy's law. If something can go wrong, +it will: a file that your program needs to read will be missing, a disk +that you need to write to will be full, the server you're talking to +will crash, or the network will go down. If any of these things happen, +it may stop a piece of code from doing what you want. But there's no +bug; there's no place in the code that you can fix to make the +nonexistent file exist or the disk not be full. However, if the rest of +the program is depending on the actions that were going to be taken, +then you'd better deal with the error somehow or you will have +introduced a bug. So, errors aren't caused by bugs, but neglecting to +handle an error is almost certainly a bug. + + So, what does it mean to handle an error? In a well-written program, +each function is a black box hiding its inner workings. Programs are +then built out of layers of functions: high-level functions are built on +top of the lower-level functions, and so on. This hierarchy of +functionality manifests itself at runtime in the form of the call stack: +if high calls medium, which calls low, when the flow of control is in +low, it's also still in medium and high, that is, they're still on the +call stack. + + Because each function is a black box, function boundaries are an +excellent place to deal with errors. Each function-low, for example-has +a job to do. Its direct caller-medium in this case-is counting on it to +do its job. However, an error that prevents it from doing its job puts +all its callers at risk: medium called low because it needs the work +done that low does; if that work doesn't get done, medium is in trouble. +But this means that medium's caller, high, is also in trouble-and so on +up the call stack to the very top of the program. On the other hand, +because each function is a black box, if any of the functions in the +call stack can somehow do their job despite underlying errors, then none +of the functions above it needs to know there was a problem-all those +functions care about is that the function they called somehow did the +work expected of it. + + In most languages, errors are handled by returning from a failing +function and giving the caller the choice of either recovering or +failing itself. Some languages use the normal function return +mechanism, while languages with exceptions return control by throwing or +raising an exception. Exceptions are a vast improvement over using +normal function returns, but both schemes suffer from a common flaw: +while searching for a function that can recover, the stack unwinds, +which means code that might recover has to do so without the context of +what the lower-level code was trying to do when the error actually +occurred. + + Consider the hypothetical call chain of high, medium, low. If low +fails and medium can't recover, the ball is in high's court. For high +to handle the error, it must either do its job without any help from +medium or somehow change things so calling medium will work and call it +again. The first option is theoretically clean but implies a lot of +extra code-a whole extra implementation of whatever it was medium was +supposed to do. And the further the stack unwinds, the more work that +needs to be redone. The second option-patching things up and +retrying-is tricky; for high to be able to change the state of the world +so a second call into medium won't end up causing an error in low, it'd +need an unseemly knowledge of the inner workings of both medium and low, +contrary to the notion that each function is a black box. + +* Menu: + +* 19-1:: The Lisp Way +* 19-2:: Conditions +* 19-3:: Condition Handlers +* 19-4:: Restarts +* 19-5:: Providing Multiple Restarts +* 19-6:: Other Uses for Conditions + + ---------- Footnotes ---------- + + (1) Throws or raises an exception in Java/Python terms + + (2) Catches the exception in Java/Python terms + + +File: pcl.info, Node: 19-1, Next: 19-2, Prev: Chapter 19, Up: Chapter 19 + +The Lisp Way +============ + +Common Lisp's error handling system gives you a way out of this +conundrum by letting you separate the code that actually recovers from +an error from the code that decides how to recover. Thus, you can put +recovery code in low-level functions without committing to actually +using any particular recovery strategy, leaving that decision to code in +high-level functions. + + To get a sense of how this works, let's suppose you're writing an +application that reads some sort of textual log file, such as a Web +server's log. Somewhere in your application you'll have a function to +parse the individual log entries. Let's assume you'll write a function, +parse-log-entry, that will be passed a string containing the text of a +single log entry and that is supposed to return a log-entry object +representing the entry. This function will be called from a function, +parse-log-file, that reads a complete log file and returns a list of +objects representing all the entries in the file. + + To keep things simple, the parse-log-entry function will not be +required to parse incorrectly formatted entries. It will, however, be +able to detect when its input is malformed. But what should it do when +it detects bad input? In C you'd return a special value to indicate +there was a problem. In Java or Python you'd throw or raise an +exception. In Common Lisp, you signal a condition. + + +File: pcl.info, Node: 19-2, Next: 19-3, Prev: 19-1, Up: Chapter 19 + +Conditions +========== + +A condition is an object whose class indicates the general nature of the +condition and whose instance data carries information about the details +of the particular circumstances that lead to the condition being +signaled. (1) In this hypothetical log analysis program, you might +define a condition class, malformed-log-entry-error, that +parse-log-entry will signal if it's given data it can't parse. + + Condition classes are defined with the DEFINE-CONDITION macro, which +works essentially the same as DEFCLASS except that the default +superclass of classes defined with DEFINE-CONDITION is CONDITION rather +than STANDARD-OBJECT. Slots are specified in the same way, and condition +classes can singly and multiply inherit from other classes that descend +from CONDITION. But for historical reasons, condition classes aren't +required to be instances of STANDARD-OBJECT, so some of the functions +you use with DEFCLASSed classes aren't required to work with conditions. +In particular, a condition's slots can't be accessed using SLOT-VALUE; +you must specify either a :reader option or an :accessor option for any +slot whose value you intend to use. Likewise, new condition objects are +created with MAKE-CONDITION rather than MAKE-INSTANCE. MAKE-CONDITION +initializes the slots of the new condition based on the :initargs it's +passed, but there's no way to further customize a condition's +initialization, equivalent to INITIALIZE-INSTANCE. (2) + + When using the condition system for error handling, you should define +your conditions as subclasses of ERROR, a subclass of CONDITION. Thus, +you might define malformed-log-entry-error, with a slot to hold the +argument that was passed to parse-log-entry, like this: + + (define-condition malformed-log-entry-error (error) + ((text :initarg :text :reader text))) + + ---------- Footnotes ---------- + + (1) In this respect, a condition is a lot like an exception in Java +or Python except not all conditions represent an error or exceptional +situation. + + (2) In some Common Lisp implementations, conditions are defined as +subclasses of STANDARD-OBJECT, in which case SLOT-VALUE, MAKE-INSTANCE, +and INITIALIZE-INSTANCE will work, but it's not portable to rely on it. + + +File: pcl.info, Node: 19-3, Next: 19-4, Prev: 19-2, Up: Chapter 19 + +Condition Handlers +================== + +In parse-log-entry you'll signal a malformed-log-entry-error if you +can't parse the log entry. You signal errors with the function ERROR, +which calls the lower-level function SIGNAL and drops into the debugger +if the condition isn't handled. You can call ERROR two ways: you can +pass it an already instantiated condition object, or you can pass it the +name of the condition class and any initargs needed to construct a new +condition, and it will instantiate the condition for you. The former is +occasionally useful for resignaling an existing condition object, but +the latter is more concise. Thus, you could write parse-log-entry like +this, eliding the details of actually parsing a log entry: + + (defun parse-log-entry (text) + (if (well-formed-log-entry-p text) + (make-instance 'log-entry ...) + (error 'malformed-log-entry-error :text text))) + + What happens when the error is signaled depends on the code above +parse-log-entry on the call stack. To avoid landing in the debugger, +you must establish a condition handler in one of the functions leading +to the call to parse-log-entry. When a condition is signaled, the +signaling machinery looks through a list of active condition handlers, +looking for a handler that can handle the condition being signaled based +on the condition's class. Each condition handler consists of a type +specifier indicating what types of conditions it can handle and a +function that takes a single argument, the condition. At any given +moment there can be many active condition handlers established at +various levels of the call stack. When a condition is signaled, the +signaling machinery finds the most recently established handler whose +type specifier is compatible with the condition being signaled and calls +its function, passing it the condition object. + + The handler function can then choose whether to handle the condition. +The function can decline to handle the condition by simply returning +normally, in which case control returns to the SIGNAL function, which +will search for the next most recently established handler with a +compatible type specifier. To handle the condition, the function must +transfer control out of SIGNAL via a nonlocal exit. In the next +section, you'll see how a handler can choose where to transfer control. +However, many condition handlers simply want to unwind the stack to the +place where they were established and then run some code. The macro +HANDLER-CASE establishes this kind of condition handler. The basic form +of a HANDLER-CASE is as follows: + + (handler-case expression + error-clause*) + + where each error-clause is of the following form: + + (condition-type ([var]) code) + + If the expression returns normally, then its value is returned by the +HANDLER-CASE. The body of a HANDLER-CASE must be a single expression; +you can use PROGN to combine several expressions into a single form. +If, however, the expression signals a condition that's an instance of +any of the condition-types specified in any error-clause, then the code +in the appropriate error clause is executed and its value returned by +the HANDLER-CASE. The var, if included, is the name of the variable that +will hold the condition object when the handler code is executed. If +the code doesn't need to access the condition object, you can omit the +variable name. + + For instance, one way to handle the malformed-log-entry-error +signaled by parse-log-entry in its caller, parse-log-file, would be to +skip the malformed entry. In the following function, the HANDLER-CASE +expression will either return the value returned by parse-log-entry or +return NIL if a malformed-log-entry-error is signaled. (The it in the +LOOP clause collect it is another LOOP keyword, which refers to the +value of the most recently evaluated conditional test, in this case the +value of entry.) + + (defun parse-log-file (file) + (with-open-file (in file :direction :input) + (loop for text = (read-line in nil nil) while text + for entry = (handler-case (parse-log-entry text) + (malformed-log-entry-error () nil)) + when entry collect it))) + + When parse-log-entry returns normally, its value will be assigned to +entry and collected by the LOOP. But if parse-log-entry signals a +malformed-log-entry-error, then the error clause will return NIL, which +won't be collected. + +* Menu: + +* 19-3-1:: JAVA-STYLE EXCEPTON HANDLING + + This version of parse-log-file has one serious deficiency: it's doing +too much. As its name suggests, the job of parse-log-file is to parse +the file and produce a list of log-entry objects; if it can't, it's not +its place to decide what to do instead. What if you want to use +parse-log-file in an application that wants to tell the user that the +log file is corrupted or one that wants to recover from malformed +entries by fixing them up and re-parsing them? Or maybe an application +is fine with skipping them but only until a certain number of corrupted +entries have been seen. + + You could try to fix this problem by moving the HANDLER-CASE to a +higher-level function. However, then you'd have no way to implement the +current policy of skipping individual entries-when the error was +signaled, the stack would be unwound all the way to the higher-level +function, abandoning the parsing of the log file altogether. What you +want is a way to provide the current recovery strategy without requiring +that it always be used. + + +File: pcl.info, Node: 19-3-1, Next: 19-3, Prev: 19-3, Up: 19-3 + +JAVA-STYLE EXCEPTON HANDLING +============================ + +HANDLER-CASE is the nearest analog in Common Lisp to Java- or +Python-style exception handling. Where you might write this in Java: + + try { + doStuff(); + doMoreStuff(); + } catch (SomeException se) { + recover(se); + } + + or this in Python: + + try: + doStuff() + doMoreStuff() + except SomeException, se: + recover(se) + + in Common Lisp you'd write this: + + (handler-case + (progn + (do-stuff) + (do-more-stuff)) + (some-exception (se) (recover se))) + + +File: pcl.info, Node: 19-4, Next: 19-5, Prev: 19-3, Up: Chapter 19 + +Restarts +======== + +The condition system lets you do this by splitting the error handling +code into two parts. You place code that actually recovers from errors +into restarts, and condition handlers can then handle a condition by +invoking an appropriate restart. You can place restart code in mid- or +low-level functions, such as parse-log-file or parse-log-entry, while +moving the condition handlers into the upper levels of the application. + + To change parse-log-file so it establishes a restart instead of a +condition handler, you can change the HANDLER-CASE to a RESTART-CASE. +The form of RESTART-CASE is quite similar to a HANDLER-CASE except the +names of restarts are just names, not necessarily the names of condition +types. In general, a restart name should describe the action the +restart takes. In parse-log-file, you can call the restart +skip-log-entry since that's what it does. The new version will look +like this: + + (defun parse-log-file (file) + (with-open-file (in file :direction :input) + (loop for text = (read-line in nil nil) while text + for entry = (restart-case (parse-log-entry text) + (skip-log-entry () nil)) + when entry collect it))) + + If you invoke this version of parse-log-file on a log file containing +corrupted entries, it won't handle the error directly; you'll end up in +the debugger. However, there among the various restarts presented by +the debugger will be one called skip-log-entry, which, if you choose it, +will cause parse-log-file to continue on its way as before. To avoid +ending up in the debugger, you can establish a condition handler that +invokes the skip-log-entry restart automatically. + + The advantage of establishing a restart rather than having +parse-log-file handle the error directly is it makes parse-log-file +usable in more situations. The higher-level code that invokes +parse-log-file doesn't have to invoke the skip-log-entry restart. It +can choose to handle the error at a higher level. Or, as I'll show in +the next section, you can add restarts to parse-log-entry to provide +other recovery strategies, and then condition handlers can choose which +strategy they want to use. + + But before I can talk about that, you need to see how to set up a +condition handler that will invoke the skip-log-entry restart. You can +set up the handler anywhere in the chain of calls leading to +parse-log-file. This may be quite high up in your application, not +necessarily in parse-log-file's direct caller. For instance, suppose +the main entry point to your application is a function, log-analyzer, +that finds a bunch of logs and analyzes them with the function +analyze-log, which eventually leads to a call to parse-log-file. +Without any error handling, it might look like this: + + (defun log-analyzer () + (dolist (log (find-all-logs)) + (analyze-log log))) + + The job of analyze-log is to call, directly or indirectly, +parse-log-file and then do something with the list of log entries +returned. An extremely simple version might look like this: + + (defun analyze-log (log) + (dolist (entry (parse-log-file log)) + (analyze-entry entry))) + + where the function analyze-entry is presumably responsible for +extracting whatever information you care about from each log entry and +stashing it away somewhere. + + Thus, the path from the top-level function, log-analyzer, to +parse-log-entry, which actually signals an error, is as follows: + + + + Assuming you always want to skip malformed log entries, you could +change this function to establish a condition handler that invokes the +skip-log-entry restart for you. However, you can't use HANDLER-CASE to +establish the condition handler because then the stack would be unwound +to the function where the HANDLER-CASE appears. Instead, you need to +use the lower-level macro HANDLER-BIND. The basic form of HANDLER-BIND +is as follows: + + (handler-bind (binding*) form*) + + where each binding is a list of a condition type and a handler +function of one argument. After the handler bindings, the body of the +HANDLER-BIND can contain any number of forms. Unlike the handler code +in HANDLER-CASE, the handler code must be a function object, and it must +accept a single argument. A more important difference between +HANDLER-BIND and HANDLER-CASE is that the handler function bound by +HANDLER-BIND will be run without unwinding the stack-the flow of control +will still be in the call to parse-log-entry when this function is +called. The call to INVOKE-RESTART will find and invoke the most +recently bound restart with the given name. So you can add a handler to +log-analyzer that will invoke the skip-log-entry restart established in +parse-log-file like this: (1) + + (defun log-analyzer () + (handler-bind ((malformed-log-entry-error + #'(lambda (c) + (invoke-restart 'skip-log-entry)))) + (dolist (log (find-all-logs)) + (analyze-log log)))) + + In this HANDLER-BIND, the handler function is an anonymous function +that invokes the restart skip-log-entry. You could also define a named +function that does the same thing and bind it instead. In fact, a +common practice when defining a restart is to define a function, with +the same name and taking a single argument, the condition, that invokes +the eponymous restart. Such functions are called restart functions. +You could define a restart function for skip-log-entry like this: + + (defun skip-log-entry (c) + (invoke-restart 'skip-log-entry)) + + Then you could change the definition of log-analyzer to this: + + (defun log-analyzer () + (handler-bind ((malformed-log-entry-error #'skip-log-entry)) + (dolist (log (find-all-logs)) + (analyze-log log)))) + + As written, the skip-log-entry restart function assumes that a +skip-log-entry restart has been established. If a +malformed-log-entry-error is ever signaled by code called from +log-analyzer without a skip-log-entry having been established, the call +to INVOKE-RESTART will signal a CONTROL-ERROR when it fails to find the +skip-log-entry restart. If you want to allow for the possibility that a +malformed-log-entry-error might be signaled from code that doesn't have +a skip-log-entry restart established, you could change the +skip-log-entry function to this: + + (defun skip-log-entry (c) + (let ((restart (find-restart 'skip-log-entry))) + (when restart (invoke-restart restart)))) + + FIND-RESTART looks for a restart with a given name and returns an +object representing the restart if the restart is found and NIL if not. +You can invoke the restart by passing the restart object to +INVOKE-RESTART. Thus, when skip-log-entry is bound with HANDLER-BIND, it +will handle the condition by invoking the skip-log-entry restart if one +is available and otherwise will return normally, giving other condition +handlers, bound higher on the stack, a chance to handle the condition. + + ---------- Footnotes ---------- + + (1) The compiler may complain if the parameter is never used. You +can silence that warning by adding a declaration (declare (ignore c)) as +the first expression in the LAMBDA body. + + +File: pcl.info, Node: 19-5, Next: 19-6, Prev: 19-4, Up: Chapter 19 + +Providing Multiple Restarts +=========================== + +Since restarts must be explicitly invoked to have any effect, you can +define multiple restarts, each providing a different recovery strategy. +As I mentioned earlier, not all log-parsing applications will +necessarily want to skip malformed entries. Some applications might +want parse-log-file to include a special kind of object representing +malformed entries in the list of log-entry objects; other applications +may have some way to repair a malformed entry and may want a way to pass +the fixed entry back to parse-log-entry. + + To allow more complex recovery protocols, restarts can take arbitrary +arguments, which are passed in the call to INVOKE-RESTART. You can +provide support for both the recovery strategies I just mentioned by +adding two restarts to parse-log-entry, each of which takes a single +argument. One simply returns the value it's passed as the return value +of parse-log-entry, while the other tries to parse its argument in the +place of the original log entry. + + (defun parse-log-entry (text) + (if (well-formed-log-entry-p text) + (make-instance 'log-entry ...) + (restart-case (error 'malformed-log-entry-error :text text) + (use-value (value) value) + (reparse-entry (fixed-text) (parse-log-entry fixed-text))))) + + The name USE-VALUE is a standard name for this kind of restart. +Common Lisp defines a restart function for USE-VALUE similar to the +skip-log-entry function you just defined. So, if you wanted to change +the policy on malformed entries to one that created an instance of +malformed-log-entry, you could change log-analyzer to this (assuming the +existence of a malformed-log-entry class with a :text initarg): + + (defun log-analyzer () + (handler-bind ((malformed-log-entry-error + #'(lambda (c) + (use-value + (make-instance 'malformed-log-entry :text (text c)))))) + (dolist (log (find-all-logs)) + (analyze-log log)))) + + You could also have put these new restarts into parse-log-file +instead of parse-log-entry. However, you generally want to put restarts +in the lowest-level code possible. It wouldn't, though, be appropriate +to move the skip-log-entry restart into parse-log-entry since that would +cause parse-log-entry to sometimes return normally with NIL, the very +thing you started out trying to avoid. And it'd be an equally bad idea +to remove the skip-log-entry restart on the theory that the condition +handler could get the same effect by invoking the use-value restart with +NIL as the argument; that would require the condition handler to have +intimate knowledge of how the parse-log-file works. As it stands, the +skip-log-entry is a properly abstracted part of the log-parsing API. + + +File: pcl.info, Node: 19-6, Next: Chapter 20, Prev: 19-5, Up: Chapter 19 + +Other Uses for Conditions +========================= + +While conditions are mainly used for error handling, they can be used +for other purposes-you can use conditions, condition handlers, and +restarts to build a variety of protocols between low- and high-level +code. The key to understanding the potential of conditions is to +understand that merely signaling a condition has no effect on the flow +of control. + + The primitive signaling function SIGNAL implements the mechanism of +searching for an applicable condition handler and invoking its handler +function. The reason a handler can decline to handle a condition by +returning normally is because the call to the handler function is just a +regular function call-when the handler returns, control passes back to +SIGNAL, which then looks for another, less recently bound handler that +can handle the condition. If SIGNAL runs out of condition handlers +before the condition is handled, it also returns normally. + + The ERROR function you've been using calls SIGNAL. If the error is +handled by a condition handler that transfers control via HANDLER-CASE +or by invoking a restart, then the call to SIGNAL never returns. But if +SIGNAL returns, ERROR invokes the debugger by calling the function +stored in *DEBUGGER-HOOK*. Thus, a call to ERROR can never return +normally; the condition must be handled either by a condition handler or +in the debugger. + + Another condition signaling function, WARN, provides an example of a +different kind of protocol built on the condition system. Like ERROR, +WARN calls SIGNAL to signal a condition. But if SIGNAL returns, WARN +doesn't invoke the debugger-it prints the condition to *ERROR-OUTPUT* +and returns NIL, allowing its caller to proceed. WARN also establishes +a restart, MUFFLE-WARNING, around the call to SIGNAL that can be used by +a condition handler to make WARN return without printing anything. The +restart function MUFFLE-WARNING finds and invokes its eponymous restart, +signaling a CONTROL-ERROR if no such restart is available. Of course, a +condition signaled with WARN could also be handled in some other way-a +condition handler could "promote" a warning to an error by handling it +as if it were an error. + + For instance, in the log-parsing application, if there were ways a +log entry could be slightly malformed but still parsable, you could +write parse-log-entry to go ahead and parse the slightly defective +entries but to signal a condition with WARN when it did. Then the +larger application could choose to let the warning print, to muffle the +warning, or to treat the warning like an error, recovering the same way +it would from a malformed-log-entry-error. + + A third error-signaling function, CERROR, provides yet another +protocol. Like ERROR, CERROR will drop you into the debugger if the +condition it signals isn't handled. But like WARN, it establishes a +restart before it signals the condition. The restart, CONTINUE, causes +CERROR to return normally-if the restart is invoked by a condition +handler, it will keep you out of the debugger altogether. Otherwise, +you can use the restart once you're in the debugger to resume the +computation immediately after the call to CERROR. The function CONTINUE +finds and invokes the CONTINUE restart if it's available and returns NIL +otherwise. + + You can also build your own protocols on SIGNAL-whenever low-level +code needs to communicate information back up the call stack to +higher-level code, the condition mechanism is a reasonable mechanism to +use. But for most purposes, one of the standard error or warning +protocols should suffice. + + You'll use the condition system in future practical chapters, both +for regular error handling and, in Chapter 25, to help in handling a +tricky corner case of parsing ID3 files. Unfortunately, it's the fate +of error handling to always get short shrift in programming texts-proper +error handling, or lack thereof, is often the biggest difference between +illustrative code and hardened, production-quality code. The trick to +writing the latter has more to do with adopting a particularly rigorous +way of thinking about software than with the details of any particular +programming language constructs. That said, if your goal is to write +that kind of software, you'll find the Common Lisp condition system is +an excellent tool for writing robust code and one that fits quite nicely +into Common Lisp's incremental development style. + +* Menu: + +* 19-6-1:: Writing Robust Software + + In the next chapter I'll give a quick overview of some of the 25 +special operators you haven't had a chance to use yet, at least not +directly. + + +File: pcl.info, Node: 19-6-1, Next: 19-6, Prev: 19-6, Up: 19-6 + +Writing Robust Software +======================= + +For information on writing robust software, you could do worse than to +start by finding a copy of Software Reliability (John Wiley & Sons, +1976) by Glenford J. Meyers. Bertrand Meyer's writings on Design By +Contract also provide a useful way of thinking about software +correctness. For instance, see Chapters 11 and 12 of his +Object-Oriented Software Construction (Prentice Hall, 1997). Keep in +mind, however, that Bertrand Meyer is the inventor of Eiffel, a +statically typed bondage and discipline language in the Algol/Ada +school. While he has a lot of smart things to say about object +orientation and software reliability, there's a fairly wide gap between +his view of programming and The Lisp Way. Finally, for an excellent +overview of the larger issues surrounding building fault-tolerant +systems, see Chapter 3 of the classic Transaction Processing: Concepts +and Techniques (Morgan Kaufmann, 1993) by Jim Gray and Andreas Reuter. + + +File: pcl.info, Node: Chapter 20, Next: Chapter 21, Prev: Chapter 19, Up: Top + +20. The Special Operators +========================= + +In a way, the most impressive aspect of the condition system covered in +the previous chapter is that if it wasn't already part of the language, +it could be written entirely as a user-level library. This is possible +because Common Lisp's special operators-while none touches directly on +signaling or handling conditions-provide enough access to the underlying +machinery of the language to be able to do things such as control the +unwinding of the stack. + + In previous chapters I've discussed the most frequently used special +operators, but it's worth being familiar with the others for two +reasons. First, some of the infrequently used special operators are +used infrequently simply because whatever need they address doesn't +arise that often. It's good to be familiar with these special operators +so when one of them is called for, you'll at least know it exists. +Second, because the 25 special operators-along with the basic rule for +evaluating function calls and the built-in data types-provide the +foundation for the rest of the language, a passing familiarity with them +will help you understand how the language works. + + In this chapter, I'll discuss all the special operators, some briefly +and some at length, so you can see how they fit together. I'll point +out which ones you can expect to use directly in your own code, which +ones serve as the basis for other constructs that you use all the time, +and which ones you'll rarely use directly but which can be handy in +macro-generated code. + +* Menu: + +* 20-1:: Controlling Evaluation +* 20-2:: Manipulating the Lexical Environment +* 20-3:: Local Flow of Control +* 20-4:: Unwinding the Stack +* 20-5:: Multiple Values +* 20-6:: EVAL-WHEN +* 20-7:: Other Special Operators + + +File: pcl.info, Node: 20-1, Next: 20-2, Prev: Chapter 20, Up: Chapter 20 + +Controlling Evaluation +====================== + +The first category of special operators contains the three operators +that provide basic control over the evaluation of forms. They're QUOTE, +IF, and PROGN, and I've discussed them all already. However, it's worth +noting how each of these special operators provides one fundamental kind +of control over the evaluation of one or more forms. QUOTE prevents +evaluation altogether and allows you to get at s-expressions as data. +IF provides the fundamental boolean choice operation from which all +other conditional execution constructs can be built. (1) And PROGN +provides the ability to sequence a number of forms. + + ---------- Footnotes ---------- + + (1) Of course, if IF wasn't a special operator but some other +conditional form, such as COND, was, you could build IF as a macro. +Indeed, in many Lisp dialects, starting with McCarthy's original Lisp, +COND was the primitive conditional evaluation operator. + + +File: pcl.info, Node: 20-2, Next: 20-3, Prev: 20-1, Up: Chapter 20 + +Manipulating the Lexical Environment +==================================== + +The largest class of special operators contains the operators that +manipulate and access the lexical environment. LET and LET*, which I've +already discussed, are examples of special operators that manipulate the +lexical environment since they can introduce new lexical bindings for +variables. Any construct, such as a DO or DOTIMES, that binds lexical +variables will have to expand into a LET or LET*. (1) The SETQ special +operator is one that accesses the lexical environment since it can be +used to set variables whose bindings were created by LET and LET*. + + Variables, however, aren't the only thing that can be named within a +lexical scope. While most functions are defined globally with DEFUN, +it's also possible to create local functions with the special operators +FLET and LABELS, local macros with MACROLET, and a special kind of +macro, called a symbol macro, with SYMBOL-MACROLET. + + Much like LET allows you to introduce a lexical variable whose scope +is the body of the LET, FLET and LABELS let you define a function that +can be referred to only within the scope of the FLET or LABELS form. +These special operators are handy when you need a local function that's +a bit too complex to define inline as a LAMBDA expression or that you +need to use more than once. Both have the same basic form, which looks +like this: + + (flet (function-definition*) + body-form*) + + and like this: + + (labels (function-definition*) + body-form*) + + where each function-definition has the following form: + + (name (parameter*) form*) + + The difference between FLET and LABELS is that the names of the +functions defined with FLET can be used only in the body of the FLET, +while the names introduced by LABELS can be used immediately, including +in the bodies of the functions defined by the LABELS. Thus, LABELS can +define recursive functions, while FLET can't. It might seem limiting +that FLET can't be used to define recursive functions, but Common Lisp +provides both FLET and LABELS because sometimes it's useful to be able +to write local functions that can call another function of the same +name, either a globally defined function or a local function from an +enclosing scope. + + Within the body of a FLET or LABELS, you can use the names of the +functions defined just like any other function, including with the +FUNCTION special operator. Since you can use FUNCTION to get the +function object representing a function defined with FLET or LABELS, and +since a FLET or LABELS can be in the scope of other binding forms such +as LETs, these functions can be closures. + + Because the local functions can refer to variables from the enclosing +scope, they can often be written to take fewer parameters than the +equivalent helper functions. This is particularly handy when you need +to pass a function that takes a single argument as a functional +parameter. For example, in the following function, which you'll see +again in Chapter 25, the FLETed function, count-version, takes a single +argument, as required by walk-directory, but can also use the variable +versions, introduced by the enclosing LET: + + (defun count-versions (dir) + (let ((versions (mapcar #'(lambda (x) (cons x 0)) '(2 3 4)))) + (flet ((count-version (file) + (incf (cdr (assoc (major-version (read-id3 file)) versions))))) + (walk-directory dir #'count-version :test #'mp3-p)) + versions)) + + This function could also be written using an anonymous function in +the place of the FLETed count-version, but giving the function a +meaningful name makes it a bit easier to read. + + And when a helper function needs to recurse, an anonymous function +just won't do. (2) When you don't want to define a recursive helper +function as a global function, you can use LABELS. For example, the +following function, collect-leaves, uses the recursive helper function +walk to walk a tree and gather all the atoms in the tree into a list, +which collect-leaves then returns (after reversing it): + + (defun collect-leaves (tree) + (let ((leaves ())) + (labels ((walk (tree) + (cond + ((null tree)) + ((atom tree) (push tree leaves)) + (t (walk (car tree)) + (walk (cdr tree)))))) + (walk tree)) + (nreverse leaves))) + + Notice again how, within the walk function, you can refer to the +variable, leaves, introduced by the enclosing LET. + + FLET and LABELS are also useful operations to use in macro +expansions-a macro can expand into code that contains a FLET or LABELS +to create functions that can be used within the body of the macro. This +technique can be used either to introduce functions that the user of the +macro will call or simply as a way of organizing the code generated by +the macro. This, for instance, is how a function such as +CALL-NEXT-METHOD, which can be used only within a method definition, +might be defined. + + A near relative to FLET and LABELS is the special operator MACROLET, +which you can use to define local macros. Local macros work just like +global macros defined with DEFMACRO except without cluttering the global +namespace. When a MACROLET form is evaluated, the body forms are +evaluated with the local macro definitions in effect and possibly +shadowing global function and macro definitions or local definitions +from enclosing forms. Like FLET and LABELS, MACROLET can be used +directly, but it's also a handy target for macro-generated code-by +wrapping some user-supplied code in a MACROLET, a macro can provide +constructs that can be used only within that code or can shadow a +globally defined macro. You'll see an example of this latter use of +MACROLET in Chapter 31. + + Finally, one last macro-defining special operator is SYMBOL-MACROLET, +which defines a special kind of macro called, appropriately enough, a +symbol macro. Symbol macros are like regular macros except they can't +take arguments and are referred to with a plain symbol rather than a +list form. In other words, after you've defined a symbol macro with a +particular name, any use of that symbol in a value position will be +expanded and the resulting form evaluated in its place. This is how +macros such as WITH-SLOTS and WITH-ACCESSORS are able to define +"variables" that access the state of a particular object under the +covers. For instance, the following WITH-SLOTS form: + + (with-slots (x y z) foo (list x y z))) + + might expand into this code that uses SYMBOL-MACROLET: + + (let ((#:g149 foo)) + (symbol-macrolet + ((x (slot-value #:g149 'x)) + (y (slot-value #:g149 'y)) + (z (slot-value #:g149 'z))) + (list x y z))) + + When the expression (list x y z) is evaluated, the symbols x, y, and +z will be replaced with their expansions, such as (slot-value #:g149 +'x). (3) + + Symbol macros are most often local, defined with SYMBOL-MACROLET, but +Common Lisp also provides a macro DEFINE-SYMBOL-MACRO that defines a +global symbol macro. A symbol macro defined with SYMBOL-MACROLET +shadows other symbol macros of the same name defined with +DEFINE-SYMBOL-MACRO or enclosing SYMBOL-MACROLET forms. + + ---------- Footnotes ---------- + + (1) Well, technically those constructs could also expand into a +LAMBDA expression since, as I mentioned in Chapter 6, LET could be +defined-and was in some earlier Lisps-as a macro that expands into an +invocation of an anonymous function. + + (2) Surprising as it may seem, it actually is possible to make +anonymous functions recurse. However, you must use a rather esoteric +mechanism known as the Y combinator. But the Y combinator is an +interesting theoretical result, not a practical programming tool, so is +well outside the scope of this book. + + (3) It's not required that WITH-SLOTS be implemented with +SYMBOL-MACROLET-in some implementations, WITH-SLOTS may walk the code +provided and generate an expansion with x, y, and z already replaced +with the appropriate SLOT-VALUE forms. You can see how your +implementation does it by evaluating this form: + + (macroexpand-1 '(with-slots (x y z) obj (list x y z))) + + However, walking the body is much easier for the Lisp implementation +to do than for user code; to replace x, y, and z only when they appear +in value positions requires a code walker that understands the syntax of +all special operators and that recursively expands all macro forms in +order to determine whether their expansions include the symbols in value +positions. The Lisp implementation obviously has such a code walker at +its disposal, but it's one of the few parts of Lisp that's not exposed +to users of the language. + + +File: pcl.info, Node: 20-3, Next: 20-4, Prev: 20-2, Up: Chapter 20 + +Local Flow of Control +===================== + +The next four special operators I'll discuss also create and use names +in the lexical environment but for the purposes of altering the flow of +control rather than defining new functions and macros. I've mentioned +all four of these special operators in passing because they provide the +underlying mechanisms used by other language features. They're BLOCK, +RETURN-FROM, TAGBODY, and GO. The first two, BLOCK and RETURN-FROM, are +used together to write code that returns immediately from a section of +code-I discussed RETURN-FROM in Chapter 5 as a way to return immediately +from a function, but it's more general than that. The other two, +TAGBODY and GO, provide a quite low-level goto construct that's the +basis for all the higher-level looping constructs you've already seen. + + The basic skeleton of a BLOCK form is this: + + (block name + form*) + + The name is a symbol, and the forms are Lisp forms. The forms are +evaluated in order, and the value of the last form is returned as the +value of the BLOCK unless a RETURN-FROM is used to return from the block +early. A RETURN-FROM form, as you saw in Chapter 5, consists of the +name of the block to return from and, optionally, a form that provides a +value to return. When a RETURN-FROM is evaluated, it causes the named +BLOCK to return immediately. If RETURN-FROM is called with a return +value form, the BLOCK will return the resulting value; otherwise, the +BLOCK evaluates to NIL. + + A BLOCK name can be any symbol, which includes NIL. Many of the +standard control construct macros, such as DO, DOTIMES, and DOLIST, +generate an expansion consisting of a BLOCK named NIL. This allows you +to use the RETURN macro, which is a bit of syntactic sugar for +(return-from nil ...), to break out of such loops. Thus, the following +loop will print at most ten random numbers, stopping as soon as it gets +a number greater than 50: + + (dotimes (i 10) + (let ((answer (random 100))) + (print answer) + (if (> answer 50) (return)))) + + Function-defining macros such as DEFUN, FLET, and LABELS, on the +other hand, wrap their bodies in a BLOCK with the same name as the +function. That's why you can use RETURN-FROM to return from a function. + + TAGBODY and GO have a similar relationship to each other as BLOCK and +RETURN-FROM: a TAGBODY form defines a context in which names are defined +that can be used by GO. The skeleton of a TAGBODY is as follows: + + (tagbody + tag-or-compound-form*) + + where each tag-or-compound-form is either a symbol, called a tag, or +a nonempty list form. The list forms are evaluated in order and the +tags ignored, except as I'll discuss in a moment. After the last form +of the TAGBODY is evaluated, the TAGBODY returns NIL. Anywhere within +the lexical scope of the TAGBODY you can use the GO special operator to +jump immediately to any of the tags, and evaluation will resume with the +form following the tag. For instance, you can write a trivial infinite +loop with TAGBODY and GO like this: + + (tagbody + top + (print 'hello) + (go top)) + + Note that while the tag names must appear at the top level of the +TAGBODY, not nested within other forms, the GO special operator can +appear anywhere within the scope of the TAGBODY. This means you could +write a loop that loops a random number of times like this: + + (tagbody + top + (print 'hello) + (when (plusp (random 10)) (go top))) + + An even sillier example of TAGBODY, which shows you can have multiple +tags in a single TAGBODY, looks like this: + + (tagbody + a (print 'a) (if (zerop (random 2)) (go c)) + b (print 'b) (if (zerop (random 2)) (go a)) + c (print 'c) (if (zerop (random 2)) (go b))) + + This form will jump around randomly printing as, bs, and cs until +eventually the last RANDOM expression returns 1 and the control falls +off the end of the TAGBODY. + + TAGBODY is rarely used directly since it's almost always easier to +write iterative constructs in terms of the existing looping macros. +It's handy, however, for translating algorithms written in other +languages into Common Lisp, either automatically or manually. An +example of an automatic translation tool is the FORTRAN-to-Common Lisp +translator, f2cl, that translates FORTRAN source code into Common Lisp +in order to make various FORTRAN libraries available to Common Lisp +programmers. Since many FORTRAN libraries were written before the +structured programming revolution, they're full of gotos. The f2cl +compiler can simply translate those gotos to GOs within appropriate +TAGBODYs. (1) + + Similarly, TAGBODY and GO can be handy when translating algorithms +described in prose or by flowcharts-for instance, in Donald Knuth's +classic series The Art of Computer Programming, he describes algorithms +using a "recipe" format: step 1, do this; step 2, do that; step 3, go +back to step 2; and so on. For example, on page 142 of The Art of +Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition +(Addison-Wesley, 1998), he describes Algorithm S, which you'll use in +Chapter 27, in this form: + + Algorithm S (Selection sampling technique). To select n records at random from a set of N, where 0 < n <= N. + + S1. [Initialize.] Set t <-- 0, m <-- 0. (During this algorithm, m represents the number of records selected so far, and t is the total number of input records that we have dealt with.) + + S2. [Generate U.] Generate a random number U, uniformly distributed between zero and one. + + S3. [Test.] If (N - t)U >= n - m, go to step S5. + + S4. [Select.] Select the next record for the sample, and increase m and t by 1. If m < n, go to step S2; otherwise the sample is complete and the algorithm terminates. + + S5. [Skip.] Skip the next record (do not include it in the sample), increase t by 1, and go back to step S2. + + This description can be easily translated into a Common Lisp +function, after renaming a few variables, as follows: + + (defun algorithm-s (n max) ; max is N in Knuth's algorithm + (let (seen ; t in Knuth's algorithm + selected ; m in Knuth's algorithm + u ; U in Knuth's algorithm + (records ())) ; the list where we save the records selected + (tagbody + s1 + (setf seen 0) + (setf selected 0) + s2 + (setf u (random 1.0)) + s3 + (when (>= (* (- max seen) u) (- n selected)) (go s5)) + s4 + (push seen records) + (incf selected) + (incf seen) + (if (< selected n) + (go s2) + (return-from algorithm-s (nreverse records))) + s5 + (incf seen) + (go s2)))) + + It's not the prettiest code, but it's easy to verify that it's a +faithful translation of Knuth's algorithm. But, this code, unlike +Knuth's prose description, can be run and tested. Then you can start +refactoring, checking after each change that the function still works. +(2) + + After pushing the pieces around a bit, you might end up with +something like this: + + (defun algorithm-s (n max) + (loop for seen from 0 + when (< (* (- max seen) (random 1.0)) n) + collect seen and do (decf n) + until (zerop n))) + + While it may not be immediately obvious that this code correctly +implements Algorithm S, if you got here via a series of functions that +all behave identically to the original literal translation of Knuth's +recipe, you'd have good reason to believe it's correct. + + ---------- Footnotes ---------- + + (1) One version of f2cl is available as part of the Common Lisp Open +Code Collection (CLOCC): http://clocc.sourceforge.net/. By contrast, +consider the tricks the authors of f2j, a FORTRAN-to-Java translator, +have to play. Although the Java Virtual Machine (JVM) has a goto +instruction, it's not directly exposed in Java. So to compile FORTRAN +gotos, they first compile the FORTRAN code into legal Java source with +calls to a dummy class to represent the labels and gotos. Then they +compile the source with a regular Java compiler and postprocess the byte +codes to translate the dummy calls into JVM-level byte codes. Clever, +but what a pain. + + (2) Since this algorithm depends on values returned by RANDOM, you +may want to test it with a consistent random seed, which you can get by +binding *RANDOM-STATE* to the value of (make-random-state nil) around +each call to algorithm-s. For instance, you can do a basic sanity check +of algorithm-s by evaluating this: + + (let ((*random-state* (make-random-state nil))) (algorithm-s 10 200)) + + If your refactorings are all valid, this expression should evaluate +to the same list each time. + + +File: pcl.info, Node: 20-4, Next: 20-5, Prev: 20-3, Up: Chapter 20 + +Unwinding the Stack +=================== + +Another aspect of the language that special operators give you control +over is the behavior of the call stack. For instance, while you +normally use BLOCK and TAGBODY to manage the flow of control within a +single function, you can also use them, in conjunction with closures, to +force an immediate nonlocal return from a function further down on the +stack. That's because BLOCK names and TAGBODY tags can be closed over +by any code within the lexical scope of the BLOCK or TAGBODY. For +example, consider this function: + + (defun foo () + (format t "Entering foo~%") + (block a + (format t " Entering BLOCK~%") + (bar #'(lambda () (return-from a))) + (format t " Leaving BLOCK~%")) + (format t "Leaving foo~%")) + + The anonymous function passed to bar uses RETURN-FROM to return from +the BLOCK. But that RETURN-FROM doesn't get evaluated until the +anonymous function is invoked with FUNCALL or APPLY. Now suppose bar +looks like this: + + (defun bar (fn) + (format t " Entering bar~%") + (baz fn) + (format t " Leaving bar~%")) + + Still, the anonymous function isn't invoked. Now look at baz. + + (defun baz (fn) + (format t " Entering baz~%") + (funcall fn) + (format t " Leaving baz~%")) + + Finally the function is invoked. But what does it mean to +RETURN-FROM a block that's several layers up on the call stack? Turns +out it works fine-the stack is unwound back to the frame where the BLOCK +was established and control returns from the BLOCK. The FORMAT +expressions in foo, bar, and baz show this: + + CL-USER> (foo) + Entering foo + Entering BLOCK + Entering bar + Entering baz + Leaving foo + NIL + + Note that the only "Leaving . . ." message that prints is the one +that appears after the BLOCK in foo. + + Because the names of blocks are lexically scoped, a RETURN-FROM +always returns from the smallest enclosing BLOCK in the lexical +environment where the RETURN-FROM form appears even if the RETURN-FROM +is executed in a different dynamic context. For instance, bar could +also contain a BLOCK named a, like this: + + (defun bar (fn) + (format t " Entering bar~%") + (block a (baz fn)) + (format t " Leaving bar~%")) + + This extra BLOCK won't change the behavior of foo at all-the name a +is resolved lexically, at compile time, not dynamically, so the +intervening block has no effect on the RETURN-FROM. Conversely, the name +of a BLOCK can be used only by RETURN-FROMs appearing within the lexical +scope of the BLOCK; there's no way for code outside the block to return +from the block except by invoking a closure that closes over a +RETURN-FROM from the lexical scope of the BLOCK. + + TAGBODY and GO work the same way, in this regard, as BLOCK and +RETURN-FROM. When you invoke a closure that contains a GO form, if the +GO is evaluated, the stack will unwind back to the appropriate TAGBODY +and then jump to the specified tag. + + BLOCK names and TAGBODY tags, however, differ from lexical variable +bindings in one important way. As I discussed in Chapter 6, lexical +bindings have indefinite extent, meaning the bindings can stick around +even after the binding form has returned. BLOCKs and TAGBODYs, on the +other hand, have dynamic extent-you can RETURN-FROM a BLOCK or GO to a +TAGBODY tag only while the BLOCK or TAGBODY is on the call stack. In +other words, a closure that captures a block name or TAGBODY tag can be +passed down the stack to be invoked later, but it can't be returned up +the stack. If you invoke a closure that tries to RETURN-FROM a BLOCK, +after the BLOCK itself has returned, you'll get an error. Likewise, +trying to GO to a TAGBODY that no longer exists will cause an error. +(1) + + It's unlikely you'll need to use BLOCK and TAGBODY yourself for this +kind of stack unwinding. But you'll likely be using them indirectly +whenever you use the condition system, so understanding how they work +should help you understand better what exactly, for instance, invoking a +restart is doing. (2) + + CATCH and THROW are another pair of special operators that can force +the stack to unwind. You'll use these operators even less often than +the others mentioned so far-they're holdovers from earlier Lisp dialects +that didn't have Common Lisp's condition system. They definitely +shouldn't be confused with try/catch and try/except constructs from +languages such as Java and Python. + + CATCH and THROW are the dynamic counterparts of BLOCK and +RETURN-FROM. That is, you wrap CATCH around a body of code and then use +THROW to cause the CATCH form to return immediately with a specified +value. The difference is that the association between a CATCH and THROW +is established dynamically-instead of a lexically scoped name, the label +for a CATCH is an object, called a catch tag, and any THROW evaluated +within the dynamic extent of the CATCH that throws that object will +unwind the stack back to the CATCH form and cause it to return +immediately. Thus, you can write a version of the foo, bar, and baz +functions from before using CATCH and THROW instead of BLOCK and +RETURN-FROM like this: + + (defparameter *obj* (cons nil nil)) ; i.e. some arbitrary object + + (defun foo () + (format t "Entering foo~%") + (catch *obj* + (format t " Entering CATCH~%") + (bar) + (format t " Leaving CATCH~%")) + (format t "Leaving foo~%")) + + (defun bar () + (format t " Entering bar~%") + (baz) + (format t " Leaving bar~%")) + + (defun baz () + (format t " Entering baz~%") + (throw *obj* nil) + (format t " Leaving baz~%")) + + Notice how it isn't necessary to pass a closure down the stack-baz +can call THROW directly. The result is quite similar to the earlier +version. + + CL-USER> (foo) + Entering foo + Entering CATCH + Entering bar + Entering baz + Leaving foo + NIL + + However, CATCH and THROW are almost too dynamic. In both the CATCH +and the THROW, the tag form is evaluated, which means their values are +both determined at runtime. Thus, if some code in bar reassigned or +rebound *obj*, the THROW in baz wouldn't throw to the same CATCH. This +makes CATCH and THROW much harder to reason about than BLOCK and +RETURN-FROM. The only advantage, which the version of foo, bar, and baz +that use CATCH and THROW demonstrates, is there's no need to pass down a +closure in order for low-level code to return from a CATCH-any code that +runs within the dynamic extent of a CATCH can cause it to return by +throwing the right object. + + In older Lisp dialects that didn't have anything like Common Lisp's +condition system, CATCH and THROW were used for error handling. +However, to keep them manageable, the catch tags were usually just +quoted symbols, so you could tell by looking at a CATCH and a THROW +whether they would hook up at runtime. In Common Lisp you'll rarely +have any call to use CATCH and THROW since the condition system is so +much more flexible. + + The last special operator related to controlling the stack is another +one I've mentioned in passing before-UNWIND-PROTECT. UNWIND-PROTECT lets +you control what happens as the stack unwinds-to make sure that certain +code always runs regardless of how control leaves the scope of the +UNWIND-PROTECT, whether by a normal return, by a restart being invoked, +or by any of the ways discussed in this section. (3) The basic skeleton +of UNWIND-PROTECT looks like this: + + (unwind-protect protected-form + cleanup-form*) + + The single protected-form is evaluated, and then, regardless of how +it returns, the cleanup-forms are evaluated. If the protected-form +returns normally, then whatever it returns is returned from the +UNWIND-PROTECT after the cleanup forms run. The cleanup forms are +evaluated in the same dynamic environment as the UNWIND-PROTECT, so the +same dynamic variable bindings, restarts, and condition handlers will be +visible to code in cleanup forms as were visible just before the +UNWIND-PROTECT. + + You'll occasionally use UNWIND-PROTECT directly. More often you'll +use it as the basis for WITH- style macros, similar to WITH-OPEN-FILE, +that evaluate any number of body forms in a context where they have +access to some resource that needs to be cleaned up after they're done, +regardless of whether they return normally or bail via a restart or +other nonlocal exit. For example, if you were writing a database +library that defined functions open-connection and close-connection, you +might write a macro like this: (4) + + (defmacro with-database-connection ((var &rest open-args) &body body) + `(let ((,var (open-connection ,@open-args))) + (unwind-protect (progn ,@body) + (close-connection ,var)))) + + which lets you write code like this: + + (with-database-connection (conn :host "foo" :user "scott" :password "tiger") + (do-stuff conn) + (do-more-stuff conn)) + + and not have to worry about closing the database connection, since +the UNWIND-PROTECT will make sure it gets closed no matter what happens +in the body of the with-database-connection form. + + ---------- Footnotes ---------- + + (1) This is a pretty reasonable restriction-it's not entirely clear +what it'd mean to return from a form that has already returned-unless, +of course, you're a Scheme programmer. Scheme supports continuations, a +language construct that makes it possible to return from the same +function call more than once. But for a variety of reasons, few, if +any, languages other than Scheme support this kind of continuation. + + (2) If you're the kind of person who likes to know how things work +all the way down to the bits, it may be instructive to think about how +you might implement the condition system's macros using BLOCK, TAGBODY, +closures, and dynamic variables. + + (3) UNWIND-PROTECT is essentially equivalent to try/finally +constructs in Java and Python. + + (4) And indeed, CLSQL, the multi-Lisp, multidatabase SQL interface +library, provides a similar macro called with-database. CLSQL's home +page is at http://clsql.b9.com. + + +File: pcl.info, Node: 20-5, Next: 20-6, Prev: 20-4, Up: Chapter 20 + +Multiple Values +=============== + +Another feature of Common Lisp that I've mentioned in passing-in Chapter +11, when I discussed GETHASH-is the ability for a single form to return +multiple values. I'll discuss it in greater detail now. It is, +however, slightly misplaced in a chapter on special operators since the +ability to return multiple values isn't provided by just one or two +special operators but is deeply integrated into the language. The +operators you'll most often use when dealing with multiple values are +macros and functions, not special operators. But it is the case that +the basic ability to get at multiple return values is provided by a +special operator, MULTIPLE-VALUE-CALL, upon which the more commonly used +MULTIPLE-VALUE-BIND macro is built. + + The key thing to understand about multiple values is that returning +multiple values is quite different from returning a list-if a form +returns multiple values, unless you do something specific to capture the +multiple values, all but the primary value will be silently discarded. +To see the distinction, consider the function GETHASH, which returns two +values: the value found in the hash table and a boolean that's NIL when +no value was found. If it returned those two values in a list, every +time you called GETHASH you'd have to take apart the list to get at the +actual value, regardless of whether you cared about the second return +value. Suppose you have a hash table, *h*, that contains numeric +values. If GETHASH returned a list, you couldn't write something like +this: + + (+ (gethash 'a *h*) (gethash 'b *h*)) + + because + expects its arguments to be numbers, not lists. But +because the multiple value mechanism silently discards the secondary +return value when it's not wanted, this form works fine. + + There are two aspects to using multiple values-returning multiple +values and getting at the nonprimary values returned by forms that +return multiple values. The starting points for returning multiple +values are the functions VALUES and VALUES-LIST. These are regular +functions, not special operators, so their arguments are passed in the +normal way. VALUES takes a variable number of arguments and returns +them as multiple values; VALUES-LIST takes a single list and returns its +elements as multiple values. In other words: + + (values-list x) === (apply #'values x) + + The mechanism by which multiple values are returned is implementation +dependent just like the mechanism for passing arguments into functions +is. Almost all language constructs that return the value of some +subform will "pass through" multiple values, returning all the values +returned by the subform. Thus, a function that returns the result of +calling VALUES or VALUES-LIST will itself return multiple values-and so +will another function whose result comes from calling the first +function. And so on. (1) + + But when a form is evaluated in a value position, only the primary +value will be used, which is why the previous addition form works the +way you'd expect. The special operator MULTIPLE-VALUE-CALL provides the +mechanism for getting your hands on the multiple values returned by a +form. MULTIPLE-VALUE-CALL is similar to FUNCALL except that while +FUNCALL is a regular function and, therefore, can see and pass on only +the primary values passed to it, MULTIPLE-VALUE-CALL passes, to the +function returned by its first subform, all the values returned by the +remaining subforms. + + (funcall #'+ (values 1 2) (values 3 4)) ==> 4 + (multiple-value-call #'+ (values 1 2) (values 3 4)) ==> 10 + + However, it's fairly rare that you'll simply want to pass all the +values returned by a function onto another function. More likely, +you'll want to stash the multiple values in different variables and then +do something with them. The MULTIPLE-VALUE-BIND macro, which you saw in +Chapter 11, is the most frequently used operator for accepting multiple +return values. Its skeleton looks like this: + + (multiple-value-bind (variable*) values-form + body-form*) + + The values-form is evaluated, and the multiple values it returns are +bound to the variables. Then the body-forms are evaluated with those +bindings in effect. Thus: + + (multiple-value-bind (x y) (values 1 2) + (+ x y)) ==> 3 + + Another macro, MULTIPLE-VALUE-LIST, is even simpler-it takes a single +form, evaluates it, and collects the resulting multiple values into a +list. In other words, it's the inverse of VALUES-LIST. + + CL-USER> (multiple-value-list (values 1 2)) + (1 2) + CL-USER> (values-list (multiple-value-list (values 1 2))) + 1 + 2 + + However, if you find yourself using MULTIPLE-VALUE-LIST a lot, it may +be a sign that some function should be returning a list to start with +rather than multiple values. + + Finally, if you want to assign multiple values returned by a form to +existing variables, you can use VALUES as a SETFable place. For +example: + + CL-USER> (defparameter *x* nil) + *X* + CL-USER> (defparameter *y* nil) + *Y* + CL-USER> (setf (values *x* *y*) (floor (/ 57 34))) + 1 + 23/34 + CL-USER> *x* + 1 + CL-USER> *y* + 23/34 + + ---------- Footnotes ---------- + + (1) A small handful of macros don't pass through extra return values +of the forms they evaluate. In particular, the PROG1 macro, which +evaluates a number of forms like a PROGN before returning the value of +the first form, returns that form's primary value only. Likewise, +PROG2, which returns the value of the second of its subforms, returns +only the primary value. The special operator MULTIPLE-VALUE-PROG1 is a +variant of PROG1 that returns all the values returned by the first form. +It's a minor wart that PROG1 doesn't already behave like +MULTIPLE-VALUE-PROG1, but neither is used often enough that it matters +much. The OR and COND macros are also not always transparent to +multiple values, returning only the primary value of certain subforms. + + +File: pcl.info, Node: 20-6, Next: 20-7, Prev: 20-5, Up: Chapter 20 + +EVAL-WHEN +========= + +A special operator you'll need to understand in order to write certain +kinds of macros is EVAL-WHEN. For some reason, Lisp books often treat +EVAL-WHEN as a wizards-only topic. But the only prerequisite to +understanding EVAL-WHEN is an understanding of how the two functions +LOAD and COMPILE-FILE interact. And understanding EVAL-WHEN will be +important as you start writing certain kinds of more sophisticated +macros, such as the ones you'll write in Chapters 24 and 31. + + I've touched briefly on the relation between LOAD and COMPILE-FILE in +previous chapters, but it's worth reviewing again here. The job of LOAD +is to load a file and evaluate all the top-level forms it contains. The +job of COMPILE-FILE is to compile a source file into a FASL file, which +can then be loaded with LOAD such that (load "foo.lisp") and (load +"foo.fasl") are essentially equivalent. + + Because LOAD evaluates each form before reading the next, the side +effects of evaluating forms earlier in the file can affect how forms +later in the form are read and evaluated. For instance, evaluating an +IN-PACKAGE form changes the value of *PACKAGE*, which will affect the +way subsequent forms are read. (1) Similarly, a DEFMACRO form early in +a file can define a macro that can then be used by code later in the +file. (2) + + COMPILE-FILE, on the other hand, normally doesn't evaluate the forms +it's compiling; it's when the FASL is loaded that the forms-or their +compiled equivalents-will be evaluated. However, COMPILE-FILE must +evaluate some forms, such as IN-PACKAGE and DEFMACRO forms, in order to +keep the behavior of (load "foo.lisp") and (load "foo.fasl") consistent. + + So how do macros such as IN-PACKAGE and DEFMACRO work when processed +by COMPILE-FILE? In some pre-Common Lisp versions of Lisp, the file +compiler simply knew it should evaluate certain macros in addition to +compiling them. Common Lisp avoided the need for such kludges by +borrowing the EVAL-WHEN special operator from Maclisp. This operator, +as its name suggests, allows you to control when specific bits of code +are evaluated. The skeleton of an EVAL-WHEN form looks like this: + + (eval-when (situation*) + body-form*) + + There are three possible situations-:compile-toplevel, +:load-toplevel, and :execute-and which ones you specify controls when +the body-forms will be evaluated. An EVAL-WHEN with multiple situations +is equivalent to several EVAL-WHEN forms, one per situation, each with +the same body code. To explain the meaning of the three situations, +I'll need to explain a bit about how COMPILE-FILE, which is also +referred to as the file compiler, goes about compiling a file. + + To explain how COMPILE-FILE compiles EVAL-WHEN forms, I need to +introduce a distinction between compiling top-level forms and compiling +non-top-level forms. A top-level form is, roughly speaking, one that +will be compiled into code that will be run when the FASL is loaded. +Thus, all forms that appear directly at the top level of a source file +are compiled as top-level forms. Similarly, any forms appearing +directly in a top-level PROGN are compiled as top-level forms since the +PROGN itself doesn't do anything-it just groups together its subforms, +which will be run when the FASL is loaded. (3) Similarly, forms +appearing directly in a MACROLET or SYMBOL-MACROLET are compiled as +top-level forms because after the compiler has expanded the local macros +or symbol macros, there will be no remnant of the MACROLET or +SYMBOL-MACROLET in the compiled code. Finally, the expansion of a +top-level macro form will be compiled as a top-level form. + + Thus, a DEFUN appearing at the top level of a source file is a +top-level form-the code that defines the function and associates it with +its name will run when the FASL is loaded-but the forms within the body +of the function, which won't run until the function is called, aren't +top-level forms. Most forms are compiled the same when compiled as +top-level and non-top-level forms, but the semantics of an EVAL-WHEN +depend on whether it's being compiled as a top- level form, compiled as +a non-top-level form, or simply evaluated, combined with what situations +are listed in its situation list. + + The situations :compile-toplevel and :load-toplevel control the +meaning of an EVAL-WHEN compiled as a top-level form. When +:compile-toplevel is present, the file compiler will evaluate the +subforms at compile time. When :load-toplevel is present, it will +compile the subforms as top-level forms. If neither of these situations +is present in a top-level EVAL-WHEN, the compiler ignores it. + + When an EVAL-WHEN is compiled as a non-top-level form, it's either +compiled like a PROGN, if the :execute situation is specified, or +ignored. Similarly, an evaluated EVAL-WHEN-which includes top-level +EVAL-WHENs in a source file processed by LOAD and EVAL-WHENs evaluated +at compile time because they appear as subforms of a top-level EVAL-WHEN +with the :compile-toplevel situation-is treated like a PROGN if :execute +is present and ignored otherwise. + + Thus, a macro such as IN-PACKAGE can have the necessary effect at +both compile time and when loading from source by expanding into an +EVAL-WHEN like the following: + + (eval-when (:compile-toplevel :load-toplevel :execute) + (setf *package* (find-package "PACKAGE-NAME"))) + + *PACKAGE* will be set at compile time because of the +:compile-toplevel situation, set when the FASL is loaded because of +:load-toplevel, and set when the source is loaded because of the +:execute. + + There are two ways you're most likely to use EVAL-WHEN. One is if you +want to write macros that need to save some information at compile time +to be used when generating the expansion of other macro forms in the +same file. This typically arises with definitional macros where a +definition early in a file can affect the code generated for a +definition later in the same file. You'll write this kind of macro in +Chapter 24. + + The other time you might need EVAL-WHEN is if you want to put the +definition of a macro and helper functions it uses in the same file as +code that uses the macro. DEFMACRO already includes an EVAL-WHEN in its +expansion so the macro definition is immediately available to be used +later in the file. But DEFUN normally doesn't make function definitions +available at compile time. But if you use a macro in the same file as +it's defined in, you need the macro and any functions it uses to be +defined. If you wrap the DEFUNs of any helper functions used by the +macro in an EVAL-WHEN with :compile-toplevel, the definitions will be +available when the macro's expansion function runs. You'll probably +want to include :load-toplevel and :execute as well since the macros +will also need the function definitions after the file is compiled and +loaded or if you load the source instead of compiling. + + ---------- Footnotes ---------- + + (1) The reason loading a file with an IN-PACKAGE form in it has no +effect on the value of *PACKAGE* after LOAD returns is because LOAD +binds *PACKAGE* to its current value before doing anything else. In +other words, something equivalent to the following LET is wrapped around +the rest of the code in LOAD: + + (let ((*package* *package*)) ...) + + Any assignment to *PACKAGE* will be to the new binding, and the old +binding will be restored when LOAD returns. It also binds the variable +*READTABLE*, which I haven't discussed, in the same way. + + (2) In some implementations, you may be able to get away with +evaluating DEFUNs that use undefined macros in the function body as long +as the macros are defined before the function is actually called. But +that works, if at all, only when LOADing the definitions from source, +not when compiling with COMPILE-FILE, so in general macro definitions +must be evaluated before they're used. + + (3) By contrast, the subforms in a top-level LET aren't compiled as +top-level forms because they're not run directly when the FASL is +loaded. They will run, but it's in the runtime context of the bindings +established by the LET. Theoretically, a LET that binds no variables +could be treated like a PROGN, but it's not-the forms appearing in a LET +are never treated as top-level forms. + + +File: pcl.info, Node: 20-7, Next: Chapter 21, Prev: 20-6, Up: Chapter 20 + +Other Special Operators +======================= + +The four remaining special operators, LOCALLY, THE, LOAD-TIME-VALUE, and +PROGV, all allow you to get at parts of the underlying language that +can't be accessed any other way. LOCALLY and THE are part of Common +Lisp's declaration system, which is used to communicate things to the +compiler that don't affect the meaning of your code but that may help +the compiler generate better code-faster, clearer error messages, and so +on. (1) I'll discuss declarations briefly in Chapter 32. + + The other two, LOAD-TIME-VALUE and PROGV, are infrequently used, and +explaining the reason why you might ever want to use them would take +longer than explaining what they do. So I'll just tell you what they do +so you know they're there. Someday you'll hit on one of those rare +times when they're just the thing, and then you'll be ready. + + LOAD-TIME-VALUE is used, as its name suggests, to create a value +that's determined at load time. When the file compiler compiles code +that contains a LOAD-TIME-VALUE form, it arranges to evaluate the first +subform once, when the FASL is loaded, and for the code containing the +LOAD-TIME-VALUE form to refer to that value. In other words, instead of +writing this: + + (defvar *loaded-at* (get-universal-time)) + + (defun when-loaded () *loaded-at*) + + you can write the following: + + (defun when-loaded () (load-time-value (get-universal-time))) + + In code not processed by COMPILE-FILE, LOAD-TIME-VALUE is evaluated +once when the code is compiled, which may be when you explicitly compile +a function with COMPILE or earlier because of implicit compilation +performed by the implementation in the course of evaluating the code. +In uncompiled code, LOAD-TIME-VALUE evaluates its form each time it's +evaluated. + + Finally, PROGV creates new dynamic bindings for variables whose names +are determined at runtime. This is mostly useful for implementing +embedded interpreters for languages with dynamically scoped variables. +The basic skeleton is as follows: + + (progv symbols-list values-list + body-form*) + + where symbols-list is a form that evaluates to a list of symbols and +values-list is a form that evaluates to a list of values. Each symbol +is dynamically bound to the corresponding value, and then the body-forms +are evaluated. The difference between PROGV and LET is that because +symbols-list is evaluated at runtime, the names of the variables to bind +can be determined dynamically. As I say, this isn't something you need +to do often. + + And that's it for special operators. In the next chapter, I'll get +back to hard-nosed practical topics and show you how to use Common +Lisp's package system to take control of your namespaces so you can +write libraries and applications that can coexist without stomping on +each other's names. + + ---------- Footnotes ---------- + + (1) The one declaration that has an effect on the semantics of a +program is the SPECIAL declaration mentioned in Chapter 6. + + +File: pcl.info, Node: Chapter 21, Next: Chapter 22, Prev: Chapter 20, Up: Top + +21. Programming in the Large: Packages and Symbols +================================================== + +In Chapter 4 I discussed how the Lisp reader translates textual names +into objects to be passed to the evaluator, representing them with a +kind of object called a symbol. It turns out that having a built-in +data type specifically for representing names is quite handy for a lot +of kinds of programming. (1) That, however, isn't the topic of this +chapter. In this chapter I'll discuss one of the more immediate and +practical aspects of dealing with names: how to avoid name conflicts +between independently developed pieces of code. + + Suppose, for instance, you're writing a program and decide to use a +third-party library. You don't want to have to know the name of every +function, variable, class, or macro used in the internals of that +library in order to avoid conflicts between those names and the names +you use in your program. You'd like for most of the names in the +library and the names in your program to be considered distinct even if +they happen to have the same textual representation. At the same time, +you'd like certain names defined in the library to be readily +accessible-the names that make up its public API, which you'll want to +use in your program. + + In Common Lisp, this namespace problem boils down to a question of +controlling how the reader translates textual names into symbols: if you +want two occurrences of the same name to be considered the same by the +evaluator, you need to make sure the reader uses the same symbol to +represent each name. Conversely, if you want two names to be considered +distinct, even if they happen to have the same textual name, you need +the reader to create different symbols to represent each name. + +* Menu: + +* 21-1:: How the Reader Uses Packages +* 21-2:: A Bit of Package and Symbol Vocabulary +* 21-3:: Three Standard Packages +* 21-4:: Defining Your Own Packages +* 21-5:: Packaging Reusable Libraries +* 21-6:: Importing Individual Names +* 21-7:: Packaging Mechanics +* 21-8:: Package Gotchas + + ---------- Footnotes ---------- + + (1) The kind of programming that relies on a symbol data type is +called, appropriately enough, symbolic computation. It's typically +contrasted to numeric programming. An example of a primarily symbolic +program that all programmers should be familiar with is a compiler-it +treats the text of a program as symbolic data and translates it into a +new form. + + +File: pcl.info, Node: 21-1, Next: 21-2, Prev: Chapter 21, Up: Chapter 21 + +How the Reader Uses Packages +============================ + +In Chapter 4 I discussed briefly how the Lisp reader translates names +into symbols, but I glossed over most of the details-now it's time to +take a closer look at what actually happens. + + I'll start by describing the syntax of names understood by the reader +and how that syntax relates to packages. For the moment you can think +of a package as a table that maps strings to symbols. As you'll see in +the next section, the actual mapping is slightly more flexible than a +simple lookup table but not in ways that matter much to the reader. +Each package also has a name, which can be used to find the package +using the function FIND-PACKAGE. + + The two key functions that the reader uses to access the +name-to-symbol mappings in a package are FIND-SYMBOL and INTERN. Both +these functions take a string and, optionally, a package. If not +supplied, the package argument defaults to the value of the global +variable *PACKAGE*, also called the current package. + + FIND-SYMBOL looks in the package for a symbol with the given string +for a name and returns it, or NIL if no symbol is found. INTERN also +will return an existing symbol; otherwise it creates a new symbol with +the string as its name and adds it to the package. + + Most names you use are unqualified, names that contain no colons. +When the reader reads such a name, it translates it to a symbol by +converting any unescaped letters to uppercase and passing the resulting +string to INTERN. Thus, each time the reader reads the same name in the +same package, it'll get the same symbol object. This is important +because the evaluator uses the object identity of symbols to determine +which function, variable, or other program element a given symbol refers +to. Thus, the reason an expression such as (hello-world) results in +calling a particular hello-world function is because the reader returns +the same symbol when it reads the function call as it did when it read +the DEFUN form that defined the function. + + A name containing either a single colon or a double colon is a +package-qualified name. When the reader reads a package-qualified name, +it splits the name on the colon(s) and uses the first part as the name +of a package and the second part as the name of the symbol. The reader +looks up the appropriate package and uses it to translate the symbol +name to a symbol object. + + A name containing only a single colon must refer to an external +symbol-one the package exports for public use. If the named package +doesn't contain a symbol with a given name, or if it does but it hasn't +been exported, the reader signals an error. A double-colon name can +refer to any symbol from the named package, though it's usually a bad +idea-the set of exported symbols defines a package's public interface, +and if you don't respect the package author's decision about what names +to make public and which ones to keep private, you're asking for trouble +down the road. On the other hand, sometimes a package author will +neglect to export a symbol that really ought to be public. In that +case, a double-colon name lets you get work done without having to wait +for the next version of the package to be released. + + Two other bits of symbol syntax the reader understands are those for +keyword symbols and uninterned symbols. Keyword symbols are written +with names starting with a colon. Such symbols are interned in the +package named KEYWORD and automatically exported. Additionally, when +the reader interns a symbol in the KEYWORD, it also defines a constant +variable with the symbol as both its name and value. This is why you +can use keywords in argument lists without quoting them-when they appear +in a value position, they evaluate to themselves. Thus: + + (eql ':foo :foo) ==> T + + The names of keyword symbols, like all symbols, are converted to all +uppercase by the reader before they're interned. The name doesn't +include the leading colon. + + (symbol-name :foo) ==> "FOO" + + Uninterned symbols are written with a leading #:. These names (minus +the #:) are converted to uppercase as normal and then translated into +symbols, but the symbols aren't interned in any package; each time the +reader reads a #: name, it creates a new symbol. Thus: + + (eql '#:foo '#:foo) ==> NIL + + You'll rarely, if ever, write this syntax yourself, but will +sometimes see it when you print an s-expression containing symbols +returned by the function GENSYM. + + (gensym) ==> #:G3128 + + +File: pcl.info, Node: 21-2, Next: 21-3, Prev: 21-1, Up: Chapter 21 + +A Bit of Package and Symbol Vocabulary +====================================== + +As I mentioned previously, the mapping from names to symbols implemented +by a package is slightly more flexible than a simple lookup table. At +its core, every package contains a name-to-symbol lookup table, but a +symbol can be made accessible via an unqualified name in a given package +in other ways. To talk sensibly about these other mechanisms, you'll +need a little bit of vocabulary. + + To start with, all the symbols that can be found in a given package +using FIND-SYMBOL are said to be accessible in that package. In other +words, the accessible symbols in a package are those that can be +referred to with unqualified names when the package is current. + + A symbol can be accessible in two ways. The first is for the +package's name-to-symbol table to contain an entry for the symbol, in +which case the symbol is said to be present in the package. When the +reader interns a new symbol in a package, it's added to the package's +name-to-symbol table. The package in which a symbol is first interned +is called the symbol's home package. + + The other way a symbol can be accessible in a package is if the +package inherits it. A package inherits symbols from other packages by +using the other packages. Only external symbols in the used packages +are inherited. A symbol is made external in a package by exporting it. +In addition to causing it to be inherited by using packages, exporting a +symbol also-as you saw in the previous section-makes it possible to +refer to the symbol using a single-colon qualified name. + + To keep the mappings from names to symbols deterministic, the package +system allows only one symbol to be accessible in a given package for +each name. That is, a package can't have a present symbol and an +inherited symbol with the same name or inherit two different symbols, +from different packages, with the same name. However, you can resolve +conflicts by making one of the accessible symbols a shadowing symbol, +which makes the other symbols of the same name inaccessible. In +addition to its name-to-symbol table, each package maintains a list of +shadowing symbols. + + An existing symbol can be imported into another package by adding it +to the package's name-to-symbol table. Thus, the same symbol can be +present in multiple packages. Sometimes you'll import symbols simply +because you want them to be accessible in the importing package without +using their home package. Other times you'll import a symbol because +only present symbols can be exported or be shadowing symbols. For +instance, if a package needs to use two packages that have external +symbols of the same name, one of the symbols must be imported into the +using package in order to be added to its shadowing list and make the +other symbol inaccessible. + + Finally, a present symbol can be uninterned from a package, which +causes it to be removed from the name-to-symbol table and, if it's a +shadowing symbol, from the shadowing list. You might unintern a symbol +from a package to resolve a conflict between the symbol and an external +symbol from a package you want to use. A symbol that isn't present in +any package is called an uninterned symbol, can no longer be read by the +reader, and will be printed using the #:foo syntax. + + +File: pcl.info, Node: 21-3, Next: 21-4, Prev: 21-2, Up: Chapter 21 + +Three Standard Packages +======================= + +In the next section I'll show you how to define your own packages, +including how to make one package use another and how to export, shadow, +and import symbols. But first let's look at a few packages you've been +using already. When you first start Lisp, the value of *PACKAGE* is +typically the COMMON-LISP-USER package, also known as CL-USER. (1) +CL-USER uses the package COMMON-LISP, which exports all the names +defined by the language standard. Thus, when you type an expression at +the REPL, all the names of standard functions, macros, variables, and so +on, will be translated to the symbols exported from COMMON-LISP, and all +other names will be interned in the COMMON-LISP-USER package. For +example, the name *PACKAGE* is exported from COMMON-LISP-if you want to +see the value of *PACKAGE*, you can type this: + + CL-USER> *package* + #<The COMMON-LISP-USER package> + + because COMMON-LISP-USER uses COMMON-LISP. Or you can use a +package-qualified name. + + CL-USER> common-lisp:*package* + #<The COMMON-LISP-USER package> + + You can even use COMMON-LISP's nickname, CL. + + CL-USER> cl:*package* + #<The COMMON-LISP-USER package> + + But *X* isn't a symbol in COMMON-LISP, so you if type this: + + CL-USER> (defvar *x* 10) + *X* + + the reader reads DEFVAR as the symbol from the COMMON-LISP package +and *X* as a symbol in COMMON-LISP-USER. + + The REPL can't start in the COMMON-LISP package because you're not +allowed to intern new symbols in it; COMMON-LISP-USER serves as a +"scratch" package where you can create your own names while still having +easy access to all the symbols in COMMON-LISP. (2) Typically, all +packages you'll define will also use COMMON-LISP, so you don't have to +write things like this: + + (cl:defun (x) (cl:+ x 2)) + + The third standard package is the KEYWORD package, the package the +Lisp reader uses to intern names starting with colon. Thus, you can +also refer to any keyword symbol with an explicit package qualification +of keyword like this: + + CL-USER> :a + :A + CL-USER> keyword:a + :A + CL-USER> (eql :a keyword:a) + T + + ---------- Footnotes ---------- + + (1) Every package has one official name and zero or more nicknames +that can be used anywhere you need to use the package name, such as in +package-qualified names or to refer to the package in a DEFPACKAGE or +IN-PACKAGE form. + + (2) COMMON-LISP-USER is also allowed to provide access to symbols +exported by other implementation-defined packages. While this is +intended as a convenience for the user-it makes implementation-specific +functionality readily accessible-it can also cause confusion for new +Lispers: Lisp will complain about an attempt to redefine some name that +isn't listed in the language standard. To see what packages +COMMON-LISP-USER inherits symbols from in a particular implementation, +evaluate this expression at the REPL: + + (mapcar #'package-name (package-use-list :cl-user)) + + And to find out what package a symbol came from originally, evaluate +this: + + (package-name (symbol-package 'some-symbol)) + + with some-symbol replaced by the symbol in question. For instance: + + (package-name (symbol-package 'car)) ==> "COMMON-LISP" + (package-name (symbol-package 'foo)) ==> "COMMON-LISP-USER" + + Symbols inherited from implementation-defined packages will return +some other value. + + +File: pcl.info, Node: 21-4, Next: 21-5, Prev: 21-3, Up: Chapter 21 + +Defining Your Own Packages +========================== + +Working in COMMON-LISP-USER is fine for experiments at the REPL, but +once you start writing actual programs you'll want to define new +packages so different programs loaded into the same Lisp environment +don't stomp on each other's names. And when you write libraries that +you intend to use in different contexts, you'll want to define separate +packages and then export the symbols that make up the libraries' public +APIs. + + However, before you start defining packages, it's important to +understand one thing about what packages do not do. Packages don't +provide direct control over who can call what function or access what +variable. They provide you with basic control over namespaces by +controlling how the reader translates textual names into symbol objects, +but it isn't until later, in the evaluator, that the symbol is +interpreted as the name of a function or variable or whatever else. +Thus, it doesn't make sense to talk about exporting a function or a +variable from a package. You can export symbols to make certain names +easier to refer to, but the package system doesn't allow you to restrict +how those names are used. (1) + + With that in mind, you can start looking at how to define packages +and tie them together. You define new packages with the macro +DEFPACKAGE, which allows you to not only create the package but to +specify what packages it uses, what symbols it exports, and what symbols +it imports from other packages and to resolve conflicts by creating +shadowing symbols. (2) + + I'll describe the various options in terms of how you might use +packages while writing a program that organizes e-mail messages into a +searchable database. The program is purely hypothetical, as are the +libraries I'll refer to-the point is to look at how the packages used in +such a program might be structured. + + The first package you'd need is one to provide a namespace for the +application-you want to be able to name your functions, variables, and +so on, without having to worry about name collisions with unrelated +code. So you'd define a new package with DEFPACKAGE. + + If the application is simple enough to be written with no libraries +beyond the facilities provided by the language itself, you could define +a simple package like this: + + (defpackage :com.gigamonkeys.email-db + (:use :common-lisp)) + + This defines a package, named COM.GIGAMONKEYS.EMAIL-DB, that inherits +all the symbols exported by the COMMON-LISP package. (3) + + You actually have several choices of how to represent the names of +packages and, as you'll see, the names of symbols in a DEFPACKAGE. +Packages and symbols are named with strings. However, in a DEFPACKAGE +form, you can specify the names of packages and symbols with string +designators. A string designator is either a string, which designates +itself; a symbol, which designates its name; or a character, which +designates a one-character string containing just the character. Using +keyword symbols, as in the previous DEFPACKAGE, is a common style that +allows you to write the names in lowercase-the reader will convert the +names to uppercase for you. You could also write the DEFPACKAGE with +strings, but then you have to write them in all uppercase, because the +true names of most symbols and packages are in fact uppercase because of +the case conversion performed by the reader. (4) + + (defpackage "COM.GIGAMONKEYS.EMAIL-DB" + (:use "COMMON-LISP")) + + You could also use nonkeyword symbols-the names in DEFPACKAGE aren't +evaluated-but then the very act of reading the DEFPACKAGE form would +cause those symbols to be interned in the current package, which at the +very least will pollute that namespace and may also cause problems later +if you try to use the package. (5) + + To read code in this package, you need to make it the current package +with the IN-PACKAGE macro: + + (in-package :com.gigamonkeys.email-db) + + If you type this expression at the REPL, it will change the value of +*PACKAGE*, affecting how the REPL reads subsequent expressions, until +you change it with another call to IN-PACKAGE. Similarly, if you include +an IN-PACKAGE in a file that's loaded with LOAD or compiled with +COMPILE-FILE, it will change the package, affecting the way subsequent +expressions in the file are read. (6) + + With the current package set to the COM.GIGAMONKEYS.EMAIL-DB package, +other than names inherited from the COMMON-LISP package, you can use any +name you want for whatever purpose you want. Thus, you could define a +new hello-world function that could coexist with the hello-world +function previously defined in COMMON-LISP-USER. Here's the behavior of +the existing function: + + CL-USER> (hello-world) + hello, world + NIL + + Now you can switch to the new package using IN-PACKAGE. (7) Notice +how the prompt changes-the exact form is determined by the development +environment, but in SLIME the default prompt consists of an abbreviated +version of the package name. + + CL-USER> (in-package :com.gigamonkeys.email-db) + #<The COM.GIGAMONKEYS.EMAIL-DB package> + EMAIL-DB> + + You can define a new hello-world in this package: + + EMAIL-DB> (defun hello-world () (format t "hello from EMAIL-DB package~%")) + HELLO-WORLD + + And test it, like this: + + EMAIL-DB> (hello-world) + hello from EMAIL-DB package + NIL + + Now switch back to CL-USER. + + EMAIL-DB> (in-package :cl-user) + #<The COMMON-LISP-USER package> + CL-USER> + + And the old function is undisturbed. + + CL-USER> (hello-world) + hello, world + NIL + + ---------- Footnotes ---------- + + (1) This is different from the Java package system, which provides a +namespace for classes but is also involved in Java's access control +mechanism. The non-Lisp language with a package system most like Common +Lisp's packages is Perl. + + (2) All the manipulations performed by DEFPACKAGE can also be +performed with functions that man- ipulate package objects. However, +since a package generally needs to be fully defined before it can be +used, those functions are rarely used. Also, DEFPACKAGE takes care of +performing all the package manipulations in the right order-for +instance, DEFPACKAGE adds symbols to the shadowing list before it tries +to use the used packages. + + (3) In many Lisp implementations the :use clause is optional if you +want only to :use COMMON-LISP-if it's omitted, the package will +automatically inherit names from an implementation-defined list of +packages that will usually include COMMON-LISP. However, your code will +be more portable if you always explicitly specify the packages you want +to :use. Those who are averse to typing can use the package's nickname +and write (:use :cl). + + (4) Using keywords instead of strings has another advantage-Allegro +provides a "modern mode" Lisp in which the reader does no case +conversion of names and in which, instead of a COMMON-LISP package with +uppercase names, provides a common-lisp package with lowercase names. +Strictly speaking, this Lisp isn't a conforming Common Lisp since all +the names in the standard are defined to be uppercase. But if you write +your DEFPACKAGE forms using keyword symbols, they will work both in +Common Lisp and in this near relative. + + (5) Some folks, instead of keywords, use uninterned symbols, using +the #: syntax. + + (defpackage #:com.gigamonkeys.email-db + (:use #:common-lisp)) + + This saves a tiny bit of memory by not interning any symbols in the +keyword package-the symbol can become garbage after DEFPACKAGE (or the +code it expands into) is done with it. However, the difference is so +slight that it really boils down to a matter of aesthetics. + + (6) The reason to use IN-PACKAGE instead of just SETFing *PACKAGE* is +that IN-PACKAGE expands into code that will run when the file is +compiled by COMPILE-FILE as well as when the file is loaded, changing +the way the reader reads the rest of the file during compilation. + + (7) In the REPL buffer in SLIME you can also change packages with a +REPL shortcut. Type a comma, and then enter change-package at the +Command: prompt. + + +File: pcl.info, Node: 21-5, Next: 21-6, Prev: 21-4, Up: Chapter 21 + +Packaging Reusable Libraries +============================ + +While working on the e-mail database, you might write several functions +related to storing and retrieving text that don't have anything in +particular to do with e-mail. You might realize that those functions +could be useful in other programs and decide to repackage them as a +library. You should define a new package, but this time you'll export +certain names to make them available to other packages. + + (defpackage :com.gigamonkeys.text-db + (:use :common-lisp) + (:export :open-db + :save + :store)) + + Again, you use the COMMON-LISP package, because you'll need access to +standard functions within COM.GIGAMONKEYS.TEXT-DB. The :export clause +specifies names that will be external in COM.GIGAMONKEYS.TEXT-DB and +thus accessible in packages that :use it. Therefore, after you've +defined this package, you can change the definition of the main +application package to the following: + + (defpackage :com.gigamonkeys.email-db + (:use :common-lisp :com.gigamonkeys.text-db)) + + Now code written in COM.GIGAMONKEYS.EMAIL-DB can use unqualified +names to refer to the exported symbols from both COMMON-LISP and +COM.GIGAMONKEYS.TEXT-DB. All other names will continue to be interned +directly in the COM.GIGAMONKEYS.EMAIL-DB package. + + +File: pcl.info, Node: 21-6, Next: 21-7, Prev: 21-5, Up: Chapter 21 + +Importing Individual Names +========================== + +Now suppose you find a third-party library of functions for manipulating +e-mail messages. The names used in the library's API are exported from +the package COM.ACME.EMAIL, so you could :use that package to get easy +access to those names. But suppose you need to use only one function +from this library, and other exported symbols conflict with names you +already use (or plan to use) in our own code. (1) In this case, you can +import the one symbol you need with an :import-from clause in the +DEFPACKAGE. For instance, if the name of the function you want to use is +parse-email-address, you can change the DEFPACKAGE to this: + + (defpackage :com.gigamonkeys.email-db + (:use :common-lisp :com.gigamonkeys.text-db) + (:import-from :com.acme.email :parse-email-address)) + + Now anywhere the name parse-email-address appears in code read in the +COM.GIGAMONKEYS.EMAIL-DB package, it will be read as the symbol from +COM.ACME.EMAIL. If you need to import more than one symbol from a single +package, you can include multiple names after the package name in a +single :import-from clause. A DEFPACKAGE can also include multiple +:import-from clauses in order to import symbols from different packages. + + Occasionally you'll run into the opposite situation-a package may +export a bunch of names you want to use and a few you don't. Rather +than listing all the symbols you do want to use in an :import-from +clause, you can instead :use the package and then list the names you +don't want to inherit in a :shadow clause. For instance, suppose the +COM.ACME.TEXT package exports a bunch of names of functions and classes +used in text processing. Further suppose that most of these functions +and classes are ones you'll want to use in your code, but one of the +names, build-index, conflicts with a name you've already used. You can +make the build-index from COM.ACME.TEXT inaccessible by shadowing it. + + (defpackage :com.gigamonkeys.email-db + (:use + :common-lisp + :com.gigamonkeys.text-db + :com.acme.text) + (:import-from :com.acme.email :parse-email-address) + (:shadow :build-index)) + + The :shadow clause causes a new symbol named BUILD-INDEX to be +created and added directly to COM.GIGAMONKEYS.EMAIL-DB's name-to-symbol +map. Now if the reader reads the name BUILD-INDEX, it will translate it +to the symbol in COM.GIGAMONKEYS.EMAIL-DB's map, rather than the one +that would otherwise be inherited from COM.ACME.TEXT. The new symbol is +also added to a shadowing symbols list that's part of the +COM.GIGAMONKEYS.EMAIL-DB package, so if you later use another package +that also exports a BUILD-INDEX symbol, the package system will know +there's no conflict-that you want the symbol from +COM.GIGAMONKEYS.EMAIL-DB to be used rather than any other symbols with +the same name inherited from other packages. + + A similar situation can arise if you want to use two packages that +export the same name. In this case the reader won't know which +inherited name to use when it reads the textual name. In such +situations you must resolve the ambiguity by shadowing the conflicting +names. If you don't need to use the name from either package, you could +shadow the name with a :shadow clause, creating a new symbol with the +same name in your package. But if you actually want to use one of the +inherited symbols, then you need to resolve the ambiguity with a +:shadowing-import-from clause. Like an :import-from clause, a +:shadowing-import-from clause consists of a package name followed by the +names to import from that package. For instance, if COM.ACME.TEXT +exports a name SAVE that conflicts with the name exported from +COM.GIGAMONKEYS.TEXT-DB, you could resolve the ambiguity with the +following DEFPACKAGE: + + (defpackage :com.gigamonkeys.email-db + (:use + :common-lisp + :com.gigamonkeys.text-db + :com.acme.text) + (:import-from :com.acme.email :parse-email-address) + (:shadow :build-index) + (:shadowing-import-from :com.gigamonkeys.text-db :save)) + + ---------- Footnotes ---------- + + (1) During development, if you try to :use a package that exports a +symbol with the same name as a symbol already interned in the using +package, Lisp will signal an error and typically offer you a restart +that will unintern the offending symbol from the using package. For +more on this, see the section "Package Gotchas." + + +File: pcl.info, Node: 21-7, Next: 21-8, Prev: 21-6, Up: Chapter 21 + +Packaging Mechanics +=================== + +That covers the basics of how to use packages to manage namespaces in +several common situations. However, another level of how to use +packages is worth discussing-the raw mechanics of how to organize code +that uses different packages. In this section I'll discuss a few rules +of thumb about how to organize code-where to put your DEFPACKAGE forms +relative to the code that uses your packages via IN-PACKAGE. + + Because packages are used by the reader, a package must be defined +before you can LOAD or COMPILE-FILE a file that contains an IN-PACKAGE +expression switching to that package. Packages also must be defined +before other DEFPACKAGE forms can refer to them. For instance, if +you're going to :use COM.GIGAMONKEYS.TEXT-DB in +COM.GIGAMONKEYS.EMAIL-DB, then COM.GIGAMONKEYS.TEXT-DB's DEFPACKAGE must +be evaluated before the DEFPACKAGE of COM.GIGAMONKEYS.EMAIL-DB. + + The best first step toward making sure packages exist when they need +to is to put all your DEFPACKAGEs in files separate from the code that +needs to be read in those packages. Some folks like to create a +foo-package.lisp file for each individual package, and others create a +single packages.lisp that contains all the DEFPACKAGE forms for a group +of related packages. Either approach is reasonable, though the +one-file-per-package approach also requires that you arrange to load the +individual files in the right order according to the interpackage +dependencies. + + Either way, once all the DEFPACKAGE forms have been separated from +the code that will be read in the packages they define, you can arrange +to LOAD the files containing the DEFPACKAGEs before you compile or load +any of the other files. For simple programs you can do this by hand: +simply LOAD the file or files containing the DEFPACKAGE forms, possibly +compiling them with COMPILE-FILE first. Then LOAD the files that use +those packages, again optionally compiling them first with COMPILE-FILE. +Note, however, that the packages don't exist until you LOAD the package +definitions, either the source or the files produced by COMPILE-FILE. +Thus, if you're compiling everything, you must still LOAD all the +package definitions before you can COMPILE-FILE any files to be read in +the packages. + + Doing these steps by hand will get tedious after a while. For simple +programs you can automate the steps by writing a file, load.lisp, that +contains the appropriate LOAD and COMPILE-FILE calls in the right order. +Then you can just LOAD that file. For more complex programs you'll want +to use a system definition facility to manage loading and compiling +files in the right order. (1) + + The other key rule of thumb is that each file should contain exactly +one IN-PACKAGE form, and it should be the first form in the file other +than comments. Files containing DEFPACKAGE forms should start with +(in-package "COMMON-LISP-USER"), and all other files should contain an +IN-PACKAGE of one of your packages. + + If you violate this rule and switch packages in the middle of a file, +you'll confuse human readers who don't notice the second IN-PACKAGE. +Also, many Lisp development environments, particularly Emacs-based ones +such as SLIME, look for an IN-PACKAGE to determine the package they +should use when communicating with Common Lisp. Multiple IN-PACKAGE +forms per file may confuse these tools as well. + + On the other hand, it's fine to have multiple files read in the same +package, each with an identical IN-PACKAGE form. It's just a matter of +how you like to organize your code. + + The other bit of packaging mechanics has to do with how to name +packages. Package names live in a flat namespace-package names are just +strings, and different packages must have textually distinct names. +Thus, you have to consider the possibility of conflicts between package +names. If you're using only packages you developed yourself, then you +can probably get away with using short names for your packages. But if +you're planning to use third-party libraries or to publish your code for +use by other programmers, then you need to follow a naming convention +that will minimize the possibility of name collisions between different +packages. Many Lispers these days are adopting Java-style names, like +the ones used in this chapter, consisting of a reversed Internet domain +name followed by a dot and a descriptive string. + + ---------- Footnotes ---------- + + (1) The code for the "Practical" chapters, available from this book's +Web site, uses the ASDF system definition library. ASDF stands for +Another System Definition Facility. + + +File: pcl.info, Node: 21-8, Next: Chapter 22, Prev: 21-7, Up: Chapter 21 + +Package Gotchas +=============== + +Once you're familiar with packages, you won't spend a bunch of time +thinking about them. There's just not that much to them. However, a +couple of gotchas that bite most new Lisp programmers make the package +system seem more complicated and unfriendly than it really is. + + The number-one gotcha arises most commonly when playing around at the +REPL. You'll be looking at some library that defines certain interesting +functions. You'll try to call one of the functions like this: + + CL-USER> (foo) + + and get dropped into the debugger with this error: + + attempt to call `FOO' which is an undefined function. + [Condition of type UNDEFINED-FUNCTION] + + Restarts: + 0: [TRY-AGAIN] Try calling FOO again. + 1: [RETURN-VALUE] Return a value instead of calling FOO. + 2: [USE-VALUE] Try calling a function other than FOO. + 3: [STORE-VALUE] Setf the symbol-function of FOO and call it again. + 4: [ABORT] Abort handling SLIME request. + 5: [ABORT] Abort entirely from this (lisp) process. + + Ah, of course-you forgot to use the library's package. So you quit +the debugger and try to USE-PACKAGE the library's package in order to +get access to the name FOO so you can call the function. + + CL-USER> (use-package :foolib) + + But that drops you back into the debugger with this error message: + + Using package `FOOLIB' results in name conflicts for these symbols: FOO + [Condition of type PACKAGE-ERROR] + + Restarts: + 0: [CONTINUE] Unintern the conflicting symbols from the `COMMON-LISP-USER' package. + 1: [ABORT] Abort handling SLIME request. + 2: [ABORT] Abort entirely from this (lisp) process. + + Huh? The problem is the first time you called foo, the reader read +the name foo and interned it in CL-USER before the evaluator got hold of +it and discovered that this newly interned symbol isn't the name of a +function. This new symbol then conflicts with the symbol of the same +name exported from the FOOLIB package. If you had remembered to +USE-PACKAGE FOOLIB before you tried to call foo, the reader would have +read foo as the inherited symbol and not interned a foo symbol in +CL-USER. + + However, all isn't lost, because the first restart offered by the +debugger will patch things up in just the right way: it will unintern +the foo symbol from COMMON-LISP-USER, putting the CL-USER package back +to the state it was in before you called foo, allowing the USE-PACKAGE +to proceed and allowing for the inherited foo to be available in +CL-USER. + + This kind of problem can also occur when loading and compiling files. +For instance, if you defined a package, MY-APP, for code that was going +to use functions with names from the FOOLIB package, but forgot to :use +FOOLIB, when you compile the files with an (in-package :my-app) in them, +the reader will intern new symbols in MY-APP for the names that were +supposed to be read as symbols from FOOLIB. When you try to run the +compiled code, you'll get undefined function errors. If you then try to +redefine the MY-APP package to :use FOOLIB, you'll get the conflicting +symbols error. The solution is the same: select the restart to unintern +the conflicting symbols from MY-APP. You'll then need to recompile the +code in the MY-APP package so it will refer to the inherited names. + + The next gotcha is essentially the reverse of the first gotcha. In +this case, you'd have defined a package-again, let's say it's +MY-APP-that uses another package, say, FOOLIB. Now you start writing +code in the MY-APP package. Although you used FOOLIB in order to be +able to refer to the foo function, FOOLIB may export other symbols as +well. If you use one of those exported symbols-say, bar-as the name of +a function in your own code, Lisp won't complain. Instead, the name of +your function will be the symbol exported by FOOLIB, which will clobber +the definition of bar from FOOLIB. + + This gotcha is more insidious because it doesn't cause an error-from +the evaluator's point of view it's just being asked to associate a new +function with an old name, something that's perfectly legal. It's +suspect only because the code doing the redefining was read with a +different value for *PACKAGE* than the name's package. But the +evaluator doesn't necessarily know that. However, in most Lisps you'll +get an warning about "redefining BAR, originally defined in?". You +should heed those warnings. If you clobber a definition from a library, +you can restore it by reloading the library code with LOAD. (1) + + The last package-related gotcha is, by comparison, quite trivial, but +it bites most Lisp programmers at least a few times: you define a +package that uses COMMON-LISP and maybe a few libraries. Then at the +REPL you change to that package to play around. Then you decide to quit +Lisp altogether and try to call (quit). However, quit isn't a name from +the COMMON-LISP package-it's defined by the implementation in some +implementation-specific package that happens to be used by +COMMON-LISP-USER. The solution is simple-change packages back to CL-USER +to quit. Or use the SLIME REPL shortcut quit, which will also save you +from having to remember that in certain Common Lisp implementations the +function to quit is exit, not quit. + + You're almost done with your tour of Common Lisp. In the next +chapter I'll discuss the details of the extended LOOP macro. After +that, the rest of the book is devoted to "practicals": a spam filter, a +library for parsing binary files, and various parts of a streaming MP3 +server with a Web interface. + + ---------- Footnotes ---------- + + (1) Some Common Lisp implementations, such as Allegro and SBCL, +provide a facility for "locking" the symbols in a particular package so +they can be used in defining forms such as DEFUN, DEFVAR, and DEFCLASS +only when their home package is the current package. + + +File: pcl.info, Node: Chapter 22, Next: Chapter 23, Prev: Chapter 21, Up: Top + +22. LOOP for Black Belts +======================== + +In Chapter 7 I briefly discussed the extended LOOP macro. As I +mentioned then, LOOP provides what is essentially a special-purpose +language just for writing iteration constructs. + + This might seem like a lot of bother-inventing a whole language just +for writing loops. But if you think about the ways loops are used in +programs, it actually makes a fair bit of sense. Any program of any +size at all will contain quite a number of loops. And while they won't +all be the same, they won't all be unique either; patterns will emerge, +particularly if you include the code immediately preceding and following +the loops-patterns of how things are set up for the loop, patterns in +what gets done in the loop proper, and patterns in what gets done after +the loop. The LOOP language captures these patterns so you can express +them directly. + + The LOOP macro has a lot of parts-one of the main complaints of +LOOP's detractors is that it's too complex. In this chapter, I'll +tackle LOOP head on, giving you a systematic tour of the various parts +and how they fit together. + +* Menu: + +* 22-1:: The Parts of a LOOP +* 22-2:: Iteration Control +* 22-3:: Counting Loops +* 22-4:: Looping Over Collections and Packages +* 22-5:: Equals-Then Iteration +* 22-6:: Local Variables +* 22-7:: Destructuring Variables +* 22-8:: Value Accumulation +* 22-9:: Unconditional Execution +* 22-10:: Conditional Execution +* 22-11:: Setting Up and Tearing Down +* 22-12:: Termination Tests +* 22-13:: Putting It All Together + + +File: pcl.info, Node: 22-1, Next: 22-2, Prev: Chapter 22, Up: Chapter 22 + +The Parts of a LOOP +=================== + +You can do the following in a LOOP: + + * Step variables numerically and over various data structures + * Collect, count, sum, minimize, and maximize values seen while + looping + * Execute arbitrary Lisp expressions + * Decide when to terminate the loop + * Conditionally do any of these + + Additionally, LOOP provides syntax for the following: + + * Creating local variables for use within the loop + * Specifying arbitrary Lisp expressions to run before and after the + loop proper + + The basic structure of a LOOP is a set of clauses, each of which +begins with a loop keyword.(1) How each clause is parsed by the LOOP +macro depends on the keyword. Some of the main keywords, which you saw +in Chapter 7, are for, collecting, summing, counting, do, and finally. + + ---------- Footnotes ---------- + + (1) The term loop keyword is a bit unfortunate, as loop keywords +aren't keywords in the normal sense of being symbols in the KEYWORD +package. In fact, any symbol, from any package, with the appropriate +name will do; the LOOP macro cares only about their names. Typically, +though, they're written with no package qualifier and are thus read (and +interned as necessary) in the current package. + + +File: pcl.info, Node: 22-2, Next: 22-3, Prev: 22-1, Up: Chapter 22 + +Iteration Control +================= + +Most of the so-called iteration control clauses start with the loop +keyword for, or its synonym as, (1) followed by the name of a variable. +What follows after the variable name depends on the type of for clause. + + The subclauses of a for clause can iterate over the following: + + * Ranges of numbers, up or down, by specified intervals + * The individual items of a list + * The cons cells that make up a list + * The elements of a vector, including subtypes such as strings and + bit vectors + * The pairs of a hash table + * The symbols in a package + * The results of repeatedly evaluating a given form + + A single loop can have multiple for clauses with each clause naming +its own variable. When a loop has multiple for clauses, the loop +terminates as soon as any for clause reaches its end condition. For +instance, the following loop: + + (loop + for item in list + for i from 1 to 10 + do (something)) + + will iterate at most ten times but may stop sooner if list contains +fewer than ten items. + + ---------- Footnotes ---------- + + (1) Because one of the goals of LOOP is to allow loop expressions to +be written with a quasi-English syntax, many of the keywords have +synonyms that are treated the same by LOOP but allow some freedom to +express things in slightly more idiomatic English for different +contexts. + + +File: pcl.info, Node: 22-3, Next: 22-4, Prev: 22-2, Up: Chapter 22 + +Counting Loops +============== + +Arithmetic iteration clauses control the number of times the loop body +will be executed by stepping a variable over a range of numbers, +executing the body once per step. These clauses consist of from one to +three of the following prepositional phrases after the for (or as): the +from where phrase, the to where phrase, and the by how much phrase. + + The from where phrase specifies the initial value of the clause's +variable. It consists of one of the prepositions from, downfrom, or +upfrom followed by a form, which supplies the initial value (a number). + + The to where phrase specifies a stopping point for the loop and +consists of one of the prepositions to, upto, below, downto, or above +followed by a form, which supplies the stopping point. With upto and +downto, the loop body will be terminated (without executing the body +again) when the variable passes the stopping point; with below and +above, it stops one iteration earlier.The by how much phrase consists of +the prepositions by and a form, which must evaluate to a positive +number. The variable will be stepped (up or down, as determined by the +other phrases) by this amount on each iteration or by one if it's +omitted. + + You must specify at least one of these prepositional phrases. The +defaults are to start at zero, increment the variable by one at each +iteration, and go forever or, more likely, until some other clause +terminates the loop. You can modify any or all of these defaults by +adding the appropriate prepositional phrases. The only wrinkle is that +if you want decremental stepping, there's no default from where value, +so you must specify one with either from or downfrom. So, the +following: + + (loop for i upto 10 collect i) + + collects the first eleven integers (zero to ten), but the behavior of +this: + + (loop for i downto -10 collect i) ; wrong + + is undefined. Instead, you need to write this: + + (loop for i from 0 downto -10 collect i) + + Also note that because LOOP is a macro, which runs at compile time, +it has to be able to determine the direction to step the variable based +solely on the prepositions-not the values of the forms, which may not be +known until runtime. So, the following: + + (loop for i from 10 to 20 ...) + + works fine since the default is incremental stepping. But this: + + (loop for i from 20 to 10 ...) + + won't know to count down from twenty to ten. Worse yet, it won't +give you an error-it will just not execute the loop since i is already +greater than ten. Instead, you must write this: + + (loop for i from 20 downto 10 ...) + + or this: + + (loop for i downfrom 20 to 10 ...) + + Finally, if you just want a loop that repeats a certain number of +times, you can replace a clause of the following form: + + for i from 1 to number-form + + with a repeat clause like this: + + repeat number-form + + These clauses are identical in effect except the repeat clause +doesn't create an explicit loop variable. + + +File: pcl.info, Node: 22-4, Next: 22-5, Prev: 22-3, Up: Chapter 22 + +Looping Over Collections and Packages +===================================== + +The for clauses for iterating over lists are much simpler than the +arithmetic clauses. They support only two prepositional phrases, in and +on. + + A phrase of this form: + + for var in list-form + + steps var over all the elements of the list produced by evaluating +list-form. + + (loop for i in (list 10 20 30 40) collect i) ==> (10 20 30 40) + + Occasionally this clause is supplemented with a by phrase, which +specifies a function to use to move down the list. The default is CDR +but can be any function that takes a list and returns a sublist. For +instance, you could collect every other element of a list with a loop +like this: + + (loop for i in (list 10 20 30 40) by #'cddr collect i) ==> (10 30) + + An on prepositional phrase is used to step var over the cons cells +that make up a list. + + (loop for x on (list 10 20 30) collect x) ==> ((10 20 30) (20 30) (30)) + + This phrase too can take a by preposition: + + (loop for x on (list 10 20 30 40) by #'cddr collect x) ==> ((10 20 30 40) (30 40)) + + Looping over the elements of a vector (which includes strings and bit +vectors) is similar to looping over the elements of a list except the +preposition across is used instead of in. (1) For instance: + + (loop for x across "abcd" collect x) ==> (#\a #\b #\c #\d) + + Iterating over a hash table or package is slightly more complicated +because hash tables and packages have different sets of values you might +want to iterate over-the keys or values in a hash table and the +different kinds of symbols in a package. Both kinds of iteration follow +the same pattern. The basic pattern looks like this: + + (loop for var being the things in hash-or-package ...) + + For hash tables, the possible values for things are hash-keys and +hash-values, which cause var to be bound to successive values of either +the keys or the values of the hash table. The hash-or-package form is +evaluated once to produce a value, which must be a hash table. + + To iterate over a package, things can be symbols, present-symbols, +and external-symbols, which cause var to be bound to each of the symbols +accessible in a package, each of the symbols present in a package (in +other words, interned or imported into that package), or each of the +symbols that have been exported from the package. The hash-or-package +form is evaluated to produce the name of a package, which is looked up +as if by FIND-PACKAGE or a package object. Synonyms are also available +for parts of the for clause. In place of the, you can use each; you can +use of instead of in; and you can write the things in the singular (for +example, hash-key or symbol). + + Finally, since you'll often want both the keys and the values when +iterating over a hash table, the hash table clauses support a using +subclause at the end of the hash table clause. + + (loop for k being the hash-keys in h using (hash-value v) ...) + (loop for v being the hash-values in h using (hash-key k) ...) + + Both of these loops will bind k to each key in the hash table and v +to the corresponding value. Note that the first element of the using +subclause must be in the singular form. (2) + + ---------- Footnotes ---------- + + (1) You may wonder why LOOP can't figure out whether it's looping +over a list or a vector without needing different prepositions. This is +another consequence of LOOP being a macro: the value of the list or +vector won't be known until runtime, but LOOP, as a macro, has to +generate code at compile time. And LOOP's designers wanted it to +generate extremely efficient code. To be able to generate efficient +code for looping across, say, a vector, it needs to know at compile time +that the value will be a vector at runtime-thus, the different +prepositions are needed. + + (2) Don't ask me why LOOP's authors chickened out on the +no-parentheses style for the using subclause. + + +File: pcl.info, Node: 22-5, Next: 22-6, Prev: 22-4, Up: Chapter 22 + +Equals-Then Iteration +===================== + +If none of the other for clauses supports exactly the form of variable +stepping you need, you can take complete control over stepping with an +equals-then clause. This clause is similar to the binding clauses in a +DO loop but cast in a more Algolish syntax. The template is as follows: + + (loop for var = initial-value-form [ then step-form ] ...) + + As usual, var is the name of the variable to be stepped. Its initial +value is obtained by evaluating initial-value-form once before the first +iteration. In each subsequent iteration, step-form is evaluated, and +its value becomes the new value of var. With no then part to the +clause, the initial-value-form is reevaluated on each iteration to +provide the new value. Note that this is different from a DO binding +clause with no step form. + + The step-form can refer to other loop variables, including variables +created by other for clauses later in the loop. For instance: + + (loop repeat 5 + for x = 0 then y + for y = 1 then (+ x y) + collect y) ==> (1 2 4 8 16) + + However, note that each for clause is evaluated separately in the +order it appears. So in the previous loop, on the second iteration x is +set to the value of y before y changes (in other words, 1). But y is +then set to the sum of its old value (still 1) and the new value of x. +If the order of the for clauses is reversed, the results change. + + (loop repeat 5 + for y = 1 then (+ x y) + for x = 0 then y + collect y) ==> (1 1 2 4 8) + + Often, however, you'll want the step forms for multiple variables to +be evaluated before any of the variables is given its new value (similar +to how DO steps its variables). In that case, you can join multiple for +clauses by replacing all but the first for with and. You saw this +formulation already in the LOOP version of the Fibonacci computation in +Chapter 7. Here's another variant, based on the two previous examples: + + (loop repeat 5 + for x = 0 then y + and y = 1 then (+ x y) + collect y) ==> (1 1 2 3 5) + + +File: pcl.info, Node: 22-6, Next: 22-7, Prev: 22-5, Up: Chapter 22 + +Local Variables +=============== + +While the main variables needed within a loop are usually declared +implicitly in for clauses, sometimes you'll need auxiliary variables, +which you can declare with with clauses. + + with var [ = value-form ] + + The name var becomes the name of a local variable that will cease to +exist when the loop finishes. If the with clause contains an = +value-form part, the variable will be initialized, before the first +iteration of the loop, to the value of value-form. + + Multiple with clauses can appear in a loop; each clause is evaluated +independently in the order it appears and the value is assigned before +proceeding to the next clause, allowing later variables to depend on the +value of already declared variables. Mutually independent variables can +be declared in one with clause with an and between each declaration. + + +File: pcl.info, Node: 22-7, Next: 22-8, Prev: 22-6, Up: Chapter 22 + +Destructuring Variables +======================= + +One handy feature of LOOP I haven't mentioned yet is the ability to +destructure list values assigned to loop variables. This lets you take +apart the value of lists that would otherwise be assigned to a loop +variable, similar to the way DESTRUCTURING-BIND works but a bit less +elaborate. Basically, you can replace any loop variable in a for or +with clause with a tree of symbols, and the list value that would have +been assigned to the simple variable will instead be destructured into +variables named by the symbols in the tree. A simple example looks like +this: + + CL-USER> (loop for (a b) in '((1 2) (3 4) (5 6)) + do (format t "a: ~a; b: ~a~%" a b)) + a: 1; b: 2 + a: 3; b: 4 + a: 5; b: 6 + NIL + + The tree can also include dotted lists, in which case the name after +the dot acts like a &rest parameter, being bound to a list containing +any remaining elements of the list. This is particularly handy with +for/on loops since the value is always a list. For instance, this LOOP +(which I used in Chapter 18 to emit a comma-delimited list): + + (loop for cons on list + do (format t "~a" (car cons)) + when (cdr cons) do (format t ", ")) + + could also be written like this: + + (loop for (item . rest) on list + do (format t "~a" item) + when rest do (format t ", ")) + + If you want to ignore a value in the destructured list, you can use +NIL in place of a variable name. + + (loop for (a nil) in '((1 2) (3 4) (5 6)) collect a) ==> (1 3 5) + + If the destructuring list contains more variables than there are +values in the list, the extra variables are set to NIL, making all the +variables essentially like &optional parameters. There isn't, however, +any equivalent to &key parameters. + + +File: pcl.info, Node: 22-8, Next: 22-9, Prev: 22-7, Up: Chapter 22 + +Value Accumulation +================== + +The value accumulation clauses are perhaps the most powerful part of +LOOP. While the iteration control clauses provide a concise syntax for +expressing the basic mechanics of looping, they aren't dramatically +different from the equivalent mechanisms provided by DO, DOLIST, and +DOTIMES. + + The value accumulation clauses, on the other hand, provide a concise +notation for a handful of common loop idioms having to do with +accumulating values while looping. Each accumulation clause starts with +a verb and follows this pattern: + + verb form [ into var ] + + Each time through the loop, an accumulation clause evaluates form and +saves the value in a manner determined by the verb. With an into +subclause, the value is saved into the variable named by var. The +variable is local to the loop, as if it'd been declared in a with +clause. With no into subclause, the accumulation clause instead +accumulates a default value for the whole loop expression. + + The possible verbs are collect, append, nconc, count, sum, maximize, +and minimize. Also available as synonyms are the present participle +forms: collecting, appending, nconcing, counting, summing, maximizing, +and minimizing. + + A collect clause builds up a list containing all the values of form +in the order they're seen. This is a particularly useful construct +because the code you'd have to write to collect a list in order as +efficiently as LOOP does is more painful than you'd normally write by +hand. (1) Related to collect are the verbs append and nconc. These +verbs also accumulate values into a list, but they join the values, +which must be lists, into a single list as if by the functions APPEND or +NCONC. (2) + + The remaining accumulation clauses are used to accumulate numeric +values. The verb count counts the number of times form is true, sum +collects a running total of the values of form, maximize collects the +largest value seen for form, and minimize collects the smallest. For +instance, suppose you define a variable *random* that contains a list of +random numbers. + + (defparameter *random* (loop repeat 100 collect (random 10000))) + + Then the following loop will return a list containing various summary +information about the numbers: + + (loop for i in *random* + counting (evenp i) into evens + counting (oddp i) into odds + summing i into total + maximizing i into max + minimizing i into min + finally (return (list min max total evens odds))) + + ---------- Footnotes ---------- + + (1) The trick is to keep ahold of the tail of the list and add new +cons cells by SETFing the CDR of the tail. A handwritten equivalent of +the code generated by (loop for i upto 10 collect i) would look like +this: + + (do ((list nil) (tail nil) (i 0 (1+ i))) + ((> i 10) list) + (let ((new (cons i nil))) + (if (null list) + (setf list new) + (setf (cdr tail) new)) + (setf tail new))) + + Of course you'll rarely, if ever, write code like that. You'll use +either LOOP or (if, for some reason, you don't want to use LOOP) the +standard PUSH/NREVERSE idiom for collecting values. + + (2) Recall that NCONC is the destructive version of APPEND-it's safe +to use an nconc clause only if the values you're collecting are fresh +lists that don't share any structure with other lists. For instance, +this is safe: + + (loop for i upto 3 nconc (list i i)) ==> (0 0 1 1 2 2 3 3) + + But this will get you into trouble: + + (loop for i on (list 1 2 3) nconc i) ==> undefined + + The later will most likely get into an infinite loop as the various +parts of the list produced by (list 1 2 3) are destructively modified to +point to each other. But even that's not guaranteed-the behavior is +simply undefined. + + +File: pcl.info, Node: 22-9, Next: 22-10, Prev: 22-8, Up: Chapter 22 + +Unconditional Execution +======================= + +As useful as the value accumulation constructs are, LOOP wouldn't be a +very good general-purpose iteration facility if there wasn't a way to +execute arbitrary Lisp code in the loop body. + + The simplest way to execute arbitrary code within a loop body is with +a do clause. Compared to the clauses I've described so far, with their +prepositions and subclauses, do is a model of Yodaesque simplicity. (1) +A do clause consists of the word do (or doing) followed by one or more +Lisp forms that are all evaluated when the do clause is. The do clause +ends at the closing parenthesis of the loop or the next loop keyword. + + For instance, to print the numbers from one to ten, you could write +this: + + (loop for i from 1 to 10 do (print i)) + + Another, more dramatic, form of immediate execution is a return +clause. This clause consists of the word return followed by a single +Lisp form, which is evaluated, with the resulting value immediately +returned as the value of the loop. + + You can also break out of a loop in a do clause using any of Lisp's +normal control flow operators, such as RETURN and RETURN-FROM. Note that +a return clause always returns from the immediately enclosing LOOP +expression, while a RETURN or RETURN-FROM in a do clause can return from +any enclosing expression. For instance, compare the following: + + (block outer + (loop for i from 0 return 100) ; 100 returned from LOOP + (print "This will print") + 200) ==> 200 + + to this: + + (block outer + (loop for i from 0 do (return-from outer 100)) ; 100 returned from BLOCK + (print "This won't print") + 200) ==> 100 + + The do and return clauses are collectively called the unconditional +execution clauses. + + ---------- Footnotes ---------- + + (1) "No! Try not. Do . . . or do not. There is no try." - +Yoda, The Empire Strikes Back + + +File: pcl.info, Node: 22-10, Next: 22-11, Prev: 22-9, Up: Chapter 22 + +Conditional Execution +===================== + +Because a do clause can contain arbitrary Lisp forms, you can use any +Lisp expressions you want, including control constructs such as IF and +WHEN. So, the following is one way to write a loop that prints only the +even numbers between one and ten: + + (loop for i from 1 to 10 do (when (evenp i) (print i))) + + However, sometimes you'll want conditional control at the level of +loop clauses. For instance, suppose you wanted to sum only the even +numbers between one and ten using a summing clause. You couldn't write +such a loop with a do clause because there'd be no way to "call" the sum +i in the middle of a regular Lisp form. In cases like this, you need to +use one of LOOP's own conditional expressions like this: + + (loop for i from 1 to 10 when (evenp i) sum i) ==> 30 + + LOOP provides three conditional constructs, and they all follow this +basic pattern: + + conditional test-form loop-clause + + The conditional can be if, when, or unless. The test-form is any +regular Lisp form, and loop-clause can be a value accumulation clause +(count, collect, and so on), an unconditional execution clause, or +another conditional execution clause. Multiple loop clauses can be +attached to a single conditional by joining them with and. + + As an extra bit of syntactic sugar, within the first loop clause, +after the test form, you can use the variable it to refer to the value +returned by the test form. For instance, the following loop collects +the non-NIL values found in some-hash when looking up the keys in +some-list: + + (loop for key in some-list when (gethash key some-hash) collect it) + + A conditional clause is executed each time through the loop. An if +or when clause executes its loop-clause if test-form evaluates to true. +An unless reverses the test, executing loop-clause only when test-form +is NIL. Unlike their Common Lisp namesakes, LOOP's if and when are +merely synonyms-there's no difference in their behavior. + + All three conditional clauses can also take an else branch, which is +followed by another loop clause or multiple clauses joined by and. When +conditional clauses are nested, the set of clauses connected to an inner +conditional clause can be closed with the word end. The end is optional +when not needed to disambiguate a nested conditional-the end of a +conditional clause will be inferred from the end of the loop or the +start of another clause not joined by and. + + The following rather silly loop demonstrates the various forms of +LOOP conditionals. The update-analysis function will be called each +time through the loop with the latest values of the various variables +accumulated by the clauses within the conditionals. + + (loop for i from 1 to 100 + if (evenp i) + minimize i into min-even and + maximize i into max-even and + unless (zerop (mod i 4)) + sum i into even-not-fours-total + end + and sum i into even-total + else + minimize i into min-odd and + maximize i into max-odd and + when (zerop (mod i 5)) + sum i into fives-total + end + and sum i into odd-total + do (update-analysis min-even + max-even + min-odd + max-odd + even-total + odd-total + fives-total + even-not-fours-total)) + + +File: pcl.info, Node: 22-11, Next: 22-12, Prev: 22-10, Up: Chapter 22 + +Setting Up and Tearing Down +=========================== + +One of the key insights the designers of the LOOP language had about +actual loops "in the wild" is that the loop proper is often preceded by +a bit of code to set things up and then followed by some more code that +does something with the values computed by the loop. A trivial example, +in Perl, (1) might look like this: + + my $evens_sum = 0; + my $odds_sum = 0; + foreach my $i (@list_of_numbers) { + if ($i % 2) { + $odds_sum += $i; + } else { + $evens_sum += $i; + } + } + if ($evens_sum > $odds_sum) { + print "Sum of evens greater\n"; + } else { + print "Sum of odds greater\n"; + } + + The loop proper in this code is the foreach statement. But the +foreach loop doesn't stand on its own: the code in the loop body refers +to variables declared in the two lines before the loop. (2) And the +work the loop does is all for naught without the if statement after the +loop that actually reports the results. In Common Lisp, of course, the +LOOP construct is an expression that returns a value, so there's even +more often a need to do something after the loop proper, namely, +generate the return value. + + So, said the LOOP designers, let's give a way to include the code +that's really part of the loop in the loop itself. Thus, LOOP provides +two keywords, initially and finally, that introduce code to be run +outside the loop's main body. + + After the initially or finally, these clauses consist of all the Lisp +forms up to the start of the next loop clause or the end of the loop. +All the initially forms are combined into a single prologue, which runs +once, immediately after all the local loop variables are initialized and +before the body of the loop. The finally forms are similarly combined +into a epilogue to be run after the last iteration of the loop body. +Both the prologue and epilogue code can refer to local loop variables. + + The prologue is always run, even if the loop body iterates zero +times. The loop can return without running the epilogue if any of the +following happens: + + * A return clause executes. + * RETURN , RETURN-FROM, or another transfer of control construct is + called from within a Lisp form within the body. (3) + * The loop is terminated by an always, never, or thereis clause, as + I'll discuss in the next section. + + Within the epilogue code, RETURN or RETURN-FROM can be used to +explicitly provide a return value for the loop. Such an explicit return +value will take precedence over any value that might otherwise be +provided by an accumulation or termination test clause. + + To allow RETURN-FROM to be used to return from a specific loop +(useful when nesting LOOP expressions), you can name a LOOP with the +loop keyword named. If a named clause appears in a loop, it must be the +first clause. For a simple example, assume lists is a list of lists and +you want to find an item that matches some criteria in one of those +nested lists. You could find it with a pair of nested loops like this: + + (loop named outer for list in lists do + (loop for item in list do + (if (what-i-am-looking-for-p item) + (return-from outer item)))) + + ---------- Footnotes ---------- + + (1) I'm not picking on Perl here-this example would look pretty much +the same in any language that bases its syntax on C's. + + (2) Perl would let you get away with not declaring those variables if +your program didn't use strict. But you should always use strict in +Perl. The equivalent code in Python, Java, or C would always require +the variables to be declared. + + (3) You can cause a loop to finish normally, running the epilogue, +from Lisp code executed as part of the loop body with the local macro +LOOP-FINISH. + + +File: pcl.info, Node: 22-12, Next: 22-13, Prev: 22-11, Up: Chapter 22 + +Termination Tests +================= + +While the for and repeat clauses provide the basic infrastructure for +controlling the number of iterations, sometimes you'll need to break out +of a loop early. You've already seen how a return clause or a RETURN or +RETURN-FROM within a do clause can immediately terminate the loop; but +just as there are common patterns for accumulating values, there are +also common patterns for deciding when it's time to bail on a loop. +These patterns are supported in LOOP by the termination clauses, while, +until, always, never, and thereis. They all follow the same pattern. + + loop-keyword test-form + + All five evaluate test-form each time through the iteration and +decide, based on the resulting value, whether to terminate the loop. +They differ in what happens after they terminate the loop-if they do-and +how they decide. + + The loop keywords while and until introduce the "mild" termination +clauses. When they decide to terminate the loop, control passes to the +epilogue, skipping the rest of the loop body. The epilogue can then +return a value or do whatever it wants to finish the loop. A while +clause terminates the loop the first time the test form is false; until, +conversely, stops it the first time the test form is true. + + Another form of mild termination is provided by the LOOP-FINISH +macro. This is a regular Lisp form, not a loop clause, so it can be +used anywhere within the Lisp forms of a do clause. It also causes an +immediate jump to the loop epilogue. It can be useful when the decision +to break out of the loop can't be easily condensed into a single form +that can be used with a while or until clause. + + The other three clauses-always, never, and thereis-terminate the loop +with extreme prejudice; they immediately return from the loop, skipping +not only any subsequent loop clauses but also the epilogue. They also +provide a default value for the loop even when they don't cause the loop +to terminate. However, if the loop is not terminated by one of these +termination tests, the epilogue is run and can return a value other than +the default provided by the termination clauses. + + Because these clauses provide their own return values, they can't be +combined with accumulation clauses unless the accumulation clause has an +into subclause. The compiler (or interpreter) should signal an error at +compile time if they are.The always and never clauses return only +boolean values, so they're most useful when you need to use a loop +expression as part of a predicate. You can use always to check that the +test form is true on every iteration of the loop. Conversely, never +tests that the test form evaluates to NIL on every iteration. If the +test form fails (returning NIL in an always clause or non-NIL in a never +clause), the loop is immediately terminated, returning NIL. If the loop +runs to completion, the default value of T is provided. + + For instance, if you want to test that all the numbers in a list, +numbers, are even, you can write this: + + (if (loop for n in numbers always (evenp n)) + (print "All numbers even.")) + + Equivalently you could write the following: + + (if (loop for n in numbers never (oddp n)) + (print "All numbers even.")) + + A thereis clause is used to test whether the test form is ever true. +As soon as the test form returns a non-NIL value, the loop is +terminated, returning that value. If the loop runs to completion, the +thereis clause provides a default return value of NIL. + + (loop for char across "abc123" thereis (digit-char-p char)) ==> 1 + + (loop for char across "abcdef" thereis (digit-char-p char)) ==> NIL + + +File: pcl.info, Node: 22-13, Next: Chapter 23, Prev: 22-12, Up: Chapter 22 + +Putting It All Together +======================= + +Now you've seen all the main features of the LOOP facility. You can +combine any of the clauses I've discussed as long as you abide by the +following rules: + + * The named clause, if any, must be the first clause. + * After the named clause come all the initially, with, for, and + repeat clauses. + * Then comes the body clauses: conditional and unconditional + execution, accumulation, and termination test. (1) + * End with any finally clauses. + + The LOOP macro will expand into code that performs the following +actions: + + * Initializes all local loop variables as declared with with or for + clauses as well as those implicitly created by accumulation + clauses. The initial value forms are evaluated in the order the + clauses appear in the loop. + * Execute the forms provided by any initially clauses-the prologue-in + the order they appear in the loop. + * Iterate, executing the body of the loop as described in the next + paragraph. + * Execute the forms provided by any finally clauses-the epilogue-in + the order they appear in the loop. + + While the loop is iterating, the body is executed by first stepping +any iteration control variables and then executing any conditional or +unconditional execution, accumulation, or termination test clauses in +the order they appear in the loop code. If any of the clauses in the +loop body terminate the loop, the rest of the body is skipped and the +loop returns, possibly after running the epilogue. + + And that's pretty much all there is to it. (2) You'll use LOOP +fairly often in the code later in this book, so it's worth having some +knowledge of it. Beyond that, it's up to you how much you use it. + + And with that, you're ready to dive into the practical chapters that +make up the rest of the book-up first, writing a spam filter. + + ---------- Footnotes ---------- + + (1) Some Common Lisp implementations will let you get away with +mixing body clauses and for clauses, but that's strictly undefined, and +some implementations will reject such loops. + + (2) The one aspect of LOOP I haven't touched on at all is the syntax +for declaring the types of loop variables. Of course, I haven't +discussed type declarations outside of LOOP either. I'll cover the +general topic a bit in Chapter 32. For information on how they work +with LOOP, consult your favorite Common Lisp reference. + + +File: pcl.info, Node: Chapter 23, Next: Chapter 24, Prev: Chapter 22, Up: Top + +23. Practical: A Spam Filter +============================ + +In 2002 Paul Graham, having some time on his hands after selling Viaweb +to Yahoo, wrote the essay "A Plan for Spam" (1) that launched a minor +revolution in spam-filtering technology. Prior to Graham's article, +most spam filters were written in terms of handcrafted rules: if a +message has XXX in the subject, it's probably a spam; if a message has a +more than three or more words in a row in ALL CAPITAL LETTERS, it's +probably a spam. Graham spent several months trying to write such a +rule-based filter before realizing it was fundamentally a soul-sucking +task. + + To recognize individual spam features you have to try to get into + the mind of the spammer, and frankly I want to spend as little time + inside the minds of spammers as possible. + + To avoid having to think like a spammer, Graham decided to try +distinguishing spam from nonspam, a.k.a. ham, based on statistics +gathered about which words occur in which kinds of e-mails. The filter +would keep track of how often specific words appear in both spam and ham +messages and then use the frequencies associated with the words in a new +message to compute a probability that it was either spam or ham. He +called his approach Bayesian filtering after the statistical technique +that he used to combine the individual word frequencies into an overall +probability. (2) + +* Menu: + +* 23-1:: The Heart of a Spam Filter +* 23-2:: Training the Filter +* 23-3:: Per-Word Statistics +* 23-4:: Combining Probabilities +* 23-5:: Inverse Chi Square +* 23-6:: Training the Filter +* 23-7:: Testing the Filter +* 23-8:: A Couple of Utility Functions +* 23-9:: Analyzing the Results +* 23-10:: What's Next + + ---------- Footnotes ---------- + + (1) Available at http://www.paulgraham.com/spam.html and also in +Hackers & Painters: Big Ideas from the Computer Age (O'Reilly, 2004) + + (2) There has since been some disagreement over whether the technique +Graham described was actually "Bayesian." However, the name has stuck +and is well on its way to becoming a synonym for "statistical" when +talking about spam filters. + + +File: pcl.info, Node: 23-1, Next: 23-2, Prev: Chapter 23, Up: Chapter 23 + +The Heart of a Spam Filter +========================== + +In this chapter, you'll implement the core of a spam-filtering engine. +You won't write a soup-to-nuts spam-filtering application; rather, +you'll focus on the functions for classifying new messages and training +the filter. + + This application is going to be large enough that it's worth defining +a new package to avoid name conflicts. For instance, in the source code +you can download from this book's Web site, I use the package name +COM.GIGAMONKEYS.SPAM, defining a package that uses both the standard +COMMON-LISP package and the COM.GIGAMONKEYS.PATHNAMES package from +Chapter 15, like this: + + (defpackage :com.gigamonkeys.spam + (:use :common-lisp :com.gigamonkeys.pathnames)) + + Any file containing code for this application should start with this +line: + + (in-package :com.gigamonkeys.spam) + + You can use the same package name or replace com.gigamonkeys with +some domain you control. (1) + + You can also type this same form at the REPL to switch to this +package to test the functions you write. In SLIME this will change the +prompt from CL-USER> to SPAM> like this: + + CL-USER> (in-package :com.gigamonkeys.spam) + #<The COM.GIGAMONKEYS.SPAM package> + SPAM> + + Once you have a package defined, you can start on the actual code. +The main function you'll need to implement has a simple job-take the +text of a message as an argument and classify the message as spam, ham, +or unsure. You can easily implement this basic function by defining it +in terms of other functions that you'll write in a moment. + + (defun classify (text) + (classification (score (extract-features text)))) + + Reading from the inside out, the first step in classifying a message +is to extract features to pass to the score function. In score you'll +compute a value that can then be translated into one of three +classifications-spam, ham, or unsure-by the function classification. Of +the three functions, classification is the simplest. You can assume +score will return a value near 1 if the message is a spam, near 0 if +it's a ham, and near .5 if it's unclear. + + Thus, you can implement classification like this: + + (defparameter *max-ham-score* .4) + (defparameter *min-spam-score* .6) + + (defun classification (score) + (cond + ((<= score *max-ham-score*) 'ham) + ((>= score *min-spam-score*) 'spam) + (t 'unsure))) + + The extract-features function is almost as straightforward, though it +requires a bit more code. For the moment, the features you'll extract +will be the words appearing in the text. For each word, you need to +keep track of the number of times it has been seen in a spam and the +number of times it has been seen in a ham. A convenient way to keep +those pieces of data together with the word itself is to define a class, +word-feature, with three slots. + + (defclass word-feature () + ((word + :initarg :word + :accessor word + :initform (error "Must supply :word") + :documentation "The word this feature represents.") + (spam-count + :initarg :spam-count + :accessor spam-count + :initform 0 + :documentation "Number of spams we have seen this feature in.") + (ham-count + :initarg :ham-count + :accessor ham-count + :initform 0 + :documentation "Number of hams we have seen this feature in."))) + + You'll keep the database of features in a hash table so you can +easily find the object representing a given feature. You can define a +special variable, *feature-database*, to hold a reference to this hash +table. + + (defvar *feature-database* (make-hash-table :test #'equal)) + + You should use DEFVAR rather than DEFPARAMETER because you don't want +*feature-database* to be reset if you happen to reload the file +containing this definition during development-you might have data stored +in *feature-database* that you don't want to lose. Of course, that +means if you do want to clear out the feature database, you can't just +reevaluate the DEFVAR form. So you should define a function +clear-database. + + (defun clear-database () + (setf *feature-database* (make-hash-table :test #'equal))) + + To find the features present in a given message, the code will need +to extract the individual words and then look up the corresponding +word-feature object in *feature-database*. If *feature-database* +contains no such feature, it'll need to create a new word-feature to +represent the word. You can encapsulate that bit of logic in a +function, intern-feature, that takes a word and returns the appropriate +feature, creating it if necessary. + + (defun intern-feature (word) + (or (gethash word *feature-database*) + (setf (gethash word *feature-database*) + (make-instance 'word-feature :word word)))) + + You can extract the individual words from the message text using a +regular expression. For example, using the Common Lisp Portable +Perl-Compatible Regular Expression (CL-PPCRE) library written by Edi +Weitz, you can write extract-words like this: (2) + + (defun extract-words (text) + (delete-duplicates + (cl-ppcre:all-matches-as-strings "[a-zA-Z]{3,}" text) + :test #'string=)) + + Now all that remains to implement extract-features is to put +extract-features and intern-feature together. Since extract-words +returns a list of strings and you want a list with each string +translated to the corresponding word-feature, this is a perfect time to +use MAPCAR. + + (defun extract-features (text) + (mapcar #'intern-feature (extract-words text))) + + You can test these functions at the REPL like this: + + SPAM> (extract-words "foo bar baz") + ("foo" "bar" "baz") + + And you can make sure the DELETE-DUPLICATES is working like this: + + SPAM> (extract-words "foo bar baz foo bar") + ("baz" "foo" "bar") + + You can also test extract-features. + + SPAM> (extract-features "foo bar baz foo bar") + (#<WORD-FEATURE #x71ef28da> #<WORD-FEATURE #x71e3809a> + #<WORD-FEATURE #x71ef28aa>) + + However, as you can see, the default method for printing arbitrary +objects isn't very informative. As you work on this program, it'll be +useful to be able to print word-feature objects in a less opaque way. +Luckily, as I mentioned in Chapter 17, the printing of all objects is +implemented in terms of a generic function PRINT-OBJECT, so to change +the way word-feature objects are printed, you just need to define a +method on PRINT-OBJECT that specializes on word-feature. To make +implementing such methods easier, Common Lisp provides the macro +PRINT-UNREADABLE-OBJECT. (3) + + The basic form of PRINT-UNREADABLE-OBJECT is as follows: + + (print-unreadable-object (object stream-variable &key type identity) + body-form*) + + The object argument is an expression that evaluates to the object to +be printed. Within the body of PRINT-UNREADABLE-OBJECT, stream-variable +is bound to a stream to which you can print anything you want. Whatever +you print to that stream will be output by PRINT-UNREADABLE-OBJECT and +enclosed in the standard syntax for unreadable objects, #<>. (4) + + PRINT-UNREADABLE-OBJECT also lets you include the type of the object +and an indication of the object's identity via the keyword parameters +type and identity. If they're non-NIL, the output will start with the +name of the object's class and end with an indication of the object's +identity similar to what's printed by the default PRINT-OBJECT method +for STANDARD-OBJECTs. For word-feature, you probably want to define a +PRINT-OBJECT method that includes the type but not the identity along +with the values of the word, ham-count, and spam-count slots. Such a +method would look like this: + + (defmethod print-object ((object word-feature) stream) + (print-unreadable-object (object stream :type t) + (with-slots (word ham-count spam-count) object + (format stream "~s :hams ~d :spams ~d" word ham-count spam-count)))) + + Now when you test extract-features at the REPL, you can see more +clearly what features are being extracted. + + SPAM> (extract-features "foo bar baz foo bar") + (#<WORD-FEATURE "baz" :hams 0 :spams 0> + #<WORD-FEATURE "foo" :hams 0 :spams 0> + #<WORD-FEATURE "bar" :hams 0 :spams 0>) + + ---------- Footnotes ---------- + + (1) It would, however, be poor form to distribute a version of this +application using a package starting with com.gigamonkeys since you +don't control that domain. + + (2) A version of CL-PPCRE is included with the book's source code +available from the book's Web site. Or you can download it from Weitz's +site at http://www.weitz.de/cl-ppcre/. + + (3) The main reason to use PRINT-UNREADABLE-OBJECT is that it takes +care of signaling the appropriate error if someone tries to print your +object readably, such as with the ~S FORMAT directive. + + (4) PRINT-UNREADABLE-OBJECT also signals an error if it's used when +the printer control variable *PRINT-READABLY* is true. Thus, a +PRINT-OBJECT method consisting solely of a PRINT-UNREADABLE-OBJECT form +will correctly implement the PRINT-OBJECT contract with regard to +*PRINT-READABLY*. + + +File: pcl.info, Node: 23-2, Next: 23-3, Prev: 23-1, Up: Chapter 23 + +Training the Filter +=================== + +Now that you have a way to keep track of individual features, you're +almost ready to implement score. But first you need to write the code +you'll use to train the spam filter so score will have some data to use. +You'll define a function, train, that takes some text and a symbol +indicating what kind of message it is-ham or spam-and that increments +either the ham count or the spam count of all the features present in +the text as well as a global count of hams or spams processed. Again, +you can take a top-down approach and implement it in terms of other +functions that don't yet exist. + + (defun train (text type) + (dolist (feature (extract-features text)) + (increment-count feature type)) + (increment-total-count type)) + + You've already written extract-features, so next up is +increment-count, which takes a word-feature and a message type and +increments the appropriate slot of the feature. Since there's no reason +to think that the logic of incrementing these counts is going to change +for different kinds of objects, you can write this as a regular +function. (1) Because you defined both ham-count and spam-count with an +:accessor option, you can use INCF and the accessor functions created by +DEFCLASS to increment the appropriate slot. + + (defun increment-count (feature type) + (ecase type + (ham (incf (ham-count feature))) + (spam (incf (spam-count feature))))) + + The ECASE construct is a variant of CASE, both of which are similar +to case statements in Algol-derived languages (renamed switch in C and +its progeny). They both evaluate their first argument-the key form-and +then find the clause whose first element-the key-is the same value +according to EQL. In this case, that means the variable type is +evaluated, yielding whatever value was passed as the second argument to +increment-count. + + The keys aren't evaluated. In other words, the value of type will be +compared to the literal objects read by the Lisp reader as part of the +ECASE form. In this function, that means the keys are the symbols ham +and spam, not the values of any variables named ham and spam. So, if +increment-count is called like this: + + (increment-count some-feature 'ham) + + the value of type will be the symbol ham, and the first branch of the +ECASE will be evaluated and the feature's ham count incremented. On the +other hand, if it's called like this: + + (increment-count some-feature 'spam) + + then the second branch will run, incrementing the spam count. Note +that the symbols ham and spam are quoted when calling increment-count +since otherwise they'd be evaluated as the names of variables. But +they're not quoted when they appear in ECASE since ECASE doesn't +evaluate the keys. (2) + + The E in ECASE stands for "exhaustive" or "error," meaning ECASE +should signal an error if the key value is anything other than one of +the keys listed. The regular CASE is looser, returning NIL if no +matching clause is found. + + To implement increment-total-count, you need to decide where to store +the counts; for the moment, two more special variables, *total-spams* +and *total-hams*, will do fine. + + (defvar *total-spams* 0) + (defvar *total-hams* 0) + + (defun increment-total-count (type) + (ecase type + (ham (incf *total-hams*)) + (spam (incf *total-spams*)))) + + You should use DEFVAR to define these two variables for the same +reason you used it with *feature-database*-they'll hold data built up +while you run the program that you don't necessarily want to throw away +just because you happen to reload your code during development. But +you'll want to reset those variables if you ever reset +*feature-database*, so you should add a few lines to clear-database as +shown here: + + (defun clear-database () + (setf + *feature-database* (make-hash-table :test #'equal) + *total-spams* 0 + *total-hams* 0)) + + ---------- Footnotes ---------- + + (1) If you decide later that you do need to have different versions +of increment-feature for different classes, you can redefine +increment-count as a generic function and this function as a method +specialized on word-feature. + + (2) Technically, the key in each clause of a CASE or ECASE is +interpreted as a list designator, an object that designates a list of +objects. A single nonlist object, treated as a list designator, +designates a list containing just that one object, while a list +designates itself. Thus, each clause can have multiple keys; CASE and +ECASE will select the clause whose list of keys contains the value of +the key form. For example, if you wanted to make good a synonym for ham +and bad a synonym for spam, you could write increment-count like this: + + (defun increment-count (feature type) + (ecase type + ((ham good) (incf (ham-count feature))) + ((spam bad) (incf (spam-count feature))))) + + +File: pcl.info, Node: 23-3, Next: 23-4, Prev: 23-2, Up: Chapter 23 + +Per-Word Statistics +=================== + +The heart of a statistical spam filter is, of course, the functions that +compute statistics-based probabilities. The mathematical nuances (1) of +why exactly these computations work are beyond the scope of this +book-interested readers may want to refer to several papers by Gary +Robinson. (2) I'll focus rather on how they're implemented. + + The starting point for the statistical computations is the set of +measured values-the frequencies stored in *feature-database*, +*total-spams*, and *total-hams*. Assuming that the set of messages +trained on is statistically representative, you can treat the observed +frequencies as probabilities of the same features showing up in hams and +spams in future messages. + + The basic plan is to classify a message by extracting the features it +contains, computing the individual probability that a given message +containing the feature is a spam, and then combining all the individual +probabilities into a total score for the message. Messages with many +"spammy" features and few "hammy" features will receive a score near 1, +and messages with many hammy features and few spammy features will score +near 0. + + The first statistical function you need is one that computes the +basic probability that a message containing a given feature is a spam. +By one point of view, the probability that a given message containing +the feature is a spam is the ratio of spam messages containing the +feature to all messages containing the feature. Thus, you could compute +it this way: + + (defun spam-probability (feature) + (with-slots (spam-count ham-count) feature + (/ spam-count (+ spam-count ham-count)))) + + The problem with the value computed by this function is that it's +strongly affected by the overall probability that any message will be a +spam or a ham. For instance, suppose you get nine times as much ham as +spam in general. A completely neutral feature will then appear in one +spam for every nine hams, giving you a spam probability of 1/10 +according to this function. + + But you're more interested in the probability that a given feature +will appear in a spam message, independent of the overall probability of +getting a spam or ham. Thus, you need to divide the spam count by the +total number of spams trained on and the ham count by the total number +of hams. To avoid division-by-zero errors, if either of *total-spams* +or *total-hams* is zero, you should treat the corresponding frequency as +zero. (Obviously, if the total number of either spams or hams is zero, +then the corresponding per-feature count will also be zero, so you can +treat the resulting frequency as zero without ill effect.) + + (defun spam-probability (feature) + (with-slots (spam-count ham-count) feature + (let ((spam-frequency (/ spam-count (max 1 *total-spams*))) + (ham-frequency (/ ham-count (max 1 *total-hams*)))) + (/ spam-frequency (+ spam-frequency ham-frequency))))) + + This version suffers from another problem-it doesn't take into +account the number of messages analyzed to arrive at the per-word +probabilities. Suppose you've trained on 2,000 messages, half spam and +half ham. Now consider two features that have appeared only in spams. +One has appeared in all 1,000 spams, while the other appeared only once. +According to the current definition of spam-probability, the appearance +of either feature predicts that a message is spam with equal +probability, namely, 1. + + However, it's still quite possible that the feature that has appeared +only once is actually a neutral feature-it's obviously rare in either +spams or hams, appearing only once in 2,000 messages. If you trained on +another 2,000 messages, it might very well appear one more time, this +time in a ham, making it suddenly a neutral feature with a spam +probability of .5. + + So it seems you might like to compute a probability that somehow +factors in the number of data points that go into each feature's +probability. In his papers, Robinson suggested a function based on the +Bayesian notion of incorporating observed data into prior knowledge or +assumptions. Basically, you calculate a new probability by starting +with an assumed prior probability and a weight to give that assumed +probability before adding new information. Robinson's function is this: + + (defun bayesian-spam-probability (feature &optional + (assumed-probability 1/2) + (weight 1)) + (let ((basic-probability (spam-probability feature)) + (data-points (+ (spam-count feature) (ham-count feature)))) + (/ (+ (* weight assumed-probability) + (* data-points basic-probability)) + (+ weight data-points)))) + + Robinson suggests values of 1/2 for assumed-probability and 1 for +weight. Using those values, a feature that has appeared in one spam and +no hams has a bayesian-spam-probability of 0.75, a feature that has +appeared in 10 spams and no hams has a bayesian-spam-probability of +approximately 0.955, and one that has matched in 1,000 spams and no hams +has a spam probability of approximately 0.9995. + + ---------- Footnotes ---------- + + (1) Speaking of mathematical nuances, hard-core statisticians may be +offended by the sometimes loose use of the word probability in this +chapter. However, since even the pros, who are divided between the +Bayesians and the frequentists, can't agree on what a probability is, +I'm not going to worry about it. This is a book about programming, not +statistics. + + (2) Robinson's articles that directly informed this chapter are "A +Statistical Approach to the Spam Problem" (published in the Linux +Journal and available at http://www.linuxjournal.com/ +article.php?sid=6467 and in a shorter form on Robinson's blog at +http://radio.weblogs.com/ 0101454/stories/2002/09/16/spamDetection.html) +and "Why Chi? Motivations for the Use of Fisher's Inverse Chi-Square +Procedure in Spam Classification" (available at +http://garyrob.blogs.com/ whychi93.pdf). Another article that may be +useful is "Handling Redundancy in Email Token Probabilities" (available +at http://garyrob.blogs.com//handlingtokenredundancy94.pdf). The +archived mailing lists of the SpamBayes project +(http://spambayes.sourceforge.net/) also contain a lot of useful +information about different algorithms and approaches to testing spam +filters. + + +File: pcl.info, Node: 23-4, Next: 23-5, Prev: 23-3, Up: Chapter 23 + +Combining Probabilities +======================= + +Now that you can compute the bayesian-spam-probability of each +individual feature you find in a message, the last step in implementing +the score function is to find a way to combine a bunch of individual +probabilities into a single value between 0 and 1. + + If the individual feature probabilities were independent, then it'd +be mathematically sound to multiply them together to get a combined +probability. But it's unlikely they actually are independent-certain +features are likely to appear together, while others never do. (1) + + Robinson proposed using a method for combining probabilities invented +by the statistician R. A. Fisher. Without going into the details of +exactly why his technique works, it's this: First you combine the +probabilities by multiplying them together. This gives you a number +nearer to 0 the more low probabilities there were in the original set. +Then take the log of that number and multiply by -2. Fisher showed in +1950 that if the individual probabilities were independent and drawn +from a uniform distribution between 0 and 1, then the resulting value +would be on a chi-square distribution. This value and twice the number +of probabilities can be fed into an inverse chi-square function, and +it'll return the probability that reflects the likelihood of obtaining a +value that large or larger by combining the same number of randomly +selected probabilities. When the inverse chi-square function returns a +low probability, it means there was a disproportionate number of low +probabilities (either a lot of relatively low probabilities or a few +very low probabilities) in the individual probabilities. + + To use this probability in determining whether a given message is a +spam, you start with a null hypothesis, a straw man you hope to knock +down. The null hypothesis is that the message being classified is in +fact just a random collection of features. If it were, then the +individual probabilities-the likelihood that each feature would appear +in a spam-would also be random. That is, a random selection of features +would usually contain some features with a high probability of appearing +in spam and other features with a low probability of appearing in spam. +If you were to combine these randomly selected probabilities according +to Fisher's method, you should get a middling combined value, which the +inverse chi-square function will tell you is quite likely to arise just +by chance, as, in fact, it would have. But if the inverse chi-square +function returns a very low probability, it means it's unlikely the +probabilities that went into the combined value were selected at random; +there were too many low probabilities for that to be likely. So you can +reject the null hypothesis and instead adopt the alternative hypothesis +that the features involved were drawn from a biased sample-one with few +high spam probability features and many low spam probability features. +In other words, it must be a ham message. + + However, the Fisher method isn't symmetrical since the inverse +chi-square function returns the probability that a given number of +randomly selected probabilities would combine to a value as large or +larger than the one you got by combining the actual probabilities. This +asymmetry works to your advantage because when you reject the null +hypothesis, you know what the more likely hypothesis is. When you +combine the individual spam probabilities via the Fisher method, and it +tells you there's a high probability that the null hypothesis is +wrong-that the message isn't a random collection of words-then it means +it's likely the message is a ham. The number returned is, if not +literally the probability that the message is a ham, at least a good +measure of its "hamminess." Conversely, the Fisher combination of the +individual ham probabilities gives you a measure of the message's +"spamminess." + + To get a final score, you need to combine those two measures into a +single number that gives you a combined hamminess-spamminess score +ranging from 0 to 1. The method recommended by Robinson is to add half +the difference between the hamminess and spamminess scores to 1/2, in +other words, to average the spamminess and 1 minus the hamminess. This +has the nice effect that when the two scores agree (high spamminess and +low hamminess, or vice versa) you'll end up with a strong indicator near +either 0 or 1. But when the spamminess and hamminess scores are both +high or both low, then you'll end up with a final value near 1/2, which +you can treat as an "uncertain" classification. + + The score function that implements this scheme looks like this: + + (defun score (features) + (let ((spam-probs ()) (ham-probs ()) (number-of-probs 0)) + (dolist (feature features) + (unless (untrained-p feature) + (let ((spam-prob (float (bayesian-spam-probability feature) 0.0d0))) + (push spam-prob spam-probs) + (push (- 1.0d0 spam-prob) ham-probs) + (incf number-of-probs)))) + (let ((h (- 1 (fisher spam-probs number-of-probs))) + (s (- 1 (fisher ham-probs number-of-probs)))) + (/ (+ (- 1 h) s) 2.0d0)))) + + You take a list of features and loop over them, building up two lists +of probabilities, one listing the probabilities that a message +containing each feature is a spam and the other that a message +containing each feature is a ham. As an optimization, you can also +count the number of probabilities while looping over them and pass the +count to fisher to avoid having to count them again in fisher itself. +The value returned by fisher will be low if the individual probabilities +contained too many low probabilities to have come from random text. +Thus, a low fisher score for the spam probabilities means there were +many hammy features; subtracting that score from 1 gives you a +probability that the message is a ham. Conversely, subtracting the +fisher score for the ham probabilities gives you the probability that +the message was a spam. Combining those two probabilities gives you an +overall spamminess score between 0 and 1. + + Within the loop, you can use the function untrained-p to skip +features extracted from the message that were never seen during +training. These features will have spam counts and ham counts of zero. +The untrained-p function is trivial. + + (defun untrained-p (feature) + (with-slots (spam-count ham-count) feature + (and (zerop spam-count) (zerop ham-count)))) + + The only other new function is fisher itself. Assuming you already +had an inverse-chi-square function, fisher is conceptually simple. + + (defun fisher (probs number-of-probs) + "The Fisher computation described by Robinson." + (inverse-chi-square + (* -2 (log (reduce #'* probs))) + (* 2 number-of-probs))) + + Unfortunately, there's a small problem with this straightforward +implementation. While using REDUCE is a concise and idiomatic way of +multiplying a list of numbers, in this particular application there's a +danger the product will be too small a number to be represented as a +floating-point number. In that case, the result will underflow to zero. +And if the product of the probabilities underflows, all bets are off +because taking the LOG of zero will either signal an error or, in some +implementation, result in a special negative-infinity value, which will +render all subsequent calculations essentially meaningless. This is +particularly unfortunate in this function because the Fisher method is +most sensitive when the input probabilities are low-near zero-and +therefore in the most danger of causing the multiplication to underflow. + + Luckily, you can use a bit of high-school math to avoid this problem. +Recall that the log of a product is the same as the sum of the logs of +the factors. So instead of multiplying all the probabilities and then +taking the log, you can sum the logs of each probability. And since +REDUCE takes a :key keyword parameter, you can use it to perform the +whole calculation. Instead of this: + + (log (reduce #'* probs)) + + write this: + + (reduce #'+ probs :key #'log) + + ---------- Footnotes ---------- + + (1) Techniques that combine nonindependent probabilities as though +they were, in fact, independent, are called naive Bayesian. Graham's +original proposal was essentially a naive Bayesian classifier with some +"empirically derived" constant factors thrown in. + + +File: pcl.info, Node: 23-5, Next: 23-6, Prev: 23-4, Up: Chapter 23 + +Inverse Chi Square +================== + +The implementation of inverse-chi-square in this section is a fairly +straightforward translation of a version written in Python by Robinson. +The exact mathematical meaning of this function is beyond the scope of +this book, but you can get an intuitive sense of what it does by +thinking about how the values you pass to fisher will affect the result: +the more low probabilities you pass to fisher, the smaller the product +of the probabilities will be. The log of a small product will be a +negative number with a large absolute value, which is then multiplied by +-2, making it an even larger positive number. Thus, the more low +probabilities were passed to fisher, the larger the value it'll pass to +inverse-chi-square. Of course, the number of probabilities involved +also affects the value passed to inverse-chi-square. Since +probabilities are, by definition, less than or equal to 1, the more +probabilities that go into a product, the smaller it'll be and the +larger the value passed to inverse-chi-square. Thus, inverse-chi-square +should return a low probability when the Fisher combined value is +abnormally large for the number of probabilities that went into it. The +following function does exactly that: + + (defun inverse-chi-square (value degrees-of-freedom) + (assert (evenp degrees-of-freedom)) + (min + (loop with m = (/ value 2) + for i below (/ degrees-of-freedom 2) + for prob = (exp (- m)) then (* prob (/ m i)) + summing prob) + 1.0)) + + Recall from Chapter 10 that EXP raises e to the argument given. +Thus, the larger value is, the smaller the initial value of prob will +be. But that initial value will then be adjusted upward slightly for +each degree of freedom as long as m is greater than the number of +degrees of freedom. Since the value returned by inverse-chi-square is +supposed to be another probability, it's important to clamp the value +returned with MIN since rounding errors in the multiplication and +exponentiation may cause the LOOP to return a sum just a shade over 1. + + +File: pcl.info, Node: 23-6, Next: 23-7, Prev: 23-5, Up: Chapter 23 + +Training the Filter +=================== + +Since you wrote classify and train to take a string argument, you can +test them easily at the REPL. If you haven't yet, you should switch to +the package in which you've been writing this code by evaluating an +IN-PACKAGE form at the REPL or using the SLIME shortcut change-package. +To use the SLIME shortcut, type a comma at the REPL and then type the +name at the prompt. Pressing Tab while typing the package name will +autocomplete based on the packages your Lisp knows about. Now you can +invoke any of the functions that are part of the spam application. You +should first make sure the database is empty. + + SPAM> (clear-database) + + Now you can train the filter with some text. + + SPAM> (train "Make money fast" 'spam) + + And then see what the classifier thinks. + + SPAM> (classify "Make money fast") + SPAM + SPAM> (classify "Want to go to the movies?") + UNSURE + + While ultimately all you care about is the classification, it'd be +nice to be able to see the raw score too. The easiest way to get both +values without disturbing any other code is to change classification to +return multiple values. + + (defun classification (score) + (values + (cond + ((<= score *max-ham-score*) 'ham) + ((>= score *min-spam-score*) 'spam) + (t 'unsure)) + score)) + + You can make this change and then recompile just this one function. +Because classify returns whatever classification returns, it'll also now +return two values. But since the primary return value is the same, +callers of either function who expect only one value won't be affected. +Now when you test classify, you can see exactly what score went into the +classification. + + SPAM> (classify "Make money fast") + SPAM + 0.863677101854273D0 + SPAM> (classify "Want to go to the movies?") + UNSURE + 0.5D0 + + And now you can see what happens if you train the filter with some +more ham text. + + SPAM> (train "Do you have any money for the movies?" 'ham) + 1 + SPAM> (classify "Make money fast") + SPAM + 0.7685351219857626D0 + + It's still spam but a bit less certain since money was seen in ham +text. + + SPAM> (classify "Want to go to the movies?") + HAM + 0.17482223132078922D0 + + And now this is clearly recognizable ham thanks to the presence of +the word movies, now a hammy feature. + + However, you don't really want to train the filter by hand. What +you'd really like is an easy way to point it at a bunch of files and +train it on them. And if you want to test how well the filter actually +works, you'd like to then use it to classify another set of files of +known types and see how it does. So the last bit of code you'll write +in this chapter will be a test harness that tests the filter on a corpus +of messages of known types, using a certain fraction for training and +then measuring how accurate the filter is when classifying the +remainder. + + +File: pcl.info, Node: 23-7, Next: 23-8, Prev: 23-6, Up: Chapter 23 + +Testing the Filter +================== + +To test the filter, you need a corpus of messages of known types. You +can use messages lying around in your inbox, or you can grab one of the +corpora available on the Web. For instance, the SpamAssassin corpus (1) +contains several thousand messages hand classified as spam, easy ham, +and hard ham. To make it easy to use whatever files you have, you can +define a test rig that's driven off an array of file/type pairs. You +can define a function that takes a filename and a type and adds it to +the corpus like this: + + (defun add-file-to-corpus (filename type corpus) + (vector-push-extend (list filename type) corpus)) + + The value of corpus should be an adjustable vector with a fill +pointer. For instance, you can make a new corpus like this: + + (defparameter *corpus* (make-array 1000 :adjustable t :fill-pointer 0)) + + If you have the hams and spams already segregated into separate +directories, you might want to add all the files in a directory as the +same type. This function, which uses the list-directory function from +Chapter 15, will do the trick: + + (defun add-directory-to-corpus (dir type corpus) + (dolist (filename (list-directory dir)) + (add-file-to-corpus filename type corpus))) + + For instance, suppose you have a directory mail containing two +subdirectories, spam and ham, each containing messages of the indicated +type; you can add all the files in those two directories to *corpus* +like this: + + SPAM> (add-directory-to-corpus "mail/spam/" 'spam *corpus*) + NIL + SPAM> (add-directory-to-corpus "mail/ham/" 'ham *corpus*) + NIL + + Now you need a function to test the classifier. The basic strategy +will be to select a random chunk of the corpus to train on and then test +the corpus by classifying the remainder of the corpus, comparing the +classification returned by the classify function to the known +classification. The main thing you want to know is how accurate the +classifier is-what percentage of the messages are classified correctly? +But you'll probably also be interested in what messages were +misclassified and in what direction-were there more false positives or +more false negatives? To make it easy to perform different analyses of +the classifier's behavior, you should define the testing functions to +build a list of raw results, which you can then analyze however you +like. + + The main testing function might look like this: + + (defun test-classifier (corpus testing-fraction) + (clear-database) + (let* ((shuffled (shuffle-vector corpus)) + (size (length corpus)) + (train-on (floor (* size (- 1 testing-fraction))))) + (train-from-corpus shuffled :start 0 :end train-on) + (test-from-corpus shuffled :start train-on))) + + This function starts by clearing out the feature database. (2) Then +it shuffles the corpus, using a function you'll implement in a moment, +and figures out, based on the testing-fraction parameter, how many +messages it'll train on and how many it'll reserve for testing. The two +helper functions train-from-corpus and test-from-corpus will both take +:start and :end keyword parameters, allowing them to operate on a +subsequence of the given corpus. + + The train-from-corpus function is quite simple-simply loop over the +appropriate part of the corpus, use DESTRUCTURING-BIND to extract the +filename and type from the list found in each element, and then pass the +text of the named file and the type to train. Since some mail messages, +such as those with attachments, are quite large, you should limit the +number of characters it'll take from the message. It'll obtain the text +with a function start-of-file, which you'll implement in a moment, that +takes a filename and a maximum number of characters to return. +train-from-corpus looks like this: + + (defparameter *max-chars* (* 10 1024)) + + (defun train-from-corpus (corpus &key (start 0) end) + (loop for idx from start below (or end (length corpus)) do + (destructuring-bind (file type) (aref corpus idx) + (train (start-of-file file *max-chars*) type)))) + + The test-from-corpus function is similar except you want to return a +list containing the results of each classification so you can analyze +them after the fact. Thus, you should capture both the classification +and score returned by classify and then collect a list of the filename, +the actual type, the type returned by classify, and the score. To make +the results more human readable, you can include keywords in the list to +indicate which values are which. + + (defun test-from-corpus (corpus &key (start 0) end) + (loop for idx from start below (or end (length corpus)) collect + (destructuring-bind (file type) (aref corpus idx) + (multiple-value-bind (classification score) + (classify (start-of-file file *max-chars*)) + (list + :file file + :type type + :classification classification + :score score))))) + + ---------- Footnotes ---------- + + (1) Several spam corpora including the SpamAssassin corpus are linked +to from http://nexp.cs.pdx.edu/~psam/cgi-bin/view/PSAM/CorpusSets. + + (2) If you wanted to conduct a test without disturbing the existing +database, you could bind *feature-database*, *total-spams*, and +*total-hams* with a LET, but then you'd have no way of looking at the +database after the fact-unless you returned the values you used within +the function. + + +File: pcl.info, Node: 23-8, Next: 23-9, Prev: 23-7, Up: Chapter 23 + +A Couple of Utility Functions +============================= + +To finish the implementation of test-classifier, you need to write the +two utility functions that don't really have anything particularly to do +with spam filtering, shuffle-vector and start-of-file. + + An easy and efficient way to implement shuffle-vector is using the +Fisher-Yates algorithm. (1) You can start by implementing a function, +nshuffle-vector, that shuffles a vector in place. This name follows the +same naming convention of other destructive functions such as NCONC and +NREVERSE. It looks like this: + + (defun nshuffle-vector (vector) + (loop for idx downfrom (1- (length vector)) to 1 + for other = (random (1+ idx)) + do (unless (= idx other) + (rotatef (aref vector idx) (aref vector other)))) + vector) + + The nondestructive version simply makes a copy of the original vector +and passes it to the destructive version. + + (defun shuffle-vector (vector) + (nshuffle-vector (copy-seq vector))) + + The other utility function, start-of-file, is almost as +straightforward with just one wrinkle. The most efficient way to read +the contents of a file into memory is to create an array of the +appropriate size and use READ-SEQUENCE to fill it in. So it might seem +you could make a character array that's either the size of the file or +the maximum number of characters you want to read, whichever is smaller. +Unfortunately, as I mentioned in Chapter 14, the function FILE-LENGTH +isn't entirely well defined when dealing with character streams since +the number of characters encoded in a file can depend on both the +character encoding used and the particular text in the file. In the +worst case, the only way to get an accurate measure of the number of +characters in a file is to actually read the whole file. Thus, it's +ambiguous what FILE-LENGTH should do when passed a character stream; in +most implementations, FILE-LENGTH always returns the number of octets in +the file, which may be greater than the number of characters that can be +read from the file. + + However, READ-SEQUENCE returns the number of characters actually +read. So, you can attempt to read the number of characters reported by +FILE-LENGTH and return a substring if the actual number of characters +read was smaller. + + (defun start-of-file (file max-chars) + (with-open-file (in file) + (let* ((length (min (file-length in) max-chars)) + (text (make-string length)) + (read (read-sequence text in))) + (if (< read length) + (subseq text 0 read) + text)))) + + ---------- Footnotes ---------- + + (1) This algorithm is named for the same Fisher who invented the +method used for combining probabilities and for Frank Yates, his +coauthor of the book Statistical Tables for Biological, Agricultural and +Medical Research (Oliver & Boyd, 1938) in which, according to Knuth, +they provided the first published description of the algorithm. + + +File: pcl.info, Node: 23-9, Next: 23-10, Prev: 23-8, Up: Chapter 23 + +Analyzing the Results +===================== + +Now you're ready to write some code to analyze the results generated by +test-classifier. Recall that test-classifier returns the list returned +by test-from-corpus in which each element is a plist representing the +result of classifying one file. This plist contains the name of the +file, the actual type of the file, the classification, and the score +returned by classify. The first bit of analytical code you should write +is a function that returns a symbol indicating whether a given result +was correct, a false positive, a false negative, a missed ham, or a +missed spam. You can use DESTRUCTURING-BIND to pull out the :type and +:classification elements of an individual result list (using +&allow-other-keys to tell DESTRUCTURING-BIND to ignore any other +key/value pairs it sees) and then use nested ECASE to translate the +different pairings into a single symbol. + + (defun result-type (result) + (destructuring-bind (&key type classification &allow-other-keys) result + (ecase type + (ham + (ecase classification + (ham 'correct) + (spam 'false-positive) + (unsure 'missed-ham))) + (spam + (ecase classification + (ham 'false-negative) + (spam 'correct) + (unsure 'missed-spam)))))) + + You can test out this function at the REPL. + + SPAM> (result-type '(:FILE #p"foo" :type ham :classification ham :score 0)) + CORRECT + SPAM> (result-type '(:FILE #p"foo" :type spam :classification spam :score 0)) + CORRECT + SPAM> (result-type '(:FILE #p"foo" :type ham :classification spam :score 0)) + FALSE-POSITIVE + SPAM> (result-type '(:FILE #p"foo" :type spam :classification ham :score 0)) + FALSE-NEGATIVE + SPAM> (result-type '(:FILE #p"foo" :type ham :classification unsure :score 0)) + MISSED-HAM + SPAM> (result-type '(:FILE #p"foo" :type spam :classification unsure :score 0)) + MISSED-SPAM + + Having this function makes it easy to slice and dice the results of +test-classifier in a variety of ways. For instance, you can start by +defining predicate functions for each type of result. + + (defun false-positive-p (result) + (eql (result-type result) 'false-positive)) + + (defun false-negative-p (result) + (eql (result-type result) 'false-negative)) + + (defun missed-ham-p (result) + (eql (result-type result) 'missed-ham)) + + (defun missed-spam-p (result) + (eql (result-type result) 'missed-spam)) + + (defun correct-p (result) + (eql (result-type result) 'correct)) + + With those functions, you can easily use the list and sequence +manipulation functions I discussed in Chapter 11 to extract and count +particular kinds of results. + + SPAM> (count-if #'false-positive-p *results*) + 6 + SPAM> (remove-if-not #'false-positive-p *results*) + ((:FILE #p"ham/5349" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999983107355541d0) + (:FILE #p"ham/2746" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.6286468956619795d0) + (:FILE #p"ham/3427" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9833753501352983d0) + (:FILE #p"ham/7785" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9542788587998488d0) + (:FILE #p"ham/1728" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.684339162891261d0) + (:FILE #p"ham/10581" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999924537959615d0)) + + You can also use the symbols returned by result-type as keys into a +hash table or an alist. For instance, you can write a function to print +a summary of the counts and percentages of each type of result using an +alist that maps each type plus the extra symbol total to a count. + + (defun analyze-results (results) + (let* ((keys '(total correct false-positive + false-negative missed-ham missed-spam)) + (counts (loop for x in keys collect (cons x 0)))) + (dolist (item results) + (incf (cdr (assoc 'total counts))) + (incf (cdr (assoc (result-type item) counts)))) + (loop with total = (cdr (assoc 'total counts)) + for (label . count) in counts + do (format t "~&~@(~a~):~20t~5d~,5t: ~6,2f%~%" + label count (* 100 (/ count total)))))) + + This function will give output like this when passed a list of +results generated by test-classifier: + + SPAM> (analyze-results *results*) + Total: 3761 : 100.00% + Correct: 3689 : 98.09% + False-positive: 4 : 0.11% + False-negative: 9 : 0.24% + Missed-ham: 19 : 0.51% + Missed-spam: 40 : 1.06% + NIL + + And as a last bit of analysis you might want to look at why an +individual message was classified the way it was. The following +functions will show you: + + (defun explain-classification (file) + (let* ((text (start-of-file file *max-chars*)) + (features (extract-features text)) + (score (score features)) + (classification (classification score))) + (show-summary file text classification score) + (dolist (feature (sorted-interesting features)) + (show-feature feature)))) + + (defun show-summary (file text classification score) + (format t "~&~a" file) + (format t "~2%~a~2%" text) + (format t "Classified as ~a with score of ~,5f~%" classification score)) + + (defun show-feature (feature) + (with-slots (word ham-count spam-count) feature + (format + t "~&~2t~a~30thams: ~5d; spams: ~5d;~,10tprob: ~,f~%" + word ham-count spam-count (bayesian-spam-probability feature)))) + + (defun sorted-interesting (features) + (sort (remove-if #'untrained-p features) #'< :key #'bayesian-spam-probability)) + + +File: pcl.info, Node: 23-10, Next: Chapter 24, Prev: 23-9, Up: Chapter 23 + +What's Next +=========== + +Obviously, you could do a lot more with this code. To turn it into a +real spam-filtering application, you'd need to find a way to integrate +it into your normal e-mail infrastructure. One approach that would make +it easy to integrate with almost any e-mail client is to write a bit of +code to act as a POP3 proxy-that's the protocol most e-mail clients use +to fetch mail from mail servers. Such a proxy would fetch mail from +your real POP3 server and serve it to your mail client after either +tagging spam with a header that your e-mail client's filters can easily +recognize or simply putting it aside. Of course, you'd also need a way +to communicate with the filter about misclassifications-as long as +you're setting it up as a server, you could also provide a Web +interface. I'll talk about how to write Web interfaces in Chapter 26, +and you'll build one, for a different application, in Chapter 29. + + Or you might want to work on improving the basic classification-a +likely place to start is to make extract-features more sophisticated. +In particular, you could make the tokenizer smarter about the internal +structure of e-mail-you could extract different kinds of features for +words appearing in the body versus the message headers. And you could +decode various kinds of message encoding such as base 64 and quoted +printable since spammers often try to obfuscate their message with those +encodings. + + But I'll leave those improvements to you. Now you're ready to head +down the path of building a streaming MP3 server, starting by writing a +general-purpose library for parsing binary files. + + +File: pcl.info, Node: Chapter 24, Next: Chapter 25, Prev: Chapter 23, Up: Top + +24. Practical: Parsing Binary Files +=================================== + +In this chapter I'll show you how to build a library that you can use to +write code for reading and writing binary files. You'll use this +library in Chapter 25 to write a parser for ID3 tags, the mechanism used +to store metadata such as artist and album names in MP3 files. This +library is also an example of how to use macros to extend the language +with new constructs, turning it into a special-purpose language for +solving a particular problem, in this case reading and writing binary +data. Because you'll develop the library a bit at a time, including +several partial versions, it may seem you're writing a lot of code. But +when all is said and done, the whole library is fewer than 150 lines of +code, and the longest macro is only 20 lines long. + +* Menu: + +* 24-1:: Binary Files +* 24-2:: Binary Format Basics +* 24-3:: Strings in Binary Files +* 24-4:: Composite Structures +* 24-5:: Designing the Macros +* 24-6:: Making the Dream a Reality +* 24-7:: Reading Binary Objects +* 24-8:: Writing Binary Objects +* 24-9:: Adding Inheritance and Tagged Structures +* 24-10:: Keeping Track of Inherited Slots +* 24-11:: Tagged Structures +* 24-12:: Primitive Binary Types +* 24-13:: The Current Object Stack + + +File: pcl.info, Node: 24-1, Next: 24-2, Prev: Chapter 24, Up: Chapter 24 + +Binary Files +============ + +At a sufficiently low level of abstraction, all files are "binary" in +the sense that they just contain a bunch of numbers encoded in binary +form. However, it's customary to distinguish between text files, where +all the numbers can be interpreted as characters representing +human-readable text, and binary files, which contain data that, if +interpreted as characters, yields nonprintable characters. (1) + + Binary file formats are usually designed to be both compact and +efficient to parse-that's their main advantage over text-based formats. +To meet both those criteria, they're usually composed of on-disk +structures that are easily mapped to data structures that a program +might use to represent the same data in memory. (2) + + The library will give you an easy way to define the mapping between +the on-disk structures defined by a binary file format and in-memory +Lisp objects. Using the library, it should be easy to write a program +that can read a binary file, translating it into Lisp objects that you +can manipulate, and then write back out to another properly formatted +binary file. + + ---------- Footnotes ---------- + + (1) In ASCII, the first 32 characters are nonprinting control +characters originally used to control the behavior of a Teletype +machine, causing it to do such things as sound the bell, back up one +character, move to a new line, and move the carriage to the beginning of +the line. Of these 32 control characters, only three, the newline, +carriage return, and horizontal tab, are typically found in text files. + + (2) Some binary file formats are in-memory data structures-on many +operating systems it's possible to map a file into memory, and low-level +languages such as C can then treat the region of memory containing the +contents of the file just like any other memory; data written to that +area of memory is saved to the underlying file when it's unmapped. +However, these formats are platform-dependent since the in-memory +representation of even such simple data types as integers depends on the +hardware on which the program is running. Thus, any file format that's +intended to be portable must define a canonical representation for all +the data types it uses that can be mapped to the actual in-memory data +representation on a particular kind of machine or in a particular +language. + + +File: pcl.info, Node: 24-2, Next: 24-3, Prev: 24-1, Up: Chapter 24 + +Binary Format Basics +==================== + +The starting point for reading and writing binary files is to open the +file for reading or writing individual bytes. As I discussed in Chapter +14, both OPEN and WITH-OPEN-FILE accept a keyword argument, +:element-type, that controls the basic unit of transfer for the stream. +When you're dealing with binary files, you'll specify (unsigned-byte 8). +An input stream opened with such an :element-type will return an integer +between 0 and 255 each time it's passed to READ-BYTE. Conversely, you +can write bytes to an (unsigned-byte 8) output stream by passing numbers +between 0 and 255 to WRITE-BYTE. + + Above the level of individual bytes, most binary formats use a +smallish number of primitive data types-numbers encoded in various ways, +textual strings, bit fields, and so on-which are then composed into more +complex structures. So your first task is to define a framework for +writing code to read and write the primitive data types used by a given +binary format. + + To take a simple example, suppose you're dealing with a binary format +that uses an unsigned 16-bit integer as a primitive data type. To read +such an integer, you need to read the two bytes and then combine them +into a single number by multiplying one byte by 256, a.k.a. 2^8, and +adding it to the other byte. For instance, assuming the binary format +specifies that such 16-bit quantities are stored in big-endian (1) form, +with the most significant byte first, you can read such a number with +this function: + + (defun read-u2 (in) + (+ (* (read-byte in) 256) (read-byte in))) + + However, Common Lisp provides a more convenient way to perform this +kind of bit twiddling. The function LDB, whose name stands for load +byte, can be used to extract and set (with SETF) any number of +contiguous bits from an integer. (2) The number of bits and their +position within the integer is specified with a byte specifier created +with the BYTE function. BYTE takes two arguments, the number of bits to +extract (or set) and the position of the rightmost bit where the least +significant bit is at position zero. LDB takes a byte specifier and the +integer from which to extract the bits and returns the positive integer +represented by the extracted bits. Thus, you can extract the least +significant octet of an integer like this: + + (ldb (byte 8 0) #xabcd) ==> 205 ; 205 is #xcd + + To get the next octet, you'd use a byte specifier of (byte 8 8) like +this: + + (ldb (byte 8 8) #xabcd) ==> 171 ; 171 is #xab + + You can use LDB with SETF to set the specified bits of an integer +stored in a SETFable place. + + CL-USER> (defvar *num* 0) + *NUM* + CL-USER> (setf (ldb (byte 8 0) *num*) 128) + 128 + CL-USER> *num* + 128 + CL-USER> (setf (ldb (byte 8 8) *num*) 255) + 255 + CL-USER> *num* + 65408 + + Thus, you can also write read-u2 like this: (3) + + (defun read-u2 (in) + (let ((u2 0)) + (setf (ldb (byte 8 8) u2) (read-byte in)) + (setf (ldb (byte 8 0) u2) (read-byte in)) + u2)) + + To write a number out as a 16-bit integer, you need to extract the +individual 8-bit bytes and write them one at a time. To extract the +individual bytes, you just need to use LDB with the same byte +specifiers. + + (defun write-u2 (out value) + (write-byte (ldb (byte 8 8) value) out) + (write-byte (ldb (byte 8 0) value) out)) + + Of course, you can also encode integers in many other ways-with +different numbers of bytes, with different endianness, and in signed and +unsigned format. + + ---------- Footnotes ---------- + + (1) The term big-endian and its opposite, little-endian, borrowed +from Jonathan Swift's Gulliver's Travels, refer to the way a multibyte +number is represented in an ordered sequence of bytes such as in memory +or in a file. For instance, the number 43981, or abcd in hex, +represented as a 16-bit quantity, consists of two bytes, ab and cd. It +doesn't matter to a computer in what order these two bytes are stored as +long as everybody agrees. Of course, whenever there's an arbitrary +choice to be made between two equally good options, the one thing you +can be sure of is that everybody is not going to agree. For more than +you ever wanted to know about it, and to see where the terms big-endian +and little-endian were first applied in this fashion, read "On Holy Wars +and a Plea for Peace" by Danny Cohen, available at +http://khavrinen.lcs.mit.edu/wollman/ien-137.txt. + + (2) LDB and DPB, a related function, were named after the DEC PDP-10 +assembly functions that did essentially the same thing. Both functions +operate on integers as if they were represented using twos-complement +format, regardless of the internal representation used by a particular +Common Lisp implementation. + + (3) Common Lisp also provides functions for shifting and masking the +bits of integers in a way that may be more familiar to C and Java +programmers. For instance, you could write read-u2 yet a third way, +using those functions, like this: + + (defun read-u2 (in) + (logior (ash (read-byte in) 8) (read-byte in))) + + which would be roughly equivalent to this Java method: + + public int readU2 (InputStream in) throws IOException { + return (in.read() << 8) | (in.read()); + } + + The names LOGIOR and ASH are short for LOGical Inclusive OR and +Arithmetic SHift. ASH shifts an integer a given number of bits to the +left when its second argument is positive or to the right if the second +argument is negative. LOGIOR combines integers by logically oring each +bit. Another function, LOGAND, performs a bitwise and, which can be +used to mask off certain bits. However, for the kinds of bit twiddling +you'll need to do in this chapter and the next, LDB and BYTE will be +both more convenient and more idiomatic Common Lisp style. + + +File: pcl.info, Node: 24-3, Next: 24-4, Prev: 24-2, Up: Chapter 24 + +Strings in Binary Files +======================= + +Textual strings are another kind of primitive data type you'll find in +many binary formats. When you read files one byte at a time, you can't +read and write strings directly-you need to decode and encode them one +byte at a time, just as you do with binary-encoded numbers. And just as +you can encode an integer in several ways, you can encode a string in +many ways. To start with, the binary format must specify how individual +characters are encoded. + + To translate bytes to characters, you need to know both what +character code and what character encoding you're using. A character +code defines a mapping from positive integers to characters. Each +number in the mapping is called a code point. For instance, ASCII is a +character code that maps the numbers from 0-127 to particular characters +used in the Latin alphabet. A character encoding, on the other hand, +defines how the code points are represented as a sequence of bytes in a +byte-oriented medium such as a file. For codes that use eight or fewer +bits, such as ASCII and ISO-8859-1, the encoding is trivial-each numeric +value is encoded as a single byte. + + Nearly as straightforward are pure double-byte encodings, such as +UCS-2, which map between 16-bit values and characters. The only reason +double-byte encodings can be more complex than single-byte encodings is +that you may also need to know whether the 16-bit values are supposed to +be encoded in big-endian or little-endian format. + + Variable-width encodings use different numbers of octets for +different numeric values, making them more complex but allowing them to +be more compact in many cases. For instance, UTF-8, an encoding +designed for use with the Unicode character code, uses a single octet to +encode the values 0-127 while using up to four octets to encode values +up to 1,114,111. (1) + + Since the code points from 0-127 map to the same characters in +Unicode as they do in ASCII, a UTF-8 encoding of text consisting only of +characters also in ASCII is the same as the ASCII encoding. On the +other hand, texts consisting mostly of characters requiring four bytes +in UTF-8 could be more compactly encoded in a straight double-byte +encoding. + + Common Lisp provides two functions for translating between numeric +character codes and character objects: CODE-CHAR, which takes an numeric +code and returns as a character, and CHAR-CODE, which takes a character +and returns its numeric code. The language standard doesn't specify +what character encoding an implementation must use, so there's no +guarantee you can represent every character that can possibly be encoded +in a given file format as a Lisp character. However, almost all +contemporary Common Lisp implementations use ASCII, ISO-8859-1, or +Unicode as their native character code. Because Unicode is a superset +ofISO-8859-1, which is in turn a superset of ASCII, if you're using a +Unicode Lisp, CODE-CHAR and CHAR-CODE can be used directly for +translating any of those three character codes. (2) + + In addition to specifying a character encoding, a string encoding +must also specify how to encode the length of the string. Three +techniques are typically used in binary file formats. + + The simplest is to not encode it but to let it be implicit in the +position of the string in some larger structure: a particular element of +a file may always be a string of a certain length, or a string may be +the last element of a variable-length data structure whose overall size +determines how many bytes are left to read as string data. Both these +techniques are used in ID3 tags, as you'll see in the next chapter. + + The other two techniques can be used to encode variable-length +strings without relying on context. One is to encode the length of the +string followed by the character data-the parser reads an integer value +(in some specified integer format) and then reads that number of +characters. Another is to write the character data followed by a +delimiter that can't appear in the string such as a null character. + + The different representations have different advantages and +disadvantages, but when you're dealing with already specified binary +formats, you won't have any control over which encoding is used. +However, none of the encodings is particularly more difficult to read +and write than any other. Here, as an example, is a function that reads +a null-terminated ASCII string, assuming your Lisp implementation uses +ASCII or one of its supersets such as ISO-8859-1 or full Unicode as its +native character encoding: + + (defconstant +null+ (code-char 0)) + + (defun read-null-terminated-ascii (in) + (with-output-to-string (s) + (loop for char = (code-char (read-byte in)) + until (char= char +null+) do (write-char char s)))) + + The WITH-OUTPUT-TO-STRING macro, which I mentioned in Chapter 14, is +an easy way to build up a string when you don't know how long it'll be. +It creates a STRING-STREAM and binds it to the variable name specified, +s in this case. All characters written to the stream are collected into +a string, which is then returned as the value of the +WITH-OUTPUT-TO-STRING form. + + To write a string back out, you just need to translate the characters +back to numeric values that can be written with WRITE-BYTE and then +write the null terminator after the string contents. + + (defun write-null-terminated-ascii (string out) + (loop for char across string + do (write-byte (char-code char) out)) + (write-byte (char-code +null+) out)) + + As these examples show, the main intellectual challenge-such as it +is-of reading and writing primitive elements of binary files is +understanding how exactly to interpret the bytes that appear in a file +and to map them to Lisp data types. If a binary file format is well +specified, this should be a straightforward proposition. Actually +writing functions to read and write a particular encoding is, as they +say, a simple matter of programming. + + Now you can turn to the issue of reading and writing more complex +on-disk structures and how to map them to Lisp objects. + + ---------- Footnotes ---------- + + (1) Originally, UTF-8 was designed to represent a 31-bit character +code and used up to six bytes per code point. However, the maximum +Unicode code point is #x10ffff, so a UTF-8 encoding of Unicode requires +at most four bytes per code point. + + (2) If you need to parse a file format that uses other character +codes, or if you need to parse files containing arbitrary Unicode +strings using a non-Unicode-Common-Lisp implementation, you can always +represent such strings in memory as vectors of integer code points. +They won't be Lisp strings, so you won't be able to manipulate or +compare them with the string functions, but you'll still be able to do +anything with them that you can with arbitrary vectors. + + +File: pcl.info, Node: 24-4, Next: 24-5, Prev: 24-3, Up: Chapter 24 + +Composite Structures +==================== + +Since binary formats are usually used to represent data in a way that +makes it easy to map to in-memory data structures, it should come as no +surprise that composite on-disk structures are usually defined in ways +similar to the way programming languages define in-memory structures. +Usually a composite on-disk structure will consist of a number of named +parts, each of which is itself either a primitive type such as a number +or a string, another composite structure, or possibly a collection of +such values. + + For instance, an ID3 tag defined in the 2.2 version of the +specification consists of a header made up of a three-character +ISO-8859-1 string, which is always "ID3"; two one-byte unsigned integers +that specify the major version and revision of the specification; eight +bits worth of boolean flags; and four bytes that encode the size of the +tag in an encoding particular to the ID3 specification. Following the +header is a list of frames, each of which has its own internal +structure. After the frames are as many null bytes as are necessary to +pad the tag out to the size specified in the header. + + If you look at the world through the lens of object orientation, +composite structures look a lot like classes. For instance, you could +write a class to represent an ID3 tag. + + (defclass id3-tag () + ((identifier :initarg :identifier :accessor identifier) + (major-version :initarg :major-version :accessor major-version) + (revision :initarg :revision :accessor revision) + (flags :initarg :flags :accessor flags) + (size :initarg :size :accessor size) + (frames :initarg :frames :accessor frames))) + + An instance of this class would make a perfect repository to hold the +data needed to represent an ID3 tag. You could then write functions to +read and write instances of this class. For example, assuming the +existence of certain other functions for reading the appropriate +primitive data types, a read-id3-tag function might look like this: + + (defun read-id3-tag (in) + (let ((tag (make-instance 'id3-tag))) + (with-slots (identifier major-version revision flags size frames) tag + (setf identifier (read-iso-8859-1-string in :length 3)) + (setf major-version (read-u1 in)) + (setf revision (read-u1 in)) + (setf flags (read-u1 in)) + (setf size (read-id3-encoded-size in)) + (setf frames (read-id3-frames in :tag-size size))) + tag)) + + The write-id3-tag function would be structured similarly-you'd use +the appropriate write-* functions to write out the values stored in the +slots of the id3-tag object. + + It's not hard to see how you could write the appropriate classes to +represent all the composite data structures in a specification along +with read-foo and write-foo functions for each class and for necessary +primitive types. But it's also easy to tell that all the reading and +writing functions are going to be pretty similar, differing only in the +specifics of what types they read and the names of the slots they store +them in. It's particularly irksome when you consider that in the ID3 +specification it takes about four lines of text to specify the structure +of an ID3 tag, while you've already written eighteen lines of code and +haven't even written write-id3-tag yet. + + What you'd really like is a way to describe the structure of +something like an ID3 tag in a form that's as compressed as the +specification's pseudocode yet that can also be expanded into code that +defines the id3-tag class and the functions that translate between bytes +on disk and instances of the class. Sounds like a job for a macro. + + +File: pcl.info, Node: 24-5, Next: 24-6, Prev: 24-4, Up: Chapter 24 + +Designing the Macros +==================== + +Since you already have a rough idea what code your macros will need to +generate, the next step, according to the process for writing a macro I +outlined in Chapter 8, is to switch perspectives and think about what a +call to the macro should look like. Since the goal is to be able to +write something as compressed as the pseudocode in the ID3 +specification, you can start there. The header of an ID3 tag is +specified like this: + + ID3/file identifier "ID3" + ID3 version $02 00 + ID3 flags %xx000000 + ID3 size 4 * %0xxxxxxx + + In the notation of the specification, this means the "file +identifier" slot of an ID3 tag is the string "ID3" in ISO-8859-1 +encoding. The version consists of two bytes, the first of which-for +this version of the specification-has the value 2 and the second of +which-again for this version of the specification-is 0. The flags slot +is eight bits, of which all but the first two are 0, and the size +consists of four bytes, each of which has a 0 in the most significant +bit. + + Some information isn't captured by this pseudocode. For instance, +exactly how the four bytes that encode the size are to be interpreted is +described in a few lines of prose. Likewise, the spec describes in +prose how the frame and subsequent padding is stored after this header. +But most of what you need to know to be able to write code to read and +write an ID3 tag is specified by this pseudocode. Thus, you ought to be +able to write an s-expression version of this pseudocode and have it +expanded into the class and function definitions you'd otherwise have to +write by hand-something, perhaps, like this: + + (define-binary-class id3-tag + ((file-identifier (iso-8859-1-string :length 3)) + (major-version u1) + (revision u1) + (flags u1) + (size id3-tag-size) + (frames (id3-frames :tag-size size)))) + + The basic idea is that this form defines a class id3-tag similar to +the way you could with DEFCLASS, but instead of specifying things such +as :initarg and :accessors, each slot specification consists of the name +of the slot-file-identifier, major-version, and so on-and information +about how that slot is represented on disk. Since this is just a bit of +fantasizing, you don't have to worry about exactly how the macro +define-binary-class will know what to do with expressions such as +(iso-8859-1-string :length 3), u1, id3-tag-size, and (id3-frames +:tag-size size); as long as each expression contains the information +necessary to know how to read and write a particular data encoding, you +should be okay. + + +File: pcl.info, Node: 24-6, Next: 24-7, Prev: 24-5, Up: Chapter 24 + +Making the Dream a Reality +========================== + +Okay, enough fantasizing about good-looking code; now you need to get to +work writing define-binary-class-writing the code that will turn that +concise expression of what an ID3 tag looks like into code that can +represent one in memory, read one off disk, and write it back out. + + To start with, you should define a package for this library. Here's +the package file that comes with the version you can download from the +book's Web site: + + (in-package :cl-user) + + (defpackage :com.gigamonkeys.binary-data + (:use :common-lisp :com.gigamonkeys.macro-utilities) + (:export :define-binary-class + :define-tagged-binary-class + :define-binary-type + :read-value + :write-value + :*in-progress-objects* + :parent-of-type + :current-binary-object + :+null+)) + + The COM.GIGAMONKEYS.MACRO-UTILITIES package contains the with-gensyms +and once-only macros from Chapter 8. + + Since you already have a handwritten version of the code you want to +generate, it shouldn't be too hard to write such a macro. Just take it +in small pieces, starting with a version of define-binary-class that +generates just the DEFCLASS form. + + If you look back at the define-binary-class form, you'll see that it +takes two arguments, the name id3-tag and a list of slot specifiers, +each of which is itself a two-item list. From those pieces you need to +build the appropriate DEFCLASS form. Clearly, the biggest difference +between the define-binary-class form and a proper DEFCLASS form is in +the slot specifiers. A single slot specifier from define-binary-class +looks something like this: + + (major-version u1) + + But that's not a legal slot specifier for a DEFCLASS. Instead, you +need something like this: + + (major-version :initarg :major-version :accessor major-version) + + Easy enough. First define a simple function to translate a symbol to +the corresponding keyword symbol. + + (defun as-keyword (sym) (intern (string sym) :keyword)) + + Now define a function that takes a define-binary-class slot specifier +and returns a DEFCLASS slot specifier. + + (defun slot->defclass-slot (spec) + (let ((name (first spec))) + `(,name :initarg ,(as-keyword name) :accessor ,name))) + + You can test this function at the REPL after switching to your new +package with a call to IN-PACKAGE. + + BINARY-DATA> (slot->defclass-slot '(major-version u1)) + (MAJOR-VERSION :INITARG :MAJOR-VERSION :ACCESSOR MAJOR-VERSION) + + Looks good. Now the first version of define-binary-class is trivial. + + (defmacro define-binary-class (name slots) + `(defclass ,name () + ,(mapcar #'slot->defclass-slot slots))) + + This is simple template-style macro-define-binary-class generates a +DEFCLASS form by interpolating the name of the class and a list of slot +specifiers constructed by applying slot->defclass-slot to each element +of the list of slots specifiers from the define-binary-class form. + + To see exactly what code this macro generates, you can evaluate this +expression at the REPL. + + (macroexpand-1 '(define-binary-class id3-tag + ((identifier (iso-8859-1-string :length 3)) + (major-version u1) + (revision u1) + (flags u1) + (size id3-tag-size) + (frames (id3-frames :tag-size size))))) + + The result, slightly reformatted here for better readability, should +look familiar since it's exactly the class definition you wrote by hand +earlier: + + (defclass id3-tag () + ((identifier :initarg :identifier :accessor identifier) + (major-version :initarg :major-version :accessor major-version) + (revision :initarg :revision :accessor revision) + (flags :initarg :flags :accessor flags) + (size :initarg :size :accessor size) + (frames :initarg :frames :accessor frames))) + + +File: pcl.info, Node: 24-7, Next: 24-8, Prev: 24-6, Up: Chapter 24 + +Reading Binary Objects +====================== + +Next you need to make define-binary-class also generate a function that +can read an instance of the new class. Looking back at the read-id3-tag +function you wrote before, this seems a bit trickier, as the +read-id3-tag wasn't quite so regular-to read each slot's value, you had +to call a different function. Not to mention, the name of the function, +read-id3-tag, while derived from the name of the class you're defining, +isn't one of the arguments to define-binary-class and thus isn't +available to be interpolated into a template the way the class name was. + + You could deal with both of those problems by devising and following +a naming convention so the macro can figure out the name of the function +to call based on the name of the type in the slot specifier. However, +this would require define-binary-class to generate the name +read-id3-tag, which is possible but a bad idea. Macros that create +global definitions should generally use only names passed to them by +their callers; macros that generate names under the covers can cause +hard-to-predict-and hard-to-debug-name conflicts when the generated +names happen to be the same as names used elsewhere. (1) + + You can avoid both these inconveniences by noticing that all the +functions that read a particular type of value have the same fundamental +purpose, to read a value of a specific type from a stream. Speaking +colloquially, you might say they're all instances of a single generic +operation. And the colloquial use of the word generic should lead you +directly to the solution to your problem: instead of defining a bunch of +independent functions, all with different names, you can define a single +generic function, read-value, with methods specialized to read different +types of values. + + That is, instead of defining functions read-iso-8859-1-string and +read-u1, you can define read-value as a generic function taking two +required arguments, a type and a stream, and possibly some keyword +arguments. + + (defgeneric read-value (type stream &key) + (:documentation "Read a value of the given type from the stream.")) + + By specifying &key without any actual keyword parameters, you allow +different methods to define their own &key parameters without requiring +them to do so. This does mean every method specialized on read-value +will have to include either &key or an &rest parameter in its parameter +list to be compatible with the generic function. + + Then you'll define methods that use EQL specializers to specialize +the type argument on the name of the type you want to read. + + (defmethod read-value ((type (eql 'iso-8859-1-string)) in &key length) ...) + + (defmethod read-value ((type (eql 'u1)) in &key) ...) + + Then you can make define-binary-class generate a read-value method +specialized on the type name id3-tag, and that method can be implemented +in terms of calls to read-value with the appropriate slot types as the +first argument. The code you want to generate is going to look like +this: + + (defmethod read-value ((type (eql 'id3-tag)) in &key) + (let ((object (make-instance 'id3-tag))) + (with-slots (identifier major-version revision flags size frames) object + (setf identifier (read-value 'iso-8859-1-string in :length 3)) + (setf major-version (read-value 'u1 in)) + (setf revision (read-value 'u1 in)) + (setf flags (read-value 'u1 in)) + (setf size (read-value 'id3-encoded-size in)) + (setf frames (read-value 'id3-frames in :tag-size size))) + object)) + + So, just as you needed a function to translate a define-binary-class +slot specifier to a DEFCLASS slot specifier in order to generate the +DEFCLASS form, now you need a function that takes a define-binary-class +slot specifier and generates the appropriate SETF form, that is, +something that takes this: + + (identifier (iso-8859-1-string :length 3)) + + and returns this: + + (setf identifier (read-value 'iso-8859-1-string in :length 3)) + + However, there's a difference between this code and the DEFCLASS slot +specifier: it includes a reference to a variable in-the method parameter +from the read-value method-that wasn't derived from the slot specifier. +It doesn't have to be called in, but whatever name you use has to be the +same as the one used in the method's parameter list and in the other +calls to read-value. For now you can dodge the issue of where that name +comes from by defining slot->read-value to take a second argument of the +name of the stream variable. + + (defun slot->read-value (spec stream) + (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec) + `(setf ,name (read-value ',type ,stream ,@args)))) + + The function normalize-slot-spec normalizes the second element of the +slot specifier, converting a symbol like u1 to the list (u1) so the +DESTRUCTURING-BIND can parse it. It looks like this: + + (defun normalize-slot-spec (spec) + (list (first spec) (mklist (second spec)))) + + (defun mklist (x) (if (listp x) x (list x))) + + You can test slot->read-value with each type of slot specifier. + + BINARY-DATA> (slot->read-value '(major-version u1) 'stream) + (SETF MAJOR-VERSION (READ-VALUE 'U1 STREAM)) + BINARY-DATA> (slot->read-value '(identifier (iso-8859-1-string :length 3)) 'stream) + (SETF IDENTIFIER (READ-VALUE 'ISO-8859-1-STRING STREAM :LENGTH 3)) + + With these functions you're ready to add read-value to +define-binary-class. If you take the handwritten read-value method and +strip out anything that's tied to a particular class, you're left with +this skeleton: + + (defmethod read-value ((type (eql ...)) stream &key) + (let ((object (make-instance ...))) + (with-slots (...) object + ... + object))) + + All you need to do is add this skeleton to the define-binary-class +template, replacing ellipses with code that fills in the skeleton with +the appropriate names and code. You'll also want to replace the +variables type, stream, and object with gensymed names to avoid +potential conflicts with slot names, (2) which you can do with the +with-gensyms macro from Chapter 8. + + Also, because a macro must expand into a single form, you need to +wrap some form around the DEFCLASS and DEFMETHOD. PROGN is the customary +form to use for macros that expand into multiple definitions because of +the special treatment it gets from the file compiler when appearing at +the top level of a file, as I discussed in Chapter 20. + + So, you can change define-binary-class as follows: + + (defmacro define-binary-class (name slots) + (with-gensyms (typevar objectvar streamvar) + `(progn + (defclass ,name () + ,(mapcar #'slot->defclass-slot slots)) + + (defmethod read-value ((,typevar (eql ',name)) ,streamvar &key) + (let ((,objectvar (make-instance ',name))) + (with-slots ,(mapcar #'first slots) ,objectvar + ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots)) + ,objectvar))))) + + ---------- Footnotes ---------- + + (1) Unfortunately, the language itself doesn't always provide a good +model in this respect: the macro DEFSTRUCT, which I don't discuss since +it has largely been superseded by DEFCLASS, generates functions with +names that it generates based on the name of the structure it's given. +DEFSTRUCT's bad example leads many new macro writers astray. + + (2) Technically there's no possibility of type or object conflicting +with slot names-at worst they'd be shadowed within the WITH-SLOTS form. +But it doesn't hurt anything to simply GENSYM all local variable names +used within a macro template. + + +File: pcl.info, Node: 24-8, Next: 24-9, Prev: 24-7, Up: Chapter 24 + +Writing Binary Objects +====================== + +Generating code to write out an instance of a binary class will proceed +similarly. First you can define a write-value generic function. + + (defgeneric write-value (type stream value &key) + (:documentation "Write a value as the given type to the stream.")) + + Then you define a helper function that translates a +define-binary-class slot specifier into code that writes out the slot +using write-value. As with the slot->read-value function, this helper +function needs to take the name of the stream variable as an argument. + + (defun slot->write-value (spec stream) + (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec) + `(write-value ',type ,stream ,name ,@args))) + + Now you can add a write-value template to the define-binary-class +macro. + + (defmacro define-binary-class (name slots) + (with-gensyms (typevar objectvar streamvar) + `(progn + (defclass ,name () + ,(mapcar #'slot->defclass-slot slots)) + + (defmethod read-value ((,typevar (eql ',name)) ,streamvar &key) + (let ((,objectvar (make-instance ',name))) + (with-slots ,(mapcar #'first slots) ,objectvar + ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots)) + ,objectvar)) + + (defmethod write-value ((,typevar (eql ',name)) ,streamvar ,objectvar &key) + (with-slots ,(mapcar #'first slots) ,objectvar + ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots)))))) + + +File: pcl.info, Node: 24-9, Next: 24-10, Prev: 24-8, Up: Chapter 24 + +Adding Inheritance and Tagged Structures +======================================== + +While this version of define-binary-class will handle stand-alone +structures, binary file formats often define on-disk structures that +would be natural to model with subclasses and superclasses. So you +might want to extend define-binary-class to support inheritance. + + A related technique used in many binary formats is to have several +on-disk structures whose exact type can be determined only by reading +some data that indicates how to parse the following bytes. For +instance, the frames that make up the bulk of an ID3 tag all share a +common header structure consisting of a string identifier and a length. +To read a frame, you need to read the identifier and use its value to +determine what kind of frame you're looking at and thus how to parse the +body of the frame. + + The current define-binary-class macro has no way to handle this kind +of reading-you could use define-binary-class to define a class to +represent each kind of frame, but you'd have no way to know what type of +frame to read without reading at least the identifier. And if other +code reads the identifier in order to determine what type to pass to +read-value, then that will break read-value since it's expecting to read +all the data that makes up the instance of the class it instantiates. + + You can solve this problem by adding inheritance to +define-binary-class and then writing another macro, +define-tagged-binary-class, for defining "abstract" classes that aren't +instantiated directly but that can be specialized on by read-value +methods that know how to read enough data to determine what kind of +class to create. + + The first step to adding inheritance to define-binary-class is to add +a parameter to the macro to accept a list of superclasses. + + (defmacro define-binary-class (name (&rest superclasses) slots) ... + + Then, in the DEFCLASS template, interpolate that value instead of the +empty list. + + (defclass ,name ,superclasses + ...) + + However, there's a bit more to it than that. You also need to change +the read-value and write-value methods so the methods generated when +defining a superclass can be used by the methods generated as part of a +subclass to read and write inherited slots. + + The current way read-value works is particularly problematic since it +instantiates the object before filling it in-obviously, you can't have +the method responsible for reading the superclass's fields instantiate +one object while the subclass's method instantiates and fills in a +different object. + + You can fix that problem by splitting read-value into two parts-one +responsible for instantiating the correct kind of object and another +responsible for filling slots in an existing object. On the writing +side it's a bit simpler, but you can use the same technique. + + So you'll define two new generic functions, read-object and +write-object, that will both take an existing object and a stream. +Methods on these generic functions will be responsible for reading or +writing the slots specific to the class of the object on which they're +specialized. + + (defgeneric read-object (object stream) + (:method-combination progn :most-specific-last) + (:documentation "Fill in the slots of object from stream.")) + + (defgeneric write-object (object stream) + (:method-combination progn :most-specific-last) + (:documentation "Write out the slots of object to the stream.")) + + Defining these generic functions to use the PROGN method combination +with the option :most-specific-last allows you to define methods that +specialize object on each binary class and have them deal only with the +slots actually defined in that class; the PROGN method combination will +combine all the applicable methods so the method specialized on the +least specific class in the hierarchy runs first, reading or writing the +slots defined in that class, then the method specialized on next least +specific subclass, and so on. And since all the heavy lifting for a +specific class is now going to be done by read-object and write-object, +you don't even need to define specialized read-value and write-value +methods; you can define default methods that assume the type argument is +the name of a binary class. + + (defmethod read-value ((type symbol) stream &key) + (let ((object (make-instance type))) + (read-object object stream) + object)) + + (defmethod write-value ((type symbol) stream value &key) + (assert (typep value type)) + (write-object value stream)) + + Note how you can use MAKE-INSTANCE as a generic object factory-while +you normally call MAKE-INSTANCE with a quoted symbol as the first +argument because you normally know exactly what class you want to +instantiate, you can use any expression that evaluates to a class name +such as, in this case, the type parameter in the read-value method. + + The actual changes to define-binary-class to define methods on +read-object and write-object rather than read-value and write-value are +fairly minor. + + (defmacro define-binary-class (name superclasses slots) + (with-gensyms (objectvar streamvar) + `(progn + (defclass ,name ,superclasses + ,(mapcar #'slot->defclass-slot slots)) + + (defmethod read-object progn ((,objectvar ,name) ,streamvar) + (with-slots ,(mapcar #'first slots) ,objectvar + ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots))) + + (defmethod write-object progn ((,objectvar ,name) ,streamvar) + (with-slots ,(mapcar #'first slots) ,objectvar + ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots)))))) + + +File: pcl.info, Node: 24-10, Next: 24-11, Prev: 24-9, Up: Chapter 24 + +Keeping Track of Inherited Slots +================================ + +This definition will work for many purposes. However, it doesn't handle +one fairly common situation, namely, when you have a subclass that needs +to refer to inherited slots in its own slot specifications. For +instance, with the current definition of define-binary-class, you can +define a single class like this: + + (define-binary-class generic-frame () + ((id (iso-8859-1-string :length 3)) + (size u3) + (data (raw-bytes :bytes size)))) + + The reference to size in the specification of data works the way +you'd expect because the expressions that read and write the data slot +are wrapped in a WITH-SLOTS that lists all the object's slots. However, +if you try to split that class into two classes like this: + + (define-binary-class frame () + ((id (iso-8859-1-string :length 3)) + (size u3))) + + (define-binary-class generic-frame (frame) + ((data (raw-bytes :bytes size)))) + + you'll get a compile-time warning when you compile the generic-frame +definition and a runtime error when you try to use it because there will +be no lexically apparent variable size in the read-object and +write-object methods specialized on generic-frame. + + What you need to do is keep track of the slots defined by each binary +class and then include inherited slots in the WITH-SLOTS forms in the +read-object and write-object methods. + + The easiest way to keep track of information like this is to hang it +off the symbol that names the class. As I discussed in Chapter 21, +every symbol object has an associated property list, which can be +accessed via the functions SYMBOL-PLIST and GET. You can associate +arbitrary key/value pairs with a symbol by adding them to its property +list with SETF of GET. For instance, if the binary class foo defines +three slots-x, y, and z-you can keep track of that fact by adding a +slots key to the symbol foo's property list with the value (x y z) with +this expression: + + (setf (get 'foo 'slots) '(x y z)) + + You want this bookkeeping to happen as part of evaluating the +define-binary-class of foo. However, it's not clear where to put the +expression. If you evaluate it when you compute the macro's expansion, +it'll get evaluated when you compile the define-binary-class form but +not if you later load a file that contains the resulting compiled code. +On the other hand, if you include the expression in the expansion, then +it won't be evaluated during compilation, which means if you compile a +file with several define-binary-class forms, none of the information +about what classes define what slots will be available until the whole +file is loaded, which is too late. + + This is what the special operator EVAL-WHEN I discussed in Chapter 20 +is for. By wrapping a form in an EVAL-WHEN, you can control whether +it's evaluated at compile time, when the compiled code is loaded, or +both. For cases like this where you want to squirrel away some +information during the compilation of a macro form that you also want to +be available after the compiled form is loaded, you should wrap it in an +EVAL-WHEN like this: + + (eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get 'foo 'slots) '(x y z))) + + and include the EVAL-WHEN in the expansion generated by the macro. +Thus, you can save both the slots and the direct superclasses of a +binary class by adding this form to the expansion generated by +define-binary-class: + + (eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ',name 'slots) ',(mapcar #'first slots)) + (setf (get ',name 'superclasses) ',superclasses)) + + Now you can define three helper functions for accessing this +information. The first simply returns the slots directly defined by a +binary class. It's a good idea to return a copy of the list since you +don't want other code to modify the list of slots after the binary class +has been defined. + + (defun direct-slots (name) + (copy-list (get name 'slots))) + + The next function returns the slots inherited from other binary +classes. + + (defun inherited-slots (name) + (loop for super in (get name 'superclasses) + nconc (direct-slots super) + nconc (inherited-slots super))) + + Finally, you can define a function that returns a list containing the +names of all directly defined and inherited slots. + + (defun all-slots (name) + (nconc (direct-slots name) (inherited-slots name))) + + When you're computing the expansion of a define-generic-binary-class +form, you want to generate a WITH-SLOTS form that contains the names of +all the slots defined in the new class and all its superclasses. +However, you can't use all-slots while you're generating the expansion +since the information won't be available until after the expansion is +compiled. Instead, you should use the following function, which takes +the list of slot specifiers and superclasses passed to +define-generic-binary-class and uses them to compute the list of all the +new class's slots: + + (defun new-class-all-slots (slots superclasses) + (nconc (mapcan #'all-slots superclasses) (mapcar #'first slots))) + + With these functions defined, you can change define-binary-class to +store the information about the class currently being defined and to use +the already stored information about the superclasses' slots to generate +the WITH-SLOTS forms you want like this: + + (defmacro define-binary-class (name (&rest superclasses) slots) + (with-gensyms (objectvar streamvar) + `(progn + (eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ',name 'slots) ',(mapcar #'first slots)) + (setf (get ',name 'superclasses) ',superclasses)) + + (defclass ,name ,superclasses + ,(mapcar #'slot->defclass-slot slots)) + + (defmethod read-object progn ((,objectvar ,name) ,streamvar) + (with-slots ,(new-class-all-slots slots superclasses) ,objectvar + ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots))) + + (defmethod write-object progn ((,objectvar ,name) ,streamvar) + (with-slots ,(new-class-all-slots slots superclasses) ,objectvar + ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots)))))) + + +File: pcl.info, Node: 24-11, Next: 24-12, Prev: 24-10, Up: Chapter 24 + +Tagged Structures +================= + +With the ability to define binary classes that extend other binary +classes, you're ready to define a new macro for defining classes to +represent "tagged" structures. The strategy for reading tagged +structures will be to define a specialized read-value method that knows +how to read the values that make up the start of the structure and then +use those values to determine what subclass to instantiate. It'll then +make an instance of that class with MAKE-INSTANCE, passing the already +read values as initargs, and pass the object to read-object, allowing +the actual class of the object to determine how the rest of the +structure is read. + + The new macro, define-tagged-binary-class, will look like +define-binary-class with the addition of a :dispatch option used to +specify a form that should evaluate to the name of a binary class. The +:dispatch form will be evaluated in a context where the names of the +slots defined by the tagged class are bound to variables that hold the +values read from the file. The class whose name it returns must accept +initargs corresponding to the slot names defined by the tagged class. +This is easily ensured if the :dispatch form always evaluates to the +name of a class that subclasses the tagged class. + + For instance, supposing you have a function, find-frame-class, that +will map a string identifier to a binary class representing a particular +kind of ID3 frame, you might define a tagged binary class, id3-frame, +like this: + + (define-tagged-binary-class id3-frame () + ((id (iso-8859-1-string :length 3)) + (size u3)) + (:dispatch (find-frame-class id))) + + The expansion of a define-tagged-binary-class will contain a DEFCLASS +and a write-object method just like the expansion of +define-binary-class, but instead of a read-object method it'll contain a +read-value method that looks like this: + + (defmethod read-value ((type (eql 'id3-frame)) stream &key) + (let ((id (read-value 'iso-8859-1-string stream :length 3)) + (size (read-value 'u3 stream))) + (let ((object (make-instance (find-frame-class id) :id id :size size))) + (read-object object stream) + object))) + + Since the expansions of define-tagged-binary-class and +define-binary-class are going to be identical except for the read +method, you can factor out the common bits into a helper macro, +define-generic-binary-class, that accepts the read method as a parameter +and interpolates it. + + (defmacro define-generic-binary-class (name (&rest superclasses) slots read-method) + (with-gensyms (objectvar streamvar) + `(progn + (eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ',name 'slots) ',(mapcar #'first slots)) + (setf (get ',name 'superclasses) ',superclasses)) + + (defclass ,name ,superclasses + ,(mapcar #'slot->defclass-slot slots)) + + ,read-method + + (defmethod write-object progn ((,objectvar ,name) ,streamvar) + (declare (ignorable ,streamvar)) + (with-slots ,(new-class-all-slots slots superclasses) ,objectvar + ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots)))))) + + Now you can define both define-binary-class and +define-tagged-binary-class to expand into a call to +define-generic-binary-class. Here's a new version of +define-binary-class that generates the same code as the earlier version +when it's fully expanded: + + (defmacro define-binary-class (name (&rest superclasses) slots) + (with-gensyms (objectvar streamvar) + `(define-generic-binary-class ,name ,superclasses ,slots + (defmethod read-object progn ((,objectvar ,name) ,streamvar) + (declare (ignorable ,streamvar)) + (with-slots ,(new-class-all-slots slots superclasses) ,objectvar + ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots)))))) + + And here's define-tagged-binary-class along with two new helper +functions it uses: + + (defmacro define-tagged-binary-class (name (&rest superclasses) slots &rest options) + (with-gensyms (typevar objectvar streamvar) + `(define-generic-binary-class ,name ,superclasses ,slots + (defmethod read-value ((,typevar (eql ',name)) ,streamvar &key) + (let* ,(mapcar #'(lambda (x) (slot->binding x streamvar)) slots) + (let ((,objectvar + (make-instance + ,@(or (cdr (assoc :dispatch options)) + (error "Must supply :dispatch form.")) + ,@(mapcan #'slot->keyword-arg slots)))) + (read-object ,objectvar ,streamvar) + ,objectvar)))))) + + (defun slot->binding (spec stream) + (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec) + `(,name (read-value ',type ,stream ,@args)))) + + (defun slot->keyword-arg (spec) + (let ((name (first spec))) + `(,(as-keyword name) ,name))) + + +File: pcl.info, Node: 24-12, Next: 24-13, Prev: 24-11, Up: Chapter 24 + +Primitive Binary Types +====================== + +While define-binary-class and define-tagged-binary-class make it easy to +define composite structures, you still have to write read-value and +write-value methods for primitive data types by hand. You could decide +to live with that, specifying that users of the library need to write +appropriate methods on read-value and write-value to support the +primitive types used by their binary classes. + + However, rather than having to document how to write a suitable +read-value/write-value pair, you can provide a macro to do it +automatically. This also has the advantage of making the abstraction +created by define-binary-class less leaky. Currently, +define-binary-class depends on having methods on read-value and +write-value defined in a particular way, but that's really just an +implementation detail. By defining a macro that generates the +read-value and write-value methods for primitive types, you hide those +details behind an abstraction you control. If you decide later to +change the implementation of define-binary-class, you can change your +primitive-type-defining macro to meet the new requirements without +requiring any changes to code that uses the binary data library. + + So you should define one last macro, define-binary-type, that will +generate read-value and write-value methods for reading values +represented by instances of existing classes, rather than by classes +defined with define-binary-class. + + For a concrete example, consider a type used in the id3-tag class, a +fixed-length string encoded in ISO-8859-1 characters. I'll assume, as I +did earlier, that the native character encoding of your Lisp is +ISO-8859-1 or a superset, so you can use CODE-CHAR and CHAR-CODE to +translate bytes to characters and back. + + As always, your goal is to write a macro that allows you to express +only the essential information needed to generate the required code. In +this case, there are four pieces of essential information: the name of +the type, iso-8859-1-string; the &key parameters that should be accepted +by the read-value and write-value methods, length in this case; the code +for reading from a stream; and the code for writing to a stream. Here's +an expression that contains those four pieces of information: + + (define-binary-type iso-8859-1-string (length) + (:reader (in) + (let ((string (make-string length))) + (dotimes (i length) + (setf (char string i) (code-char (read-byte in)))) + string)) + (:writer (out string) + (dotimes (i length) + (write-byte (char-code (char string i)) out)))) + + Now you just need a macro that can take apart this form and put it +back together in the form of two DEFMETHODs wrapped in a PROGN. If you +define the parameter list to define-binary-type like this: + + (defmacro define-binary-type (name (&rest args) &body spec) ... + + then within the macro the parameter spec will be a list containing +the reader and writer definitions. You can then use ASSOC to extract +the elements of spec using the tags :reader and :writer and then use +DESTRUCTURING-BIND to take apart the REST of each element. (1) + + From there it's just a matter of interpolating the extracted values +into the backquoted templates of the read-value and write-value methods. + + (defmacro define-binary-type (name (&rest args) &body spec) + (with-gensyms (type) + `(progn + ,(destructuring-bind ((in) &body body) (rest (assoc :reader spec)) + `(defmethod read-value ((,type (eql ',name)) ,in &key ,@args) + ,@body)) + ,(destructuring-bind ((out value) &body body) (rest (assoc :writer spec)) + `(defmethod write-value ((,type (eql ',name)) ,out ,value &key ,@args) + ,@body))))) + + Note how the backquoted templates are nested: the outermost template +starts with the backquoted PROGN form. That template consists of the +symbol PROGN and two comma-unquoted DESTRUCTURING-BIND expressions. +Thus, the outer template is filled in by evaluating the +DESTRUCTURING-BIND expressions and interpolating their values. Each +DESTRUCTURING-BIND expression in turn contains another backquoted +template, which is used to generate one of the method definitions to be +interpolated in the outer template. + + With this macro defined, the define-binary-type form given previously +expands to this code: + + (progn + (defmethod read-value ((#:g1618 (eql 'iso-8859-1-string)) in &key length) + (let ((string (make-string length))) + (dotimes (i length) + (setf (char string i) (code-char (read-byte in)))) + string)) + (defmethod write-value ((#:g1618 (eql 'iso-8859-1-string)) out string &key length) + (dotimes (i length) + (write-byte (char-code (char string i)) out)))) + + Of course, now that you've got this nice macro for defining binary +types, it's tempting to make it do a bit more work. For now you should +just make one small enhancement that will turn out to be pretty handy +when you start using this library to deal with actual formats such as +ID3 tags. + + ID3 tags, like many other binary formats, use lots of primitive types +that are minor variations on a theme, such as unsigned integers in one-, +two-, three-, and four-byte varieties. You could certainly define each +of those types with define-binary-type as it stands. Or you could +factor out the common algorithm for reading and writing n-byte unsigned +integers into helper functions. + + But suppose you had already defined a binary type, unsigned-integer, +that accepts a :bytes parameter to specify how many bytes to read and +write. Using that type, you could specify a slot representing a +one-byte unsigned integer with a type specifier of (unsigned-integer +:bytes 1). But if a particular binary format specifies lots of slots of +that type, it'd be nice to be able to easily define a new type-say, +u1-that means the same thing. As it turns out, it's easy to change +define-binary-type to support two forms, a long form consisting of a +:reader and :writer pair and a short form that defines a new binary type +in terms of an existing type. Using a short form define-binary-type, +you can define u1 like this: + + (define-binary-type u1 () (unsigned-integer :bytes 1)) + + which will expand to this: + + (progn + (defmethod read-value ((#:g161887 (eql 'u1)) #:g161888 &key) + (read-value 'unsigned-integer #:g161888 :bytes 1)) + (defmethod write-value ((#:g161887 (eql 'u1)) #:g161888 #:g161889 &key) + (write-value 'unsigned-integer #:g161888 #:g161889 :bytes 1))) + + To support both long- and short-form define-binary-type calls, you +need to differentiate based on the value of the spec argument. If spec +is two items long, it represents a long-form call, and the two items +should be the :reader and :writer specifications, which you extract as +before. On the other hand, if it's only one item long, the one item +should be a type specifier, which needs to be parsed differently. You +can use ECASE to switch on the LENGTH of spec and then parse spec and +generate an appropriate expansion for either the long form or the short +form. + + (defmacro define-binary-type (name (&rest args) &body spec) + (ecase (length spec) + (1 + (with-gensyms (type stream value) + (destructuring-bind (derived-from &rest derived-args) (mklist (first spec)) + `(progn + (defmethod read-value ((,type (eql ',name)) ,stream &key ,@args) + (read-value ',derived-from ,stream ,@derived-args)) + (defmethod write-value ((,type (eql ',name)) ,stream ,value &key ,@args) + (write-value ',derived-from ,stream ,value ,@derived-args)))))) + (2 + (with-gensyms (type) + `(progn + ,(destructuring-bind ((in) &body body) (rest (assoc :reader spec)) + `(defmethod read-value ((,type (eql ',name)) ,in &key ,@args) + ,@body)) + ,(destructuring-bind ((out value) &body body) (rest (assoc :writer spec)) + `(defmethod write-value ((,type (eql ',name)) ,out ,value &key ,@args) + ,@body))))))) + + ---------- Footnotes ---------- + + (1) Using ASSOC to extract the :reader and :writer elements of spec +allows users of define-binary-type to include the elements in either +order; if you required the :reader element to be always be first, you +could then have used (rest (first spec)) to extract the reader and (rest +(second spec)) to extract the writer. However, as long as you require +the :reader and :writer keywords to improve the readability of +define-binary-type forms, you might as well use them to extract the +correct data. + + +File: pcl.info, Node: 24-13, Next: Chapter 25, Prev: 24-12, Up: Chapter 24 + +The Current Object Stack +======================== + +One last bit of functionality you'll need in the next chapter is a way +to get at the binary object being read or written while reading and +writing. More generally, when reading or writing nested composite +objects, it's useful to be able to get at any of the objects currently +being read or written. Thanks to dynamic variables and :around methods, +you can add this enhancement with about a dozen lines of code. To +start, you should define a dynamic variable that will hold a stack of +objects currently being read or written. + + (defvar *in-progress-objects* nil) + + Then you can define :around methods on read-object and write-object +that push the object being read or written onto this variable before +invoking CALL-NEXT-METHOD. + + (defmethod read-object :around (object stream) + (declare (ignore stream)) + (let ((*in-progress-objects* (cons object *in-progress-objects*))) + (call-next-method))) + + (defmethod write-object :around (object stream) + (declare (ignore stream)) + (let ((*in-progress-objects* (cons object *in-progress-objects*))) + (call-next-method))) + + Note how you rebind *in-progress-objects* to a list with a new item +on the front rather than assigning it a new value. This way, at the end +of the LET, after CALL-NEXT-METHOD returns, the old value of +*in-progress-objects* will be restored, effectively popping the object +of the stack. + + With those two methods defined, you can provide two convenience +functions for getting at specific objects in the in-progress stack. The +function current-binary-object will return the head of the stack, the +object whose read-object or write-object method was invoked most +recently. The other, parent-of-type, takes an argument that should be +the name of a binary object class and returns the most recently pushed +object of that type, using the TYPEP function that tests whether a given +object is an instance of a particular type. + + (defun current-binary-object () (first *in-progress-objects*)) + + (defun parent-of-type (type) + (find-if #'(lambda (x) (typep x type)) *in-progress-objects*)) + + These two functions can be used in any code that will be called +within the dynamic extent of a read-object or write-object call. You'll +see one example of how current-binary-object can be used in the next +chapter. (1) + + Now you have all the tools you need to tackle an ID3 parsing library, +so you're ready to move onto the next chapter where you'll do just that. + + ---------- Footnotes ---------- + + (1) The ID3 format doesn't require the parent-of-type function since +it's a relatively flat structure. This function comes into its own when +you need to parse a format made up of many deeply nested structures +whose parsing depends on information stored in higher-level structures. +For example, in the Java class file format, the top-level class file +structure contains a constant pool that maps numeric values used in +other substructures within the class file to constant values that are +needed while parsing those substructures. If you were writing a class +file parser, you could use parent-of-type in the code that reads and +writes those substructures to get at the top-level class file object and +from there to the constant pool. + + +File: pcl.info, Node: Chapter 25, Next: Chapter 26, Prev: Chapter 24, Up: Top + +25. Practical: An ID3 Parser +============================ + +With a library for parsing binary data, you're ready to write some code +for reading and writing an actual binary format, that of ID3 tags. ID3 +tags are used to embed metadata in MP3 audio files. Dealing with ID3 +tags will be a good test of the binary data library because the ID3 +format is a true real-world format-a mix of engineering trade-offs and +idiosyncratic design choices that does, whatever else might be said +about it, get the job done. In case you missed the file-sharing +revolution, here's a quick overview of what ID3 tags are and how they +relate to MP3 files. + + MP3, also known as MPEG Audio Layer 3, is a format for storing +compressed audio data, designed by researchers at Fraunhofer IIS and +standardized by the Moving Picture Experts Group, a joint committee of +the International Organization for Standardization (ISO) and the +International Electrotechnical Commission (IEC). However, the MP3 +format, by itself, defines only how to store audio data. That's fine as +long as all your MP3 files are managed by a single application that can +store metadata externally and keep track of which metadata goes with +which files. However, when people started passing around individual MP3 +files on the Internet, via file-sharing systems such as Napster, they +soon discovered they needed a way to embed metadata in the MP3 files +themselves. + + Because the MP3 standard was already codified and a fair bit of +software and hardware had already been written that knew how to decode +the existing MP3 format, any scheme for embedding information in an MP3 +file would have to be invisible to MP3 decoders. Enter ID3. + + The original ID3 format, invented by programmer Eric Kemp, consisted +of 128 bytes stuck on the end of an MP3 file where it'd be ignored by +most MP3 software. It consisted of four 30-character fields, one each +for the song title, the album title, the artist name, and a comment; a +four-byte year field; and a one-byte genre code. Kemp provided standard +meanings for the first 80 genre codes. Nullsoft, the makers of Winamp, +a popular MP3 player, later supplemented this list with another 60 or so +genres. + + This format was easy to parse but obviously quite limited. It had no +way to encode names longer than 30 characters; it was limited to 256 +genres, and the meaning of the genre codes had to be agreed upon by all +users of ID3-aware software. There wasn't even a way to encode the CD +track number of a particular MP3 file until another programmer, Michael +Mutschler, proposed embedding the track number in the comment field, +separated from the rest of the comment by a null byte, so existing ID3 +software, which tended to read up to the first null in each of the text +fields, would ignore it. Kemp's version is now called ID3v1, and +Mutschler's is ID3v1.1. + + Limited as they were, the version 1 proposals were at least a partial +solution to the metadata problem, so they were adopted by many MP3 +ripping programs (which had to put the ID3 tag into the MP3 files) and +MP3 players (which would extract the information in the ID3 tag to +display to the user). (1) + + By 1998, however, the limitations were really becoming annoying, and +a new group, led by Martin Nilsson, started work on a completely new +tagging scheme, which came to be called ID3v2. The ID3v2 format is +extremely flexible, allowing for many kinds of information to be +included, with almost no length limitations. It also takes advantage of +certain details of the MP3 format to allow ID3v2 tags to be placed at +the beginning of an MP3 file. + + ID3v2 tags are, however, more of a challenge to parse than version 1 +tags. In this chapter, you'll use the binary data parsing library from +the previous chapter to develop code that can read and write ID3v2 tags. +Or at least you'll make a reasonable start-where ID3v1 was too simple, +ID3v2 is baroque to the point of being completely overengineered. +Implementing every nook and cranny of the specification, especially if +you want to support all three versions that have been specified, would +be a fair bit of work. However, you can ignore many of the features in +those specifications since they're rarely used "in the wild." For +starters, you can ignore, for now, a whole version, 2.4, since it has +not been widely adopted and mostly just adds more needless flexibility +compared to version 2.3. I'll focus on versions 2.2 and 2.3 because +they're both widely used and are different enough from each other to +keep things interesting. + +* Menu: + +* 25-1:: Structure of an ID3v2 Tag +* 25-2:: Defining a Package +* 25-3:: Integer Types +* 25-4:: String Types +* 25-5:: ID3 Tag Header +* 25-6:: ID3 Frames +* 25-7:: Detecting Tag Padding +* 25-8:: Supporting Multiple Versions of ID3 +* 25-9:: Versioned Frame Base Classes +* 25-10:: Versioned Concrete Frame Classes +* 25-11:: What Frames Do You Actually Need? +* 25-12:: Text Information Frames +* 25-13:: Comment Frames +* 25-14:: Extracting Information from an ID3 Tag + + ---------- Footnotes ---------- + + (1) Ripping is the process by which a song on an audio CD is +converted to an MP3 file on your hard drive. These days most ripping +software also automatically retrieves information about the songs being +ripped from online databases such as Gracenote (née the Compact Disc +Database [CDDB]) or FreeDB, which it then embeds in the MP3 files as ID3 +tags. + + +File: pcl.info, Node: 25-1, Next: 25-2, Prev: Chapter 25, Up: Chapter 25 + +Structure of an ID3v2 Tag +========================= + +Before you can start cutting code, you'll need to be familiar with the +overall structure of an ID3v2 tag. A tag starts with a header +containing information about the tag as a whole. The first three bytes +of the header encode the string "ID3" in ISO-8859-1 characters. In +other words, they're the bytes 73, 68, and 51. Then comes two bytes +that encode the major version and revision of the ID3 specification to +which the tag purports to conform. They're followed by a single byte +whose individual bits are treated as flags. The meanings of the +individual flags depend on the version of the spec. Some of the flags +can affect the way the rest of the tag is parsed. The "major version" +is actually used to record the minor version of the spec, while the +"revision" is the subminor version of the spec. Thus, the "major +version" field for a tag conforming to the 2.3.0 spec is 3. The +revision field is always zero since each new ID3v2 spec has bumped the +minor version, leaving the subminor version at zero. The value stored +in the major version field of the tag has, as you'll see, a dramatic +effect on how you'll parse the rest of the tag. + + The last field in the tag header is an integer, encoded in four bytes +but using only seven bits from each byte, that gives the total size of +the tag, not counting the header. In version 2.3 tags, the header may +be followed by several extended header fields; otherwise, the remainder +of the tag data is divided into frames. Different types of frames store +different kinds of information, from simple textual information, such as +the song name, to embedded images. Each frame starts with a header +containing a string identifier and a size. In version 2.3, the frame +header also contains two bytes worth of flags and, depending on the +value of one the flags, an optional one-byte code indicating how the +rest of the frame is encrypted. + + Frames are a perfect example of a tagged data structure-to know how +to parse the body of a frame, you need to read the header and use the +identifier to determine what kind of frame you're reading. + + The ID3 tag header contains no direct indication of how many frames +are in a tag-the tag header tells you how big the tag is, but since many +frames are variable length, the only way to find out how many frames the +tag contains is to read the frame data. Also, the size given in the tag +header may be larger than the actual number of bytes of frame data; the +frames may be followed with enough null bytes to pad the tag out to the +specified size. This makes it possible for tag editors to modify a tag +without having to rewrite the whole MP3 file. (1) + + So, the main issues you have to deal with are reading the ID3 header; +determining whether you're reading a version 2.2 or 2.3 tag; and reading +the frame data, stopping either when you've read the complete tag or +when you've hit the padding bytes. + + ---------- Footnotes ---------- + + (1) Almost all file systems provide the ability to overwrite existing +bytes of a file, but few, if any, provide a way to add or remove data at +the beginning or middle of a file without having to rewrite the rest of +the file. Since ID3 tags are typically stored at the beginning of a +file, to rewrite an ID3 tag without disturbing the rest of the file you +must replace the old tag with a new tag of exactly the same length. By +writing ID3 tags with a certain amount of padding, you have a better +chance of being able to do so-if the new tag has more data than the +original tag, you use less padding, and if it's shorter, you use more. + + +File: pcl.info, Node: 25-2, Next: 25-3, Prev: 25-1, Up: Chapter 25 + +Defining a Package +================== + +Like the other libraries you've developed so far, the code you'll write +in this chapter is worth putting in its own package. You'll need to +refer to functions from both the binary data and pathname libraries +developed in Chapters 24 and 15 and will also want to export the names +of the functions that make up the public API to this package. The +following package definition does all that: + + (defpackage :com.gigamonkeys.id3v2 + (:use :common-lisp + :com.gigamonkeys.binary-data + :com.gigamonkeys.pathnames) + (:export + :read-id3 + :mp3-p + :id3-p + :album + :composer + :genre + :encoding-program + :artist + :part-of-set + :track + :song + :year + :size + :translated-genre)) + + As usual, you can, and probably should, change the com.gigamonkeys +part of the package name to your own domain. + + +File: pcl.info, Node: 25-3, Next: 25-4, Prev: 25-2, Up: Chapter 25 + +Integer Types +============= + +You can start by defining binary types for reading and writing several +of the primitive types used by the ID3 format, various sizes of unsigned +integers, and four kinds of strings. + + ID3 uses unsigned integers encoded in one, two, three, and four +bytes. If you first write a general unsigned-integer binary type that +takes the number of bytes to read as an argument, you can then use the +short form of define-binary-type to define the specific types. The +general unsigned-integer type looks like this: + + (define-binary-type unsigned-integer (bytes) + (:reader (in) + (loop with value = 0 + for low-bit downfrom (* 8 (1- bytes)) to 0 by 8 do + (setf (ldb (byte 8 low-bit) value) (read-byte in)) + finally (return value))) + (:writer (out value) + (loop for low-bit downfrom (* 8 (1- bytes)) to 0 by 8 + do (write-byte (ldb (byte 8 low-bit) value) out)))) + + Now you can use the short form of define-binary-type to define one +type for each size of integer used in the ID3 format like this: + + (define-binary-type u1 () (unsigned-integer :bytes 1)) + (define-binary-type u2 () (unsigned-integer :bytes 2)) + (define-binary-type u3 () (unsigned-integer :bytes 3)) + (define-binary-type u4 () (unsigned-integer :bytes 4)) + + Another type you'll need to be able to read and write is the 28-bit +value used in the header. This size is encoded using 28 bits rather +than a multiple of 8, such as 32 bits, because an ID3 tag can't contain +the byte #xff followed by a byte with the top 3 bits on because that +pattern has a special meaning to MP3 decoders. None of the other fields +in the ID3 header could possibly contain such a byte sequence, but if +you encoded the tag size as a regular unsigned-integer, it might. To +avoid that possibility, the size is encoded using only the bottom seven +bits of each byte, with the top bit always zero. (1) + + Thus, it can be read and written a lot like an unsigned-integer +except the size of the byte specifier you pass to LDB should be seven +rather than eight. This similarity suggests that if you add a +parameter, bits-per-byte, to the existing unsigned-integer binary type, +you could then define a new type, id3-tag-size, using a short-form +define-binary-type. The new version of unsigned-integer is just like +the old version except with bits-per-byte used everywhere the old +version hardwired the number eight. It looks like this: + + (define-binary-type unsigned-integer (bytes bits-per-byte) + (:reader (in) + (loop with value = 0 + for low-bit downfrom (* bits-per-byte (1- bytes)) to 0 by bits-per-byte do + (setf (ldb (byte bits-per-byte low-bit) value) (read-byte in)) + finally (return value))) + (:writer (out value) + (loop for low-bit downfrom (* bits-per-byte (1- bytes)) to 0 by bits-per-byte + do (write-byte (ldb (byte bits-per-byte low-bit) value) out)))) + + The definition of id3-tag-size is then trivial. + + (define-binary-type id3-tag-size () (unsigned-integer :bytes 4 :bits-per-byte 7)) + + You'll also have to change the definitions of u1 through u4 to +specify eight bits per byte like this: + + (define-binary-type u1 () (unsigned-integer :bytes 1 :bits-per-byte 8)) + (define-binary-type u2 () (unsigned-integer :bytes 2 :bits-per-byte 8)) + (define-binary-type u3 () (unsigned-integer :bytes 3 :bits-per-byte 8)) + (define-binary-type u4 () (unsigned-integer :bytes 4 :bits-per-byte 8)) + + ---------- Footnotes ---------- + + (1) The frame data following the ID3 header could also potentially +contain the illegal sequence. That's prevented using a different scheme +that's turned on via one of the flags in the tag header. The code in +this chapter doesn't account for the possibility that this flag might be +set; in practice it's rarely used. + + +File: pcl.info, Node: 25-4, Next: 25-5, Prev: 25-3, Up: Chapter 25 + +String Types +============ + +The other kinds of primitive types that are ubiquitous in the ID3 format +are strings. In the previous chapter I discussed some of the issues you +have to consider when dealing with strings in binary files, such as the +difference between character codes and character encodings. + + ID3 uses two different character codes, ISO 8859-1 and Unicode. ISO +8859-1, also known as Latin-1, is an eight-bit character code that +extends ASCII with characters used by the languages of Western Europe. +In other words, the code points from 0-127 map to the same characters in +ASCII and ISO 8859-1, but ISO 8859-1 also provides mappings for code +points up to 255. Unicode is a character code designed to provide a +code point for virtually every character of all the world's languages. +Unicode is a superset of ISO 8859-1 in the same way that ISO 8859-1 is a +superset of ASCII-the code points from 0-255 map to the same characters +in both ISO 8859-1 and Unicode. (Thus, Unicode is also a superset of +ASCII.) + + Since ISO 8859-1 is an eight-bit character code, it's encoded using +one byte per character. For Unicode strings, ID3 uses the UCS-2 +encoding with a leading byte order mark. (1) I'll discuss what a byte +order mark is in a moment. + + Reading and writing these two encodings isn't a problem-it's just a +question of reading and writing unsigned integers in various formats, +and you just finished writing the code to do that. The trick is how you +translate those numeric values to Lisp character objects. + + The Lisp implementation you're using probably uses either Unicode or +ISO 8859-1 as its internal character code. And since all the values +from 0-255 map to the same characters in both ISO 8859-1 and Unicode, +you can use Lisp's CODE-CHAR and CHAR-CODE functions to translate those +values in both character codes. However, if your Lisp supports only ISO +8859-1, then you'll be able to represent only the first 255 Unicode +characters as Lisp characters. In other words, in such a Lisp +implementation, if you try to process an ID3 tag that uses Unicode +strings and if any of those strings contain characters with code points +higher than 255, you'll get an error when you try to translate the code +point to a Lisp character. For now I'll assume either you're using a +Unicode-based Lisp or you won't process any files containing characters +outside the ISO 8859-1 range. + + The other issue with encoding strings is how to know how many bytes +to interpret as character data. ID3 uses two strategies I mentioned in +the previous chapter-some strings are terminated with a null character, +while other strings occur in positions where you can determine the +number of bytes to read, either because the string at that position is +always the same length or because the string is at the end of a +composite structure whose overall size you know. Note, however, that +the number of bytes isn't necessarily the same as the number of +characters in the string. + + Putting all these variations together, the ID3 format uses four ways +to read and write strings-two characters crossed with two ways of +delimiting the string data. + + Obviously, much of the logic of reading and writing strings will be +quite similar. So, you can start by defining two binary types, one for +reading strings of a specific length (in characters) and another for +reading terminated strings. Both types take advantage of that the type +argument to read-value and write-value is just another piece of data; +you can make the type of character to read a parameter of these types. +This is a technique you'll use quite a few times in this chapter. + + (define-binary-type generic-string (length character-type) + (:reader (in) + (let ((string (make-string length))) + (dotimes (i length) + (setf (char string i) (read-value character-type in))) + string)) + (:writer (out string) + (dotimes (i length) + (write-value character-type out (char string i))))) + + (define-binary-type generic-terminated-string (terminator character-type) + (:reader (in) + (with-output-to-string (s) + (loop for char = (read-value character-type in) + until (char= char terminator) do (write-char char s)))) + (:writer (out string) + (loop for char across string + do (write-value character-type out char) + finally (write-value character-type out terminator)))) + + With these types available, there's not much to reading ISO 8859-1 +strings. Because the character-type argument you pass to read-value and +write-value of a generic-string must be the name of a binary type, you +need to define an iso-8859-1-char binary type. This also gives you a +good place to put a bit of sanity checking on the code points of +characters you read and write. + + (define-binary-type iso-8859-1-char () + (:reader (in) + (let ((code (read-byte in))) + (or (code-char code) + (error "Character code ~d not supported" code)))) + (:writer (out char) + (let ((code (char-code char))) + (if (<= 0 code #xff) + (write-byte code out) + (error "Illegal character for iso-8859-1 encoding: character: ~c with code: ~d" char code))))) + + Now defining the ISO 8859-1 string types is trivial using the short +form of define-binary-type as follows: + + (define-binary-type iso-8859-1-string (length) + (generic-string :length length :character-type 'iso-8859-1-char)) + + (define-binary-type iso-8859-1-terminated-string (terminator) + (generic-terminated-string :terminator terminator :character-type 'iso-8859-1-char)) + + Reading UCS-2 strings is only slightly more complex. The complexity +arises because you can encode a UCS-2 code point in two ways: most +significant byte first (big-endian) or least significant byte first +(little-endian). UCS-2 strings therefore start with two extra bytes, +called the byte order mark, made up of the numeric value #xfeff encoded +in either big-endian form or little-endian form. When reading a UCS-2 +string, you read the byte order mark and then, depending on its value, +read either big-endian or little-endian characters. Thus, you'll need +two different UCS-2 character types. But you need only one version of +the sanity-checking code, so you can define a parameterized binary type +like this: + + (define-binary-type ucs-2-char (swap) + (:reader (in) + (let ((code (read-value 'u2 in))) + (when swap (setf code (swap-bytes code))) + (or (code-char code) (error "Character code ~d not supported" code)))) + (:writer (out char) + (let ((code (char-code char))) + (unless (<= 0 code #xffff) + (error "Illegal character for ucs-2 encoding: ~c with char-code: ~d" char code)) + (when swap (setf code (swap-bytes code))) + (write-value 'u2 out code)))) + + where the swap-bytes function can be defined as follows, taking +advantage of LDB being SETFable and thus ROTATEFable: + + (defun swap-bytes (code) + (assert (<= code #xffff)) + (rotatef (ldb (byte 8 0) code) (ldb (byte 8 8) code)) + code) + + Using ucs-2-char, you can define two character types that will be +used as the character-type arguments to the generic string functions. + + (define-binary-type ucs-2-char-big-endian () (ucs-2-char :swap nil)) + + (define-binary-type ucs-2-char-little-endian () (ucs-2-char :swap t)) + + Then you need a function that returns the name of the character type +to use based on the value of the byte order mark. + + (defun ucs-2-char-type (byte-order-mark) + (ecase byte-order-mark + (#xfeff 'ucs-2-char-big-endian) + (#xfffe 'ucs-2-char-little-endian))) + + Now you can define length- and terminator-delimited string types for +UCS-2-encoded strings that read the byte order mark and use it to +determine which variant of UCS-2 character to pass as the character-type +argument to read-value and write-value. The only other wrinkle is that +you need to translate the length argument, which is a number of bytes, +to the number of characters to read, accounting for the byte order mark. + + (define-binary-type ucs-2-string (length) + (:reader (in) + (let ((byte-order-mark (read-value 'u2 in)) + (characters (1- (/ length 2)))) + (read-value + 'generic-string in + :length characters + :character-type (ucs-2-char-type byte-order-mark)))) + (:writer (out string) + (write-value 'u2 out #xfeff) + (write-value + 'generic-string out string + :length (length string) + :character-type (ucs-2-char-type #xfeff)))) + + (define-binary-type ucs-2-terminated-string (terminator) + (:reader (in) + (let ((byte-order-mark (read-value 'u2 in))) + (read-value + 'generic-terminated-string in + :terminator terminator + :character-type (ucs-2-char-type byte-order-mark)))) + (:writer (out string) + (write-value 'u2 out #xfeff) + (write-value + 'generic-terminated-string out string + :terminator terminator + :character-type (ucs-2-char-type #xfeff)))) + + ---------- Footnotes ---------- + + (1) In ID3v2.4, UCS-2 is replaced by the virtually identical UTF-16, +and UTF-16BE and UTF-8 are added as additional encodings. + + +File: pcl.info, Node: 25-5, Next: 25-6, Prev: 25-4, Up: Chapter 25 + +ID3 Tag Header +============== + +With the basic primitive types done, you're ready to switch to a +high-level view and start defining binary classes to represent first the +ID3 tag as a whole and then the individual frames. + + If you turn first to the ID3v2.2 specification, you'll see that the +basic structure of the tag is this header: + + ID3/file identifier "ID3" + ID3 version $02 00 + ID3 flags %xx000000 + ID3 size 4 * %0xxxxxxx + + followed by frame data and padding. Since you've already defined +binary types to read and write all the fields in the header, defining a +class that can read the header of an ID3 tag is just a matter of putting +them together. + + (define-binary-class id3-tag () + ((identifier (iso-8859-1-string :length 3)) + (major-version u1) + (revision u1) + (flags u1) + (size id3-tag-size))) + + If you have some MP3 files lying around, you can test this much of +the code and also see what version of ID3 tags your MP3s contain. First +you can write a function that reads an id3-tag, as just defined, from +the beginning of a file. Be aware, however, that ID3 tags aren't +required to appear at the beginning of a file, though these days they +almost always do. To find an ID3 tag elsewhere in a file, you can scan +the file looking for the sequence of bytes 73, 68, 51 (in other words, +the string "ID3"). (1) For now you can probably get away with assuming +the tags are the first thing in the file. + + (defun read-id3 (file) + (with-open-file (in file :element-type '(unsigned-byte 8)) + (read-value 'id3-tag in))) + + On top of this function you can build a function that takes a +filename and prints the information in the tag header along with the +name of the file. + + (defun show-tag-header (file) + (with-slots (identifier major-version revision flags size) (read-id3 file) + (format t "~a ~d.~d ~8,'0b ~d bytes -- ~a~%" + identifier major-version revision flags size (enough-namestring file)))) + + It prints output that looks like this: + + ID3V2> (show-tag-header "/usr2/mp3/Kitka/Wintersongs/02 Byla Cesta.mp3") + ID3 2.0 00000000 2165 bytes -- Kitka/Wintersongs/02 Byla Cesta.mp3 + NIL + + Of course, to determine what versions of ID3 are most common in your +MP3 library, it'd be handier to have a function that returns a summary +of all the MP3 files under a given directory. You can write one easily +enough using the walk-directory function defined in Chapter 15. First +define a helper function that tests whether a given filename has an mp3 +extension. + + (defun mp3-p (file) + (and + (not (directory-pathname-p file)) + (string-equal "mp3" (pathname-type file)))) + + Then you can combine show-tag-header and mp3-p with walk-directory to +print a summary of the ID3 header in each file under a given directory. + + (defun show-tag-headers (dir) + (walk-directory dir #'show-tag-header :test #'mp3-p)) + + However, if you have a lot of MP3s, you may just want a count of how +many ID3 tags of each version you have in your MP3 collection. To get +that information, you might write a function like this: + + (defun count-versions (dir) + (let ((versions (mapcar #'(lambda (x) (cons x 0)) '(2 3 4)))) + (flet ((count-version (file) + (incf (cdr (assoc (major-version (read-id3 file)) versions))))) + (walk-directory dir #'count-version :test #'mp3-p)) + versions)) + + Another function you'll need in Chapter 29 is one that tests whether +a file actually starts with an ID3 tag, which you can define like this: + + (defun id3-p (file) + (with-open-file (in file :element-type '(unsigned-byte 8)) + (string= "ID3" (read-value 'iso-8859-1-string in :length 3)))) + + ---------- Footnotes ---------- + + (1) The 2.4 version of the ID3 format also supports placing a footer +at the end of a tag, which makes it easier to find a tag appended to the +end of a file. + + +File: pcl.info, Node: 25-6, Next: 25-7, Prev: 25-5, Up: Chapter 25 + +ID3 Frames +========== + +As I discussed earlier, the bulk of an ID3 tag is divided into frames. +Each frame has a structure similar to that of the tag as a whole. Each +frame starts with a header indicating what kind of frame it is and the +size of the frame in bytes. The structure of the frame header changed +slightly between version 2.2 and version 2.3 of the ID3 format, and +eventually you'll have to deal with both forms. To start, you can focus +on parsing version 2.2 frames. + + The header of a 2.2 frame consists of three bytes that encode a +three-character ISO 8859-1 string followed by a three-byte unsigned +integer, which specifies the size of the frame in bytes, excluding the +six-byte header. The string identifies what type of frame it is, which +determines how you parse the data following the size. This is exactly +the kind of situation for which you defined the +define-tagged-binary-class macro. You can define a tagged class that +reads the frame header and then dispatches to the appropriate concrete +class using a function that maps IDs to a class names. + + (define-tagged-binary-class id3-frame () + ((id (iso-8859-1-string :length 3)) + (size u3)) + (:dispatch (find-frame-class id))) + + Now you're ready to start implementing concrete frame classes. +However, the specification defines quite a few-63 in version 2.2 and +even more in later specs. Even considering frame types that share a +common structure to be equivalent, you'll still find 24 unique frame +types in version 2.2. But only a few of these are used "in the wild." +So rather than immediately setting to work defining classes for each of +the frame types, you can start by writing a generic frame class that +lets you read the frames in a tag without parsing the data within the +frames themselves. This will give you a way to find out what frames are +actually present in the MP3s you want to process. You'll need this +class eventually anyway because the specification allows for +experimental frames that you'll need to be able to read without parsing. + + Since the size field of the frame header tells you exactly how many +bytes long the frame is, you can define a generic-frame class that +extends id3-frame and adds a single field, data, that will hold an array +of bytes. + + (define-binary-class generic-frame (id3-frame) + ((data (raw-bytes :size size)))) + + The type of the data field, raw-bytes, just needs to hold an array of +bytes. You can define it like this: + + (define-binary-type raw-bytes (size) + (:reader (in) + (let ((buf (make-array size :element-type '(unsigned-byte 8)))) + (read-sequence buf in) + buf)) + (:writer (out buf) + (write-sequence buf out))) + + For the time being, you'll want all frames to be read as +generic-frames, so you can define the find-frame-class function used in +id3-frame's :dispatch expression to always return generic-frame, +regardless of the frame's id. + + (defun find-frame-class (id) + (declare (ignore id)) + 'generic-frame) + + Now you need to modify id3-tag so it'll read frames after the header +fields. There's only one tricky bit to reading the frame data: although +the tag header tells you how many bytes long the tag is, that number +includes the padding that can follow the frame data. Since the tag +header doesn't tell you how many frames the tag contains, the only way +to tell when you've hit the padding is to look for a null byte where +you'd expect a frame identifier. + + To handle this, you can define a binary type, id3-frames, that will +be responsible for reading the remainder of a tag, creating frame +objects to represent all the frames it finds, and then skipping over any +padding. This type will take as a parameter the tag size, which it can +use to avoid reading past the end of the tag. But the reading code will +also need to detect the beginning of the padding that can follow the +tag's frame data. Rather than calling read-value directly in id3-frames +:reader, you should use a function read-frame, which you'll define to +return NIL when it detects padding, otherwise returning an id3-frame +object read using read-value. Assuming you define read-frame so it +reads only one byte past the end of the last frame in order to detect +the start of the padding, you can define the id3-frames binary type like +this: + + (define-binary-type id3-frames (tag-size) + (:reader (in) + (loop with to-read = tag-size + while (plusp to-read) + for frame = (read-frame in) + while frame + do (decf to-read (+ 6 (size frame))) + collect frame + finally (loop repeat (1- to-read) do (read-byte in)))) + (:writer (out frames) + (loop with to-write = tag-size + for frame in frames + do (write-value 'id3-frame out frame) + (decf to-write (+ 6 (size frame))) + finally (loop repeat to-write do (write-byte 0 out))))) + + You can use this type to add a frames slot to id3-tag. + + (define-binary-class id3-tag () + ((identifier (iso-8859-1-string :length 3)) + (major-version u1) + (revision u1) + (flags u1) + (size id3-tag-size) + (frames (id3-frames :tag-size size)))) + + +File: pcl.info, Node: 25-7, Next: 25-8, Prev: 25-6, Up: Chapter 25 + +Detecting Tag Padding +===================== + +Now all that remains is to implement read-frame. This is a bit tricky +since the code that actually reads bytes from the stream is several +layers down from read-frame. + + What you'd really like to do in read-frame is read one byte and +return NIL if it's a null and otherwise read a frame with read-value. +Unfortunately, if you read the byte in read-frame, then it won't be +available to be read by read-value. (1) + + It turns out this is a perfect opportunity to use the condition +system-you can check for null bytes in the low-level code that reads +from the stream and signal a condition when you read a null; read-frame +can then handle the condition by unwinding the stack before more bytes +are read. In addition to turning out to be a tidy solution to the +problem of detecting the start of the tag's padding, this is also an +example of how you can use conditions for purposes other than handling +errors. + + You can start by defining a condition type to be signaled by the +low-level code and handled by the high-level code. This condition +doesn't need any slots-you just need a distinct class of condition so +you know no other code will be signaling or handling it. + + (define-condition in-padding () ()) + + Next you need to define a binary type whose :reader reads a given +number of bytes, first reading a single byte and signaling an in-padding +condition if the byte is null and otherwise reading the remaining bytes +as an iso-8859-1-string and combining it with the first byte read. + + (define-binary-type frame-id (length) + (:reader (in) + (let ((first-byte (read-byte in))) + (when (= first-byte 0) (signal 'in-padding)) + (let ((rest (read-value 'iso-8859-1-string in :length (1- length)))) + (concatenate + 'string (string (code-char first-byte)) rest)))) + (:writer (out id) + (write-value 'iso-8859-1-string out id :length length))) + + If you redefine id3-frame to make the type of its id slot frame-id +instead of iso-8859-1-string, the condition will be signaled whenever +id3-frame's read-value method reads a null byte instead of the beginning +of a frame. + + (define-tagged-binary-class id3-frame () + ((id (frame-id :length 3)) + (size u3)) + (:dispatch (find-frame-class id))) + + Now all read-frame has to do is wrap a call to read-value in a +HANDLER-CASE that handles the in-padding condition by returning NIL. + + (defun read-frame (in) + (handler-case (read-value 'id3-frame in) + (in-padding () nil))) + + With read-frame defined, you can now read a complete version 2.2 ID3 +tag, representing frames with instances of generic-frame. In the "What +Frames Do You Actually Need?" section, you'll do some experiments at +the REPL to determine what frame classes you need to implement. But +first let's add support for version 2.3 ID3 tags. + + ---------- Footnotes ---------- + + (1) Character streams support two functions, PEEK-CHAR and +UNREAD-CHAR, either of which would be a perfect solution to this +problem, but binary streams support no equivalent functions. + + +File: pcl.info, Node: 25-8, Next: 25-9, Prev: 25-7, Up: Chapter 25 + +Supporting Multiple Versions of ID3 +=================================== + +Currently, id3-tag is defined using define-binary-class, but if you want +to support multiple versions of ID3, it makes more sense to use a +define-tagged-binary-class that dispatches on the major-version value. +As it turns out, all versions of ID3v2 have the same structure up to the +size field. So, you can define a tagged binary class like the following +that defines this basic structure and then dispatches to the appropriate +version-specific subclass: + + (define-tagged-binary-class id3-tag () + ((identifier (iso-8859-1-string :length 3)) + (major-version u1) + (revision u1) + (flags u1) + (size id3-tag-size)) + (:dispatch + (ecase major-version + (2 'id3v2.2-tag) + (3 'id3v2.3-tag)))) + + Version 2.2 and version 2.3 tags differ in two ways. First, the +header of a version 2.3 tag may be extended with up to four optional +extended header fields, as determined by values in the flags field. +Second, the frame format changed between version 2.2 and version 2.3, +which means you'll have to use different classes to represent version +2.2 frames and the corresponding version 2.3 frames. + + Since the new id3-tag class is based on the one you originally wrote +to represent version 2.2 tags, it's not surprising that the new +id3v2.2-tag class is trivial, inheriting most of its slots from the new +id3-tag class and adding the one missing slot, frames. Because version +2.2 and version 2.3 tags use different frame formats, you'll have to +change the id3-frames type to be parameterized with the type of frame to +read. For now, assume you'll do that and add a :frame-type argument to +the id3-frames type descriptor like this: + + (define-binary-class id3v2.2-tag (id3-tag) + ((frames (id3-frames :tag-size size :frame-type 'id3v2.2-frame)))) + + The id3v2.3-tag class is slightly more complex because of the +optional fields. The first three of the four optional fields are +included when the sixth bit in flags is set. They're a four- byte +integer specifying the size of the extended header, two bytes worth of +flags, and another four-byte integer specifying how many bytes of +padding are included in the tag. (1) The fourth optional field, +included when the fifteenth bit of the extended header flags is set, is +a four-byte cyclic redundancy check (CRC) of the rest of the tag. + + The binary data library doesn't provide any special support for +optional fields in a binary class, but it turns out that regular +parameterized binary types are sufficient. You can define a type +parameterized with the name of a type and a value that indicates whether +a value of that type should actually be read or written. + + (define-binary-type optional (type if) + (:reader (in) + (when if (read-value type in))) + (:writer (out value) + (when if (write-value type out value)))) + + Using if as the parameter name looks a bit strange in that code, but +it makes the optional type descriptors quite readable. For instance, +here's the definition of id3v2.3-tag using optional slots: + + (define-binary-class id3v2.3-tag (id3-tag) + ((extended-header-size (optional :type 'u4 :if (extended-p flags))) + (extra-flags (optional :type 'u2 :if (extended-p flags))) + (padding-size (optional :type 'u4 :if (extended-p flags))) + (crc (optional :type 'u4 :if (crc-p flags extra-flags))) + (frames (id3-frames :tag-size size :frame-type 'id3v2.3-frame)))) + + where extended-p and crc-p are helper functions that test the +appropriate bit of the flags value they're passed. To test whether an +individual bit of an integer is set, you can use LOGBITP, another +bit-twiddling function. It takes an index and an integer and returns +true if the specified bit is set in the integer. + + (defun extended-p (flags) (logbitp 6 flags)) + + (defun crc-p (flags extra-flags) + (and (extended-p flags) (logbitp 15 extra-flags))) + + As in the version 2.2 tag class, the frames slot is defined to be of +type id3-frames, passing the name of the frame type as a parameter. You +do, however, need to make a few small changes to id3-frames and +read-frame to support the extra frame-type parameter. + + (define-binary-type id3-frames (tag-size frame-type) + (:reader (in) + (loop with to-read = tag-size + while (plusp to-read) + for frame = (read-frame frame-type in) + while frame + do (decf to-read (+ (frame-header-size frame) (size frame))) + collect frame + finally (loop repeat (1- to-read) do (read-byte in)))) + (:writer (out frames) + (loop with to-write = tag-size + for frame in frames + do (write-value frame-type out frame) + (decf to-write (+ (frame-header-size frame) (size frame))) + finally (loop repeat to-write do (write-byte 0 out))))) + + (defun read-frame (frame-type in) + (handler-case (read-value frame-type in) + (in-padding () nil))) + + The changes are in the calls to read-frame and write-value, where you +need to pass the frame-type argument and, in computing the size of the +frame, where you need to use a function frame-header-size instead of the +literal value 6 since the frame header changed size between version 2.2 +and version 2.3. Since the difference in the result of this function is +based on the class of the frame, it makes sense to define it as a +generic function like this: + + (defgeneric frame-header-size (frame)) + + You'll define the necessary methods on that generic function in the +next section after you define the new frame classes. + + ---------- Footnotes ---------- + + (1) If a tag had an extended header, you could use this value to +determine where the frame data should end. However, if the extended +header isn't used, you'd have to use the old algorithm anyway, so it's +not worth adding code to do it another way. + + +File: pcl.info, Node: 25-9, Next: 25-10, Prev: 25-8, Up: Chapter 25 + +Versioned Frame Base Classes +============================ + +Where before you defined a single base class for all frames, you'll now +have two classes, id3v2.2-frame and id3v2.3-frame. The id3v2.2-frame +class will be essentially the same as the original id3-frame class. + + (define-tagged-binary-class id3v2.2-frame () + ((id (frame-id :length 3)) + (size u3)) + (:dispatch (find-frame-class id))) + + The id3v2.3-frame, on the other hand, requires more changes. The +frame identifier and size fields were extended in version 2.3 from three +to four bytes each, and two bytes worth of flags were added. +Additionally, the frame, like the version 2.3 tag, can contain optional +fields, controlled by the values of three of the frame's flags. (1) +With those changes in mind, you can define the version 2.3 frame base +class, along with some helper functions, like this: + + (define-tagged-binary-class id3v2.3-frame () + ((id (frame-id :length 4)) + (size u4) + (flags u2) + (decompressed-size (optional :type 'u4 :if (frame-compressed-p flags))) + (encryption-scheme (optional :type 'u1 :if (frame-encrypted-p flags))) + (grouping-identity (optional :type 'u1 :if (frame-grouped-p flags)))) + (:dispatch (find-frame-class id))) + + (defun frame-compressed-p (flags) (logbitp 7 flags)) + + (defun frame-encrypted-p (flags) (logbitp 6 flags)) + + (defun frame-grouped-p (flags) (logbitp 5 flags)) + + With these two classes defined, you can now implement the methods on +the generic function frame-header-size. + + (defmethod frame-header-size ((frame id3v2.2-frame)) 6) + + (defmethod frame-header-size ((frame id3v2.3-frame)) 10) + + The optional fields in a version 2.3 frame aren't counted as part of +the header for this computation since they're already included in the +value of the frame's size. + + ---------- Footnotes ---------- + + (1) These flags, in addition to controlling whether the optional +fields are included, can affect the parsing of the rest of the tag. In +particular, if the seventh bit of the flags is set, then the actual +frame data is compressed using the zlib algorithm, and if the sixth bit +is set, the data is encrypted. In practice these options are rarely, if +ever, used, so you can get away with ignoring them for now. But that +would be an area you'd have to address to make this a production-quality +ID3 library. One simple half solution would be to change +find-frame-class to accept a second argument and pass it the flags; if +the frame is compressed or encrypted, you could instantiate a generic +frame to hold the data. + + +File: pcl.info, Node: 25-10, Next: 25-11, Prev: 25-9, Up: Chapter 25 + +Versioned Concrete Frame Classes +================================ + +In the original definition, generic-frame subclassed id3-frame. But now +id3-frame has been replaced with the two version-specific base classes, +id3v2.2-frame and id3v2.3-frame. So, you need to define two new +versions of generic-frame, one for each base class. One way to define +this classes would be like this: + + (define-binary-class generic-frame-v2.2 (id3v2.2-frame) + ((data (raw-bytes :size size)))) + + (define-binary-class generic-frame-v2.3 (id3v2.3-frame) + ((data (raw-bytes :size size)))) + + However, it's a bit annoying that these two classes are the same +except for their superclass. It's not too bad in this case since +there's only one additional field. But if you take this approach for +other concrete frame classes, ones that have a more complex internal +structure that's identical between the two ID3 versions, the duplication +will be more irksome. + + Another approach, and the one you should actually use, is to define a +class generic-frame as a mixin: a class intended to be used as a +superclass along with one of the version-specific base classes to +produce a concrete, version-specific frame class. The only tricky bit +about this approach is that if generic-frame doesn't extend either of +the frame base classes, then you can't refer to the size slot in its +definition. Instead, you must use the current-binary-object function I +discussed at the end of the previous chapter to access the object you're +in the midst of reading or writing and pass it to size. And you need to +account for the difference in the number of bytes of the total frame +size that will be left over, in the case of a version 2.3 frame, if any +of the optional fields are included in the frame. So, you should define +a generic function data-bytes with methods that do the right thing for +both version 2.2 and version 2.3 frames. + + (define-binary-class generic-frame () + ((data (raw-bytes :size (data-bytes (current-binary-object)))))) + + (defgeneric data-bytes (frame)) + + (defmethod data-bytes ((frame id3v2.2-frame)) + (size frame)) + + (defmethod data-bytes ((frame id3v2.3-frame)) + (let ((flags (flags frame))) + (- (size frame) + (if (frame-compressed-p flags) 4 0) + (if (frame-encrypted-p flags) 1 0) + (if (frame-grouped-p flags) 1 0)))) + + Then you can define concrete classes that extend one of the +version-specific base classes and generic-frame to define +version-specific generic frame classes. + + (define-binary-class generic-frame-v2.2 (id3v2.2-frame generic-frame) ()) + + (define-binary-class generic-frame-v2.3 (id3v2.3-frame generic-frame) ()) + + With these classes defined, you can redefine the find-frame-class +function to return the right versioned class based on the length of the +identifier. + + (defun find-frame-class (id) + (ecase (length id) + (3 'generic-frame-v2.2) + (4 'generic-frame-v2.3))) + + +File: pcl.info, Node: 25-11, Next: 25-12, Prev: 25-10, Up: Chapter 25 + +What Frames Do You Actually Need? +================================= + +With the ability to read both version 2.2 and version 2.3 tags using +generic frames, you're ready to start implementing classes to represent +the specific frames you care about. However, before you dive in, you +should take a breather and figure out what frames you actually care +about since, as I mentioned earlier, the ID3 spec specifies many frames +that are almost never used. Of course, what frames you care about +depends on what kinds of applications you're interested in writing. If +you're mostly interested in extracting information from existing ID3 +tags, then you need implement only the classes representing the frames +containing the information you care about. On the other hand, if you +want to write an ID3 tag editor, you may need to support all the frames. + + Rather than guessing which frames will be most useful, you can use +the code you've already written to poke around a bit at the REPL and see +what frames are actually used in your own MP3s. To start, you need an +instance of id3-tag, which you can get with the read-id3 function. + + ID3V2> (read-id3 "/usr2/mp3/Kitka/Wintersongs/02 Byla Cesta.mp3") + #<ID3V2.2-TAG #x727b2912> + + Since you'll want to play with this object a bit, you should save it +in a variable. + + ID3V2> (defparameter *id3* (read-id3 "/usr2/mp3/Kitka/Wintersongs/02 Byla Cesta.mp3")) + *ID3* + + Now you can see, for example, how many frames it has. + + ID3V2> (length (frames *id3*)) + 11 + + Not too many-let's take a look at what they are. + + ID3V2> (frames *id3*) + (#<GENERIC-FRAME-V2.2 #x72dabdda> #<GENERIC-FRAME-V2.2 #x72dabec2> + #<GENERIC-FRAME-V2.2 #x72dabfa2> #<GENERIC-FRAME-V2.2 #x72dac08a> + #<GENERIC-FRAME-V2.2 #x72dac16a> #<GENERIC-FRAME-V2.2 #x72dac24a> + #<GENERIC-FRAME-V2.2 #x72dac32a> #<GENERIC-FRAME-V2.2 #x72dac40a> + #<GENERIC-FRAME-V2.2 #x72dac4f2> #<GENERIC-FRAME-V2.2 #x72dac632> + #<GENERIC-FRAME-V2.2 #x72dac7b2>) + + Okay, that's not too informative. What you really want to know are +what kinds of frames are in there. In other words, you want to know the +ids of those frames, which you can get with a simple MAPCAR like this: + + ID3V2> (mapcar #'id (frames *id3*)) + ("TT2" "TP1" "TAL" "TRK" "TPA" "TYE" "TCO" "TEN" "COM" "COM" "COM") + + If you look up these identifiers in the ID3v2.2 spec, you'll discover +that all the frames with identifiers starting with T are text +information frames and have a similar structure. And COM is the +identifier for comment frames, which have a structure similar to that of +text information frames. The particular text information frames +identified here turn out to be the frames for representing the song +title, artist, album, track, part of set, year, genre, and encoding +program. + + Of course, this is just one MP3 file. Maybe other frames are used in +other files. It's easy enough to discover. First define a function +that combines the previous MAPCAR expression with a call to read-id3 and +wraps the whole thing in a DELETE-DUPLICATES to keep things tidy. +You'll have to use a :test argument of #'string= to DELETE-DUPLICATES to +specify that you want two elements considered the same if they're the +same string. + + (defun frame-types (file) + (delete-duplicates (mapcar #'id (frames (read-id3 file))) :test #'string=)) + + This should give the same answer except with only one of each +identifier when passed the same filename. + + ID3V2> (frame-types "/usr2/mp3/Kitka/Wintersongs/02 Byla Cesta.mp3") + ("TT2" "TP1" "TAL" "TRK" "TPA" "TYE" "TCO" "TEN" "COM") + + Then you can use Chapter 15's walk-directory function along with +mp3-p to find every MP3 file under a directory and combine the results +of calling frame-types on each file. Recall that NUNION is the +recycling version of the UNION function; since frame-types makes a new +list for each file, this is safe. + + (defun frame-types-in-dir (dir) + (let ((ids ())) + (flet ((collect (file) + (setf ids (nunion ids (frame-types file) :test #'string=)))) + (walk-directory dir #'collect :test #'mp3-p)) + ids)) + + Now pass it the name of a directory, and it'll tell you the set of +identifiers used in all the MP3 files under that directory. It may take +a few seconds depending how many MP3 files you have, but you'll probably +get something similar to this: + + ID3V2> (frame-types-in-dir "/usr2/mp3/") + ("TCON" "COMM" "TRCK" "TIT2" "TPE1" "TALB" "TCP" "TT2" "TP1" "TCM" + "TAL" "TRK" "TPA" "TYE" "TCO" "TEN" "COM") + + The four-letter identifiers are the version 2.3 equivalents of the +version 2.2 identifiers I discussed previously. Since the information +stored in those frames is exactly the information you'll need in Chapter +27, it makes sense to implement classes only for the frames actually +used, namely, text information and comment frames, which you'll do in +the next two sections. If you decide later that you want to support +other frame types, it's mostly a matter of translating the ID3 +specifications into the appropriate binary class definitions. + + +File: pcl.info, Node: 25-12, Next: 25-13, Prev: 25-11, Up: Chapter 25 + +Text Information Frames +======================= + +All text information frames consist of two fields: a single byte +indicating which string encoding is used in the frame and a string +encoded in the remaining bytes of the frame. If the encoding byte is +zero, the string is encoded in ISO 8859-1; if the encoding is one, the +string is a UCS-2 string. + + You've already defined binary types representing the four different +kinds of strings-two different encodings each with two different methods +of delimiting the string. However, define-binary-class provides no +direct facility for determining the type of value to read based on other +values in the object. Instead, you can define a binary type that you +pass the value of the encoding byte and that then reads or writes the +appropriate kind of string. + + As long as you're defining such a type, you can also define it to +take two parameters, :length and :terminator, and pick the right type of +string based on which argument is supplied. To implement this new type, +you must first define some helper functions. The first two return the +name of the appropriate string type based on the encoding byte. + + (defun non-terminated-type (encoding) + (ecase encoding + (0 'iso-8859-1-string) + (1 'ucs-2-string))) + + (defun terminated-type (encoding) + (ecase encoding + (0 'iso-8859-1-terminated-string) + (1 'ucs-2-terminated-string))) + + Then string-args uses the encoding byte, the length, and the +terminator to determine several of the arguments to be passed to +read-value and write-value by the :reader and :writer of +id3-encoded-string. One of the length and terminator arguments to +string-args should always be NIL. + + (defun string-args (encoding length terminator) + (cond + (length + (values (non-terminated-type encoding) :length length)) + (terminator + (values (terminated-type encoding) :terminator terminator)))) + + With those helpers, the definition of id3-encoded-string is simple. +One detail to note is that the keyword-either :length or +:terminator-used in the call to read-value and write-value is just +another piece of data returned by string-args. Although keywords in +arguments lists are almost always literal keywords, they don't have to +be. + + (define-binary-type id3-encoded-string (encoding length terminator) + (:reader (in) + (multiple-value-bind (type keyword arg) + (string-args encoding length terminator) + (read-value type in keyword arg))) + (:writer (out string) + (multiple-value-bind (type keyword arg) + (string-args encoding length terminator) + (write-value type out string keyword arg)))) + + Now you can define a text-info mixin class, much the way you defined +generic-frame earlier. + + (define-binary-class text-info-frame () + ((encoding u1) + (information (id3-encoded-string :encoding encoding :length (bytes-left 1))))) + + As when you defined generic-frame, you need access to the size of the +frame, in this case to compute the :length argument to pass to +id3-encoded-string. Because you'll need to do a similar computation in +the next class you define, you can go ahead and define a helper +function, bytes-left, that uses current-binary-object to get at the size +of the frame. + + (defun bytes-left (bytes-read) + (- (size (current-binary-object)) bytes-read)) + + Now, as you did with the generic-frame mixin, you can define two +version-specific concrete classes with a minimum of duplicated code. + + (define-binary-class text-info-frame-v2.2 (id3v2.2-frame text-info-frame) ()) + + (define-binary-class text-info-frame-v2.3 (id3v2.3-frame text-info-frame) ()) + + To wire these classes in, you need to modify find-frame-class to +return the appropriate class name when the ID indicates the frame is a +text information frame, namely, whenever the ID starts with T and isn't +TXX or TXXX. + + (defun find-frame-class (name) + (cond + ((and (char= (char name 0) #\T) + (not (member name '("TXX" "TXXX") :test #'string=))) + (ecase (length name) + (3 'text-info-frame-v2.2) + (4 'text-info-frame-v2.3))) + (t + (ecase (length name) + (3 'generic-frame-v2.2) + (4 'generic-frame-v2.3))))) + + +File: pcl.info, Node: 25-13, Next: 25-14, Prev: 25-12, Up: Chapter 25 + +Comment Frames +============== + +Another commonly used frame type is the comment frame, which is like a +text information frame with a few extra fields. Like a text information +frame, it starts with a single byte indicating the string encoding used +in the frame. That byte is followed by a three-character ISO 8859-1 +string (regardless of the value of the string encoding byte), which +indicates what language the comment is in using an ISO-639-2 code, for +example, "eng" for English or "jpn" for Japanese. That field is +followed by two strings encoded as indicated by the first byte. The +first is a null-terminated string containing a description of the +comment. The second, which takes up the remainder of the frame, is the +comment text itself. + + (define-binary-class comment-frame () + ((encoding u1) + (language (iso-8859-1-string :length 3)) + (description (id3-encoded-string :encoding encoding :terminator +null+)) + (text (id3-encoded-string + :encoding encoding + :length (bytes-left + (+ 1 ; encoding + 3 ; language + (encoded-string-length description encoding t))))))) + + As in the definition of the text-info mixin, you can use bytes-left +to compute the size of the final string. However, since the description +field is a variable-length string, the number of bytes read prior to the +start of text isn't a constant. To make matters worse, the number of +bytes used to encode description is dependent on the encoding. So, you +should define a helper function that returns the number of bytes used to +encode a string given the string, the encoding code, and a boolean +indicating whether the string is terminated with an extra character. + + (defun encoded-string-length (string encoding terminated) + (let ((characters (+ (length string) (if terminated 1 0)))) + (* characters (ecase encoding (0 1) (1 2))))) + + And, as before, you can define the concrete version-specific comment +frame classes and wire them into find-frame-class. + + (define-binary-class comment-frame-v2.2 (id3v2.2-frame comment-frame) ()) + + (define-binary-class comment-frame-v2.3 (id3v2.3-frame comment-frame) ()) + + (defun find-frame-class (name) + (cond + ((and (char= (char name 0) #\T) + (not (member name '("TXX" "TXXX") :test #'string=))) + (ecase (length name) + (3 'text-info-frame-v2.2) + (4 'text-info-frame-v2.3))) + ((string= name "COM") 'comment-frame-v2.2) + ((string= name "COMM") 'comment-frame-v2.3) + (t + (ecase (length name) + (3 'generic-frame-v2.2) + (4 'generic-frame-v2.3))))) + + +File: pcl.info, Node: 25-14, Next: Chapter 26, Prev: 25-13, Up: Chapter 25 + +Extracting Information from an ID3 Tag +====================================== + +Now that you have the basic ability to read and write ID3 tags, you have +a lot of directions you could take this code. If you want to develop a +complete ID3 tag editor, you'll need to implement specific classes for +all the frame types. You'd also need to define methods for manipulating +the tag and frame objects in a consistent way (for instance, if you +change the value of a string in a text-info-frame, you'll likely need to +adjust the size); as the code stands, there's nothing to make sure that +happens. (1) + + Or, if you just need to extract certain pieces of information about +an MP3 file from its ID3 tag-as you will when you develop a streaming +MP3 server in Chapters 27, 28, and 29-you'll need to write functions +that find the appropriate frames and extract the information you want. + + Finally, to make this production-quality code, you'd have to pore +over the ID3 specs and deal with the details I skipped over in the +interest of space. In particular, some of the flags in both the tag and +the frame can affect the way the contents of the tag or frame is read; +unless you write some code that does the right thing when those flags +are set, there may be ID3 tags that this code won't be able to parse +correctly. But the code from this chapter should be capable of parsing +nearly all the MP3s you actually encounter. + + For now you can finish with a few functions to extract individual +pieces of information from an id3-tag. You'll need these functions in +Chapter 27 and probably in other code that uses this library. They +belong in this library because they depend on details of the ID3 format +that the users of this library shouldn't have to worry about. + + To get, say, the name of the song of the MP3 from which an id3-tag +was extracted, you need to find the ID3 frame with a specific identifier +and then extract the information field. And some pieces of information, +such as the genre, can require further decoding. Luckily, all the +frames that contain the information you'll care about are text +information frames, so extracting a particular piece of information +mostly boils down to using the right identifier to look up the +appropriate frame. Of course, the ID3 authors decided to change all the +identifiers between ID3v2.2 and ID3v2.3, so you'll have to account for +that. + + Nothing too complex-you just need to figure out the right path to get +to the various pieces of information. This is a perfect bit of code to +develop interactively, much the way you figured out what frame classes +you needed to implement. To start, you need an id3-tag object to play +with. Assuming you have an MP3 laying around, you can use read-id3 like +this: + + ID3V2> (defparameter *id3* (read-id3 "Kitka/Wintersongs/02 Byla Cesta.mp3")) + *ID3* + ID3V2> *id3* + #<ID3V2.2-TAG #x73d04c1a> + + replacing Kitka/Wintersongs/02 Byla Cesta.mp3 with the filename of +your MP3. Once you have your id3-tag object, you can start poking +around. For instance, you can check out the list of frame objects with +the frames function. + + ID3V2> (frames *id3*) + (#<TEXT-INFO-FRAME-V2.2 #x73d04cca> + #<TEXT-INFO-FRAME-V2.2 #x73d04dba> + #<TEXT-INFO-FRAME-V2.2 #x73d04ea2> + #<TEXT-INFO-FRAME-V2.2 #x73d04f9a> + #<TEXT-INFO-FRAME-V2.2 #x73d05082> + #<TEXT-INFO-FRAME-V2.2 #x73d0516a> + #<TEXT-INFO-FRAME-V2.2 #x73d05252> + #<TEXT-INFO-FRAME-V2.2 #x73d0533a> + #<COMMENT-FRAME-V2.2 #x73d0543a> + #<COMMENT-FRAME-V2.2 #x73d05612> + #<COMMENT-FRAME-V2.2 #x73d0586a>) + + Now suppose you want to extract the song title. It's probably in one +of those frames, but to find it, you need to find the frame with the +"TT2" identifier. Well, you can check easily enough to see if the tag +contains such a frame by extracting all the identifiers like this: + + ID3V2> (mapcar #'id (frames *id3*)) + ("TT2" "TP1" "TAL" "TRK" "TPA" "TYE" "TCO" "TEN" "COM" "COM" "COM") + + There it is, the first frame. However, there's no guarantee it'll +always be the first frame, so you should probably look it up by +identifier rather than position. That's also straightforward using the +FIND function. + + ID3V2> (find "TT2" (frames *id3*) :test #'string= :key #'id) + #<TEXT-INFO-FRAME-V2.2 #x73d04cca> + + Now, to get at the actual information in the frame, do this: + + ID3V2> (information (find "TT2" (frames *id3*) :test #'string= :key #'id)) + "Byla Cesta^@" + + Whoops. That ^ is how Emacs prints a null character. In a maneuver +reminiscent of the kludge that turned ID3v1 into ID3v1.1, the +information slot of a text information frame, though not officially a +null-terminated string, can contain a null, and ID3 readers are supposed +to ignore any characters after the null. So, you need a function that +takes a string and returns the contents up to the first null character, +if any. That's easy enough using the +null+ constant from the binary +data library. + + (defun upto-null (string) + (subseq string 0 (position +null+ string))) + + Now you can get just the title. + + ID3V2> (upto-null (information (find "TT2" (frames *id3*) :test #'string= :key #'id))) + "Byla Cesta" + + You could just wrap that code in a function named song that takes an +id3-tag as an argument, and you'd be done. However, the only difference +between this code and the code you'll use to extract the other pieces of +information you'll need (such as the album name, the artist, and the +genre) is the identifier. So, it's better to split up the code a bit. +For starters, you can write a function that just finds a frame given an +id3-tag and an identifier like this: + + (defun find-frame (id3 id) + (find id (frames id3) :test #'string= :key #'id)) + + ID3V2> (find-frame *id3* "TT2") + #<TEXT-INFO-FRAME-V2.2 #x73d04cca> + + Then the other bit of code, the part that extracts the information +from a text-info-frame, can go in another function. + + (defun get-text-info (id3 id) + (let ((frame (find-frame id3 id))) + (when frame (upto-null (information frame))))) + + ID3V2> (get-text-info *id3* "TT2") + "Byla Cesta" + + Now the definition of song is just a matter of passing the right +identifier. + + (defun song (id3) (get-text-info id3 "TT2")) + + ID3V2> (song *id3*) + "Byla Cesta" + + However, this definition of song works only with version 2.2 tags +since the identifier changed from "TT2" to "TIT2" between version 2.2 +and version 2.3. And all the other tags changed too. Since the user of +this library shouldn't have to know about different versions of the ID3 +format to do something as simple as get the song title, you should +probably handle those details for them. A simple way is to change +find-frame to take not just a single identifier but a list of +identifiers like this: + + (defun find-frame (id3 ids) + (find-if #'(lambda (x) (find (id x) ids :test #'string=)) (frames id3))) + + Then change get-text-info slightly so it can take one or more +identifiers using a &rest parameter. + + (defun get-text-info (id3 &rest ids) + (let ((frame (find-frame id3 ids))) + (when frame (upto-null (information frame))))) + + Then the change needed to allow song to support both version 2.2 and +version 2.3 tags is just a matter of adding the version 2.3 identifier. + + (defun song (id3) (get-text-info id3 "TT2" "TIT2")) + + Then you just need to look up the appropriate version 2.2 and version +2.3 frame identifiers for any fields for which you want to provide an +accessor function. Here are the ones you'll need in Chapter 27: + + (defun album (id3) (get-text-info id3 "TAL" "TALB")) + + (defun artist (id3) (get-text-info id3 "TP1" "TPE1")) + + (defun track (id3) (get-text-info id3 "TRK" "TRCK")) + + (defun year (id3) (get-text-info id3 "TYE" "TYER" "TDRC")) + + (defun genre (id3) (get-text-info id3 "TCO" "TCON")) + + The last wrinkle is that the way the genre is stored in the TCO or +TCON frames isn't always human readable. Recall that in ID3v1, genres +were stored as a single byte that encoded a particular genre from a +fixed list. Unfortunately, those codes live on in ID3v2-if the text of +the genre frame is a number in parentheses, the number is supposed to be +interpreted as an ID3v1 genre code. But, again, users of this library +probably won't care about that ancient history. So, you should provide +a function that automatically translates the genre. The following +function uses the genre function just defined to extract the actual +genre text and then checks whether it starts with a left parenthesis, +decoding the version 1 genre code with a function you'll define in a +moment if it does: + + (defun translated-genre (id3) + (let ((genre (genre id3))) + (if (and genre (char= #\( (char genre 0))) + (translate-v1-genre genre) + genre))) + + Since a version 1 genre code is effectively just an index into an +array of standard names, the easiest way to implement translate-v1-genre +is to extract the number from the genre string and use it as an index +into an actual array. + + (defun translate-v1-genre (genre) + (aref *id3-v1-genres* (parse-integer genre :start 1 :junk-allowed t))) + + Then all you need to do is to define the array of names. The +following array of names includes the 80 official version 1 genres plus +the genres created by the authors of Winamp: + + (defparameter *id3-v1-genres* + #( + ;; These are the official ID3v1 genres. + "Blues" "Classic Rock" "Country" "Dance" "Disco" "Funk" "Grunge" + "Hip-Hop" "Jazz" "Metal" "New Age" "Oldies" "Other" "Pop" "R&B" "Rap" + "Reggae" "Rock" "Techno" "Industrial" "Alternative" "Ska" + "Death Metal" "Pranks" "Soundtrack" "Euro-Techno" "Ambient" + "Trip-Hop" "Vocal" "Jazz+Funk" "Fusion" "Trance" "Classical" + "Instrumental" "Acid" "House" "Game" "Sound Clip" "Gospel" "Noise" + "AlternRock" "Bass" "Soul" "Punk" "Space" "Meditative" + "Instrumental Pop" "Instrumental Rock" "Ethnic" "Gothic" "Darkwave" + "Techno-Industrial" "Electronic" "Pop-Folk" "Eurodance" "Dream" + "Southern Rock" "Comedy" "Cult" "Gangsta" "Top 40" "Christian Rap" + "Pop/Funk" "Jungle" "Native American" "Cabaret" "New Wave" + "Psychadelic" "Rave" "Showtunes" "Trailer" "Lo-Fi" "Tribal" + "Acid Punk" "Acid Jazz" "Polka" "Retro" "Musical" "Rock & Roll" + "Hard Rock" + + ;; These were made up by the authors of Winamp but backported into + ;; the ID3 spec. + "Folk" "Folk-Rock" "National Folk" "Swing" "Fast Fusion" + "Bebob" "Latin" "Revival" "Celtic" "Bluegrass" "Avantgarde" + "Gothic Rock" "Progressive Rock" "Psychedelic Rock" "Symphonic Rock" + "Slow Rock" "Big Band" "Chorus" "Easy Listening" "Acoustic" "Humour" + "Speech" "Chanson" "Opera" "Chamber Music" "Sonata" "Symphony" + "Booty Bass" "Primus" "Porn Groove" "Satire" "Slow Jam" "Club" + "Tango" "Samba" "Folklore" "Ballad" "Power Ballad" "Rhythmic Soul" + "Freestyle" "Duet" "Punk Rock" "Drum Solo" "A capella" "Euro-House" + "Dance Hall" + + ;; These were also invented by the Winamp folks but ignored by the + ;; ID3 authors. + "Goa" "Drum & Bass" "Club-House" "Hardcore" "Terror" "Indie" + "BritPop" "Negerpunk" "Polsk Punk" "Beat" "Christian Gangsta Rap" + "Heavy Metal" "Black Metal" "Crossover" "Contemporary Christian" + "Christian Rock" "Merengue" "Salsa" "Thrash Metal" "Anime" "Jpop" + "Synthpop")) + + Once again, it probably feels like you wrote a ton of code in this +chapter. But if you put it all in a file, or if you download the +version from this book's Web site, you'll see it's just not that many +lines-most of the pain of writing this library stems from having to +understand the intricacies of the ID3 format itself. Anyway, now you +have a major piece of what you'll turn into a streaming MP3 server in +Chapters 27, 28, and 29. The other major bit of infrastructure you'll +need is a way to write server-side Web software, the topic of the next +chapter. + + ---------- Footnotes ---------- + + (1) Ensuring that kind of interfield consistency would be a fine +application for :after methods on the accessor generic functions. For +instance, you could define this :after method to keep size in sync with +the information string: + + (defmethod (setf information) :after (value (frame text-info-frame)) + (declare (ignore value)) + (with-slots (encoding size information) frame + (setf size (encoded-string-length information encoding nil)))) + + +File: pcl.info, Node: Chapter 26, Next: Chapter 27, Prev: Chapter 25, Up: Top + +26. Practical: Web Programming with AllegroServe +================================================ + +In this chapter you'll look at one way to develop Web-based programs in +Common Lisp, using the open-source AllegroServe Web server. This isn't +meant as a full introduction to AllegroServe. And I'm certainly not +going to cover anything more than a tiny corner of the larger topic of +Web programming. My goal here is to cover enough of the basics of using +AllegroServe that you'll be able, in Chapter 29, to develop an +application for browsing a library of MP3 files and streaming them to an +MP3 client. Similarly, this chapter will serve as a brief introduction +to Web programming for folks new to the topic. + +* Menu: + +* 26-1:: A 30-Second Intro to Server-Side Web Programming +* 26-2:: AllegroServe +* 26-3:: Generating Dynamic Content with AllegroServe +* 26-4:: Generating HTML +* 26-5:: HTML Macros +* 26-6:: Query Parameters +* 26-7:: Cookies +* 26-8:: A Small Application Framework +* 26-9:: The Implementation + + +File: pcl.info, Node: 26-1, Next: 26-2, Prev: Chapter 26, Up: Chapter 26 + +A 30-Second Intro to Server-Side Web Programming +================================================ + +While Web programming today typically involves quite a number of +software frameworks and different protocols, the core bits of Web +programming haven't changed much since they were invented in the early +1990s. For simple applications, such as the one you'll write in Chapter +29, you need to understand only a few key concepts, so I'll review them +quickly here. Experienced Web programmers can skim or skip the rest of +this section. (1) + + To start, you need to understand the roles the Web browser and the +Web server play in Web programming. While a modern browser comes with a +lot of bells and whistles, the core functionality of a Web browser is to +request Web pages from a Web server and then render them. Typically +those pages will be written in the Hypertext Markup Language (HTML), +which tells the browser how to render the page, including where to +insert inline images and links to other Web pages. HTML consists of +text marked up with tags that give the text a structure that the browser +uses when rendering the page. For instance, a simple HTML document +looks like this: + + <html> + <head> + <title>Hello</title> + </head> + <body> + <p>Hello, world!</p> + <p>This is a picture: <img src="some-image.gif"></p> + <p>This is a <a href="another-page.html">link</a> to another page.</p> + </body> + </html> + + Figure 26-1 shows how the browser renders this page. + + + + Figure 26-1. Sample Web page + + The browser and server communicate using a protocol called the +Hypertext Transfer Protocol (HTTP). While you don't need to worry about +the details of the protocol, it's worth understanding that it consists +entirely of a sequence of requests initiated by the browser and +responses generated by the server. That is, the browser connects to the +Web server and sends a request that includes, at the least, the desired +URL and the version of HTTP that the browser speaks. The browser can +also include data in its request; that's how the browser submits HTML +forms to the server. + + To reply to a request, the server sends a response made up of a set +of headers and a body. The headers contain information about the body, +such as what type of data it is (for instance, HTML, plain text, or an +image), and the body is the data itself, which is then rendered by the +browser. The server can also send an error response telling the browser +that its request couldn't be answered for some reason. + + And that's pretty much it. Once the browser has received the +complete response from the server, there's no communication between the +browser and the server until the next time the browser decides to +request a page from the server. (2) This is the main constraint of Web +programming-there's no way for code running on the server to affect what +the user sees in their browser unless the browser issues a new request +to the server. (3) + + Some Web pages, called static pages, are simply HTML files stored on +the Web server and served up when requested by the browser. Dynamic +pages, on the other hand, consist of HTML generated each time the page +is requested by a browser. For instance, a dynamic page might be +generated by querying a database and then constructing HTML to represent +the results of the query. (4) + + When generating its response to a request, server-side code has four +main pieces of information to act on. The first piece of information is +the requested URL. Typically, however, the URL is used by the Web server +itself to determine what code is responsible for generating the +response. Next, if the URL contains a question mark, everything after +the question mark is considered to be a query string, which is typically +ignored by the Web server except that it makes it available to the code +generating the response. Most of the time the query string contains a +set of key/value pairs. The request from the browser can also contain +post data, which also usually consists of key/value pairs. Post data is +typically used to submit HTML forms. The key/value pairs supplied in +either the query string or the post data are collectively called the +query parameters. + + Finally, in order to string together a sequence of individual +requests from the same browser, code running in the server can set a +cookie, sending a special header in its response to the browser that +contains a bit of opaque data called a cookie. After a cookie is set by +a particular server, the browser will send the cookie with each request +it sends to that server. The browser doesn't care about the data in the +cookie-it just echoes it back to the server for the server-side code to +interpret however it wants. + + These are the primitive elements on top of which 99 percent of +server-side Web programming is built. The browser sends a request, the +server finds some code to handle the request and runs it, and the code +uses query parameters and cookies to determine what to do. + + ---------- Footnotes ---------- + + (1) Readers new to Web programming will probably need to supplement +this introduction with a more in-depth tutorial or two. You can find a +good set of online tutorials at http://www.jmarshall.com/easy/. + + (2) Loading a single Web page may actually involve multiple +requests-to render the HTML of a page containing inline images, the +browser must request each image individually and then insert each into +the appropriate place in the rendered HTML. + + (3) Much of the complexity around Web programming is a result of +trying to work around this fundamental limitation in order to provide a +user experience that's more like the interactivity provided by desktop +applications. + + (4) Unfortunately, dynamic is somewhat overloaded in the Web world. +The phrase Dynamic HTML refers to HTML containing embedded code, usually +in the language JavaScript, that can be executed in the browser without +further communication with the Web server. Used with some discretion, +Dynamic HTML can improve the usability of a Web-based application since, +even with high-speed Internet connections, making a request to a Web +server, receiving the response, and rendering the new page can take a +noticeable amount of time. To further confuse things, dynamically +generated pages (in other words, generated on the server) could also +contain Dynamic HTML (code to be run on the client.) For the purposes +of this book, you'll stick to dynamically generating plain old +nondynamic HTML. + + +File: pcl.info, Node: 26-2, Next: 26-3, Prev: 26-1, Up: Chapter 26 + +AllegroServe +============ + +You can serve Web content using Common Lisp in a number of ways; there +are at least three open-source Web servers written in Common Lisp as +well as plug-ins such as mod_lisp (1) and Lisplets (2) that allow the +Apache Web server or any Java Servlet container to delegate requests to +a Lisp server running in a separate process. + + For this chapter, you'll use a version of the open-source Web server +AllegroServe, originally written by John Foderaro at Franz Inc.. +AllegroServe is included in the version of Allegro available from Franz +for use with this book. If you're not using Allegro, you can use +PortableAllegroServe, a friendly fork of the AllegroServe code base, +which includes an Allegro compatibility layer that allows +PortableAllegroServe to run on most Common Lisps. The code you'll write +in this chapter and in Chapter 29 should run in both vanilla +AllegroServe and PortableAllegroServe. + + AllegroServe provides a programming model similar in spirit to Java +Servlets-each time a browser requests a page, AllegroServe parses the +request and looks up an object, called an entity, which handles the +request. Some entity classes provided as part of AllegroServe know how +to serve static content-either individual files or the contents of a +directory tree. Others, the ones I'll spend most of this chapter +discussing, run arbitrary Lisp code to generate the response. (3) + + But before I get to that, you need to know how to start AllegroServe +and set it up to serve a few files. The first step is to load the +AllegroServe code into your Lisp image. In Allegro, you can simply type +(require :aserve). In other Lisps (or in Allegro), you can load +PortableAllegroServe by loading the file INSTALL.lisp at the top of the +portableaserve directory tree. Loading AllegroServe will create three +new packages, NET.ASERVE, NET.HTML.GENERATOR, and NET.ASERVE.CLIENT. (4) + + After loading the server, you start it with the function start in the +NET.ASERVE package. To have easy access to the symbols exported from +NET.ASERVE, from COM.GIGAMONKEYS.HTML (a package I'll discuss in a +moment), and from the rest of Common Lisp, you should create a new +package to play in like this: + + CL-USER> (defpackage :com.gigamonkeys.web + (:use :cl :net.aserve :com.gigamonkeys.html)) + #<The COM.GIGAMONKEYS.WEB package> + + Now switch to that package with this IN-PACKAGE expression: + + CL-USER> (in-package :com.gigamonkeys.web) + #<The COM.GIGAMONKEYS.WEB package> + WEB> + + Now you can use the exported names from NET.ASERVE without +qualification. The function start starts the server. It takes quite a +number of keyword parameters, but the only one you need to pass is +:port, which specifies the port to listen on. You should probably use a +high port such as 2001 instead of the default port for HTTP servers, 80, +because on Unix-derived operating systems only the root user can listen +on ports below 1024. To run AllegroServe listening on port 80 on Unix, +you'd need to start Lisp as root and then use the :setuid and :setgid +parameters to tell start to switch its identity after opening the port. +You can start a server listening on port 2001 like this: + + WEB> (start :port 2001) + #<WSERVER port 2001 #x72511c72> + + The server is now running in your Lisp. It's possible you'll get an +error that says something about "port already in use" when you try to +start the server. This means port 2001 is already in use by some other +server on your machine. In that case, the simplest fix is to use a +different port, supplying a different argument to start and then using +that value instead of 2001 in the URLs used throughout this chapter. + + You can continue to interact with Lisp via the REPL because +AllegroServe starts its own threads to handle requests from browsers. +This means, among other things, that you can use the REPL to get a view +into the guts of your server while it's running, which makes debugging +and testing a lot easier than if the server is a complete black box. + + Assuming you're running Lisp on the same machine as your browser, you +can check that the server is up and running by pointing your browser at +http://localhost:2001/. At this point you should get a page-not-found +error message in the browser since you haven't published anything yet. +But the error message will be from AllegroServe; it'll say so at the +bottom of the page. On the other hand, if the browser displays an error +dialog that says something like "The connection was refused when +attempting to contact localhost:2001," it means either that the server +isn't running or that you started it with a different port than 2001. + + Now you can publish some files. Suppose you have a file hello.html +in the directory /tmp/html with the following contents: + + <html> + <head> + <title>Hello</title> + </head> + <body> + <p>Hello, world!</p> + </body> + </html> + + You can publish it individually with the publish-file function. + + WEB> (publish-file :path "/hello.html" :file "/tmp/html/hello.html") + #<NET.ASERVE::FILE-ENTITY #x725eddea> + + The :path argument is the path that will appear in the URL requested +by the browser, while the :file argument is the name of the file in the +file system. After evaluating the publish-file expression, you can +point your browser to http://localhost:2001/hello.html, and it should +display a page something like Figure 26-2. + + + + Figure 26-2. http://localhost:2001/hello.html + + You could also publish a whole directory tree of files using the +publish-directory function. First let's clear out the already published +entity with the following call to publish-file: + + WEB> (publish-file :path "/hello.html" :remove t) + NIL + + Now you can publish the whole /tmp/html/ directory (and all its +subdirectories) with the publish-directory function. + + WEB> (publish-directory :prefix "/" :destination "/tmp/html/") + #<NET.ASERVE::DIRECTORY-ENTITY #x72625aa2> + + In this case, the :prefix argument specifies the beginning of the +path part of URLs that should be handled by this entity. Thus, if the +server receives a request for http://localhost:2001/foo/bar.html, the +path is /foo/bar.html, which starts with /. This path is then +translated to a filename by replacing the prefix, /, with the +destination, /tmp/html/. Thus, the URL http://localhost:2001/hello.html +will still be translated into a request for the file +/tmp/html/hello.html. + + ---------- Footnotes ---------- + + (1) http://www.fractalconcept.com/asp/html/mod_lisp.html + + (2) http://lisplets.sourceforge.net/ + + (3) AllegroServe also provides a framework called Webactions that's +analogous to JSPs in the Java world-instead of writing code that +generates HTML, with Webactions you write pages that are essentially +HTML with a bit of magic foo that turns into code to be run when the +page is served. I won't cover Webactions in this book. + + (4) Loading PortableAllegroServe will create some other packages for +the compatibility libraries, but the packages you'll care about are +those three. + + +File: pcl.info, Node: 26-3, Next: 26-4, Prev: 26-2, Up: Chapter 26 + +Generating Dynamic Content with AllegroServe +============================================ + +Publishing entities that generate dynamic content is nearly as simple as +publishing static content. The functions publish and publish-prefix are +the dynamic analogs of publish-file and publish-directory. The basic +idea of these two functions is that you publish a function that will be +called to generate the response to a request for either a specific URL +or any URL with a given prefix. The function will be called with two +arguments: an object representing the request and the published entity. +Most of time you don't need to do anything with the entity object except +to pass it along to a couple macros I'll discuss in a moment. On the +other hand, you'll use the request object to obtain information +submitted by the browser-query parameters included in the URL or data +posted using an HTML form. + + For a trivial example of using a function to generate dynamic +content, let's write a function that generates a page with a different +random number each time it's requested. + + (defun random-number (request entity) + (with-http-response (request entity :content-type "text/html") + (with-http-body (request entity) + (format + (request-reply-stream request) + "<html>~ <head><title>Random</title></head>~ <body>~ <p>Random number: ~d</p>~ </body>~ </html>~ " + (random 1000))))) + + The macros with-http-response and with-http-body are part of +AllegroServe. The former starts the process of generating an HTTP +response and can be used, as here, to specify things such as the type of +content that will be returned. It also handles various parts of HTTP +such as dealing with If-Modified-Since requests. The with-http-body +actually sends the HTTP response headers and then executes its body, +which should contain code that generates the content of the reply. +Within with-http-response but before the with-http-body, you can add or +change HTTP headers to be sent in the reply. The function +request-reply-stream is also part of AllegroServe and returns the stream +to which you should write output intended to be sent to the browser. + + As this function shows, you can just use FORMAT to print HTML to the +stream returned by request-reply-stream. In the next section, I'll show +you more convenient ways to programmatically generate HTML. (1) + + Now you're ready to publish this function. + + WEB> (publish :path "/random-number" :function 'random-number) + #<COMPUTED-ENTITY #x7262bab2> + + As it does in the publish-file function, the :path argument specifies +the path part of the URL that will result in this function being +invoked. The :function argument specifies either the name or an actual +function object. Using the name of a function, as shown here, allows +you to redefine the function later without republishing and have +AllegroServe use the new function definition. After evaluating the call +to publish, you can point your browser at http:// +localhost:2001/random-number to get a page with a random number on it, +as shown in Figure 26-3. + + + + Figure 26-3. http://localhost:2001/random-number + + ---------- Footnotes ---------- + + (1) The ~ followed by a newline tells FORMAT to ignore whitespace +after the newline, which allows you to indent your code nicely without +adding a bunch of whitespace to the HTML. Since white-space is typically +not significant in HTML, this doesn't matter to the browser, but it +makes the generated HTML source look a bit nicer to humans. + + +File: pcl.info, Node: 26-4, Next: 26-5, Prev: 26-3, Up: Chapter 26 + +Generating HTML +=============== + +Although using FORMAT to emit HTML works fine for the simple pages I've +discussed so far, as you start building more elaborate pages it'd be +nice to have a more concise way to generate HTML. Several libraries are +available for generating HTML from an s-expression representation +including one, htmlgen, that's included with AllegroServe. In this +chapter you'll use a library called FOO, (1) which is loosely modeled on +Franz's htmlgen and whose implementation you'll look at in more detail +in Chapters 30 and 31. For now, however, you just need to know how to +use FOO. + + Generating HTML from within Lisp is quite natural since s-expressions +and HTML are essentially isomorphic. You can represent HTML elements +with s-expressions by treating each element in HTML as a list "tagged" +with an appropriate first element, such as a keyword symbol of the same +name as the HTML tag. Thus, the HTML <p>foo</p> is represented by the +s-expression (:p "foo"). Because HTML elements nest the same way lists +in s-expressions do, this scheme extends to more complex HTML. For +instance, this HTML: + + <html> + <head> + <title>Hello</title> + </head> + <body> + <p>Hello, world!</p> + </body> + </html> + + could be represented with the following s-expression: + + (:html + (:head (:title "Hello")) + (:body (:p "Hello, world!"))) + + HTML elements with attributes complicate things a bit but not in an +insurmountable way. FOO supports two ways of including attributes in a +tag. One is to simply follow the first item of the list with +keyword/value pairs. The first element that follows a keyword/value +pair that's not itself a keyword symbol marks the beginning of the +element's contents. Thus, you'd represent this HTML: + + <a href="foo.html">This is a link</a> + + with the following s-expression: + + (:a :href "foo.html" "This is a link") + + The other syntax FOO supports is to group the tag name and attributes +into their own list like this: + + ((:a :href "foo.html") "This is link.") + + FOO can use the s-expression representation of HTML in two ways. The +function emit-html takes an HTML s-expression and outputs the +corresponding HTML. + + WEB> (emit-html '(:html (:head (:title "Hello")) (:body (:p "Hello, world!")))) + <html> + <head> + <title>Hello</title> + </head> + <body> + <p>Hello, world!</p> + </body> + </html> + T + + However, emit-html isn't always the most efficient way to generate +HTML because its argument must be a complete s-expression representation +of the HTML to be generated. While it's easy to build such a +representation, it's not always particularly efficient. For instance, +suppose you wanted to make an HTML page containing a list of 10,000 +random numbers. You could build the s-expression using a backquote +template and then pass it to emit-html like this: + + (emit-html + `(:html + (:head + (:title "Random numbers")) + (:body + (:h1 "Random numbers") + (:p ,@(loop repeat 10000 collect (random 1000) collect " "))))) + + However, this has to build a tree containing a 10,000-element list +before it can even start emitting HTML, and the whole s-expression will +become garbage as soon as the HTML is emitted. To avoid this +inefficiency, FOO also provides a macro html, which allows you to embed +bits of Lisp code in the middle of an HTML s-expression. + + Literal values such as strings and numbers in the input to html are +interpolated into the output HTML. Likewise, symbols are treated as +variable references, and code is generated to emit their value at +runtime. Thus, both of these: + + (html (:p "foo")) + + (let ((x "foo")) (html (:p x))) + + will emit the following: + + <p>foo</p> + + List forms that don't start with a keyword symbol are assumed to be +code and are embedded in the generated code. Any values the embedded +code returns will be ignored, but the code can emit more HTML by calling +html itself. For instance, to emit the contents of a list in HTML, you +might write this: + + (html (:ul (dolist (item (list 1 2 3)) (html (:li item))))) + + which will emit the following HTML: + + <ul> + <li>1</li> + <li>2</li> + <li>3</li> + </ul> + + If you want to emit the value of a list form, you must wrap it in the +pseudotag :print. Thus, this expression: + + (html (:p (+ 1 2))) + + generates this HTML after computing and discarding the value 3: + + <p></p> + + To emit the 3, you must write this: + + (html (:p (:print (+ 1 2)))) + + Or you could compute the value and store it in a variable outside the +call to html like this: + + (let ((x (+ 1 2))) (html (:p x))) + + Thus, you can use the html macro to generate the list of random +numbers like this: + + (html + (:html + (:head + (:title "Random numbers")) + (:body + (:h1 "Random numbers") + (:p (loop repeat 10 do (html (:print (random 1000)) " ")))))) + + The macro version will be quite a bit more efficient than the +emit-html version. Not only do you never have to generate an +s-expression representing the whole page, also much of the work that +emit-html does at runtime to interpret the s-expression will be done +once, when the macro is expanded, rather than every time the code is +run. + + You can control where the output generated by both html and emit-html +is sent with the macro with-html-output, which is part of the FOO +library. Thus, you can use the with-html-output and html macros from +FOO to rewrite random-number like this: + + (defun random-number (request entity) + (with-http-response (request entity :content-type "text/html") + (with-http-body (request entity) + (with-html-output ((request-reply-stream request)) + (html + (:html + (:head (:title "Random")) + (:body + (:p "Random number: " (:print (random 1000)))))))))) + + ---------- Footnotes ---------- + + (1) FOO is a recursive tautological acronym for FOO Outputs Output. + + +File: pcl.info, Node: 26-5, Next: 26-6, Prev: 26-4, Up: Chapter 26 + +HTML Macros +=========== + +Another feature of FOO is that it allows you to define HTML "macros" +that can translate arbitrary forms into HTML s-expressions that the html +macro understands. For instance, suppose you frequently find yourself +writing pages of this form: + + (:html + (:head (:title "Some title")) + (:body + (:h1 "Some title") + ... stuff ...)) + + You could define an HTML macro to capture that pattern like this: + + (define-html-macro :standard-page ((&key title) &body body) + `(:html + (:head (:title ,title)) + (:body + (:h1 ,title) + ,@body))) + + Now you can use the "tag" :standard-page in your s-expression HTML, +and it'll be expanded before being interpreted or compiled. For +instance, the following: + + (html (:standard-page (:title "Hello") (:p "Hello, world."))) + + generates the following HTML: + + <html> + <head> + <title>Hello</title> + </head> + <body> + <h1>Hello</h1> + <p>Hello, world.</p> + </body> + </html> + + +File: pcl.info, Node: 26-6, Next: 26-7, Prev: 26-5, Up: Chapter 26 + +Query Parameters +================ + +Of course, generating HTML output is only half of Web programming. The +other thing you need to do is get input from the user. As I discussed +in the "A 30-Second Intro to Server-Side Web Programming" section, when +a browser requests a page from a Web server, it can send query +parameters in the URL and post data, both of which act as input to the +server-side code. + + AllegroServe, like most Web programming frameworks, takes care of +parsing both these sources of input for you. By the time your published +functions are called, all the key/value pairs from the query string +and/or post data have been decoded and placed into an alist that you can +retrieve from the request object with the function request-query. The +following function returns a page showing all the query parameters it +receives: + + (defun show-query-params (request entity) + (with-http-response (request entity :content-type "text/html") + (with-http-body (request entity) + (with-html-output ((request-reply-stream request)) + (html + (:standard-page + (:title "Query Parameters") + (if (request-query request) + (html + (:table :border 1 + (loop for (k . v) in (request-query request) + do (html (:tr (:td k) (:td v)))))) + (html (:p "No query parameters."))))))))) + + (publish :path "/show-query-params" :function 'show-query-params) + + If you give your browser a URL with a query string in it like the +following: + + http://localhost:2001/show-query-params?foo=bar&baz=10 + + you should get back a page similar to the one shown in Figure 26-4. + + + + Figure 26-4. http://localhost:2001/show-query-params?foo=bar&baz=10 + + To generate some post data, you need an HTML form. The following +function generates a simple form, which submits its data to +show-query-params: + + (defun simple-form (request entity) + (with-http-response (request entity :content-type "text/html") + (with-http-body (request entity) + (let ((*html-output* (request-reply-stream request))) + (html + (:html + (:head (:title "Simple Form")) + (:body + (:form :method "POST" :action "/show-query-params" + (:table + (:tr (:td "Foo") + (:td (:input :name "foo" :size 20))) + (:tr (:td "Password") + (:td (:input :name "password" :type "password" :size 20)))) + (:p (:input :name "submit" :type "submit" :value "Okay") + (:input ::type "reset" :value "Reset")))))))))) + + (publish :path "/simple-form" :function 'simple-form) + + Point your browser to http://localhost:2001/simple-form, and you +should see a page like the one in Figure 26-5. + + If you fill in the form with the "abc" and "def" values, clicking the +Okay button should take you to a page like the one in Figure 26-6. + + + + Figure 26-5. http://localhost:2001/simple-form + + + + Figure 26-6. Result of submitting the simple form + + However, most of the time you won't need to iterate over all the +query parameters; you'll want to pick out individual parameters. For +instance, you might want to modify random-number so the limit value you +pass to RANDOM can be supplied via a query parameter. In that case, you +use the function request-query-value, which takes the request object and +the name of the parameter whose value you want and returns the value as +a string or NIL if no such parameter has been supplied. A +parameterizable version of random-number might look like this: + + (defun random-number (request entity) + (with-http-response (request entity :content-type "text/html") + (with-http-body (request entity) + (let* ((*html-output* (request-reply-stream request)) + (limit-string (or (request-query-value "limit" request) "")) + (limit (or (parse-integer limit-string :junk-allowed t) 1000))) + (html + (:html + (:head (:title "Random")) + (:body + (:p "Random number: " (:print (random limit)))))))))) + + Because request-query-value can return either NIL or an empty string, +you have to deal with both those cases when parsing the parameter into a +number to pass to RANDOM. You can deal with a NIL value when you bind +limit-string, binding it to "" if there's no "limit" query parameter. +Then you can use the :junk-allowed argument to PARSE-INTEGER to ensure +that it returns either NIL (if it can't parse an integer from the string +given) or an integer. In the section "A Small Application Framework," +you'll develop some macros to make it easier to deal with grabbing query +parameters and converting them to various types. + + +File: pcl.info, Node: 26-7, Next: 26-8, Prev: 26-6, Up: Chapter 26 + +Cookies +======= + +In AllegroServe you can send a Set-Cookie header that tells the browser +to save a cookie and send it along with subsequent requests by calling +the function set-cookie-header within the body of with-http-response but +before the call to with-http-body. The first argument to the function +is the request object, and the remaining arguments are keyword arguments +used to set the various properties of the cookie. The only two you must +pass are the :name and :value arguments, both of which should be +strings. The other possible arguments that affect the cookie sent to +the browser are :expires, :path, :domain, and :secure. + + Of these, you need to worry only about :expires. It controls how +long the browser should save the cookie. If :expires is NIL (the +default), the browser will save the cookie only until it exits. Other +possible values are :never, which means the cookie should be kept +forever, or a universal time as returned by GET-UNIVERSAL-TIME or +ENCODE-UNIVERSAL-TIME. An :expires of zero tells the client to +immediately discard an existing cookie. (1) + + After you've set a cookie, you can use the function get-cookie-values +to get an alist containing one name/value pair for each cookie sent by +the browser. From that alist, you can pick out individual cookie values +using ASSOC and CDR. + + The following function shows the names and values of all the cookies +sent by the browser: + + (defun show-cookies (request entity) + (with-http-response (request entity :content-type "text/html") + (with-http-body (request entity) + (with-html-output ((request-reply-stream request)) + (html + (:standard-page + (:title "Cookies") + (if (null (get-cookie-values request)) + (html (:p "No cookies.")) + (html + (:table + (loop for (key . value) in (get-cookie-values request) + do (html (:tr (:td key) (:td value))))))))))))) + + (publish :path "/show-cookies" :function 'show-cookies) + + The first time you load the page http://localhost:2001/show-cookies +it should say "No cookies" as shown in Figure 26-7 since you haven't set +any yet. + + + + Figure 26-7. http://localhost:2001/show-cookies with no cookies + + To set a cookie, you need another function, such as the following: + + (defun set-cookie (request entity) + (with-http-response (request entity :content-type "text/html") + (set-cookie-header request :name "MyCookie" :value "A cookie value") + (with-http-body (request entity) + (with-html-output ((request-reply-stream request)) + (html + (:standard-page + (:title "Set Cookie") + (:p "Cookie set.") + (:p (:a :href "/show-cookies" "Look at cookie jar.")))))))) + + (publish :path "/set-cookie" :function 'set-cookie) + + If you enter the URL http://localhost:2001/set-cookie, your browser +should display a page like the one in Figure 26-8. Additionally, the +server will send a Set-Cookie header with a cookie named "MyCookie" with +"A cookie value" as its value. If you click the link Look at cookie +jar, you'll be taken to the /show-cookies page where you'll see the new +cookie, as shown in Figure 26-9. Because you didn't specify an :expires +argument, the browser will continue to send the cookie with each request +until you quit the browser. + + + + Figure 26-8. http://localhost:2001/set-cookie + + + + Figure 26-9. http://localhost:2001/show-cookies after setting a +cookie + + ---------- Footnotes ---------- + + (1) For information about the meaning of the other parameters, see +the AllegroServe documentation and RFC 2109, which describes the cookie +mechanism. + + +File: pcl.info, Node: 26-8, Next: 26-9, Prev: 26-7, Up: Chapter 26 + +A Small Application Framework +============================= + +Although AllegroServe provides fairly straightforward access to all the +basic facilities you need to write server-side Web code (access to query +parameters from both the URL's query string and the post data; the +ability to set cookies and retrieve their values; and, of course, the +ability to generate the response sent back to the browser), there's a +fair bit of annoyingly repetitive code. + + For instance, every HTML-generating function you write is going to +take the arguments request and entity and then will contain calls to +with-http-response, with-http-response, and-if you're going to use FOO +to generate HTML-with-html-output. Then, in functions that need to get +at query parameters, there will be a bunch of calls to +request-query-value and then more code to convert the string returned to +whatever type you actually want. Finally, you need to remember to +publish the function. + + To reduce the amount of boilerplate you have to write, you can write +a small framework on top of AllegroServe to make it easier to define +functions that handle requests for a particular URL. + + The basic approach will be to define a macro, define-url-function, +that you'll use to define functions that will automatically be published +via publish. This macro will expand into a DEFUN that contains the +appropriate boilerplate as well as code to publish the function under a +URL of the same name. It'll also take care of generating code to +extract values from query parameters and cookies and to bind them to +variables declared in the function's parameter list. Thus, the basic +form of a define-url-function definition is this: + + (define-url-function name (request query-parameter*) + body) + + where the body is the code to emit the HTML of the page. It'll be +wrapped in a call to FOO's html macro, so for simple pages it might +contain nothing but s-expression HTML. + + Within the body, the query parameter variables will be bound to +values of query parameters with the same name or from a cookie. In the +simplest case, a query parameter's value will be the string taken from +the query parameter or post data field of the same name. If the query +parameter is specified with a list, you can also specify an automatic +type conversion, a default value, and whether to look for and save the +value of the parameter in a cookie. The complete syntax for a +query-parameter is as follows: + + name | (name type [default-value] [stickiness]) + + The type must be a name recognized by define-url-function. I'll +discuss in a moment how to define new types. The default-value must be +a value of the given type. Finally, stickiness, if supplied, indicates +that the parameter's value should be taken from an appropriately named +cookie if no query parameter is supplied and that a Set-Cookie header +should be sent in the response that saves the value in the cookie of the +same name. Thus, a sticky parameter, after being explicitly supplied a +value via a query parameter, will keep that value on subsequent requests +of the page even when no query parameter is supplied. + + The name of the cookie used depends on the value of stickiness: with +a value of :global, the cookie will be named the same as the parameter. +Thus, different functions that use globally sticky parameters with the +same name will share the value. If stickiness is :package, then the +cookie name is constructed from the name of the parameter and the +package of the function's name; this allows functions in the same +package to share values but not have to worry about stomping on +parameters of functions in other packages. Finally, a parameter with a +stickiness value of :local will use a cookie made from the name of the +parameter, the package of the function name, and the function name, +making it unique to that function. + + For instance, you can use define-url-function to replace the previous +eleven-line definition of random-page with this five-line version: + + (define-url-function random-number (request (limit integer 1000)) + (:html + (:head (:title "Random")) + (:body + (:p "Random number: " (:print (random limit)))))) + + If you wanted the limit argument to be sticky, you could change the +limit declaration to (limit integer 1000 :local). + + +File: pcl.info, Node: 26-9, Next: Chapter 27, Prev: 26-8, Up: Chapter 26 + +The Implementation +================== + +I'll explain the implementation of define-url-function from the top +down. The macro itself looks like this: + + (defmacro define-url-function (name (request &rest params) &body body) + (with-gensyms (entity) + (let ((params (mapcar #'normalize-param params))) + `(progn + (defun ,name (,request ,entity) + (with-http-response (,request ,entity :content-type "text/html") + (let* (,@(param-bindings name request params)) + ,@(set-cookies-code name request params) + (with-http-body (,request ,entity) + (with-html-output ((request-reply-stream ,request)) + (html ,@body)))))) + (publish :path ,(format nil "/~(~a~)" name) :function ',name))))) + + Let's take it bit by bit, starting with the first few lines. + + (defmacro define-url-function (name (request &rest params) &body body) + (with-gensyms (entity) + (let ((params (mapcar #'normalize-param params))) + + Up to here you're just getting ready to generate code. You GENSYM a +symbol to use later as the name of the entity parameter in the DEFUN. +Then you normalize the parameters, converting plain symbols to list form +using this function: + + (defun normalize-param (param) + (etypecase param + (list param) + (symbol `(,param string nil nil)))) + + In other words, declaring a parameter with just a symbol is the same +as declaring a nonsticky, string parameter with no default value. + + Then comes the PROGN. You must expand into a PROGN because you need +to generate code to do two things: define a function with DEFUN and call +publish. You should define the function first so if there's an error in +the definition, the function won't be published. The first two lines of +the DEFUN are just boilerplate. + + (defun ,name (,request ,entity) + (with-http-response (,request ,entity :content-type "text/html") + + Now you do the real work. The following two lines generate the +bindings for the parameters specified in define-url-function other than +request and the code that calls set-cookie-header for the sticky +parameters. Of course, the real work is done by helper functions that +you'll look at in a moment. (1) + + (let* (,@(param-bindings name request params)) + ,@(set-cookies-code name request params) + + The rest is just more boilerplate, putting the body from the +define-url-function definition in the appropriate context of +with-http-body, with-html-output, and html macros. Then comes the call +to publish. + + (publish :path ,(format nil "/~(~a~)" name) :function ',name) + + The expression (format nil "/~(~a~)" name) is evaluated at macro +expansion time, generating a string consisting of /, followed by an +all-lowercase version of the name of the function you're about to +define. That string becomes the :path argument to publish, while the +function name is interpolated as the :function argument. + + Now let's look at the helper functions used to generate the DEFUN +form. To generate parameter bindings, you need to loop over the params +and collect a snippet of code for each one, generated by param-binding. +That snippet will be a list containing the name of the variable to bind +and the code that will compute the value of that variable. The exact +form of code used to compute the value will depend on the type of the +parameter, whether it's sticky, and the default value, if any. Because +you already normalized the params, you can use DESTRUCTURING-BIND to +take them apart in param-binding. + + (defun param-bindings (function-name request params) + (loop for param in params + collect (param-binding function-name request param))) + + (defun param-binding (function-name request param) + (destructuring-bind (name type &optional default sticky) param + (let ((query-name (symbol->query-name name)) + (cookie-name (symbol->cookie-name function-name name sticky))) + `(,name (or + (string->type ',type (request-query-value ,query-name ,request)) + ,@(if cookie-name + (list `(string->type ',type (get-cookie-value ,request ,cookie-name)))) + ,default))))) + + The function string->type, which you use to convert strings obtained +from the query parameters and cookies to the desired type, is a generic +function with the following signature: + + (defgeneric string->type (type value)) + + To make a particular name usable as a type name for a query +parameter, you just need to define a method on string->type. You'll +need to define at least a method specialized on the symbol string since +that's the default type. Of course, that's pretty easy. Since browsers +sometimes submit forms with empty strings to indicate no value was +supplied for a particular value, you'll want to convert an empty string +to NIL as this method does: + + (defmethod string->type ((type (eql 'string)) value) + (and (plusp (length value)) value)) + + You can add conversions for other types needed by your application. +For instance, to make integer usable as a query parameter type so you +can handle the limit parameter of random-page, you might define this +method: + + (defmethod string->type ((type (eql 'integer)) value) + (parse-integer (or value "") :junk-allowed t)) + + Another helper function used in the code generated by param-binding +is get-cookie-value, which is just a bit of sugar around the +get-cookie-values function provided by AllegroServe. It looks like +this: + + (defun get-cookie-value (request name) + (cdr (assoc name (get-cookie-values request) :test #'string=))) + + The functions that compute the query parameter and cookies names are +similarly straightforward. + + (defun symbol->query-name (sym) + (string-downcase sym)) + + (defun symbol->cookie-name (function-name sym sticky) + (let ((package-name (package-name (symbol-package function-name)))) + (when sticky + (ecase sticky + (:global + (string-downcase sym)) + (:package + (format nil "~(~a:~a~)" package-name sym)) + (:local + (format nil "~(~a:~a:~a~)" package-name function-name sym)))))) + + To generate the code that sets cookies for sticky parameters, you +again loop over the list of parameters, this time collecting a snippet +of code for each sticky param. You can use the when and collect it LOOP +forms to collect only the non-NIL values returned by set-cookie-code. + + (defun set-cookies-code (function-name request params) + (loop for param in params + when (set-cookie-code function-name request param) collect it)) + + (defun set-cookie-code (function-name request param) + (destructuring-bind (name type &optional default sticky) param + (declare (ignore type default)) + (if sticky + `(when ,name + (set-cookie-header + ,request + :name ,(symbol->cookie-name function-name name sticky) + :value (princ-to-string ,name)))))) + + One of the advantages of defining macros in terms of helper functions +like this is that it's easy to make sure the individual bits of code +you're generating look right. For instance, you can check that the +following set-cookie-code: + + (set-cookie-code 'foo 'request '(x integer 20 :local)) + + generates something like this: + + (WHEN X + (SET-COOKIE-HEADER REQUEST + :NAME "com.gigamonkeys.web:foo:x" + :VALUE (PRINC-TO-STRING X))) + + Assuming this code will occur in a context where x is the name of a +variable, this looks good. + + Once again, macros have allowed you to distill the code you need to +write down to its essence-in this case, the data you want to extract +from the request and the HTML you want to generate. That said, this +framework isn't meant to be the be-all and end-all of Web application +frameworks-it's just a little sugar to make it a bit easier to write +simple apps like the one you'll write in Chapter 29. + + But before you can get to that, you need to write the guts of the +application for which the Chapter 29 application will be the user +interface. You'll start in the next chapter with a souped-up version of +the database you wrote in Chapter 3, this time to keep track of ID3 data +extracted from MP3 files. + + ---------- Footnotes ---------- + + (1) You need to use LET* rather than a LET to allow the default value +forms for parameters to refer to parameters that appear earlier in the +parameter list. For example, you could write this: + + (define-url-function (request (x integer 10) (y integer (* 2 x))) ...) + + and the value of y, if not explicitly supplied, would be twice the +value of x. + + +File: pcl.info, Node: Chapter 27, Next: Chapter 28, Prev: Chapter 26, Up: Top + +27. Practical: An MP3 Database +============================== + +In this chapter you'll revisit the idea first explored in Chapter 3 of +building an in-memory database out of basic Lisp data structures. This +time your goal is to hold information that you'll extract from a +collection of MP3 files using the ID3v2 library from Chapter 25. You'll +then use this database in Chapters 28 and 29 as part of a Web-based +streaming MP3 server. Of course, this time around you can use some of +the language features you've learned since Chapter 3 to build a more +sophisticated version. + +* Menu: + +* 27-1:: The Database +* 27-2:: Defining a Schema +* 27-3:: Inserting Values +* 27-4:: Querying the Database +* 27-5:: Matching Functions +* 27-6:: Getting at the Results +* 27-7:: Other Database Operations + + +File: pcl.info, Node: 27-1, Next: 27-2, Prev: Chapter 27, Up: Chapter 27 + +The Database +============ + +The main problem with the database in Chapter 3 is that there's only one +table, the list stored in the variable *db*. Another is that the code +doesn't know anything about what type of values are stored in different +columns. In Chapter 3 you got away with that by using the fairly +general-purpose EQUAL method to compare column values when selecting +rows from the database, but you would've been in trouble if you had +wanted to store values that couldn't be compared with EQUAL or if you +had wanted to sort the rows in the database since there's no ordering +function that's as general as EQUAL. + + This time you'll solve both problems by defining a class, table, to +represent individual database tables. Each table instance will consist +of two slots-one to hold the table's data and another to hold +information about the columns in the table that database operations will +be able to use. The class looks like this: + + (defclass table () + ((rows :accessor rows :initarg :rows :initform (make-rows)) + (schema :accessor schema :initarg :schema))) + + As in Chapter 3, you can represent the individual rows with plists, +but this time around you'll create an abstraction that will make that an +implementation detail you can change later without too much trouble. +And this time you'll store the rows in a vector rather than a list since +certain operations that you'll want to support, such as random access to +rows by a numeric index and the ability to sort a table, can be more +efficiently implemented with vectors. + + The function make-rows used to initialize the rows slot can be a +simple wrapper around MAKE-ARRAY that builds an empty, adjustable,vector +with a fill pointer. + +* Menu: + +* 27-1-1:: The Package + + (defparameter *default-table-size* 100) + + (defun make-rows (&optional (size *default-table-size*)) + (make-array size :adjustable t :fill-pointer 0)) + + To represent a table's schema, you need to define another class, +column, each instance of which will contain information about one column +in the table: its name, how to compare values in the column for equality +and ordering, a default value, and a function that will be used to +normalize the column's values when inserting data into the table and +when querying the table. The schema slot will hold a list of column +objects. The class definition looks like this: + + (defclass column () + ((name + :reader name + :initarg :name) + + (equality-predicate + :reader equality-predicate + :initarg :equality-predicate) + + (comparator + :reader comparator + :initarg :comparator) + + (default-value + :reader default-value + :initarg :default-value + :initform nil) + + (value-normalizer + :reader value-normalizer + :initarg :value-normalizer + :initform #'(lambda (v column) (declare (ignore column)) v)))) + + The equality-predicate and comparator slots of a column object hold +functions used to compare values from the given column for equivalence +and ordering. Thus, a column containing string values might have +STRING= as its equality-predicate and STRING< as its comparator, while a +column containing numbers might have = and <. + + The default-value and value-normalizer slots are used when inserting +rows into the database and, in the case of value-normalizer, when +querying the database. When you insert a row into the database, if no +value is provided for a particular column, you can use the value stored +in the column's default-value slot. Then the value-defaulted or +otherwise-is normalized by passing it and the column object to the +function stored in the value-normalizer slot. You pass the column in +case the value-normalizer function needs to use some data associated +with the column object. (You'll see an example of this in the next +section.) You should also normalize values passed in queries before +comparing them with values in the database. + + Thus, the value-normalizer's responsibility is primarily to return a +value that can be safely and correctly passed to the equality-predicate +and comparator functions. If the value-normalizer can't figure out an +appropriate value to return, it can signal an error. + + The other reason to normalize values before you store them in the +database is to save both memory and CPU cycles. For instance, if you +have a column that's going to contain string values but the number of +distinct strings that will be stored in the column is small-for +instance, the genre column in the MP3 database-you can save space and +speed by using the value-normalizer to intern the strings (translate all +STRING= values to a single string object). Thus, you'll need only as +many strings as there are distinct values, regardless of how many rows +are in the table, and you can use EQL to compare column values rather +than the slower STRING=. (1) + + ---------- Footnotes ---------- + + (1) The general theory behind interning objects is that if you're +going to compare a particular value many times, it's worth it to pay the +cost of interning it. The value-normalizer runs once when you insert a +value into the table and, as you'll see, once at the beginning of each +query. Since a query can involve invoking the equality-predicate once +per row in the table, the amortized cost of interning the values will +quickly approach zero. + + +File: pcl.info, Node: 27-1-1, Next: 27-1, Prev: 27-1, Up: 27-1 + +The Package +=========== + +The package for the code you'll develop in this chapter looks like this: + + (defpackage :com.gigamonkeys.mp3-database + (:use :common-lisp + :com.gigamonkeys.pathnames + :com.gigamonkeys.macro-utilities + :com.gigamonkeys.id3v2) + (:export :*default-table-size* + :*mp3-schema* + :*mp3s* + :column + :column-value + :delete-all-rows + :delete-rows + :do-rows + :extract-schema + :in + :insert-row + :load-database + :make-column + :make-schema + :map-rows + :matching + :not-nullable + :nth-row + :random-selection + :schema + :select + :shuffle-table + :sort-rows + :table + :table-size + :with-column-values)) + + The :use section gives you access to the functions and macros whose +names are exported from the packages defined in Chapter 15, 8, and 25 +and the :export section exports the API this library will provide, which +you'll use in Chapter 29. + + +File: pcl.info, Node: 27-2, Next: 27-3, Prev: 27-1, Up: Chapter 27 + +Defining a Schema +================= + +Thus, to make an instance of table, you need to build a list of column +objects. You could build the list by hand, using LIST and +MAKE-INSTANCE. But you'll soon notice that you're frequently making a +lot column objects with the same comparator and equality-predicate +combinations. This is because the combination of a comparator and +equality predicate essentially defines a column type. It'd be nice if +there was a way to give those types names that would allow you to say +simply that a given column is a string column, rather than having to +specify STRING< as its comparator and STRING= as its equality predicate. +One way is to define a generic function, make-column, like this: + + (defgeneric make-column (name type &optional default-value)) + + Now you can implement methods on this generic function that +specialize on type with EQL specializers and return column objects with +the slots filled in with appropriate values. Here's the generic +function and methods that define column types for the type names string +and number: + + (defmethod make-column (name (type (eql 'string)) &optional default-value) + (make-instance + 'column + :name name + :comparator #'string< + :equality-predicate #'string= + :default-value default-value + :value-normalizer #'not-nullable)) + + (defmethod make-column (name (type (eql 'number)) &optional default-value) + (make-instance + 'column + :name name + :comparator #'< + :equality-predicate #'= + :default-value default-value)) + + The following function, not-nullable, used as the value-normalizer +for string columns, simply returns the value it's given unless the value +is NIL, in which case it signals an error: + + (defun not-nullable (value column) + (or value (error "Column ~a can't be null" (name column)))) + + This is important because STRING< and STRING= will signal an error if +called on NIL; it's better to catch bad values before they go into the +table rather than when you try to use them. (1) + + Another column type you'll need for the MP3 database is an +interned-string whose values are interned as discussed previously. +Since you need a hash table in which to intern values, you should define +a subclass of column, interned-values-column, that adds a slot whose +value is the hash table you use to intern. + + To implement the actual interning, you'll also need to provide an +:initform for value-normalizer of a function that interns the value in +the column's interned-values hash table. And because one of the main +reasons to intern values is to allow you to use EQL as the equality +predicate, you should also add an :initform for the equality-predicate +of #'eql. + + (defclass interned-values-column (column) + ((interned-values + :reader interned-values + :initform (make-hash-table :test #'equal)) + (equality-predicate :initform #'eql) + (value-normalizer :initform #'intern-for-column))) + + (defun intern-for-column (value column) + (let ((hash (interned-values column))) + (or (gethash (not-nullable value column) hash) + (setf (gethash value hash) value)))) + + You can then define a make-column method specialized on the name +interned-string that returns an instance of interned-values-column. + + (defmethod make-column (name (type (eql 'interned-string)) &optional default-value) + (make-instance + 'interned-values-column + :name name + :comparator #'string< + :default-value default-value)) + + With these methods defined on make-column, you can now define a +function, make-schema, that builds a list of column objects from a list +of column specifications consisting of a column name, a column type +name, and, optionally, a default value. + + (defun make-schema (spec) + (mapcar #'(lambda (column-spec) (apply #'make-column column-spec)) spec)) + + For instance, you can define the schema for the table you'll use to +store data extracted from MP3s like this: + + (defparameter *mp3-schema* + (make-schema + '((:file string) + (:genre interned-string "Unknown") + (:artist interned-string "Unknown") + (:album interned-string "Unknown") + (:song string) + (:track number 0) + (:year number 0) + (:id3-size number)))) + + To make an actual table for holding information about MP3s, you pass +*mp3-schema* as the :schema initarg to MAKE-INSTANCE. + + (defparameter *mp3s* (make-instance 'table :schema *mp3-schema*)) + + ---------- Footnotes ---------- + + (1) As always, the first causality of concise exposition in +programming books is proper error handling; in production code you'd +probably want to define your own error type, such as the following, and +signal it instead: + + (error 'illegal-column-value :value value :column column) + + Then you'd want to think about where you can add restarts that might +be able to recover from this condition. And, finally, in any given +application you could establish condition handlers that would choose +from among those restarts. + + +File: pcl.info, Node: 27-3, Next: 27-4, Prev: 27-2, Up: Chapter 27 + +Inserting Values +================ + +Now you're ready to define your first table operation, insert-row, which +takes a plist of names and values and a table and adds a row to the +table containing the given values. The bulk of the work is done in a +helper function, normalize-row, that builds a plist with a defaulted, +normalized value for each column, using the values from names-and-values +if available and the default-value for the column if not. + + (defun insert-row (names-and-values table) + (vector-push-extend (normalize-row names-and-values (schema table)) (rows table))) + + (defun normalize-row (names-and-values schema) + (loop + for column in schema + for name = (name column) + for value = (or (getf names-and-values name) (default-value column)) + collect name + collect (normalize-for-column value column))) + + It's worth defining a separate helper function, normalize-for-column, +that takes a value and a column object and returns the normalized value +because you'll need to perform the same normalization on query +arguments. + + (defun normalize-for-column (value column) + (funcall (value-normalizer column) value column)) + + Now you're ready to combine this database code with code from +previous chapters to build a database of data extracted from MP3 files. +You can define a function, file->row, that uses read-id3 from the ID3v2 +library to extract an ID3 tag from a file and turns it into a plist that +you can pass to insert-row. + + (defun file->row (file) + (let ((id3 (read-id3 file))) + (list + :file (namestring (truename file)) + :genre (translated-genre id3) + :artist (artist id3) + :album (album id3) + :song (song id3) + :track (parse-track (track id3)) + :year (parse-year (year id3)) + :id3-size (size id3)))) + + You don't have to worry about normalizing the values since insert-row +takes care of that for you. You do, however, have to convert the string +values returned by the track and year into numbers. The track number in +an ID3 tag is sometimes stored as the ASCII representation of the track +number and sometimes as a number followed by a slash followed by the +total number of tracks on the album. Since you care only about the +actual track number, you should use the :end argument to PARSE-INTEGER +to specify that it should parse only up to the slash, if any. (1) + + (defun parse-track (track) + (when track (parse-integer track :end (position #\/ track)))) + + (defun parse-year (year) + (when year (parse-integer year))) + + Finally, you can put all these functions together, along with +walk-directory from the portable pathnames library and mp3-p from the +ID3v2 library, to define a function that loads an MP3 database with data +extracted from all the MP3 files it can find under a given directory. + + (defun load-database (dir db) + (let ((count 0)) + (walk-directory + dir + #'(lambda (file) + (princ #\.) + (incf count) + (insert-row (file->row file) db)) + :test #'mp3-p) + (format t "~&Loaded ~d files into database." count))) + + ---------- Footnotes ---------- + + (1) If any MP3 files have malformed data in the track and year +frames, PARSE-INTEGER could signal an error. One way to deal with that +is to pass PARSE-INTEGER the :junk-allowed argument of T, which will +cause it to ignore any non-numeric junk following the number and to +return NIL if no number can be found in the string. Or, if you want +practice at using the condition system, you could define an error and +signal it from these functions when the data is malformed and also +establish a few restarts to allow these functions to recover. + + +File: pcl.info, Node: 27-4, Next: 27-5, Prev: 27-3, Up: Chapter 27 + +Querying the Database +===================== + +Once you've loaded your database with data, you'll need a way to query +it. For the MP3 application you'll need a slightly more sophisticated +query function than you wrote in Chapter 3. This time around you want +not only to be able to select rows matching particular criteria but also +to limit the results to particular columns, to limit the results to +unique rows, and perhaps to sort the rows by particular columns. In +keeping with the spirit of relational database theory, the result of a +query will be a new table object containing the desired rows and +columns. + + The query function you'll write, select, is loosely modeled on the +SELECT statement from Structured Query Language (SQL). It'll take five +keyword parameters: :from, :columns, :where, :distinct, and :order-by. +The :from argument is the table object you want to query. The :columns +argument specifies which columns should be included in the result. The +value should be a list of column names, a single column name, or a T, +the default, meaning return all columns. The :where argument, if +provided, should be a function that accepts a row and returns true if it +should be included in the results. In a moment, you'll write two +functions, matching and in, that return functions appropriate for use as +:where arguments. The :order-by argument, if supplied, should be a list +of column names; the results will be sorted by the named columns. As +with the :columns argument, you can specify a single column using just +the name, which is equivalent to a one-item list containing the same +name. Finally, the :distinct argument is a boolean that says whether to +eliminate duplicate rows from the results. The default value for +:distinct is NIL. + + Here are some examples of using select: + + ;; Select all rows where the :artist column is "Green Day" + (select :from *mp3s* :where (matching *mp3s* :artist "Green Day")) + + ;; Select a sorted list of artists with songs in the genre "Rock" + (select + :columns :artist + :from *mp3s* + :where (matching *mp3s* :genre "Rock") + :distinct t + :order-by :artist) + + The implementation of select with its immediate helper functions +looks like this: + + (defun select (&key (columns t) from where distinct order-by) + (let ((rows (rows from)) + (schema (schema from))) + + (when where + (setf rows (restrict-rows rows where))) + + (unless (eql columns 't) + (setf schema (extract-schema (mklist columns) schema)) + (setf rows (project-columns rows schema))) + + (when distinct + (setf rows (distinct-rows rows schema))) + + (when order-by + (setf rows (sorted-rows rows schema (mklist order-by)))) + + (make-instance 'table :rows rows :schema schema))) + + (defun mklist (thing) + (if (listp thing) thing (list thing))) + + (defun extract-schema (column-names schema) + (loop for c in column-names collect (find-column c schema))) + + (defun find-column (column-name schema) + (or (find column-name schema :key #'name) + (error "No column: ~a in schema: ~a" column-name schema))) + + (defun restrict-rows (rows where) + (remove-if-not where rows)) + + (defun project-columns (rows schema) + (map 'vector (extractor schema) rows)) + + (defun distinct-rows (rows schema) + (remove-duplicates rows :test (row-equality-tester schema))) + + (defun sorted-rows (rows schema order-by) + (sort (copy-seq rows) (row-comparator order-by schema))) + + Of course, the really interesting part of select is how you implement +the functions extractor, row-equality-tester, and row-comparator. + + As you can tell by how they're used, each of these functions must +return a function. For instance, project-columns uses the value +returned by extractor as the function argument to MAP. Since the purpose +of project-columns is to return a set of rows with only certain column +values, you can infer that extractor returns a function that takes a row +as an argument and returns a new row containing only the columns +specified in the schema it's passed. Here's how you can implement it: + + (defun extractor (schema) + (let ((names (mapcar #'name schema))) + #'(lambda (row) + (loop for c in names collect c collect (getf row c))))) + + Note how you can do the work of extracting the names from the schema +outside the body of the closure: since the closure will be called many +times, you want it to do as little work as possible each time it's +called. + + The functions row-equality-tester and row-comparator are implemented +in a similar way. To decide whether two rows are equivalent, you need +to apply the appropriate equality predicate for each column to the +appropriate column values. Recall from Chapter 22 that the LOOP clause +always will return NIL as soon as a pair of values fails their test or +will cause the LOOP to return T. + + (defun row-equality-tester (schema) + (let ((names (mapcar #'name schema)) + (tests (mapcar #'equality-predicate schema))) + #'(lambda (a b) + (loop for name in names and test in tests + always (funcall test (getf a name) (getf b name)))))) + + Ordering two rows is a bit more complex. In Lisp, comparator +functions return true if their first argument should be sorted ahead of +the second and NIL otherwise. Thus, a NIL can mean that the second +argument should be sorted ahead of the first or that they're equivalent. +You want your row comparators to behave the same way: return T if the +first row should be sorted ahead of the second and NIL otherwise. + + Thus, to compare two rows, you should compare the values from the +columns you're sorting by, in order, using the appropriate comparator +for each column. First call the comparator with the value from the +first row as the first argument. If the comparator returns true, that +means the first row should definitely be sorted ahead of the second row, +so you can immediately return T. + + But if the column comparator returns NIL, then you need to determine +whether that's because the second value should sort ahead of the first +value or because they're equivalent. So you should call the comparator +again with the arguments reversed. If the comparator returns true this +time, it means the second column value sorts ahead of the first and thus +the second row ahead of the first row, so you can return NIL +immediately. Otherwise, the column values are equivalent, and you need +to move onto the next column. If you get through all the columns +without one row's value ever winning the comparison, then the rows are +equivalent, and you return NIL. A function that implements this +algorithm looks like this: + + (defun row-comparator (column-names schema) + (let ((comparators (mapcar #'comparator (extract-schema column-names schema)))) + #'(lambda (a b) + (loop + for name in column-names + for comparator in comparators + for a-value = (getf a name) + for b-value = (getf b name) + when (funcall comparator a-value b-value) return t + when (funcall comparator b-value a-value) return nil + finally (return nil))))) + + +File: pcl.info, Node: 27-5, Next: 27-6, Prev: 27-4, Up: Chapter 27 + +Matching Functions +================== + +The :where argument to select can be any function that takes a row +object and returns true if it should be included in the results. In +practice, however, you'll rarely need the full power of arbitrary code +to express query criteria. So you should provide two functions, +matching and in, that will build query functions that allow you to +express the common kinds of queries and that take care of using the +proper equality predicates and value normalizers for each column. + + The workhouse query-function constructor will be matching, which +returns a function that will match rows with specific column values. +You saw how it was used in the earlier examples of select. For +instance, this call to matching: + + (matching *mp3s* :artist "Green Day") + + returns a function that matches rows whose :artist value is "Green +Day". You can also pass multiple names and values; the returned +function matches when all the columns match. For example, the following +returns a closure that matches rows where the artist is "Green Day" and +the album is "American Idiot": + + (matching *mp3s* :artist "Green Day" :album "American Idiot") + + You have to pass matching the table object because it needs access to +the table's schema in order to get at the equality predicates and value +normalizer functions for the columns it matches against. + + You build up the function returned by matching out of smaller +functions, each responsible for matching one column's value. To build +these functions, you should define a function, column-matcher, that +takes a column object and an unnormalized value you want to match and +returns a function that accepts a single row and returns true when the +value of the given column in the row matches the normalized version of +the given value. + + (defun column-matcher (column value) + (let ((name (name column)) + (predicate (equality-predicate column)) + (normalized (normalize-for-column value column))) + #'(lambda (row) (funcall predicate (getf row name) normalized)))) + + You then build a list of column-matching functions for the names and +values you care about with the following function, column-matchers: + + (defun column-matchers (schema names-and-values) + (loop for (name value) on names-and-values by #'cddr + when value collect + (column-matcher (find-column name schema) value))) + + Now you can implement matching. Again, note that you do as much work +as possible outside the closure in order to do it only once rather than +once per row in the table. + + (defun matching (table &rest names-and-values) + "Build a where function that matches rows with the given column values." + (let ((matchers (column-matchers (schema table) names-and-values))) + #'(lambda (row) + (every #'(lambda (matcher) (funcall matcher row)) matchers)))) + + This function is a bit of a twisty maze of closures, but it's worth +contemplating for a moment to get a flavor of the possibilities of +programming with functions as first-class objects. + + The job of matching is to return a function that will be invoked on +each row in a table to determine whether it should be included in the +new table. So, matching returns a closure with one parameter, row. + + Now recall that the function EVERY takes a predicate function as its +first argument and returns true if, and only if, that function returns +true each time it's applied to an element of the list passed as EVERY's +second argument. However, in this case, the list you pass to EVERY is +itself a list of functions, the column matchers. What you want to know +is that every column matcher, when invoked on the row you're currently +testing, returns true. So, as the predicate argument to EVERY, you pass +yet another closure that FUNCALLs the column matcher, passing it the +row. + + Another matching function that you'll occasionally find useful is in, +which returns a function that matches rows where a particular column is +in a given set of values. You'll define in to take two arguments: a +column name and a table that contains the values you want to match. For +instance, suppose you wanted to find all the songs in the MP3 database +that have names the same as a song performed by the Dixie Chicks. You +can write that where clause using in and a subselect like this: (1) + + (select + :columns '(:artist :song) + :from *mp3s* + :where (in :song + (select + :columns :song + :from *mp3s* + :where (matching *mp3s* :artist "Dixie Chicks")))) + + Although the queries are more complex, the definition of in is much +simpler than that of matching. + + (defun in (column-name table) + (let ((test (equality-predicate (find-column column-name (schema table)))) + (values (map 'list #'(lambda (r) (getf r column-name)) (rows table)))) + #'(lambda (row) + (member (getf row column-name) values :test test)))) + + ---------- Footnotes ---------- + + (1) This query will also return all the songs performed by the Dixie +Chicks. If you want to limit it to songs by artists other than the +Dixie Chicks, you need a more complex :where function. Since the :where +argument can be any function, it's certainly possible; you could remove +the Dixie Chicks' own songs with this query: + + (let* ((dixie-chicks (matching *mp3s* :artist "Dixie Chicks")) + (same-song (in :song (select :columns :song :from *mp3s* :where dixie-chicks))) + (query #'(lambda (row) (and (not (funcall dixie-chicks row)) (funcall same-song row))))) + (select :columns '(:artist :song) :from *mp3s* :where query)) + + This obviously isn't quite as convenient. If you were going to write +an application that needed to do lots of complex queries, you might want +to consider coming up with a more expressive query language. + + +File: pcl.info, Node: 27-6, Next: 27-7, Prev: 27-5, Up: Chapter 27 + +Getting at the Results +====================== + +Since select returns another table, you need to think a bit about how +you want to get at the individual row and column values in a table. If +you're sure you'll never want to change the way you represent the data +in a table, you can just make the structure of a table part of the +API-that table has a slot rows that's a vector of plists-and use all the +normal Common Lisp functions for manipulating vectors and plists to get +at the values in the table. But that representation is really an +internal detail that you might want to change. Also, you don't +necessarily want other code manipulating the data structures +directly-for instance, you don't want anyone to use SETF to put an +unnormalized column value into a row. So it might be a good idea to +define a few abstractions that provide the operations you want to +support. Then if you decide to change the internal representation +later, you'll need to change only the implementation of these functions +and macros. And while Common Lisp doesn't enable you to absolutely +prevent folks from getting at "internal" data, by providing an official +API you at least make it clear where the boundary is. + + Probably the most common thing you'll need to do with the results of +a query is to iterate over the individual rows and extract specific +column values. So you need to provide a way to do both those things +without touching the rows vector directly or using GETF to get at the +column values within a row. + + For now these operations are trivial to implement; they're merely +wrappers around the code you'd write if you didn't have these +abstractions. You can provide two ways to iterate over the rows of a +table: a macro do-rows, which provides a basic looping construct, and a +function map-rows, which builds a list containing the results of +applying a function to each row in the table. (1) + + (defmacro do-rows ((row table) &body body) + `(loop for ,row across (rows ,table) do ,@body)) + + (defun map-rows (fn table) + (loop for row across (rows table) collect (funcall fn row))) + + To get at individual column values within a row, you should provide a +function, column-value, that takes a row and a column name and returns +the appropriate value. Again, it's a trivial wrapper around the code +you'd write otherwise. But if you change the internal representation of +a table later, users of column-value needn't be any the wiser. + + (defun column-value (row column-name) + (getf row column-name)) + + While column-value is a sufficient abstraction for getting at column +values, you'll often want to get at the values of multiple columns at +once. So you can provide a bit of syntactic sugar, a macro, +with-column-values, that binds a set of variables to the values +extracted from a row using the corresponding keyword names. Thus, +instead of writing this: + + (do-rows (row table) + (let ((song (column-value row :song)) + (artist (column-value row :artist)) + (album (column-value row :album))) + (format t "~a by ~a from ~a~%" song artist album))) + + you can simply write the following: + + (do-rows (row table) + (with-column-values (song artist album) row + (format t "~a by ~a from ~a~%" song artist album))) + + Again, the actual implementation isn't complicated if you use the +once-only macro from Chapter 8. + + (defmacro with-column-values ((&rest vars) row &body body) + (once-only (row) + `(let ,(column-bindings vars row) ,@body))) + + (defun column-bindings (vars row) + (loop for v in vars collect `(,v (column-value ,row ,(as-keyword v))))) + + (defun as-keyword (symbol) + (intern (symbol-name symbol) :keyword)) + + Finally, you should provide abstractions for getting at the number of +rows in a table and for accessing a specific row by numeric index. + + (defun table-size (table) + (length (rows table))) + + (defun nth-row (n table) + (aref (rows table) n)) + + ---------- Footnotes ---------- + + (1) The version of LOOP implemented at M.I.T. before Common Lisp was +standardized included a mechanism for extending the LOOP grammar to +support iteration over new data structures. Some Common Lisp +implementations that inherited their LOOP implementation from that code +base may still support that facility, which would make do-rows and +map-rows less necessary. + + +File: pcl.info, Node: 27-7, Next: Chapter 28, Prev: 27-6, Up: Chapter 27 + +Other Database Operations +========================= + +Finally, you'll implement a few other database operations that you'll +need in Chapter 29. The first two are analogs of the SQL DELETE +statement. The function delete-rows is used to delete rows from a table +that match particular criteria. Like select, it takes :from and :where +keyword arguments. Unlike select, it doesn't return a new table-it +actually modifies the table passed as the :from argument. + + (defun delete-rows (&key from where) + (loop + with rows = (rows from) + with store-idx = 0 + for read-idx from 0 + for row across rows + do (setf (aref rows read-idx) nil) + unless (funcall where row) do + (setf (aref rows store-idx) row) + (incf store-idx) + finally (setf (fill-pointer rows) store-idx))) + + In the interest of efficiency, you might want to provide a separate +function for deleting all the rows from a table. + + (defun delete-all-rows (table) + (setf (rows table) (make-rows *default-table-size*))) + + The remaining table operations don't really map to normal relational +database operations but will be useful in the MP3 browser application. +The first is a function to sort the rows of a table in place. + + (defun sort-rows (table &rest column-names) + (setf (rows table) (sort (rows table) (row-comparator column-names (schema table)))) + table) + + On the flip side, in the MP3 browser application, you'll need a +function that shuffles a table's rows in place using the function +nshuffle-vector from Chapter 23. + + (defun shuffle-table (table) + (nshuffle-vector (rows table)) + table) + + And finally, again for the purposes of the MP3 browser, you should +provide a function that selects n random rows, returning the results as +a new table. It also uses nshuffle-vector along with a version of +random-sample based on Algorithm S from Donald Knuth's The Art of +Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition +(Addison-Wesley, 1998) that I discussed in Chapter 20. + + (defun random-selection (table n) + (make-instance + 'table + :schema (schema table) + :rows (nshuffle-vector (random-sample (rows table) n)))) + + (defun random-sample (vector n) + "Based on Algorithm S from Knuth. TAOCP, vol. 2. p. 142" + (loop with selected = (make-array n :fill-pointer 0) + for idx from 0 + do + (loop + with to-select = (- n (length selected)) + for remaining = (- (length vector) idx) + while (>= (* remaining (random 1.0)) to-select) + do (incf idx)) + (vector-push (aref vector idx) selected) + when (= (length selected) n) return selected)) + + With this code you'll be ready, in Chapter 29, to build a Web +interface for browsing a collection of MP3 files. But before you get to +that, you need to implement the part of the server that streams MP3s +using the Shoutcast protocol, which is the topic of the next chapter. + + +File: pcl.info, Node: Chapter 28, Next: Chapter 29, Prev: Chapter 27, Up: Top + +28. Practical: A Shoutcast Server +================================= + +In this chapter you'll develop another important part of what will +eventually be a Web-based application for streaming MP3s, namely, the +server that implements the Shoutcast protocol for actually streaming +MP3s to clients such as iTunes, XMMS, (1) or Winamp. + +* Menu: + +* 28-1:: The Shoutcast Protocol +* 28-2:: Song Sources +* 28-3:: Implementing Shoutcast + + ---------- Footnotes ---------- + + (1) The version of XMMS shipped with Red Hat 8.0 and 9.0 and Fedora +no longer knows how to play MP3s because the folks at Red Hat were +worried about the licensing issues related to the MP3 codec. To get an +XMMS with MP3 support on these versions of Linux, you can grab the +source from http://www.xmms.org and build it yourself. Or, see +http://www.fedorafaq.org/#xmms-mp3 for information about other +possibilities. + + +File: pcl.info, Node: 28-1, Next: 28-2, Prev: Chapter 28, Up: Chapter 28 + +The Shoutcast Protocol +====================== + +The Shoutcast protocol was invented by the folks at Nullsoft, the makers +of the Winamp MP3 software. It was designed to support Internet audio +broadcasting-Shoutcast DJs send audio data from their personal computers +to a central Shoutcast server that then turns around and streams it out +to any connected listeners. + + The server you'll build is actually only half a true Shoutcast +server-you'll use the protocol that Shoutcast servers use to stream MP3s +to listeners, but your server will be able to serve only songs already +stored on the file system of the computer where the server is running. + + You need to worry about only two parts of the Shoutcast protocol: the +request that a client makes in order to start receiving a stream and the +format of the response, including the mechanism by which metadata about +what song is currently playing is embedded in the stream. + + The initial request from the MP3 client to the Shoutcast server is +formatted as a normal HTTP request. In response, the Shoutcast server +sends an ICY response that looks like an HTTP response except with the +string "ICY" (1) in place of the normal HTTP version string and with +different headers. After sending the headers and a blank line, the +server streams a potentially endless amount of MP3 data. + + The only tricky thing about the Shoutcast protocol is the way +metadata about the songs being streamed is embedded in the data sent to +the client. The problem facing the Shoutcast designers was to provide a +way for the Shoutcast server to communicate new title information to the +client each time it started playing a new song so the client could +display it in its UI. (Recall from Chapter 25 that the MP3 format +doesn't make any provision for encoding metadata.) While one of the +design goals of ID3v2 had been to make it better suited for use when +streaming MP3s, the Nullsoft folks decided to go their own route and +invent a new scheme that's fairly easy to implement on both the client +side and the server side. That, of course, was ideal for them since +they were also the authors of their own MP3 client. + + Their scheme was to simply ignore the structure of MP3 data and embed +a chunk of self-delimiting metadata every n bytes. The client would +then be responsible for stripping out this metadata so it wasn't treated +as MP3 data. Since metadata sent to a client that isn't ready for it +will cause glitches in the sound, the server is supposed to send +metadata only if the client's original request contains a special +Icy-Metadata header. And in order for the client to know how often to +expect metadata, the server must send back a header Icy-Metaint whose +value is the number of bytes of MP3 data that will be sent between each +chunk of metadata. + + The basic content of the metadata is a string of the form +"StreamTitle='title';" where title is the title of the current song and +can't contain single quote marks. This payload is encoded as a +length-delimited array of bytes: a single byte is sent indicating how +many 16-byte blocks follow, and then that many blocks are sent. They +contain the string payload as an ASCII string, with the final block +padded out with null bytes as necessary. + + Thus, the smallest legal metadata chunk is a single byte, zero, +indicating zero subsequent blocks. If the server doesn't need to update +the metadata, it can send such an empty chunk, but it must send at least +the one byte so the client doesn't throw away actual MP3 data. + + ---------- Footnotes ---------- + + (1) To further confuse matters, there's a different streaming +protocol called Icecast. There seems to be no connection between the +ICY header used by Shoutcast and the Icecast protocol. + + +File: pcl.info, Node: 28-2, Next: 28-3, Prev: 28-1, Up: Chapter 28 + +Song Sources +============ + +Because a Shoutcast server has to keep streaming songs to the client for +as long as it's connected, you need to provide your server with a source +of songs to draw on. In the Web-based application, each connected +client will have a playlist that can be manipulated via the Web +interface. But in the interest of avoiding excessive coupling, you +should define an interface that the Shoutcast server can use to obtain +songs to play. You can write a simple implementation of this interface +now and then a more complex one as part of the Web application you'll +build in Chapter 29. + +* Menu: + +* 28-2-1:: The Package + + The idea behind the interface is that the Shoutcast server will find +a source of songs based on an ID extracted from the AllegroServe request +object. It can then do three things with the song source it's given. + + * Get the current song from the source + * Tell the song source that it's done with the current song + * Ask the source whether the song it was given earlier is still the + current song + + The last operation is necessary because there may be ways-and will be +in Chapter 29-to manipulate the songs source outside the Shoutcast +server. You can express the operations the Shoutcast server needs with +the following generic functions: + + (defgeneric current-song (source) + (:documentation "Return the currently playing song or NIL.")) + + (defgeneric maybe-move-to-next-song (song source) + (:documentation + "If the given song is still the current one update the value + returned by current-song.")) + + (defgeneric still-current-p (song source) + (:documentation + "Return true if the song given is the same as the current-song.")) + + The function maybe-move-to-next-song is defined the way it is so a +single operation checks whether the song is current and, if it is, moves +the song source to the next song. This will be important in the next +chapter when you need to implement a song source that can be safely +manipulated from two different threads. (1) + + To represent the information about a song that the Shoutcast server +needs, you can define a class, song, with slots to hold the name of the +MP3 file, the title to send in the Shoutcast metadata, and the size of +the ID3 tag so you can skip it when serving up the file. + + (defclass song () + ((file :reader file :initarg :file) + (title :reader title :initarg :title) + (id3-size :reader id3-size :initarg :id3-size))) + + The value returned by current-song (and thus the first argument to +still-current-p and maybe-move-to-next-song) will be an instance of +song. + + In addition, you need to define a generic function that the server +can use to find a song source based on the type of source desired and +the request object. Methods will specialize the type parameter in order +to return different kinds of song source and will pull whatever +information they need from the request object to determine which source +to return. + + (defgeneric find-song-source (type request) + (:documentation "Find the song-source of the given type for the given request.")) + + However, for the purposes of this chapter, you can use a trivial +implementation of this interface that always uses the same object, a +simple queue of song objects that you can manipulate from the REPL. You +can start by defining a class, simple-song-queue, and a global variable, +*songs*, that holds an instance of this class. + + (defclass simple-song-queue () + ((songs :accessor songs :initform (make-array 10 :adjustable t :fill-pointer 0)) + (index :accessor index :initform 0))) + + (defparameter *songs* (make-instance 'simple-song-queue)) + + Then you can define a method on find-song-source that specializes +type with an EQL specializer on the symbol singleton and returns the +instance stored in *songs*. + + (defmethod find-song-source ((type (eql 'singleton)) request) + (declare (ignore request)) + *songs*) + + Now you just need to implement methods on the three generic functions +that the Shoutcast server will use. + + (defmethod current-song ((source simple-song-queue)) + (when (array-in-bounds-p (songs source) (index source)) + (aref (songs source) (index source)))) + + (defmethod still-current-p (song (source simple-song-queue)) + (eql song (current-song source))) + + (defmethod maybe-move-to-next-song (song (source simple-song-queue)) + (when (still-current-p song source) + (incf (index source)))) + + And for testing purposes you should provide a way to add songs to +this queue. + + (defun add-file-to-songs (file) + (vector-push-extend (file->song file) (songs *songs*))) + + (defun file->song (file) + (let ((id3 (read-id3 file))) + (make-instance + 'song + :file (namestring (truename file)) + :title (format nil "~a by ~a from ~a" (song id3) (artist id3) (album id3)) + :id3-size (size id3)))) + + ---------- Footnotes ---------- + + (1) Technically, the implementation in this chapter will also be +manipulated from two threads-the AllegroServe thread running the +Shoutcast server and the REPL thread. But you can live with the race +condition for now. I'll discuss how to use locking to make code thread +safe in the next chapter. + + +File: pcl.info, Node: 28-2-1, Next: 28-2, Prev: 28-2, Up: 28-2 + +The Package +=========== + +The package for the code you'll develop in this chapter looks like this: + + (defpackage :com.gigamonkeys.shoutcast + (:use :common-lisp + :net.aserve + :com.gigamonkeys.id3v2) + (:export :song + :file + :title + :id3-size + :find-song-source + :current-song + :still-current-p + :maybe-move-to-next-song + :*song-source-type*)) + + +File: pcl.info, Node: 28-3, Next: Chapter 29, Prev: 28-2, Up: Chapter 28 + +Implementing Shoutcast +====================== + +Now you're ready to implement the Shoutcast server. Since the Shoutcast +protocol is loosely based on HTTP, you can implement the server as a +function within AllegroServe. However, since you need to interact with +some of the low-level features of AllegroServe, you can't use the +define-url-function macro from Chapter 26. Instead, you need to write a +regular function that looks like this: + + (defun shoutcast (request entity) + (with-http-response + (request entity :content-type "audio/MP3" :timeout *timeout-seconds*) + (prepare-icy-response request *metadata-interval*) + (let ((wants-metadata-p (header-slot-value request :icy-metadata))) + (with-http-body (request entity) + (play-songs + (request-socket request) + (find-song-source *song-source-type* request) + (if wants-metadata-p *metadata-interval*)))))) + + Then publish that function under the path /stream.mp3 like this: (1) + + (publish :path "/stream.mp3" :function 'shoutcast) + + In the call to with-http-response, in addition to the usual request +and entity arguments, you need to pass :content-type and :timeout +arguments. The :content-type argument tells AllegroServe how to set the +Content-Type header it sends. And the :timeout argument specifies the +number of seconds AllegroServe gives the function to generate its +response. By default AllegroServe times out each request after five +minutes. Because you're going to stream an essentially endless sequence +of MP3s, you need much more time. There's no way to tell AllegroServe +to never time out the request, so you should set it to the value of +*timeout-seconds*, which you can define to some suitably large value +such as the number of seconds in ten years. + + (defparameter *timeout-seconds* (* 60 60 24 7 52 10)) + + Then, within the body of the with-http-response and before the call +to with-http-body that will cause the response headers to be sent, you +need to manipulate the reply that AllegroServe will send. The function +prepare-icy-response encapsulates the necessary manipulations: changing +the protocol string from the default of "HTTP" to "ICY" and adding the +Shoutcast-specific headers. (2) You also need, in order to work around +a bug in iTunes, to tell AllegroServe not to use chunked +transfer-encoding. (3) The functions request-reply-protocol-string, +request-uri, and reply-header-slot-value are all part of AllegroServe. + + (defun prepare-icy-response (request metadata-interval) + (setf (request-reply-protocol-string request) "ICY") + (loop for (k v) in (reverse + `((:|icy-metaint| ,(princ-to-string metadata-interval)) + (:|icy-notice1| "<BR>This stream blah blah blah<BR>") + (:|icy-notice2| "More blah") + (:|icy-name| "MyLispShoutcastServer") + (:|icy-genre| "Unknown") + (:|icy-url| ,(request-uri request)) + (:|icy-pub| "1"))) + do (setf (reply-header-slot-value request k) v)) + ;; iTunes, despite claiming to speak HTTP/1.1, doesn't understand + ;; chunked Transfer-encoding. Grrr. So we just turn it off. + (turn-off-chunked-transfer-encoding request)) + + (defun turn-off-chunked-transfer-encoding (request) + (setf (request-reply-strategy request) + (remove :chunked (request-reply-strategy request)))) + + Within the with-http-body of shoutcast, you actually stream the MP3 +data. The function play-songs takes the stream to which it should write +the data, the song source, and the metadata interval it should use or +NIL if the client doesn't want metadata. The stream is the socket +obtained from the request object, the song source is obtained by calling +find-song-source, and the metadata interval comes from the global +variable *metadata-interval*. The type of song source is controlled by +the variable *song-source-type*, which for now you can set to singleton +in order to use the simple-song-queue you implemented previously. + + (defparameter *metadata-interval* (expt 2 12)) + + (defparameter *song-source-type* 'singleton) + + The function play-songs itself doesn't do much-it loops calling the +function play-current, which does all the heavy lifting of sending the +contents of a single MP3 file, skipping the ID3 tag and embedding ICY +metadata. The only wrinkle is that you need to keep track of when to +send the metadata. + + Since you must send metadata chunks at a fixed intervals, regardless +of when you happen to switch from one MP3 file to the next, each time +you call play-current you need to tell it when the next metadata is due, +and when it returns, it must tell you the same thing so you can pass the +information to the next call to play-current. If play-current gets NIL +from the song source, it returns NIL, which allows the play-songs LOOP +to end. + + In addition to handling the looping, play-songs also provides a +HANDLER-CASE to trap the error that will be signaled when the MP3 client +disconnects from the server and one of the writes to the socket, down in +play-current, fails. Since the HANDLER-CASE is outside the LOOP, +handling the error will break out of the loop, allowing play-songs to +return. + + (defun play-songs (stream song-source metadata-interval) + (handler-case + (loop + for next-metadata = metadata-interval + then (play-current + stream + song-source + next-metadata + metadata-interval) + while next-metadata) + (error (e) (format *trace-output* "Caught error in play-songs: ~a" e)))) + + Finally, you're ready to implement play-current, which actually sends +the Shoutcast data. The basic idea is that you get the current song +from the song source, open the song's file, and then loop reading data +from the file and writing it to the socket until either you reach the +end of the file or the current song is no longer the current song. + + There are only two complications: One is that you need to make sure +you send the metadata at the correct interval. The other is that if the +file starts with an ID3 tag, you want to skip it. If you don't worry +too much about I/O efficiency, you can implement play-current like this: + + (defun play-current (out song-source next-metadata metadata-interval) + (let ((song (current-song song-source))) + (when song + (let ((metadata (make-icy-metadata (title song)))) + (with-open-file (mp3 (file song)) + (unless (file-position mp3 (id3-size song)) + (error "Can't skip to position ~d in ~a" (id3-size song) (file song))) + (loop for byte = (read-byte mp3 nil nil) + while (and byte (still-current-p song song-source)) do + (write-byte byte out) + (decf next-metadata) + when (and (zerop next-metadata) metadata-interval) do + (write-sequence metadata out) + (setf next-metadata metadata-interval)) + + (maybe-move-to-next-song song song-source))) + next-metadata))) + + This function gets the current song from the song source and gets a +buffer containing the metadata it'll need to send by passing the title +to make-icy-metadata. Then it opens the file and skips past the ID3 tag +using the two-argument form of FILE-POSITION. Then it commences reading +bytes from the file and writing them to the request stream. (4) + + It'll break out of the loop either when it reaches the end of the +file or when the song source's current song changes out from under it. +In the meantime, whenever next-metadata gets to zero (if you're supposed +to send metadata at all), it writes metadata to the stream and resets +next-metadata. Once it finishes the loop, it checks to see if the song +is still the song source's current song; if it is, that means it broke +out of the loop because it read the whole file, in which case it tells +the song source to move to the next song. Otherwise, it broke out of +the loop because someone changed the current song out from under it, and +it just returns. In either case, it returns the number of bytes left +before the next metadata is due so it can be passed in the next call to +play-current. (5) + + The function make-icy-metadata, which takes the title of the current +song and generates an array of bytes containing a properly formatted +chunk of ICY metadata, is also straightforward. (6) + + (defun make-icy-metadata (title) + (let* ((text (format nil "StreamTitle='~a';" (substitute #\Space #\' title))) + (blocks (ceiling (length text) 16)) + (buffer (make-array (1+ (* blocks 16)) + :element-type '(unsigned-byte 8) + :initial-element 0))) + (setf (aref buffer 0) blocks) + (loop + for char across text + for i from 1 + do (setf (aref buffer i) (char-code char))) + buffer)) + + Depending on how your particular Lisp implementation handles its +streams, and also how many MP3 clients you want to serve at once, the +simple version of play-current may or may not be efficient enough. + + The potential problem with the simple implementation is that you have +to call READ-BYTE and WRITE-BYTE for every byte you transfer. It's +possible that each call may result in a relatively expensive system call +to read or write one byte. And even if Lisp implements its own streams +with internal buffering so not every call to READ-BYTE or WRITE-BYTE +results in a system call, function calls still aren't free. In +particular, in implementations that provide user-extensible streams +using so-called Gray Streams, READ-BYTE and WRITE-BYTE may result in a +generic function call under the covers to dispatch on the class of the +stream argument. While generic function dispatch is normally speedy +enough that you don't have to worry about it, it's a bit more expensive +than a nongeneric function call and thus not something you necessarily +want to do several million times in a few minutes if you can avoid it. + + A more efficient, if slightly more complex, way to implement +play-current is to read and write multiple bytes at a time using the +functions READ-SEQUENCE and WRITE-SEQUENCE. This also gives you a chance +to match your file reads with the natural block size of the file system, +which will likely give you the best disk throughput. Of course, no +matter what buffer size you use, keeping track of when to send the +metadata becomes a bit more complicated. A more efficient version of +play-current that uses READ-SEQUENCE and WRITE-SEQUENCE might look like +this: + + (defun play-current (out song-source next-metadata metadata-interval) + (let ((song (current-song song-source))) + (when song + (let ((metadata (make-icy-metadata (title song))) + (buffer (make-array size :element-type '(unsigned-byte 8)))) + (with-open-file (mp3 (file song)) + (labels ((write-buffer (start end) + (if metadata-interval + (write-buffer-with-metadata start end) + (write-sequence buffer out :start start :end end))) + + (write-buffer-with-metadata (start end) + (cond + ((> next-metadata (- end start)) + (write-sequence buffer out :start start :end end) + (decf next-metadata (- end start))) + (t + (let ((middle (+ start next-metadata))) + (write-sequence buffer out :start start :end middle) + (write-sequence metadata out) + (setf next-metadata metadata-interval) + (write-buffer-with-metadata middle end)))))) + + (multiple-value-bind (skip-blocks skip-bytes) + (floor (id3-size song) (length buffer)) + + (unless (file-position mp3 (* skip-blocks (length buffer))) + (error "Couldn't skip over ~d ~d byte blocks." + skip-blocks (length buffer))) + + (loop for end = (read-sequence buffer mp3) + for start = skip-bytes then 0 + do (write-buffer start end) + while (and (= end (length buffer)) + (still-current-p song song-source))) + + (maybe-move-to-next-song song song-source))))) + next-metadata))) + + Now you're ready to put all the pieces together. In the next chapter +you'll write a Web interface to the Shoutcast server developed in this +chapter, using the MP3 database from Chapter 27 as the source of songs. + + ---------- Footnotes ---------- + + (1) Another thing you may want to do while working on this code is to +evaluate the form (net.aserve::debug-on :notrap). This tells +AllegroServe to not trap errors signaled by your code, which will allow +you to debug them in the normal Lisp debugger. In SLIME this will pop +up a SLIME debugger buffer just like any other error. + + (2) Shoutcast headers are usually sent in lowercase, so you need to +escape the names of the keyword symbols used to identify them to +AllegroServe to keep the Lisp reader from converting them to all +uppercase. Thus, you'd write :|icy-metaint| rather than :icy-metaint. +You could also write :\i\c\y-\m\e\t\a\i\n\t, but that'd be silly. + + (3) The function turn-off-chunked-transfer-encoding is a bit of a +kludge. There's no way to turn off chunked transfer encoding via +AllegroServe's official APIs without specifying a content length because +any client that advertises itself as an HTTP/1.1 client, which iTunes +does, is supposed to understand it. But this does the trick. + + (4) Most MP3-playing software will display the metadata somewhere in +the user interface. However, the XMMS program on Linux by default +doesn't. To get XMMS to display Shoutcast metadata, press Ctrl+P to see +the Preferences pane. Then in the Audio I/O Plugins tab (the leftmost +tab in version 1.2.10), select the MPEG Layer 1/2/3 Player +(libmpg123.so) and hit the Configure button. Then select the Streaming +tab on the configuration window, and at the bottom of the tab in the +SHOUTCAST/Icecast section, check the "Enable SHOUTCAST/Icecast title +streaming" box. + + (5) Folks coming to Common Lisp from Scheme might wonder why +play-current can't just call itself recursively. In Scheme that would +work fine since Scheme implementations are required by the Scheme +specification to support "an unbounded number of active tail calls." +Common Lisp implementations are allowed to have this property, but it +isn't required by the language standard. Thus, in Common Lisp the +idiomatic way to write loops is with a looping construct, not with +recursion. + + (6) This function assumes, as has other code you've written, that +your Lisp implementation's internal character encoding is ASCII or a +superset of ASCII, so you can use CHAR-CODE to translate Lisp CHARACTER +objects to bytes of ASCII data. + + +File: pcl.info, Node: Chapter 29, Next: Chapter 30, Prev: Chapter 28, Up: Top + +29. Practical: An MP3 Browser +============================= + +The final step in building the MP3 streaming application is to provide a +Web interface that allows a user to find the songs they want to listen +to and add them to a playlist that the Shoutcast server will draw upon +when the user's MP3 client requests the stream URL. For this component +of the application, you'll pull together several bits of code from the +previous few chapters: the MP3 database, the define-url-function macro +from Chapter 26, and, of course, the Shoutcast server itself. + +* Menu: + +* 29-1:: Playlists +* 29-2:: Playlists As Song Sources +* 29-3:: Manipulating the Playlist +* 29-4:: Query Parameter Types +* 29-5:: Boilerplate HTML +* 29-6:: The Browse Page +* 29-7:: The Playlist +* 29-8:: Finding a Playlist +* 29-9:: Running the App + + +File: pcl.info, Node: 29-1, Next: 29-2, Prev: Chapter 29, Up: Chapter 29 + +Playlists +========= + +The basic idea behind the interface will be that each MP3 client that +connects to the Shoutcast server gets its own playlist, which serves as +the source of songs for the Shoutcast server. A playlist will also +provide facilities beyond those needed by the Shoutcast server: through +the Web interface the user will be able to add songs to the playlist, +delete songs already in the playlist, and reorder the playlist by +sorting and shuffling. + + You can define a class to represent playlists like this: + + (defclass playlist () + ((id :accessor id :initarg :id) + (songs-table :accessor songs-table :initform (make-playlist-table)) + (current-song :accessor current-song :initform *empty-playlist-song*) + (current-idx :accessor current-idx :initform 0) + (ordering :accessor ordering :initform :album) + (shuffle :accessor shuffle :initform :none) + (repeat :accessor repeat :initform :none) + (user-agent :accessor user-agent :initform "Unknown") + (lock :reader lock :initform (make-process-lock)))) + + The id of a playlist is the key you extract from the request object +passed to find-song-source when looking up a playlist. You don't +actually need to store it in the playlist object, but it makes debugging +a bit easier if you can find out from an arbitrary playlist object what +its id is. + + The heart of the playlist is the songs-table slot, which will hold a +table object. The schema for this table will be the same as for the +main MP3 database. The function make-playlist-table, which you use to +initialize songs-table, is simply this: + + (defun make-playlist-table () + (make-instance 'table :schema *mp3-schema*)) + +* Menu: + +* 29-1-1:: The Package + + By storing the list of songs as a table, you can use the database +functions from Chapter 27 to manipulate the playlist: you can add to the +playlist with insert-row, delete songs with delete-rows, and reorder the +playlist with sort-rows and shuffle-table. + + The current-song and current-idx slots keep track of which song is +playing: current-song is an actual song object, while current-idx is the +index into the songs-table of the row representing the current song. +You'll see in the section "Manipulating the Playlist" how to make sure +current-song is updated whenever current-idx changes. + + The ordering and shuffle slots hold information about how the songs +in songs-table are to be ordered. The ordering slot holds a keyword +that tells how the songs-table should be sorted when it's not shuffled. +The legal values are :genre, :artist, :album, and :song. The shuffle +slot holds one of the keywords :none, :song, or :album, which specifies +how songs-table should be shuffled, if at all. + + The repeat slot also holds a keyword, one of :none, :song, or :all, +which specifies the repeat mode for the playlist. If repeat is :none, +after the last song in the songs-table has been played, the current-song +goes back to a default MP3. When :repeat is :song, the playlist keeps +returning the same current-song forever. And if it's :all, after the +last song, current-song goes back to the first song. + + The user-agent slot holds the value of the User-Agent header sent by +the MP3 client in its request for the stream. You need to hold onto +this value purely for use in the Web interface-the User-Agent header +identifies the program that made the request, so you can display the +value on the page that lists all the playlists to make it easier to tell +which playlist goes with which connection when multiple clients connect. + + Finally, the lock slot holds a process lock created with the function +make-process-lock, which is part of Allegro's MULTIPROCESSING package. +You'll need to use that lock in certain functions that manipulate +playlist objects to ensure that only one thread at a time manipulates a +given playlist object. You can define the following macro, built upon +the with-process-lock macro from MULTIPROCESSING, to give an easy way to +wrap a body of code that should be performed while holding a playlist's +lock: + + (defmacro with-playlist-locked ((playlist) &body body) + `(with-process-lock ((lock ,playlist)) + ,@body)) + + The with-process-lock macro acquires exclusive access to the process +lock given and then executes the body forms, releasing the lock +afterward. By default, with-process-lock allows recursive locks, +meaning the same thread can safely acquire the same lock multiple times. + + +File: pcl.info, Node: 29-1-1, Next: 29-1, Prev: 29-1, Up: 29-1 + +The Package +=========== + +You can define the package for the code in this chapter with the +following DEFPACKAGE: + + (defpackage :com.gigamonkeys.mp3-browser + (:use :common-lisp + :net.aserve + :com.gigamonkeys.html + :com.gigamonkeys.shoutcast + :com.gigamonkeys.url-function + :com.gigamonkeys.mp3-database + :com.gigamonkeys.id3v2) + (:import-from :acl-socket + :ipaddr-to-dotted + :remote-host) + (:import-from :multiprocessing + :make-process-lock + :with-process-lock) + (:export :start-mp3-browser)) + + Because this is a high-level application, it uses a lot of +lower-level packages. It also imports three symbols from the ACL-SOCKET +package and two more from MULTIPROCESSING since it just needs those five +and not the other 139 symbols those two packages export. + + +File: pcl.info, Node: 29-2, Next: 29-3, Prev: 29-1, Up: Chapter 29 + +Playlists As Song Sources +========================= + +To use playlists as a source of songs for the Shoutcast server, you'll +need to implement a method on the generic function find-song-source from +Chapter 28. Since you're going to have multiple playlists, you need a +way to find the right one for each client that connects to the server. +The mapping part is easy-you can define a variable that holds an EQUAL +hash table that you can use to map from some identifier to the playlist +object. + + (defvar *playlists* (make-hash-table :test #'equal)) + + You'll also need to define a process lock to protect access to this +hash table like this: + + (defparameter *playlists-lock* (make-process-lock :name "playlists-lock")) + + Then define a function that looks up a playlist given an ID, creating +a new playlist object if necessary and using with-process-lock to ensure +that only one thread at a time manipulates the hash table. (1) + + (defun lookup-playlist (id) + (with-process-lock (*playlists-lock*) + (or (gethash id *playlists*) + (setf (gethash id *playlists*) (make-instance 'playlist :id id))))) + + Then you can implement find-song-source on top of that function and +another, playlist-id, that takes an AllegroServe request object and +returns the appropriate playlist identifier. The find-song-source +function is also where you grab the User-Agent string out of the request +object and stash it in the playlist object. + + (defmethod find-song-source ((type (eql 'playlist)) request) + (let ((playlist (lookup-playlist (playlist-id request)))) + (with-playlist-locked (playlist) + (let ((user-agent (header-slot-value request :user-agent))) + (when user-agent (setf (user-agent playlist) user-agent)))) + playlist)) + + The trick, then, is how you implement playlist-id, the function that +extracts the identifier from the request object. You have a couple +options, each with different implications for the user interface. You +can pull whatever information you want out of the request object, but +however you decide to identify the client, you need some way for the +user of the Web interface to get hooked up to the right playlist. + + For now you can take an approach that "just works" as long as there's +only one MP3 client per machine connecting to the server and as long as +the user is browsing the Web interface from the machine running the MP3 +client: you'll use the IP address of the client machine as the +identifier. This way you can find the right playlist for a request +regardless of whether the request is from the MP3 client or a Web +browser. You will, however, provide a way in the Web interface to +select a different playlist from the browser, so the only real +constraint this choice puts on the application is that there can be only +one connected MP3 client per client IP address. (2) The implementation +of playlist-id looks like this: + + (defun playlist-id (request) + (ipaddr-to-dotted (remote-host (request-socket request)))) + + The function request-socket is part of AllegroServe, while +remote-host and ipaddr-to-dotted are part of Allegro's socket library. + + To make a playlist usable as a song source by the Shoutcast server, +you need to define methods on current-song, still-current-p, and +maybe-move-to-next-song that specialize their source parameter on +playlist. The current-song method is already taken care of: by defining +the accessor current-song on the eponymous slot, you automatically got a +current-song method specialized on playlist that returns the value of +that slot. However, to make accesses to the playlist thread safe, you +need to lock the playlist before accessing the current-song slot. In +this case, the easiest way is to define an :around method like the +following: + + (defmethod current-song :around ((playlist playlist)) + (with-playlist-locked (playlist) (call-next-method))) + + Implementing still-current-p is also quite simple, assuming you can +be sure that current-song gets updated with a new song object only when +the current song actually changes. Again, you need to acquire the +process lock to ensure you get a consistent view of the playlist's +state. + + (defmethod still-current-p (song (playlist playlist)) + (with-playlist-locked (playlist) + (eql song (current-song playlist)))) + + The trick, then, is to make sure the current-song slot gets updated +at the right times. However, the current song can change in a number of +ways. The obvious one is when the Shoutcast server calls +maybe-move-to-next-song. But it can also change when songs are added to +the playlist, when the Shoutcast server has run out of songs, or even if +the playlist's repeat mode is changed. + + Rather than trying to write code specific to every situation to +determine whether to update current-song, you can define a function, +update-current-if-necessary, that updates current-song if the song +object in current-song no longer matches the file that the current-idx +slot says should be playing. Then, if you call this function after any +manipulation of the playlist that could possibly put those two slots out +of sync, you're sure to keep current-song set properly. Here are +update-current-if-necessary and its helper functions: + + (defun update-current-if-necessary (playlist) + (unless (equal (file (current-song playlist)) + (file-for-current-idx playlist)) + (reset-current-song playlist))) + + (defun file-for-current-idx (playlist) + (if (at-end-p playlist) + nil + (column-value (nth-row (current-idx playlist) (songs-table playlist)) :file))) + + (defun at-end-p (playlist) + (>= (current-idx playlist) (table-size (songs-table playlist)))) + + You don't need to add locking to these functions since they'll be +called only from functions that will take care of locking the playlist +first. + + The function reset-current-song introduces one more wrinkle: because +you want the playlist to provide an endless stream of MP3s to the +client, you don't want to ever set current-song to NIL. Instead, when a +playlist runs out of songs to play-when songs-table is empty or after +the last song has been played and repeat is set to :none-then you need +to set current-song to a special song whose file is an MP3 of silence +(3) and whose title explains why no music is playing. Here's some code +to define two parameters, *empty-playlist-song* and +*end-of-playlist-song*, each set to a song with the file named by +*silence-mp3* as their file and an appropriate title: + + (defparameter *silence-mp3* ...) + + (defun make-silent-song (title &optional (file *silence-mp3*)) + (make-instance + 'song + :file file + :title title + :id3-size (if (id3-p file) (size (read-id3 file)) 0))) + + (defparameter *empty-playlist-song* (make-silent-song "Playlist empty.")) + + (defparameter *end-of-playlist-song* (make-silent-song "At end of playlist.")) + + reset-current-song uses these parameters when the current-idx doesn't +point to a row in songs-table. Otherwise, it sets current-song to a +song object representing the current row. + + (defun reset-current-song (playlist) + (setf + (current-song playlist) + (cond + ((empty-p playlist) *empty-playlist-song*) + ((at-end-p playlist) *end-of-playlist-song*) + (t (row->song (nth-row (current-idx playlist) (songs-table playlist))))))) + + (defun row->song (song-db-entry) + (with-column-values (file song artist album id3-size) song-db-entry + (make-instance + 'song + :file file + :title (format nil "~a by ~a from ~a" song artist album) + :id3-size id3-size))) + + (defun empty-p (playlist) + (zerop (table-size (songs-table playlist)))) + + Now, at last, you can implement the method on maybe-move-to-next-song +that moves current-idx to its next value, based on the playlist's repeat +mode, and then calls update-current-if-necessary. You don't change +current-idx when it's already at the end of the playlist because you +want it to keep its current value, so it'll point at the next song you +add to the playlist. This function must lock the playlist before +manipulating it since it's called by the Shoutcast server code, which +doesn't do any locking. + + (defmethod maybe-move-to-next-song (song (playlist playlist)) + (with-playlist-locked (playlist) + (when (still-current-p song playlist) + (unless (at-end-p playlist) + (ecase (repeat playlist) + (:song) ; nothing changes + (:none (incf (current-idx playlist))) + (:all (setf (current-idx playlist) + (mod (1+ (current-idx playlist)) + (table-size (songs-table playlist))))))) + (update-current-if-necessary playlist)))) + + ---------- Footnotes ---------- + + (1) The intricacies of concurrent programming are beyond the scope of +this book. The basic idea is that if you have multiple threads of +control-as you will in this application with some threads running the +shoutcast function and other threads responding to requests from the +browser-then you need to make sure only one thread at a time manipulates +an object in order to prevent one thread from seeing the object in an +inconsistent state while another thread is working on it. In this +function, for instance, if two new MP3 clients are connecting at the +same time, they'd both try to add an entry to *playlists* and might +interfere with each other. The with-process-lock ensures that each +thread gets exclusive access to the hash table for long enough to do the +work it needs to do. + + (2) This approach also assumes that every client machine has a unique +IP address. This assumption should hold as long as all the users are on +the same LAN but may not hold if clients are connecting from behind a +firewall that does network address translation. Deploying this +application outside a LAN will require some modifications, but if you +want to deploy this application to the wider Internet, you'd better know +enough about networking to figure out an appropriate scheme yourself. + + (3) Unfortunately, because of licensing issues around the MP3 format, +it's not clear that it's legal for me to provide you with such an MP3 +without paying licensing fees to Fraunhofer IIS. I got mine as part of +the software that came with my Slimp3 from Slim Devices. You can grab +it from their Subversion repository via the Web at +http://svn.slimdevices.com/*checkout*/trunk/server/ +HTML/EN/html/silentpacket.mp3?rev=2. Or buy a Squeezebox, the new, +wireless version of Slimp3, and you'll get silentpacket.mp3 as part of +the software that comes with it. Or find an MP3 of John Cage's piece +4'33". + + +File: pcl.info, Node: 29-3, Next: 29-4, Prev: 29-2, Up: Chapter 29 + +Manipulating the Playlist +========================= + +The rest of the playlist code is functions used by the Web interface to +manipulate playlist objects, including adding and deleting songs, +sorting and shuffling, and setting the repeat mode. As in the helper +functions in the previous section, you don't need to worry about locking +in these functions because, as you'll see, the lock will be acquired in +the Web interface function that calls these. + + Adding and deleting is mostly a question of manipulating the +songs-table. The only extra work you have to do is to keep the +current-song and current-idx in sync. For instance, whenever the +playlist is empty, its current-idx will be zero, and the current-song +will be the *empty-playlist-song*. If you add a song to an empty +playlist, then the index of zero is now in bounds, and you should change +the current-song to the newly added song. By the same token, when +you've played all the songs in a playlist and current-song is +*end-of-playlist-song*, adding a song should cause current-song to be +reset. All this really means, though, is that you need to call +update-current-if-necessary at the appropriate points. + + Adding songs to a playlist is a bit involved because of the way the +Web interface communicates which songs to add. For reasons I'll discuss +in the next section, the Web interface code can't just give you a simple +set of criteria to use in selecting songs from the database. Instead, +it gives you the name of a column and a list of values, and you're +supposed to add all the songs from the main database where the given +column has a value in the list of values. Thus, to add the right songs, +you need to first build a table object containing the desired values, +which you can then use with an in query against the song database. So, +add-songs looks like this: + + (defun add-songs (playlist column-name values) + (let ((table (make-instance + 'table + :schema (extract-schema (list column-name) (schema *mp3s*))))) + (dolist (v values) (insert-row (list column-name v) table)) + (do-rows (row (select :from *mp3s* :where (in column-name table))) + (insert-row row (songs-table playlist)))) + (update-current-if-necessary playlist)) + + Deleting songs is a bit simpler; you just need to be able to delete +songs from the songs-table that match particular criteria-either a +particular song or all songs in a particular genre, by a particular +artist, or from a particular album. So, you can provide a delete-songs +function that takes keyword/value pairs, which are used to construct a +matching :where clause you can pass to the delete-rows database +function. + + Another complication that arises when deleting songs is that +current-idx may need to change. Assuming the current song isn't one of +the ones just deleted, you'd like it to remain the current song. But if +songs before it in songs-table are deleted, it'll be in a different +position in the table after the delete. So after a call to delete-rows, +you need to look for the row containing the current song and reset +current-idx. If the current song has itself been deleted, then, for +lack of anything better to do, you can reset current-idx to zero. After +updating current-idx, calling update-current-if-necessary will take care +of updating current-song. And if current-idx changed but still points +at the same song, current-song will be left alone. + + (defun delete-songs (playlist &rest names-and-values) + (delete-rows + :from (songs-table playlist) + :where (apply #'matching (songs-table playlist) names-and-values)) + (setf (current-idx playlist) (or (position-of-current playlist) 0)) + (update-current-if-necessary playlist)) + + (defun position-of-current (playlist) + (let* ((table (songs-table playlist)) + (matcher (matching table :file (file (current-song playlist)))) + (pos 0)) + (do-rows (row table) + (when (funcall matcher row) + (return-from position-of-current pos)) + (incf pos)))) + + You can also provide a function to completely clear the playlist, +which uses delete-all-rows and doesn't have to worry about finding the +current song since it has obviously been deleted. The call to +update-current-if-necessary will take care of setting current-song to +NIL. + + (defun clear-playlist (playlist) + (delete-all-rows (songs-table playlist)) + (setf (current-idx playlist) 0) + (update-current-if-necessary playlist)) + + Sorting and shuffling the playlist are related in that the playlist +is always either sorted or shuffled. The shuffle slot says whether the +playlist should be shuffled and if so how. If it's set to :none, then +the playlist is ordered according to the value in the ordering slot. +When shuffle is :song, the playlist will be randomly permuted. And when +it's set to :album, the list of albums is randomly permuted, but the +songs within each album are listed in track order. Thus, the +sort-playlist function, which will be called by the Web interface code +whenever the user selects a new ordering, needs to set ordering to the +desired ordering and set shuffle to :none before calling order-playlist, +which actually does the sort. As in delete-songs, you need to use +position-of-current to reset current-idx to the new location of the +current song. However, this time you don't need to call +update-current-if-necessary since you know the current song is still in +the table. + + (defun sort-playlist (playlist ordering) + (setf (ordering playlist) ordering) + (setf (shuffle playlist) :none) + (order-playlist playlist) + (setf (current-idx playlist) (position-of-current playlist))) + + In order-playlist, you can use the database function sort-rows to +actually perform the sort, passing a list of columns to sort by based on +the value of ordering. + + (defun order-playlist (playlist) + (apply #'sort-rows (songs-table playlist) + (case (ordering playlist) + (:genre '(:genre :album :track)) + (:artist '(:artist :album :track)) + (:album '(:album :track)) + (:song '(:song))))) + + The function shuffle-playlist, called by the Web interface code when +the user selects a new shuffle mode, works in a similar fashion except +it doesn't need to change the value of ordering. Thus, when +shuffle-playlist is called with a shuffle of :none, the playlist goes +back to being sorted according to the most recent ordering. Shuffling +by songs is simple-just call shuffle-table on songs-table. Shuffling by +albums is a bit more involved but still not rocket science. + + (defun shuffle-playlist (playlist shuffle) + (setf (shuffle playlist) shuffle) + (case shuffle + (:none (order-playlist playlist)) + (:song (shuffle-by-song playlist)) + (:album (shuffle-by-album playlist))) + (setf (current-idx playlist) (position-of-current playlist))) + + (defun shuffle-by-song (playlist) + (shuffle-table (songs-table playlist))) + + (defun shuffle-by-album (playlist) + (let ((new-table (make-playlist-table))) + (do-rows (album-row (shuffled-album-names playlist)) + (do-rows (song (songs-for-album playlist (column-value album-row :album))) + (insert-row song new-table))) + (setf (songs-table playlist) new-table))) + + (defun shuffled-album-names (playlist) + (shuffle-table + (select + :columns :album + :from (songs-table playlist) + :distinct t))) + + (defun songs-for-album (playlist album) + (select + :from (songs-table playlist) + :where (matching (songs-table playlist) :album album) + :order-by :track)) + + The last manipulation you need to support is setting the playlist's +repeat mode. Most of the time you don't need to take any extra action +when setting repeat-its value comes into play only in +maybe-move-to-next-song. However, you need to update the current-song +as a result of changing repeat in one situation, namely, if current-idx +is at the end of a nonempty playlist and repeat is being changed to +:song or :all. In that case, you want to continue playing, either +repeating the last song or starting at the beginning of the playlist. +So, you should define an :after method on the generic function (setf +repeat). + + (defmethod (setf repeat) :after (value (playlist playlist)) + (if (and (at-end-p playlist) (not (empty-p playlist))) + (ecase value + (:song (setf (current-idx playlist) (1- (table-size (songs-table playlist))))) + (:none) + (:all (setf (current-idx playlist) 0))) + (update-current-if-necessary playlist))) + + Now you have all the underlying bits you need. All that remains is +the code that will provide a Web-based user interface for browsing the +MP3 database and manipulating playlists. The interface will consist of +three main functions defined with define-url-function: one for browsing +the song database, one for viewing and manipulating a single playlist, +and one for listing all the available playlists. + + But before you get to writing these three functions, you need to +start with some helper functions and HTML macros that they'll use. + + +File: pcl.info, Node: 29-4, Next: 29-5, Prev: 29-3, Up: Chapter 29 + +Query Parameter Types +===================== + +Since you'll be using define-url-function, you need to define a few +methods on the string->type generic function from Chapter 28 that +define-url-function uses to convert string query parameters into Lisp +objects. In this application, you'll need methods to convert strings to +integers, keyword symbols, and a list of values. + + The first two are quite simple. + + (defmethod string->type ((type (eql 'integer)) value) + (parse-integer (or value "") :junk-allowed t)) + + (defmethod string->type ((type (eql 'keyword)) value) + (and (plusp (length value)) (intern (string-upcase value) :keyword))) + + The last string->type method is slightly more complex. For reasons +I'll get to in a moment, you'll need to generate pages that display a +form that contains a hidden field whose value is a list of strings. +Since you're responsible for generating the value in the hidden field +and for parsing it when it comes back, you can use whatever encoding is +convenient. You could use the functions WRITE-TO-STRING and +READ-FROM-STRING, which use the Lisp printer and reader to write and +read data to and from strings, except the printed representation of +strings can contain quotation marks and other characters that may cause +problems when embedded in the value attribute of an INPUT element. So, +you'll need to escape those characters somehow. Rather than trying to +come up with your own escaping scheme, you can just use base 64, an +encoding commonly used to protect binary data sent through e-mail. +AllegroServe comes with two functions, base64-encode and base64-decode, +that do the encoding and decoding for you, so all you have to do is +write a pair of functions: one that encodes a Lisp object by converting +it to a readable string with WRITE-TO-STRING and then base 64 encoding +it and, conversely, another to decode such a string by base 64 decoding +it and passing the result to READ-FROM-STRING. You'll want to wrap the +calls to WRITE-TO-STRING and READ-FROM-STRING in WITH-STANDARD-IO-SYNTAX +to make sure all the variables that affect the printer and reader are +set to their standard values. However, because you're going to be +reading data that's coming in from the network, you'll definitely want +to turn off one feature of the reader-the ability to evaluate arbitrary +Lisp code while reading! (1) You can define your own macro +with-safe-io-syntax, which wraps its body forms in +WITH-STANDARD-IO-SYNTAX wrapped around a LET that binds *READ-EVAL* to +NIL. + + (defmacro with-safe-io-syntax (&body body) + `(with-standard-io-syntax + (let ((*read-eval* nil)) + ,@body))) + + Then the encoding and decoding functions are trivial. + + (defun obj->base64 (obj) + (base64-encode (with-safe-io-syntax (write-to-string obj)))) + + (defun base64->obj (string) + (ignore-errors + (with-safe-io-syntax (read-from-string (base64-decode string))))) + + Finally, you can use these functions to define a method on +string->type that defines the conversion for the query parameter type +base64-list. + + (defmethod string->type ((type (eql 'base-64-list)) value) + (let ((obj (base64->obj value))) + (if (listp obj) obj nil))) + + ---------- Footnotes ---------- + + (1) The reader supports a bit of syntax, #., that causes the +following s-expression to be evaluated at read time. This is +occasionally useful in source code but obviously opens a big security +hole when you read untrusted data. However, you can turn off this +syntax by setting *READ-EVAL* to NIL, which will cause the reader to +signal an error if it encounters #.. + + +File: pcl.info, Node: 29-5, Next: 29-6, Prev: 29-4, Up: Chapter 29 + +Boilerplate HTML +================ + +Next you need to define some HTML macros and helper functions to make it +easy to give the different pages in the application a consistent look +and feel. You can start with an HTML macro that defines the basic +structure of a page in the application. + + (define-html-macro :mp3-browser-page ((&key title (header title)) &body body) + `(:html + (:head + (:title ,title) + (:link :rel "stylesheet" :type "text/css" :href "mp3-browser.css")) + (:body + (standard-header) + (when ,header (html (:h1 :class "title" ,header))) + ,@body + (standard-footer)))) + + You should define standard-header and standard-footer as separate +functions for two reasons. First, during development you can redefine +those functions and see the effect immediately without having to +recompile functions that use the :mp3-browser-page macro. Second, it +turns out that one of the pages you'll write later won't be defined with +:mp3-browser-page but will still need the standard header and footers. +They look like this: + + (defparameter *r* 25) + + (defun standard-header () + (html + ((:p :class "toolbar") + "[" (:a :href (link "/browse" :what "genre") "All genres") "] " + "[" (:a :href (link "/browse" :what "genre" :random *r*) "Random genres") "] " + "[" (:a :href (link "/browse" :what "artist") "All artists") "] " + "[" (:a :href (link "/browse" :what "artist" :random *r*) "Random artists") "] " + "[" (:a :href (link "/browse" :what "album") "All albums") "] " + "[" (:a :href (link "/browse" :what "album" :random *r*) "Random albums") "] " + "[" (:a :href (link "/browse" :what "song" :random *r*) "Random songs") "] " + "[" (:a :href (link "/playlist") "Playlist") "] " + "[" (:a :href (link "/all-playlists") "All playlists") "]"))) + + (defun standard-footer () + (html (:hr) ((:p :class "footer") "MP3 Browser v" *major-version* "." *minor-version*))) + + A couple of smaller HTML macros and helper functions automate other +common patterns. The :table-row HTML macro makes it easier to generate +the HTML for a single row of a table. It uses a feature of FOO that +I'll discuss in Chapter 31, an &attributes parameter, which causes uses +of the macro to be parsed just like normal s-expression HTML forms, with +any attributes gathered into a list that will be bound to the +&attributes parameter. It looks like this: + + (define-html-macro :table-row (&attributes attrs &rest values) + `(:tr ,@attrs ,@(loop for v in values collect `(:td ,v)))) + + And the link function generates a URL back into the application to be +used as the HREF attribute with an A element, building a query string +out of a set of keyword/value pairs and making sure all special +characters are properly escaped. For instance, instead of writing this: + + (:a :href "browse?what=artist&genre=Rhythm+%26+Blues" "Artists") + + you can write the following: + + (:a :href (link "browse" :what "artist" :genre "Rhythm & Blues") "Artists") + + It looks like this: + + (defun link (target &rest attributes) + (html + (:attribute + (:format "~a~@[?~{~(~a~)=~a~^&~}~]" target (mapcar #'urlencode attributes))))) + + To URL encode the keys and values, you use the helper function +urlencode, which is a wrapper around the function +encode-form-urlencoded, which is a nonpublic function from AllegroServe. +This is-on one hand-bad form; since the name encode-form-urlencoded +isn't exported from NET.ASERVE, it's possible that +encode-form-urlencoded may go away or get renamed out from under you. +On the other hand, using this unexported symbol for the time being lets +you get work done for the moment; by wrapping encode-form-urlencoded in +your own function, you isolate the crufty code to one function, which +you could rewrite if you had to. + + (defun urlencode (string) + (net.aserve::encode-form-urlencoded string)) + + Finally, you need the CSS style sheet mp3-browser.css used by +:mp3-browser-page. Since there's nothing dynamic about it, it's +probably easiest to just publish a static file with publish-file. + + (publish-file :path "/mp3-browser.css" :file filename :content-type "text/css") + + A sample style sheet is included with the source code for this +chapter on the book's Web site. You'll define a function, at the end of +this chapter, that starts the MP3 browser application. It'll take care +of, among other things, publishing this file. + + +File: pcl.info, Node: 29-6, Next: 29-7, Prev: 29-5, Up: Chapter 29 + +The Browse Page +=============== + +The first URL function will generate a page for browsing the MP3 +database. Its query parameters will tell it what kind of thing the user +is browsing and provide the criteria of what elements of the database +they're interested in. It'll give them a way to select database entries +that match a specific genre, artist, or album. In the interest of +serendipity, you can also provide a way to select a random subset of +matching items. When the user is browsing at the level of individual +songs, the title of the song will be a link that causes that song to be +added to the playlist. Otherwise, each item will be presented with +links that let the user browse the listed item by some other category. +For example, if the user is browsing genres, the entry "Blues" will +contain links to browse all albums, artists, and songs in the genre +Blues. Additionally, the browse page will feature an "Add all" button +that adds every song matching the page's criteria to the user's +playlist. The function looks like this: + + (define-url-function browse + (request (what keyword :genre) genre artist album (random integer)) + + (let* ((values (values-for-page what genre artist album random)) + (title (browse-page-title what random genre artist album)) + (single-column (if (eql what :song) :file what)) + (values-string (values->base-64 single-column values))) + (html + (:mp3-browser-page + (:title title) + ((:form :method "POST" :action "playlist") + (:input :name "values" :type "hidden" :value values-string) + (:input :name "what" :type "hidden" :value single-column) + (:input :name "action" :type "hidden" :value :add-songs) + (:input :name "submit" :type "submit" :value "Add all")) + (:ul (do-rows (row values) (list-item-for-page what row))))))) + + This function starts by using the function values-for-page to get a +table containing the values it needs to present. When the user is +browsing by song-when the what parameter is :song-you want to select +complete rows from the database. But when they're browsing by genre, +artist, or album, you want to select only the distinct values for the +given category. The database function select does most of the heavy +lifting, with values-for-page mostly responsible for passing the right +arguments depending on the value of what. This is also where you select +a random subset of the matching rows if necessary. + + (defun values-for-page (what genre artist album random) + (let ((values + (select + :from *mp3s* + :columns (if (eql what :song) t what) + :where (matching *mp3s* :genre genre :artist artist :album album) + :distinct (not (eql what :song)) + :order-by (if (eql what :song) '(:album :track) what)))) + (if random (random-selection values random) values))) + + To generate the title for the browse page, you pass the browsing +criteria to the following function, browse-page-title: + + (defun browse-page-title (what random genre artist album) + (with-output-to-string (s) + (when random (format s "~:(~r~) Random " random)) + (format s "~:(~a~p~)" what random) + (when (or genre artist album) + (when (not (eql what :song)) (princ " with songs" s)) + (when genre (format s " in genre ~a" genre)) + (when artist (format s " by artist ~a " artist)) + (when album (format s " on album ~a" album))))) + + Once you have the values you want to present, you need to do two +things with them. The main task, of course, is to present them, which +happens in the do-rows loop, leaving the rendering of each row to the +function list-item-for-page. That function renders :song rows one way +and all other kinds another way. + + (defun list-item-for-page (what row) + (if (eql what :song) + (with-column-values (song file album artist genre) row + (html + (:li + (:a :href (link "playlist" :file file :action "add-songs") (:b song)) " from " + (:a :href (link "browse" :what :song :album album) album) " by " + (:a :href (link "browse" :what :song :artist artist) artist) " in genre " + (:a :href (link "browse" :what :song :genre genre) genre)))) + (let ((value (column-value row what))) + (html + (:li value " - " + (browse-link :genre what value) + (browse-link :artist what value) + (browse-link :album what value) + (browse-link :song what value)))))) + + (defun browse-link (new-what what value) + (unless (eql new-what what) + (html + "[" + (:a :href (link "browse" :what new-what what value) (:format "~(~as~)" new-what)) + "] "))) + + The other thing on the browse page is a form with several hidden +INPUT fields and an "Add all" submit button. You need to use an HTML +form instead of a regular link to keep the application stateless-to make +sure all the information needed to respond to a request comes in the +request itself. Because the browse page results can be partially +random, you need to submit a fair bit of data for the server to be able +to reconstitute the list of songs to add to the playlist. If you didn't +allow the browse page to return randomly generated results, you wouldn't +need much data-you could just submit a request to add songs with +whatever search criteria the browse page used. But if you added songs +that way, with criteria that included a random argument, then you'd end +up adding a different set of random songs than the user was looking at +on the page when they hit the "Add all" button. + + The solution you'll use is to send back a form that has enough +information stashed away in a hidden INPUT element to allow the server +to reconstitute the list of songs matching the browse page criteria. +That information is the list of values returned by values-for-page and +the value of the what parameter. This is where you use the base64-list +parameter type; the function values->base64 extracts the values of a +specified column from the table returned by values-for-page into a list +and then makes a base 64-encoded string out of that list to embed in the +form. + + (defun values->base-64 (column values-table) + (flet ((value (r) (column-value r column))) + (obj->base64 (map-rows #'value values-table)))) + + When that parameter comes back as the value of the values query +parameter to a URL function that declares values to be of type +base-64-list, it'll be automatically converted back to a list. As +you'll see in a moment, that list can then be used to construct a query +that'll return the correct list of songs. (1) When you're browsing by +:song, you use the values from the :file column since they uniquely +identify the actual songs while the song names may not. + + ---------- Footnotes ---------- + + (1) This solution has its drawbacks-if a browse page returns a lot of +results, a fair bit of data is going back and forth under the covers. +Also, the database queries aren't necessarily the most efficient. But +it does keep the application stateless. An alternative approach is to +squirrel away, on the server side, information about the results +returned by browse and then, when a request to add songs come in, find +the appropriate bit of information in order to re-create the correct set +of songs. For instance, you could just save the values list instead of +sending it back in the form. Or you could copy the RANDOM-STATE object +before you generate the browse results so you can later re-create the +same "random" results. But this approach causes its own problems. For +instance, you'd then need to worry about when you can get rid of the +squirreled-away information; you never know when the user might hit the +Back button on their browser to return to an old browse page and then +hit the "Add all" button. Welcome to the wonderful world of Web +programming. + + +File: pcl.info, Node: 29-7, Next: 29-8, Prev: 29-6, Up: Chapter 29 + +The Playlist +============ + +This brings me to the next URL function, playlist. This is the most +complex page of the three-it's responsible for displaying the current +contents of the user's playlist as well as for providing the interface +to manipulate the playlist. But with most of the tedious bookkeeping +handled by define-url-function, it's not too hard to see how playlist +works. Here's the beginning of the definition, with just the parameter +list: + + (define-url-function playlist + (request + (playlist-id string (playlist-id request) :package) + (action keyword) ; Playlist manipulation action + (what keyword :file) ; for :add-songs action + (values base-64-list) ; " + file ; for :add-songs and :delete-songs actions + genre ; for :delete-songs action + artist ; " + album ; " + (order-by keyword) ; for :sort action + (shuffle keyword) ; for :shuffle action + (repeat keyword)) ; for :set-repeat action + + In addition to the obligatory request parameter, playlist takes a +number of query parameters. The most important in some ways is +playlist-id, which identifies which playlist object the page should +display and manipulate. For this parameter, you can take advantage of +define-url-function's "sticky parameter" feature. Normally, the +playlist-id won't be supplied explicitly, defaulting to the value +returned by the playlist-id function, namely, the IP address of the +client machine on which the browser is running. However, users can also +manipulate their playlists from different machines than the ones running +their MP3 clients by allowing this value to be explicitly specified. +And if it's specified once, define-url-function will arrange for it to +"stick" by setting a cookie in the browser. Later you'll define a URL +function that generates a list of all existing playlists, which users +can use to pick a playlist other than the one for the machines they're +browsing from. + + The action parameter specifies some action to take on the user's +playlist object. The value of this parameter, which will be converted +to a keyword symbol for you, can be :add-songs, :delete-songs, :clear, +:sort, :shuffle, or :set-repeat. The :add-songs action is used by the +"Add all" button in the browse page and also by the links used to add +individual songs. The other actions are used by the links on the +playlist page itself. + + The file, what, and values parameters are used with the :add-songs +action. By declaring values to be of type base-64-list, the +define-url-function infrastructure will take care of decoding the value +submitted by the "Add all" form. The other parameters are used with +other actions as noted in the comments. + + Now let's look at the body of playlist. The first thing you need to +do is use the playlist-id to look up the queue object and then acquire +the playlist's lock with the following two lines: + + (let ((playlist (lookup-playlist playlist-id))) + (with-playlist-locked (playlist) + + Since lookup-playlist will create a new playlist if necessary, this +will always return a playlist object. Then you take care of any +necessary queue manipulation, dispatching on the value of the action +parameter in order to call one of the playlist functions. + + (case action + (:add-songs (add-songs playlist what (or values (list file)))) + (:delete-songs (delete-songs + playlist + :file file :genre genre + :artist artist :album album)) + (:clear (clear-playlist playlist)) + (:sort (sort-playlist playlist order-by)) + (:shuffle (shuffle-playlist playlist shuffle)) + (:set-repeat (setf (repeat playlist) repeat))) + + All that's left of the playlist function is the actual HTML +generation. Again, you can use the :mp3-browser-page HTML macro to make +sure the basic form of the page matches the other pages in the +application, though this time you pass NIL to the :header argument in +order to leave out the H1 header. Here's the rest of the function: + + (html + (:mp3-browser-page + (:title (:format "Playlist - ~a" (id playlist)) :header nil) + (playlist-toolbar playlist) + (if (empty-p playlist) + (html (:p (:i "Empty."))) + (html + ((:table :class "playlist") + (:table-row "#" "Song" "Album" "Artist" "Genre") + (let ((idx 0) + (current-idx (current-idx playlist))) + (do-rows (row (songs-table playlist)) + (with-column-values (track file song album artist genre) row + (let ((row-style (if (= idx current-idx) "now-playing" "normal"))) + (html + ((:table-row :class row-style) + track + (:progn song (delete-songs-link :file file)) + (:progn album (delete-songs-link :album album)) + (:progn artist (delete-songs-link :artist artist)) + (:progn genre (delete-songs-link :genre genre))))) + (incf idx)))))))))))) + + The function playlist-toolbar generates a toolbar containing links to +playlist to perform the various :action manipulations. And +delete-songs-link generates a link to playlist with the :action +parameter set to :delete-songs and the appropriate arguments to delete +an individual file, or all files on an album, by a particular artist or +in a specific genre. + + (defun playlist-toolbar (playlist) + (let ((current-repeat (repeat playlist)) + (current-sort (ordering playlist)) + (current-shuffle (shuffle playlist))) + (html + (:p :class "playlist-toolbar" + (:i "Sort by:") + " [ " + (sort-playlist-button "genre" current-sort) " | " + (sort-playlist-button "artist" current-sort) " | " + (sort-playlist-button "album" current-sort) " | " + (sort-playlist-button "song" current-sort) " ] " + (:i "Shuffle by:") + " [ " + (playlist-shuffle-button "none" current-shuffle) " | " + (playlist-shuffle-button "song" current-shuffle) " | " + (playlist-shuffle-button "album" current-shuffle) " ] " + (:i "Repeat:") + " [ " + (playlist-repeat-button "none" current-repeat) " | " + (playlist-repeat-button "song" current-repeat) " | " + (playlist-repeat-button "all" current-repeat) " ] " + "[ " (:a :href (link "playlist" :action "clear") "Clear") " ] ")))) + + (defun playlist-button (action argument new-value current-value) + (let ((label (string-capitalize new-value))) + (if (string-equal new-value current-value) + (html (:b label)) + (html (:a :href (link "playlist" :action action argument new-value) label))))) + + (defun sort-playlist-button (order-by current-sort) + (playlist-button :sort :order-by order-by current-sort)) + + (defun playlist-shuffle-button (shuffle current-shuffle) + (playlist-button :shuffle :shuffle shuffle current-shuffle)) + + (defun playlist-repeat-button (repeat current-repeat) + (playlist-button :set-repeat :repeat repeat current-repeat)) + + (defun delete-songs-link (what value) + (html " [" (:a :href (link "playlist" :action :delete-songs what value) "x") "]")) + + +File: pcl.info, Node: 29-8, Next: 29-9, Prev: 29-7, Up: Chapter 29 + +Finding a Playlist +================== + +The last of the three URL functions is the simplest. It presents a +table listing all the playlists that have been created. Ordinarily +users won't need to use this page, but during development it gives you a +useful view into the state of the system. It also provides the +mechanism to choose a different playlist-each playlist ID is a link to +the playlist page with an explicit playlist-id query parameter, which +will then be made sticky by the playlist URL function. Note that you +need to acquire the *playlists-lock* to make sure the *playlists* hash +table doesn't change out from under you while you're iterating over it. + + (define-url-function all-playlists (request) + (:mp3-browser-page + (:title "All Playlists") + ((:table :class "all-playlists") + (:table-row "Playlist" "# Songs" "Most recent user agent") + (with-process-lock (*playlists-lock*) + (loop for playlist being the hash-values of *playlists* do + (html + (:table-row + (:a :href (link "playlist" :playlist-id (id playlist)) (:print (id playlist))) + (:print (table-size (songs-table playlist))) + (:print (user-agent playlist))))))))) + + +File: pcl.info, Node: 29-9, Next: Chapter 30, Prev: 29-8, Up: Chapter 29 + +Running the App +=============== + +And that's it. To use this app, you just need to load the MP3 database +with the load-database function from Chapter 27, publish the CSS style +sheet, set *song-source-type* to playlist so find-song-source uses +playlists instead of the singleton song source defined in the previous +chapter, and start AllegroServe. The following function takes care of +all these steps for you, after you fill in appropriate values for the +two parameters *mp3-dir*, which is the root directory of your MP3 +collection, and *mp3-css*, the filename of the CSS style sheet: + + (defparameter *mp3-dir* ...) + + (defparameter *mp3-css* ...) + + (defun start-mp3-browser () + (load-database *mp3-dir* *mp3s*) + (publish-file :path "/mp3-browser.css" :file *mp3-css* :content-type "text/css") + (setf *song-source-type* 'playlist) + (net.aserve::debug-on :notrap) + (net.aserve:start :port 2001)) + + When you invoke this function, it will print dots while it loads the +ID3 information from your ID3 files. Then you can point your MP3 client +at this URL: + + http://localhost:2001/stream.mp3 + + and point your browser at some good starting place, such as this: + + http://localhost:2001/browse + + which will let you start browsing by the default category, Genre. +After you've added some songs to the playlist, you can press Play on the +MP3 client, and it should start playing the first song. + + Obviously, you could improve the user interface in any of a number of +ways-for instance, if you have a lot of MP3s in your library, it might +be useful to be able to browse artists or albums by the first letter of +their names. Or maybe you could add a "Play whole album" button to the +playlist page that causes the playlist to immediately put all the songs +from the same album as the currently playing song at the top of the +playlist. Or you could change the playlist class, so instead of playing +silence when there are no songs queued up, it picks a random song from +the database. But all those ideas fall in the realm of application +design, which isn't really the topic of this book. Instead, the next +two chapters will drop back to the level of software infrastructure to +cover how the FOO HTML generation library works. + + +File: pcl.info, Node: Chapter 30, Next: Chapter 31, Prev: Chapter 29, Up: Top + +30. Practical: An HTML Generation Library, the Interpreter +========================================================== + +In this chapter and the next you'll take a look under the hood of the +FOO HTML generator that you've been using in the past few chapters. FOO +is an example of a kind of programming that's quite common in Common +Lisp and relatively uncommon in non-Lisp languages, namely, +language-oriented programming. Rather than provide an API built +primarily out of functions, classes, and macros, FOO provides language +processors for a domain-specific language that you can embed in your +Common Lisp programs. + + FOO provides two language processors for the same s-expression +language. One is an interpreter that takes a FOO "program" as data and +interprets it to generate HTML. The other is a compiler that compiles +FOO expressions, possibly with embedded Common Lisp code, into Common +Lisp that generates HTML and runs the embedded code. The interpreter is +exposed as the function emit-html and the compiler as the macro html, +which you used in previous chapters. + + In this chapter you'll look at some of the infrastructure shared +between the interpreter and the compiler and then at the implementation +of the interpreter. In the next chapter, I'll show you how the compiler +works. + +* Menu: + +* 30-1:: Designing a Domain-Specific Language +* 30-2:: The FOO Language +* 30-3:: Character Escaping +* 30-4:: Indenting Printer +* 30-5:: HTML Processor Interface +* 30-6:: The Pretty Printer Backend +* 30-7:: The Basic Evaluation Rule +* 30-8:: What's Next? + + +File: pcl.info, Node: 30-1, Next: 30-2, Prev: Chapter 30, Up: Chapter 30 + +Designing a Domain-Specific Language +==================================== + +Designing an embedded language requires two steps: first, design the +language that'll allow you to express the things you want to express, +and second, implement a processor, or processors, that accepts a +"program" in that language and either performs the actions indicated by +the program or translates the program into Common Lisp code that'll +perform equivalent behaviors. + + So, step one is to design the HTML-generating language. The key to +designing a good domain-specific language is to strike the right balance +between expressiveness and concision. For instance, a highly expressive +but not very concise "language" for generating HTML is the language of +literal HTML strings. The legal "forms" of this language are strings +containing literal HTML. Language processors for this "language" could +process such forms by simply emitting them as-is. + + (defvar *html-output* *standard-output*) + + (defun emit-html (html) + "An interpreter for the literal HTML language." + (write-sequence html *html-output*)) + + (defmacro html (html) + "A compiler for the literal HTML language." + `(write-sequence ,html *html-output*)) + + This "language" is highly expressive since it can express any HTML +you could possibly want to generate. (1) On the other hand, this +language doesn't win a lot of points for its concision because it gives +you zero compression-its input is its output. + + To design a language that gives you some useful compression without +sacrificing too much expressiveness, you need to identify the details of +the output that are either redundant or uninteresting. You can then +make those aspects of the output implicit in the semantics of the +language. + + For instance, because of the structure of HTML, every opening tag is +paired with a matching closing tag. (2) When you write HTML by hand, +you have to write those closing tags, but you can improve the concision +of your HTML-generating language by making the closing tags implicit. + + Another way you can gain concision at a slight cost in expressiveness +is to make the language processors responsible for adding appropriate +whitespace between elements-blank lines and indentation. When you're +generating HTML programmatically, you typically don't care much about +which elements have line breaks before or after them or about whether +different elements are indented relative to their parent elements. +Letting the language processor insert whitespace according to some rule +means you don't have to worry about it. As it turns out, FOO actually +supports two modes-one that uses the minimum amount of whitespace, which +allows it to generate extremely efficient code and compact HTML, and +another that generates nicely formatted HTML with different elements +indented and separated from other elements according to their role. + + Another detail that's best moved into the language processor is the +escaping of certain characters that have a special meaning in HTML such +as <, >, and &. Obviously, if you generate HTML by just printing +strings to a stream, then it's up to you to replace any occurrences of +those characters in the string with the appropriate escape sequences, +<, > and &. But if the language processor can know which +strings are to be emitted as element data, then it can take care of +automatically escaping those characters for you. + + ---------- Footnotes ---------- + + (1) In fact, it's probably too expressive since it can also generate +all sorts of output that's not even vaguely legal HTML. Of course, that +might be a feature if you need to generate HTML that's not strictly +correct to compensate for buggy Web browsers. Also, it's common for +language processors to accept programs that are syntactically correct +and otherwise well formed that'll nonetheless provoke undefined behavior +when run. + + (2) Well, almost every tag. Certain tags such as IMG and BR don't. +You'll deal with those in the section "The Basic Evaluation Rule." + + +File: pcl.info, Node: 30-2, Next: 30-3, Prev: 30-1, Up: Chapter 30 + +The FOO Language +================ + +So, enough theory. I'll give you a quick overview of the language +implemented by FOO, and then you'll look at the implementation of the +two FOO language processors-the interpreter, in this chapter, and the +compiler, in the next. + + Like Lisp itself, the basic syntax of the FOO language is defined in +terms of forms made up of Lisp objects. The language defines how each +legal FOO form is translated into HTML. + + The simplest FOO forms are self-evaluating Lisp objects such as +strings, numbers, and keyword symbols. (1) You'll need a function +self-evaluating-p that tests whether a given object is self-evaluating +for FOO's purposes. + + (defun self-evaluating-p (form) + (and (atom form) (if (symbolp form) (keywordp form) t))) + + Objects that satisfy this predicate will be emitted by converting +them to strings with PRINC-TO-STRING and then escaping any reserved +characters, such as <, >, or &. When the value is being emitted as an +attribute, the characters ", and ' are also escaped. Thus, you can +invoke the html macro on a self-evaluating object to emit it to +*html-output* (which is initially bound to *STANDARD-OUTPUT*). Table +30-1 shows how a few different self-evaluating values will be output. + + Table 30-1. FOO Output for Self-Evaluating Objects + + + + Of course, most HTML consists of tagged elements. The three pieces +of information that describe each element are the tag, a set of +attributes, and a body containing text and/or more HTML elements. Thus, +you need a way to represent these three pieces of information as Lisp +objects, preferably ones that the Lisp reader already knows how to read. +(2) If you forget about attributes for a moment, there's an obvious +mapping between Lisp lists and HTML elements: any HTML element can be +represented by a list whose FIRST is a symbol where the name is the name +of the element's tag and whose REST is a list of self-evaluating objects +or lists representing other HTML elements. Thus: + + <p>Foo</p> <==> (:p "Foo") + + <p><i>Now</i> is the time</p> <==> (:p (:i "Now") " is the time") + + Now the only problem is where to squeeze in the attributes. Since +most elements have no attributes, it'd be nice if you could use the +preceding syntax for elements without attributes. FOO provides two ways +to notate elements with attributes. The first is to simply include the +attributes in the list immediately following the symbol, alternating +keyword symbols naming the attributes and objects representing the +attribute value forms. The body of the element starts with the first +item in the list that's in a position to be an attribute name and isn't +a keyword symbol. Thus: + + HTML> (html (:p "foo")) + <p>foo</p> + NIL + HTML> (html (:p "foo " (:i "bar") " baz")) + <p>foo <i>bar</i> baz</p> + NIL + HTML> (html (:p :style "foo" "Foo")) + <p style='foo'>Foo</p> + NIL + HTML> (html (:p :id "x" :style "foo" "Foo")) + <p id='x' style='foo'>Foo</p> + NIL + + For folks who prefer a bit more obvious delineation between the +element's attributes and its body, FOO supports an alternative syntax: +if the first element of a list is itself a list with a keyword as its +first element, then the outer list represents an HTML element with that +keyword indicating the tag, with the REST of the nested list as the +attributes, and with the REST of the outer list as the body. Thus, you +could write the previous two expressions like this: + + HTML> (html ((:p :style "foo") "Foo")) + <p style='foo'>Foo</p> + NIL + HTML> (html ((:p :id "x" :style "foo") "Foo")) + <p id='x' style='foo'>Foo</p> + NIL + + The following function tests whether a given object matches either of +these syntaxes: + + (defun cons-form-p (form &optional (test #'keywordp)) + (and (consp form) + (or (funcall test (car form)) + (and (consp (car form)) (funcall test (caar form)))))) + + You should parameterize the test function because later you'll need +to test the same two syntaxes with a slightly different predicate on the +name. + + To completely abstract the differences between the two syntax +variants, you can define a function, parse-cons-form, that takes a form +and parses it into three elements, the tag, the attributes plist, and +the body list, returning them as multiple values. The code that +actually evaluates cons forms will use this function and not have to +worry about which syntax was used. + + (defun parse-cons-form (sexp) + (if (consp (first sexp)) + (parse-explicit-attributes-sexp sexp) + (parse-implicit-attributes-sexp sexp))) + + (defun parse-explicit-attributes-sexp (sexp) + (destructuring-bind ((tag &rest attributes) &body body) sexp + (values tag attributes body))) + + (defun parse-implicit-attributes-sexp (sexp) + (loop with tag = (first sexp) + for rest on (rest sexp) by #'cddr + while (and (keywordp (first rest)) (second rest)) + when (second rest) + collect (first rest) into attributes and + collect (second rest) into attributes + end + finally (return (values tag attributes rest)))) + + Now that you have the basic language specified, you can think about +how you're actually going to implement the language processors. How do +you get from a series of FOO forms to the desired HTML? As I mentioned +previously, you'll be implementing two language processors for FOO: an +interpreter that walks a tree of FOO forms and emits the corresponding +HTML directly and a compiler that walks a tree and translates it into +Common Lisp code that'll emit the same HTML. Both the interpreter and +compiler will be built on top of a common foundation of code, which +provides support for things such as escaping reserved characters and +generating nicely indented output, so it makes sense to start there. + + ---------- Footnotes ---------- + + (1) In the strict language of the Common Lisp standard, keyword +symbols aren't self-evaluating, though they do, in fact, evaluate to +themselves. See section 3.1.2.1.3 of the language standard or HyperSpec +for a brief discussion. + + (2) The requirement to use objects that the Lisp reader knows how to +read isn't a hard-and-fast one. Since the Lisp reader is itself +customizable, you could also define a new reader-level syntax for a new +kind of object. But that tends to be more trouble than it's worth. + + +File: pcl.info, Node: 30-3, Next: 30-4, Prev: 30-2, Up: Chapter 30 + +Character Escaping +================== + +The first bit of the foundation you'll need to lay is the code that +knows how to escape characters with a special meaning in HTML. There are +three such characters, and they must not appear in the text of an +element or in an attribute value; they are <, >, and &. In element text +or attribute values, these characters must be replaced with the +character reference entities <, >, and &. Similarly, in +attribute values, the quotation marks used to delimit the value must be +escaped, ' with ' and " with ". Additionally, any character +can be represented by a numeric character reference entity consisting of +an ampersand, followed by a sharp sign, followed by the numeric code as +a base 10 integer, and followed by a semicolon. These numeric escapes +are sometimes used to embed non-ASCII characters in HTML. + +* Menu: + +* 30-3-1:: The Package + + The following function accepts a single character and returns a +string containing a character reference entity for that character: + + (defun escape-char (char) + (case char + (#\& "&") + (#\< "<") + (#\> ">") + (#\' "'") + (#\" """) + (t (format nil "&#~d;" (char-code char))))) + + You can use this function as the basis for a function, escape, that +takes a string and a sequence of characters and returns a copy of the +first argument with all occurrences of the characters in the second +argument replaced with the corresponding character entity returned by +escape-char. + + (defun escape (in to-escape) + (flet ((needs-escape-p (char) (find char to-escape))) + (with-output-to-string (out) + (loop for start = 0 then (1+ pos) + for pos = (position-if #'needs-escape-p in :start start) + do (write-sequence in out :start start :end pos) + when pos do (write-sequence (escape-char (char in pos)) out) + while pos)))) + + You can also define two parameters: *element-escapes*, which contains +the characters you need to escape in normal element data, and +*attribute-escapes*, which contains the set of characters to be escaped +in attribute values. + + (defparameter *element-escapes* "<>&") + (defparameter *attribute-escapes* "<>&\"'") + + Here are some examples: + + HTML> (escape "foo & bar" *element-escapes*) + "foo & bar" + HTML> (escape "foo & 'bar'" *element-escapes*) + "foo & 'bar'" + HTML> (escape "foo & 'bar'" *attribute-escapes*) + "foo & 'bar'" + + Finally, you'll need a variable, *escapes*, that will be bound to the +set of characters that need to be escaped. It's initially set to the +value of *element-escapes*, but when generating attributes, it will, as +you'll see, be rebound to the value of *attribute-escapes*. + + (defvar *escapes* *element-escapes*) + + +File: pcl.info, Node: 30-3-1, Next: 30-3, Prev: 30-3, Up: 30-3 + +The Package +=========== + +Since FOO is a low-level library, the package you develop it in doesn't +rely on much external code-just the usual dependency on names from the +COMMON-LISP package and, almost as usual, on the names of the +macro-writing macros from COM.GIGAMONKEYS.MACRO-UTILITIES. On the other +hand, the package needs to export all the names needed by code that uses +FOO. Here's the DEFPACKAGE from the source that you can download from +the book's Web site: + + (defpackage :com.gigamonkeys.html + (:use :common-lisp :com.gigamonkeys.macro-utilities) + (:export :with-html-output + :in-html-style + :define-html-macro + :html + :emit-html + :&attributes)) + + +File: pcl.info, Node: 30-4, Next: 30-5, Prev: 30-3, Up: Chapter 30 + +Indenting Printer +================= + +To handle generating nicely indented output, you can define a class +indenting-printer, which wraps around an output stream, and functions +that use an instance of that class to emit strings to the stream while +keeping track of when it's at the beginning of the line. The class +looks like this: + + (defclass indenting-printer () + ((out :accessor out :initarg :out) + (beginning-of-line-p :accessor beginning-of-line-p :initform t) + (indentation :accessor indentation :initform 0) + (indenting-p :accessor indenting-p :initform t))) + + The main function that operates on indenting-printers is emit, which +takes the printer and a string and emits the string to the printer's +output stream, keeping track of when it emits a newline so it can reset +the beginning-of-line-p slot. + + (defun emit (ip string) + (loop for start = 0 then (1+ pos) + for pos = (position #\Newline string :start start) + do (emit/no-newlines ip string :start start :end pos) + when pos do (emit-newline ip) + while pos)) + + To actually emit the string, it uses the function emit/no-newlines, +which emits any needed indentation, via the helper indent-if-necessary, +and then writes the string to the stream. This function can also be +called directly by other code to emit a string that's known not to +contain any newlines. + + (defun emit/no-newlines (ip string &key (start 0) end) + (indent-if-necessary ip) + (write-sequence string (out ip) :start start :end end) + (unless (zerop (- (or end (length string)) start)) + (setf (beginning-of-line-p ip) nil))) + + The helper indent-if-necessary checks beginning-of-line-p and +indenting-p to determine whether it needs to emit indentation and, if +they're both true, emits as many spaces as indicated by the value of +indentation. Code that uses the indenting-printer can control the +indentation by manipulating the indentation and indenting-p slots. +Incrementing and decrementing indentation changes the number of leading +spaces, while setting indenting-p to NIL can temporarily turn off +indentation. + + (defun indent-if-necessary (ip) + (when (and (beginning-of-line-p ip) (indenting-p ip)) + (loop repeat (indentation ip) do (write-char #\Space (out ip))) + (setf (beginning-of-line-p ip) nil))) + + The last two functions in the indenting-printer API are emit-newline +and emit-freshline, which are both used to emit a newline character, +similar to the ~% and ~& FORMAT directives. That is, the only +difference is that emit-newline always emits a newline, while +emit-freshline does so only if beginning-of-line-p is false. Thus, +multiple calls to emit-freshline without any intervening emits won't +result in a blank line. This is handy when one piece of code wants to +generate some output that should end with a newline while another piece +of code wants to generate some output that should start on a newline but +you don't want a blank line between the two bits of output. + + (defun emit-newline (ip) + (write-char #\Newline (out ip)) + (setf (beginning-of-line-p ip) t)) + + (defun emit-freshline (ip) + (unless (beginning-of-line-p ip) (emit-newline ip))) + + With those preliminaries out of the way, you're ready to get to the +guts of the FOO processor. + + +File: pcl.info, Node: 30-5, Next: 30-6, Prev: 30-4, Up: Chapter 30 + +HTML Processor Interface +======================== + +Now you're ready to define the interface that'll be used by the FOO +language processor to emit HTML. You can define this interface as a set +of generic functions because you'll need two implementations-one that +actually emits HTML and another that the html macro can use to collect a +list of actions that need to be performed, which can then be optimized +and compiled into code that emits the same output in a more efficient +way. I'll call this set of generic functions the backend interface. It +consists of the following eight generic functions: + + (defgeneric raw-string (processor string &optional newlines-p)) + + (defgeneric newline (processor)) + + (defgeneric freshline (processor)) + + (defgeneric indent (processor)) + + (defgeneric unindent (processor)) + + (defgeneric toggle-indenting (processor)) + + (defgeneric embed-value (processor value)) + + (defgeneric embed-code (processor code)) + + While several of these functions have obvious correspondence to +indenting-printer functions, it's important to understand that these +generic functions define the abstract operations that are used by the +FOO language processors and won't always be implemented in terms of +calls to the indenting-printer functions. + + That said, perhaps the easiest way to understand the semantics of +these abstract operations is to look at the concrete implementations of +the methods specialized on html-pretty-printer, the class used to +generate human-readable HTML. + + +File: pcl.info, Node: 30-6, Next: 30-7, Prev: 30-5, Up: Chapter 30 + +The Pretty Printer Backend +========================== + +You can start by defining a class with two slots-one to hold an instance +of indenting-printer and one to hold the tab width-the number of spaces +you want to increase the indentation for each level of nesting of HTML +elements. + + (defclass html-pretty-printer () + ((printer :accessor printer :initarg :printer) + (tab-width :accessor tab-width :initarg :tab-width :initform 2))) + + Now you can implement methods specialized on html-pretty-printer on +the eight generic functions that make up the backend interface. + + The FOO processors use the raw-string function to emit strings that +don't need character escaping, either because you actually want to emit +normally reserved characters or because all reserved characters have +already been escaped. Usually raw-string is invoked with strings that +don't contain newlines, so the default behavior is to use +emit/no-newlines unless the caller specifies a non-NIL newlines-p +argument. + + (defmethod raw-string ((pp html-pretty-printer) string &optional newlines-p) + (if newlines-p + (emit (printer pp) string) + (emit/no-newlines (printer pp) string))) + + The functions newline, freshline, indent, unindent, and +toggle-indenting implement fairly straightforward manipulations of the +underlying indenting-printer. The only wrinkle is that the HTML pretty +printer generates pretty output only when the dynamic variable *pretty* +is true. When it's NIL, you should generate compact HTML with no +unnecessary whitespace. So, these methods, with the exception of +newline, all check *pretty* before doing anything: (1) + + (defmethod newline ((pp html-pretty-printer)) + (emit-newline (printer pp))) + + (defmethod freshline ((pp html-pretty-printer)) + (when *pretty* (emit-freshline (printer pp)))) + + (defmethod indent ((pp html-pretty-printer)) + (when *pretty* + (incf (indentation (printer pp)) (tab-width pp)))) + + (defmethod unindent ((pp html-pretty-printer)) + (when *pretty* + (decf (indentation (printer pp)) (tab-width pp)))) + + (defmethod toggle-indenting ((pp html-pretty-printer)) + (when *pretty* + (with-slots (indenting-p) (printer pp) + (setf indenting-p (not indenting-p))))) + + Finally, the functions embed-value and embed-code are used only by +the FOO compiler-embed-value is used to generate code that'll emit the +value of a Common Lisp expression, while embed-code is used to embed a +bit of code to be run and its result discarded. In the interpreter, you +can't meaningfully evaluate embedded Lisp code, so the methods on these +functions always signal an error. + + (defmethod embed-value ((pp html-pretty-printer) value) + (error "Can't embed values when interpreting. Value: ~s" value)) + + (defmethod embed-code ((pp html-pretty-printer) code) + (error "Can't embed code when interpreting. Code: ~s" code)) + +* Menu: + +* 30-6-1:: Using Conditions to Have Your Cake and Eat It Too + + ---------- Footnotes ---------- + + (1) Another, more purely object-oriented, approach would be to define +two classes, perhaps html-pretty-printer and html-raw-printer, and then +define no-op methods specialized on html-raw-printer for the methods +that should do stuff only when *pretty* is true. However, in this case, +after defining all the no-op methods, you'd end up with more code, and +then you'd have the hassle of making sure you created an instance of the +right class at the right time. But in general, using polymorphism to +replace conditionals is a good strategy. + + +File: pcl.info, Node: 30-6-1, Next: 30-6, Prev: 30-6, Up: 30-6 + +Using Conditions to Have Your Cake and Eat It Too +================================================= + +An alternate approach would be to use EVAL to evaluate Lisp expressions +in the interpreter. The problem with this approach is that EVAL has no +access to the lexical environment. Thus, there's no way to make +something like this work: + + (let ((x 10)) (emit-html '(:p x))) + + when x is a lexical variable. The symbol x that's passed to +emit-html at runtime has no particular connection to the lexical +variable named with the same symbol. The Lisp compiler arranges for +references to x in the code to refer to the variable, but after the code +is compiled, there's no longer necessarily any association between the +name x and that variable. This is the main reason that when you think +EVAL is the solution to your problem, you're probably wrong. + + However, if x was a dynamic variable, declared with DEFVAR or +DEFPARAMETER (and likely named *x* instead of x), EVAL could get at its +value. Thus, it might be useful to allow the FOO interpreter to use +EVAL in some situations. But it's a bad idea to always use EVAL. You +can get the best of both worlds by combining the idea of using EVAL with +the condition system. + + First define some error classes that you can signal when embed-value +and embed-code are called in the interpreter. + + (define-condition embedded-lisp-in-interpreter (error) + ((form :initarg :form :reader form))) + + (define-condition value-in-interpreter (embedded-lisp-in-interpreter) () + (:report + (lambda (c s) + (format s "Can't embed values when interpreting. Value: ~s" (form c))))) + + (define-condition code-in-interpreter (embedded-lisp-in-interpreter) () + (:report + (lambda (c s) + (format s "Can't embed code when interpreting. Code: ~s" (form c))))) + + Now you can implement embed-value and embed-code to signal those +errors and provide a restart that'll evaluate the form with EVAL. + + (defmethod embed-value ((pp html-pretty-printer) value) + (restart-case (error 'value-in-interpreter :form value) + (evaluate () + :report (lambda (s) (format s "EVAL ~s in null lexical environment." value)) + (raw-string pp (escape (princ-to-string (eval value)) *escapes*) t)))) + + (defmethod embed-code ((pp html-pretty-printer) code) + (restart-case (error 'code-in-interpreter :form code) + (evaluate () + :report (lambda (s) (format s "EVAL ~s in null lexical environment." code)) + (eval code)))) + + Now you can do something like this: + + HTML> (defvar *x* 10) + *X* + HTML> (emit-html '(:p *x*)) + + and you'll get dropped into the debugger with this message: + + Can't embed values when interpreting. Value: *X* + [Condition of type VALUE-IN-INTERPRETER] + + Restarts: + 0: [EVALUATE] EVAL *X* in null lexical environment. + 1: [ABORT] Abort handling SLIME request. + 2: [ABORT] Abort entirely from this process. + + If you invoke the evaluate restart, embed-value will EVAL *x*, get +the value 10, and generate this HTML: + + <p>10</p> + + Then, as a convenience, you can provide restart functions-functions +that invoke the evaluate restart-in certain situations. The evaluate +restart function unconditionally invokes the restart, while +eval-dynamic-variables and eval-code invoke it only if the form in the +condition is a dynamic variable or potential code. + + (defun evaluate (&optional condition) + (declare (ignore condition)) + (invoke-restart 'evaluate)) + + (defun eval-dynamic-variables (&optional condition) + (when (and (symbolp (form condition)) (boundp (form condition))) + (evaluate))) + + (defun eval-code (&optional condition) + (when (consp (form condition)) + (evaluate))) + + Now you can use HANDLER-BIND to set up a handler to automatically +invoke the evaluate restart for you. + + HTML> (handler-bind ((value-in-interpreter #'evaluate)) (emit-html '(:p *x*))) + <p>10</p> + T + + Finally, you can define a macro to provide a nicer syntax for binding +handlers for the two kinds of errors. + + (defmacro with-dynamic-evaluation ((&key values code) &body body) + `(handler-bind ( + ,@(if values `((value-in-interpreter #'evaluate))) + ,@(if code `((code-in-interpreter #'evaluate)))) + ,@body)) + + With this macro defined, you can write this: + + HTML> (with-dynamic-evaluation (:values t) (emit-html '(:p *x*))) + <p>10</p> + T + + +File: pcl.info, Node: 30-7, Next: 30-8, Prev: 30-6, Up: Chapter 30 + +The Basic Evaluation Rule +========================= + +Now to connect the FOO language to the processor interface, all you need +is a function that takes an object and processes it, invoking the +appropriate processor functions to generate HTML. For instance, when +given a simple form like this: + + (:p "Foo") + + this function might execute this sequence of calls on the processor: + + (freshline processor) + (raw-string processor "<p" nil) + (raw-string processor ">" nil) + (raw-string processor "Foo" nil) + (raw-string processor "</p>" nil) + (freshline processor) + + For now you can define a simple function that just checks whether a +form is, in fact, a legal FOO form and, if it is, hands it off to the +function process-sexp-html for processing. In the next chapter, you'll +add some bells and whistles to this function to allow it to handle +macros and special operators. But for now it looks like this: + + (defun process (processor form) + (if (sexp-html-p form) + (process-sexp-html processor form) + (error "Malformed FOO form: ~s" form))) + + The function sexp-html-p determines whether the given object is a +legal FOO expression, either a self-evaluating form or a properly +formatted cons. + + (defun sexp-html-p (form) + (or (self-evaluating-p form) (cons-form-p form))) + + Self-evaluating forms are easily handled: just convert to a string +with PRINC-TO-STRING and escape the characters in the variable +*escapes*, which, as you'll recall, is initially bound to the value of +*element-escapes*. Cons forms you pass off to process-cons-sexp-html. + + (defun process-sexp-html (processor form) + (if (self-evaluating-p form) + (raw-string processor (escape (princ-to-string form) *escapes*) t) + (process-cons-sexp-html processor form))) + + The function process-cons-sexp-html is then responsible for emitting +the opening tag, any attributes, the body, and the closing tag. The +main complication here is that to generate pretty HTML, you need to emit +fresh lines and adjust the indentation according to the type of the +element being emitted. You can categorize all the elements defined in +HTML into one of three categories: block, paragraph, and inline. Block +elements-such as body and ul-are emitted with fresh lines before and +after both their opening and closing tags and with their contents +indented one level. Paragraph elements-such as p, li, and +blockquote-are emitted with a fresh line before the opening tag and +after the closing tag. Inline elements are simply emitted in line. The +following three parameters list the elements of each type: + + (defparameter *block-elements* + '(:body :colgroup :dl :fieldset :form :head :html :map :noscript :object + :ol :optgroup :pre :script :select :style :table :tbody :tfoot :thead + :tr :ul)) + + (defparameter *paragraph-elements* + '(:area :base :blockquote :br :button :caption :col :dd :div :dt :h1 + :h2 :h3 :h4 :h5 :h6 :hr :input :li :link :meta :option :p :param + :td :textarea :th :title)) + + (defparameter *inline-elements* + '(:a :abbr :acronym :address :b :bdo :big :cite :code :del :dfn :em + :i :img :ins :kbd :label :legend :q :samp :small :span :strong :sub + :sup :tt :var)) + + The functions block-element-p and paragraph-element-p test whether a +given tag is a member of the corresponding list. (1) + + (defun block-element-p (tag) (find tag *block-elements*)) + + (defun paragraph-element-p (tag) (find tag *paragraph-elements*)) + + Two other categorizations with their own predicates are the elements +that are always empty, such as br and hr, and the three elements, pre, +style, and script, in which whitespace is supposed to be preserved. The +former are handled specially when generating regular HTML (in other +words, not XHTML) since they're not supposed to have a closing tag. And +when emitting the three tags in which whitespace is preserved, you can +temporarily turn off indentation so the pretty printer doesn't add any +spaces that aren't part of the element's actual contents. + + (defparameter *empty-elements* + '(:area :base :br :col :hr :img :input :link :meta :param)) + + (defparameter *preserve-whitespace-elements* '(:pre :script :style)) + + (defun empty-element-p (tag) (find tag *empty-elements*)) + + (defun preserve-whitespace-p (tag) (find tag *preserve-whitespace-elements*)) + + The last piece of information you need when generating HTML is +whether you're generating XHTML since that affects how you emit empty +elements. + + (defparameter *xhtml* nil) + + With all that information, you're ready to process a cons FOO form. +You use parse-cons-form to parse the list into three parts, the tag +symbol, a possibly empty plist of attribute key/value pairs, and a +possibly empty list of body forms. You then emit the opening tag, the +body, and the closing tag with the helper functions emit-open-tag, +emit-element-body, and emit-close-tag. + + (defun process-cons-sexp-html (processor form) + (when (string= *escapes* *attribute-escapes*) + (error "Can't use cons forms in attributes: ~a" form)) + (multiple-value-bind (tag attributes body) (parse-cons-form form) + (emit-open-tag processor tag body attributes) + (emit-element-body processor tag body) + (emit-close-tag processor tag body))) + + In emit-open-tag you have to call freshline when appropriate and then +emit the attributes with emit-attributes. You need to pass the +element's body to emit-open-tag so when it's emitting XHTML, it knows +whether to finish the tag with /> or >. + + (defun emit-open-tag (processor tag body-p attributes) + (when (or (paragraph-element-p tag) (block-element-p tag)) + (freshline processor)) + (raw-string processor (format nil "<~(~a~)" tag)) + (emit-attributes processor attributes) + (raw-string processor (if (and *xhtml* (not body-p)) "/>" ">"))) + + In emit-attributes the attribute names aren't evaluated since they +must be keyword symbols, but you should invoke the top-level process +function to evaluate the attribute values, binding *escapes* to +*attribute-escapes*. As a convenience for specifying boolean +attributes, whose value should be the name of the attribute, if the +value is T-not just any true value but actually T-then you replace the +value with the name of the attribute. (2) + + (defun emit-attributes (processor attributes) + (loop for (k v) on attributes by #'cddr do + (raw-string processor (format nil " ~(~a~)='" k)) + (let ((*escapes* *attribute-escapes*)) + (process processor (if (eql v t) (string-downcase k) v))) + (raw-string processor "'"))) + + Emitting the element's body is similar to emitting the attribute +values: you can loop through the body calling process to evaluate each +form. The rest of the code is dedicated to emitting fresh lines and +adjusting the indentation as appropriate for the type of element. + + (defun emit-element-body (processor tag body) + (when (block-element-p tag) + (freshline processor) + (indent processor)) + (when (preserve-whitespace-p tag) (toggle-indenting processor)) + (dolist (item body) (process processor item)) + (when (preserve-whitespace-p tag) (toggle-indenting processor)) + (when (block-element-p tag) + (unindent processor) + (freshline processor))) + + Finally, emit-close-tag, as you'd probably expect, emits the closing +tag (unless no closing tag is necessary, such as when the body is empty +and you're either emitting XHTML or the element is one of the special +empty elements). Regardless of whether you actually emit a close tag, +you need to emit a final fresh line for block and paragraph elements. + + (defun emit-close-tag (processor tag body-p) + (unless (and (or *xhtml* (empty-element-p tag)) (not body-p)) + (raw-string processor (format nil "</~(~a~)>" tag))) + (when (or (paragraph-element-p tag) (block-element-p tag)) + (freshline processor))) + + The function process is the basic FOO interpreter. To make it a bit +easier to use, you can define a function, emit-html, that invokes +process, passing it an html-pretty-printer and a form to evaluate. You +can define and use a helper function, get-pretty-printer, to get the +pretty printer, which returns the current value of *html-pretty-printer* +if it's bound; otherwise, it makes a new instance of html-pretty-printer +with *html-output* as its output stream. + + (defun emit-html (sexp) (process (get-pretty-printer) sexp)) + + (defun get-pretty-printer () + (or *html-pretty-printer* + (make-instance + 'html-pretty-printer + :printer (make-instance 'indenting-printer :out *html-output*)))) + + With this function, you can emit HTML to *html-output*. Rather than +expose the variable *html-output* as part of FOO's public API, you +should define a macro, with-html-output, that takes care of binding the +stream for you. It also lets you specify whether you want pretty HTML +output, defaulting to the value of the variable *pretty*. + + (defmacro with-html-output ((stream &key (pretty *pretty*)) &body body) + `(let* ((*html-output* ,stream) + (*pretty* ,pretty)) + ,@body)) + + So, if you wanted to use emit-html to generate HTML to a file, you +could write the following: + + (with-open-file (out "foo.html" :direction output) + (with-html-output (out :pretty t) + (emit-html *some-foo-expression*))) + + ---------- Footnotes ---------- + + (1) You don't need a predicate for *inline-elements* since you only +ever test for block and paragraph elements. I include the parameter +here for completeness. + + (2) While XHTML requires boolean attributes to be notated with their +name as the value to indicate a true value, in HTML it's also legal to +simply include the name of the attribute with no value, for example, +<option selected> rather than <option selected='selected'>. All HTML +4.0-compatible browsers should understand both forms, but some buggy +browsers understand only the no-value form for certain attributes. If +you need to generate HTML for such browsers, you'll need to hack +emit-attributes to emit those attributes a bit differently. + + +File: pcl.info, Node: 30-8, Next: Chapter 31, Prev: 30-7, Up: Chapter 30 + +What's Next? +============ + +In the next chapter, you'll look at how to implement a macro that +compiles FOO expressions into Common Lisp so you can embed HTML +generation code directly into your Lisp programs. You'll also extend +the FOO language to make it a bit more expressive by adding its own +flavor of special operators and macros. + + +File: pcl.info, Node: Chapter 31, Next: Chapter 32, Prev: Chapter 30, Up: Top + +31. Practical: An HTML Generation Library, the Compiler +======================================================= + +Now you're ready to look at how the FOO compiler works. The main +difference between a compiler and an interpreter is that an interpreter +processes a program and directly generates some behavior-generating HTML +in the case of a FOO interpreter-but a compiler processes the same +program and generates code in some other language that will exhibit the +same behavior. In FOO, the compiler is a Common Lisp macro that +translates FOO into Common Lisp so it can be embedded in a Common Lisp +program. Compilers, in general, have the advantage over interpreters +that, because compilation happens in advance, they can spend a bit of +time optimizing the code they generate to make it more efficient. The +FOO compiler does that, merging literal text as much as possible in +order to emit the same HTML with a smaller number of writes than the +interpreter uses. When the compiler is a Common Lisp macro, you also +have the advantage that it's easy for the language understood by the +compiler to contain embedded Common Lisp-the compiler just has to +recognize it and embed it in the right place in the generated code. The +FOO compiler will take advantage of this capability. + +* Menu: + +* 31-1:: The Compiler +* 31-2:: FOO Special Operators +* 31-3:: FOO Macros +* 31-4:: The Public API +* 31-5:: The End of the Line + + +File: pcl.info, Node: 31-1, Next: 31-2, Prev: Chapter 31, Up: Chapter 31 + +The Compiler +============ + +The basic architecture of the compiler consists of three layers. First +you'll implement a class html-compiler that has one slot that holds an +adjustable vector that's used to accumulate ops representing the calls +made to the generic functions in the backend interface during the +execution of process. + + You'll then implement methods on the generic functions in the backend +interface that will store the sequence of actions in the vector. Each +op is represented by a list consisting of a keyword naming the operation +and the arguments passed to the function that generated the op. The +function sexp->ops implements the first phase of the compiler, compiling +a list of FOO forms by calling process on each form with an instance of +html-compiler. + + This vector of ops stored by the compiler is then passed to a +function that optimizes it, merging consecutive raw-string ops into a +single op that emits the combined string in one go. The optimization +function can also, optionally, strip out ops that are needed only for +pretty printing, which is mostly important because it allows you to +merge more raw-string ops. + + Finally, the optimized ops vector is passed to a third function, +generate-code, that returns a list of Common Lisp expressions that will +actually output the HTML. When *pretty* is true, generate-code generates +code that uses the methods specialized on html-pretty-printer to output +pretty HTML. When *pretty* is NIL, it generates code that writes +directly to the stream *html-output*. + + The macro html actually generates a body that contains two +expansions, one generated with *pretty* bound to T and one with *pretty* +bound to NIL. Which expansion is used is determined by the runtime value +of *pretty*. Thus, every function that contains a call to html will +contain code to generate both pretty and compact output. + + The other significant difference between the compiler and the +interpreter is that the compiler can embed Lisp forms in the code it +generates. To take advantage of that, you need to modify the process +function so it calls the embed-code and embed-value functions when asked +to process an expression that's not a FOO form. Since all +self-evaluating objects are valid FOO forms, the only forms that won't +be passed to process-sexp-html are lists that don't match the syntax for +FOO cons forms and non-keyword symbols, the only atoms that aren't +self-evaluating. You can assume that any non-FOO cons is code to be run +inline and all symbols are variables whose value you should embed. + + (defun process (processor form) + (cond + ((sexp-html-p form) (process-sexp-html processor form)) + ((consp form) (embed-code processor form)) + (t (embed-value processor form)))) + + Now let's look at the compiler code. First you should define two +functions that slightly abstract the vector you'll use to save ops in +the first two phases of compilation. + + (defun make-op-buffer () (make-array 10 :adjustable t :fill-pointer 0)) + + (defun push-op (op ops-buffer) (vector-push-extend op ops-buffer)) + + Next you can define the html-compiler class and the methods +specialized on it to implement the backend interface. + + (defclass html-compiler () + ((ops :accessor ops :initform (make-op-buffer)))) + + (defmethod raw-string ((compiler html-compiler) string &optional newlines-p) + (push-op `(:raw-string ,string ,newlines-p) (ops compiler))) + + (defmethod newline ((compiler html-compiler)) + (push-op '(:newline) (ops compiler))) + + (defmethod freshline ((compiler html-compiler)) + (push-op '(:freshline) (ops compiler))) + + (defmethod indent ((compiler html-compiler)) + (push-op `(:indent) (ops compiler))) + + (defmethod unindent ((compiler html-compiler)) + (push-op `(:unindent) (ops compiler))) + + (defmethod toggle-indenting ((compiler html-compiler)) + (push-op `(:toggle-indenting) (ops compiler))) + + (defmethod embed-value ((compiler html-compiler) value) + (push-op `(:embed-value ,value ,*escapes*) (ops compiler))) + + (defmethod embed-code ((compiler html-compiler) code) + (push-op `(:embed-code ,code) (ops compiler))) + + With those methods defined, you can implement the first phase of the +compiler, sexp->ops. + + (defun sexp->ops (body) + (loop with compiler = (make-instance 'html-compiler) + for form in body do (process compiler form) + finally (return (ops compiler)))) + + During this phase you don't need to worry about the value of +*pretty*: just record all the functions called by process. Here's what +sexp->ops makes of a simple FOO form: + + HTML> (sexp->ops '((:p "Foo"))) + #((:FRESHLINE) (:RAW-STRING "<p" NIL) (:RAW-STRING ">" NIL) + (:RAW-STRING "Foo" T) (:RAW-STRING "</p>" NIL) (:FRESHLINE)) + + The next phase, optimize-static-output, takes a vector of ops and +returns a new vector containing the optimized version. The algorithm is +simple-for each :raw-string op, it writes the string to a temporary +string buffer. Thus, consecutive :raw-string ops will build up a single +string containing the concatenation of the strings that need to be +emitted. Whenever you encounter an op other than a :raw-string op, you +convert the built-up string into a sequence of alternating :raw-string +and :newline ops with the helper function compile-buffer and then add +the next op. This function is also where you strip out the pretty +printing ops if *pretty* is NIL. + + (defun optimize-static-output (ops) + (let ((new-ops (make-op-buffer))) + (with-output-to-string (buf) + (flet ((add-op (op) + (compile-buffer buf new-ops) + (push-op op new-ops))) + (loop for op across ops do + (ecase (first op) + (:raw-string (write-sequence (second op) buf)) + ((:newline :embed-value :embed-code) (add-op op)) + ((:indent :unindent :freshline :toggle-indenting) + (when *pretty* (add-op op))))) + (compile-buffer buf new-ops))) + new-ops)) + + (defun compile-buffer (buf ops) + (loop with str = (get-output-stream-string buf) + for start = 0 then (1+ pos) + for pos = (position #\Newline str :start start) + when (< start (length str)) + do (push-op `(:raw-string ,(subseq str start pos) nil) ops) + when pos do (push-op '(:newline) ops) + while pos)) + + The last step is to translate the ops into the corresponding Common +Lisp code. This phase also pays attention to the value of *pretty*. +When *pretty* is true, it generates code that invokes the backend +generic functions on *html-pretty-printer*, which will be bound to an +instance of html-pretty-printer. When *pretty* is NIL, it generates +code that writes directly to *html-output*, the stream to which the +pretty printer would send its output. + + The actual function, generate-code, is trivial. + + (defun generate-code (ops) + (loop for op across ops collect (apply #'op->code op))) + + All the work is done by methods on the generic function op->code +specializing the op argument with an EQL specializer on the name of the +op. + + (defgeneric op->code (op &rest operands)) + + (defmethod op->code ((op (eql :raw-string)) &rest operands) + (destructuring-bind (string check-for-newlines) operands + (if *pretty* + `(raw-string *html-pretty-printer* ,string ,check-for-newlines) + `(write-sequence ,string *html-output*)))) + + (defmethod op->code ((op (eql :newline)) &rest operands) + (if *pretty* + `(newline *html-pretty-printer*) + `(write-char #\Newline *html-output*))) + + (defmethod op->code ((op (eql :freshline)) &rest operands) + (if *pretty* + `(freshline *html-pretty-printer*) + (error "Bad op when not pretty-printing: ~a" op))) + + (defmethod op->code ((op (eql :indent)) &rest operands) + (if *pretty* + `(indent *html-pretty-printer*) + (error "Bad op when not pretty-printing: ~a" op))) + + (defmethod op->code ((op (eql :unindent)) &rest operands) + (if *pretty* + `(unindent *html-pretty-printer*) + (error "Bad op when not pretty-printing: ~a" op))) + + (defmethod op->code ((op (eql :toggle-indenting)) &rest operands) + (if *pretty* + `(toggle-indenting *html-pretty-printer*) + (error "Bad op when not pretty-printing: ~a" op))) + + The two most interesting op->code methods are the ones that generate +code for the :embed-value and :embed-code ops. In the :embed-value +method, you can generate slightly different code depending on the value +of the escapes operand since if escapes is NIL, you don't need to +generate a call to escape. And when both *pretty* and escapes are NIL, +you can generate code that uses PRINC to emit the value directly to the +stream. + + (defmethod op->code ((op (eql :embed-value)) &rest operands) + (destructuring-bind (value escapes) operands + (if *pretty* + (if escapes + `(raw-string *html-pretty-printer* (escape (princ-to-string ,value) ,escapes) t) + `(raw-string *html-pretty-printer* (princ-to-string ,value) t)) + (if escapes + `(write-sequence (escape (princ-to-string ,value) ,escapes) *html-output*) + `(princ ,value *html-output*))))) + + Thus, something like this: + + HTML> (let ((x 10)) (html (:p x))) + <p>10</p> + NIL + + works because html translates (:p x) into something like this: + + (progn + (write-sequence "<p>" *html-output*) + (write-sequence (escape (princ-to-string x) "<>&") *html-output*) + (write-sequence "</p>" *html-output*)) + + When that code replaces the call to html in the context of the LET, +you get the following: + + (let ((x 10)) + (progn + (write-sequence "<p>" *html-output*) + (write-sequence (escape (princ-to-string x) "<>&") *html-output*) + (write-sequence "</p>" *html-output*))) + + and the reference to x in the generated code turns into a reference +to the lexical variable from the LET surrounding the html form. + + The :embed-code method, on the other hand, is interesting because +it's so trivial. Because process passed the form to embed-code, which +stashed it in the :embed-code op, all you have to do is pull it out and +return it. + + (defmethod op->code ((op (eql :embed-code)) &rest operands) + (first operands)) + + This allows code like this to work: + + HTML> (html (:ul (dolist (x '(foo bar baz)) (html (:li x))))) + <ul> + <li>FOO</li> + <li>BAR</li> + <li>BAZ</li> + </ul> + NIL + + The outer call to html expands into code that does something like +this: + + (progn + (write-sequence "<ul>" *html-output*) + (dolist (x '(foo bar baz)) (html (:li x))) + (write-sequence "</ul>" *html-output*)))) + + Then if you expand the call to html in the body of the DOLIST, you'll +get something like this: + + (progn + (write-sequence "<ul>" *html-output*) + (dolist (x '(foo bar baz)) + (progn + (write-sequence "<li>" *html-output*) + (write-sequence (escape (princ-to-string x) "<>&") *html-output*) + (write-sequence "</li>" *html-output*))) + (write-sequence "</ul>" *html-output*)) + + This code will, in fact, generate the output you saw. + + +File: pcl.info, Node: 31-2, Next: 31-3, Prev: 31-1, Up: Chapter 31 + +FOO Special Operators +===================== + +You could stop there; certainly the FOO language is expressive enough to +generate nearly any HTML you'd care to. However, you can add two +features to the language, with just a bit more code, that will make it +quite a bit more powerful: special operators and macros. + + Special operators in FOO are analogous to special operators in Common +Lisp. Special operators provide ways to express things in the language +that can't be expressed in the language supported by the basic +evaluation rule. Or, another way to look at it is that special +operators provide access to the primitive mechanisms used by the +language evaluator. (1) + + To take a simple example, in the FOO compiler, the language evaluator +uses the embed-value function to generate code that will embed the value +of a variable in the output HTML. However, because only symbols are +passed to embed-value, there's no way, in the language I've described so +far, to embed the value of an arbitrary Common Lisp expression; the +process function passes cons cells to embed-code rather than +embed-value, so the values returned are ignored. Typically this is what +you'd want, since the main reason to embed Lisp code in a FOO program is +to use Lisp control constructs. However, sometimes you'd like to embed +computed values in the generated HTML. For example, you might like this +FOO program to generate a paragraph tag containing a random number: + + (:p (random 10)) + + But that doesn't work because the code is run and its value +discarded. + + HTML> (html (:p (random 10))) + <p></p> + NIL + + In the language, as you've implemented it so far, you could work +around this limitation by computing the value outside the call to html +and then embedding it via a variable. + + HTML> (let ((x (random 10))) (html (:p x))) + <p>1</p> + NIL + + But that's sort of annoying, particularly when you consider that if +you could arrange for the form (random 10) to be passed to embed-value +instead of embed-code, it'd do exactly what you want. So, you can +define a special operator, :print, that's processed by the FOO language +processor according to a different rule than a normal FOO expression. +Namely, instead of generating a <print> element, it passes the form in +its body to embed-value. Thus, you can generate a paragraph containing +a random number like this: + + HTML> (html (:p (:print (random 10)))) + <p>9</p> + NIL + + Obviously, this special operator is useful only in compiled FOO code +since embed-value doesn't work in the interpreter. Another special +operator that can be used in both interpreted and compiled FOO code is +:format, which lets you generate output using the FORMAT function. The +arguments to the :format special operator are a string used as a format +control string and then any arguments to be interpolated. When all the +arguments to :format are self-evaluating objects, a string is generated +by passing them to FORMAT, and that string is then emitted like any +other string. This allows such :format forms to be used in FOO passed +to emit-html. In compiled FOO, the arguments to :format can be any Lisp +expressions. + + Other special operators provide control over what characters are +automatically escaped and to explicitly emit newline characters: the +:noescape special operator causes all the forms in its body to be +evaluated as regular FOO forms but with *escapes* bound to NIL, while +:attribute evaluates the forms in its body with *escapes* bound to +*attribute-escapes*. And :newline is translated into code to emit an +explicit newline. + + So, how do you define special operators? There are two aspects to +processing special operators: how does the language processor recognize +forms that use special operators, and how does it know what code to run +to process each special operator? + + You could hack process-sexp-html to recognize each special operator +and handle it in the appropriate manner-special operators are, +logically, part of the implementation of the language, and there aren't +going to be that many of them. However, it'd be nice to have a slightly +more modular way to add new special operators-not because users of FOO +will be able to but just for your own sanity. + + Define a special form as any list whose CAR is a symbol that's the +name of a special operator. You can mark the names of special operators +by adding a non-NIL value to the symbol's property list under the key +html-special-operator. So, you can define a function that tests whether +a given form is a special form like this: + + (defun special-form-p (form) + (and (consp form) (symbolp (car form)) (get (car form) 'html-special-operator))) + + The code that implements each special operator is responsible for +taking apart the rest of the list however it sees fit and doing whatever +the semantics of the special operator require. Assuming you'll also +define a function process-special-form, which will take the language +processor and a special form and run the appropriate code to generate a +sequence of calls on the processor object, you can augment the top-level +process function to handle special forms like this: + + (defun process (processor form) + (cond + ((special-form-p form) (process-special-form processor form)) + ((sexp-html-p form) (process-sexp-html processor form)) + ((consp form) (embed-code processor form)) + (t (embed-value processor form)))) + + You must add the special-form-p clause first because special forms +can look, syntactically, like regular FOO expressions just the way +Common Lisp's special forms can look like regular function calls. + + Now you just need to implement process-special-form. Rather than +define a single monolithic function that implements all the special +operators, you should define a macro that allows you to define special +operators much like regular functions and that also takes care of adding +the html-special-operator entry to the property list of the special +operator's name. In fact, the value you store in the property list can +be a function that implements the special operator. Here's the macro: + + (defmacro define-html-special-operator (name (processor &rest other-parameters) &body body) + `(eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ',name 'html-special-operator) + (lambda (,processor ,@other-parameters) ,@body)))) + + This is a fairly advanced type of macro, but if you take it one line +at a time, there's nothing all that tricky about it. To see how it +works, take a simple use of the macro, the definition of the special +operator :noescape, and look at the macro expansion. If you write this: + + (define-html-special-operator :noescape (processor &rest body) + (let ((*escapes* nil)) + (loop for exp in body do (process processor exp)))) + + it's as if you had written this: + + (eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ':noescape 'html-special-operator) + (lambda (processor &rest body) + (let ((*escapes* nil)) + (loop for exp in body do (process processor exp)))))) + + The EVAL-WHEN special operator, as I discussed in Chapter 20, ensures +that the effects of code in its body will be made visible during +compilation when you compile with COMPILE-FILE. This matters if you want +to use define-html-special-operator in a file and then use the +just-defined special operator in that same file. + + Then the SETF expression sets the property html-special-operator on +the symbol :noescape to an anonymous function with the same parameter +list as was specified in define-html-special-operator. By defining +define-html-special-operator to split the parameter list in two parts, +processor and everything else, you ensure that all special operators +accept at least one argument. + + The body of the anonymous function is then the body provided to +define-html-special-operator. The job of the anonymous function is to +implement the special operator by making the appropriate calls on the +backend interface to generate the correct HTML or the code that will +generate it. It can also use process to evaluate an expression as a FOO +form. + + The :noescape special operator is particularly simple-all it does is +pass the forms in its body to process with *escapes* bound to NIL. In +other words, this special operator disables the normal character +escaping preformed by process-sexp-html. + + With special operators defined this way, all process-special-form has +to do is look up the anonymous function in the property list of the +special operator's name and APPLY it to the processor and rest of the +form. + + (defun process-special-form (processor form) + (apply (get (car form) 'html-special-operator) processor (rest form))) + + Now you're ready to define the five remaining FOO special operators. +Similar to :noescape is :attribute, which evaluates the forms in its +body with *escapes* bound to *attribute-escapes*. This special operator +is useful if you want to write helper functions that output attribute +values. If you write a function like this: + + (defun foo-value (something) + (html (:print (frob something)))) + + the html macro is going to generate code that escapes the characters +in *element-escapes*. But if you're planning to use foo-value like +this: + + (html (:p :style (foo-value 42) "Foo")) + + then you want it to generate code that uses *attribute-escapes*. So, +instead, you can write it like this: (2) + + (defun foo-value (something) + (html (:attribute (:print (frob something))))) + + The definition of :attribute looks like this: + + (define-html-special-operator :attribute (processor &rest body) + (let ((*escapes* *attribute-escapes*)) + (loop for exp in body do (process processor exp)))) + + The next two special operators, :print and :format, are used to +output values. The :print special operator, as I discussed earlier, is +used in compiled FOO programs to embed the value of an arbitrary Lisp +expression. The :format special operator is more or less equivalent to +generating a string with (format nil ...) and then embedding it. The +primary reason to define :format as a special operator is for +convenience. This: + + (:format "Foo: ~d" x) + + is nicer than this: + + (:print (format nil "Foo: ~d" x)) + + It also has the slight advantage that if you use :format with +arguments that are all self-evaluating, FOO can evaluate the :format at +compile time rather than waiting until runtime. The definitions of +:print and :format are as follows: + + (define-html-special-operator :print (processor form) + (cond + ((self-evaluating-p form) + (warn "Redundant :print of self-evaluating form ~s" form) + (process-sexp-html processor form)) + (t + (embed-value processor form)))) + + (define-html-special-operator :format (processor &rest args) + (if (every #'self-evaluating-p args) + (process-sexp-html processor (apply #'format nil args)) + (embed-value processor `(format nil ,@args)))) + + The :newline special operator forces an output of a literal newline, +which is occasionally handy. + + (define-html-special-operator :newline (processor) + (newline processor)) + + Finally, the :progn special operator is analogous to the PROGN +special operator in Common Lisp. It simply processes the forms in its +body in sequence. + + (define-html-special-operator :progn (processor &rest body) + (loop for exp in body do (process processor exp))) + + In other words, the following: + + (html (:p (:progn "Foo " (:i "bar") " baz"))) + + will generate the same code as this: + + (html (:p "Foo " (:i "bar") " baz")) + + This might seem like a strange thing to need since normal FOO +expressions can have any number of forms in their body. However, this +special operator will come in quite handy in one situation-when writing +FOO macros, which brings you to the last language feature you need to +implement. + + ---------- Footnotes ---------- + + (1) The analogy between FOO's special operators, and macros, which +I'll discuss in the next section, and Lisp's own is fairly sound. In +fact, understanding how FOO's special operators and macros work may give +you some insight into why Common Lisp is put together the way it is. + + (2) The :noescape and :attribute special operators must be defined as +special operators because FOO determines what escapes to use at compile +time, not at runtime. This allows FOO to escape literal values at +compile time, which is much more efficient than having to scan all +output at runtime. + + +File: pcl.info, Node: 31-3, Next: 31-4, Prev: 31-2, Up: Chapter 31 + +FOO Macros +========== + +FOO macros are similar in spirit to Common Lisp's macros. A FOO macro +is a bit of code that accepts a FOO expression as an argument and +returns a new FOO expression as the result, which is then evaluated +according to the normal FOO evaluation rules. The actual implementation +is quite similar to the implementation of special operators. + + As with special operators, you can define a predicate function to +test whether a given form is a macro form. + + (defun macro-form-p (form) + (cons-form-p form #'(lambda (x) (and (symbolp x) (get x 'html-macro))))) + + You use the previously defined function cons-form-p because you want +to allow macros to be used in either of the syntaxes of nonmacro FOO +cons forms. However, you need to pass a different predicate function, +one that tests whether the form name is a symbol with a non-NIL +html-macro property. Also, as in the implementation of special +operators, you'll define a macro for defining FOO macros, which is +responsible for storing a function in the property list of the macro's +name, under the key html-macro. However, defining a macro is a bit more +complicated because FOO supports two flavors of macro. Some macros +you'll define will behave much like normal HTML elements and may want to +have easy access to a list of attributes. Other macros will simply want +raw access to the elements of their body. + + You can make the distinction between the two flavors of macros +implicit: when you define a FOO macro, the parameter list can include an +&attributes parameter. If it does, the macro form will be parsed like a +regular cons form, and the macro function will be passed two values, a +plist of attributes and a list of expressions that make up the body of +the form. A macro form without an &attributes parameter won't be parsed +for attributes, and the macro function will be invoked with a single +argument, a list containing the body expressions. The former is useful +for what are essentially HTML templates. For example: + + (define-html-macro :mytag (&attributes attrs &body body) + `((:div :class "mytag" ,@attrs) ,@body)) + + HTML> (html (:mytag "Foo")) + <div class='mytag'>Foo</div> + NIL + HTML> (html (:mytag :id "bar" "Foo")) + <div class='mytag' id='bar'>Foo</div> + NIL + HTML> (html ((:mytag :id "bar") "Foo")) + <div class='mytag' id='bar'>Foo</div> + NIL + + The latter kind of macro is more useful for writing macros that +manipulate the forms in their body. This type of macro can function as +a kind of HTML control construct. As a trivial example, consider the +following macro that implements an :if construct: + + (define-html-macro :if (test then else) + `(if ,test (html ,then) (html ,else))) + + This macro allows you to write this: + + (:p (:if (zerop (random 2)) "Heads" "Tails")) + + instead of this slightly more verbose version: + + (:p (if (zerop (random 2)) (html "Heads") (html "Tails"))) + + To determine which kind of macro you should generate, you need a +function that can parse the parameter list given to define-html-macro. +This function returns two values, the name of the &attributes parameter, +or NIL if there was none, and a list containing all the elements of args +after removing the &attributes marker and the subsequent list element. +(1) + + (defun parse-html-macro-lambda-list (args) + (let ((attr-cons (member '&attributes args))) + (values + (cadr attr-cons) + (nconc (ldiff args attr-cons) (cddr attr-cons))))) + + HTML> (parse-html-macro-lambda-list '(a b c)) + NIL + (A B C) + HTML> (parse-html-macro-lambda-list '(&attributes attrs a b c)) + ATTRS + (A B C) + HTML> (parse-html-macro-lambda-list '(a b c &attributes attrs)) + ATTRS + (A B C) + + The element following &attributes in the parameter list can also be a +destructuring parameter list. + + HTML> (parse-html-macro-lambda-list '(&attributes (&key x y) a b c)) + (&KEY X Y) + (A B C) + + Now you're ready to write define-html-macro. Depending on whether +there was an &attributes parameter specified, you need to generate one +form or the other of HTML macro so the main macro simply determines +which kind of HTML macro it's defining and then calls out to a helper +function to generate the right kind of code. + + (defmacro define-html-macro (name (&rest args) &body body) + (multiple-value-bind (attribute-var args) + (parse-html-macro-lambda-list args) + (if attribute-var + (generate-macro-with-attributes name attribute-var args body) + (generate-macro-no-attributes name args body)))) + + The functions that actually generate the expansion look like this: + + (defun generate-macro-with-attributes (name attribute-args args body) + (with-gensyms (attributes form-body) + (if (symbolp attribute-args) (setf attribute-args `(&rest ,attribute-args))) + `(eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ',name 'html-macro-wants-attributes) t) + (setf (get ',name 'html-macro) + (lambda (,attributes ,form-body) + (destructuring-bind (,@attribute-args) ,attributes + (destructuring-bind (,@args) ,form-body + ,@body))))))) + + (defun generate-macro-no-attributes (name args body) + (with-gensyms (form-body) + `(eval-when (:compile-toplevel :load-toplevel :execute) + (setf (get ',name 'html-macro-wants-attributes) nil) + (setf (get ',name 'html-macro) + (lambda (,form-body) + (destructuring-bind (,@args) ,form-body ,@body))))) + + The macro functions you'll define accept either one or two arguments +and then use DESTRUCTURING-BIND to take them apart and bind them to the +parameters defined in the call to define-html-macro. In both expansions +you need to save the macro function in the name's property list under +html-macro and a boolean indicating whether the macro takes an +&attributes parameter under the property html-macro-wants-attributes. +You use that property in the following function, expand-macro-form, to +determine how the macro function should be invoked: + + (defun expand-macro-form (form) + (if (or (consp (first form)) + (get (first form) 'html-macro-wants-attributes)) + (multiple-value-bind (tag attributes body) (parse-cons-form form) + (funcall (get tag 'html-macro) attributes body)) + (destructuring-bind (tag &body body) form + (funcall (get tag 'html-macro) body)))) + + The last step is to integrate macros by adding a clause to the +dispatching COND in the top-level process function. + + (defun process (processor form) + (cond + ((special-form-p form) (process-special-form processor form)) + ((macro-form-p form) (process processor (expand-macro-form form))) + ((sexp-html-p form) (process-sexp-html processor form)) + ((consp form) (embed-code processor form)) + (t (embed-value processor form)))) + + This is the final version of process. + + ---------- Footnotes ---------- + + (1) Note that &attributes is just another symbol; there's nothing +intrinsically special about names that start with &. + + +File: pcl.info, Node: 31-4, Next: 31-5, Prev: 31-3, Up: Chapter 31 + +The Public API +============== + +Now, at long last, you're ready to implement the html macro, the main +entry point to the FOO compiler. The other parts of FOO's public API +are emit-html and with-html-output, which I discussed in the previous +chapter, and define-html-macro, which I discussed in the previous +section. The define-html-macro macro needs to be part of the public API +because FOO's users will want to write their own HTML macros. On the +other hand, define-html-special-operator isn't part of the public API +because it requires too much knowledge of FOO's internals to define a +new special operator. And there should be very little that can't be +done using the existing language and special operators. (1) + + One last element of the public API, before I get to html, is another +macro, in-html-style. This macro controls whether FOO generates XHTML +or regular HTML by setting the *xhtml* variable. The reason this needs +to be a macro is because you'll want to wrap the code that sets *xhtml* +in an EVAL-WHEN so you can set it in a file and have it affect uses of +the html macro later in that same file. + + (defmacro in-html-style (syntax) + (eval-when (:compile-toplevel :load-toplevel :execute) + (case syntax + (:html (setf *xhtml* nil)) + (:xhtml (setf *xhtml* t))))) + + Finally let's look at html itself. The only tricky bit about +implementing html comes from the need to generate code that can be used +to generate both pretty and compact output, depending on the runtime +value of the variable *pretty*. Thus, html needs to generate an +expansion that contains an IF expression and two versions of the code, +one compiled with *pretty* bound to true and one compiled with it bound +to NIL. To further complicate matters, it's common for one html call to +contain embedded calls to html, like this: + + (html (:ul (dolist (item stuff)) (html (:li item)))) + + If the outer html expands into an IF expression with two versions of +the code, one for when *pretty* is true and one for when it's false, +it's silly for nested html forms to expand into two versions too. In +fact, it'll lead to an exponential explosion of code since the nested +html is already going to be expanded twice-once in the *pretty*-is-true +branch and once in the *pretty*-is-false branch. If each expansion +generates two versions, then you'll have four total versions. And if +the nested html form contained another nested html form, you'd end up +with eight versions of that code. If the compiler is smart, it'll +eventually realize that most of that generated code is dead and will +eliminate it, but even figuring that out can take quite a bit of time, +slowing down compilation of any function that uses nested calls to html. + + Luckily, you can easily avoid this explosion of dead code by +generating an expansion that locally redefines the html macro, using +MACROLET, to generate only the right kind of code. First you define a +helper function that takes the vector of ops returned by sexp->ops and +runs it through optimize-static-output and generate-code-the two phases +that are affected by the value of *pretty*-with *pretty* bound to a +specified value and that interpolates the resulting code into a PROGN. +(The PROGN returns NIL just to keep things tidy.). + + (defun codegen-html (ops pretty) + (let ((*pretty* pretty)) + `(progn ,@(generate-code (optimize-static-output ops)) nil))) + + With that function, you can then define html like this: + + (defmacro html (&whole whole &body body) + (declare (ignore body)) + `(if *pretty* + (macrolet ((html (&body body) (codegen-html (sexp->ops body) t))) + (let ((*html-pretty-printer* (get-pretty-printer))) ,whole)) + (macrolet ((html (&body body) (codegen-html (sexp->ops body) nil))) + ,whole))) + + The &whole parameter represents the original html form, and because +it's interpolated into the expansion in the bodies of the two MACROLETs, +it will be reprocessed with each of the new definitions of html, the one +that generates pretty-printing code and the other that generates +non-pretty-printing code. Note that the variable *pretty* is used both +during macro expansion and when the resulting code is run. It's used at +macro expansion time by codegen-html to cause generate-code to generate +one kind of code or the other. And it's used at runtime, in the IF +generated by the top-level html macro, to determine whether the +pretty-printing or non-pretty-printing code should actually run. + + ---------- Footnotes ---------- + + (1) The one element of the underlying language-processing +infrastructure that's not currently exposed through special operators is +the indentation. If you wanted to make FOO more flexible, albeit at the +cost of making its API that much more complex, you could add special +operators for manipulating the underlying indenting printer. But it +seems like the cost of having to explain the extra special operators +would outweigh the rather small gain in expressiveness. + + +File: pcl.info, Node: 31-5, Next: Chapter 32, Prev: 31-4, Up: Chapter 31 + +The End of the Line +=================== + +As usual, you could keep working with this code to enhance it in various +ways. One interesting avenue to pursue is to use the underlying output +generation framework to emit other kinds of output. In the version of +FOO you can download from the book's Web site, you'll find some code +that implements CSS output that can be integrated into HTML output in +both the interpreter and compiler. That's an interesting case because +CSS's syntax can't be mapped to s-expressions in such a trivial way as +HTML's can. However, if you look at that code, you'll see it's still +possible to define an s-expression syntax for representing the various +constructs available in CSS. + + A more ambitious undertaking would be to add support for generating +embedded JavaScript. Done right, adding JavaScript support to FOO could +yield two big wins. One is that after you define an s-expression syntax +that you can map to JavaScript syntax, then you can start writing +macros, in Common Lisp, to add new constructs to the language you use to +write client-side code, which will then be compiled to JavaScript. The +other is that, as part of the FOO s-expression JavaScript to regular +JavaScript translation, you could deal with the subtle but annoying +differences between JavaScript implementations in different browsers. +That is, the JavaScript code that FOO generates could either contain the +appropriate conditional code to do one thing in one browser and another +in a different browser or could generate different code depending on +which browser you wanted to support. Then if you use FOO in dynamically +generated pages, it could use information about the User-Agent making +the request to generate the right flavor of JavaScript for that browser. + + But if that interests you, you'll have to implement it yourself since +this is the end of the last practical chapter of this book. In the next +chapter I'll wrap things up, discussing briefly some topics that I +haven't touched on elsewhere in the book such as how to find libraries, +how to optimize Common Lisp code, and how to deliver Lisp applications. + + +File: pcl.info, Node: Chapter 32, Prev: Chapter 31, Up: Top + +32. Conclusion: What's Next? +============================ + +I hope by now you're convinced that the title of this book isn't an +oxymoron. However, it's quite likely there's some area of programming +that's of great practical importance to you that I haven't discussed at +all. For instance, I haven't said anything about how to develop +graphical user interfaces (GUIs), how to connect to relational +databases, how to parse XML, or how to write programs that act as +clients for various network protocols. Similarly, I haven't discussed +two topics that will become important when you write real applications +in Common Lisp: optimizing your Lisp code and packaging your application +for delivery. + + I'm obviously not going to cover all these topics in depth in this +final chapter. Instead, I'll give you a few pointers you can use to +pursue whichever aspect of Lisp programming interests you most. + +* Menu: + +* 32-1:: Finding Lisp Libraries +* 32-2:: Interfacing with Other Languages +* 32-3:: Make It Work, Make It Right, Make It Fast +* 32-4:: Delivering Applications +* 32-5:: Where to Go Next + + +File: pcl.info, Node: 32-1, Next: 32-2, Prev: Chapter 32, Up: Chapter 32 + +Finding Lisp Libraries +====================== + +While the standard library of functions, data types, and macros that +comes with Common Lisp is quite large, it provides only general-purpose +programming constructs. Specialized tasks such as writing GUIs, talking +to databases, and parsing XML require libraries beyond what are provided +by the ANSI standardized language. + + The easiest way to obtain a library to do something you need may be +simply to check out your Lisp implementation. Most implementations +provide at least some facilities not specified in the language standard. +The commercial Common Lisp vendors tend to work especially hard at +providing additional libraries for their implementation in order to +justify their prices. Franz's Allegro Common Lisp, Enterprise Edition, +for instance, comes with libraries for parsing XML, speaking SOAP, +generating HTML, connecting to relational databases, and building +graphical interfaces in various ways, among others. LispWorks, another +prominent commercial Lisp, provides several similar libraries, including +a well-regarded portable GUI toolkit, CAPI, which can be used to develop +GUI applications that will run on any operating system LispWorks runs +on. + + The free and open-source Common Lisp implementations typically don't +include quite so many bundled libraries, relying instead on portable +free and open-source libraries. But even those implementations usually +fill in some of the more important areas not addressed by the language +standard such as networking and multithreading. + + The only disadvantage of using implementation-specific libraries is +that they tie you to the implementation that provides them. If you're +delivering end-user apps or are deploying a server-based application on +a server that you control, that may not matter a lot. But if you want +to write code to share with other Lispers or if you simply don't want to +be tied to a particular implementation, it's a little more annoying. + + For portable libraries-portable either because they're written +entirely in standard Common Lisp or because they contain appropriate +read-time conditionalization to work on multiple implementations +(1)-your best bet is to go to the Web. With the usual caveats about +URLs going stale as soon as they're printed on paper, these are three of +the best current starting points: + + * Common-Lisp.net (http://www.common-lisp.net/) is a site that hosts + free and open-source Common Lisp projects, providing version + control, mailing lists, and Web hosting of project pages. In the + first year and a half after the site went live, nearly a hundred + projects were registered. + * The Common Lisp Open Code Collection (CLOCC) + (http://clocc.sourceforge.net/) is a slightly older collection of + free software libraries, which are intended to be portable between + Common Lisp implementations and self-contained, not relying on any + libraries not included in CLOCC itself. + * Cliki (http://www.cliki.net/) is a wiki devoted to free software in + Common Lisp. While, like any wiki, it may change at any time, + typically it has quite a few links to libraries as well to various + open-source Common Lisp implementations. The eponymous software it + runs on is also written in Common Lisp. + + Linux users running the Debian or Gentoo distributions can also +easily install an ever-growing number of Lisp libraries that have been +packaged with those distributions' packing tools, apt-get on Debian and +emerge on Gentoo. + + I won't recommend any specific libraries here since the library +situation is changing every day-after years of envying the library +collections of Perl, Python, and Java, Common Lispers have, in the past +couple of years, begun to take up the challenge of giving Common Lisp +the set of libraries-both open source and commercial-that it deserves. + + One area where there has been a lot of activity recently is on the +GUI front. Unlike Java and C# but like Perl, Python, and C, there's no +single way to develop GUIs in Common Lisp. Instead, it depends both on +what Common Lisp implementation you're using and what operating system +or systems you want to support. + + The commercial Common Lisp implementations usually provide some way +to build GUIs for the platforms they run on. Additionally, LispWorks +provides CAPI, the previously mentioned, portable GUI API. + + On the open-source side, you have a number of options. On Unix, you +can write low-level X Windows GUIs using CLX, a pure-Common Lisp +implementation of the X Windows protocol, roughly akin to xlib in C. Or +you can use various bindings to higher-level APIs and toolkits such as +GTK and Tk, much the way you might in Perl or Python. + + Or, if you're looking for something completely different, you can +check out Common Lisp Interface Manager (CLIM). A descendant of the +Symbolics Lisp Machines GUI framework, CLIM is powerful but complex. +Although many commercial Common Lisp implementations actually support +it, it doesn't seem to have seen a lot of use. But in the past couple +years, an open-source implementation of CLIM, McCLIM-now hosted at +Common-Lisp.net-has been picking up steam lately, so we may be on the +verge of a CLIM renaissance. + + ---------- Footnotes ---------- + + (1) The combination of Common Lisp's read-time conditionalization and +macros makes it quite feasible to develop portability libraries that do +nothing but provide a common API layered over whatever API different +implementations provide for facilities not specified in the language +standard. The portable pathname library from Chapter 15 is an example +of this kind of library, albeit to smooth over differences in +interpretation of the standard rather than implementation-dependent +APIs. + + +File: pcl.info, Node: 32-2, Next: 32-3, Prev: 32-1, Up: Chapter 32 + +Interfacing with Other Languages +================================ + +While many useful libraries can be written in "pure" Common Lisp using +only the features specified in the language standard, and many more can +be written in Lisp using nonstandard facilities provided by a given +implementation, occasionally it's more straightforward to use an +existing library written in another language, such as C. + + The language standard doesn't specify a mechanism for Lisp code to +call code written in another language or even require that +implementations provide such a mechanism. But these days, almost all +Common Lisp implementations support what's called a Foreign Function +Interface, or FFI for short. (1) The basic job of an FFI is to allow +you to give Lisp enough information to be able to link in the foreign +code. Thus, if you're going to call a function from a C library, you +need to tell Lisp about how to translate the Lisp objects passed to the +function into C types and the value returned by the function back into a +Lisp object. However, each implementation provides its own FFI, each +with slightly varying capabilities and syntax. Some FFIs allow +callbacks from C to Lisp, and others don't. The Universal Foreign +Function Interface (UFFI) project provides a portability layer over the +FFIs of more than a half dozen different Common Lisp implementations. +It works by defining its own macros that expand into appropriate FFI +code for the implementation it's running in. The UFFI takes a lowest +common denominator approach, which means it can't take advantage of all +the features of different implementations' FFIs, but it does provide a +good way to build a simple Lisp wrapper around a basic C API. (2) + + ---------- Footnotes ---------- + + (1) A Foreign Function Interface is basically equivalent to JNI in +Java, XS in Perl, or the extension module API in Python. + + (2) As of this writing, the two main drawbacks of UFFI are the lack +of support for callbacks from C into Lisp, which many but not all +implementations' FFIs support, and the lack of support for CLISP, whose +FFI is quite good but different enough from the others as to not fit +easily into the UFFI model. + + +File: pcl.info, Node: 32-3, Next: 32-4, Prev: 32-2, Up: Chapter 32 + +Make It Work, Make It Right, Make It Fast +========================================= + +As has been said many times, and variously attributed to Donald Knuth, +C.A.R. Hoare, and Edsger Dijkstra, premature optimization is the root of +all evil. (1) Common Lisp is an excellent language to program in if you +want to heed this wisdom yet still need high performance. This may come +as a surprise if you've heard the conventional wisdom that Lisp is slow. +In Lisp's earliest days, when computers were programmed with punch +cards, Lisp's high-level features may have doomed it to be slower than +the competition, namely, assembly and FORTRAN. But that was a long time +ago. In the meantime, Lisp has been used for everything from creating +complex AI systems to writing operating systems, and a lot of work has +gone into figuring out how to compile Lisp into efficient code. In this +section I'll talk about some of the reasons why Common Lisp is an +excellent language for writing high-performance code and some of the +techniques for doing so. + + The first reason that Lisp is an excellent language for writing +high-performance code is, ironically enough, the dynamic nature of Lisp +programming-the very thing that originally made it hard to bring Lisp's +performance up to the levels achieved by FORTRAN compilers. The reason +Common Lisp's dynamic features make it easier to write high-performance +code is that the first step to writing efficient code is to find the +right algorithms and data structures. + + Common Lisp's dynamic features keep code flexible, which makes it +easier to try different approaches. Given a finite amount of time to +write a program, you're much more likely to end up with a +high-performance version if you don't spend a lot of time getting into +and out of dead ends. In Common Lisp, you can try an idea, see it's +going nowhere, and move on without having spent a ton of time convincing +the compiler your code is worthy of being run and then waiting for it to +finish compiling. You can write a straightforward but inefficient +version of a function-a code sketch-to determine whether your basic +approach is sound and then replace that function with a more complex but +more efficient implementation if you determine that it is. And if the +overall approach turns out to be flawed, then you haven't wasted a bunch +of time tuning a function that's no longer needed, which means you have +more time to find a better approach. + + The next reason Common Lisp is a good language for developing +high-performance software is that most Common Lisp implementations come +with mature compilers that generate quite efficient machine code. I'll +talk in a moment about how to help these compilers generate code that +will be competitive with code generated by C compilers, but these +implementations already are quite a bit faster than those of languages +whose implementations are less mature and use simpler compilers or +interpreters. Also, since the Lisp compiler is available at runtime, +the Lisp programmer has some possibilities that would be hard to emulate +in other languages-your programs can generate Lisp code at runtime +that's then compiled into machine code and run. If the generated code +is going to run enough times, this can be a big win. Or, even without +using the compiler at runtime, closures give you another way to meld +machine code with runtime data. For instance, the CL-PPCRE regular +expression library, running in CMUCL, is faster than Perl's regular +expression engine on some benchmarks, even though Perl's engine is +written in highly tuned C. This is presumably because in Perl a regular +expression is translated into what are essentially bytecodes that are +then interpreted by the regex engine, while CL-PPCRE translates a +regular expression into a tree of compiled closures that invoke each +other via the normal function-calling machinery. (2) + + However, even with the right algorithm and a high-quality compiler, +you may not get the raw speed you need. Then it's time to think about +profiling and tuning. The key, in Lisp as in any language, is to +profile first to find the spots where your program is actually spending +its time and then worry about speeding up those parts. (3) + + You have a number of different ways to approach profiling. The +language standard provides a few rudimentary tools for measuring how +long certain forms take to execute. In particular, the TIME macro can +be wrapped around any form and will return whatever values the form +returns after printing a message to *TRACE-OUTPUT* about how long it +took to run and how much memory it used. The exact form of the message +is implementation defined. + + You can use TIME for a bit of quick-and-dirty profiling to narrow +your search for bottlenecks. For instance, suppose you have a function +that's taking a long time to run and that calls two other +functions-something like this: + + (defun foo () + (bar) + (baz)) + + If you want to see whether bar or baz is taking more time, you can +change the definition of foo to this: + + (defun foo () + (time (bar)) + (time (baz))) + + Now you can call foo, and Lisp will print two reports, one for bar +and one for baz. The form is implementation dependent; here's what it +looks like in Allegro Common Lisp: + + CL-USER> (foo) + ; cpu time (non-gc) 60 msec user, 0 msec system + ; cpu time (gc) 0 msec user, 0 msec system + ; cpu time (total) 60 msec user, 0 msec system + ; real time 105 msec + ; space allocation: + ; 24,172 cons cells, 1,696 other bytes, 0 static bytes + ; cpu time (non-gc) 540 msec user, 10 msec system + ; cpu time (gc) 170 msec user, 0 msec system + ; cpu time (total) 710 msec user, 10 msec system + ; real time 1,046 msec + ; space allocation: + ; 270,172 cons cells, 1,696 other bytes, 0 static bytes + + Of course, that'd be a bit easier to read if the output included a +label. If you use this technique a lot, it might be worth defining your +own macro like this: + + (defmacro labeled-time (form) + `(progn + (format *trace-output* "~2&~a" ',form) + (time ,form))) + + If you replace TIME with labeled-time in foo, you'll get this output: + + CL-USER> (foo) + + (BAR) + ; cpu time (non-gc) 60 msec user, 0 msec system + ; cpu time (gc) 0 msec user, 0 msec system + ; cpu time (total) 60 msec user, 0 msec system + ; real time 131 msec + ; space allocation: + ; 24,172 cons cells, 1,696 other bytes, 0 static bytes + + (BAZ) + ; cpu time (non-gc) 490 msec user, 0 msec system + ; cpu time (gc) 190 msec user, 10 msec system + ; cpu time (total) 680 msec user, 10 msec system + ; real time 1,088 msec + ; space allocation: + ; 270,172 cons cells, 1,696 other bytes, 0 static bytes + + From this output, it's clear that most of the time in foo is spent in +baz. + + Of course, the output from TIME gets a bit unwieldy if the form you +want to profile is called repeatedly. You can build your own +measurement tools using the functions GET-INTERNAL-REAL-TIME and +GET-INTERNAL-RUN-TIME, which return a number that increases by the value +of the constant INTERNAL-TIME-UNITS-PER-SECOND each second. +GET-INTERNAL-REAL-TIME measures wall time, the actual amount of time +elapsed, while GET-INTERNAL-RUN-TIME measures some +implementation-defined value such as the amount of time Lisp was +actually executing or the time Lisp was executing user code and not +internal bookkeeping such as the garbage collector. Here's a trivial +but useful profiling tool built with a few macros and +GET-INTERNAL-RUN-TIME: + + (defparameter *timing-data* ()) + + (defmacro with-timing (label &body body) + (with-gensyms (start) + `(let ((,start (get-internal-run-time))) + (unwind-protect (progn ,@body) + (push (list ',label ,start (get-internal-run-time)) *timing-data*))))) + + (defun clear-timing-data () + (setf *timing-data* ())) + + (defun show-timing-data () + (loop for (label time count time-per %-of-total) in (compile-timing-data) do + (format t "~3d% ~a: ~d ticks over ~d calls for ~d per.~%" + %-of-total label time count time-per))) + + (defun compile-timing-data () + (loop with timing-table = (make-hash-table) + with count-table = (make-hash-table) + for (label start end) in *timing-data* + for time = (- end start) + summing time into total + do + (incf (gethash label timing-table 0) time) + (incf (gethash label count-table 0)) + finally + (return + (sort + (loop for label being the hash-keys in timing-table collect + (let ((time (gethash label timing-table)) + (count (gethash label count-table))) + (list label time count (round (/ time count)) (round (* 100 (/ time total)))))) + #'> :key #'fifth)))) + + This profiler lets you wrap a with-timing around any form; each time +the form is executed, the time it starts and the time it ends are +recorded, associating with a label you provide. The function +show-timing-data dumps out a table showing how much time was spent in +different labeled sections of code like this: + + CL-USER> (show-timing-data) + 84% BAR: 650 ticks over 2 calls for 325 per. + 16% FOO: 120 ticks over 5 calls for 24 per. + NIL + + You could obviously make this profiling code more sophisticated in +many ways. Alternatively, your Lisp implementation most likely provides +its own profiling tools, which, since they have access to the internals +of the implementation, can get at information not necessarily available +to user-level code. + + Once you've found the bottleneck in your code, you can start tuning. +The first thing you should try, of course, is to find a more efficient +basic algorithm-that's where the big gains are to be had. But assuming +you're already using an appropriate algorithm, then it's down to code +bumming-locally optimizing the code so it does absolutely no more work +than necessary. + + The main tools for code bumming in Common Lisp are its optional +declarations. The basic idea behind declarations in Common Lisp is that +they're used to give the compiler information it can use in a variety of +ways to generate better code. + + For a simple example, consider this Common Lisp function: + + (defun add (x y) (+ x y)) + + I mentioned in Chapter 10 (*Note Chapter 10::.) that if you compare +the performance of this function Lisp to the seemingly equivalent C +function: + + int add (int x, int y) { return x + y; } + + you'll likely find the Common Lisp version to be quite a bit slower, +even if your Common Lisp implementation features a high-quality native +compiler. + + That's because the Common Lisp version is doing a lot more-the Common +Lisp compiler doesn't even know that the values of a and b are numbers +and so has to generate code to check at runtime. And once it determines +they are numbers, it has to determine what types of numbers-integers, +rationals, floating point, or complex-and dispatch to the appropriate +addition routine for the actual types. And even if a and b are +integers-the case you care about-then the addition routine has to +account for the possibility that the result may be too large to +represent as a fixnum, a number that can be represented in a single +machine word, and thus it may have to allocate a bignum object. + + In C, on the other hand, because the type of all variables are +declared, the compiler knows exactly what kind of values a and b will +hold. And because C's arithmetic simply overflows when the result of an +addition is too large to represent in whatever type is being returned, +there's no checking for overflow and no allocation of a bignum object to +represent the result when the mathematical sum is too large to fit in a +machine word. + + Thus, while the behavior of the Common Lisp code is much more likely +to be mathematically correct, the C version can probably be compiled +down to one or two machine instructions. But if you're willing to give +the Common Lisp compiler the same information the C compiler has about +the types of arguments and return values and to accept certain C-like +compromises in terms of generality and error checking, the Common Lisp +function can also be compiled down to an instruction or two. + + That's what declarations are for. The main use of declarations is to +tell the compiler about the types of variables and other expressions. +For instance, you could tell the compiler that the arguments to add are +both fixnums by writing the function like this: + + (defun add (x y) + (declare (fixnum x y)) + (+ x y)) + + The DECLARE expression isn't a Lisp form; rather, it's part of the +syntax of the DEFUN and must appear before any other code in the +function body. (4) This declaration declares that the arguments passed +for the parameters x and y will always be fixnums. In other words, it's +a promise to the compiler, and the compiler is allowed to generate code +on the assumption that whatever you tell it is true. + + To declare the type of the value returned, you can wrap the form (+ x +y) in the THE special operator. This operator takes a type specifier, +such as FIXNUM, and a form and tells the compiler the form will evaluate +to the given type. Thus, to give the Common Lisp compiler all the +information about add that the C compiler gets, you can write it like +this: + + (defun add (x y) + (declare (fixnum x y)) + (the fixnum (+ x y))) + + However, even this version needs one more declaration to give the +Common Lisp compiler the same license as the C compiler to generate fast +but dangerous code. The OPTIMIZE declaration is used to tell the +compiler how to balance five qualities: the speed of the code generated; +the amount of runtime error checking; the memory usage of the code, both +in terms of code size and runtime memory usage; the amount of debugging +information kept with the code; and the speed of the compilation +process. An OPTIMIZE declaration consists of one or more lists, each +containing one of the symbols SPEED, SAFETY, SPACE, DEBUG, and +COMPILATION-SPEED, and a number from zero to three, inclusive. The +number specifies the relative weighting the compiler should give to the +corresponding quality, with 3 being the most important and 0 meaning not +important at all. Thus, to make Common Lisp compile add more or less +like a C compiler would, you can write it like this: + + (defun add (x y) + (declare (optimize (speed 3) (safety 0))) + (declare (fixnum x y)) + (the fixnum (+ x y))) + + Of course, now the Lisp version suffers from many of the same +liabilities as the C version-if the arguments passed aren't fixnums or +if the addition overflows, the result will be mathematically incorrect +or worse. Also, if someone calls add with a wrong number of arguments, +it may not be pretty. Thus, you should use these kinds of declarations +only after your program is working correctly. And you should add them +only where profiling shows they'll make a difference. If you're getting +reasonable performance without them, leave them out. But when profiling +shows you a real hot spot in your code and you need to tune it up, go +ahead. Because you can use declarations this way, it's rarely necessary +to rewrite code in C just for performance reasons; FFIs are used to +access existing C code, but declarations are used when C-like +performance is needed. Of course, how close you can get the performance +of a given piece of Common Lisp code to C and C++ depends mostly on how +much like C you're willing to make it. + + Another code-tuning tool built into Lisp is the function DISASSEMBLE. +The exact behavior of this function is implementation dependent because +it depends on how the implementation compiles code-whether to machine +code, bytecodes, or some other form. But the basic idea is that it +shows you the code generated by the compiler when it compiled a specific +function. + + Thus, you can use DISASSEMBLE to see whether your declarations are +having any effect on the code generated. And if your Lisp +implementation uses a native compiler and you know your platform's +assembly language, you can get a pretty good sense of what's actually +going on when you call one of your functions. For instance, you could +use DISASSEMBLE to get a sense of the difference between the first +version of add, with no declarations, and the final version. First, +define and compile the original version. + + (defun add (x y) (+ x y)) + + Then, at the REPL, call DISASSEMBLE with the name of the function. +In Allegro, it shows the following assembly-language-like dump of the +code generated by the compiler: + + CL-USER> (disassemble 'add) + ;; disassembly of #<Function ADD> + ;; formals: X Y + + ;; code start: #x737496f4: + 0: 55 pushl ebp + 1: 8b ec movl ebp,esp + 3: 56 pushl esi + 4: 83 ec 24 subl esp,$36 + 7: 83 f9 02 cmpl ecx,$2 + 10: 74 02 jz 14 + 12: cd 61 int $97 ; SYS::TRAP-ARGERR + 14: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING + 18: 74 02 jz 22 + 20: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT + 22: 8b d8 movl ebx,eax + 24: 0b da orl ebx,edx + 26: f6 c3 03 testb bl,$3 + 29: 75 0e jnz 45 + 31: 8b d8 movl ebx,eax + 33: 03 da addl ebx,edx + 35: 70 08 jo 45 + 37: 8b c3 movl eax,ebx + 39: f8 clc + 40: c9 leave + 41: 8b 75 fc movl esi,[ebp-4] + 44: c3 ret + 45: 8b 5f 8f movl ebx,[edi-113] ; EXCL::+_2OP + 48: ff 57 27 call *[edi+39] ; SYS::TRAMP-TWO + 51: eb f3 jmp 40 + 53: 90 nop + ; No value + + Clearly, there's a bunch of stuff going on here. If you're familiar +with x86 assembly language, you can probably tell what. Now compile +this version of add with all the declarations. + + (defun add (x y) + (declare (optimize (speed 3) (safety 0))) + (declare (fixnum x y)) + (the fixnum (+ x y))) + + Now disassemble add again, and see if the declarations had any +effect. + + CL-USER> (disassemble 'add) + ;; disassembly of #<Function ADD> + ;; formals: X Y + + ;; code start: #x7374dc34: + 0: 03 c2 addl eax,edx + 2: f8 clc + 3: 8b 75 fc movl esi,[ebp-4] + 6: c3 ret + 7: 90 nop + ; No value + + Looks like they did. + + ---------- Footnotes ---------- + + (1) Knuth has used the saying several times in publications, +including in his 1974 ACM Turing Award paper, "Computer Programming as +an Art," and in his paper "Structured Programs with goto Statements." +In his paper "The Errors of TeX," he attributes the saying to C.A.R. +Hoare. And Hoare, in an 2004 e-mail to Hans Genwitz of phobia.com, said +he didn't remember the origin of the saying but that he might have +attributed it to Dijkstra. + + (2) CL-PPCRE also takes advantage of another Common Lisp feature I +haven't discussed, compiler macros. A compiler macro is a special kind +of macro that's given a chance to optimize calls to a specific function +by transforming calls to that function into more efficient code. +CL-PPCRE defines compiler macros for its functions that take regular +expression arguments. The compiler macros optimize calls to those +functions in which the regular expression is a constant value by parsing +the regular expression at compile time rather than leaving it to be done +at runtime. Look up DEFINE-COMPILER-MACRO in your favorite Common Lisp +reference for more information about compiler macros. + + (3) The word premature in "premature optimization" can pretty much be +defined as "before profiling." Remember that even if you can speed up a +piece of code to the point where it takes literally no time to run, +you'll still speed up your program only by whatever percentage of time +it spent in that piece of code. + + (4) Declarations can appear in most forms that introduce new +variables, such as LET, LET*, and the DO family of looping macros. LOOP +has its own syntax for declaring the types of loop variables. The +special operator LOCALLY, mentioned in Chapter 20, does nothing but +create a scope in which you can make declarations. + + +File: pcl.info, Node: 32-4, Next: 32-5, Prev: 32-3, Up: Chapter 32 + +Delivering Applications +======================= + +Another topic of practical importance, which I didn't talk about +elsewhere in the book, is how to deliver software written in Lisp. The +main reason I neglected this topic is because there are many different +ways to do it, and which one is best for you depends on what kind of +software you need to deliver to what kind of user with what Common Lisp +implementation. In this section I'll give an overview of some of the +different options. + + If you've written code you want to share with fellow Lisp +programmers, the most straightforward way to distribute it is as source +code. (1) You can distribute a simple library as a single source file, +which programmers can LOAD into their Lisp image, possibly after +compiling it with COMPILE-FILE. + + More complex libraries or applications, broken up across multiple +source files, pose an additional challenge-in order to load and compile +the code, the files need to be loaded and compiled in the correct order. +For instance, a file containing macro definitions must be loaded before +you can compile files that use those macros. And a file containing +DEFPACKAGE forms must be loaded before any files that use those packages +can even be READ. Lispers call this the system definition problem and +typically handle it with tools called system definition facilities or +system definition utilities, which are somewhat analogous to build tools +such as make or ant. As with make and ant, system definition tools +allow you to specify the dependencies between different files and then +take care of loading and compiling the files in the correct order while +trying to do only work that's necessary-recompiling only files that have +changed, for example. + + These days the most widely used system definition tool is ASDF, which +stands for Another System Definition Facility. (2) The basic idea +behind ASDF is that you define systems in ASD files, and ASDF provides a +number of operations on systems such as loading them or compiling them. +A system can also be defined to depend on other systems, which will be +loaded as necessary. For instance, the following shows the contents of +html.asd, the ASD file for the FOO library from Chapters 31 and 32: + + (defpackage :com.gigamonkeys.html-system (:use :asdf :cl)) + (in-package :com.gigamonkeys.html-system) + + (defsystem html + :name "html" + :author "Peter Seibel <peter@gigamonkeys.com>" + :version "0.1" + :maintainer "Peter Seibel <peter@gigamonkeys.com>" + :license "BSD" + :description "HTML and CSS generation from sexps." + :long-description "" + :components + ((:file "packages") + (:file "html" :depends-on ("packages")) + (:file "css" :depends-on ("packages" "html"))) + :depends-on (:macro-utilities)) + + If you add a symbolic link to this file from a directory listed in +asdf:*central-registry*, (3) then you can type this: + + (asdf:operate 'asdf:load-op :html) + + to compile and load the files packages.lisp, html.lisp, and +html-macros.lisp in the correct order after first making sure the +:macro-utilities system has been compiled and loaded. For other +examples of ASD files, you can look at this book's source code-the code +from each practical chapter is defined as a system with appropriate +intersystem dependencies expressed in the ASD files. + + Most free and open-source Common Lisp libraries you'll find will come +with an ASD file. Some will use other system definition tools such as +the slightly older MK:DEFSYSTEM or even utilities devised by the +library's author, but the tide seems to be turning in the direction of +ASDF. (4) + + Of course, while ASDF makes it easy for Lispers to install Lisp +libraries, it's not much help if you want to package an application for +an end user who doesn't know or care about Lisp. If you're delivering a +pure end-user application, presumably you want to provide something the +user can download, install, and run without having to know anything +about Lisp. You can't expect them to separately download and install a +Lisp implementation. And you want them to be able to run your +application just like any other application-by double-clicking an icon +on Windows or OS X or by typing the name of the program at the command +line on Unix. + + However, unlike C programs, which can typically rely on certain +shared libraries (DLLs on Windows) that make up the C "runtime" being +present as part of the operating system, Lisp programs must include a +Lisp runtime, that is, the same program you run when you start Lisp +though perhaps with certain functionality not needed to run the +application excised. + + To further complicate matters, program isn't really well defined in +Lisp. As you've seen throughout this book, the process of developing +software in Lisp is an incremental process that involves making changes +to the set of definitions and data living in your Lisp image. The +"program" is just a particular state of the image arrived at by loading +the .lisp or .fasl files that contain code that creates the appropriate +definitions and data. You could, then, distribute a Lisp application as +a Lisp runtime plus a bunch of FASL files and an executable that starts +the runtime, loads the FASLs, and somehow invokes the appropriate +starting function. However, since actually loading the FASLs can take +some time, especially if they have to do any computation to set up the +state of the world, most Common Lisp implementations provide a way to +dump an image-to save the state of a running Lisp to a file called an +image file or sometimes a core. When a Lisp runtime starts, the first +thing it does is load an image file, which it can do in much less time +than it'd take to re-create the state by loading FASL files. + + Normally the image file is a default image containing only the +standard packages defined by the language and any extras provided by the +implementation. But with most implementations, you have a way to +specify a different image file. Thus, instead of packaging an app as a +Lisp runtime plus a bunch of FASLs, you can package it as a Lisp runtime +plus a single image file containing all the definitions and data that +make up your application. Then all you need is a program that launches +the Lisp runtime with the appropriate image file and invokes whatever +function serves as the entry point to the application. + + This is where things get implementation and operating-system +dependent. Some Common Lisp implementations, in particular the +commercial ones such as Allegro and LispWorks, provide tools for +building such an executable. For instance, Allegro's Enterprise Edition +provides a function excl:generate-application that creates a directory +containing the Lisp runtime as a shared library, an image file, and an +executable that starts the runtime with the given image. Similarly, the +LispWorks Professional Edition "delivery" mechanism allows you to build +single-file executables of your programs. On Unix, with the various +free and open-source implementations, you can do essentially the same +thing except it's probably easier to use a shell script to start +everything. + + And on OS X things are even better-since all applications on OS X are +packaged as .app bundles, which are essentially directories with a +certain structure, it's not all that difficult to package all the parts +of a Lisp application as a double-clickable .app bundle. Mikel Evins's +Bosco tool makes it easy to create .app bundles for applications running +on OpenMCL. + + Of course, another popular way to deliver applications these days is +as server-side applications. This is a niche where Common Lisp can +really excel-you can pick a combination of operating system and Common +Lisp implementation that works well for you, and you don't have to worry +about packaging the application to be installed by an end user. And +Common Lisp's interactive debugging and development features make it +possible to debug and upgrade a live server in ways that either just +aren't possible in a less dynamic language or would require you to build +a lot of specific infrastructure. + + ---------- Footnotes ---------- + + (1) The FASL files produced by COMPILE-FILE are implementation +dependent and may or may not be compatible between different versions of +the same Common Lisp implementation. Thus, they're not a very good way +to distribute Lisp code. The one time they can be handy is as a way of +providing patches to be applied to an application running in a known +version of a particular implementation. Applying the patch simply +entails LOADing the FASL, and because a FASL can contain arbitrary code, +it can be used to upgrade existing data as well as to provide new code +definitions. + + (2) ASDF was originally written by Daniel Barlow, one of the SBCL +developers, and has been included as part of SBCL for a long time and +also distributed as a stand-alone library. It has recently been adopted +and included in other implementations such as OpenMCL and Allegro. + + (3) On Windows, where there are no symbolic links, it works a little +bit differently but roughly the same. + + (4) Another tool, ASDF-INSTALL, builds on top of ASDF and +MK:DEFSYSTEM, providing an easy way to automatically download and +install libraries from the network. The best starting point for +learning about ASDF-INSTALL is Edi Weitz's "A tutorial for ASDF-INSTALL" +(http:// www.weitz.de/asdf-install/). + + +File: pcl.info, Node: 32-5, Next: Top, Prev: 32-4, Up: Chapter 32 + +Where to Go Next +================ + +So, that's it. Welcome to the wonderful world of Lisp. The best thing +you can do now-if you haven't already-is to start writing your own Lisp +code. Pick a project that interests you, and do it in Common Lisp. +Then do another. Lather, rinse, repeat. + + However, if you need some further pointers, this section offers some +places to go. For starters, check out the Practical Common Lisp Web +site at http://www.gigamonkeys.com/book/, where you can find the source +code from the practical chapters, errata, and links to other Lisp +resources on the Web. + + In addition to the sites I mentioned in the "Finding Lisp Libraries" +section, you may also want explore the Common Lisp HyperSpec (a.k.a. +the HyperSpec or CLHS), an HTML version of the ANSI language standard +prepared by Kent Pitman and made available by LispWorks at +http://www.lispworks.com/documentation/HyperSpec/index.html. The +HyperSpec is by no means a tutorial, but it's as authoritative a guide +to the language as you can get without buying a printed copy of the +standard from ANSI and much more convenient for day-to-day use. (1) + + If you want to get in touch with other Lispers, comp.lang.lisp on +Usenet and the #lisp IRC channel or the Freenode network +(http://www.freenode.net) are two of the main online hang- outs. There +are also a number of Lisp-related blogs, most of which are aggregated on +Planet Lisp at http://planet.lisp.org/. + + And keep your eyes peeled in all those forums for announcements of +local Lisp users get-togethers in your area-in the past few years, +Lispnik gatherings have popped up in cities around the world, from New +York to Oakland, from Cologne to Munich, and from Geneva to Helsinki. + + If you want to stick to books, here are a few suggestions. For a +nice thick reference book to stick on your desk, grab The ANSI Common +Lisp Reference Book edited by David Margolies (Apress, 2005). (2) + + For more on Common Lisp's object system, you can start with +Object-Oriented Programming in Common Lisp: A Programmer's Guide to CLOS +by Sonya E. Keene (Addison-Wesley, 1989). Then if you really want to +become an object wizard or just to stretch your mind in interesting +ways, read The Art of the Metaobject Protocol by Gregor Kiczales, Jim +des Riviéres, and Daniel G. Bobrow (MIT Press, 1991). This book, also +known as AMOP, is both an explanation of what a metaobject protocol is +and why you want one and the de facto standard for the metaobject +protocol supported by many Common Lisp implementations. + + Two books that cover general Common Lisp technique are Paradigms of +Artificial Intelligence Programming: Case Studies in Common Lisp by +Peter Norvig (Morgan Kaufmann, 1992) and On Lisp: Advanced Techniques +for Common Lisp by Paul Graham (Prentice Hall, 1994). The former +provides a solid introduction to artificial intelligence techniques +while teaching quite a bit about how to write good Common Lisp code, and +the latter is especially good in its treatment of macros. + + If you're the kind of person who likes to know how things work down +to the bits, Lisp in Small Pieces by Christian Queinnec (Cambridge +University Press, 1996) provides a nice blend of programming language +theory and practical Lisp implementation techniques. While it's +primarily focused on Scheme rather than Common Lisp, the same principles +apply. + + For folks who want a little more theoretical look at things-or who +just want to know what it's like to be a freshman comp sci student at +M.I.T.-Structure and Interpretation of Computer Programs, Second +Edition, by Harold Abelson, Gerald Jay Sussman, and Julie Sussman +(M.I.T. Press, 1996) is a classic computer science text that uses Scheme +to teach important programming concepts. Any programmer can learn a lot +from this book-just remember that there are important differences +between Scheme and Common Lisp. + + Once you've wrapped your mind around Lisp, you may want to place it +in a bit of context. Since no one can claim to really understand object +orientation who doesn't know something about Smalltalk, you might want +to start with Smalltalk-80: The Language by Adele Goldberg and David +Robson (Addison Wesley, 1989), the standard introduction to the core of +Smalltalk. After that, Smalltalk Best Practice Patterns by Kent Beck +(Prentice Hall, 1997) is full of good advice aimed at Smalltalkers, much +of which is applicable to any object-oriented language. + + And at the other end of the spectrum, Object-Oriented Software +Construction by Bertrand Meyer (Prentice Hall, 1997) is an excellent +exposition of the static language mind-set from the inventor of Eiffel, +an oft-overlooked descendant of Simula and Algol. It contains much food +for thought, even for programmers working with dynamic languages such as +Common Lisp. In particular, Meyer's ideas about Design By Contract can +shed a lot of light on how one ought to use Common Lisp's condition +system. + + Though not about computers per se, The Wisdom of Crowds: Why the Many +Are Smarter Than the Few and How Collective Wisdom Shapes Business, +Economies, Societies, and Nations by James Surowiecki (Doubleday, 2004) +contains an excellent answer to the question, "If Lisp's so great how +come everybody isn't using it?" See the section on "Plank-Road Fever" +starting on page 53. + + And finally, for some fun, and to learn about the influence Lisp and +Lispers have had on hacker culture, dip into (or read from cover to +cover) The New Hacker's Dictionary, Third Edition, compiled by Eric S. +Raymond (MIT Press, 1996) and based on the original The Hacker's +Dictionary edited by Guy Steele (Harper & Row, 1983). + + But don't let all these suggestions interfere with your +programming-the only way to really learn a language is to use it. If +you've made it this far, you're certainly ready to do that. Happy +hacking! + + ---------- Footnotes ---------- + + (1) SLIME incorporates an Elisp library that allows you to +automatically jump to the HyperSpec entry for any name defined in the +standard. You can also download a complete copy of the HyperSpec to +keep locally for offline browsing. + + (2) Another classic reference is Common Lisp: The Language by Guy +Steele (Digital Press, 1984 and 1990). The first edition, a.k.a. +CLtL1, was the de facto standard for the language for a number of years. +While waiting for the official ANSI standard to be finished, Guy +Steele-who was on the ANSI committee-decided to release a second edition +to bridge the gap between CLtL1 and the eventual standard. The second +edition, now known as CLtL2, is essentially a snapshot of the work of +the standardization committee taken at a particular moment in time near +to, but not quite at, the end of the standardization process. +Consequently, CLtL2 differs from the standard in ways that make it not a +very good day-to-day reference. It is, however, a useful historical +document, particularly because it includes documentation of some +features that were dropped from the standard before it was finished as +well as commentary that isn't part of the standard about why certain +features are the way they are. + + + +Tag Table: +Node: Top67 +Node: Letter15012 +Node: Blurbs16883 +Node: Chapter 128083 +Node: 1-133768 +Ref: 1-1-Footnote-139045 +Ref: 1-1-Footnote-239117 +Ref: 1-1-Footnote-339394 +Ref: 1-1-Footnote-440546 +Ref: 1-1-Footnote-541570 +Ref: 1-1-Footnote-642375 +Node: 1-242534 +Ref: 1-2-Footnote-148073 +Ref: 1-2-Footnote-248137 +Ref: 1-2-Footnote-348396 +Node: 1-2-149472 +Node: 1-352556 +Node: Chapter 254869 +Ref: Chapter 2-Footnote-156461 +Ref: Chapter 2-Footnote-256510 +Node: 2-157663 +Ref: 2-1-Footnote-161771 +Ref: 2-1-Footnote-262610 +Ref: 2-1-Footnote-362641 +Ref: 2-1-Footnote-462665 +Node: 2-262924 +Node: 2-366471 +Node: 2-468461 +Node: 2-570105 +Ref: 2-5-Footnote-174805 +Ref: 2-5-Footnote-275275 +Ref: 2-5-Footnote-375359 +Ref: 2-5-Footnote-475608 +Node: 2-675692 +Ref: 2-6-Footnote-185098 +Ref: 2-6-Footnote-285223 +Ref: 2-6-Footnote-385301 +Ref: 2-6-Footnote-485851 +Node: Chapter 385900 +Ref: Chapter 3-Footnote-188604 +Node: 3-188814 +Node: 3-292036 +Ref: 3-2-Footnote-193776 +Node: 3-394130 +Ref: 3-3-Footnote-197716 +Node: 3-497986 +Node: 3-5102626 +Ref: 3-5-Footnote-1106884 +Node: 3-6107151 +Ref: 3-6-Footnote-1118122 +Ref: 3-6-Footnote-2118288 +Ref: 3-6-Footnote-3118494 +Node: 3-7118905 +Ref: 3-7-Footnote-1121892 +Ref: 3-7-Footnote-2122220 +Node: 3-8122552 +Ref: 3-8-Footnote-1133901 +Ref: 3-8-Footnote-2134309 +Ref: 3-8-Footnote-3134779 +Node: 3-9135013 +Node: Chapter 4136260 +Node: 4-1137077 +Ref: 4-1-Footnote-1138869 +Node: 4-2138936 +Ref: 4-2-Footnote-1142771 +Ref: 4-2-Footnote-2143326 +Ref: 4-2-Footnote-3143583 +Node: 4-3143740 +Ref: 4-3-Footnote-1151169 +Ref: 4-3-Footnote-2151256 +Ref: 4-3-Footnote-3151490 +Ref: 4-3-Footnote-4151752 +Node: 4-4151845 +Ref: 4-4-Footnote-1155315 +Ref: 4-4-Footnote-2155566 +Ref: 4-4-Footnote-3155708 +Ref: 4-4-Footnote-4155852 +Node: 4-5156320 +Node: 4-6157599 +Ref: 4-6-Footnote-1160821 +Ref: 4-6-Footnote-2160946 +Ref: 4-6-Footnote-3161222 +Node: 4-7161608 +Ref: 4-7-Footnote-1164212 +Node: 4-8165399 +Ref: 4-8-Footnote-1171402 +Ref: 4-8-Footnote-2171932 +Node: 4-9172446 +Node: Chapter 5177049 +Ref: Chapter 5-Footnote-1178946 +Node: 5-1179663 +Ref: 5-1-Footnote-1182467 +Ref: 5-1-Footnote-2182777 +Ref: 5-1-Footnote-3182939 +Node: 5-2183448 +Node: 5-3184829 +Ref: 5-3-Footnote-1189087 +Node: 5-4189676 +Ref: 5-4-Footnote-1191875 +Node: 5-5191962 +Node: 5-6195721 +Ref: 5-6-Footnote-1199193 +Node: 5-7199761 +Ref: 5-7-Footnote-1201851 +Node: 5-8202140 +Ref: 5-8-Footnote-1208017 +Ref: 5-8-Footnote-2208345 +Ref: 5-8-Footnote-3208458 +Ref: 5-8-Footnote-4208950 +Node: 5-9209119 +Ref: 5-9-Footnote-1211682 +Node: Chapter 6213325 +Ref: Chapter 6-Footnote-1214896 +Ref: Chapter 6-Footnote-2215139 +Node: 6-1215812 +Ref: 6-1-Footnote-1222516 +Ref: 6-1-Footnote-2222943 +Ref: 6-1-Footnote-3223395 +Ref: 6-1-Footnote-4223581 +Node: 6-2223980 +Node: 6-3226933 +Ref: 6-3-Footnote-1237415 +Ref: 6-3-Footnote-2237606 +Ref: 6-3-Footnote-3237783 +Ref: 6-3-Footnote-4238374 +Ref: 6-3-Footnote-5238996 +Ref: 6-3-Footnote-6239368 +Ref: 6-3-Footnote-7239451 +Node: 6-4239541 +Ref: 6-4-Footnote-1241056 +Node: 6-5241355 +Ref: 6-5-Footnote-1243442 +Node: 6-6243565 +Ref: 6-6-Footnote-1245971 +Ref: 6-6-Footnote-2246039 +Ref: 6-6-Footnote-3246413 +Node: 6-7246839 +Node: Chapter 7250879 +Ref: Chapter 7-Footnote-1255255 +Ref: Chapter 7-Footnote-2255653 +Node: 7-1255962 +Ref: 7-1-Footnote-1259668 +Node: 7-2259928 +Node: 7-3261575 +Node: 7-4262856 +Ref: 7-4-Footnote-1265093 +Node: 7-5265234 +Ref: 7-5-Footnote-1267075 +Node: 7-6268166 +Ref: 7-6-Footnote-1272920 +Ref: 7-6-Footnote-2273104 +Node: 7-7273260 +Ref: 7-7-Footnote-1277400 +Node: Chapter 8277937 +Node: 8-1279994 +Node: 8-2284812 +Node: 8-3287941 +Ref: 8-3-Footnote-1291026 +Node: 8-4291140 +Node: 8-5293142 +Node: 8-6297260 +Ref: 8-6-Footnote-1301006 +Ref: 8-6-Footnote-2301176 +Ref: 8-6-Footnote-3301656 +Node: 8-7301864 +Ref: 8-7-Footnote-1311691 +Ref: 8-7-Footnote-2312017 +Ref: 8-7-Footnote-3312146 +Node: 8-8312584 +Node: 8-8-1316168 +Node: 8-9317434 +Node: Chapter 9318008 +Ref: Chapter 9-Footnote-1319876 +Ref: Chapter 9-Footnote-2320300 +Node: 9-1320609 +Ref: 9-1-Footnote-1323174 +Node: 9-2323258 +Node: 9-3326539 +Ref: 9-3-Footnote-1329510 +Ref: 9-3-Footnote-2329902 +Node: 9-4329996 +Node: 9-5333614 +Node: 9-6335889 +Ref: 9-6-Footnote-1339220 +Ref: 9-6-Footnote-2339341 +Node: 9-7339644 +Node: Chapter 10343698 +Ref: Chapter 10-Footnote-1346759 +Node: 10-1346893 +Ref: 10-1-Footnote-1349231 +Ref: 10-1-Footnote-2349265 +Ref: 10-1-Footnote-3349814 +Ref: 10-1-Footnote-4350807 +Node: 10-2351079 +Ref: 10-2-Footnote-1357684 +Ref: 10-2-Footnote-2357939 +Ref: 10-2-Footnote-3358272 +Node: 10-3358834 +Ref: 10-3-Footnote-1362554 +Ref: 10-3-Footnote-2362711 +Node: 10-4362925 +Node: 10-5365045 +Node: 10-6365665 +Ref: 10-6-Footnote-1367177 +Node: 10-7367545 +Node: 10-8369427 +Ref: 10-8-Footnote-1371211 +Node: 10-9371514 +Node: Chapter 11374077 +Ref: Chapter 11-Footnote-1376589 +Node: 11-1376851 +Ref: 11-1-Footnote-1381453 +Ref: 11-1-Footnote-2381709 +Ref: 11-1-Footnote-3381848 +Node: 11-2382358 +Node: 11-3384716 +Node: 11-4386169 +Ref: 11-4-Footnote-1390592 +Ref: 11-4-Footnote-2391225 +Node: 11-5391489 +Ref: 11-5-Footnote-1394047 +Ref: 11-5-Footnote-2394143 +Node: 11-6394871 +Ref: 11-6-Footnote-1396413 +Node: 11-7396592 +Ref: 11-7-Footnote-1399266 +Node: 11-8399623 +Node: 11-9402216 +Node: 11-10403787 +Node: 11-11406410 +Ref: 11-11-Footnote-1410716 +Node: 11-12410941 +Ref: 11-12-Footnote-1412743 +Node: Chapter 12413115 +Node: 12-1414766 +Ref: 12-1-Footnote-1420644 +Ref: 12-1-Footnote-2420712 +Ref: 12-1-Footnote-3420770 +Ref: 12-1-Footnote-4421028 +Node: 12-2421434 +Node: 12-3425420 +Ref: 12-3-Footnote-1432224 +Ref: 12-3-Footnote-2432459 +Node: 12-4432675 +Ref: 12-4-Footnote-1437596 +Ref: 12-4-Footnote-2437873 +Ref: 12-4-Footnote-3438090 +Ref: 12-4-Footnote-4438178 +Node: 12-5438396 +Ref: 12-5-Footnote-1441143 +Ref: 12-5-Footnote-2441487 +Node: 12-6441913 +Ref: 12-6-Footnote-1444455 +Ref: 12-6-Footnote-2444631 +Node: 12-7444842 +Node: Chapter 13445397 +Node: 13-1446294 +Ref: 13-1-Footnote-1450640 +Ref: 13-1-Footnote-2450830 +Node: 13-2451133 +Ref: 13-2-Footnote-1454793 +Node: 13-3454934 +Ref: 13-3-Footnote-1464625 +Node: 13-4464935 +Ref: 13-4-Footnote-1467969 +Ref: 13-4-Footnote-2468199 +Node: Chapter 14468347 +Node: 14-1469488 +Ref: 14-1-Footnote-1473908 +Node: 14-2474161 +Ref: 14-2-Footnote-1475007 +Ref: 14-2-Footnote-2475343 +Ref: 14-2-Footnote-3475644 +Node: 14-3475943 +Node: 14-4476786 +Node: 14-5480627 +Ref: 14-5-Footnote-1483361 +Node: 14-6483872 +Ref: 14-6-Footnote-1486461 +Node: 14-6-1487273 +Node: 14-7489076 +Ref: 14-7-Footnote-1493331 +Ref: 14-7-Footnote-2493813 +Node: 14-8493923 +Ref: 14-8-Footnote-1499439 +Ref: 14-8-Footnote-2499553 +Node: 14-9500141 +Node: 14-10502336 +Ref: 14-10-Footnote-1507595 +Ref: 14-10-Footnote-2507647 +Ref: 14-10-Footnote-3508041 +Node: 14-11508826 +Ref: 14-11-Footnote-1513230 +Ref: 14-11-Footnote-2513320 +Node: Chapter 15514184 +Node: 15-1516169 +Node: 15-2516961 +Ref: 15-2-Footnote-1521117 +Node: 15-2-1521530 +Node: 15-3523080 +Ref: 15-3-Footnote-1530975 +Ref: 15-3-Footnote-2531128 +Node: 15-4531321 +Ref: 15-4-Footnote-1535267 +Node: 15-5535632 +Node: Chapter 16537497 +Ref: Chapter 16-Footnote-1539992 +Ref: Chapter 16-Footnote-2540002 +Node: 16-1540012 +Ref: 16-1-Footnote-1544714 +Ref: 16-1-Footnote-2544724 +Node: 16-2544734 +Ref: 16-2-Footnote-1548937 +Node: 16-3548947 +Node: 16-4550883 +Ref: 16-4-Footnote-1556472 +Ref: 16-4-Footnote-2556482 +Ref: 16-4-Footnote-3556492 +Node: 16-5556502 +Ref: 16-5-Footnote-1560628 +Ref: 16-5-Footnote-2560638 +Node: 16-6560649 +Node: 16-7566176 +Node: 16-8570755 +Ref: 16-8-Footnote-1573538 +Node: 16-8-1573549 +Node: 16-9575317 +Node: Chapter 17575615 +Ref: Chapter 17-Footnote-1577213 +Node: 17-1577619 +Ref: 17-1-Footnote-1579586 +Ref: 17-1-Footnote-2579693 +Node: 17-1-1579919 +Node: 17-2581437 +Ref: 17-2-Footnote-1584358 +Node: 17-3584508 +Ref: 17-3-Footnote-1591033 +Ref: 17-3-Footnote-2591375 +Ref: 17-3-Footnote-3591530 +Node: 17-4592082 +Ref: 17-4-Footnote-1599640 +Ref: 17-4-Footnote-2600245 +Node: 17-5600507 +Ref: 17-5-Footnote-1605214 +Node: 17-6605634 +Ref: 17-6-Footnote-1607157 +Node: 17-7607656 +Node: 17-8612047 +Ref: 17-8-Footnote-1620475 +Node: 17-9620781 +Node: Chapter 18621953 +Ref: Chapter 18-Footnote-1626056 +Ref: Chapter 18-Footnote-2626730 +Node: 18-1627051 +Ref: 18-1-Footnote-1629029 +Node: 18-2629342 +Node: 18-3632463 +Node: 18-4635213 +Ref: 18-4-Footnote-1640190 +Node: 18-5640344 +Ref: 18-5-Footnote-1642423 +Ref: 18-5-Footnote-2642656 +Node: 18-6642871 +Node: 18-7646345 +Node: 18-8649702 +Node: 18-9655404 +Ref: 18-9-Footnote-1657432 +Ref: 18-9-Footnote-2657851 +Node: 18-10658091 +Node: Chapter 19658760 +Ref: Chapter 19-Footnote-1664422 +Ref: Chapter 19-Footnote-2664481 +Node: 19-1664532 +Node: 19-2666015 +Ref: 19-2-Footnote-1667959 +Ref: 19-2-Footnote-2668112 +Node: 19-3668327 +Node: 19-3-1673927 +Node: 19-4674590 +Ref: 19-4-Footnote-1681747 +Node: 19-5681932 +Node: 19-6684841 +Node: 19-6-1689564 +Node: Chapter 20690625 +Node: 20-1692542 +Ref: 20-1-Footnote-1693321 +Node: 20-2693582 +Ref: 20-2-Footnote-1700978 +Ref: 20-2-Footnote-2701220 +Ref: 20-2-Footnote-3701531 +Node: 20-3702431 +Ref: 20-3-Footnote-1710169 +Ref: 20-3-Footnote-2710820 +Node: 20-4711317 +Ref: 20-4-Footnote-1720610 +Ref: 20-4-Footnote-2721029 +Ref: 20-4-Footnote-3721276 +Ref: 20-4-Footnote-4721371 +Node: 20-5721544 +Ref: 20-5-Footnote-1726829 +Node: 20-6727586 +Ref: 20-6-Footnote-1734596 +Ref: 20-6-Footnote-2735148 +Ref: 20-6-Footnote-3735537 +Node: 20-7735927 +Ref: 20-7-Footnote-1738885 +Node: Chapter 21739013 +Ref: Chapter 21-Footnote-1741253 +Node: 21-1741612 +Node: 21-2746199 +Node: 21-3749591 +Ref: 21-3-Footnote-1751858 +Ref: 21-3-Footnote-2752087 +Node: 21-4753081 +Ref: 21-4-Footnote-1758800 +Ref: 21-4-Footnote-2759039 +Ref: 21-4-Footnote-3759477 +Ref: 21-4-Footnote-4759918 +Ref: 21-4-Footnote-5760445 +Ref: 21-4-Footnote-6760878 +Ref: 21-4-Footnote-7761153 +Node: 21-5761309 +Node: 21-6762722 +Ref: 21-6-Footnote-1766927 +Node: 21-7767254 +Ref: 21-7-Footnote-1771763 +Node: 21-8771941 +Ref: 21-8-Footnote-1777655 +Node: Chapter 22777919 +Node: 22-1779594 +Ref: 22-1-Footnote-1780531 +Node: 22-2780929 +Ref: 22-2-Footnote-1782111 +Node: 22-3782393 +Node: 22-4785468 +Ref: 22-4-Footnote-1788800 +Ref: 22-4-Footnote-2789373 +Node: 22-5789480 +Node: 22-6791677 +Node: 22-7792610 +Node: 22-8794494 +Ref: 22-8-Footnote-1797120 +Ref: 22-8-Footnote-2797753 +Node: 22-9798374 +Ref: 22-9-Footnote-1800259 +Node: 22-10800358 +Node: 22-11804003 +Ref: 22-11-Footnote-1807366 +Ref: 22-11-Footnote-2807494 +Ref: 22-11-Footnote-3807737 +Node: 22-12807891 +Node: 22-13811625 +Ref: 22-13-Footnote-1813623 +Ref: 22-13-Footnote-2813808 +Node: Chapter 23814139 +Ref: Chapter 23-Footnote-1816032 +Ref: Chapter 23-Footnote-2816170 +Node: 23-1816411 +Ref: 23-1-Footnote-1824876 +Ref: 23-1-Footnote-2825043 +Ref: 23-1-Footnote-3825225 +Ref: 23-1-Footnote-4825423 +Node: 23-2825715 +Ref: 23-2-Footnote-1829791 +Ref: 23-2-Footnote-2830021 +Node: 23-3830735 +Ref: 23-3-Footnote-1836043 +Ref: 23-3-Footnote-2836406 +Node: 23-4837253 +Ref: 23-4-Footnote-1845554 +Node: 23-5845817 +Node: 23-6847989 +Node: 23-7851024 +Ref: 23-7-Footnote-1856233 +Ref: 23-7-Footnote-2856374 +Node: 23-8856665 +Ref: 23-8-Footnote-1859413 +Node: 23-9859753 +Node: 23-10865640 +Node: Chapter 24867351 +Node: 24-1868783 +Ref: 24-1-Footnote-1870024 +Ref: 24-1-Footnote-2870437 +Node: 24-2871218 +Ref: 24-2-Footnote-1874886 +Ref: 24-2-Footnote-2875762 +Ref: 24-2-Footnote-3876076 +Node: 24-3877125 +Ref: 24-3-Footnote-1883382 +Ref: 24-3-Footnote-2883628 +Node: 24-4884095 +Node: 24-5887977 +Node: 24-6890756 +Node: 24-7894894 +Ref: 24-7-Footnote-1902146 +Ref: 24-7-Footnote-2902492 +Node: 24-8902738 +Node: 24-9904393 +Node: 24-10910203 +Node: 24-11916616 +Node: 24-12921750 +Ref: 24-12-Footnote-1930131 +Node: 24-13930631 +Ref: 24-13-Footnote-1933290 +Node: Chapter 25934029 +Ref: Chapter 25-Footnote-1939250 +Node: 25-1939606 +Ref: 25-1-Footnote-1942676 +Node: 25-2943316 +Node: 25-3944347 +Ref: 25-3-Footnote-1948008 +Node: 25-4948330 +Ref: 25-4-Footnote-1957716 +Node: 25-5957847 +Ref: 25-5-Footnote-1961810 +Node: 25-6961971 +Node: 25-7967376 +Ref: 25-7-Footnote-1970400 +Node: 25-8970589 +Ref: 25-8-Footnote-1976508 +Node: 25-9976761 +Ref: 25-9-Footnote-1978777 +Node: 25-10979497 +Node: 25-11982575 +Node: 25-12987797 +Node: 25-13992225 +Node: 25-14995042 +Ref: 25-14-Footnote-11007360 +Node: Chapter 261007828 +Node: 26-11008979 +Ref: 26-1-Footnote-11014169 +Ref: 26-1-Footnote-21014377 +Ref: 26-1-Footnote-31014624 +Ref: 26-1-Footnote-41014850 +Node: 26-21015637 +Ref: 26-2-Footnote-11022292 +Ref: 26-2-Footnote-21022353 +Ref: 26-2-Footnote-31022394 +Ref: 26-2-Footnote-41022725 +Node: 26-31022879 +Ref: 26-3-Footnote-11026245 +Node: 26-41026587 +Ref: 26-4-Footnote-11032725 +Node: 26-51032797 +Node: 26-61033934 +Node: 26-71039082 +Ref: 26-7-Footnote-11042922 +Node: 26-81043076 +Node: 26-91047473 +Ref: 26-9-Footnote-11056057 +Node: Chapter 271056414 +Node: 27-11057328 +Ref: 27-1-Footnote-11062383 +Node: 27-1-11062833 +Node: 27-21064204 +Ref: 27-2-Footnote-11068923 +Node: 27-31069448 +Ref: 27-3-Footnote-11072787 +Node: 27-41073331 +Node: 27-51080744 +Ref: 27-5-Footnote-11085888 +Node: 27-61086758 +Ref: 27-6-Footnote-11090878 +Node: 27-71091245 +Node: Chapter 281094401 +Ref: Chapter 28-Footnote-11094965 +Node: 28-11095385 +Ref: 28-1-Footnote-11099021 +Node: 28-21099212 +Ref: 28-2-Footnote-11104333 +Node: 28-2-11104635 +Node: 28-31105206 +Ref: 28-3-Footnote-11118343 +Ref: 28-3-Footnote-21118676 +Ref: 28-3-Footnote-31119015 +Ref: 28-3-Footnote-41119356 +Ref: 28-3-Footnote-51119925 +Ref: 28-3-Footnote-61120411 +Node: Chapter 291120654 +Node: 29-11121601 +Node: 29-1-11126253 +Node: 29-21127261 +Ref: 29-2-Footnote-11136286 +Ref: 29-2-Footnote-21137075 +Ref: 29-2-Footnote-31137569 +Node: 29-31138174 +Node: 29-41147570 +Ref: 29-4-Footnote-11150919 +Node: 29-51151289 +Node: 29-61155914 +Ref: 29-6-Footnote-11163033 +Node: 29-71164103 +Node: 29-81171802 +Node: 29-91173150 +Node: Chapter 301175501 +Node: 30-11177197 +Ref: 30-1-Footnote-11180753 +Ref: 30-1-Footnote-21181188 +Node: 30-21181327 +Ref: 30-2-Footnote-11187371 +Ref: 30-2-Footnote-21187605 +Node: 30-31187883 +Node: 30-3-11190842 +Node: 30-41191659 +Node: 30-51195151 +Node: 30-61196752 +Ref: 30-6-Footnote-11199900 +Node: 30-6-11200443 +Node: 30-71205049 +Ref: 30-7-Footnote-11214771 +Ref: 30-7-Footnote-21214935 +Node: 30-81215483 +Node: Chapter 311215899 +Node: 31-11217429 +Node: 31-21228986 +Ref: 31-2-Footnote-11241203 +Ref: 31-2-Footnote-21241486 +Node: 31-31241785 +Ref: 31-3-Footnote-11249058 +Node: 31-41249181 +Ref: 31-4-Footnote-11253828 +Node: 31-51254299 +Node: Chapter 321256512 +Node: 32-11257705 +Ref: 32-1-Footnote-11263071 +Node: 32-21263562 +Ref: 32-2-Footnote-11265388 +Ref: 32-2-Footnote-21265516 +Node: 32-31265823 +Ref: 32-3-Footnote-11284538 +Ref: 32-3-Footnote-21284980 +Ref: 32-3-Footnote-31285668 +Ref: 32-3-Footnote-41285986 +Node: 32-41286310 +Ref: 32-4-Footnote-11294566 +Ref: 32-4-Footnote-21295143 +Ref: 32-4-Footnote-31295423 +Ref: 32-4-Footnote-41295534 +Node: 32-51295838 +Ref: 32-5-Footnote-11301807 +Ref: 32-5-Footnote-21302044 + +End Tag Table |
