I'm just giddy
Saturday, July 28, 2012
Monday, July 23, 2012
Land of Lisp chapter 7
So a while back I mentioned needing a drip-feed of neurochemical triggers in the form of a productivity chart. I'm now 2 weeks late on a promise. Call it supporting evidence. So moving right along.
Chapter 7 covered more interesting list structures than have been used so far. It also comes across as quite dry so I'm liable to not spend much time in spots.
We start with dotted lists and pairs. A dotted list is simply a list that isn't terminated with nil. If we go back to thinking of Lisp lists as similar to a linked list structure a dotted list would be the same as plugging a data value into the last pointer in the chain. Which would be fugly in a linked list but works out alright for Lisp. Incidentally the dot notation used to print these lists
can also be used as a replacement for cons as in (1 . ( 2 . 3)) or for a common list as in (1 . (2 . (3 . nil))). I would assume using one form over the other would be a matter of taste and convention. The use of dotted lists regularly in code is supposedly uncommon however the book does state that running across such lists happens often which is quite contradictory and I'm not sure which side of the tale to believe. Either way I'll know what I'm looking at when I meet one. There is a relatively standard usage though in concisely describing pairs of things.
Additionally circular lists are discussed; these are simple enough to me that I'll gloss over them. Basically, the cdr of the final cons cell points back to one of the prior cells in the list. You could, if you felt like it, craft a very complex list using a combination of circular references. You'd also confuse the hell out of the interpreter unless you warned it that you were going to by issuing (setf *print-circle* t) which basically tells it to take a little longer in the print routine to make sure it doesn't get stuck printing in a loop.
Association lists (or alists) are also discussed which is, predictably, a list of pairs though presumably it's a list of keys consed to values which could themselves be further lists. Discussed here is a list of dot pairs. This is a convenient structure but apparently becomes inefficient when the list grows beyond a few items. Finally it discusses tree structures in data which we've already been using to some extent. If you've been watching you can see that Lisp lists are adept at handling trees as each branch basically has a symbol by which it can be referenced in the car of its cons cell. If I rip an example straight from the book
we can see this clearly. The *house* is composed of walls, windows, and roof. Each of which is a symbol identifying a list of its component parts.
The last half of the chapter covers Graphviz which takes formatted text input and generates image graphs based on the data. This is largely just data format manipulation code that takes the node and edge parameters defined for the wizard game and converts it to text output properly formatted for Graphviz. I'm not going to discuss this because it's largely boring; feel free to dissect the code linked below. Just as a clarification, data conversion is at the root of the beginning and ending of every program. It is vitally important, clearly, and a topic worthy of understanding. However the majority of the code I'm skipping is just string manipulation. Not only is it a thoroughly understood form of data manipulation it is, in my opinion, extremely dull; like a video game played 4 or 5 times already becomes dull in a vaguely interesting way. I'm going to cover new topics and functions introduced here but I won't be bothered with the boring details because I want my life to be more than vaguely interesting.
What I will discuss is new functionality. The substitute-if function
feeds each item in the target through the predicate-function and when true it substitutes the subvalue and returns the final result. This is demonstrated to work on lists and strings though I suspect it works on lists and a string is simply silently converted to a list of chars and back again.
The complement higher order function returns a function that basically inverts a predicate function. So
returns a function (which could be fed to another higher order function or presumably used in lambda) that returns the opposite of the alphanumericp function (which appropriately enough returns true when given an alphanumeric character). So in short, the function provided by complement above would return false when given [a-zA-Z0-9] and true for any other character.
A noteworthy concept on display in this chapter is the use of side effects. In each of the functions created to reformat the data we aren't really concerned about the output of the function itself. Instead the functions print the data to the console using princ and the output is ignored. This allows us to have greater control of what gets generated and easily review and debug that output. In the final function we'll simply capture the console output to a file. This runs against functional programming style as it relies on side effects. I honestly don't care; it works, it's clean, it facilitates code reuse (which we'll see in a minute) and I'm not religious about programming technique.
Once we've created all the functions required to reformat the data into a Gaphviz format we need to run that data through a Graphviz converter. The following function handles that task.
This gets interesting. Firstly, there's that thunk parameter. What a funny name for a symbol. According to Wikipedia (clearly the authority in computer science studies >.> ) thunks were invented by Peter Zilahy Ingerman and the term was arrived at because it's 'the past tense of think at two in the morning.' Which I can entirely believe. I'm not a hardcore compsci major but I'll do my best to take a whack at describing these, if only to help solidify my own understanding. Also called suspensions these functions offer a means of controlling the timing of evaluation of computations. Why this might be important (beyond deep efficiency mining) isn't readily apparent to me. The book makes the case that wrapping the graph->dot function inside a thunk delays the computation and allows the dot->png function to capture its output but I would argue that the same effect would be had if you simply plopped the function directly into the dot->png function as in this code
That gets you precisely the same output via the same methods. So it seems to me that what the thunk amounts to in this usage is to blackbox the creation of the data processed by the dot->png function. This is something I definitely understand and it allows us to plug any function name we want into the code above provided that the function outputs to console and (preferably) generates data parseable by the Graphviz 'dot' command. This also gets us off the hook for not actually outputting the correct data from the function directly. We use the thunk as a nullary (parameterless) function and ignore its output relying solely on the side effects of the function and the use of global values to perform the computations required.
Now, some less crunchy aspects of the function. The with-open-file function is very similar to let in that it defines a symbol to be used in the following forms.
The symbol becomes a stream variable (glossed over here) which is then available within the with-open-file form to be written to. In the usage above we very cleverly redefine the *standard-output* global stream variable (which is coincidentally the default stream for Lisp printing functions) to point to a file which is opened in write mode (:direction :output) and overwrites any existing file (:if-exists :supersede). Thus all the prints done from within the thunk (which we execute with funcall) are written to the file filename due to the dynamic value of *standard-output* from within the with-open-file form.
Now would be a good time for a quick interlude on keywords. These are symbols prepended with a colon as in :if-exists. These might be thought of as constants (though I'm unsure of the accuracy of the comparison to a C constant). They have an inherent value and that value is themselves and cannot be changed.
These are used in a number of functions to define various options such as file open modes. Options that might otherwise be incomprehensible strings of apparently arbitrary characters (yes, I'm looking at you Perl).
The last part of the chapter discusses the functions for generating an undirected graph. The siginifcant difference here is that the uedges->dot function uses maplist instead of mapcar which (as you might expect from the mnemonic) executes a function across a list (map) but instead of sending each car it sends the remaining list at each iteration (car vs list). Then the lambda function uses unless to skip any edges that are repeated later. The end result is that only the last direction of an edge gets printed but since the different directions of travel are meaningless in this context that's what we want.
The code for this chapter is at:
grapher.lisp
TheWizardAdventureGame.lisp
You'll want to do the following:
to get things running (using the correct paths of course).
Chapter 7 covered more interesting list structures than have been used so far. It also comes across as quite dry so I'm liable to not spend much time in spots.
We start with dotted lists and pairs. A dotted list is simply a list that isn't terminated with nil. If we go back to thinking of Lisp lists as similar to a linked list structure a dotted list would be the same as plugging a data value into the last pointer in the chain. Which would be fugly in a linked list but works out alright for Lisp. Incidentally the dot notation used to print these lists
can also be used as a replacement for cons as in (1 . ( 2 . 3)) or for a common list as in (1 . (2 . (3 . nil))). I would assume using one form over the other would be a matter of taste and convention. The use of dotted lists regularly in code is supposedly uncommon however the book does state that running across such lists happens often which is quite contradictory and I'm not sure which side of the tale to believe. Either way I'll know what I'm looking at when I meet one. There is a relatively standard usage though in concisely describing pairs of things.
Additionally circular lists are discussed; these are simple enough to me that I'll gloss over them. Basically, the cdr of the final cons cell points back to one of the prior cells in the list. You could, if you felt like it, craft a very complex list using a combination of circular references. You'd also confuse the hell out of the interpreter unless you warned it that you were going to by issuing (setf *print-circle* t) which basically tells it to take a little longer in the print routine to make sure it doesn't get stuck printing in a loop.
Association lists (or alists) are also discussed which is, predictably, a list of pairs though presumably it's a list of keys consed to values which could themselves be further lists. Discussed here is a list of dot pairs. This is a convenient structure but apparently becomes inefficient when the list grows beyond a few items. Finally it discusses tree structures in data which we've already been using to some extent. If you've been watching you can see that Lisp lists are adept at handling trees as each branch basically has a symbol by which it can be referenced in the car of its cons cell. If I rip an example straight from the book
we can see this clearly. The *house* is composed of walls, windows, and roof. Each of which is a symbol identifying a list of its component parts.
The last half of the chapter covers Graphviz which takes formatted text input and generates image graphs based on the data. This is largely just data format manipulation code that takes the node and edge parameters defined for the wizard game and converts it to text output properly formatted for Graphviz. I'm not going to discuss this because it's largely boring; feel free to dissect the code linked below. Just as a clarification, data conversion is at the root of the beginning and ending of every program. It is vitally important, clearly, and a topic worthy of understanding. However the majority of the code I'm skipping is just string manipulation. Not only is it a thoroughly understood form of data manipulation it is, in my opinion, extremely dull; like a video game played 4 or 5 times already becomes dull in a vaguely interesting way. I'm going to cover new topics and functions introduced here but I won't be bothered with the boring details because I want my life to be more than vaguely interesting.
What I will discuss is new functionality. The substitute-if function
feeds each item in the target through the predicate-function and when true it substitutes the subvalue and returns the final result. This is demonstrated to work on lists and strings though I suspect it works on lists and a string is simply silently converted to a list of chars and back again.
The complement higher order function returns a function that basically inverts a predicate function. So
returns a function (which could be fed to another higher order function or presumably used in lambda) that returns the opposite of the alphanumericp function (which appropriately enough returns true when given an alphanumeric character). So in short, the function provided by complement above would return false when given [a-zA-Z0-9] and true for any other character.
A noteworthy concept on display in this chapter is the use of side effects. In each of the functions created to reformat the data we aren't really concerned about the output of the function itself. Instead the functions print the data to the console using princ and the output is ignored. This allows us to have greater control of what gets generated and easily review and debug that output. In the final function we'll simply capture the console output to a file. This runs against functional programming style as it relies on side effects. I honestly don't care; it works, it's clean, it facilitates code reuse (which we'll see in a minute) and I'm not religious about programming technique.
Once we've created all the functions required to reformat the data into a Gaphviz format we need to run that data through a Graphviz converter. The following function handles that task.
This gets interesting. Firstly, there's that thunk parameter. What a funny name for a symbol. According to Wikipedia (clearly the authority in computer science studies >.> ) thunks were invented by Peter Zilahy Ingerman and the term was arrived at because it's 'the past tense of think at two in the morning.' Which I can entirely believe. I'm not a hardcore compsci major but I'll do my best to take a whack at describing these, if only to help solidify my own understanding. Also called suspensions these functions offer a means of controlling the timing of evaluation of computations. Why this might be important (beyond deep efficiency mining) isn't readily apparent to me. The book makes the case that wrapping the graph->dot function inside a thunk delays the computation and allows the dot->png function to capture its output but I would argue that the same effect would be had if you simply plopped the function directly into the dot->png function as in this code
That gets you precisely the same output via the same methods. So it seems to me that what the thunk amounts to in this usage is to blackbox the creation of the data processed by the dot->png function. This is something I definitely understand and it allows us to plug any function name we want into the code above provided that the function outputs to console and (preferably) generates data parseable by the Graphviz 'dot' command. This also gets us off the hook for not actually outputting the correct data from the function directly. We use the thunk as a nullary (parameterless) function and ignore its output relying solely on the side effects of the function and the use of global values to perform the computations required.
Now, some less crunchy aspects of the function. The with-open-file function is very similar to let in that it defines a symbol to be used in the following forms.
The symbol becomes a stream variable (glossed over here) which is then available within the with-open-file form to be written to. In the usage above we very cleverly redefine the *standard-output* global stream variable (which is coincidentally the default stream for Lisp printing functions) to point to a file which is opened in write mode (:direction :output) and overwrites any existing file (:if-exists :supersede). Thus all the prints done from within the thunk (which we execute with funcall) are written to the file filename due to the dynamic value of *standard-output* from within the with-open-file form.
Now would be a good time for a quick interlude on keywords. These are symbols prepended with a colon as in :if-exists. These might be thought of as constants (though I'm unsure of the accuracy of the comparison to a C constant). They have an inherent value and that value is themselves and cannot be changed.
These are used in a number of functions to define various options such as file open modes. Options that might otherwise be incomprehensible strings of apparently arbitrary characters (yes, I'm looking at you Perl).
The last part of the chapter discusses the functions for generating an undirected graph. The siginifcant difference here is that the uedges->dot function uses maplist instead of mapcar which (as you might expect from the mnemonic) executes a function across a list (map) but instead of sending each car it sends the remaining list at each iteration (car vs list). Then the lambda function uses unless to skip any edges that are repeated later. The end result is that only the last direction of an edge gets printed but since the different directions of travel are meaningless in this context that's what we want.
The code for this chapter is at:
grapher.lisp
TheWizardAdventureGame.lisp
You'll want to do the following:
to get things running (using the correct paths of course).
Subscribe to:
Posts (Atom)