Jorge Tavares weblog

Posts Tagged ‘Common Lisp’

Sorting algorithms used in the CL implementations


Which sorting algorithm should one implement when developing a program? The best answer is probably none. Use the sort provided by your system/library/etc. Unless you know your input data has some special properties that you can take advantage of, the provided sort should be enough for your needs and probably is more efficiently implemented.

However, I think it is important to know what sorting algorithm is implemented. If one knows the properties of the data, it is possible to understand if the provided sort can or will pose a problem. In the same way that programmers shouldn’t implement a sorting algorithm every time they need to sort something, they should also be aware of the limitations and advantages of the system sort. That way one can decide if a special sort is needed or not.

Common Lisp provides the functions sort and stable-sort. The HyperSpec describes their operation well but it does not define the sorting algorithm. That decision is left to the implementations. In addition, the two functions don’t necessarily share the same algorithm. The difference between them is that stable-sort guarantees stability, i.e., elements that compare as equal keep their original relative order after sorting is completed. The use of sort and stable-sort requires some care (see the section sort pitfalls) but let’s focus on the algorithms and not on their usage.
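To make the stability guarantee concrete, here is a minimal sketch. The helper name sort-by-car and the sample data are only illustrative; the input is copied first because both sort and stable-sort are allowed to destroy the sequence they are given.

```lisp
(defun sort-by-car (pairs)
  "Return a fresh list of PAIRS ordered by car; ties keep their input order."
  ;; STABLE-SORT may destructively reuse its argument, so sort a copy
  ;; and always use the returned value.
  (stable-sort (copy-list pairs) #'< :key #'car))

(sort-by-car (list (cons 2 :a) (cons 1 :b) (cons 2 :c) (cons 1 :d)))
;; => ((1 . :B) (1 . :D) (2 . :A) (2 . :C))
```

With plain sort the two pairs with key 1 (and the two with key 2) could legally come back in either order.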

What sorting algorithms do the major open source CL implementations actually implement? I was curious about it and went to check the source for ABCL, CCL, CLISP, CMUCL, ECL and SBCL. Not surprisingly, we find some differences between the implementations. What was more unexpected to discover is that some implementations also use different sorting algorithms according to the sequence type. The findings are summarized in the following table (if anything is incorrect, please tell me). The links for the source code are in the implementation name (careful, in CCL and SBCL there are two links).

Implementation | sort                           | stable-sort
ABCL           | merge sort (lists) / quicksort | merge sort
CCL            | merge sort (lists) / quicksort | merge sort
CLISP          | tree sort                      | tree sort
CMUCL          | heapsort                       | merge sort
ECL            | merge sort (lists) / quicksort | quicksort (strings + bit vectors) / merge sort
SBCL           | merge sort (lists) / heapsort  | merge sort

 
In terms of the implementation of sort, quicksort is the most used algorithm, followed by heapsort. The choice of these algorithms is expected. Both have an average-case performance of O(n lg n) and heapsort guarantees a worst-case performance of O(n lg n) too. Quicksort has a worst-case performance of O(n^2) but it can be optimized in several ways so that it also gives an expected worst-case performance of O(n lg n). However, it seems that the quicksort implementations are not completely optimized. In ECL (and ABCL) quicksort implements a partition scheme which deals better with duplicate elements (although it is not three-way partitioning) but it always picks the first element as the pivot. CCL chooses the pivot with a median-of-3 method and always sorts the smaller partition first to ensure a worst-case stack depth of O(lg n).
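The two quicksort refinements mentioned for CCL can be sketched as follows. This is not CCL’s code: the function names are hypothetical and the partition step is a plain Lomuto scheme chosen for brevity. It only illustrates median-of-3 pivot selection and recursing into the smaller partition while looping over the larger one.

```lisp
(defun median-of-3-index (vec lo hi pred)
  "Index among LO, the midpoint and HI whose element is the median under PRED."
  (let* ((mid (+ lo (floor (- hi lo) 2)))
         (a (aref vec lo)) (b (aref vec mid)) (c (aref vec hi)))
    (flet ((less (x y) (funcall pred x y)))
      (cond ((less a b)
             (cond ((less b c) mid)     ; a < b < c
                   ((less a c) hi)      ; a < c <= b
                   (t lo)))             ; c <= a < b
            ((less a c) lo)             ; b <= a < c
            ((less b c) hi)             ; b < c <= a
            (t mid)))))                 ; c <= b <= a

(defun partition-sketch (vec lo hi pred)
  "Lomuto partition of VEC[LO..HI]; returns the pivot's final index."
  ;; Move the median-of-3 element to the end and use it as the pivot.
  (rotatef (aref vec (median-of-3-index vec lo hi pred)) (aref vec hi))
  (let ((pivot (aref vec hi))
        (store lo))
    (loop for i from lo below hi
          when (funcall pred (aref vec i) pivot)
            do (rotatef (aref vec i) (aref vec store))
               (incf store))
    (rotatef (aref vec store) (aref vec hi))
    store))

(defun quicksort-sketch (vec pred)
  "Sort VEC in place by PRED and return it. Not stable."
  (labels ((qs (lo hi)
             (loop while (< lo hi)
                   do (let ((p (partition-sketch vec lo hi pred)))
                        (if (< (- p lo) (- hi p))
                            ;; Left side is smaller: recurse into it,
                            ;; then keep looping on the right side.
                            (progn (qs lo (1- p))
                                   (setf lo (1+ p)))
                            ;; Right side is smaller (or equal).
                            (progn (qs (1+ p) hi)
                                   (setf hi (1- p))))))))
    (qs 0 (1- (length vec)))
    vec))

(quicksort-sketch (vector 3 1 4 1 5 9 2 6 5 3) #'<)
;; => #(1 1 2 3 3 4 5 5 6 9)
```

Because each recursive call handles at most half of the current range, the recursion depth stays logarithmic even when the running time degrades, for example on inputs with many duplicates, which this simple partition does not handle specially.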

As for CLISP, I think it uses a tree sort but I am not entirely sure. The only source file I could find with a sort implementation was sort.d and it looks like it contains an implementation of tree sort with a self-balancing binary tree, which also gives this algorithm an average and worst-case performance of O(n lg n).

As expected, most of the implementations use merge sort to implement stable-sort since it is a stable sort with average and worst-case performance of O(nlgn). Apparently, all implementations are bottom-up merge sorts with the exception of CCL and ECL. Another interesting thing is that merge sort is also used for lists in sort, in most of the implementations. However, I found it surprising to find quicksort in the stable-sort column because it is not a stable algorithm. Since it is only used for strings and bit vectors, it is not really an issue. While reading the source code of the implementations, I realized that ABCL was using quicksort in stable-sort for all non-list sequences. This is a problem that exists in the current 1.0.1 release but I’ve sent a bug report with a quick fix to the maintainers. The next release should have stable-sort fixed.
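To show why merge sort is a natural fit for stable-sort, here is a small bottom-up merge sort for lists. It is only a sketch with hypothetical names, not the code of any of the implementations above. Stability comes from a single rule in the merge step: on a tie, the element from the left run (which came earlier in the input) is taken first.

```lisp
(defun stable-merge (left right pred &key (key #'identity))
  "Merge two sorted lists. On ties the element from LEFT comes first."
  (let ((result '()))
    (loop while (and left right)
          do (if (funcall pred
                          (funcall key (first right))
                          (funcall key (first left)))
                 (push (pop right) result)   ; right is strictly smaller
                 (push (pop left) result)))  ; otherwise prefer left
    ;; At most one of LEFT and RIGHT is non-empty here.
    (nconc (nreverse result) left right)))

(defun bottom-up-merge-sort (list pred &key (key #'identity))
  "Return a sorted copy of LIST; equal elements keep their relative order."
  ;; Start with runs of length 1 and merge neighbouring runs pairwise
  ;; until a single run is left.
  (let ((runs (mapcar #'list list)))
    (loop while (rest runs)
          do (setf runs
                   (loop for (a b) on runs by #'cddr
                         collect (stable-merge a b pred :key key))))
    (first runs)))

(bottom-up-merge-sort '((2 . a) (1 . b) (2 . c) (1 . d)) #'< :key #'car)
;; => ((1 . B) (1 . D) (2 . A) (2 . C))
```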

This exploration of the sorting algorithms used in the open source implementations was very educational and interesting to me. I’ve learned what algorithms are actually used and enjoyed seeing how they were implemented. Just spotting the issue in ABCL stable-sort made this review worthwhile. I think there is still room for improvement in some implementations but knowing now the strengths and weaknesses of the sorts in CL is already good enough. On a final note, I wonder which algorithms are used in ACL and LW.

Written by Jorge Tavares

February 2, 2012 at 9:45

Packages organization and exporting symbols


I’ve started to re-design my main library for evolutionary computation. One of the main things I did for the new version was a complete new organization of the packages (and respective files/modules). Before I had essentially two main packages, the library itself and the examples. Although simple, it became a pain to use this model when I extended it heavily with more algorithms and related utilities. I hope I am not going now in the opposite direction (too complicated) but so far I like the new organization.

In short, there is a package for each main branch of algorithms (e.g., GA, GP) with everything specific to that kind, which imports from a core package with the common components. These “sub-packages” are gathered together in a single package (the main library package). This way, it is possible to use everything in a project or just the desired component (e.g., if you just want GP). Furthermore, an extra package for the users is also provided to allow REPL experimentation without being in the library main package.

However, while implementing this scheme I realized that I wanted all the symbols exported by the packages that compose the library to also be exported by the library main package. This way, all the symbols that compose the library are easily seen in the main package. For me this is very useful since it allows exploration of a library, especially if it has many things. Since I had never done something like this before, I went and searched for an easy way to solve this minor problem.

The answer is basically to use do-external-symbols. With this macro you iterate over the exported symbols of a given package and then export them again from the package you want. Do this inside an eval-when form and, when the library is loaded, the main package will contain all the symbols. If *library-sub-packages* is a list with the names of the packages that compose your library and *library-main-package* names the main package:

;; Re-export every external symbol of the sub-packages from the main package.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (dolist (package *library-sub-packages*)
    (do-external-symbols (symbol (find-package package))
      (export symbol (find-package *library-main-package*)))))

Making all the exported symbols of internal packages also exported by the main package turned out to be an easy thing to do. I don’t recall seeing do-external-symbols (or the related macros) before but I’m glad such a macro is provided. As always, the HyperSpec is your friend :-)
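For anyone who wants to try the idea in isolation, here is a self-contained sketch. The package names (demo-core, demo-ga, demo-lib) and the helper reexport-external-symbols are made up for illustration and are not part of my library. The main package uses the sub-packages, so their symbols are already accessible in it and export can simply promote them to external.

```lisp
;; Hypothetical packages: a core package, one algorithm package, and the
;; main library package that gathers them.
(defpackage :demo-core (:use :cl) (:export #:fitness))
(defpackage :demo-ga (:use :cl :demo-core) (:export #:crossover #:mutate))
(defpackage :demo-lib (:use :cl :demo-core :demo-ga))

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun reexport-external-symbols (from to)
    "Export from package TO every external symbol of the packages in FROM."
    (dolist (package from)
      (do-external-symbols (symbol (find-package package))
        (export symbol (find-package to)))))
  (reexport-external-symbols '(:demo-core :demo-ga) :demo-lib))

;; The second value of FIND-SYMBOL reports the symbol's status.
(find-symbol "CROSSOVER" :demo-lib)   ; second value is :EXTERNAL
```

Before the re-export, the same find-symbol call would report :INHERITED, since demo-lib only sees the symbol through use-package.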

Written by Jorge Tavares

November 8, 2011 at 16:43

Posted in Programming


ECLM 2011 Notes


This last weekend I was in Amsterdam to attend the European Common Lisp Meeting. This was my third participation in an organized Lisp meeting (after the first ZSLUG in Zurich and the ELS 2011 in Hamburg) and I am happy I decided to go. I was only present at the meeting itself since going to the dinners and city tour would have been way out of my budget. Anyway, the ECLM was a nice venue. I enjoyed most of the talks and still had an opportunity to talk with fellow lispers. I enjoyed talking with Luís Oliveira and meeting Zach Beane.

The first talk was given by Nick Levine and it can be viewed in two parts. In the first one, he talked about his experiences of trying to write a CL book for O’Reilly. It was quite interesting to see how hard it can be to prepare a book, especially for a publisher who was (is?) not very lisp-friendly. The second part was mostly about the community, although presented with a rant on libraries. This is a topic that has been debated several times. Thanks to Quicklisp, the problem now is not installing libraries but finding them and knowing which ones are good. I am not sure if creating another site as suggested would be a good thing since resources are already scarce. Perhaps more thought should be given to how to improve the current ones. CLiki still seems to me the best starting point. Still, Nick Levine’s talk was good and entertaining. One of the best in the meeting.

The following talks were mostly about companies that use CL as their main programming language. Jack Harper talked about the company he recently started, Secure Outcomes, that produces a unique portable fingerprint scanner. I must say his talk was quite inspiring! He talked about how to get a startup running and the decisions that led him to choose Lisp as the main development language. In addition, he also explained why he prefers LispWorks to any other implementation.

Next was the talk given by Dave Cooper. I must confess his talk was the weakest of the day, mainly because he talked about two different subjects without any connection. He started talking about GDL, the main product from his company, Genworks. I’m sure GDL can be a great thing but I didn’t get much from his talk. About halfway through, the talk suddenly changed to the Common Lisp Foundation. This was the interesting part of the talk since he explained the aims of CLF, the people behind it, etc. However, it was not clear how it will distinguish itself from ALU in terms of operation (in terms of purpose, CLF focuses only on Common Lisp while ALU covers all Lisp dialects) and this was the main concern that was expressed during question time. After presenting CLF, and since there was still some time left before the next presenter, he went back to GDL.

Afterwards, it was the turn of Luke Gorrie to present his lisp-hacker startup Teclo Networks. His talk was an expanded/updated version of the one given in Zurich. Still, it was also quite interesting. He started by telling how a group of hackers with a Lisp and/or Erlang background got together to improve mobile TCP/IP communications. Then, he showed us how badly TCP misbehaves in a mobile network and how their product, Sambal, can give 10% to 27% improvements. Another interesting point of the talk was that CL is used as their main development language. In short, it is used to develop and study all their algorithms. They have a TCP/IP stack fully implemented in CL! Moreover, all their analysis and maintenance tools are also in CL. However, in the actual product boxes they have reimplemented the algorithms in C. The reason: extreme pragmatism. Luke concluded by hinting that the sales of their product are going very well!

In the afternoon the talks started with Paul Miller from Xanalys. The talk was dedicated to Link Explorer, a Windows desktop tool to analyze data. The application is quite impressive and was developed using just CL. Paul also gave us a demonstration of the tool as well as some notes on future development.

The best and most awaited talk, Quicklisp, technically and socially, was given by Zach Beane. The talk focused on several aspects of Quicklisp. Zach started by giving an overview of the famous library problem of CL and of the solutions that existed before QL, explaining their advantages and disadvantages. Also, and very important, he covered what people were actually using and what difficulties they were facing. In a survey he did, most CL programmers were installing libraries by hand, including Zach! Then he proceeded to how Quicklisp was developed, some technical issues, the role of Quicklisp and its reception after one year. The talk then focused on the social impact of Quicklisp in the community. One of the things that makes Zach happy is the number of emails he gets saying that people are back to using CL and contributing more to the community (i.e., making libraries available) because of QL. Finally, he gave some indications of what is to come. My perception is that the possibility to enable hacking as it was possible with clbuild is one of the most exciting future features for Quicklisp. Zach’s talk was excellent from all points of view!

The last talk of the day was by Hans Hübner. This was my second favorite presentation. Although the topic, code style and conventions, can start some heated discussions, I must say that I agree with almost everything Hans Hübner mentioned. However, like everything, some common sense is always necessary. One of the main points was that lispers should not use constructions which are not part of the standard language when the standard provides options, just to save some typing. It is more important for another programmer to quickly understand what is written than to be forced to look for the definition of the unusual constructs. The if* and bind macros were the examples given. Hans also talked about the 80-column rule, style guides, etc. In the end, it always depends on the project, the people, etc., but code style is important and should not be ignored.

The meeting ended with several lightning talks. The most interesting bits were: Marco Antoniotti announced ELS 2012, to be held in Zadar, Croatia, around April-May; Christophe Rhodes talked again about swankr, a swank and slime for R; the announcement of ABCL 1.0.0 by Erik Huelsmann.

Some words on the organization. Organizing a meeting of this kind is not easy and Edi Weitz and Arthur Lemmens must be congratulated for making a great event. Not all was perfect but everything went smoothly. I wish that it continues to happen in the coming years!

Written by Jorge Tavares

October 26, 2011 at 21:19

Posted in Programming


The Lisp Alien arrived: a “Land of Lisp” review/opinion


I finished reading Land of Lisp (LoL) from cover to cover some days ago. I bought the print+pdf pack the day it was released, because I simply couldn’t resist. The video and the comic available on the website convinced me that, even if it was the worst book in the world (which it obviously isn’t), I needed to have it. I read the first three chapters of the pdf just to check it and then I decided to wait for the real thing. Surprisingly, it arrived sooner than expected. I started again from the beginning and only stopped at the very end. So, how was the book? The short version: very good. The long version: well, keep reading this post :-)

Land of Lisp became one of my favorite Lisp, and programming, books. The main reason is how different the book is. The cartoons and the jokes are a major part of that but what really makes the book great, is how Conrad Barski connects the fun side to the actual content. And in this aspect, teaching and presenting Lisp through simple games is a key to that link. The pieces just fall into place. And the selection of games, characters, etc, is for the most part well done making you feel that you are in a conversation with the Wizard himself. That’s perhaps the best description of what it feels like reading the book: you’re the apprentice that is next to the Wizard and he just shows you, step by step, all those wonderful spells in a very convincing way.

What about the content itself? How is the book and how does it relate to other Lisp books, specifically the ones aiming for beginners? What does it present? Well, I think a lot can be said about all that. In my opinion, the introduction is one of the best chapters in the book. The history section is one of the most original texts I’ve read in a computer-related book. The first steps with Lisp are also well accomplished since I think it will be very easy for a beginner to get past all those initial non-issues (which Lisp? which implementation? etc.) and at the same time understand why things are the way they are. Although I don’t use CLISP, I don’t think it was a bad decision to use it as the implementation for the book. It’s more than fine for learning and allows the book to explore certain areas that otherwise would not be possible in an agnostic manner (e.g., sockets). Unless libraries were used, but I don’t think that would be good in an entry-level book like this.

The approach to teach Lisp in LoL is mostly a functional one. From the first chapters the concepts are presented with the functional style always in mind. To be honest, I don’t think this is a bad approach but it has some drawbacks. It’s easy to show a lot of things and explain others but fails in showing how Common Lisp really excels. Chapter 5, building a text-based game, is a good example of how a functional style works well. However, this does not mean the more imperative, non-functional aspects, are “hidden” or held-back. It’s quite the opposite. The initial chapters, and specifically chapter 2, show how to declare variables and assign values. It also shows a lot of “with effects” stuff. The genius of the author is precisely in how he puts all these concepts together. In the end, the user feels the advantages of the functional aspects and at the same time sees lots of familiar stuff.

Along the way we find some gems. One is chapter 6.5, about lambda. Another is the “Periodic Table of the Loop Macro” on pages 200 and 201. This is the most innovative way I’ve seen to clearly present loop. I wish a poster would be made out of those two pages. I never liked Chemistry so much but I love this periodic table. The very last gem is surely the last cartoon explaining the main Lisp features (this cartoon is available at the website). Conrad Barski is able to present the main topics of Common Lisp. On top of that, he leaves the comfort zone and shows how to use Lisp for “real stuff”, more practical, even in the form of a game. You get to see how to produce graphviz files, how to do web development, and how to play with SVG files, among other things. I also liked very much how macros are used to introduce lazy evaluation and how a faster version of the Dice of Doom game is developed.

However, just because LoL is now one of my favorite Lisp books, it does not imply that I think it is the best. And the reason is very simple: it does not show well the multiparadigm language Common Lisp is. It feels that the second part of the book is missing. And that becomes clearer when you finish the book and read the last comic. For instance, where is CLOS? Why was the condition system left “in half”? The book is strong in presenting things from a functional point of view but fails for the others, mainly in showing how mixing paradigms is better than going for just one. It’s not easy to write a book and choices must be made, but a beginner will finish the book with the view that Common Lisp is mostly about functional programming and, in my opinion, that can be misleading. The real power of Common Lisp is that it accomplishes multiparadigm programming better than any other language. The Lisp way is adapting the language to the problem at hand and not the other way around (as in most languages). Still, this does not make the book any less good. But a second part would be nice, like “The Lisp Aliens Strike Back” or something along those lines :-)

This is definitely a beginner’s book, a first book for someone who wants to learn Common Lisp and already knows how to program. But which kind of beginner? This is an important point. Different people with different backgrounds will surely react and learn in their own way. I remember the beginning of “Successful Lisp” defining several types of beginners. For me, and taking into account my experience in teaching a subset of CL to undergraduates in an Artificial Intelligence course, this book benefits more young programmers with a curious mind (in terms of how to learn the language). The game-learning model fits very well and the style of the book will keep people interested until the end. If you booked a trip to the moon and you can only carry two books, Land of Lisp and Seibel’s excellent “Practical Common Lisp” (PCL) are the ones you should take to learn CL. After LoL, PCL is the perfect follow-up. Why not the other way around, since PCL is much more comprehensive, complete and focused on how to use CL for practical stuff, etc.? Because I still believe PCL is the right book for someone who already has some experience in CL, even if little, or who is a mature programmer (by mature I mean someone who didn’t start programming just a few months ago, regardless of age). Or, in Seibel’s own words, “If you’re a hard-nosed pragmatist who wants to know what advantages Common Lisp has”. I admit, my opinion is biased because of my experience in trying to teach CL (not an easy task I must add, especially if you’re constrained like I was, but that’s a different story). During that time, I always wished for a book that could introduce CL in the right way, although I never quite pictured what “the right way” would be. When PCL came out it got close but I think that LoL in this particular aspect is better. A very good first contact with the world of Lisp! And for all the Lispers out there, this is surely the most fun book to read :-)

Written by Jorge Tavares

December 26, 2010 at 18:03

Posted in Programming


The answer is 1 if…


… you know something about Floating-Point Arithmetic and if you use Common Lisp. :-)

Background: a recent story that appeared on reddit. From time to time an article about this issue comes up. I am surprised by the number of people who simply are unaware of it or who don’t care. And it’s not just people who have only started to learn, which is a bit worrying. This particular example also reminds me of one of the many things that I like about Common Lisp: well-integrated rational arithmetic. So, if you code the given C function in Common Lisp, taking rationals into account:

(defun f (n)
  (let ((a (/ (floor (* n 10)) 10)))
    (- a (/ (- (* a 10) 10) 10))))

And now you run for all the test cases presented in the article:

CL-USER> (defparameter *numbers*
	   '(5.1 91.3 451.9 7341.4 51367.7 897451.7 1923556.4 59567771.9
	     176498634.7 2399851001.7 60098761442.7 772555211114.1
	     1209983553611.9 59871426773404.9 190776306245331.2
	     2987154209766221.6 19843566622234755.9 719525522284533115.3
	     8399871243556765103.9 39847765103525225199.1 553774558711019983333.9))
*NUMBERS*
CL-USER> (loop for n in *numbers* collect (f n))
(1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)

Et voilà! The answer is always 1. Naturally, if you code the function just like in the C version, you will get the same type of results as in the story. But then, why would you want to do that? :-)
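For comparison, here is a small sketch of the difference between the two kinds of arithmetic. The names exact-sum-p, float-sum-p and f-float are just for this example, and the comment on the float comparison assumes the implementation uses IEEE 754 doubles, which is the usual case.

```lisp
;; Ratios are exact: 1/10 + 2/10 is exactly 3/10.
(defun exact-sum-p ()
  (= (+ 1/10 2/10) 3/10))          ; true

;; Doubles are binary approximations of 0.1, 0.2 and 0.3, so the same
;; comparison is expected to fail (assuming IEEE 754 doubles).
(defun float-sum-p ()
  (= (+ 0.1d0 0.2d0) 0.3d0))

;; RATIONAL returns the exact value a float holds.
(rational 0.5d0)                   ; => 1/2, since 0.5 is exactly representable

(defun f-float (n)
  "Same formula as F, but FFLOOR keeps everything in floating point."
  (let ((a (/ (ffloor (* n 10)) 10)))
    (- a (/ (- (* a 10) 10) 10))))
```

Calling f-float on the numbers from the article (written as doubles, e.g. 5.1d0) gives floats that are close to 1 but need not be exactly 1, which is the behaviour described in the story. The version with floor escapes this because floor returns an integer, and from there on every operation is exact.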

Written by Jorge Tavares

November 24, 2010 at 16:28
