A note on shortcircuiting of argument evaluation in #'<

There is an interesting thread in the Clozure Common Lisp development mailing list about argument evaluation in #'<. Until now, CCL was implementing shortcircuiting when evaluating the arguments of #'<. This essentially means that when calling (< 1 3 2 4 5) it would stop before evaluating all the arguments since it wold know it would be false. Just like and. However, as reported in the initial message by Eric Marsden, it violates the Hyperspec (section Function Forms) since it does not allow this behaviour for functions. The thread got interesting because a small discussion followed if the correct behaviour should be the one shown by CCL and not the one demanded by the Hyperspec. I must say, while reading, my initial reaction was also to consider shortcircuiting as the desired functionality but soon afterwards, you realize that you really don’t want that. The simplest argument is that you don’t want (< x y z) to have a different semantic from (funcall #'< x y z) as Tim Bradshaw rightly pointed out. Consistency is something very good to have and in the end, if you wish lazy semantics, you can always add them yourself in the true Lisp spirit. However, this shows once more how the writers of the Hyperspec got so many things right even if for a moment you might think otherwise. It was an interesting thread to read.

Professional Change

After many years in academia doing research, I’ve recently joined the software industry. I am now working at Microsoft, specifically, at their Search Technology Center in Munich. It’s an exciting and challenging work!

I will continue doing my Common Lisp explorations in my free time and talk about them in here as before (disclaimer: the views and opinions expressed on this blog are my own and not those of my employer).

zsort: portable sorting algorithms in Common Lisp

zsort is a library that I started working on as a simple hobby project. More or less around the same time I decided to check which algorithms the different Common Lisp implementations use. It is now part of Quicklisp so it can be easily used (thanks Zack!).

The main goal of zsort is to be a collection of portable sorting algorithms. If they can be fast, even better. Common lisp provides the sort and stable-sort functions but these can have different algorithms implemented according to each implementation, which can make an application unportable if you rely on a specific type of sorting. Also, the standard functions might not be the best for a certain situation and as such you might need a specialized sort. Even if for most situations the standard functions are more than enough, the zsort library could be a useful complement.

Right now the implemented algorithms are: insertion sort, quicksort, randomized quicksort, merge sort, heapsort and counting sort. The plan is to add more algorithms, for example, bucket sort and timsort. However, the main thing on the todo list is adding the possibility of external sorting (to handle large amounts of data) and parallel versions of some sorting algorithms. I am considering using lparallel for this but I am still undecided.

There is still a lot of work to be done, but I think the library as it is can already be a little useful. And of course, all kind of suggestions and improvements are welcome!

Best Paper Award at EuroGP 2012

Last week I was at EuroGP where I presented my latest paper “Automatic Design of Ant Algorithms with Grammatical Evolution” (pdf, slides) and it won the Best Paper Award! I am very happy with this distinction since EuroGP is the leading conference on Genetic Programming attended by the best researchers in the field.

In this paper, a Grammatical Evolution approach is used for the automatic design of Ant Colony Optimization algorithms. The used grammar has the ability to guide the learning of novel architectures, by rearranging components regularly found on manually designed variants (for example, the Elitist Ant-System, the Ant Colony System or the Max-Min Ant System). This approach was tested with the TSP and the results show that the evolved algorithmic strategies are effective, exhibit a good generalization capability and are competitive with human designed variants. This is still a starting point and there is a large amount of work to be done but the indications given by these results are encouraging!

This year the conference took place in Málaga, Spain, as part of Evo* as usual. The event was great with an excellent organization lead by Carlos Cotta. It was a fantastic Evo*! Next year it will be held in Wien!

Book review: The Tangled Web

For the past couple of months I’ve been reading The Tangled Web: A Guide to Securing Modern Web Applications by Michael Zalewski. The book was published by no starch press and they were kind enough to provide me a copy for review. I enjoy reading about topics outside my main field and interests. It’s a way to keep myself informed and learn new things and this book fits exactly into this category.

The Tangled Web is mostly about web technologies and how insecure they are by nature. The book is a very engaging narrative, full of details and impressive war stories. It focuses on the practical issues of web technologies and not on the theory of security. The book can be very useful for web developers and those interested in security. For example, at the end of each chapter we can find a “Security Engineering Cheat Sheet”, which presents us a summary of things to consider/do. These sheets alone make the book worthwhile having. The book is organized in three main parts. In the first one, the author tells us the story of the inception of the web until today and discusses all the important technologies, protocols, etc. The second part focuses on the browser security and the third part on “the things to come”. Although the book is not very thick (around 300 pages) it addresses too many important issues to completely absorb them in a single reading.

To conclude, the Tangled Web is a solid book, full of interesting and useful information. For web developers and security experts it should be a must read book. For the rest of us it is an enjoyable reading.

Macros Design Patterns

I was reading the thread What are some fun or useful macros? on reddit and it reminded me of another thread that appeared in the Pro mailing list. These kind of threads are always enjoyable because each time you learn something new and see really interesting things. While reading them, it crossed my mind that another variant of this question would be what are some fun or useful macros design patterns. Instead of examples of specific code macros (general or not) it would be nice to see common programming practices using macros. So, for a lack of a better expression name, let’s call them macros design patterns.

My favorite one is to configure an algorithm, especially when we want to use the correct types. Essentially, we write an algorithm using a macro that takes types or configuration arguments and then we expand it to the appropriate desired configurations. For example, if you have an algorithm that operates on different types of sequences, instead of writing several duplicate functions with the same algorithm but with the associated type declarations, just apply this pattern. A simple but contrived example: we want to compute the mean of a vector but use its proper type. In addition, we might also want the possibility of using a key to access the vector elements. We can write the following macro:

(defmacro mean-body (vector vector-type vector-ref key)
  (let ((size (gensym)) (i (gensym)))
	 (declare (type ,vector-type ,vector))
       (let ((,size (length ,vector)))
	 (/ (loop for ,i from 0 below ,size
		  sum ,(if key 
			   `(funcall ,key (,vector-ref ,vector ,i))
			   `(,vector-ref ,vector ,i)))

The macro contains the algorithm (in this simple case the mean) and the arguments allow us to configure the multiple versions we need. If we want a simple-vector, the macro will expand to use the correct type declaration and svref. If a key function is needed it will also include it. Then, we can call the macro with the several configurations value inside the main function:

(defun mean (vector &optional key)
  (typecase vector
     (if key 
	 (mean-body vector simple-vector svref key)
	 (mean-body vector simple-vector svref nil)))
     (if key 
	 (mean-body vector vector aref key)
	 (mean-body vector vector aref nil)))
      (if key 
	 (mean-body vector sequence elt key)
	 (mean-body vector sequence elt nil)))))

This can be very useful in situations where we want to optimize code since it becomes easy to add the proper type declarations to the input arguments of an algorithm. Moreover, we keep the algorithm in a single place, making it easier to maintain. Depending on the situation, we can also define a function for each configuration. In the example we could have a mean-simple-vector and mean-vector.

I don't know if it has already a specific name but I like to call it the configurable algorithm pattern. I find it very useful. And thinking back to the reddit thread, what are your favorite macros design patterns? Which ones do you find useful and use them regularly? If you want to share, feel free to drop a line. I am interested in seing and learning other patterns!

Sorting algorithms used in the CL implementations

Which sorting algorithm should one implement when developing a program? The best answer is probably none. Use the sort provided by your system/library/etc. Unless you know your input data has some special properties that you can take advantage of, the provided sort should be enough for your needs and probably is more efficiently implemented.

However, I think it is important to know what sorting algorithm is implemented. If one knows the properties of the data, it is possible to understand if the provided sort can or will pose a problem. In the same way a programmer shouldn’t implement a sorting algorithm every time it needs to sort something, the programmer should also be aware of the limitations/advantages of the system sort. That way one can decide if a special sort is needed or not.

Common Lisp provides the functions sort and stable-sort. The HyperSpec describes their operation well but it does not define the sorting algorithm. That decision is left free to the implementations. In addition, both functions don’t necessarily share the same algorithm. The difference between the two is that the second function sorts in a way that guarantees stability, i.e., two elements that are equal remain in the same position after sorting is completed. The use of sort and stable-sort requires some care (see the section sort pitfalls) but lets focus on the algorithms and not on its usage.

What sorting algorithms do the major open source CL implementations actually implement? I was curious about it and went to check the source for ABCL, CCL, CLISP, CMUCL, ECL and SBCL. Not surprising, we find some differences between the implementations. What it was more unexpected to discover is that some implementations also use different sorting algorithms according to the sequence type. A quick survey of the findings is summarized in the following table (if anythings is incorrect, please tell me). The links for the source code are in the implementation name (careful, in CCL and SBCL there are two links).

Implementation sort stable-sort
ABCL merge sort (lists) / quicksort merge sort
CCL merge sort (lists) / quicksort merge sort
CLISP tree sort tree sort
CMUCL heapsort merge sort
ECL merge sort (lists) / quicksort quicksort (strings + bit vectors) / merge sort
SBCL merge sort (lists) / heapsort merge sort

In terms of the implementation of sort, quicksort is the most used algorithm, followed by heapsort. The choice for these algorithms is expected. Both have an average-case performance of O(nlgn) and heapsort guarantees a worst-case performace of O(nlgn) too. Quicksort has a worst-case performance of O(n2) but it can be optimized in several ways so that it also gives an expected worst-case performance of O(nlgn). However, it seems that the quicksort implementations are not completely optimized. In ECL (and ABCL) quicksort implements a partition scheme which deals better with duplicate elements (although is not the three-way partitioning) but it always picks as pivot the first element. CCL chooses the pivot with a median-of-3 method and always sorts the smaller partition to ensure a worst-case stack depth of O(lgn).

As for CLISP, I think it uses a tree sort but I am not entirely sure. The only source file I could find with a sort implementation was sort.d and it looks like it contains an implementation of tree sort with a self-balanced binary tree, which also gives this algorithm an average and worst-case performance of O(nlgn).

As expected, most of the implementations use merge sort to implement stable-sort since it is a stable sort with average and worst-case performance of O(nlgn). Apparently, all implementations are bottom-up merge sorts with the exception of CCL and ECL. Another interesting thing is that merge sort is also used for lists in sort, in most of the implementations. However, I found it surprising to find quicksort in the stable-sort column because it is not a stable algorithm. Since it is only used for strings and bit vectors, it is not really an issue. While reading the source code of the implementations, I realized that ABCL was using quicksort in stable-sort for all non-list sequences. This is a problem that exists in the current 1.0.1 release but I’ve sent a bug report with a quick fix to the maintainers. The next release should have stable-sort fixed.

This exploration of the sorting algorithms used in the open source implementations was very educational and interesting to me. I’ve learned what algorithms are actually used and enjoyed seing how they were implemented. Just spotting the issue in ABCL stable-sort made this review worthwhile. I think there is still room for improvement in some implementations but knowing now the strengths and weaknesses of the sorts in CL is already good enough. On a final note, I just wonder what are the algorithms used in ACL and LW.