Collation is the assembly of written information into a standard order. This is commonly called alphabetisation, though collation is not limited to ordering letters of the alphabet. Collating lists of words or names into alphabetical order is the basis of most office filing systems, library catalogs and reference books.
Collation differs from classification in that classification is concerned with arranging information into logical categories, while collation is concerned with the ordering of those categories.
Advantages of sorted lists include:
- one can easily find the first n elements (e.g. the 5 smallest countries) and the last n elements (e.g. the 3 largest countries)
- one can easily find the elements in a given range (e.g. countries with an area between .. and .. square km)
- one can easily search for an element and determine whether it is in the list, e.g. with the binary search algorithm or interpolation search, either automatically or, roughly and perhaps unconsciously, manually.
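The three advantages above can be sketched in Python with the standard `bisect` module. This is a minimal illustration, not a reference implementation: the `areas` values are made-up numbers and `contains` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right

# Hypothetical country areas (arbitrary units), kept in sorted order.
areas = sorted([316, 2, 160, 468, 21, 61])

# First n and last n elements are plain slices of a sorted list.
smallest_three = areas[:3]
largest_two = areas[-2:]

# Elements in a closed range [low, high]: two binary searches give the bounds.
low, high = 50, 400
in_range = areas[bisect_left(areas, low):bisect_right(areas, high)]

def contains(sorted_items, target):
    """Binary search: report whether target occurs in sorted_items."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target
```

Each lookup needs only a logarithmic number of comparisons, which is the practical payoff of keeping the list sorted.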
A collation algorithm, e.g. the "Unicode collation algorithm", differs from a sorting algorithm: the former is a process that defines the order, which amounts to comparing two values, while the latter is a procedure that puts a list of items into that order.
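The separation between the two roles can be illustrated with a small Python sketch. The `collate` rule below is a hypothetical, deliberately simple stand-in (case-insensitive, ties broken by code point) and is not the Unicode collation algorithm; `insertion_sort` is likewise an illustrative name.

```python
from functools import cmp_to_key

def collate(a: str, b: str) -> int:
    """Hypothetical collation rule: negative, zero, or positive like a comparator.

    Compares case-insensitively, then breaks ties by code point.
    """
    ka, kb = (a.casefold(), a), (b.casefold(), b)
    return (ka > kb) - (ka < kb)

def insertion_sort(items, compare):
    """A sorting algorithm: it only asks `compare` which of two items comes first."""
    result = []
    for item in items:
        i = len(result)
        while i > 0 and compare(result[i - 1], item) > 0:
            i -= 1
        result.insert(i, item)
    return result

words = ["banana", "Cherry", "apple", "Apple"]

# Two different sorting procedures, one collation rule, same resulting order.
by_builtin = sorted(words, key=cmp_to_key(collate))
by_insertion = insertion_sort(words, collate)

# Default code-point comparison is a different collation, hence a different order.
by_code_point = sorted(words)
```

Swapping the sorting procedure leaves the result unchanged, while swapping the comparison rule changes it.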
Collation defines a total preorder on the set of possible items, typically by defining a total order on a sort key. Note, however, that in the case of e.g. numerical sorting of strings representing numbers, the strings are only preordered rather than totally ordered, because distinct strings can have the same ranking: 2e3 and 2000 tie, as do 2 and 2.0. The numbers represented by the strings are totally ordered.
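The numeric-string case can be made concrete with a short Python sketch, assuming that `float` is an acceptable sort key for the strings involved; the sample strings and the name `sort_key` are chosen purely for illustration.

```python
from itertools import groupby

def sort_key(s: str) -> float:
    """Sort key: the number a string represents (assumes float() can parse it)."""
    return float(s)

strings = ["2000", "2.0", "10", "2e3", "2"]

# The keys are totally ordered, so sorting by key is always possible.
ordered = sorted(strings, key=sort_key)

# Distinct strings sharing a key tie with each other; group them to see the ties.
tie_classes = [list(group) for _, group in groupby(ordered, key=sort_key)]
```

Because Python's `sorted` is stable, tied strings keep their input order, which is one common way to settle ties that the collation itself leaves open.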