**Sorting **

- Comparable elements
**Bubblesort**- compare 1st and 2nd elements
- if 1st larger than 2nd, swap
- compare 2nd and 3rd, and swap if necessary
- continue until compare the last two elements
- the largest element is now the last element in the array.
- repeat statring from the beginning until no swaps are performed (i.e.,

the array is sorted) - each time you go through the elements bubbling up the largest element
- no need to try the last i elements for the ith run since the end

elements are already sorted

**Selection Sort**- array to be sorted: A
- array to be returned: B
- find smallest element in A and put in B
- mark space in A with null so it won’t be chosen again
- repeat last two steps until B is sorted array

**Insertion Sort**- algorithm
- passes through each element
- everything before element is sorted
- puts element in appropriate place in sorted half of array by checking

each element starting from the back of the sorted part of the array

- Code Methods: insertionsort
- Worst Case O(N
^{2}) – when array is reverse sorted - Best Case O(N) – when array is already sorted
- Swap Number
- = number of inversions (i.e., i < j but a[i] > a[j])
- Running Time = O(I + N)
- Notice that for a sorted array I = 0
- For a reverse array I = O(N
^{2}) - Average Number of inversions in an array of N distinct elements is N *

(N – 1 ) / 4 = O(N^{2})

- Average Case O(n
^{2}) - Any algorithm that sorts by exchanging adjacent elements requires

O(N^{2}time on average.- Including Bubblesort and Selection Sort

- algorithm
**Shellsort**- invented by Donald Shell
- Algorithm
- start separation = number of elements / 2
- compare a[i] and a[i+separation]
- swap if necessary
- do for each element of array i = 0 to a.length – separation
- set separation = separation /2
- recompare all the elements

- separation calculation
- shell suggested /2
- other sequences give great improvement
- note figure 7.3 does not / 2 properly

- Code Methods: shellsort
- Average Case Shellsort : Open Problem
- Worst Case O(N
^{2})- 1 9 2 10 3 11 4 12 5 13 6 14 7 15 8 16
- subsequant increments don’t do anything because not relatively prime
- last row is insertion sort of N(N-1)/2 comparisons
- second to last row is (N)(N/2 – 1)/2 comparisons and so on
- toss out the minus ones (we are rounding up – so will only increase

running time) - NN/2 + NN/2*2 + NN/4*2
- factor out NN/2 => NN/2 * ( 1 + 1/2 + 1/4 + ..)
- series is less than 2, so get NN/2 * (2) = NN => O(N
^{2}) - OR …
- each pass with increment h
_{k}has h_{k}insertion

sorts of N/h_{k}elements - insertion sort is O(N)
- each pass is thus of O(h
_{k}* (N/h_{k})^{2}) - or O(N
^{2}/h_{k}) - h
_{k}changes for each pass - h
_{k}in it’s worst case is 1, 2, 4, 8, … - all other cases will divide down smaller by throwing out remainder
- so sum of 1/h
_{k}= 1 + 1/2 + 1/4 + 1/8 + .. < 2 - Algorithm so we have 2N
^{2}= O(N^{2})

- Hibbard’s increments
- Use relatively prime gaps
- 1, 3, 7, 2
^{k}– 1 - Worst Case: O(N
^{3/2}) - I will spare you the proof.
- Average Case: O(N
^{5/4}) – not proved yet

- Sedgewick increments
- 1, 5, 19, 41, 109
- terms are either 9 * 4
^{i}– 9 * 2^{i}+ 1 or

4^{i}– 3 * 2^{i}+ 1 - Worst Case: O(N
^{5/4}) - Average Case: O(N
^{7/6}) – not proved

**Heapsort**- Worst Case: O(NlogN)
- Slower in practice than Sedgewick’s
- Code Methods: leftChild, percDown, heapsort
- algorithm:
- put elements in array
- build heap O(N)
- deleteMin N times and place back in array – O(N*logN)

- uses two arrays
- too much space for large arrays
- could use one, by placing deleteMins in back of array
- Will then be in reverse order
- could instead delete max
- and place max in back of array when delete it
- then use one array
- similar to selection sort

- array for heapsort starts in position 0 as opposed to a heap where it

starts in position 1 - Analysis
- Building Heap: 2N comparisons
- deleteMin logN each (N of them)
- O(NlogN)
- Performance is Extremely consistent
- On average uses only slightly fewer than worst bound
- To determine average case is difficult because successive deleteMax

destroy heap’s randomness - Average number of comparisons: 2NlogN – O(NloglogN)
- Notice that this is still O(NlogN)
- Best Case: NlogN – O(N)
- Notice that this is still O(NlogN)

**Mergesort**- Code Methods: mergesort (2), merge
- Worst Case: O(NlogN)
- Recursively merges two sorted lists
- Time to merge two sorted lists is linear (N-1 comparisons)
- 1 13 24 26 merge 2 15 27 38 gives 1 2 13 15 24 26 27 38
- Classic Divide and Conquer Strategy
- If more than one element, divide and merge sort the first and second

half - Analysis
- Recurrance Relation
- T(1) = 1
- T(N) = 2T(N/2) + N
- T(N)/ N = T(N/2)/N/2 + 1
- T(N/2) / N/2 = T(N/4) / N/4 + 1
- T(2) / 2 = T(1)/1 + 1
- Sum up all of the equations
- T(N)/N = T(1)/1 + logN <- this logN is the sum of the 1′s
- T(N) = NlogN + N = O(NlogN)

- Uses extra memory to merge and copy back into array
- Merging is cornerstone of external sorting routines
- Not often used for main memory sorts
- Can also do it not recursively
- Or can use less memory – much more complex algorithm (impractical)

**Quicksort**- Code Methods: quicksort (2), median3
- Fastest known sorting algorithm in practice
- Worst Case: O(N
^{2}) - Average Case: O(NlogN)
- Worst Case can be made very unlikely with little effort
- Divide and Conquer Algorithm
- Algorithm
- If the number of elements in S is 0 or 1, return
- pick any element in v in S. This is called the pivot.
- Partition the elements in S into two groups, those below the pivot

(numerically) and those above the pivot. - return quicksort of s1 followed by v followed by quicksort of s2

- Hope that half keys are greater than the pivot, other are less
- Subproblems are not guaranteed to be of equal size which is potentially

bad - Faster than mergesort because partitioning step can be performed in

place and efficiently - Picking the Pivot
- Pick first element
- Awful performance if list is sorted or reverse sorted
- All elements will go in S1 or S2
- Will take quadratic time and do nothing at all
- Presorted input or mostly sorted input is quite frequent
- Horrible pivot point

- Pick larger of the first two elements
- bad for the same reasons as above

- Pick pivot randomly
- Good solution
- Unless random number generator has a flaw (which is common)
- Random number generation expensive

- Median of Three
- Best choice would be the median of the array
- Would take too long
- Take median of first last and middle
- elimnates sorted case
- reduces running time of quicksort by about 5%

- Pick first element
- Partitioning Strategy
- Swap pivot into last element
- Assuming all distinct elements
- while i less than j
- Move i to right while elements less than pivot
- Move j to left while elements greater than pivot
- When i and j have stopped – swap elements and continue
- Once i and j have crossed, stop
- i is at lowest element greater than pivot
- swap i and pivot
- Duplicates to pivot
- both i and j go past the pivot
- Otherwise, if only i does, will put all duplicate in S1
- if both go past and all the same element – could create very uneven

subarrays - giving O(N
^{2}) results - best to make both stop and swap, since the extra swaps is better

than uneven subarrays - Why would we sort identical elements?
- Think of sorting 100,000 elements where 5000 are identical
- Eventually will call quicksort on 5000 elements (since quicksort is

recursive) - Will want these to be sorted efficiently

- Small Arrays
- N <= 20
- Insertion sort is faster
- Use insertion sort inside of quicksort for arrays smaller than 20
- Faster by 15% in running time
- Good cutoff is N = 10, although any number between 5 and 20 is fine
- Saves nasty cases – such as taking median of three elements when only

two

- Routines
- Driver calls quicksort method that specifies the start and stop of

array to be sorted - Median of Three Pivot selection
- returns pivot
- places smallest of left, median, and right to left
- places largest to the right
- places median in the middle then
- switched pivot to last element -1

- Heart of Quicksort
- Calls insertion sort on small arrays
- Skips outer elements since done in median3 with the ++ and –
- resores pivot into center from right – 1
- Need to increase i and j each loop (i.e., not inside while) because

otherwise i and j will never change and will always equal pivot

- Driver calls quicksort method that specifies the start and stop of
- Analysis
- Worst Case: O(N
^{2}) - Best Case: O(NlogN) using same proof as merge sort
- Average Case: O(NlogN)

- Worst Case: O(N

**Selection Problem**- Quickselect : Code
- |s| is the number of elements in s
- Algorithm
- if |s| is 1 then k = 1, return element
- if |S| <= cutoff (i.e., 10), then sort S and return kth largest
- pick pivot
- partition into S1 and S2
- if k <= s1, then kth smallest is in S1, return quicksort(s1, k)
- if k = 1 + s1, then return pivot
- otherwise quickselect(s2, k – |s1| – 1)

- makes only one recursive call instead of two
- Worst Case: O(N
^{2}) – when S1 or S2 is empty (not saving

recursive call) - Average Case: O(N)
- Array starts at zero, so kth smallest element in k-1 position
- If use median of three – then worst case condition almost impossible
- Does destroy initial ordering
- If not desirable – a copy need be made

**Decision Trees and Lower Bounds**- Any sorting algorithm that uses comparisons requires at least O(NlogN)

in worst time - merge sort and heap sort optimal within constant factor
- Decision Trees can represent all possible options, and can prove the

number of comparisons must be NlogN - Can only draw small decision trees (3 elements) will get too large
- Let T be a binary tree of depth d, then T has at most 2
^{d}

trees- Induction
- Basis: If d = 0, then there is at most one leaf
- Otherwise, we have a root, which cannot be a leaf, and a right and

left subtree, each of depth at most d-1. By the induction hypothesis, they

can have at most 2^{d-1}leaves giving a total of at most

2^{d}leaves.

- log(n!) = Log(NlogN)

- Any sorting algorithm that uses comparisons requires at least O(NlogN)
**Bucket Sort**- array of the size of maximum element + 1
- counts the number of elements of each
- useful where input is only small integers

**External Sorting**- Too large for main memory – skipping section
- Multiway Merge
- Polyphase Merge
- Replacement Selection