Programming Challenge: Merge an arbitrary number of arrays in sorted order.
It looks like it's been over a week and a half since our last coding challenge, so let's get one going. This challenge is a relatively simple one, but it's complex enough that you can take a variety of different approaches to it.
As the title suggests, write a program that accepts an arbitrary number of arrays, in whatever form or manner you see fit (if you want to, e.g., parse a potentially massive CSV file, then go nuts!), and returns a single array containing all of the elements of the input arrays in sorted order. That's it!
Bonus points for creative, efficient, or generalized solutions!
Here is my sed solution:
Inputs are comma-separated lists of numbers. On each newline the currently sorted list is output, so if you only want the final list, pipe it into `tail -n1`. Internally it uses radix sort (`@0@1@2@3@4@5@6@7@8@9@` are the marks for the buckets; it adds padding zeros in front of the numbers as needed and removes them at the end).
Edit: Edited it so now it only outputs the list at EOF.
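The sed script itself isn't reproduced here, but the buckets-and-padding idea it describes maps onto a short Python sketch of LSD radix sort (the padding and bucket handling below are my reconstruction, not the original script):

```python
def radix_sort(numbers):
    # Pad every number to a common width, mirroring the leading
    # zeros the sed script adds and later strips.
    width = len(str(max(numbers)))
    padded = [str(n).zfill(width) for n in numbers]
    # Least-significant-digit radix sort with ten buckets,
    # one per digit mark @0@ .. @9@.
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(10)]
        for s in padded:
            buckets[int(s[pos])].append(s)
        padded = [s for bucket in buckets for s in bucket]
    return [int(s) for s in padded]  # strip the padding again
```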
You monster.
That is amazing.
This is pretty sexy too. 3 lines and done. 2 if you don't count the import. Nice.
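That snippet is also missing from the export; a three-line shape matching the description (one import plus two lines; the `sort_arrays` name is borrowed from elsewhere in the thread, so this is a guess, not the original) might be:

```python
from itertools import chain

def sort_arrays(*arrays):
    return sorted(chain(*arrays))
```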
This is the least efficient I can make it (Python 3), without adding irrelevant code in between:
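The original listing didn't survive the thread export; a deliberately wasteful sketch in that spirit, repeatedly scanning the whole pool for its minimum (so O(n²) on purpose), is my reconstruction rather than the author's code:

```python
def sort_arrays(*arrays):
    # Dump everything into one pool, then repeatedly scan the
    # entire pool for the smallest remaining element.
    pool = [x for arr in arrays for x in arr]
    result = []
    while pool:
        smallest = min(pool)
        pool.remove(smallest)  # another full scan, for good measure
        result.append(smallest)
    return result
```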
Just call it like this:
sort_arrays([5,7,1], [2,8,3], [9,5,7])
Now this is elegant, and mostly readable.
You sicken me. Well done!
Hmm fun.
Naive (straightforward):
It's got a time complexity of O(n log m), where n is the total number of elements in all arrays and m is the number of arrays. I'll try to come up with some more fun ones when I get home (maybe a generator? multithreaded?)
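The snippet itself is missing from the export; one way to get O(n log m) is an m-way heap merge, which assumes each input array has been sorted first. A sketch of that idea (the function name is mine):

```python
import heapq

def merge_sorted(arrays):
    # m-way merge with a heap of size m: O(n log m),
    # assuming each input array is already sorted.
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return out
```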
Super simple Python. Read a line, create a new list of those values as ints, throw it onto a master list, then unpack everything and sort it. The easiest thing to do is cast to ints when reading off stdin, but `sorted` supports a `key` parameter to specify how to sort the data as well.
Similar thing in Go, but only keeping a single slice of ints and appending to it each time. Not likely a performant usage of `append`.
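The Python listing is missing from the export; a minimal sketch of the described read-parse-sort flow, assuming one comma-separated array per stdin line:

```python
import sys

def main():
    master = []
    for line in sys.stdin:  # one comma-separated array per line
        if line.strip():
            master.append([int(v) for v in line.split(",")])
    # Unpack every sub-list and sort the combined result.
    print(sorted(x for arr in master for x in arr))

if __name__ == "__main__":
    main()
```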
Python 2:
That's just the function; I leave the I/O to the front-end people.
Just because of all the PHP umbrage around here lately, I wrote my solution in it. `combineAndSortArrays()` is a variadic function that accepts as many arrays as you can throw at it. It combines them into a single array and then sorts it by value.
You can actually do `$arr = array_merge(...$params);`
Good catch!
I'll use my own sorting algorithm (but I'm sure it already exists):
Theory
Let's have a method that takes an unlimited number of arrays (`params int[][]` in C#). We build some collection, probably a binary tree. For each element of each array, we insert the number into the tree (or a custom collection that behaves very much like a binary tree). Because the arrays are not ordered, we can safely choose the first element of the first array as the tree root, since it's equivalent to choosing a random element. I don't know if it's worth balancing the tree in the process. After we have all the elements, we return the tree (or convert it to an array and return that).
Time complexity should be O(N log N), where N is the total number of elements, plus O(N) if we convert the tree to an array afterwards. It would be interesting to compare this to the simplest approach: add everything to a single array and then call `.Sort()`.
Code
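The original C# listing is missing here; a rough Python sketch of the approach described above (unbalanced binary tree, first element as the root, in-order traversal at the end), with names of my own choosing:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        # Walk down until a free slot is found; no balancing.
        if value < self.value:
            if self.left:
                self.left.insert(value)
            else:
                self.left = Node(value)
        else:
            if self.right:
                self.right.insert(value)
            else:
                self.right = Node(value)

    def in_order(self, out):
        # In-order traversal yields the values in sorted order.
        if self.left:
            self.left.in_order(out)
        out.append(self.value)
        if self.right:
            self.right.in_order(out)

def tree_sort(arrays):
    values = [x for arr in arrays for x in arr]
    if not values:
        return []
    root = Node(values[0])  # first element of the first array as root
    for v in values[1:]:
        root.insert(v)
    out = []
    root.in_order(out)
    return out
```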
Results
For small inputs, like 10*10, the results were surprising: the simple method averaged 0.07 ms (over 100 runs), the complex one 0.01 ms. For an input of 100*50, the simple one averaged 1.62 ms and the complex one 2.13 ms. For an input of 100*1000 (this time averaged over 10 runs), the simple one averaged 53.9 ms and the complex one 1274.2 ms.
Edit: Added syntax highlighting
Those results aren't surprising at all, actually! Let's say we assume O(n log n) for both approaches and completely disregard any additional algorithmic factors. With a simple array, the allocated memory is sequential, so your array accesses benefit from locality of reference, whereas your tree isn't constructed from sequentially allocated memory and doesn't get that same benefit. Temporal and spatial locality of reference are concepts used to optimize caching behavior, reducing the impact of a cache miss and the subsequent retrieval from slower RAM. It would absolutely have an expected performance impact :)
I was surprised about the result for 10*10. I expected the simple solution to be faster, so I was quite surprised when the complex one won. I tested it multiple times, but the complex one is always faster for 10*10. I'm not sure why, but I think the LINQ methods I use in the simple solution have some overhead, and the tree isn't that much slower with just 100 elements.
I thought the tree approach would be slower just because it has to visit log(n) nodes when inserting a new element; I didn't know about the sequentially allocated memory :)
If I created something like a linked list instead of a tree, allocated all the required nodes at once, and connected them afterwards, the slowing effect of non-sequentially allocated memory would disappear, right?
Regarding the tree being more efficient with 10*10, that could easily be due to the amortized time complexity of your inserts. That amortized cost likely makes a large difference at the beginning, but ends up being completely overshadowed by the impacts of locality of reference with larger arrays.
Regarding the linked list structure, I wouldn't expect the sequential nature of the nodes to necessarily translate into a sequential allocation of memory. It's possible, for example, that a sequence of nodes `a -> b -> c -> d -> e -> f` is allocated in memory in the order `e -> b -> d -> a -> c -> f`. In that case, in an array you would expect that accessing `e` and then `f` would be efficient, but in this example linked list you would find that accessing `e` and then `f` is inefficient, since they're so far apart in memory.
Locality of reference has nothing to do with how we structure the data representationally, and everything to do with where that data lives in actual, physical memory, on the hardware.
This is really interesting (wiki). However, it looks a bit like black magic to me. When you're writing algorithms that need to run fast, do you use these techniques? Is it really worth diving into?
I personally don't need to handle these kinds of optimizations. But yes, these are used if needed. They're only worth diving into if your applications need to squeeze out every bit of performance possible, particularly in processing-intensive programs.
It may seem like black magic, but once you know how data is structured and how it's loaded into cache, it makes perfect sense. It's all just a matter of taking advantage of the architecture you're working with.
Good on you for doing some additional research on the subject!
Now for Python:
fn = lambda x: sorted(sum(x, []))
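For example, `fn([[5, 7, 1], [2, 8, 3], [9, 5, 7]])` returns `[1, 2, 3, 5, 5, 7, 7, 8, 9]`. Worth noting: `sum(x, [])` copies the growing accumulator on every addition, so the flattening step is quadratic; terse, not fast.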
Don't have time to test this quite yet but it should work:
Passing the argument by value allows the caller to choose whether to pass an lvalue to be copied or an rvalue to be consumed.
Flattening is O(n), where n is the total number of elements in all the arrays, and sorting is O(n log n).
I decided to take a database approach to this problem using MongoDB.
Given a collection `arrays` where each document has a field `values` containing an array of numbers, the following aggregation pipeline will produce a document with the desired result contained in a final `merged` field:
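The pipeline itself is missing from the export; a pipeline matching that description, shown through pymongo (the connection details and `test` database name are assumptions, and this is a reconstruction rather than the original), could be:

```python
from pymongo import MongoClient

db = MongoClient().test  # assumed database name

result = db.arrays.aggregate([
    {"$unwind": "$values"},                        # one document per array element
    {"$sort": {"values": 1}},                      # order the elements globally
    {"$group": {"_id": None,
                "merged": {"$push": "$values"}}},  # collect back into one array
])
print(next(result)["merged"])
```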
Not sure if using Python's `itertools.chain` is cheating…
I learned that `heapq.merge` exists, and its docs describe it as similar to `sorted(itertools.chain(*iterables))` but returning an iterator, not pulling the data into memory all at once, and assuming each input stream is already sorted. This has the disadvantage that the inputs must be pre-sorted. The advantage, though, is that it does not require all inputs to fit in memory (whereas my solution above does). Nice to know that the tradeoff of space for time can be made here.
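For reference, a minimal sketch of that tradeoff, pre-sorting each array so `heapq.merge` can stream the result lazily:

```python
import heapq

arrays = [[5, 7, 1], [2, 8, 3], [9, 5, 7]]
# heapq.merge needs pre-sorted inputs, so sort each array first;
# the merge itself then streams lazily instead of holding
# everything in memory at once.
merged = list(heapq.merge(*(sorted(a) for a in arrays)))
print(merged)  # [1, 2, 3, 5, 5, 7, 7, 8, 9]
```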
Not really cheating, no. Otherwise we would have to decide where to draw the line at abstractions, which would be a large and fruitless debate not worth getting into.
Yeah, it's just a matter of whether it's part of the challenge to re-implement something like `itertools.chain`, which is probably good practice in its own right.
The magic that is JavaScript ES6... When working through this, I realized I had a similar real-world problem this week, so it's still rather fresh: get a messy array into a new array of arrays with groups of 3 objects. Fun times.
No hate here. Despite all of the flak it gets, I work primarily with PHP, and occasionally with JS.
The only thing I would hate on, if this were in a merge request, would be some minor naming issues. For a programming challenge like this, though, it's not really an important point. Also, just a heads up: the `new Set()` call would be undesired for a simple merging of array elements, as it strips the duplicates that you would expect to see. Great for getting unique values, though! :)