Entrepreneurship Challenge

Since my first (crazy) swing at anything business I consider myself “infected”. It seems like that is only a matter of time I start my own venture, for better or worse. It is just too much fun and to difficult to stay out of it.

Challenge

In line with that attitude, now I notice all kinds of related events, and fortunately some of them are in the neighbourhood. So, last Sunday found me at an entrepreneurship challenge – a business plan competition. It was organized by a local startup, Enspyre. It’s founder is a serial entrepreneur who started at the age of 15. The company itself also runs an internship program to find interesting/creative people, and I have no doubt that the event was aimed inspiring young people to start something new – which in turn would boost their own business. I say, double-clever!

The audience was quite mixed, of the 25-30 participants more than half was Taiwanese, mostly students in their sophomore and senior years. The rest of it were strange foreigners (your’s truly is no exception), from Indian software engineer to Filipino Business majors and American expats. The day was supposed to be learning by doing. First, some current entrepreneurs and VCs present their take on what is a business idea and how a business plan and a pitch comes out of it. Then, in the space of a couple of hours, people would form more-or-less random groups, come up with ideas, refine them, do some numbers of how much capital the idea would need, prepare a pitch, deliver it – and see how it flies with the real VC judges. Most of the people did each and every step of this process for the very first time. Sounds intense? It was and also incredibly stimulating.

Entrepreneurship Challenge
Entrepreneurship Challenge, "Double Dream" pitching

Experience

In the initial “name card exchange” phase I was really slow – or at least slower then the rest of the people. I did want to know a little bit about my potential team mates, so ended up talking more to them than the other (I guess 5 sentences instead of 1). This made me in the end one of the last person to choose a team. That’s not a problem, I like random, usually works out brilliantly.

Out of the 5 members, our team had 4 Taiwanese students (2 business, 1 psychology, 1 medical science) and me. The language posed a little bit of difficulty, but I’d say we worked around it pretty well (I came a long way in terms of patience since I started learning Chinese as well). It was pretty amazing to work with them, especially because we all didn’t know each other, and that I could resist pushing my own things.

Some lessons learned:

  • The original business ideas were pretty bad. Many of the better ones were maybe a bit too conventional. On the other hand, when we revisited them, each and every was brainstormed into something I could feel excited about and would be totally happy to start working on. It was absolutely awesome to hear and discuss their ideas, to live through the process. Later talking to some other groups, their ideas just grew but none of them changed substantially upon review. Sign that we had a naturally agile team?
  • The language barrier is pretty big at the moment. Once one gets beyond that, by either having more patience, communicating even a tiny bit in their native language, or letting them discuss things between themselves for a whole, creativity really shines. There’s a lot of potential in this country. (Not that I didn’t know that earlier:)
  • Feedback from VCs, even – and maybe especially – before pitching is invaluable. That is, if I can say my question clearly and concisely enough. And that gave me a duh moment: of course they want everyone to succeed. If your idea/pitch is boring, they are going to waste the time. If there are few creative people then they will have fewer investment opportunities. So it is their very on selfish interest to do everything for you to succeed. Don’t abuse it (i.e. make it more trouble for them to help then the potential reward) and you have the best teachers.
  • It’s good to take the back seat sometimes. I didn’t push that our group would develop one idea originating from me, but one that more people in the team liked. I had advantage in the language front so pitching wouldn’t have thought me much, I insisted that the girl (Ping) who came up with our original idea would do the pitch – regardless of her English. She protested first but in the end I’m sure she liked it. This way she had some awesome experience and I could also see (ie. introspect) how do I listen e.g. to a pitch – even our own. I’m sad to say, that I’m a terrible listener. Got to fix that, and glad I had a chance to realize it.
  • As we were told multiple times during the day, the team is more important than the idea. After working together for a few hours I can certainly see how that comes into play very-very quickly. The business ideas I had these days lack any kind of consideration who would I do them together with – really should think about who else is in my social circle who I should consider because no matter how idealistic I am, I won’t make it completely on my own.

Guess these are just part of what I have realized, the rest of it will probably pop up every now and again in the coming weeks and months.

And now for the punchline: out of the 6 teams we took home the first prize. We came up with an idea that could impress other people who are doing this for a long time (and one of whom were telling us how the salesgirls pamper you much better when buying an Armani suit than at any other boutique – a lifestyle clearly out of my league). Their feedback was that out of the 6 pitches, ours was the one that could be done realistically, on a sane budget, with sane assumptions and might just work! It is an interesting feeling. I want to keep this and continue growing on it. Already started: my part of the bounty for winning went for recovering the entrance fee and “investing” in more resources for my journey (“Technology Ventures”).

Poor lemmings, or adventures into cliques

I love programming competitions and attempt to try every new site I find. Those sites will worth another post later, this current will be about a programming contest on CoderCharts I took part recently. It finished about two days ago and I didn’t do as well as I hoped, but did much better than I would have a year ago. Out of the 8 puzzles i have: 5 solved, 1 given up on for now, 1 haven’t even started and 1 attempted but run out of time . This last on is one Lemmings mating, one of the “hard” puzzles, and when I say run out of time I mean at 4am I decided that for the 4 ours then left to finish the competition I’d rather sleep than try to think of another optimization. Still, it bothered be and I did manage to write solution (with ~70% score, not too bad). Now I want to document the how did I get there so I won’t forget.

* Spolier warning. If you haven’t solved the problem and want to do it independently, do not read on. *

Path to a solution

Looking at the problem statement, there’s clearly a graph-related problem behind all the story of the poor creatures. When I’m not sure about not just what should be the solution, but what exactly is the right description the problem, I usually open The Algorithm Design Manual (ADM). It’s a great book, so far I couldn’t find a problem where it had nothing to say. It is a bit dated, however, and while sets me on the right path, there’s usually much more reading than that. In this situation it quickly revealed that the problem I face is finding the maximum clique in the graph describing the problem. Well, quickly learned that there’s a big can of worms waiting for me in there.

Maximum clique

So, to summarize, a clique is a group of vertices that are all connected to each other by edges. Got a maximal clique when it is not possible to add any other vertex in a way that everyone’s still connected. Finally the maximum clique is the largest maximal clique. Note how logical the notation is but still so easy to mix up. It is relatively easy to find maximal cliques (start from any vertex and add more connected vertices until there’s none left that is connected to all already selected). Finding the maximum clique (or any maximum clique since there could be more than one) is, however, a proper NP-complete problem.

The ADM only says something along the lines, that “well, then you just have to do a backtracking through all the graph”. Which is a great idea and simple to implement, I would had a result, but it is terribly inefficient because in a naive implementation one would check lots of cases which cannot possibly have the result. So fist thing to do is pruning, or eliminating paths that we are sure that it cannot improve what we already have. Wikipedia helps in finding just such a thing, called Bron–Kerbosch algorithm. Idea is quite simple (just wrap the head around recursive functions) and it can be implemented in Python following the pseudo-code version on Wikipedia. Here’s mine (not really optimized or anything).

This really worked and actually 3 out of 7 tests are passed, but the rest timed out. Got to find something better. Looking at more on Wikipedia, I was tempted to get on with the version of the algorithm that they said is “the fastest”, by J.M. Robson. It’s written up, but it is from the pre-MathJax era, so it’s just terrible work to figure out and keep up with all the strange notation. Also, it looks like a big collection of special cases and theoretical shortcuts. I’m sure it works, maybe I’ll come back to it later but for now I wanted to have a little bit more of an insight as well, so looked a bit more.

Vertex colouring

Since at this point I really caught up with the fact that maximum clique finding is an important problem, so went to Google Scholar to see what the academia has written. A lot, apparently. I was browsing though them in a kind of reverse-chronological order, so I could find the “newest” algorithms, since that should be the best. In the end a pattern emerged: most of the hard work can be done by employing a different technique, the vertex colouring. The two are connected because a clique in a graph G is an independent set in the complement graph of G (two vertices are connected in the complement of G if  and only if are not connected in G), and vertex coloring is a good method of finding independent sets. What we want is labeling the vertices with the smallest number of different labels (“colours”) such that vertices with the same label are independent from each other. The number of different labels we have to use to do that helps pruning our clique finding algorithm by setting an upper bound how big clique it is possible to make from the vertices we have in a particular step.

As an example we can start out with a graph like this (taken from the original lemmings problem):

A sample graph for illustration
Defining the graph: vertices are labelled and edges show connection

When running this graph through a vertex colouring algorithm, we would get something along this lines:

Result of graph colouring
The result of the graph colouring: vertices of the same colour are independent

The blues are not connected to each other, neither are the greens and so on. In this case we need 4 colours to separate the graph into independent groups. Of course when programming one would use numbers instead of colours (thus often this called “numbering”): blue was actually “1”, red was “2”, … By adjusting the algorithm in terms of what is the order of the colours and the order of the vertices within each colour the previous algorithm can be improved by orders of magnitudes since we much better pruning. In the maximal clique every vertex would have different colour (though not all colours will necessarily be used).

Some notes on the algorithm to get started:

  1. Try to keep nodes in the order of neighbours in the current sub-graph. The smaller is good, because they can be quickly eliminated, reducing the test-space more.
  2. Go from colors with fewer members to more members, similar reasons.
  3. Can use the color number + the current clique size for better pruning: if my largest clique so far has Q elements, my current clique has P and the tested vertex comes from the highest colour number N, then in case of N + P <= Q it is futile to go on, I cannot improve the result and time to backtrack. Less efficient methods use the C number of elements in the current candidate set instead of N, and since N<=C, the pruning bound set by C+P is more “loose” letting more tests to be done than necessary.

For example, the final solution of this given problem is the set of vertices highlighted by the edges is shown on the next picture, and it is done in only 4 steps or so…

The final solution, the maximum clique
The maximum clique on this sample graph defined by the highlighted edges.

Different methods

In the papers I found a few different colouring schemes, and most of the improvements seem to be made in:

  • initial ordering, or
  • ordering within each of the colours, or
  • better upper-bound estimation for the possible clique size from the available colour information.

The hardest things about reading these papers is that often the pseudo-code is a big mess, there are plenty of typos and the examples are quite scarce.

In the end I used algorithms mixed from two different papers:

Tomita, Akutsu and Matsunaga, Efficient Algorithms for Finding Maximum and Maximal Cliques: Effective Tools for Bioinformatics, 2011 (link) This has a few different algorithms and the most effective one looked too complicated and I went with MCQ, one of the improved but not perfect on. Now 6/7 tests pass, that’s got to be the right path. According to benchmarks the coloring takes up the most time

Segundo, Rodriguez-Losada and Jimeneza, An exact bit-parallel algorithm for the maximum clique problem, Computers & Operations Research, Volume 38, Issue 2, 2011, pp.571-581 (doi:10.1016/j.cor.2010.07.019). Their algorithm is a variation on the previous one and I’m not using it to it’s full potential (wonder how fast a C implementation would be). Also, I wonder if I misunderstood something but the original form of the algorithm failed on me. There are nodes that don’t need to have colour information because it does not matter at that stage of the algorithm. Their version of colouring then seems to remove those vertices from the colouring function output. This does not matter for their example (it is correct) but in my tests it gives wrong result (the algorithm finishes too early). So, mixed in from the previous paper that I keep all nodes whether they are coloured now or not, because later they might be, it works like a charm. And about a factor of 2-3 faster than before, enough to pass all 7 tests on CoderCharts and with a reasonable score. I think the real improvement is not really in the pruning (this version seems to check a few more nodes than the previous) but somehow the colouring function is faster – that might tell something about my Python skills too, though.

Lessons

This could probably be solved with other algorithms as well, I would be curious to see others’ code / hear their stories. What I took home from this:

  • Know your problem and know where to look for solutions. The best is to study and practice a lot.
  • Often pre-processing the data is a crucial step to the solution. Since there’s usually a memory/speed tradeoff, it is worth experimenting: if and algorithm is too slow, what could be prepared and stores to exchange calculating something with a mere data lookup?
  • Different algorithms have bottlenecks at different sections. E.g. in this case there is the data preparation, the number of steps to test possible solutions and the time it takes for re-colouring. On the path to the final solution I had algorithms inefficient in each of the sections.
  • Test cases are important. The example in the problem setting and the examples in all the papers are too small to be helpful for optimization. The “bonus” test case on problem site, however , is just too big. Actually, even my passed algorithm is too slow to have a result for it in reasonable time. So, if there’s no provided test case, make your own
  • The example cases I generated are much smaller than the 7 tests I need to pass, but they are still slower. Every paper said that maximum clique finding has very different running time for different (random) graphs. I presume that some of the test graphs are in fact specially prepared. Also, some of  the changes in code that made my own test cases faster (sometimes 20%) often failed more tests on the web. The message is that coding competitions like this are set towards finding the optimal solution for their own tests – whatever your winning solution might be, it is not necessarily the best overall solution and if I ever come across a similar problem in real life setting I likely to have to do differently.
  • Most of the Python optimization advice I found is outdated and sometimes outright hurting the performance. Do plenty of performance testing. Start optimizing the big slow downs, the small shortcuts rarely worth the time in competition setting.
  • Language choice matters but does not matter that much. I’d think Python actually pushes me to be more efficient because many things are slower. The code on the other hand is more readable in the end (ideally).
  • Someone has to update Wikipedia. :)

I will share the code later, just need to wait until some time has passed after the competition.