CodeHerb home

TopCoder SRM 488 The BoringStore
Published on: 24 Apr 2011

Understanding the Problem :

For people new to TopCoder(like me) I think it is natural to feel overwhelmed when you first read the problem statement. The important thing is to let that feeling pass, re-read the problem and not give up.

../images/overwhelmed.gif

Now that the initial feeling has passed, lets re-read the problem statement and understand it. Do not think of how you will solve it. The best thing to do is print out the problem statement and sit with a pen and paper. Under no circumstance should you have your emacs window open! This problem took me a significant amount of time to solve (unfortunately I didn't time myself - but in future I plan on doing so). Don't be discouraged by the fact that you couldn't solve it in 15 mins. That comes with a LOT of practice!

../images/practice.gif

Alright enough commentary. Time to dissect the problem statement. As I have stated earlier it is important to get to a point where you understand the problem enough to rephrase it. If you are new, you should actually try writing the problem in as few and simple words as you can (having done/studied mathematical proofs can be of real help here).

So lets make an attempt at rephrasing: Given two string sj and sb, find the longest possible string such that A + C = B + D (or A+D=B+C) where A,B are non-overlapping contiguous substrings of sj and C,D are substrings of sb. I cannot stress enough the importance of rephrasing the problem. While the long winding TopCoder descriptions are important for understanding - the 'self' version is important to ensure that you have understood the problem! While my version is not completely correct and misses some finer points ( like how the 'longest possible string' is defined), it gives me a good sense of my understanding of the problem. So stop,re-read and think till you understand the problem.

../images/stopthink.gif

Lets break the solution into pieces now:

Finding all the non-overlapping contiguous string

How do we define two substrings? We pick four integers i,j,k,l where i <=j < k <=l. Lets look at some code

int ijlen = sj.length(); // sj = input string

for(int i = 0; i < ijlen ; ++i)
  for(int j = i; j < ijlen; ++j)
    for(int k = j+1; k < ijlen; ++k)
      for(int l = k; l < ijlen; ++l)
        {
          string a = sj.substr(i,j-i+1);
          string b = sj.substr(k,l-k+1);
        }

The above code creates all the substrings A,B for the first input string. Similarily we can construct C and D. So now we have generated all the possible 'non-overlapping' contiguous substrings of sj and sb.

Given all the substrings - how can we process them?

First of all we know that A+C=B+D or A+D=B+C. Now looking at this it seems that we shouldn't really need to do these two separate checks. Since we are creating the substrings, without loss of generality we can always say A < B and C < D.

if (a.length() > b.length()) {
  swap(a,b);
}

Now given the above constraint we can conclude that A + D = B + C for all valid strings. ( A + C = B + D is not possible since by the above constraint A+C < B+D).

Hmm… at this point it seems that we are storing a lot of substring for comparison. We know that A + D = B + C (where A,B are substrings of sj and len(A) < len(B)). Reiterating constrainsts helps in coming up with better solutions. Now looking at the constraints something should kind of jump out at us. Do you see something? Think.

We know len(A) < len(B). This necessitates that A be a prefix of B if this pairing was to be valid. Similarily C must be a suffix of D. Now instead of storing all the pairs of substrings, we only need to store those substrings where A is a prefix of B and C is a suffix of D! Stop. Did that makes sense?

So now we can basically rewrite the equation as A + D' + C = B' + A + C where A is the prefix, B (A + B'), C is the suffix and D=(D'+C). Thus for each pair of substring A,B we store A and B', and similariy C and D'. Then we can just compare D' and B' to see if A+D = B+C! It seems like we managed to cut down a significant amount of work happening in the loop. Lets look at the code now:

vector< pair<string, string> > jpairs; //(prefix, B')
vector< pair<string, string> > bpairs; //(suffix, D')

// if a is a prefix of b
if(std::equal(a.begin(), a.end(), b.begin())) 
  jpairs.push_back( make_pair(a,b.substr(a.length())));

// if c is a suffix of d
if(std::equal(c.rbegin(), c.rend(), d.rbegin())) 
  bpairs.push_back( make_pair(c, d.substr(0,d.length() -
c.length())));

Using STL and some iterators it is pretty trivial to find whether a string is a prefix/suffix. Obviously for java users you get string.StartsWith() and string.EndsWith() - but where's the fun in that?

At this point we basically have all the pairs of valid substrings which could potentially form a solution. Now all we need to do is run an 0(n2) loop where we compare each possible pair to pick the best name!

bool TheBoringStoreDivTwo::isBetter(const string& prev, const string& curr) const {
  return (prev.length() < curr.length() || 
          (prev.length() == curr.length() && curr.compare(prev) < 0));
}

string result;
for (int i = 0; i < jpairs.size(); ++i)
  for (int j = 0; j < bpairs.size(); ++j)
    if(bpairs[j].second == jpairs[i].second) {
      string curr = jpairs[i].first + bpairs[j].second + bpairs[j].first;
      result = isBetter(result, curr) ? curr : result;
    }

That's it! We walk through the pairs picking the best one. Our store finally has a name :) and we can go back to the drawing board and ponder how we can improve our solution!