Capture the Flag has officially ended. If you're around SF or London, we'll be hosting wrapup events in those cities. We'll be posting a video of our walkthrough afterwards, as well as posting a writeup on the CTF infrastructure soon. Hope you enjoyed playing!

Stripe: Capture the Flag

Welcome to Stripe CTF3!

Welcome to Stripe's Capture the Flag 3: Distributed Systems. Sign up to get started!

The mysterious program

At the heart of any distributed system is an undistributed one.

You've recently become the Director of Big Data at Large Corpus Systems, Inc. It's your first week on the job, and you've just received an urgent alert that your data pipeline has ground to a halt. After some painstaking debugging, you've managed to track the bottleneck to the following code. You're not quite sure what the script does, but whatever it's doing, it's doing it very slowly.

You can chat with fellow solvers in our CTF chat using your favorite IRC client at irc://irc.stripe.com:+6697/#ctf or just using our web client.


The code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#!/usr/bin/env ruby

# Our test cases will always use the same dictionary file (with SHA1
# 6b898d7c48630be05b72b3ae07c5be6617f90d8e). Running `test/harness`
# will automatically download this dictionary for you if you don't
# have it already.

path = ARGV.length > 0 ? ARGV[0] : '/usr/share/dict/words'
entries = File.read(path).split("\n")

contents = $stdin.read
output = contents.gsub(/[^ \n]+/) do |word|
  if entries.include?(word.downcase)
    word
  else
    "<#{word}>"
  end
end
print output

Your challenge

Your challenge is to make this code run much faster, without altering its output. In particular, you need to get it running at least as fast as our reference solution — when you submit a revision, we'll tell you how our solution compares.


For beginners…

If you've ever taken a computer science class (not at all a prerequsite for this CTF), you've probably learned that all that matters is your program's big-O runtime. However, most of what you do when running a real system is tune constant factors — it's often much easier to make your program omit a few operations here and there than to rather than rewrite using a fancy new algorithm. (And if you can make your program run twice as fast or use half as much memory, that's a huge win.)

In this level, the first challenge is to figure out what this code even does. It turns out this is an all-too-common challenge in the real world :). There are a few ways to approach this. One is trying to just read it and hope for divine inspiration. That's probably not the easiest way though.

One nice property of code is that even if you don't understand it, you know someone who does — your computer. You can try running the program with a bunch of different inputs to get a sense of what it does, and then with that knowledge figure out what various lines of code seem to be doing. (Note: most doctors recommend exercising some discretion in running code that you find on the Internet, however.)

Once you've gotten started, try cloning the level. There's a README.md file which contains some pointers for running the level (you may want to read up on Unix shell redirection). Try them out. Do you see a pattern in the program's output?

If you get stuck, feel free to hop into the CTF IRC channel.

Relevant resources

Ruby Koans Online and Ruby warrior: fun, interactive tools for learning Ruby.

Big-O in laymen's terms: a simpler explanation of big-O notation.

The Unix shell: a presentation on shell basics.