The Dangers of Tutorial Code

Sometimes the worst times make us reflect in productive ways. This holiday season, as I coughed up a lung and sweat through two shirts a night, I stared angrily at the ceiling for several nights with thousands of thoughts a minute rolling through my head. Most were related to my current physical state, like "don't cough, you know it hurts" or "is that sinus pressure or an aneurysm?", but some were related to the work I've done this year.

The best part of my year was in the summer, when I had the chance to mentor an energetic intern at work. The experience was everything I could have hoped for: my intern accomplished a rather large project and I got to work with them for extended periods to help them understand what was going on, both in the codebases and in life. But one thing has been eating away at the back of my mind since their first day: software engineers, even good ones, have a tendency to copy-paste tutorial code.

What is tutorial code?

The most obvious example of tutorial code is code that is found in a tutorial somewhere. You might also call this "StackOverflow code" or even "textbook code" if you're still in school. I've encountered issues with people using such code almost exclusively from tutorial websites, but basically any resource that provides naive implementations you might copy-paste into your own codebase would fall into the category of "tutorial code".

Tutorial code is necessary. Let's get this out of the way. People don't walk into a computer science lecture knowing how a doubly-linked list might be constructed, so you have to provide an example. This example helps people see a concrete implementation, but may not be complete so as to not confuse someone who has just been introduced to a concept.

My favorite example of how this can be used is an implementation of the quicksort sorting algorithm in Haskell (taken from HaskellWiki):

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

Even if you didn't know how quicksort worked before, you probably do now! It's an incredibly simple algorithm conceptually:

  1. Start with a list of things. These things must be sortable (Ord a => [a])
  2. If that list is empty, return an empty list. It's already sorted!
  3. If not, separate the first element p in the list from the rest of the list ((p:xs)). The rest, xs, will be our new list
  4. Find all elements in the new list that are less than our first element p, call it lesser
  5. Find all elements in the new list that are greater than or equal to p, call it greater
  6. Run quicksort on lesser to create a sorted list of elements smaller than p, then add p, then run quicksort on greater to create a sorted list of elements greater than (or equal to) p

Amazing! In fact, this is how I've seen people introduced to quicksort, even when the instructor can't guarantee that everyone in the room knows any Haskell.

Tutorial code requires context

However, this is tutorial code. It is meant to demonstrate an idea, and requires the audience to continue learning about caveats and improvements that might be made. The issue with tutorials, or StackOverflow, or textbooks, or copying from a friend, or whatever, is that unless you understand why things were done, you're unlikely to have copy-pasted the most correct thing.

You might look at the example of quicksort, verify it works with some examples, and ship it to production. But working does not mean it will work well. Notably, this example always picks the first element as the pivot and builds new lists instead of sorting in place, so it does not deliver the performance quicksort is known for: on an already-sorted list it degrades to quadratic time.
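To make that concrete, here is a rough sketch that counts the comparisons the quicksort above performs. The helper name comparisons is my own, not part of the HaskellWiki example, and it assumes the same first-element pivot and the same two filter passes.

```haskell
-- Count the comparisons made by the naive quicksort shown earlier.
-- Each call runs two filters over xs, so it costs 2 * length xs
-- comparisons before recursing on both halves.
comparisons :: Ord a => [a] -> Int
comparisons []     = 0
comparisons (p:xs) = 2 * length xs + comparisons lesser + comparisons greater
  where
    lesser  = filter (< p) xs
    greater = filter (>= p) xs
```

In GHCi, comparisons [1..7] gives 42 while comparisons [4,2,6,1,3,5,7] gives 20. Same elements, but the sorted order forces every call to recurse on nearly the whole list. For an already-sorted list of n elements the count works out to n * (n - 1), which is exactly the quadratic growth a careful pivot choice is meant to avoid.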

This is a bit more black and white in a classroom setting. You either got the sorting algorithm correct, or you didn't. You can compute hashes and look up associated values, or you can't. But once you leave the classroom, a huge portion of what you learned in school is gone. You're now provided all the algorithms for the basic operations. You no longer have to build your own operating system to be able to prove you know how a computer works; you're provided a computer with a very mature operating system already installed. You're provided a programming language that has already figured out how to sort and hash and iterate.
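As a small illustration of "already provided", this is roughly what sorting looks like in Haskell once you stop writing it yourself. sort and sortOn come from Data.List and Down from Data.Ord; the function names and sample shapes below are made up for the example.

```haskell
import Data.List (sort, sortOn)
import Data.Ord (Down (..))

-- The library sort; no hand-rolled quicksort required.
ascending :: [Int] -> [Int]
ascending = sort

-- Sort by a derived key: wrapping in Down reverses the ordering.
descending :: [Int] -> [Int]
descending = sortOn Down

-- Sort pairs by their first component (hypothetical name/score rows).
byName :: [(String, Int)] -> [(String, Int)]
byName = sortOn fst
```

The library version is documented as a stable sort, which is one more property the tutorial quicksort never had to mention.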

You can search for things like "how to get data from 2 tables at the same time in SQL" and find a perfectly valid answer. It's even straightforward to figure out if the solution works, but if you don't read the context (or the tutorial/StackOverflow/whatever doesn't provide more information), you might be writing a terrible query. In this case, the most important thing to know about is indexing on columns in relational tables. Someone should make sure to yell that joining on columns without an index is extremely slow, and while you may not notice over 100 or 1000 or even a million rows (yes, computers can be that fast), you will start to notice at some point and should know about the looming issue.
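This isn't SQL, but a loose Haskell analogy may help show what an index buys you. Everything here is hypothetical: the User and Order types, the field names, and both join functions are invented for illustration, and the sketch assumes user ids are unique, the way a primary key would be.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical rows standing in for two relational tables.
type UserId = Int
data User  = User  { userId :: UserId, userName :: String }  deriving (Eq, Show)
data Order = Order { orderUser :: UserId, orderItem :: String } deriving (Eq, Show)

-- No index: every order scans the whole users list, so the work grows
-- with (number of orders) * (number of users).
joinScan :: [User] -> [Order] -> [(String, String)]
joinScan users orders =
  [ (userName u, orderItem o)
  | o <- orders
  , u <- users
  , userId u == orderUser o
  ]

-- With an index: build a lookup structure on the join column once,
-- then find each order's user with a logarithmic-time lookup.
-- Assumes user ids are unique; Map.fromList keeps the last duplicate.
joinIndexed :: [User] -> [Order] -> [(String, String)]
joinIndexed users orders =
  [ (userName u, orderItem o)
  | o <- orders
  , Just u <- [Map.lookup (orderUser o) index]
  ]
  where
    index = Map.fromList [ (userId u, u) | u <- users ]
```

Both functions return the same pairs in the same order, which is the trap: on small inputs you can't tell them apart, and only the second keeps up as the tables grow.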

Tutorial code keeps you from learning

Many programs come with great documentation. I've seen experienced developers fall into the trap of copy-pasting something from StackOverflow when the documentation for that program explicitly asks them not to do things that way. Why? Because it works!

I see this mentality everywhere. "It worked out, so it's ok." The restaurant didn't give up our reservation, so the fact I didn't start getting ready till the time we were supposed to leave doesn't matter. Outside of work, these are the people I find hard to rely on. They're the ones who don't show up on time and the ones most likely to double book themselves. It's ok though because they haven't lost any friends to that... yet.

At work, these are the people I dread doing code reviews for. I realize I'm reasonably opinionated, but I am open to different styles of work and try not to let my preferences get in the way of someone else's preferences. However, some of the things I see people copy-paste into the codebase are downright awful.

I've seen people do terrible things in the codebase, like writing code in file A that depends on file B, which itself depends on file A, and then making a whole stink about how hard it was to resolve that dependency. At no point did critical thinking turn on and make them think that maybe they didn't have the right model, or that maybe they needed to move some of the logic into a new file to keep the files from all depending on each other. They were so adamant about copying in some code they found that they destroyed some of the fundamental ideas that create a codebase people want to work in. These people stopped learning as soon as they found code that "worked", despite it not working for the people they have to work with.

Conclusion

Copy in tutorial code if you're doing homework. Copy it in if you just need to see how something works. But please, please take a second to think about what you're doing before you copy it in.

This is, largely, a metaphor for life. Don't copy someone's workout routine because you like them. Figure out why that works for them, and what they had to do to make sure that was a reasonable choice for them. Don't copy someone's diet, or bedtime routine, or career choices. Take the time to think a little bit critically about your actions and you'll come out the other side with more information, a better perspective on what you're doing, and an appreciation for learning.