By Moshe Kranc, CTO, Ness Digital Engineering
Fifty years ago, on July 20, 1969, mankind took a giant leap as Neil Armstrong became the first man to walk on the moon. As a bored 14-year-old stuck at home in Connecticut for the summer, I did not appreciate the magnitude of the event. Instead, I was annoyed by President Nixon’s scene-stealing speech and by the disruption the landing caused to coverage of my beloved Boston Red Sox.
Although I was too young to participate directly in the moon landing, I later had the privilege of working with some of the software pioneers whose work enabled it. In the 1960s the MIT Instrumentation Lab had a contract with NASA to develop the Apollo Guidance Computer to be used on both the command and lunar modules. In 1969 a group of those engineers, including John Miller, Jim Flanders, Jim Miller, and Dan Lickly, left MIT to form Intermetrics, which landed contracts with NASA for developing software such as the HAL compiler used for the space shuttle.
A decade later in 1979, I was a young software engineer working at Bolt Beranek and Newman in Cambridge, Massachusetts, looking for my next challenge. An ex-colleague, Gary Fostel, recommended that I check out Intermetrics. After a series of technical interviews, it was time for me to interview with Dan Lickly (pronounced “likely”), who was in charge of the software products division. Entering his office, I found a tall man dressed in a baseball uniform, slouched in his chair, tossing a softball up in the air and catching it with his baseball glove. We talked about the Red Sox (I had strong opinions about their need for another starting pitcher), about bicycling (in those years I did not have a driver’s license as a matter of principle), about the Cambridge jazz scene – in short, about anything but software. At the end of half an hour, Dan told me I was hired.
As I discovered later, there was a method to Dan’s interviewing madness. Company legend had it that if Dan talked with you about software, that meant you were not hired. As one co-worker put it, “Dan has one superpower – he can smell a workaholic from a mile away.” Dan assumed that any candidate who got to him had already been vetted technically. What Dan was looking for was passion, a trait that shows itself in everything a person does, whether it’s programming, bicycling, or rooting for a sports team. I learned from Dan that if you hire people who are “all in,” who are totally committed to what they do, you can then afford to sit at your desk throwing a softball for as many hours a day as you please.
At MIT, Dan had been Director of Mission Program Development, leading the Apollo Guidance Computer development effort. Accounts of the Apollo 11 development process make it clear that Dan honed his recruitment skills there. Few software development projects require that level of total commitment, responsibility for human life, coolness under pressure, and appreciation of the goal’s historic significance. The Apollo 11 team was a remarkable group of passionate engineers who were up to the task, thanks in part to Dan’s skill in selecting them. In an interview for the Wall Street Journal, Dan put it best: “there was art to find people who could translate engineering equations into a code for trips to another world.” Margaret Hamilton, who is known for her work developing software for Apollo’s guidance systems and for coining the term “software engineering” during that time, said, “We took our work seriously, many of us beginning this journey while still in our 20s. Coming up with solutions and new ideas was an adventure. Dedication and commitment were a given. Mutual respect was across the board.”
At Intermetrics, the people Dan hired gelled into a highly effective, highly motivated team that achieved some great things, like implementing the first Ada compiler, while also having a great time together. It seems that Dan, besides finding passionate people, had also managed to weed out the jerks. We worked together and played together as a team, with time for 3-hour Go games, weekly volleyball and softball tournaments, and lessons in juggling, unicycling and ballroom dancing. A team that is cohesive, collaborative and talented can move mountains.
The Apollo Guidance Computer (AGC) that Dan Lickly and his team built for the moon landing was a remarkable technical accomplishment. In an era when a computer filled an entire air-conditioned room, the AGC occupied about one cubic foot, with less processing power than a smart watch and a meager 72 kilobytes of memory. These limitations meant that the computer could only process one mission phase at a time, rolling in one set of tasks to execute, then rolling them out to free up resources for the next phase. To handle these constraints, Hal Laning created a priority-based scheduling algorithm for the operating system, in which each task was assigned a priority level and the operating system would always execute higher-priority tasks first. If a task could not be scheduled because no resources were available for it, the operating system would issue alarm code 1202, then swiftly restart and drop the less critical tasks by freeing up their memory and removing them from the schedule.
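For readers who think in code, the scheduling behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AGC's actual implementation: the `Task` and `Scheduler` names, the fixed pool of `slots`, the `essential` flag that marks tasks surviving a restart, and the sample task names are all hypothetical. Only the alarm code 1202 and the general idea (run the highest priority first, and on overload restart and shed the less critical work) come from the account above.

```python
from dataclasses import dataclass

ALARM_NO_RESOURCES = 1202  # alarm code named in the article


@dataclass(frozen=True)
class Task:
    name: str
    priority: int            # assumed convention: larger number = more urgent
    essential: bool = False  # hypothetical flag: task is kept across a restart


class Scheduler:
    """Toy priority scheduler with a fixed pool of task slots (an assumption)."""

    def __init__(self, slots):
        self.slots = slots
        self.tasks = []
        self.alarms = []

    def schedule(self, task):
        """Try to admit a task; on overload, raise the alarm and restart."""
        if len(self.tasks) >= self.slots:
            self.alarms.append(ALARM_NO_RESOURCES)
            self._restart()
        if len(self.tasks) < self.slots:
            self.tasks.append(task)
            return True
        return False  # still no room, even after shedding non-essential work

    def _restart(self):
        # Free the slots held by less critical tasks; keep the essential ones.
        self.tasks = [t for t in self.tasks if t.essential]

    def run_next(self):
        """Remove and return the name of the highest-priority waiting task."""
        if not self.tasks:
            return None
        task = max(self.tasks, key=lambda t: t.priority)
        self.tasks.remove(task)
        return task.name


sched = Scheduler(slots=3)
sched.schedule(Task("landing_guidance", priority=30, essential=True))
sched.schedule(Task("radar_counter_a", priority=10))
sched.schedule(Task("radar_counter_b", priority=10))
admitted = sched.schedule(Task("display_update", priority=20))  # pool is full here
print(sched.alarms)      # [1202]
print(sched.run_next())  # landing_guidance
```

In this toy run the fourth request finds the pool full, so the scheduler records a 1202, sheds the two non-essential radar tasks, and carries on with the landing task still first in line.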
For Dan’s software team, the Apollo 11 moon landing was a tense nail-biter, replete with unanticipated twists. At the most dangerous part of the mission, as astronauts Aldrin and Armstrong were descending, approximately three minutes prior to touchdown, they reported a 1202 alarm, followed by four more such alarms. Five times, the guidance computer restarted itself and continued its processing. Clearly the computer was being overwhelmed, but why? More importantly, which tasks was it not processing as a result of the overload? If these skipped tasks were related to the landing, the mission would have to be aborted.
Steve Bales, a young mission control guidance officer, made a split-second decision to trust that the priority-based operating system would do the right thing, i.e., devote scarce resources to the landing calculations and drop less important tasks. Bales’s boss, Flight Director Gene Kranz, told the astronauts to override the errors and proceed with the landing. This turned out to be the correct decision, and the astronauts landed safely with only 17 seconds of fuel to spare.
Later, the cause of the computer overload was discovered: Buzz Aldrin had turned on the rendezvous radar during the lander’s descent so that it would be ready in case the landing was aborted. The rendezvous radar’s tasks, coupled with a crucial but hard-to-reproduce mismatch between two circuits’ electrical supply, were overloading the computer with requests for non-essential calculations, triggering the 1202 alarm. Fortunately, the mission was rescued by the protection Dan’s software team had designed into the system so that it could rapidly recover from such errors by prioritizing the landing calculations above all other competing tasks.
Branch Rickey is credited with saying, “Luck is the residue of design.” The Apollo 11 software team provides a graphic example of this. Thanks to their obsession with quality and their desire to anticipate and preempt any and all possible problems, they “got lucky” and were able to overcome an unanticipated error condition that could have led to a national tragedy. Here is another lesson for software developers, as systems become ever more complex and more mission critical: expect the unexpected and plan your system’s response.