Engineering the Foundation of AI
Photo illustration by Klawe Rzeczy
Long before artificial intelligence permeated almost every corner of business, before the internet connected continents and digital technology revolutionized the modern world, a quirky polymath and University of Michigan alum was quietly putting together the mathematical pieces that would lay the groundwork for it all.
Claude Shannon, ’36, HDSC’61, never won a Nobel Prize and was not the type of scientist to seek the limelight, but few have contributed as significantly to the vast landscape of advanced digital computing as Shannon did in the mid-20th century.
Shannon’s work, and its impact in the decades that followed, has earned him the title “Father of the Information Age.” His theories on the statistical nature of information and communication have been celebrated as the foundation of everything from fax machines and wireless internet to the AI systems shaping the world today.
“It’s actually extraordinary how many of the technologies, particularly within the domain of AI, we trace back to his work,” says Jimmy Soni, co-author of the 2017 Shannon biography “A Mind at Play.”
After graduating from U-M with a dual degree in mathematics and electrical engineering, Shannon developed what has been lauded as one of the most impactful master’s theses of the century, proving for the first time that mathematics could be used to simplify and design electrical switching circuits — a breakthrough that would make digital computing possible.
And in the 1940s, as an electrical engineer, Shannon spent nearly a decade scribbling out equations in secret at odd hours or behind closed doors, developing information theory — an entirely new field of study that uncovered the physical laws of communication systems, dictating how much and how fast information can be reliably shared, regardless of the medium.
Shannon’s theory also introduced the fundamental building block of all digital communication, the binary digit, or “bit,” commonly represented as a zero or a one, which gave future engineers the tools to encode, compress, and transmit information over great distances without compromising quality.
“The impact of his work is everywhere,” says Nicole Hamilton, a lecturer of computer science and engineering at U-M. “Information theory affects us in, frankly, everything we do in computer science these days, from the physical hardware we work with, to the software mechanisms we use for encoding messages or compacting data, to how we get error-free communications.”
A Master’s Thesis For the Ages

At the University of Michigan, Shannon was first introduced to the symbolic logic his master’s thesis would build upon: Boolean algebra, a system that reduces logical reasoning to two values, “true” or “false.”
As a graduate student at the Massachusetts Institute of Technology (MIT), Shannon applied Boolean algebra to electrical switching circuits, proving how an electronic computer could process binary logic by utilizing its switching circuits as “logic gates” — components that evaluate logical conditions the same way a human mind reasons through an if/then statement. This insight laid the foundation for the logic-based processing inside every computer chip that exists today, though on a much smaller and faster scale than Shannon could have imagined.
“That was one of the most influential master’s theses of all time,” says Mahdi Cheraghchi, associate professor of computer science and engineering at U-M. “He was thinking about information in terms of digital representation. That led to a number of things, and one was the idea of digital computers.”
Inside modern computer chips, transistors — the microscopic building blocks of modern electronics that process information via electrical signals — are arranged into logic gates that carry out the countless computations a processor makes every second.
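The link between switches and Boolean logic can be illustrated with a short sketch. It is a simplified illustration, not a reproduction of the thesis: each switch is modeled as a Python boolean (True meaning closed), and the function names are invented for this example. Switches wired in series behave like AND, switches wired in parallel behave like OR, and combining them yields a circuit that adds two one-bit numbers.

```python
from typing import Tuple

# Illustrative model only: a switch is a boolean, True = closed (conducting).

def series(a: bool, b: bool) -> bool:
    """Two switches in series conduct only if both are closed (logical AND)."""
    return a and b

def parallel(a: bool, b: bool) -> bool:
    """Two switches in parallel conduct if either is closed (logical OR)."""
    return a or b

def inverted(a: bool) -> bool:
    """An inverting contact conducts only when its input is open (logical NOT)."""
    return not a

def half_adder(a: bool, b: bool) -> Tuple[bool, bool]:
    """Add two one-bit numbers and return (sum_bit, carry_bit)."""
    # Sum is XOR, built here as (a OR b) AND NOT (a AND b).
    sum_bit = series(parallel(a, b), inverted(series(a, b)))
    # Carry is set only when both inputs are 1.
    carry_bit = series(a, b)
    return sum_bit, carry_bit

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            s, c = half_adder(a, b)
            print(int(a), "+", int(b), "-> sum", int(s), "carry", int(c))
```

Chaining adders like this one, bit by bit, is one way arithmetic can be assembled from nothing more than on/off components.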
An average smartphone today contains upwards of 10 billion transistors in a single chip. But the growing demand for more powerful generative AI, machine-learning and deep-learning models has pushed manufacturers to develop smaller, more advanced chips with processing power far beyond those staggering numbers, some containing hundreds of billions of transistors or more. Shannon’s later work gave engineers the mathematical framework they still use to determine how efficiently information can be compressed and transmitted.
“Data centers have started to use more sophisticated ideas to store things more efficiently and retrieve them more reliably,” Cheraghchi says. “And unsurprisingly, ideas of Shannon’s encoding theory has once again become useful.”
Laying the Groundwork for AI
Shannon joined Bell Labs as a research mathematician in the summer of 1940, after earning his master’s and Ph.D. from MIT. Growing up, Shannon had shown an early interest in and aptitude for solving engineering problems and spent much of his time tinkering with broken radios or building model airplanes.
Still motivated by the same childlike curiosity that led him to fashion his own private telegraph line as a boy, Shannon devoted his spare time as an adult to projects pursued for his own amusement, from building a fleet of custom unicycles to rigging up a flamethrowing trumpet for his son.
With the help of his wife, Betty — a brilliant mathematician in her own right — Shannon constructed one of his best-known inventions: an electromechanical mouse named Theseus that could solve a maze by “remembering” its path through trial and error.
Powered by a bank of telephone relays affixed to the back of the maze, which acted as Theseus’ “brain” to recall the solution, this passion project provided the public with one of the earliest physical examples of machine learning — a subject of great interest to Shannon throughout his life and career.
“Shannon was very much interested in the ways machines could learn, and he was very optimistic about where technology was going to go,” Soni says.
While his projects reflect the breadth of his curiosity, Shannon’s research at Bell had world-changing implications in the field of information technology and digital communications.
“Nobody was thinking about digital communications at the time, or almost nobody was,” says Dave Neuhoff, a professor emeritus of electrical engineering at U-M. “Before Shannon, when you wanted to transmit something a long distance, the only way to make it better was to increase the power or use a bigger antenna.”
Shannon’s “A Mathematical Theory of Communication,” published in the Bell System Technical Journal in 1948, turned that notion on its head by pinpointing the probabilistic nature of communication systems and identifying how to leverage that mathematically to optimize both telecommunications and computing.
“What he did was say, ‘well, that’s a really inelegant approach to communication,’” Soni says. “Because so much of communication is predictable and is repeatable, and a lot of words are unnecessary and a lot of letters are unnecessary, if we are able to abstract all communication into bits — like zeros and ones — then you can actually do a pretty good job of compressing information down to the smallest unit you need to communicate an idea and then sending that through.”
Building on the earlier theoretical work of two of his Bell Labs predecessors, Harry Nyquist and Ralph Hartley, Shannon’s 1948 theory defined information not by a message’s intrinsic meaning, but instead by the amount of perceived randomness or uncertainty a message carries — a concept known as “Shannon entropy.”
By measuring the uncertainty (entropy) of a message against the predictability and redundancy of the English language, Shannon proved a message could be stripped of the predictable information, compressed, encoded into bits, and sent much faster and more reliably — up to a certain theoretical limit.
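For readers who want to see the idea in miniature, entropy can be estimated from how often each symbol appears in a message, using the standard formula H = sum of p × log2(1/p) over the symbols. The sketch below is a deliberately simple version: it looks only at single-character frequencies and ignores context, so it is not the method Shannon used to estimate the redundancy of English. The function name is invented for this example.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(message: str) -> float:
    """Estimate entropy in bits per character from single-character frequencies.

    Assumption: characters are treated as independent, which ignores the
    context that makes real language even more predictable.
    """
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

if __name__ == "__main__":
    for text in ("aaaa", "abab", "abcd"):
        print(text, entropy_bits_per_symbol(text))
```

A message made of one repeated character carries 0 bits per symbol because nothing about it is uncertain, while four equally likely characters need 2 bits each. The gap between a message's entropy and the bits actually spent on it is the redundancy a compressor can remove.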
“The whole way in which generative AI works today is basically looking at statistical patterns,” Hamilton says. “They are looking at large amounts of data to say, ‘well, what is the redundancy here? If you have these five words, can we say what the next one is going to be?’”
It’s those same insights that underpin some of the world’s most advanced large language models on the market today, from OpenAI’s ChatGPT to Anthropic’s aptly named Claude — which multiple sources, including Claude AI itself, have suggested bears Shannon’s name at least partially in tribute.
“Shannon is credited for being one of the early proponents of language models,” says Rada Mihalcea, director of the Artificial Intelligence Lab at U-M. “It was this idea that there is some predictability in language that you could model with information theory … this is really the main idea that current language models use. Of course, there are many differences, but the core intuition is still the same.”
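The core intuition, that the words already seen constrain the word that comes next, can be shown with a toy model that counts which word most often follows another. This is only an illustration under heavy simplifying assumptions: modern large language models use neural networks trained on vastly more data, not simple word-pair counts, and the function names and sample sentence here are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str):
    """Count, for each word, how often each other word immediately follows it."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat slept"  # made-up sample text
    model = train_bigrams(corpus)
    print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

Even this tiny model captures the redundancy Hamilton describes: once "the" has appeared, some continuations are far more likely than others.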
Revolutionary Contributions
The combined contributions of Shannon’s pioneering master’s thesis and mathematical theory of communication helped put researchers on the trajectory toward the AI revolution. But Shannon was reluctant to inflate his theory’s relevance beyond his own scientific discipline, and he warned the research community against doing so.
“Seldom do more than a few of nature’s secrets give way at one time,” Shannon urged in an editorial. “It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems.”
Shannon died in 2001 after a long battle with Alzheimer’s disease, and while information theory hasn’t yet given away all of nature’s secrets, it has certainly provided logarithmic tools to explore the deeper mathematical order of communication that, before Shannon, researchers didn’t know how to tangibly quantify.
“I compare him to Einstein,” Neuhoff says. “Einstein came up with [his theories of relativity] out of the blue. No one was asking anything but he suddenly answered all these questions. Shannon did the same, and he inspired the digital communications revolution.”
Jenny Sherman is a writer and copy editor for Michigan Alum.


