Orders of Magnitude
Transcript of Audio:
Hello and welcome to another episode of Three Deviations Out. My name is Amanda and I think for a living. Last week we talked about millennials and the technology that shapes our lives. This week we’re breaking down the details of the data storage revolution.
Data is not an outlier. By now we all know that. Data is an integral part of our lives, a looming constant that determines our decisions and grows as we create outcomes and outputs. The outlier today is not data, but what we humans store all that data on.
What: The data storage revolution is the current and previously uncharted territory of data storage innovation, driven by the need to create more and more data centers to store that data we humans keep creating.
Who: IBM, Microsoft, SanDisk, Hewlett Packard Enterprise, Fritz Pfleumer, the University of Electro-Communications, Intel, the University of Manchester, Arizona State University, the University of Washington, Stanford University, and MIT
Why: Generally, we humans tend to be packrats. In the digital age that hasn’t changed, and it may be even more pronounced. Photos of vacations you took years ago that you haven’t looked at, well, in years are taking up space in that cloud drive or directly on your device. That space isn’t arbitrary, and whether you store on your device or in the cloud, there needs to be hardware to back it up. Acres upon acres of land are owned by the US government, Amazon, Microsoft, IBM, Google, and other large data center providers and users. The opportunity cost is potential farmland in a world where rural hunger and food deserts are a real thing. Or potential housing when home and rental prices are skyrocketing. Or potential natural space in a time when some kids think a baby carrot is what a carrot actually looks like. Innovation in data storage has massive implications for the physical space we humans take up, on a large and influential scale. And if there isn’t innovation in the space, then as data piles up and we can’t build any more data centers or virtualize any more machines, we will have to start changing our nature, forcing ourselves to purge the unnecessary information we tend to cling to. Cloud prices, right now next to nothing, would continue to increase until only the affluent and the corporate could afford additional space. And tell me, how will I upload another video of my dog chasing her tail then?
That’s the thesis today, folks: data storage is an outlier because it has the potential to make or break the influence that wonderful emerging tech will have, because without someplace to store it all there will be no revolution.
There will be no use case section today because there is only one way to use data storage: to store data. If I’m wrong, please feel free to educate me in the comments. Instead, today’s program will look like this:
- The History
- The Trigger
- New technology
- Ideal world in Amanda’s head
In the 1970s, when microchips began the trend of getting a makeover every 18 months, storage methods were largely left alone. The common mindset was that there would never be a need to store so much data that it warranted significant innovation. Enter the Internet, essentially ubiquitous access to cloud storage, and the desire of every organization and private citizen for big data analytics. Not only is the sheer amount of data larger than it has ever been (90% of the world’s data in 2016 had been created in the previous two years), with 44 trillion gigabytes expected by 2020; there is also increased demand for edge storage. Edge storage often lives on small IoT devices that track automated processes, and demand for analytics and storage directly on these devices will only continue to increase as blockchain across devices becomes more prevalent. So today we focus on the data storage revolution and how hardware can keep up with the yotta- prefix.
Storage is based on magnetization: if a particle is printed on the storage device in one direction, that particle is read as a 1, and if it is printed in the other direction, it is read as a 0. This way of writing information has stayed essentially the same for the entirety of data storage history; it has just gotten smaller along the way. The first magnetic tape was patented in 1928 by Fritz Pfleumer. This style of storage wasn’t actually used for data, though, until 1951, in the Eckert-Mauchly UNIVAC I. Key to this type of storage is that it can be overwritten, allowing old information to be purged and new information added in its place. Especially for consumer devices, which often don’t hold data as sensitive as that of public or private organizations, the ability to rewrite allows for both cost and space savings.
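That read/write/overwrite cycle can be sketched in a few lines. This is a toy model only, not any real drive interface; it just treats each magnetized region as +1 or -1 and reads it back as a bit, the way I described above:

```python
# Toy model of magnetic storage: each region's magnetization direction
# (+1 or -1) encodes one bit. Rewritable media just flip regions in place.

def write_bits(bits):
    """Map each bit to a magnetization direction: 1 -> +1, 0 -> -1."""
    return [1 if b else -1 for b in bits]

def read_bits(regions):
    """Read each magnetization direction back as a bit."""
    return [1 if r > 0 else 0 for r in regions]

def overwrite(regions, new_bits):
    """Purge old information by flipping the same regions to new data."""
    for i, b in enumerate(new_bits):
        regions[i] = 1 if b else -1
    return regions

data = [1, 0, 1, 1, 0, 0, 1, 0]
tape = write_bits(data)
assert read_bits(tape) == data          # round-trips cleanly
tape = overwrite(tape, [0] * 8)
assert read_bits(tape) == [0] * 8       # old data is gone, new data readable
```

The point of the `overwrite` step is the cost-and-space savings mentioned above: the same physical regions get reused rather than appended to.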
This first magnetic tape held 128 bits of data per square inch, and the first recording was 1,200 feet long. A recent breakthrough by IBM has brought the density of magnetic tape to 201 gigabits per square inch. That means the data on one inch of the new tape would have needed 1.57 billion inches of Pfleumer’s tape, roughly 24,784 miles of it, and over 83,000 books’ worth of storage. Now all of that fits in a space shorter than your pinky finger and thinner than the width of your phone.
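If you want to check my math, here it is as a quick back-of-the-envelope script:

```python
# Back-of-the-envelope check of the tape density comparison above.
old_density = 128          # bits per square inch (Pfleumer's 1928 tape)
new_density = 201e9        # bits per square inch (IBM's 201 Gb/in^2 tape)

ratio = new_density / old_density        # old-tape inches per inch of new tape
print(round(ratio / 1e9, 2))             # 1.57 (billion)

inches_per_mile = 12 * 5280              # 63,360 inches in a mile
print(round(ratio / inches_per_mile))    # 24784 (miles)
```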
That may seem like a lot of storage, but think of the amount of data generated daily. Every day we humans create 2.5 quintillion bytes of data, which written out has 17 zeros, and which converts to 2,500 petabytes. Petabyte is a new one, so you may not have heard it used yet, but it is one order of magnitude (a factor of 1,000) larger than terabyte. The embedded image at the top of the page is a really handy reference chart if you would like to see all 48 current orders of magnitude; peta- sits three steps below the largest prefix, yotta-. Data rules over our lives these days. What is captured by our daily activities has repercussions on the ads we see, the loans we qualify for, and even what careers we’re considered for. Data is king, and data storage is both the army of and the history written about that king. This has proven to be a complex relationship as we continue to create more and more data. Constantly adding servers doesn’t quite work: the more servers, the more management is required, and the more management required, the greater the likelihood of a crash or a bug derailing the entire system, or worse, exposing sensitive data held in that storage capacity. Also, just adding servers creates ever more convoluted connections between all that hardware as it tries to communicate with itself, processing or querying data stored across an entire server farm.
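To keep the prefixes straight, here’s a small sketch that climbs the decimal ladder, using the standard SI byte prefixes from the chart:

```python
# Climb the decimal (SI) prefix ladder to the largest unit that keeps
# the value at 1 or above. 2.5 quintillion bytes = 2.5e18 bytes.

PREFIXES = ["bytes", "kilobytes", "megabytes", "gigabytes",
            "terabytes", "petabytes", "exabytes", "zettabytes", "yottabytes"]

def humanize(n_bytes):
    """Express a byte count with the largest fitting decimal prefix."""
    value, i = float(n_bytes), 0
    while value >= 1000 and i < len(PREFIXES) - 1:
        value /= 1000
        i += 1
    return value, PREFIXES[i]

print(humanize(2.5e18))   # (2.5, 'exabytes') -- i.e. 2,500 petabytes
```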
There are a number of ways this issue is being combated right now. One is middle-out storage, developed by HPE and illustrated in the development of The Machine. It allows all data being processed by any hardware in the network to be stored in a single location, decreasing space requirements and increasing the ability to query across an entire collection of data. Another effort involves the molecular storage of data at cold temperatures, a technique that still relies on magnetism but shrinks the space needed to house the same amount of data by orders of magnitude. As the ability to store on molecules edges toward the temperature of liquid nitrogen, a fairly inexpensive coolant, the scalability and commercialization of this method becomes highly viable.
Plans for the world’s largest data center have been proposed by Kolos Group, a US-Norwegian partnership that also operates in the fishery & aquaculture, oil & gas, and power & industry markets. The facility is planned to meld with the landscape through efficient design, and much of its expected 1,000 megawatts of power is slated to come from renewable energy. This is just the latest and largest in a wide range of data center types, sizes, and locations. As we continue to create data at greater and greater scales without purging what we created yesterday or the day before, our need for storage is going to keep increasing even as storage forms become denser. That means more space taken up by storage facilities and more power used to run them, causing a not insignificant impact on the environment. In an age when housing prices are a struggle even for the well employed, continuing to dedicate ever larger amounts of land to storage only exacerbates the problem. Data storage facilities aren’t all bad, though. Each of these centers requires high-skilled workers in areas spanning from admin and management to high-tech data systems to cyber and physical security. Not only that, but these fields are often underpopulated, so they offer wide opportunity for those just starting their careers or looking to shift careers. With greater environmental efficiency and planning to optimize space, along with continued advancements in the tech, we humans have the potential to live comfortably with our desire to hoard everything, even data.
In lieu of use cases today, we’re going to look at some of the new technology that is being researched and tested in the data storage sector.
Blockchain and the desire for edge computing, in addition to the massive amounts of data being created daily, are spurring the need for smaller and cheaper data storage options. For example, if a product is being tracked through the supply chain with blockchain-enabled RFIDs, the device needs to be small and cheap enough to span hundreds of thousands of items of varying sizes while also holding data for all the blocks in the chain associated with it. In comes our first new tech, molecular storage. While still in the research phase, molecular storage would enable high-density information to be written on a single molecule, 100 times more densely than current technology allows. The downfall of molecular storage is the need to keep these molecules cold. Very cold. Recently a University of Manchester team working on this technology made a breakthrough, raising the required temperature from -256 C to -213 C, a gain of 43 degrees C. However, -213 C is still tremendously cold (the equivalent of -351.4 F), cold enough that there is currently no effective and inexpensive cooling technology to support it. Continued research hopes to bring molecular storage up to a working temperature of -196 C, the boiling point of liquid nitrogen. Liquid nitrogen is relatively cheap as high-tech cooling systems go, and reaching it would be a breakthrough in the commercialization of the technology.
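For the Fahrenheit skeptics, the conversions above check out:

```python
# Sanity-check the temperature figures quoted above.
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

print(round(c_to_f(-213), 1))   # -351.4
print(-213 - (-256))            # 43 -- degrees C gained by the Manchester team
```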
As weird as it sounds, let’s start printing things onto biology. I mean, we’re already printing biology, with some 3D printers able to print organs for transplant. So why not print directly onto the fabric of what makes us us? That is what researchers, including teams at the University of Washington and Northwestern University’s Center for Synthetic Biology, plan on doing. Like molecular storage, DNA storage is 3D and therefore denser. Unlike molecular storage, DNA storage is further along in the go-to-market process. While still very much in the research and development phase, there have been some highly publicized and very interesting applications of the technology. For example, a Harvard team was able to print a movie onto DNA storage this year: a short clip of Eadweard Muybridge’s galloping horse Annie G. was encoded into E. coli DNA. This specific application was encoded into an actual living organism, the E. coli cell itself. Both DNA storage and the announcement by Arizona State University researchers of an RNA-constructed biological computer have significant implications for the furthering of the technological and biological crossroads.
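To give a feel for how bits become bases, here’s a deliberately simplified sketch. It uses the common textbook mapping of two bits per nucleotide; real DNA-storage codecs add error correction and avoid long runs of the same base, which this toy version ignores:

```python
# Toy DNA-storage codec: pack two bits into each nucleotide.
# This is an illustration only, not any lab's actual encoding scheme.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Turn raw bytes into a strand of A/C/G/T."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Read a strand of A/C/G/T back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"hi")
print(strand)                    # CGGACGGC
assert decode(strand) == b"hi"   # round-trips back to the original bytes
```

Because each base carries two bits and the strand is a 3D molecule, you can see where the density advantage over flat magnetic media comes from.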
3D storage/processor combo:
Bottleneck is a real thing. Between two chips, like a storage chip and a processing chip, the bottleneck creates data latency, which at scale becomes a serious problem. That sounds like gibberish, you say? Try watching your Excel model struggle to process a sheet full of SUMIFS formulas on 700,000 rows of data. Do you know Word? I Excel at it. I’m going off on tangents. Anyways, a joint effort by Stanford and MIT has produced a solution: a 3D chip that is both a processor and a storage mechanism. The device uses nanotechnology, with carbon nanotubes instead of a silicon-based material, and the team has developed the most complex nanoelectronic system to date. Layers of logic and storage are woven together to create a web of detailed and complicated connections.
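A crude way to see why weaving storage and logic together helps: if every item has to cross a chip-to-chip link, the transfer time, not the compute, dominates. The latency numbers below are made up purely for illustration:

```python
# Toy model of the storage/processor bottleneck. All latencies are
# illustrative assumptions, not measurements of any real hardware.

def total_time(items, compute_per_item, transfer_per_item):
    """Seconds to process `items` when each needs compute plus a data move."""
    return items * (compute_per_item + transfer_per_item)

items = 700_000        # rows, as in the Excel example above
compute = 10e-9        # assume 10 ns of compute per row
bus_hop = 100e-9       # assume 100 ns to move a row between separate chips
on_chip = 1e-9         # assume 1 ns when storage is layered with the logic

print(round(total_time(items, compute, bus_hop), 4))  # 0.077 -- separate chips
print(round(total_time(items, compute, on_chip), 4))  # 0.0077 -- 3D combo chip
```

Under these toy numbers the stacked design is roughly 10x faster, and the gap only widens as the data transfer cost grows relative to the compute.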
Advances in tape storage:
Lastly, I want to touch on tape storage again. As I mentioned earlier, IBM has recently announced a revolutionary tape that holds 330 terabytes in a cartridge about as long as my pinky (pinky size may vary). Tape is the oldest form of data storage and has continued to evolve from the outset. Expect to see more from this legacy tech in the future; it’s not going anywhere.
We humans create a lot of data. Especially as the generations who grew up on screens start to outnumber those who didn’t, data consumption and creation will continue to grow. It all has to go somewhere, and someone has to pay for it. I see a variety of the storage models I spoke about today, along with a number of other emerging and legacy technologies, coming together to optimize our space requirements. I also imagine that we as a society will start learning to pare down the data we keep, understanding more accurately what will bring use and what will sit in the back of the closet collecting dust. What I know for sure is that if we continue this upward spiral into the data dimension, we will get lost in cyberspace and not realize the space around us has become exponentially more cluttered with hardware.
Thanks for listening today, guys; this is everything I currently have to say about data storage. As always, comment below with any requests, recommendations, corrections, updates, and overall bashing. Next week we dive into the murky waters of the distributed internet, so buckle up. ’Til then, go do something greater than average.