Quartz Obsession

Data compression

October 22, 2019

Let's get small

Data compression is as old as electronic communication. In A New Kind of Science, Stephen Wolfram writes that Morse code, invented in 1838 for use in telegraphy, is an early example of data compression based on using shorter codewords for letters such as ‘e’ and ‘t’ that are more common in English. At the dawn of the computer era, Claude Shannon, the father of information theory, outlined its limits, demonstrating just how far it could go. Since then, some of the world’s best minds have explored how much information we don’t need, testing the limits of mathematics, computing power, and perception to reduce communication to its essence (technically speaking, anyway).

The need continues to increase as streaming video gets more data-intensive and more popular, taking up a growing share of worldwide internet traffic. It’s the main reason the data going through our pipes is expected to triple from 2016’s figure by 2021, and the world’s biggest tech companies are at work and at war on the next generation of video compression. Here’s a short-as-possible history.

BRIEF HISTORY

1867: Chicago Tribune publisher Joseph Medill argues for eliminating excess letters from the English language, like dropping the ‘e’ in “favorite.”
1929: RCA’s Ray Kell files the first patent for video compression.
1934: Tribune publisher Robert R. McCormick, Medill’s grandson, institutes compressed spelling rules; some stick (analog, canceled), some don’t (hocky, doctrin).
1948: Claude Shannon and Robert Fano independently discover a technique for lossless compression known as Shannon-Fano coding.
1951: Fano’s student David Huffman pioneers a still-used method called Huffman coding.
1974: Nasir Ahmed develops Discrete Cosine Transform, used in JPEG and MPEG.
1976: Jorma Rissanen and Richard Pasco develop arithmetic coding, used in formats like JPEG and H.264.
1977: Abraham Lempel and Jacob Ziv develop dictionary-based coding, which is used in ZIP and GIF formats.
1986: 24-year-old Phil Katz invents the ZIP file.
1987: CompuServe introduces GIFs.
1992: The JPEG format is introduced, a decade after the Joint Photographic Experts Group is convened to develop a standard for transmitting images electronically; Adobe introduces the PDF.
1995: The MP3 standard gets its famous .mp3 file extension, short for MPEG (Moving Picture Experts Group) Audio Layer III.

NET POSITIVE

The 2000s

In the new millennium, the trickling stream of information on the internet grew into a powerful river as companies and communication platforms began to irreversibly change society. Read about the human side of the web on Net Positive.

EXPLAIN IT LIKE I'M 5!

How does compression work?

There are many specific, and often proprietary, types of compression, so let’s focus on two major categories: lossy and lossless.

With lossy compression, you lose some information forever. Take the JPEG image format, which uses both. It begins by converting an RGB (red, green, blue) image to YCbCr. Y is luma, which determines how bright a pixel is; Cb and Cr are the color information. Why? The human eye is more sensitive to brightness than color. You can lose more color information than brightness, so the method separates them and downsamples the color, preserving less information about it.

Discrete Cosine Transform is then used to map an image space into a frequency. This is where the math gets complicated, though the idea is simple: identify the least necessary information in the picture and get rid of it.
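
For the curious, the first JPEG steps described above (color conversion, color downsampling, and the DCT) can be sketched in a few lines of Python. This is a rough illustration and not how any real JPEG library is written: the function names are made up for this example, the conversion weights are the ones commonly used with JPEG files (JFIF), the downsampling simply averages 2x2 blocks, and the DCT is the slow textbook version.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF-style weights, assumed here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                  # brightness (luma)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b       # blue-difference color
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b       # red-difference color
    return y, cb, cr


def subsample_2x2(plane):
    """Keep one averaged value per 2x2 block of a color plane.

    Assumes the plane has an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [(plane[i][j] + plane[i][j + 1]
          + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
         for j in range(0, cols, 2)]
        for i in range(0, rows, 2)
    ]


def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block with orthonormal scaling."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


if __name__ == "__main__":
    # A smooth left-to-right gradient, like a patch of sky. Every row is
    # identical, so only the first row of DCT coefficients is non-zero
    # (up to rounding error).
    gradient = [[16 * col for col in range(8)] for _ in range(8)]
    for row in dct_2d(gradient):
        print(" ".join(f"{value:8.1f}" for value in row))
```

A gray pixel comes out with both color channels sitting at the neutral value of 128, and a perfectly flat 8x8 block turns into a single non-zero number in the top-left corner of the DCT output. That concentration of the picture into a few numbers is what lets the later steps throw the rest away cheaply.
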
Low-frequency parts of the image represent gradual color change, like the sky; high-frequency parts represent lots of color changes in a small space, like leaves on a distant tree. Viewers notice if you skimp on the former; less so on the latter. The DCT process identifies what you can safely get rid of. (For visual examples, this is a good video.)

Next comes quantization. To greatly simplify things, that’s doing math to take what the DCT identified as unnecessary and getting rid of it. That’s the lossy part of the compression; you can’t get that data back.

Then there’s the lossless compression, based in part on David Huffman’s 1951 breakthrough. Data items that occur often in a file are coded with the fewest bits possible. Take this example: Under the ASCII coding standard, each letter in the phrase “happy hip hop” is represented by eight bits. With Huffman coding, you can represent ‘h’ and ‘p’ with two bits, ‘a’ and ‘i’ with three bits, and so forth, getting a 104-bit phrase down to 39. (It also produces a header or file that allows the computer to translate the Huffman-encoded information, like a secret decoder ring.)

Reuters/Thomas White

POP QUIZ

Which song was used to perfect the mp3?

Bohemian Rhapsody, Queen
Runnin' with the Devil, Van Halen
Tom’s Diner, Suzanne Vega
White Lines, Melle Mel

If your inbox doesn’t support this quiz, find the solution at bottom of email.

PERSON OF INTEREST

The queen of compression

In 1972, Lena Sjööblom, a Swedish model living in Chicago, was Playboy’s Miss November. Six or seven months later, the staff at the Signal and Image Processing Institute at the University of Southern California wanted an image that was glossy to ensure good output dynamic range, and they wanted a human face. Someone had the November Playboy on hand, so they scanned her picture, and a legend was born.
The Lena image became a standard to test compression algorithms. It’s more than just a pretty face; it’s a nice mixture of detail, flat regions, shading, and texture that do a good job of testing various image processing algorithms. There’s focus blur, dramatic lighting, the tricky feather in her hat, and her reflection in a mirror. Arguably the most important test of an algorithm is to get the human face right.

But the Lena image has fallen out of favor. First, no one cleared the copyright, though Playboy decided to be chill about research and educational use. Second, potentially important technical information about how the photo was made and printed was lacking. Finally, the use of a Playboy centerfold, even in SFW form (the photo is suggestive, but Lena bares only her shoulders), was an unwittingly apt representation of the tech world’s attitude toward women, which is why some started using Fabio images instead.

As for Lena? Her career took an appropriate turn: she worked as a model at Kodak to test color film, then taught disabled computer users in her home country.

BY THE DIGITS

33 zettabytes: Size of the global datasphere in 2018
175 zettabytes: Estimated size by 2025
73 percent: Share of internet traffic devoted to video in 2016
82 percent: Share projected for 2021
70 exabytes: Internet video traffic in 2016
228 exabytes: Projected internet video traffic in 2021
7 gigabytes: Maximum data per hour required to stream 4K Netflix video
20 percent: Data savings Netflix realized by customizing compression for each title
10 terabytes–100 terabytes: Estimate for how much data the human brain can store
4: Grams of DNA required to hold the equivalent of all the data in the world as of 2011
FUN FACT!

A Stanford professor and graduate student consulting on HBO’s Silicon Valley came up with a formula to measure both the quality and speed of compression algorithms (a major plot point in season one) to give the show more tech cred. They called it the Weissman score, and though it’s not perfect, it’s being used IRL.

HOW WE 💾 NOW

The future of compression

Video is by far the biggest strain on bandwidth, and it’s only getting bigger. The dominant video codec (coder-decoder) has been H.264. That was followed by H.265, or HEVC, in 2013. It’s much better at compression (here’s why), but takeup has been slow. First, it requires more powerful hardware; Apple didn’t go all in on H.265 until 2017. Licensing fees are also higher than H.264.

Meanwhile, in 2015, tech giants Amazon, Cisco, Google, Intel, Microsoft, Mozilla, and Netflix formed the Alliance for Open Media and released the royalty-free AV1 codec in 2018; in 2019, BBC tests found that AV1 is competitive with H.265.

To make matters more complicated, there’s another next-generation codec on the horizon, VVC, which is expected to be finalized in 2020. The same BBC tests found it to be better at compression than both AV1 and H.265, but it’s not going to be royalty-free. Then again, AV1 might not be either: An IP protection company is claiming royalties on any device that uses AV1, which, for example, would add up to $29 million a year for Apple if it adopted the codec. As Jan Ozer writes at Streaming Media, the next dominant video codec could be decided by patent attorneys.

WATCH THIS!
A movie only an algorithm could love

In 2016, Netflix released Meridian, a 12-minute film noir. It got bad reviews, but the target audience was engineers who wanted to test codecs and equipment. “It’s a weird story wrapped up in a bunch of engineering requirements,” Netflix’s Chris Fetner said.

TAKE ME DOWN THIS 🐰 HOLE!

Our bits, ourselves

One form of data that’s getting crowded is our most important info: genomic databases. As Dmitri Pavlichin and Tsachy Weissman explain in IEEE Spectrum, they contain a lot of redundant data, which (remember Huffman coding) makes them well suited to the genome-specific compression that researchers like themselves are working on.

POLL

What is your favorite use for compression?

Netflix
Spotify
Instagram
Projectors, vinyl, and film for me, thanks

💬 LET'S TALK!

In yesterday’s poll about black holes, 44 percent of you think there are wormholes inside, 33 percent say there’s nothing, and 23 percent of you said their interiors are empty screening rooms, playing an endless loop of 2001: A Space Odyssey.

💌 Rick wrote in to suggest that perhaps they’re filled with the centers of all of the donuts ever eaten.

Today’s email was written by Whet Moser, edited by Annaliese Griffin, and produced by Tori Smith.

The correct answer to the quiz is Tom’s Diner, Suzanne Vega.

Quartz, 675 Avenue of the Americas, 4th Fl, New York, NY 10011, United States