---
title: "Volatile Formats"
tags: ["Volatile Mediums"]
date: 2021-03-18T14:24:00-04:00
draft: false
---
*Note: This is a continuation of the thoughts I started
exploring in my [Volatile Mediums](https://beckmeyer.us/posts/volatile_mediums/) blog post.*

The next level up from physical mediums for data storage
is the *way* that the data is stored. In the digital age,
we have a plethora of formats for storing information.
For me, one of the most interesting areas of information
storage is the analog-digital space.

The fundamental problem of storing audio, video, and other
replications of the physical world is that there is so much
information that we can collect with sensors
(think microphones, video cameras, etc.). It would be great
if we could go get the best camera and microphone out there,
record whatever people record these days, and have that
exact physical experience "played back" for us on a screen
and speaker/headphones.

Unfortunately, there are several problems with this. The
first is the actual design of the sensor. It takes a lot of
careful thought and engineering to create a truly
good microphone or camera. And after all of that, the sensor
will cost something. Hopefully, that cost will correspond to
the actual technical ability of the sensor! In any case,
not everyone can have the best camera or microphone due to
any number of constraints, not just those listed above.

The second problem is the sampling issue. The sensor
creates some sort of output that can then be measured, or
**sampled**, by an ADC (analog-to-digital converter). The
very word "sample" gives away what this nearly magical box is
doing: it is only looking at the analog signal at certain
points in time. Granted, the time between samples
can be very small (about 23 microseconds at 44.1 kHz, a fairly
common sample rate for audio), but there is still some loss
of signal: whatever happens between samples is never
measured, and each measurement is rounded to a fixed number
of bits. Once the ADC takes these samples, it converts them into
a digital format (something that can be stored on a
CD, hard drive, thumb drive, etc.).

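To make the sampling step concrete, here is a minimal sketch of what an ADC does, under some assumptions of mine: the "analog" signal is a pure 440 Hz sine tone, and each sample is rounded to a signed 16-bit integer. The function name `sample_sine` and its parameters are hypothetical, chosen only for illustration.

```python
import math

SAMPLE_RATE_HZ = 44_100  # the common audio sample rate mentioned above
BIT_DEPTH = 16           # assumption: 16-bit samples


def sample_sine(freq_hz, duration_s, sample_rate_hz=SAMPLE_RATE_HZ, bit_depth=BIT_DEPTH):
    """Sample a pure sine tone and round each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1  # largest positive value (32767 for 16 bits)
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                         # the only instants that get measured
        analog = math.sin(2 * math.pi * freq_hz * t)   # stand-in for the analog value, in [-1, 1]
        samples.append(round(analog * max_level))      # rounding: the quantization step
    return samples


samples = sample_sine(freq_hz=440.0, duration_s=0.01)
print(len(samples))               # how many samples cover 10 ms
print(1e6 / SAMPLE_RATE_HZ)       # spacing between samples, in microseconds
```
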
The third problem is the encoding issue. The ADC creates all
of these samples, but we need to start thinking about storage
limitations. Storing the raw output of a sensor can take a
lot of space: an average-length album (40 minutes) of
uncompressed CD-quality audio could easily take 400 MB!
Now, again, storage capacity keeps growing to combat this, but
storage isn't the only problem. Internet bandwidth is
another major constraint.

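The 400 MB figure can be checked with some quick arithmetic. This sketch assumes CD-style audio (44.1 kHz, 16 bits per sample, two channels) and a hypothetical 10 Mbit/s connection; both helper functions are made up for illustration.

```python
def raw_pcm_bytes(duration_s, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Size in bytes of uncompressed audio with the given parameters."""
    return duration_s * sample_rate_hz * (bit_depth // 8) * channels


def transfer_seconds(size_bytes, link_mbit_per_s):
    """Time to move size_bytes over a link of the given speed (ignores overhead)."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)


album_bytes = raw_pcm_bytes(40 * 60)          # a 40-minute album
print(album_bytes / 1_000_000)                # size in megabytes (a bit over 400)
print(transfer_seconds(album_bytes, 10) / 60) # minutes on the assumed 10 Mbit/s link
```
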
The solution to this is compression, like a ZIP file. It
makes big files smaller by doing some fancy math tricks
that can be reversed by a computer to reconstruct the
original file exactly. However, for audio/video files, another level
of compression exists which actually gets rid of some of the
information in the original file to save more space. This
is called "lossy" compression, as opposed to the "lossless"
compression of a ZIP file.

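The difference between the two kinds of compression can be sketched in a few lines. The lossless half uses Python's `zlib` module (the same DEFLATE family as ZIP). The lossy half is a deliberately crude stand-in of my own: it keeps only the top 8 bits of each 16-bit sample, which is not how real audio codecs work, but it shows that the discarded information cannot be recovered.

```python
import math
import struct
import zlib

# Illustrative input: 0.1 s of a 440 Hz tone as 16-bit samples at 44.1 kHz.
samples = [round(32767 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(4_410)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: decompressing gives back exactly the bytes we started with.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)
print(restored == raw)

# Crude "lossy" stand-in: store only the high byte of every sample.
encoded = bytes((s >> 8) & 0xFF for s in samples)
decoded = [v << 8 for v in struct.unpack(f"{len(encoded)}b", encoded)]

print(len(raw), len(encoded))   # the lossy version is half the size
print(decoded == samples)       # but the original samples are gone
print(max(abs(a - b) for a, b in zip(samples, decoded)))  # worst-case error per sample
```
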
Great! We've found a way to save more space. The problem
with lossy compression is that we have to decide which
information to throw away. Usually, these are frequencies
that the average human ear/eye can't perceive. But some
compression is a bit too "greedy" when it
comes to saving space and starts to cut into the band of
frequencies that can be perceived. Also note that
designing these compression algorithms is an art form
that takes lots of careful consideration.

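As a toy illustration of "throwing away frequencies", the sketch below builds a signal from a low tone plus a quieter high tone, transforms it with a naive discrete Fourier transform, zeroes every bin above a cutoff, and transforms back. The cutoff, the tone choices, and the function names are all assumptions for the example; real codecs decide what to discard using far more sophisticated perceptual models.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def discard_above(x, cutoff_bin):
    """Zero every frequency bin above cutoff_bin, then rebuild the signal."""
    spectrum = dft(x)
    n = len(spectrum)
    # Bins k and n - k describe the same frequency, so both are tested.
    kept = [spectrum[k] if min(k, n - k) <= cutoff_bin else 0 for k in range(n)]
    return idft(kept)


N = 64
low = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]          # 3 cycles per window
high = [0.2 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]  # 20 cycles per window
signal = [a + b for a, b in zip(low, high)]

filtered = discard_above(signal, cutoff_bin=10)
# The high tone is gone; what remains matches the low tone up to rounding error.
print(max(abs(f - l) for f, l in zip(filtered, low)))
```
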
The final problem I want to mention is the codec problem.
There are many different codecs available today, and for
each and every one of them to be useful, you need to have a
way to decode each and every one of them. Unfortunately,
this is sometimes very difficult.

First, it could be a licensing
issue, where you don't have the correct software installed
or purchased to actually decode that file on your computer.

Second, it could be a physical constraints issue, where your
computer isn't powerful enough to decode the file fast
enough for you to view it without stuttering,
buffering, etc.

Third, it could be a personal preference. Some people
have much more sensitive eyes/ears and need formats
that are more **transparent**, meaning that the lossy file
is perceptually identical to the source it was encoded from.

With all of these issues at play, I think there are several
key points to make:

### 1. Codecs need to be freely available for widespread use with no strings attached.
I can't stress this one enough: we need to make sure we are
doing everything possible to not let our information die
when a corporation or individual makes a decision that
impacts the "who, what, where, when, and how" of their codec's
usage.

### 2. Lossless compression is good, but it is not the only thing we need.
We need to remember that not everyone has the ability to use
lossless codecs, whether that be because of internet
bandwidth limitations, storage limitations, or the like.
Instead, we need to continue to innovate in the lossy
compression space to narrow the perceptual gap between lossy
and lossless more and more.

### 3. A codec should never become obsolete.
This one may sound weird, but the fact is, if we're talking
about long-term storage of information, we can't let codecs
die, since there may come a day when we need a codec to
decode great-grandpa's album that never made it big.