Encoding is the process of translating data from one format to another according to a set of rules or a formula. For example, you can encode "abc" to "ABC" using lowercase-to-uppercase rules. Decoding is the inverse process. You can decode "ABC" back to "abc" by applying the same set of rules in reverse. There are many different applications for data encoding and decoding. Encryption, for example, is a form of encoding that uses a key. Without the key, the encoded data can't be decoded. Video data is encoded to make it smaller so that high-definition streams of video can be viewed over the Internet.
Encoding is commonly used to reduce the size of audio and video data. A coder-decoder program, called a codec, applies a series of mathematical algorithms that eliminate redundant data. For example, suppose a file contained the data "ABCDQABC." Codec #1's algorithm might be to replace "ABC" with "Z." The resulting file would be "ZDQZ," which is 50 percent smaller than the original file. Codec #2's algorithm might replace "ABC?" with "Y" and "?ABC" with "X," where "?" indicates any character. The resulting file would be "YX," which is 75 percent smaller than the original file.
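The two toy codecs above can be sketched in a few lines of Python. This is only an illustration of the example in the text, not how a real audio or video codec works; the function names encode_codec1, encode_codec2 and percent_smaller are hypothetical.

```python
import re


def encode_codec1(data: str) -> str:
    # Codec #1: replace every "ABC" with the single symbol "Z".
    return data.replace("ABC", "Z")


def encode_codec2(data: str) -> str:
    # Codec #2: scanning left to right, replace "ABC?" with "Y" and
    # "?ABC" with "X", where "?" is any one character. The character
    # matched by "?" is thrown away, which is what makes this codec lossy.
    return re.sub(r"(ABC.)|(.ABC)", lambda m: "Y" if m.group(1) else "X", data)


def percent_smaller(original: str, encoded: str) -> float:
    # Size reduction relative to the original, as a percentage.
    return 100 * (1 - len(encoded) / len(original))


original = "ABCDQABC"
for encode in (encode_codec1, encode_codec2):
    encoded = encode(original)
    print(encoded, percent_smaller(original, encoded))
```

For the sample data, the first codec produces "ZDQZ" (50 percent smaller) and the second produces "YX" (75 percent smaller), matching the figures above.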
Decoding uses the same codec to reconstruct the original file from an encoded file. For example, applying Codec #1 to "ZDQZ" in reverse replaces "Z" with "ABC" to create the original file "ABCDQABC." Codec #1, whose output was 50 percent smaller, is called a lossless codec because decoding always recreates the original file. Codec #2 decodes "YX" to "ABC??ABC," and then tries to guess what the missing characters are. The codec might guess "DE," which results in "ABCDEABC." Codec #2, whose output was 75 percent smaller, is a lossy codec, because the decoding process might create a file that's close to the original, but not identical.
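The difference between lossless and lossy decoding can be sketched as follows. The guessing rule used for Codec #2 (fill each missing character with the letter that follows the previous one) is purely an assumption chosen so that the guess comes out as "DE," as in the example; the text doesn't say how the codec guesses. The sketch also assumes the original data never contains the letter "Z," since otherwise Codec #1 would not be reversible. The function names are hypothetical.

```python
def decode_codec1(encoded: str) -> str:
    # Lossless: every "Z" stands for exactly "ABC", so nothing is guessed.
    # Assumes the original data never contained a literal "Z".
    return encoded.replace("Z", "ABC")


def decode_codec2(encoded: str) -> str:
    # Lossy: "Y" expands to "ABC?" and "X" to "?ABC"; the "?" characters
    # were discarded during encoding and have to be guessed.
    template = encoded.replace("Y", "ABC?").replace("X", "?ABC")
    out = []
    for ch in template:
        if ch == "?":
            # Hypothetical guessing rule: use the next letter after the
            # previous character (or after "A" if there is none).
            previous = out[-1] if out else "A"
            ch = chr(ord(previous) + 1)
        out.append(ch)
    return "".join(out)


original = "ABCDQABC"
print(decode_codec1("ZDQZ") == original)  # lossless round trip
print(decode_codec2("YX"))                # close to the original, not identical
```

With these rules, decoding "ZDQZ" gives back "ABCDQABC" exactly, while decoding "YX" gives "ABCDEABC," which differs from the original in one character.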
URLs and Character Sets
Not all encoding produces a result that's smaller than the unencoded data. For example, the URL "example.com/Secret of Life.html" is invalid because it contains spaces. A Web programmer encodes the URL, which replaces all spaces with "%20" to create "example.com/Secret%20of%20Life.html." The decoding process performs the inverse operation and replaces "%20" with a space. Similarly, a database program might store all text as Unicode, a master set of characters that covers most of the world's languages, using an encoding such as UTF-8. When a user retrieves data, the program decodes the stored bytes back into characters and, if necessary, converts them to the character encoding that matches the user's language settings.
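Both kinds of encoding can be tried with Python's standard library. The sketch below uses urllib.parse.quote and unquote for the URL example, and str.encode and bytes.decode for the character set example. The choice of UTF-8 and the sample word "café" are assumptions made for illustration.

```python
from urllib.parse import quote, unquote

# URL encoding: the result is longer than the input, not shorter.
url = "example.com/Secret of Life.html"
encoded_url = quote(url)            # spaces become "%20"; "/" is left alone
print(encoded_url)                  # example.com/Secret%20of%20Life.html
print(unquote(encoded_url) == url)  # decoding restores the original

# Character encoding: Unicode text is stored as bytes, here using UTF-8.
text = "café"
stored = text.encode("utf-8")       # 5 bytes, because "é" takes 2 bytes
print(len(text), len(stored))
print(stored.decode("utf-8") == text)
```

The encoded URL is six characters longer than the original, and the four-character word occupies five bytes once encoded, which shows that encoding doesn't always shrink data.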
Encoding That Can't Be Decoded
Some encodings are not intended to be decoded. A hash function is a one-way algorithm that converts input of any length into a fixed-length value, called a hash, that can't be converted back into the original input. Hashing is often combined with a salt, a string of random characters that is added to the input before it is hashed; the result can't be decoded, even when you know the salt. For example, a user's password might be salted, hashed and stored in a database along with its salt. If an attacker finds the database, the passwords can't be decoded from it. When a user logs in, the system hashes the password the user enters with the same salt and compares the result to the value stored in the database. A user who forgets a password must change it, because the system doesn't know the original password, only its hashed value.