I’ve noticed some files I opened in a text editor have all kinds of crazy unrenderable chars
can it? Sure, most any arrangement of bits can be converted into some kind of Unicode text. Can it be converted to something meaningful or readable? No, some formats are plain text (.txt, .ini, .json, .html for some random examples) that are meant to be read by humans, and others are binary formats that are only meaningful when decoded by a computer into specific data structures inside a piece of software.
At the end of the day data is just binary, i.e. it’s composed of 0s and 1s. What those 0s and 1s represent is mostly irrelevant to this discussion. The short version is that 01000001 can mean ‘A’, or it can mean that a given pixel is 65/256 red, or that the speaker should vibrate at a specific frequency, etc., etc.
So what happens when you open a file that’s not text in a text editor? Well, some of the 0s and 1s make up gibberish, or characters that are not meant to be printed. Fun fact: you should be able to do this the other way around too, i.e. open a text file as an image, but again it will be gibberish, and most likely it won’t load at all, since images carry lots of information about size, compression, etc., and if that’s incorrect the program won’t know what to do. Text, on the other hand, can always be “valid”, so opening anything as text will always work, although sometimes your editor might show weird things in the places where there’s a non-printable character.
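To make that concrete, here’s a tiny Python sketch (any language would do). The “65/256 red” reading is just an illustrative interpretation, not any particular image format:

```python
# A minimal sketch: the same bit pattern means different things depending on
# how the program reading it chooses to interpret it.
value = 0b01000001           # the bit pattern 01000001, i.e. 65

print(chr(value))            # as a character code        -> 'A'
print(value)                 # as a plain integer         -> 65
print(value / 256)           # as e.g. a red intensity    -> 0.25390625
print(bytes([value]))        # as one raw byte            -> b'A'
```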
Yes, see Binary-to-text encoding (e.g., Base64).
Can you comment on the specific makeup of a “rendered” audio file in plain text? How is the computer representing every little bit of sound at any given point, the polyphony, etc.?
What are the conventions of such a representation? How can a spectrogram tell which pitches are where, and how is the computer representing that?
Is viewing it as plain text the same as analysing it with a hex viewer?
There are two things at play here.
Audio formats like MP3 (or WAV, OGG, FLAC, etc.) provide a way to encode polyphony, stereo and such into a sequence of bytes.
And then separately, there’s Unicode (or ASCII) for encoding letters into bytes. These are just big tables which say e.g.:
- 01000001 = uppercase ‘A’
- 01000010 = uppercase ‘B’
- 01100001 = lowercase ‘a’
So, what your text editor does is look at the sequence of bytes that the MP3 encoder produced, look each one up in that table, and somewhat erroneously interpret them as individual letters.
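Here’s a rough Python sketch of both halves, assuming plain uncompressed PCM samples of the kind WAV stores (MP3 itself is far more involved, since it’s compressed): sound is kept as a long list of sample numbers, the format packs them into bytes, and a text editor then misreads those bytes through its character table.

```python
import math
import struct

# Uncompressed audio is just a long list of sample values, e.g. one signed
# 16-bit number per channel, 44100 times per second. Stereo would simply
# interleave two such streams (left, right, left, right, ...).
sample_rate = 44100
samples = [
    int(20000 * math.sin(2 * math.pi * 440 * t / sample_rate))  # a 440 Hz tone
    for t in range(100)
]

# The audio format's job is to turn those numbers into a sequence of bytes.
raw_bytes = struct.pack(f"<{len(samples)}h", *samples)  # little-endian 16-bit

# A text editor doesn't know (or care) that these bytes are samples; it just
# looks each byte up in its character table, exactly as described above.
as_text = raw_bytes.decode("latin-1")
print(repr(as_text[:40]))    # mostly gibberish and control characters
print(chr(0b01000001))       # the table lookup itself: 65 -> 'A'
```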
Most binary-to-text encodings don’t attempt to make the text human-readable—they’re just intended to transmit the data over a text-only medium to a recipient who will decode it back to the original binary format.
I do understand I’m not able to read it myself; I’m more curious about the architecture of how that data is represented and stored, and conceptually how such a representation is practically organized/reified…
The original binary format is split into six-bit chunks (e.g., 100101), which in decimal format correspond to the integers from 0 to 63. These are just mapped to letters in order:
- 000000 = A,
- 000001 = B,
- 000010 = C,
- 000011 = D,
etc.—it goes through the capital letters first, then lower-case letters, then digits, then “+” and “/”. It’s so simple you could do it by hand from the above description, if you were looking at the data in binary format.
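For the curious, here’s a small hand-rolled Python sketch of exactly that mapping (standard Base64 alphabet; the “=” padding rule for inputs that aren’t a multiple of three bytes is left out for simplicity), checked against the standard library’s base64 module:

```python
import base64

# The 64-symbol alphabet described above: capitals, lower-case, digits, '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def to_base64(data: bytes) -> str:
    # Write all the bytes out as one long bit string...
    bits = "".join(f"{byte:08b}" for byte in data)
    # ...pad it to a multiple of 6, then read it back 6 bits at a time,
    # using each 6-bit value (0..63) as an index into the alphabet.
    bits += "0" * (-len(bits) % 6)
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))

data = b"Man"
print(to_base64(data))                          # TWFu
print(base64.b64encode(data).decode("ascii"))   # TWFu (the library agrees)
```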
Are those binary files by any chance?
I just mean like any file (pdf, jpeg, mp4, mp3, exe), mp4/mp3 most famously for me.
I find it so damn cool and incredible that I can record something/anything right now, open the audio in a text editor, and it’s all right there, albeit in an incomprehensible format, but there all the same.
It’s like a thinking rock etching sound into stone.