The Invisible Character

It’s Stan. Peter Pan’s long long, invisible brother. Truly the longest. He doesn’t do much. Some say he doesn’t even exist. He plays the flute, or whatever, and it is beautiful. All the town mice follow him around. He flies, temporarily, if thrown off a cliff. Buzz Lightyear is his spirit animal.

But nobody cares about Stan. Stan is not the character in today’s story. But today’s character is invisible.

This character isn’t hidden in the pages of a book, though. It is hidden in your keyboard. Sort of kind of but not really.

Let’s say we have a text file:

How many characters (letters/symbols) is this?

62.

If you are lazy and didn’t count, that probably doesn’t seem particularly impressive. But if you did, you think I am crazy. The text file contains 51 letters and 2 periods. That’s only 53 characters, right?

Well, no. But the thing you have to keep in mind is that characters aren’t just letters. Characters are basically every symbol that you can type out – even ones you cannot see.

Here is a full list of characters:

This is known as an ASCII table. The exact reason this exists doesn’t matter that much; the point is that the characters listed here are the things that can exist in a plain ASCII text file.

So, when you step on a screw (because you aren’t the sharpest cheese in the… uh, cheese shed), you say !@)(&$@#(%. Those are all characters.

Look at number 32. See that word there? SPACE. That is that little empty space that goes into your text when you press the big bar on your keyboard, conveniently named the spacebar. Even though it is invisible, it is still a character. These types of characters are called whitespace, because… well, they take up space, and usually text is black on a white background, hence they use white space. Look ma, etymology!

Note that other characters you use, such as tab, are also whitespace.
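If you want to check this sort of thing yourself, here is a minimal Python sketch. The sample sentence is made up for illustration (it is not the text file from above). The only point is that len() counts spaces just like letters, and that ord() reports the number a character has in the ASCII table.

```python
# Hypothetical sample sentence, not the text file from the post.
sample = "Stan plays the flute."

letters = sum(ch.isalpha() for ch in sample)  # visible letters only
spaces = sample.count(" ")                    # invisible, but still characters
periods = sample.count(".")

print(len(sample))                 # 21: letters + spaces + periods
print(letters, spaces, periods)    # 17 3 1
print(ord(" "), ord("\t"))         # 32 9: ASCII numbers for space and tab
print(" ".isspace(), "\t".isspace())  # True True: both count as whitespace
```

So the total is 17 + 3 + 1 = 21, and three of those characters are ones you cannot see.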

Let’s revisit that text file again.

How many characters are there?

The answer is still 62, in case you forgot. But if you count all of the letters, spaces, and periods, you will find they total 60. I mean, I know they changed math and all, but this doesn’t seem right still. What isn’t being counted?

As it turns out, there is an invisible character in this file. It is the Enter key. It is also often called the Return key.

Or at least, that is what you press on the keyboard to produce it. The actual character this produces, ASCII #10, is called a newline. Because, well, you press it, and your text entry goes onto a new line. Sometimes words make sense, you know.

Lines in a text file don’t really seem like characters. But at the end of the day, the computer must store the text file in a specific format, with a specific encoding. For text files like this one, that encoding is ASCII, which means everything in the file must be representable using characters from the ASCII table.

So, when you press enter, a newline character is stored inside of that file. That adds to our count, giving us 62 instead of 60.

And just to be clear here: a newline character goes between lines. There is not one newline character per line; there is one at the end of each line that is followed by another line. So there is 1, and only 1, newline character in this text file.
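Here is a rough sketch of that in Python, using a made-up two-line string rather than the file above. In a Python string, "\n" stands for a single newline character (ASCII #10), so it sits between the two lines and shows up in the count exactly once.

```python
# Hypothetical two-line text; "\n" is one newline character between the lines.
text = "First line.\nSecond line."

visible = text.replace("\n", "")   # same text with the newline stripped out

print(len(visible))            # 23: letters, spaces, and periods
print(len(text))               # 24: the same, plus one newline
print(text.count("\n"))        # 1: two lines, one newline between them
print(len(text.splitlines()))  # 2
print(repr(text))              # repr() makes the invisible character visible
```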

But wait, you say. Why does adding one newline character increase the character count by 2?

History Time

Computers have been around for a while. And they haven’t always been the same, either.

For example, look at characters 0-31 in the ASCII table. Most of these characters are not printable characters – they aren’t even whitespace. These are called control characters. These were created back when computers were room-sized and you interacted with them using a terminal.

Many of them are no longer used, or are used in obscure places, possibly to mean something different entirely. The newline character also has a heritage.

There are actually two newline characters in ASCII. One is ASCII #10, which is called a line feed (LF). The other, ASCII #13, is a carriage return (CR). These terms come from typewriters.
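If you don’t have an ASCII table handy, Python can look these numbers up for you. This is just a small sketch; the names LF and CR are my own labels, not anything built into the language.

```python
# LF and CR are illustrative names for the two characters discussed above.
LF = "\n"  # line feed
CR = "\r"  # carriage return

print(ord(LF), ord(CR))              # 10 13: their ASCII numbers
print(repr(chr(10)), repr(chr(13)))  # '\n' '\r': going the other way

# Characters 0-31 are control characters; none of them are printable.
print(all(not chr(i).isprintable() for i in range(32)))  # True
```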

On a typewriter, moving to a new line involves two different physical actions. First comes the line feed, which moves the paper up by the height of one line so the “cursor” is over blank space. Second is the carriage return, which moves the “cursor” from the right side of the paper back to the left side.

So, with typewriters, those were two separate actions the operator had to take to move to a new line, and both were given a place in the ASCII table.

Now, with modern computers, the distinction between these two characters no longer matters. There isn’t really a use case for a line feed without a carriage return, or a carriage return without a line feed. But, history being history, we are stuck with them anyway.

But as it turns out, different operating systems denote newlines in text files differently. In Windows, a newline is represented by a CR followed by an LF. In macOS and Linux, it is represented by an LF alone.

This is why one newline in Windows is actually two characters, turning our previous count of 60 into 62.
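The same arithmetic can be sketched in Python. The two lines below are made up (they are not the file from earlier), and I am assuming the file is stored as ASCII, so one character is one byte.

```python
# Hypothetical lines of text, joined with each style of line ending.
lines = ["First line.", "Second line."]

unix_bytes = "\n".join(lines).encode("ascii")       # LF, as on Linux and macOS
windows_bytes = "\r\n".join(lines).encode("ascii")  # CR LF, as on Windows

print(len(unix_bytes))     # 24: 23 visible characters + LF
print(len(windows_bytes))  # 25: 23 visible characters + CR + LF
print(windows_bytes)       # b'First line.\r\nSecond line.'
```

Same text, same single line break, but the Windows version is one character longer for every newline in the file.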

Is this useful information?

Um… maybe. It depends what you are doing. I think it is certainly interesting, but whether or not it will ever affect you is hard to say. Presumably if you didn’t already know this then the answer is no. But information not being useful has never stopped me, so let’s delve into it a bit.

The first thing to note is that this distinction between newlines only applies to plain text. Most text you read or write is not plain text, though – it is usually some form of rich text. If you write a document using a word processor, you are using rich text. This blog post that I am writing right now is also rich text.

Basically, if you cannot open it using Notepad, it isn’t plain text. ’Nuff said.

What is still plain text, these days? Well, off the top of my head: config files, program source code, and some network protocols like FTP and HTTP. I deal with all of these on a regular basis, so this distinction can be useful. Let’s just say that if you ever move a text file from Linux to Windows, you are going to have a hard time reading it. The file uses LF and not CR LF, so Notepad doesn’t think those are newlines and doesn’t display them as line breaks.
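If you ever need to check which flavor a file uses, or convert one to the other, something like the following would do it. This is only a sketch: detect_line_ending and to_crlf are hypothetical helper names I made up, and the sketch assumes you have already read the file’s contents as bytes.

```python
def detect_line_ending(data: bytes) -> str:
    """Guess the line-ending style of some file contents (hypothetical helper)."""
    if b"\r\n" in data:
        return "CRLF"   # Windows style
    if b"\n" in data:
        return "LF"     # Linux and macOS style
    if b"\r" in data:
        return "CR"     # a lone carriage return
    return "none"       # single line, no newline at all


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CR LF (hypothetical helper)."""
    # Collapse any existing CR LF to LF first so it is not doubled into CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


linux_file = b"First line.\nSecond line."
print(detect_line_ending(linux_file))           # LF
print(to_crlf(linux_file))                      # b'First line.\r\nSecond line.'
print(detect_line_ending(to_crlf(linux_file)))  # CRLF
```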

That being said, most modern tools and text editors are capable of displaying and saving in both formats, so that problem is mostly historical. Even Notepad can do it now, as of 2018, which feels like the first time Notepad has been updated in over a decade.
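Programming languages tend to smooth this over too. As one example, Python’s built-in open() takes a newline parameter. The sketch below writes a made-up two-line text with Windows-style endings into a temporary file (the file name sample.txt is arbitrary), then reads it back both as raw bytes and as text.

```python
import os
import tempfile

text = "First line.\nSecond line."  # hypothetical sample text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # newline="\r\n" tells Python to write every "\n" as CR LF.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write(text)

    # Binary mode shows exactly what was stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Default text mode translates CR LF back to "\n" when reading.
    with open(path, "r", encoding="ascii") as f:
        decoded = f.read()

print(raw)              # b'First line.\r\nSecond line.'
print(len(raw))         # 25
print(decoded == text)  # True
```

So the file on disk has the extra CR, but the program reading it in text mode never sees it.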

For realsies though, don’t use Notepad anymore; it is 2019.

 

Jacob Clarity

 
