Document Processing
Text conversion
What's the problem?

A text file is made up of characters. You know that, of course; bear with me for a moment. Some of those characters are printable (letters, numbers, punctuation, and symbols); others are control characters. Internally, the computer sees everything as a number, stored as a sequence of 1s and 0s. So there's one number that represents the letter K (75 in ASCII), and another number (10 in ASCII) for the end of a line, called, appropriately, the newline character.

So when your word processor or text editor sees a newline character, it ends one line and starts a new one, right? Well, not exactly. Unfortunately (and I don't know why, so don't ask), UNIX, DOS and MS-Windows (hereafter lumped together as DOS), and classic MacOS use different control characters to mark the end of a line. UNIX uses just the newline character (sometimes written as LF, \n, or ^J); MacOS uses just the carriage return character (CR, \r, or ^M); and DOS uses both (CRLF, \r\n, or ^M^J).

So, unless your software is smart enough to deal with the differences, a UNIX or Mac text file will appear as one very long line on a DOS system. A DOS text file displayed on UNIX or a Mac will have the correct line breaks, but every line will also carry an extra control character: at the end of each line on UNIX, and at the start of the following line on a Mac. If your software displays a symbol for that character, the text is annoying to read. More importantly, the extra character may cause errors in automated text processing.
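To make the three conventions concrete, here is a minimal Python 3 sketch that works on raw bytes. It counts CRLF pairs, bare LFs, and bare CRs to guess which convention a file uses, then rewrites the line endings to a chosen one. The function names detect_eol and convert_eol and the labels "unix", "dos", and "mac" are made up for this illustration; they are not part of any standard library or tool.

```python
# A minimal sketch (hypothetical helpers, not a library API): detect and convert
# end-of-line conventions in raw bytes.

CR, LF = b"\r", b"\n"  # carriage return (13, ^M) and newline / line feed (10, ^J)

# Line terminator for each convention; the labels are illustrative only.
EOL = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def detect_eol(data: bytes) -> str:
    """Return 'dos', 'unix', 'mac', 'mixed', or 'none' for the endings in data."""
    crlf = data.count(b"\r\n")
    lf = data.count(LF) - crlf  # LFs that are not the second half of a CRLF
    cr = data.count(CR) - crlf  # CRs that are not the first half of a CRLF
    kinds = [name for name, n in (("dos", crlf), ("unix", lf), ("mac", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def convert_eol(data: bytes, target: str) -> bytes:
    """Rewrite every CRLF, bare CR, and bare LF in data as the target ending."""
    # Normalize to LF first. CRLF must be replaced before bare CR;
    # otherwise each CRLF would turn into two line breaks.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", EOL[target])


if __name__ == "__main__":
    print(ord("K"), ord("\n"), ord("\r"))  # 75 10 13
    sample = b"line one\r\nline two\r\n"   # a DOS-style text
    print(detect_eol(sample))              # dos
    print(convert_eol(sample, "unix"))     # b'line one\nline two\n'
```

When applying this to a real file, read and write it in binary mode (open(path, "rb") and open(path, "wb")). In text mode, Python 3 translates \r, \n, and \r\n to \n on input by default, which would hide exactly the differences you are trying to see.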
Matt Gushee
Last modified: Sun Oct 17 11:08:43 EDT 1999