Convert MAC EOL to Unix


I have been bitten by this one several times. In the classic Mac format, a newline is stored as a single Carriage Return character (ASCII 13, or 0x0D in hexadecimal), while UNIX uses a single Line Feed character (ASCII 10, or 0x0A in hexadecimal). MS-DOS compatible systems store a newline as both characters (0x0D followed by 0x0A).
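To see which convention a given file uses, it can help to look at the raw bytes. The snippet below is only a small sketch, assuming a POSIX shell with od available; show_bytes is a hypothetical helper name chosen for illustration.

```shell
# show_bytes is a hypothetical helper name used only in this sketch.
# It prints the bytes read from stdin as space-separated hexadecimal values.
show_bytes() {
    od -An -tx1
}

printf 'one\r'   | show_bytes   # classic Mac style: the line ends with 0d
printf 'one\n'   | show_bytes   # UNIX style: the line ends with 0a
printf 'one\r\n' | show_bytes   # DOS style: the line ends with 0d 0a
```

Running it on a real file would look like show_bytes < somefile.txt, where somefile.txt is a placeholder name.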

This is a pain when you have to process text files that might have been produced by people working in different environments, because an unexpected newline encoding will break your code if it is not prepared for it.

We are going to see how to translate any of those formats into classic UNIX EOL so we can process text files with the same code without having to worry about the newline format.

For this we are going to use the Unix tr command (short for "translate") to transform our files in two steps.
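A minimal sketch of such a pipeline, assuming a POSIX-compatible shell and tr. The function name convert_eol and the file names are placeholders chosen for illustration, not part of the original script:

```shell
# convert_eol is a hypothetical name chosen for this sketch.
# It reads text on stdin and writes it to stdout with UNIX (LF) line endings.
convert_eol() {
    # Step 1: translate every carriage return (\r) into a line feed (\n).
    # Step 2: squeeze each run of repeated \n into a single \n.
    tr '\r' '\n' | tr -s '\n'
}

# Example usage (the file names are placeholders):
# convert_eol < mac_or_dos_file.txt > unix_file.txt
```

Writing it as a filter on stdin and stdout keeps the original file untouched, which matters because redirecting the output onto the input file would truncate it before tr reads it.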

The first part of the command uses tr to translate every carriage return character (\r) into a line feed character (\n). Mac files end up with \n instead of \r, and DOS files end up with \n\n instead of \r\n.

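The second part of the command squeezes repeated \n occurrences, so every \n\n collapses into a single \n (which is the EOL on standard UNIX systems). Keep in mind that this also collapses intentionally empty lines, since they are runs of \n as well.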