Thursday, October 8, 2009

Shell|Text Conversion/Filter Tools

GNU/Linux Command-Line Tools Summary - Text Conversion/Filter Tools

11.5. Text Conversion/Filter Tools

  • Filters (UNIX system/DOS formats)

    The following filters allow you to change text from DOS style to UNIX system style and vice versa, or to convert a file to other formats. Note that many modern text editors can also do this for you.

    • Why use filters?

      Because UNIX systems and Microsoft Windows use two different conventions to mark the end of a line in an ASCII text file.

      This can sometimes cause problems in editors or viewers that do not recognize the other operating system's end-of-line style. The following tools let you work around this difference.


    • What's the difference?

      The difference is simple. In a Windows text file, the end of a line is signalled by a carriage return followed by a newline (line feed), '\r\n' in ASCII.

      On a UNIX system, the end of a line is a single newline, '\n' in ASCII.
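
To make the byte-level difference visible, here is a minimal sketch that writes the same line in both styles and dumps the bytes with od. The file names windows.txt and unix.txt are hypothetical and exist only for this illustration.

```shell
# Create two small sample files in a temporary directory (hypothetical names).
tmpdir=$(mktemp -d)
printf 'hello\r\n' > "$tmpdir/windows.txt"   # Windows style: CR followed by LF
printf 'hello\n'   > "$tmpdir/unix.txt"      # UNIX style: LF only

# od -c prints each byte; the carriage return shows up as \r.
od -c "$tmpdir/windows.txt"
od -c "$tmpdir/unix.txt"

# The Windows file carries one extra byte per line.
windows_size=$(wc -c < "$tmpdir/windows.txt" | tr -d ' ')
unix_size=$(wc -c < "$tmpdir/unix.txt" | tr -d ' ')
echo "windows: $windows_size bytes, unix: $unix_size bytes"
```

The extra byte per line is the carriage return that the tools below add or remove.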



  • dos2unix

    This converts Microsoft-style end-of-line characters to UNIX system style end-of-line characters.

    Simply type:


       

    dos2unix file.txt
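
Before converting, it can help to check whether a file actually contains Windows line endings. The following is a sketch only: the sample file is created just for the demonstration, and the dos2unix call is guarded because the tool is not installed on every system.

```shell
# Sample file with Windows line endings (temporary, for illustration only).
sample=$(mktemp)
printf 'one\r\ntwo\r\n' > "$sample"

# Count the lines that end in a carriage return.
cr=$(printf '\r')
crlf_before=$(grep -c "$cr\$" "$sample" || true)

# dos2unix may be missing, so test for it before calling it.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix "$sample"                      # converts the file in place
    crlf_after=$(grep -c "$cr\$" "$sample" || true)
    echo "CRLF lines before: $crlf_before, after: $crlf_after"
else
    echo "dos2unix not found; $crlf_before lines still end in CRLF"
fi
```

Because dos2unix rewrites the file in place, keep a copy if you still need the original.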


  • fromdos

    This does the same as dos2unix (above).

    Simply type:


       

    fromdos file.txt

    fromdos can be obtained from the from/to dos website.


  • unix2dos

    This converts UNIX system style end-of-line characters to Microsoft-style end-of-line characters.

    Simply type:


       

    unix2dos file.txt
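
The sketch below converts a copy of a file instead of changing it in place. It assumes that unix2dos acts as a filter when it is given no file name. If unix2dos is not installed, it falls back to awk, which is a substitute technique and not part of unix2dos.

```shell
# Temporary source and destination files, for illustration only.
src=$(mktemp)
dst=$(mktemp)
printf 'one\ntwo\n' > "$src"

if command -v unix2dos >/dev/null 2>&1; then
    # Assumption: with no file argument, unix2dos reads stdin and writes stdout.
    unix2dos < "$src" > "$dst"
else
    # Fallback (not unix2dos itself): end every line with CR LF.
    awk '{ printf "%s\r\n", $0 }' "$src" > "$dst"
fi

# Each of the two lines gains one carriage return.
dst_size=$(wc -c < "$dst" | tr -d ' ')
echo "converted size: $dst_size bytes"
```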


  • todos

    This does the same as unix2dos (above).

    Simply type:


       

    todos file.txt

    todos can be obtained from the from/to dos website.


  • antiword

    This filter converts Microsoft Word documents into plain ASCII text documents.

    Simply type:


       

    antiword file.doc

    You can get antiword from the antiword homepage.


  • recode

    Converts text files between various formats, including HTML and dozens of text encodings.

    Use recode -l for a full listing. It can also convert text between the Windows and UNIX system formats (so you don't get stray symbols at the ends of lines).


    Warning: By default recode overwrites the input file. To use recode as a filter only, without overwriting the file, redirect the input with '<'.

    • Examples:


    UNIX system text to Windows text:


       

    recode ..pc file_name

    Windows text to UNIX system text:


       

    recode pc.. file_name

    UNIX system text to Windows text without overwriting the original file (and creating a new output file):


       

    recode ..pc < file_name > recoded_file
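
As a quick check that the filter form leaves the input alone, this sketch compares the size of the original file before and after the conversion. The temporary files are hypothetical stand-ins for file_name and recoded_file, and the recode call is guarded because the tool may not be installed.

```shell
# Temporary stand-ins for file_name and recoded_file.
original=$(mktemp)
converted=$(mktemp)
printf 'one\ntwo\n' > "$original"
size_before=$(wc -c < "$original" | tr -d ' ')

if command -v recode >/dev/null 2>&1; then
    # Filter mode: input comes from stdin, output goes to a new file.
    recode ..pc < "$original" > "$converted" || echo "recode reported an error"
else
    echo "recode not found; skipping the conversion"
fi

# The original file is untouched in filter mode.
size_after=$(wc -c < "$original" | tr -d ' ')
echo "original size before: $size_before, after: $size_after"
```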


  • tr

    (Windows to UNIX system style conversion only). While tr is not specifically designed for this, you can convert files from Windows format to UNIX system format by doing:


       

    tr -d '\r' < inputFile.txt > outputFile.txt

    The -d switch tells tr to delete every occurrence of the listed characters, in this case the carriage return '\r'.
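
The tr command above can be verified with a small, self-contained sketch. The files are created in a temporary directory and reuse the names inputFile.txt and outputFile.txt only for illustration.

```shell
# Build a sample input file with Windows line endings.
workdir=$(mktemp -d)
printf 'alpha\r\nbeta\r\n' > "$workdir/inputFile.txt"

# Delete every carriage return; tr reads stdin and writes stdout.
tr -d '\r' < "$workdir/inputFile.txt" > "$workdir/outputFile.txt"

# Count the carriage returns left in each file:
# tr -cd '\r' keeps only the carriage returns, wc -c counts them.
cr_in=$(tr -cd '\r' < "$workdir/inputFile.txt" | wc -c | tr -d ' ')
cr_out=$(tr -cd '\r' < "$workdir/outputFile.txt" | wc -c | tr -d ' ')
echo "carriage returns: input=$cr_in output=$cr_out"
```

Note that tr removes every carriage return in the file, including any that are not part of a line ending.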
