nlp - Changing every non-letter character to \n in a file using Unix utilities


I am watching a tutorial on using Unix utilities. The presenter is on a Mac, but I have a Windows laptop, so I downloaded the GnuWin32 package. In one part I want to replace every non-letter character in a file with a newline ("\n").

The command line in the tutorial is:

tr -sc 'A-Za-z' '\n' < filename.txt | less

It worked for him, but when I tried it, it put a single-quote character (') after each character:

's'h'a'k'e's'p'e'a'r'e't'h'e't'e'm'p'e's't'f'r'o'm'o'n'l'i'n'e'l'i'b'r'a'r'y'o'f'l'i'b'e'r't'y'h't't'p'o'l'l'l'i'b'e'r't'y'f'u'n'd'o'r'g' 

Then I tried:

tr -sc "A-Za-z" "\n" < filename.txt | less

It added a newline after each character:

n e l b r 

I also tried removing the complement option and adding ^ to the set, as in a regex:

tr "[^A-Za-z]" "\n" < filename.txt | less

The result was that every letter was replaced with a newline.

My questions: do the command-line options of the GnuWin32 Unix utilities differ from those of other versions? Does putting the set between single quotes ('a-z') differ from using double quotes ("a-z")? And what is the best way to replace every non-letter character with a newline, other than the failed attempts above?

The source of the text I am trying this on

I tested some examples with tr --version (GNU coreutils) 8.5:

1) Using single or double quotes makes no difference.
2) It looks like there is no way to negate characters using ^.

When you write [^A-Za-z], these chars are treated literally:

echo "abc abd [hh] d^o 1976" | tr '[^A-Za-z]' '.'

or with double quotes:

echo "abc abd [hh] d^o 1976" | tr "[^A-Za-z]" '.'

produces the following output:

... ... .... ... 1976 

which shows that the alphabetic chars, the caret, and the square brackets have all been treated literally and replaced.
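Point 1 can be checked the same way with the command from the question. This is only a minimal sketch and it assumes a POSIX-style shell such as bash or sh; other command interpreters may handle quote characters differently. The variable names are just for illustration.

```shell
# Run the same tr command with single-quoted and double-quoted arguments.
# In a POSIX shell both forms hand tr the same two strings: A-Za-z and \n
single=$(echo "abc abd [hh] d^o 1976" | tr -sc 'A-Za-z' '\n')
double=$(echo "abc abd [hh] d^o 1976" | tr -sc "A-Za-z" "\n")

# Compare the two results.
if [ "$single" = "$double" ]; then
    echo "same output"
else
    echo "different output"
fi
```

Under that assumption the comparison reports the same output for both quoting styles.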

So, to split on non-alphabetic chars, you have to use -c with the range 'A-Za-z', as you did in your first example.
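As a rough sketch of that conclusion, the command can be wrapped in a small function and tried on the same sample string used in the tests. The name split_words is hypothetical, and the sketch assumes GNU tr in a POSIX-style shell.

```shell
# Hypothetical helper: reads text on stdin, writes one word per line.
split_words() {
    # -c: use the complement of the set, i.e. every character that is not a letter
    # -s: squeeze each run of the resulting newlines into a single newline
    tr -sc 'A-Za-z' '\n'
}

# Same sample input as in the tests.
printf 'abc abd [hh] d^o 1976\n' | split_words
```

On this input the sketch prints abc, abd, hh, d and o, each on its own line. The brackets, the caret, the spaces and the digits all fall in the complement of the set, so they become newlines. For a file, the same function could be used as split_words < filename.txt | less.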

