NLP - Changing every non-letter character to \n in a file using Unix utilities
I was watching a tutorial on Unix utilities. The instructor was using a Mac, but I have a Windows laptop, so I downloaded the GnuWin32 package. Then came the part where we want to replace every non-letter character in a file with a newline ("\n").
The command line in the tutorial is:
tr -sc 'A-Za-z' '\n' < filename.txt | less
It worked for him, but when I tried it, it put a single-quote character (') after each character:
's'h'a'k'e's'p'e'a'r'e't'h'e't'e'm'p'e's't'f'r'o'm'o'n'l'i'n'e'l'i'b'r'a'r'y'o'f'l'i'b'e'r't'y'h't't'p'o'l'l'l'i'b'e'r't'y'f'u'n'd'o'r'g'
I then tried:
tr -sc "A-Za-z" "\n" < filename.txt | less
That added a newline after each character:
n
e
l
b
r
I also tried removing the complement option and adding ^ in the regex instead:
tr "[^A-Za-z]" "\n" < filename.txt | less
The result replaced every letter with a newline.
My questions: do the command-line options of the GnuWin32 Unix utilities differ from the others? Does putting the regex between single quotes ('a-z') differ from putting it between double quotes ("a-z")? And what is the best way to replace every non-letter character with a newline, given that the attempts above failed?
I tested your examples with tr (GNU coreutils) 8.5, as reported by tr --version:
1) Using single or double quotes makes no difference.
2) It looks like there is no way to negate a set of characters using ^.
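Point 1 holds in a POSIX shell such as bash, where the shell removes the quotes before tr ever runs. On Windows, cmd.exe gives no special meaning to single quotes, so one plausible explanation (an assumption, not verified against GnuWin32) is that tr received the ' characters literally, as part of both sets, which would explain the stray ' in the first output in the question. The sketch below imitates that by putting the quotes inside the arguments; it is a simulation in a POSIX shell with GNU tr, not a run of GnuWin32.

```shell
#!/bin/sh
# In a POSIX shell, single and double quotes are both stripped before tr runs:
# both lines print [A-Za-z].
printf '[%s]\n' 'A-Za-z' "A-Za-z"

# Simulation (assumption): pass the quote characters through literally,
# as a shell that does not treat ' as a quote character would.
# SET1 is now ' plus A-Z and a-z; SET2 is the three characters ' \n '.
# With -c, the complement of SET1 is taken in ascending byte order and
# SET2 is padded with its last character ('), so nearly every non-letter,
# including space, comma and newline, becomes a single quote.
printf '%s\n' 'Hi, you' | tr -sc "'A-Za-z'" "'\n'"
# -> Hi'you'
```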
When you write [^A-Za-z], these characters are treated literally:
echo "abc abd [hh] d^o 1976" | tr '[^A-Za-z]' '.'
or, with double quotes:
echo "abc abd [hh] d^o 1976" | tr "[^A-Za-z]" '.'
produces the following output:
... ... .... ... 1976
This shows that the alphabetic characters, the caret, and the square brackets have all been treated literally and replaced.
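The same literal reading explains the third attempt in the question, where every letter turned into a newline. A minimal sketch, assuming GNU tr in a POSIX shell: '[^A-Za-z]' is just the characters [, ^, ] plus the ranges A-Z and a-z, and because SET2 is shorter, GNU tr extends it by repeating its last character, so all of them map to the replacement. '.' is used here instead of '\n' so the result is visible.

```shell
#!/bin/sh
# '[^A-Za-z]' is NOT a negated class for tr: it is the literal characters
# '[', '^', ']' plus the ranges A-Z and a-z. SET2 ('.') is shorter, so GNU tr
# pads it by repeating its last character, mapping all of them to '.'.
printf '%s\n' 'ab [1]' | tr '[^A-Za-z]' '.'
# -> .. .1.

# -c complements SET1 instead, so only the non-letters are replaced
# (the trailing newline too, which is why the output has no line break).
printf '%s\n' 'ab [1]' | tr -c 'A-Za-z' '.'
# -> ab.....
```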
This leads to the conclusion that to split on non-alphabetic characters you have to use -c with the range 'A-Za-z', as you did in your first example.
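A minimal sketch of why the tutorial's command needs both flags, assuming GNU tr in a POSIX shell: -c complements SET1 so it means "every non-letter", and -s squeezes each run of repeated newlines (the characters of SET2) into one. The sample string is the one from the echo test above.

```shell
#!/bin/sh
# Sample line reused from the echo test in this answer.
text='abc abd [hh] d^o 1976'

# -c alone: each non-letter becomes its own newline, so runs of
# spaces, brackets and digits leave empty lines behind.
printf '%s\n' "$text" | tr -c 'A-Za-z' '\n'

echo '---'

# -s also squeezes each run of newlines into one: one word per line.
printf '%s\n' "$text" | tr -sc 'A-Za-z' '\n'
```

Applied to a file, the second form is exactly the tutorial's command: tr -sc 'A-Za-z' '\n' < filename.txt | less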