nlp - Changing every non-letter character to \n in a file using Unix utilities
I am watching a tutorial on Unix utilities. The presenter was using a Mac; I have a Windows laptop, so I downloaded the GnuWin32 package, which includes these utilities. I want to replace every non-letter character in a file with a newline ("\n").
The command line in the tutorial was:
tr -sc 'A-Za-z' '\n' < filename.txt | less
It worked for him, but when I tried it, it put a single-quote character (') after each character:
's'h'a'k'e's'p'e'a'r'e't'h'e't'e'm'p'e's't'f'r'o'm'o'n'l'i'n'e'l'i'b'r'a'r'y'o'f'l'i'b'e'r't'y'h't't'p'o'l'l'l'i'b'e'r't'y'f'u'n'd'o'r'g'
Then I tried:
tr -sc "A-Za-z" "\n" < filename.txt | less
It added a new line after each character:
n e l b r
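Then I tried removing the complement option and adding ^ to the set, as in a regex:
tr "[^A-Za-z]" "\n" < filename.txt | less
The result was that every letter was replaced with a newline.
My questions: do the command-line options of the Unix utilities in GnuWin32 differ from those of other versions? Does putting the set between single quotes ('a-z') differ from putting it between double quotes ("a-z")? And, if possible, what is the best way to replace every non-letter character with a newline, other than the failed attempts above?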
I tested the examples with tr (GNU coreutils) 8.5, the version reported by tr --version:
1) Using single or double quotes makes no difference.
2) It looks like there is no way to negate characters using ^.
When you write [^A-Za-z], these characters are treated literally:
echo "abc abd [hh] d^o 1976" | tr '[^A-Za-z]' '.'
or, with double quotes:
echo "abc abd [hh] d^o 1976" | tr "[^A-Za-z]" '.'
produces the following output:
... ... .... ... 1976
which shows that the alphabetic characters, the caret, and the square brackets have all been treated literally and replaced.
This leads to the conclusion that, to split on non-alphabetic characters, you have to use -c with the range 'A-Za-z', as you did in your first example.
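As a quick check of that conclusion, here is a minimal sketch that runs the -c form on the same sample string used above. It assumes a POSIX-style shell and GNU tr; the function names split_words and split_words_class are made up for this illustration, and the [:alpha:] variant is an alternative spelling of the set that the question did not use.

```shell
#!/bin/sh
# Hypothetical helper: put each run of letters on its own line.
# -c complements the set (everything that is NOT A-Z or a-z),
# -s squeezes each run of resulting newlines into a single one.
split_words() {
    tr -sc 'A-Za-z' '\n'
}

# Same idea using the POSIX character class instead of explicit ranges.
split_words_class() {
    tr -sc '[:alpha:]' '\n'
}

printf '%s\n' 'abc abd [hh] d^o 1976' | split_words
# prints:
# abc
# abd
# hh
# d
# o
```

Without -s, every non-letter would produce its own newline, so runs such as " [" or " 1976" would leave blank lines in the output.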