EnderUNIX Team.


EnderUNIX tips

[ Shell Scripting ]

"Word Histogram" - Mehmet Uluer - (2006-07-04 08:56:20)   [10509]

It is sometimes useful to count how often each word occurs in a file.

If the content of a sample file "tonguetwister.txt" is:

--
How much wood would a woodchuck chuck
If a woodchuck could chuck wood?
He would chuck, he would, as much as he could,
And chuck as much wood as a woodchuck would
If a woodchuck could chuck wood.
--

the command sequence:

--
muluer:~ # cat tonguetwister.txt | tr '[a-z]' '[A-Z]' | tr '[:punct:]' ' ' | tr '\t' ' ' | tr ' ' '\n' | sed '/^$/d' | sort -d | uniq -c | sort -nr
--

produces the following result:
--
5 CHUCK
4 WOULD
4 WOODCHUCK
4 WOOD
4 AS
4 A
3 MUCH
3 HE
3 COULD
2 IF
1 HOW
1 AND
--

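The stages of the pipeline, in order:
1- Convert lower-case letters to upper case,
2- Replace punctuation characters with spaces,
3- Replace tabs with spaces,
4- Replace spaces with newlines, so that each word is on its own line,
5- Delete the blank lines,
6- Sort in dictionary order, so that identical words become adjacent (uniq only merges adjacent lines),
7- Collapse repeated words and count the occurrences of each,
8- Sort numerically by count, in descending order.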
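
The same steps can also be wrapped in a shell function. What follows is only a sketch, not part of the original tip: the name word_histogram is made up for illustration, and it assumes a "word" is any run of letters and digits. It merges steps 2 to 4 into a single tr call that turns every run of non-alphanumeric characters into one newline, reads the file directly instead of going through cat, and lists words with equal counts alphabetically.

```shell
# word_histogram FILE
# Hypothetical helper (the name is not from the original tip).
# Prints each distinct word of FILE in upper case, preceded by its count,
# most frequent first; words with equal counts are listed alphabetically.
# Assumption: a "word" is a run of letters and digits.
word_histogram() {
    tr '[:lower:]' '[:upper:]' < "$1" |   # step 1: upper-case everything
        tr -cs '[:alnum:]' '\n' |         # steps 2-4: one word per line
        sed '/^$/d' |                     # step 5: drop a possible empty line
        sort |                            # step 6: make equal words adjacent
        uniq -c |                         # step 7: count each distinct word
        sort -k1,1nr -k2,2                # step 8: by count (desc), then word
}

# Usage:
#   word_histogram tonguetwister.txt
```

For the sample file this should give the same counts as above; only the order of the words that share a count differs, because the second sort key orders them alphabetically instead of in reverse.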