CS246-F20-01-UnixShell
Lecture 1.7
• UNIX pipes and building a pipeline
UNIX pipes
• Pipes (symbol: “|”, or vertical bar) allow us to use the output
of one program (via stdout) as the input to another (via
stdin)
e.g., How many words occur in the first 20 lines of foo.txt?
head -n N file prints the first N lines of file
wc counts lines, words, and characters
wc -w counts just the words
Two easy solutions:
$ head -n 20 foo.txt | wc -w
$ cat foo.txt | head -n 20 | wc -w
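To try the idea without a real foo.txt, here is a minimal sketch that builds a small sample file first. The file contents and the variable names tmpdir and count are assumptions made for illustration, and only the first 2 lines are counted so the result is easy to check by hand.

```shell
# Build a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'one two three\nfour five\nsix\n' > "$tmpdir/foo.txt"

# head writes the first 2 lines to stdout; the pipe feeds them to wc's stdin.
count=$(head -n 2 "$tmpdir/foo.txt" | wc -w)

# Some wc implementations pad the number with spaces;
# arithmetic expansion normalizes it to a plain integer.
count=$((count))
echo "$count"

rm -r "$tmpdir"
```

The first two lines hold five words in total, so the script prints 5.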
UNIX pipes
e.g., Suppose words1.txt, words2.txt, … contain lists of
words, one per line (lists are unsorted, possibly with duplicates)
– Print a duplicate-free list of all words that occur in any of these files
uniq removes adjacent duplicate lines from a stream
(so it removes every duplicate only if the input is already sorted)
sort sorts the lines of a file (lexicographically)
Solution:
$ cat words*.txt | sort | uniq
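The solution can be checked on a tiny data set. The sketch below creates two sample word lists (their contents are made up for illustration) and also shows sort -u, a standard sort option that sorts and drops duplicates in one step; for plain word lists like these it should give the same result as sort | uniq.

```shell
# Create two small, unsorted word lists with duplicates (hypothetical data).
tmpdir=$(mktemp -d)
printf 'pear\napple\npear\n' > "$tmpdir/words1.txt"
printf 'apple\nfig\n' > "$tmpdir/words2.txt"

# uniq only collapses adjacent duplicates, so sort must run first.
merged=$(cat "$tmpdir"/words*.txt | sort | uniq)

# sort -u sorts and removes duplicate lines in a single command.
merged_u=$(cat "$tmpdir"/words*.txt | sort -u)

echo "$merged"

rm -r "$tmpdir"
```

Both variables should hold the three lines apple, fig, pear.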
Building a pipeline
• A multi-stage process using redirection requires explicit creation of
intermediate (temp) files; e.g., store in a file named result a sorted list of
all lines of the file data that do not contain the string “abc”, translating
every ‘b’ character into ‘X’ in each line
$ sort data > sortedData
$ grep -v "abc" sortedData > temp
$ tr b X < temp > result
$ rm sortedData temp
• … but the UNIX shell pipe operator “|” connects stdout of one command
to stdin of the next command, without creating a temp file
$ sort data | grep -v "abc" | tr b X > result
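As a sanity check that the temp-file version and the pipeline produce the same result, the following sketch runs both on a small made-up data file and compares the outputs with cmp. The file contents and the names result1 and result2 are assumptions for illustration.

```shell
# Work in a scratch directory with a small sample input (hypothetical data).
tmpdir=$(mktemp -d)
printf 'bob\nabc\ncab\nzebra\n' > "$tmpdir/data"

# Version 1: each stage writes an intermediate file that the next stage reads.
sort "$tmpdir/data" > "$tmpdir/sortedData"
grep -v "abc" "$tmpdir/sortedData" > "$tmpdir/temp"
tr b X < "$tmpdir/temp" > "$tmpdir/result1"
rm "$tmpdir/sortedData" "$tmpdir/temp"

# Version 2: the same stages connected by pipes, with no intermediate files.
sort "$tmpdir/data" | grep -v "abc" | tr b X > "$tmpdir/result2"

# cmp exits with status 0 only when the two files are byte-for-byte identical.
same=no
cmp "$tmpdir/result1" "$tmpdir/result2" && same=yes
result=$(cat "$tmpdir/result2")
echo "$result"

rm -r "$tmpdir"
```

With this input the line abc is dropped and the remaining lines become XoX, caX, and zeXra in both versions.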
• Standard error is not piped unless redirected to standard output
$ sort data 2>&1 | grep -v "abc" 2>&1 | tr b X > result 2>&1
[Now both stdout and stderr go through the pipe]
• UNIX pipes are a standard tool for joining together steps in a
complex process, where the output of one step is the input to
the next
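The effect of 2>&1 on a pipe can be observed directly. In this sketch, emit is a hypothetical helper function (not a standard command) that writes one line to each stream, and wc -l counts how many lines actually arrive through the pipe.

```shell
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

errlog=$(mktemp)

# Only stdout enters the pipe. stderr is sent to a file here purely so it can
# be inspected afterwards; by default it would appear on the terminal.
plain=$(emit 2>"$errlog" | wc -l)
bypassed=$(cat "$errlog")

# With 2>&1, stderr is merged into stdout, so wc receives both lines.
both=$(emit 2>&1 | wc -l)

echo "lines piped without 2>&1: $((plain))"
echo "lines piped with 2>&1:    $((both))"
echo "bypassed the pipe:        $bypassed"

rm "$errlog"
```

The first count is 1 and the second is 2, which matches the rule that standard error travels through the pipe only after it has been redirected to standard output.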
Building a pipeline
$ find cs246
cs246
cs246/a1
cs246/a1/q1x.C
cs246/a1/q2y.h
cs246/a1/q2y.cc
cs246/a1/q3z.cpp
$ find cs246 | sed 's|[^/]*/|   |g'
cs246
   a1
      q1x.C
      q2y.h
      q2y.cc
      q3z.cpp
sed is a stream editor for text; the command replaces every occurrence (g) of
the pattern [^/]*/ (zero or more characters other than “/”, followed by “/”;
here “*” is a regular-expression repetition operator, not a shell wildcard)
with 3 spaces
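The substitution can be tried without a cs246 directory by feeding sed the same kind of paths that find would print. The three input lines below are sample data, and the variable name tree is an assumption for illustration.

```shell
# Each leading "name/" component is replaced by 3 spaces, so a path is
# indented by 3 spaces for every directory level above it.
tree=$(printf 'cs246\ncs246/a1\ncs246/a1/q1x.C\n' | sed 's|[^/]*/|   |g')
echo "$tree"
```

The first line contains no “/” and is left unchanged, the second is indented by 3 spaces, and the third by 6. Using “|” as the delimiter of the s command avoids having to escape the “/” inside the pattern.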
$ cat appleStory.txt
Apple did not violate patents owned by Samsung
Electronics in making the iPod touch, iPhone and iPad,
a judge at the International Trade Commission said in a
preliminary ruling on Friday.
Apple and Samsung have taken their bruising patent […]
$ cat appleStory.txt | sed 's|Apple|RIM|g' > rimStory.txt
$ cat rimStory.txt
RIM did not violate patents owned by Samsung
Electronics in making the iPod touch, iPhone and iPad,
a judge at the International Trade Commission said in a
preliminary ruling on Friday.
RIM and Samsung have taken their bruising patent […]
$ cat appleStory.txt | sed 's|Apple|RIM|g' |
    sed 's|iPhone|Blackberry|g' > rimStory.txt
$ cat rimStory.txt
RIM did not violate patents owned by Samsung
Electronics in making the iPod touch, Blackberry and iPad,
a judge at the International Trade Commission said in a
preliminary ruling on Friday.
RIM and Samsung have taken their bruising patent […]
$ cat appleStory.txt | sed 's|Apple|RIM|g' |
    sed 's|iPhone|Blackberry|g' | sed 's|iPad|Playbook|g' > rimStory.txt
[and so on]
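As the chain of substitutions grows, one alternative worth knowing is that a single sed process accepts several commands through repeated -e options. The sketch below compares that form with the chained pipeline on one sample sentence adapted from the story; the variable names story, piped, and single are assumptions for illustration.

```shell
# One sample sentence standing in for appleStory.txt.
story='Apple did not violate patents owned by Samsung Electronics in making the iPod touch, iPhone and iPad'

# Chained form: three sed processes connected by pipes.
piped=$(printf '%s\n' "$story" |
    sed 's|Apple|RIM|g' | sed 's|iPhone|Blackberry|g' | sed 's|iPad|Playbook|g')

# Single-process form: each -e adds one command, applied in order to every line.
single=$(printf '%s\n' "$story" |
    sed -e 's|Apple|RIM|g' -e 's|iPhone|Blackberry|g' -e 's|iPad|Playbook|g')

echo "$single"
```

For these three substitutions both forms give the same text. The pipeline keeps each step visible as its own stage, while the -e form starts only one process.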
End