Shell Study Notes

Introduction to Shell for Data Science

  • 4 hours
  • 0 Videos
  • 54 Exercises

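Greg Wilson | DataCamp teaches Git and Shell, so he is worth following. Even without installing any software, the shell can handle data processing and the simplest kinds of analysis. Its most important use, though, is organizing files.

Learn the command line and you can work like a programmer: leave a perfectly good mouse alone, do everything in a terminal, and hammer out code with loud keystrokes. It looks cool, and I want to work that way too. The shell needs regular practice, or you forget it.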

Where am I? | Shell

  • print working directory: pwd
  • find out who the computer thinks you are: whoami

How else can I identify files and directories? | Shell

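  • A relative path, .../..., is interpreted starting from the current directory.
  • An absolute path, /.../..., starts with /, the root directory, and means the same thing wherever you are.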
$ ls course.txt
course.txt
$ ls seasonal/summer.csv
seasonal/summer.csv

How can I move to another directory? | Shell

change directory: cd

If you type cd seasonal and then type pwd, the shell will tell you that you are now in /home/repl/seasonal. If you then run ls on its own, it shows you the contents of /home/repl/seasonal, because that’s where you are. If you want to get back to your home directory /home/repl, you can use the command cd /home/repl.

$ cd seasonal
$ pwd
/home/repl/seasonal
$ ls
autumn.csv  spring.csv  summer.csv  winter.csv

How can I move up a directory? | Shell

  • parent directory: .., cd .. \(\to\) move to parent directory
  • current directory: ., ls . = ls; cd . stays where you are (whereas cd on its own moves to the home directory)
  • home directory: ~, cd ~ \(\to\) move to home directory
  • example: cd ~/../. \(\to\) move to ‘home directory’, ‘up a level’, ‘here’.

How can I copy files? | Shell

  • copy: cp. cp original.txt duplicate.txt creates a copy of original.txt called duplicate.txt. If there already was a file called duplicate.txt, it is overwritten.

Make a copy of seasonal/summer.csv in the backup directory, calling the new file summer.bck.

$ cd ~
$ pwd
/home/repl
$ ls
backup  bin  course.txt  people  seasonal
$ cp seasonal/summer.csv backup/summer.bck
$ cp seasonal/spring.csv seasonal/summer.csv backup

If backup is an existing directory, then cp seasonal/spring.csv seasonal/summer.csv backup copies spring.csv and summer.csv from the seasonal directory into backup.

How can I move a file? | Shell

This is where mv comes in as the mover.

$ mv seasonal/spring.csv seasonal/summer.csv backup

How can I rename files? | Shell

mv course.txt old-course.txt uses mv to rename course.txt to old-course.txt. If a file named old-course.txt already exists at that path, it is overwritten.

$ pwd
/home/repl
$ ls
backup  bin  course.txt  people  seasonal
$ cd seasonal
$ pwd
/home/repl/seasonal
$ mv winter.csv winter.csv.bck
$ ls
autumn.csv  spring.csv  summer.csv  winter.csv.bck

How can I delete files? | Shell

remove: rm

rm thesis.txt backup/thesis-2017-08.txt deletes both thesis.txt and backup/thesis-2017-08.txt. Deleted files do not go to a recycle bin.

$ ls
backup  bin  course.txt  people  seasonal
$ cd seasonal
$ ls
autumn.csv  spring.csv  summer.csv  winter.csv
$ rm autumn.csv
$ cd ~
$ ls
backup  bin  course.txt  people  seasonal
$ rm seasonal/summer.csv

The transcript above is just a record of my thinking; feel free to skip it.

How can I create and delete directories? | Shell

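In other words, mv old-directory new-directory renames the directory old-directory to new-directory.

rm, however, does not work on a directory this way, mainly as a safeguard against deleting a whole directory by accident. The command used here is rmdir, which only removes a directory if it is empty.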

$ ls
backup  bin  course.txt  people  seasonal
$ rm people/agarwal.txt
$ rm people
rm: cannot remove 'people': Is a directory
$ rmdir people
$ mkdir ~/yearly
$ ls
backup  bin  course.txt  seasonal  yearly
$ mkdir yearly
mkdir: cannot create directory 'yearly': File exists
$ mkdir yearly/2017
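
The directory rules above can be tried safely in a scratch directory. This is a minimal sketch, not part of the course: the directory comes from mktemp, and the mkdir -p flag (create missing parent directories) is an addition that does not appear in the transcript.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir people
touch people/agarwal.txt

# rmdir refuses to remove a directory that still has files in it.
if rmdir people 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Empty the directory first, then rmdir succeeds.
rm people/agarwal.txt
rmdir people

# -p creates missing parent directories in one step.
mkdir -p yearly/2017

echo "first rmdir attempt: $first_try"
ls -R yearly
```

Emptying a directory before removing it is more typing than a recursive delete, but it makes accidental loss much harder.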

Wrapping up | Shell

$ ls
backup  bin  course.txt  people  seasonal
$ cd /tmp
$ ls
tmpd9gse9kn  tmpfadc8q5e  tmpzafhyewb  tmpzx3fpfmb
$ mkdir scratch
$ mv /home/repl/people/agarwal.txt ~/tmp
$ mv ~/people/agarwal.txt ~/tmp
mv: cannot stat '/home/repl/people/agarwal.txt': No such file or directory

The error mv: cannot stat '/home/repl/people/agarwal.txt': No such file or directory means the first mv worked: the file is no longer at its old path.

How can I view a file’s contents? | Shell

concatenate: cat. It prints the contents of every file whose name you give it.

$ cat course.txt
Introduction to the Unix Shell for Data Science

The Unix command line has survived and thrived for almost fifty years
because it lets people to do complex things with just a few
keystrokes. Sometimes called "the duct tape of programming", it helps
users combine existing programs in new ways, automate repetitive
tasks, and run programs on clusters and clouds that may be halfway
around the world. This lesson will introduce its key elements and show
you how to use them efficiently.

How can I view a file’s contents piece by piece? | Shell

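This one feels a bit complicated.

  • view one page at a time: less; the spacebar moves to the next page
  • quit: q
  • switch to the next file, next: :n
  • switch to the previous file, previous: :p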
$ less seasonal/spring.csv seasonal/summer.csv

How can I look at the start of a file? | Shell

$ head seasonal/summer.csv
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
2017-01-21,bicuspid
2017-02-02,molar
2017-02-27,wisdom
2017-02-27,wisdom
2017-03-07,bicuspid
2017-03-15,wisdom
2017-03-20,canine

head prints the first 10 lines of a file.

How can I type less? | Shell

Tab completion is the same Tab auto-complete feature that many programs have.

For example, if you type sea and press tab, it will fill in the word seasonal. If you then type a and tab, it will complete the path as seasonal/autumn.csv. If the path is ambiguous, such as seasonal/s, pressing tab a second time will display a list of possibilities.

How can I control what commands do? | Shell

$ head -n 5 seasonal/winter.csv
Date,Tooth
2017-01-03,bicuspid
2017-01-05,incisor
2017-01-21,wisdom
2017-02-05,molar

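head -n 5 shows the first 5 lines instead of the default 10. A leading - marks a flag, which comes up constantly from here on. -n is the flag for the number of lines, and the count follows it directly.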

How can I list everything below a directory? | Shell

recursive [1]: -R lists the contents of every directory below the given one, like a tree. See the example.

ls -R works with or without a path; without one, it starts from the current directory.

$ ls -R /home/repl/
/home/repl/:
backup  bin  course.txt  people  seasonal

/home/repl/backup:

/home/repl/bin:

/home/repl/people:
agarwal.txt

/home/repl/seasonal:
autumn.csv  spring.csv  summer.csv  winter.csv
$ ls -R
.:
backup  bin  course.txt  people  seasonal

./backup:

./bin:

./people:
agarwal.txt

./seasonal:
autumn.csv  spring.csv  summer.csv  winter.csv

How can I get help for a command? | Shell

manual: man, similar to help. For example, man head:

HEAD(1)               BSD General Commands Manual              HEAD(1)

NAME
     head -- display first lines of a file

SYNOPSIS
     head [-n count | -c bytes] [file ...]

DESCRIPTION
     This filter displays the first count lines or bytes of each of
     the specified files, or of the standard input if no files are
     specified.  If count is omitted it defaults to 10.

     If more than a single file is specified, each file is preceded by
     a header consisting of the string ``==> XXX <=='' where ``XXX''
     is the name of the file.

SEE ALSO
     tail(1)

man automatically invokes less, so you may need to press spacebar to page through the information.

Since man pages its output through less, keep pressing the spacebar to read on, and press q to quit.

Use tail to display all but the first six lines of seasonal/spring.csv.

tail -n +7 seasonal/spring.csv
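
The +7 is easy to misread, so here is a small sketch on a made-up eight-line file (sample.csv and its contents are placeholders, not course data).

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.csv"
printf '%s\n' row1 row2 row3 row4 row5 row6 row7 row8 > "$sample"

# -n +7 means "start printing at line 7", so the first six lines are skipped.
rest=$(tail -n +7 "$sample")
echo "$rest"
```

Without the plus sign, tail -n 7 would instead print the last seven lines.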

How can I select columns from a file? | Shell

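cut -f 2-5,8 -d , values.csv: cut selects columns. -f (field) names the columns to keep; 2-5,8 means columns 2 through 5 plus column 8. Columns are numbered from 1, unlike Python. -d sets the delimiter, such as a tab, colon, or space.

cut -d , -f 1 seasonal/spring.csv and cut -d, -f1 seasonal/spring.csv do the same thing; the order of the flags does not matter.

My step-by-step reasoning: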

$ pwd
/home/repl
$ ls
backup  bin  course.txt  people  seasonal
$ cd seasonal/
$ pwd
/home/repl/seasonal
$ ls
autumn.csv  spring.csv  summer.csv  winter.csv
$ head spring.csv
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine
2017-02-24,canine
2017-02-28,wisdom
2017-03-04,incisor
2017-03-12,wisdom
2017-03-14,incisor
2017-03-21,molar
2017-04-29,wisdom
$ cut -f 1 -d, spring.csv
Date
2017-01-25
2017-02-19
2017-02-24
2017-02-28
2017-03-04
2017-03-12
2017-03-14
2017-03-21
2017-04-29
2017-05-08
2017-05-20
2017-05-21
2017-05-25
2017-06-04
2017-06-13
2017-06-14
2017-07-10
2017-07-16
2017-07-23
2017-08-13
2017-08-13
2017-08-13
2017-09-07

What can’t cut do? | Shell

cut does not understand quoted strings, so it cannot always split text correctly. See the example.

Name,Age
"Johel,Ranjit",28
"Sharma,Rupinder",26

When splitting on commas, cut also splits at the comma inside "Johel,Ranjit", so cut -f 2 -d , everyone.csv returns:

Age
Ranjit"
Rupinder"
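
To reproduce this, the sketch below rebuilds everyone.csv in a temporary directory (the location is an assumption; the file contents are the three lines shown above) and runs the same cut command.

```shell
workdir=$(mktemp -d)
everyone="$workdir/everyone.csv"
printf '%s\n' 'Name,Age' '"Johel,Ranjit",28' '"Sharma,Rupinder",26' > "$everyone"

# Field 2 is whatever sits between the first and second comma,
# even when that comma is inside a quoted name.
second=$(cut -f 2 -d , "$everyone")
echo "$second"
```

For data with quoted fields, a CSV-aware tool is the safer choice; cut only ever sees delimiter characters.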

How can I repeat commands? | Shell

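You can press the \(\uparrow\) key on the keyboard. There are a few other ways as well, shown in the examples below.

! followed by a command name re-runs the most recent use of that command. history prints a numbered list of the commands you have run; ! followed by one of those numbers re-runs that command.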

Run head summer.csv in your home directory (which should fail).

$ head summer.csv
head: cannot open 'summer.csv' for reading: No such file or directory

Change directory to seasonal.

$ cd seasonal

Re-run the head command using ! followed by the command name. Do not type any spaces between ! and what follows.

$ !head
head summer.csv
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
2017-01-21,bicuspid
2017-02-02,molar
2017-02-27,wisdom
2017-02-27,wisdom
2017-03-07,bicuspid
2017-03-15,wisdom
2017-03-20,canine

Use history to look at what you have done.

$ history
    1  head summer.csv
    2  cd seasonal
    3  head summer.csv
    4  history

Re-run head again using ! followed by a command number. Do not type any spaces between ! and what follows.

$ !1
head summer.csv
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
2017-01-21,bicuspid
2017-02-02,molar
2017-02-27,wisdom
2017-02-27,wisdom
2017-03-07,bicuspid
2017-03-15,wisdom
2017-03-20,canine

How can I select lines containing particular values? | Shell

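  • pattern matching: grep selects the lines that contain a given pattern, similar to R's str_subset function. For example, grep bicuspid seasonal/winter.csv

This section packs in a lot, so here is a summary.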

$ grep -c incisor seasonal/autumn.csv seasonal/winter.csv
seasonal/autumn.csv:3
seasonal/winter.csv:6

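-c counts the lines that contain incisor in each of the two files. The output puts the count in one column, with the filename acting as the index.

A neat trick: after typing seasonal/, press Tab and the shell lists the paths you can choose from. Great.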

$ grep -c incisor seasonal/
autumn.csv  spring.csv  summer.csv  winter.csv
$ grep -v -n molar seasonal/spring.csv
1:Date,Tooth
2:2017-01-25,wisdom
3:2017-02-19,canine
4:2017-02-24,canine
5:2017-02-28,wisdom
6:2017-03-04,incisor
7:2017-03-12,wisdom
8:2017-03-14,incisor
10:2017-04-29,wisdom
11:2017-05-08,canine
12:2017-05-20,canine
13:2017-05-21,canine
14:2017-05-25,canine
16:2017-06-13,bicuspid
17:2017-06-14,canine
18:2017-07-10,incisor
19:2017-07-16,bicuspid
20:2017-07-23,bicuspid
21:2017-08-13,bicuspid
22:2017-08-13,incisor
23:2017-08-13,wisdom

Here:

  • -n: print line numbers for matching lines
  • -v: invert the match, i.e., only show lines that don’t match

Without -n, the output has no line numbers:

$ grep -v molar seasonal/autumn.csv
Date,Tooth
2017-01-05,canine
2017-01-17,wisdom
2017-01-18,canine
2017-02-22,bicuspid
2017-03-10,canine
2017-03-13,canine
2017-04-30,incisor
2017-05-02,canine
2017-05-10,canine
2017-05-19,bicuspid
2017-06-22,wisdom
2017-06-25,canine
2017-07-10,incisor
2017-07-10,wisdom
2017-07-20,incisor
2017-07-21,bicuspid
2017-08-09,canine
2017-08-16,canine
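
The three flags used so far (-c, -v, -n) can be compared side by side on a tiny file. This is only a sketch: teeth.csv and its five lines are invented for illustration.

```shell
workdir=$(mktemp -d)
teeth="$workdir/teeth.csv"
printf '%s\n' 'Date,Tooth' '2017-01-05,canine' '2017-02-01,molar' \
    '2017-03-10,incisor' '2017-05-25,molar' > "$teeth"

# -c prints the number of matching lines instead of the lines themselves.
molar_count=$(grep -c molar "$teeth")

# -v inverts the match; -n prefixes each printed line with its line number.
not_molar=$(grep -v -n molar "$teeth")

echo "molar lines: $molar_count"
echo "$not_molar"
```

The gaps in the printed line numbers (3 and 5 are missing) show exactly where the molar records were.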

Why isn’t it always safe to treat data as text? | Shell

paste is a bit like the pandas merge function, except that it joins lines purely by position, not by key.

$ paste seasonal/autumn.csv seasonal/winter.csv
Date,Tooth      Date,Tooth
2017-01-05,canine       2017-01-03,bicuspid
2017-01-17,wisdom       2017-01-05,incisor
2017-01-18,canine       2017-01-21,wisdom
2017-02-01,molar        2017-02-05,molar
2017-02-22,bicuspid     2017-02-17,incisor
2017-03-10,canine       2017-02-25,bicuspid
2017-03-13,canine       2017-03-12,incisor
2017-04-30,incisor      2017-03-25,molar
2017-05-02,canine       2017-03-26,incisor
2017-05-10,canine       2017-04-04,canine
2017-05-19,bicuspid     2017-04-18,canine
2017-05-25,molar        2017-04-26,canine
2017-06-22,wisdom       2017-04-26,molar
2017-06-25,canine       2017-04-26,wisdom
2017-07-10,incisor      2017-04-27,canine
2017-07-10,wisdom       2017-05-08,molar
2017-07-20,incisor      2017-05-13,bicuspid
2017-07-21,bicuspid     2017-05-14,wisdom
2017-08-09,canine       2017-06-17,canine
2017-08-16,canine       2017-07-01,incisor
        2017-07-17,canine
        2017-08-10,incisor
        2017-08-11,bicuspid
        2017-08-11,wisdom
        2017-08-13,canine

Ugly. The two files have different numbers of lines, so the last rows have nothing to pair with.
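
The same effect shows up on two tiny files. In this sketch, short.txt and long.txt are made-up names, and -d , (use a comma instead of the default tab as the separator) is an extra flag that the transcript above does not use.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$workdir/short.txt"
printf '%s\n' 'b1' 'b2' 'b3' > "$workdir/long.txt"

# paste joins line 1 with line 1, line 2 with line 2, and so on.
# Once the shorter file runs out, its side of the row is left empty.
joined=$(paste -d , "$workdir/short.txt" "$workdir/long.txt")
echo "$joined"
```

The last row begins with a bare separator, which is why treating the pasted result as clean tabular data is risky.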

How can I store a command’s output in a file? | Shell

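In head -n 5 seasonal/winter.csv > top.csv, the head -n 5 part is familiar by now. The > is a redirect; the shell handles it, and it is not part of the head command. The top.csv after it is the name of the new file that the output is saved to.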

How can I use one command’s output as the input to another command? | Shell

Use head and tail together.

head -n 5 seasonal/winter.csv > top.csv
tail -n 3 top.csv

This gets lines 3-5 of seasonal/winter.csv.
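
To check that this really yields lines 3-5, here is a sketch on a numbered stand-in file (the seven-line winter.csv below is fabricated so the line positions are obvious).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' line1 line2 line3 line4 line5 line6 line7 > winter.csv

# Step 1: > sends head's output to top.csv instead of the screen.
head -n 5 winter.csv > top.csv

# Step 2: the last 3 lines of those 5 are lines 3-5 of the original file.
middle=$(tail -n 3 top.csv)
echo "$middle"
```

Note that > overwrites top.csv if it already exists, so pick the filename with care.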

What’s a better way to use one command’s output as another command’s input? | Shell

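The file-based approach has two drawbacks: it leaves too many intermediate files behind, and the commands end up scattered through the history.

The fix is a pipe, the same idea as the pipe in R. On the command line the pipe symbol is |.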

$ cut -f 2 -d, seasonal/summer.csv | grep -v Tooth
canine
wisdom
bicuspid
molar
wisdom
wisdom
bicuspid
wisdom
canine
molar
bicuspid
wisdom
canine
canine
incisor
incisor
canine
incisor
incisor
incisor
canine
canine
bicuspid
canine

Don't forget the comma after -d. The grep -v step also removes the header line. Powerful.

How can I combine many commands? | Shell

This is where the efficiency gains start!

$ cut -f 2 -d, seasonal/autumn.csv | grep -v Tooth | head -n 1
canine

How can I count the records in a file? | Shell

  • word count: wc
  • number of characters: -c
  • number of words: -w
  • number of lines : -l
$ grep 2017-07 seasonal/spring.csv
2017-07-10,incisor
2017-07-16,bicuspid
2017-07-23,bicuspid
$ grep 2017-07 seasonal/spring.csv | wc -w
3

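There is no need for cut to pick a column here; grep 2017-07 on its own is enough. wc -w counts words. Each record here has no spaces, so it counts as a single word and the result equals the number of lines, which wc -l would report directly.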
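
The difference between -l and -w only shows up once a record contains a space. A small sketch with invented records (the tr -d ' ' step is there because some wc versions pad their output with spaces):

```shell
workdir=$(mktemp -d)
spring="$workdir/spring.csv"
printf '%s\n' 'Date,Tooth' '2017-06-14,canine' '2017-07-10,incisor' \
    '2017-07-16,bicuspid' > "$spring"

# wc -l counts lines.
july_lines=$(grep 2017-07 "$spring" | wc -l | tr -d ' ')
# wc -w counts whitespace-separated words; here one record is one word.
july_words=$(grep 2017-07 "$spring" | wc -w | tr -d ' ')
# A single record with a space in it counts as two words.
spaced_words=$(printf '%s\n' '2017-07-10,wisdom tooth' | wc -w | tr -d ' ')

echo "lines: $july_lines, words: $july_words, words in spaced record: $spaced_words"
```

For counting records, wc -l is therefore the more robust choice.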

How can I specify many files with a single command? | Shell

$ head -n 3 seasonal/s*.csv
==> seasonal/spring.csv <==
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine

==> seasonal/summer.csv <==
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom

One pattern, applied to two files.

* matches zero or more characters. [2]

What other wildcards can I use? | Shell

? matches a single character.

[...] matches any one of the characters inside the square brackets, so 201[78].txt matches 2017.txt or 2018.txt, but not 2016.txt.

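In short, [...] stands for any one of the characters inside the brackets.

{...} matches any of the comma-separated patterns inside the curly brackets, so {*.txt,*.csv} matches any file whose name ends with .txt or .csv, but not files whose names end with .pdf. There must be no space after the comma.

{...} is hard to sum up briefly; see the quoted text above.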
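
A quick way to see what a wildcard matches is to echo it in a directory of empty files. This sketch assumes bash (brace expansion is not available in every shell), and the filenames are made up to mirror the examples in the text.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 2016.txt 2017.txt 2018.txt notes.csv report.pdf

# ? matches exactly one character.
one_char=$(echo 201?.txt)
# [78] matches a single 7 or a single 8.
bracket=$(echo 201[78].txt)
# {a,b} is expanded by bash into both patterns, which are then matched.
braces=$(echo {*.txt,*.csv})

echo "$one_char"
echo "$bracket"
echo "$braces"
```

Running echo with a pattern before handing it to rm or mv is a cheap safety check.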

How can I sort lines of text? | Shell

Here is the equivalent of arrange from R's dplyr.

cut -f 2 -d , seasonal/winter.csv | grep -v Tooth | sort -r
  • sort lines: sort
  • numeric order: -n
  • reverse order: -r

-b tells it to ignore leading blanks, and -f tells it to fold case (i.e., be case-insensitive).

How can I remove duplicate lines? | Shell

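remove duplicates: uniq

uniq only removes adjacent duplicate lines, because it only has to keep the most recent unique line in memory. That is why the input goes through sort first.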

$ cut -d , -f 2 seasonal/* | grep -v Tooth | sort | uniq -c
     15 bicuspid
     31 canine
     18 incisor
     11 molar
     17 wisdom
$ cut -d , -f 2 seasonal/* | grep -v Tooth | sort | uniq
bicuspid
canine
incisor
molar
wisdom

-c adds a count of how many times each line occurs.
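
Why sort has to come before uniq is easiest to see on a short list where the duplicates are not next to each other. A minimal sketch with invented values:

```shell
teeth=$(printf '%s\n' canine molar canine molar canine)

# Without sort, no two identical lines are adjacent, so uniq removes nothing.
unsorted_kinds=$(echo "$teeth" | uniq | wc -l | tr -d ' ')
# After sort, duplicates sit next to each other and collapse to one line each.
sorted_kinds=$(echo "$teeth" | sort | uniq | wc -l | tr -d ' ')

echo "uniq alone: $unsorted_kinds lines"
echo "sort | uniq: $sorted_kinds lines"

# uniq -c prefixes each distinct line with its number of occurrences.
echo "$teeth" | sort | uniq -c
```

The trade-off is that uniq stays fast and memory-light on huge inputs, at the price of needing sorted data.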

Wrapping up | Shell

$ wc seasonal/
autumn.csv  spring.csv  summer.csv  winter.csv
$ wc seasonal/*
  21   21  378 seasonal/autumn.csv
  24   24  434 seasonal/spring.csv
  25   25  454 seasonal/summer.csv
  26   26  471 seasonal/winter.csv
  96   96 1737 total
$ wc seasonal/* | grep -v total | sort -n | head -n 1
  21   21  378 seasonal/autumn.csv

How does the shell store information? | Shell

Variable  Purpose                            Value
HOME      User's home directory              /home/repl
PWD       Present working directory          Same as pwd command
SHELL     Which shell program is being used  /bin/bash
USER      User's ID                          repl

set prints all of them, but the output is messy.

$ set | grep HISTFILESIZE
HISTFILESIZE=2000

It combines well with grep.

How can I print a variable’s value? | Shell

echo is similar to Python's print.

$ echo $OSTYPE
linux-gnu

Put $ in front of a variable's name to get its value.

OSTYPE holds the type of operating system.

How else does the shell store information? | Shell

$ testing=seasonal/winter.csv
$ head -n 1 $testing

Assign with a plain =, as in Python, but with no spaces around it. To read the value back, put $ before the name.

How can I repeat a command many times? | Shell

$ for suffix in gif jpg png; do echo $suffix; done
gif
jpg
png

How can I repeat a command once for each file? | Shell

$ for x in seasonal/*.csv; do echo $x; done
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv

How can I record the names of a set of files? | Shell

$ files=seasonal/*.csv
$ for f in $files; do echo $f; done
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv
$ files=seasonal/*.csv
$ for f in files; do echo $f; done

This prints only the single word files, because the $ before files is missing.
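
Both versions of the loop can be captured and compared in one go. This is a sketch in a scratch directory with two empty placeholder files, not the course data.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir seasonal
touch seasonal/autumn.csv seasonal/winter.csv

# The assignment stores the pattern itself; it is expanded later, when used.
files=seasonal/*.csv

# With $, the variable's value is substituted and the wildcard expands.
with_dollar=$(for f in $files; do echo $f; done)
# Without $, the loop sees only the literal word "files".
without_dollar=$(for f in files; do echo $f; done)

echo "$with_dollar"
echo "$without_dollar"
```

The shell raises no error in the second case, which is what makes a forgotten $ easy to miss.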

$ for f in seasonal/*.csv; do echo $f; head -n 2 $f | tail -n 1; done
seasonal/autumn.csv
2017-01-05,canine
seasonal/spring.csv
2017-01-25,wisdom
seasonal/summer.csv
2017-01-11,canine
seasonal/winter.csv
2017-01-03,bicuspid

If the semicolon between echo $f and head is left out, echo produces one line that includes the filename twice, which tail then copies:

$ for f in seasonal/*.csv; do echo $f head -n 2 $f | tail -n 1; done
seasonal/autumn.csv head -n 2 seasonal/autumn.csv
seasonal/spring.csv head -n 2 seasonal/spring.csv
seasonal/summer.csv head -n 2 seasonal/summer.csv
seasonal/winter.csv head -n 2 seasonal/winter.csv

How can I run many commands in a single loop? | Shell

$ for file in seasonal/*.csv; do echo $file; grep -h 2017-07 $file; done
seasonal/autumn.csv
2017-07-10,incisor
2017-07-10,wisdom
2017-07-20,incisor
2017-07-21,bicuspid
seasonal/spring.csv
2017-07-10,incisor
2017-07-16,bicuspid
2017-07-23,bicuspid
seasonal/summer.csv
2017-07-25,canine
seasonal/winter.csv
2017-07-01,incisor
2017-07-17,canine
$ for file in seasonal/*.csv; do grep 2017-07 $file; done
2017-07-10,incisor
2017-07-10,wisdom
2017-07-20,incisor
2017-07-21,bicuspid
2017-07-10,incisor
2017-07-16,bicuspid
2017-07-23,bicuspid
2017-07-25,canine
2017-07-01,incisor
2017-07-17,canine

The two differ: without echo $file, the output no longer shows which file each match came from.

Why shouldn’t I use spaces in filenames? | Shell

mv 'July 2017.csv' '2017 July data.csv': wrap any filename that contains spaces in quotes, so the shell treats it as a single string.
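
The same quoting applies to variables that hold such names. Here is a sketch in a scratch directory; the one-line file content is invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Date,Tooth" > 'July 2017.csv'

# Quotes make the shell treat each name as one argument despite the spaces.
mv 'July 2017.csv' '2017 July data.csv'

# Inside a loop, quote the variable too, or the name splits into three words.
quoted=$(for f in *.csv; do head -n 1 "$f"; done)
echo "$quoted"
```

Avoiding spaces in filenames altogether, for example with hyphens or underscores, removes the need to remember any of this.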

How can I edit a file? | Shell

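nano names.txt opens the file for editing, or creates it if it does not exist. On Windows, use copy to create an empty file:

copy NUL name.Rmd

Keyboard shortcuts inside nano:

  • Ctrl-K: delete a line.
  • Ctrl-U: un-delete a line.
  • Ctrl-O: save the file (‘O’ stands for ‘output’), then press Enter to confirm.
  • Ctrl-X: exit the editor.

The shell terminal is too slow at home to keep up with the exercises, and I am not sure what is going on with the code. I will carry on for now and come back to practice properly later.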

How can I record what I just did? | Shell

  • Run history.
  • Pipe its output to tail -n 10 (or however many recent steps you want to save).
  • Redirect that to a file called something like figure-5.history (figure-5.history is just the filename).

This is better than writing things down in a lab notebook because it is guaranteed not to miss any steps. It also illustrates the central idea of the shell: simple tools that produce and consume lines of text can be combined in a wide variety of ways to solve a broad range of problems.

$ cp seasonal/spring.csv seasonal/summer.csv ~/
$ grep -h -v Tooth spring.csv summer.csv
.bash_logout  .profile      bin/          people/       spring.csv
.bashrc       backup/       course.txt    seasonal/     summer.csv
$ grep -h -v Tooth spring.csv summer.csv > temp.csv
$ history | tail -n 3 > steps.txt
  • ~/ is the home directory

How can I save commands to re-run later? | Shell

Save head -n 1 seasonal/*.csv in a file called headers.sh, then run bash headers.sh to execute it. You can also use nano to create or edit the file.

How can I re-use pipes? | Shell

bash all-dates.sh > dates.out saves the script's output in a new file.

$ nano teeth.sh
$ bash teeth.sh > teeth.out
$ cat teeth.out
     15 bicuspid
     31 canine
     18 incisor
     11 molar
     17 wisdom

cat simply displays the file's contents.

How can I pass filenames to scripts? | Shell

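Taken literally, this is about how to use a script that contains commands we have already written, and how to run those commands on a particular file.

$@ works like a pronoun: it stands for all the arguments passed to the script.

Suppose the file unique-lines.sh contains the command sort $@ | uniq.

Running bash unique-lines.sh seasonal/summer.csv

is then the same as running sort seasonal/summer.csv | uniq.

An example: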

$ bash count-records.sh seasonal/*.csv > num-records.out

How can I process a single argument? | Shell

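$1 and $2 are similar to $@, but each one refers to a single argument by position. Create a file column.sh containing the command cut -d , -f $2 $1, then run the script with bash column.sh seasonal/autumn.csv 1. Note that the script is free to use the arguments in a different order from the one in which they are passed.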
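
The whole round trip (write the script, then call it) fits in a few lines. A minimal sketch: the three-line autumn.csv is a stand-in for the real file, and it asks for column 2 where the text above asks for column 1.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Date,Tooth' '2017-01-05,canine' '2017-01-17,wisdom' > autumn.csv

# Write the one-line script. The single quotes keep $1 and $2 literal,
# so they are filled in when the script runs, not now.
echo 'cut -d , -f $2 $1' > column.sh

# $1 becomes autumn.csv and $2 becomes 2.
second_column=$(bash column.sh autumn.csv 2)
echo "$second_column"
```

Taking the filename first and the column second reads naturally at the call site, even though cut wants them the other way round.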

How can one shell script do many things? | Shell

Just write the commands on separate lines, and use $@ in place of the arguments.

How can I write loops in a shell script? | Shell

# Print the first and last data records of each file.
for filename in $@
do
    head -n 2 $filename | tail -n 1
    tail -n 1 $filename
done
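
To try the loop end to end, it can be saved into a script and run on a couple of files. In this sketch the script name date-range.sh and the two small CSV files are assumptions, and "$@" and "$filename" are quoted (a small change from the version above) so that names with spaces would also work.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Date,Tooth' '2017-01-05,canine' '2017-08-16,molar' > autumn.csv
printf '%s\n' 'Date,Tooth' '2017-01-03,bicuspid' '2017-08-13,wisdom' > winter.csv

# Save the loop into a script file (date-range.sh is a made-up name).
cat > date-range.sh <<'EOF'
# Print the first and last data records of each file.
for filename in "$@"
do
    head -n 2 "$filename" | tail -n 1
    tail -n 1 "$filename"
done
EOF

# Each file contributes two lines: its first and last data record.
records=$(bash date-range.sh autumn.csv winter.csv)
echo "$records"
```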

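The parts of the loop can be separated either with semicolons or with line breaks; the indentation is only there for readability.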

How can I stop a running program? | Shell

Ctrl-C (shown as ^C): cancel, which stops the running program.


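  1. recursive, American pronunciation /rɪˈkɝsɪv/. I get the stress wrong every time.

  2. wildcard: n. a character that stands in for other characters in a pattern. I had never come across computing terms like this!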