Introduction to Shell for Data Science
- 4 hours
- 0 Videos
- 54 Exercises
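Greg Wilson | DataCamp. He teaches Git and the shell; worth following. Even without installing any extra software, the shell can handle data processing and the simplest kinds of analysis. Most important of all, though, it is great for organizing files.
Learn the command line and you can work like a programmer: ignore the perfectly good mouse, open a terminal for everything, and type code with loud keystrokes. So cool; I want to do that too.
The shell needs regular practice, or you will forget it.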
Where am I? | Shell
- print working directory: pwd
- find out who the computer thinks you are: whoami
How can I identify files and directories? | Shell
listing: ls, e.g. ls /home/repl
How else can I identify files and directories? | Shell
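- A relative path, such as seasonal/summer.csv, is interpreted starting from the current directory.
- An absolute path, such as /home/repl/seasonal/summer.csv, starts at the root of the filesystem; the leading / is what makes it absolute.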
$ ls course.txt
course.txt
$ ls seasonal/summer.csv
seasonal/summer.csv
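A minimal sketch showing that a relative and an absolute path can name the same file. It assumes a throwaway directory created with mktemp -d in place of the course's /home/repl; the variable name home is just for illustration.

```shell
# Throwaway stand-in for /home/repl (an assumption; the course server has the real one).
home=$(mktemp -d)
mkdir -p "$home/seasonal"
touch "$home/seasonal/summer.csv"
cd "$home"

# Relative path: resolved from the current directory.
ls seasonal/summer.csv
# Absolute path: starts with /, so it works no matter where you are.
ls "$home/seasonal/summer.csv"
```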
How can I move to another directory? | Shell
change directory: cd
If you type cd seasonal and then type pwd, the shell will tell you that you are now in /home/repl/seasonal. If you then run ls on its own, it shows you the contents of /home/repl/seasonal, because that's where you are. If you want to get back to your home directory /home/repl, you can use the command cd /home/repl.
$ cd seasonal
$ pwd
/home/repl/seasonal
$ ls
autumn.csv spring.csv summer.csv winter.csv
How can I move up a directory? | Shell
- parent directory: .., so cd .. moves to the parent directory
- current directory: ., so ls . is the same as plain ls, and cd . leaves you where you are
- home directory: ~, so cd ~ moves to the home directory (plain cd with no argument does the same)
- example: cd ~/../. means 'home directory', then 'up a level', then 'here'.
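A quick sketch of .., . and ~ in action, assuming a made-up tree under a mktemp -d directory. To avoid touching your real home, ~ is redirected to the fake home only inside a subshell.

```shell
# Fake layout mimicking /home/repl/seasonal (an assumption for illustration).
base=$(mktemp -d)
mkdir -p "$base/home/repl/seasonal"
cd "$base/home/repl/seasonal"

cd ..        # parent directory: now in $base/home/repl
pwd
cd .         # current directory: nothing changes
pwd
# Point ~ at the fake home in a subshell: home, up a level, then "here".
(HOME="$base/home/repl"; cd ~/../. && pwd)   # prints $base/home
```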
How can I copy files? | Shell
- copy: cp
cp original.txt duplicate.txt creates a copy of original.txt called duplicate.txt. If there already was a file called duplicate.txt, it is overwritten.
Make a copy of seasonal/summer.csv in the backup directory, calling the new file summer.bck.
$ cd ~
$ pwd
/home/repl
$ ls
backup bin course.txt people seasonal
$ cp seasonal/summer.csv backup/summer.bck
$ cp seasonal/spring.csv seasonal/summer.csv backup
If backup is an existing directory, then cp seasonal/spring.csv seasonal/summer.csv backup copies spring.csv and summer.csv from the seasonal directory into the backup directory, keeping their names.
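A sketch of both forms of cp, run in a throwaway mktemp -d directory with two tiny made-up CSV files: with one source the last argument is the new file name, with several sources it must be an existing directory.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal backup
printf 'Date,Tooth\n2017-01-25,wisdom\n' > seasonal/spring.csv
printf 'Date,Tooth\n2017-01-11,canine\n' > seasonal/summer.csv

# One source: the last argument is the name of the copy.
cp seasonal/summer.csv backup/summer.bck
# Several sources: the last argument is the directory they are copied into.
cp seasonal/spring.csv seasonal/summer.csv backup
ls backup
```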
How can I rename files? | Shell
mv course.txt old-course.txt
Here mv renames course.txt to old-course.txt. If the directory already contains a file called old-course.txt, that file is overwritten without warning.
$ pwd
/home/repl
$ ls
backup bin course.txt people seasonal
$ cd seasonal
$ pwd
/home/repl/seasonal
$ mv winter.csv winter.csv.bck
$ ls
autumn.csv spring.csv summer.csv winter.csv.bck
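A small sketch of the overwrite behaviour described above, using a mktemp -d directory and two made-up text files: after the rename, only the content of the moved file survives.

```shell
work=$(mktemp -d)
cd "$work"
echo "new notes" > course.txt
echo "old notes" > old-course.txt

# Rename course.txt; the existing old-course.txt is replaced.
mv course.txt old-course.txt
ls
cat old-course.txt
```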
How can I delete files? | Shell
remove: rm
For example, rm thesis.txt backup/thesis-2017-08.txt deletes both thesis.txt and backup/thesis-2017-08.txt. Deleted files do not go to a recycle bin; they are gone for good.
$ ls
backup bin course.txt people seasonal
$ cd seasonal
$ ls
autumn.csv spring.csv summer.csv winter.csv
$ rm autumn.csv
$ cd ~
$ ls
backup bin course.txt people seasonal
$ rm seasonal/summer.csv
(The session above is just a record of my working; no need to read it.)
How can I create and delete directories? | Shell
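mv also works on directories: mv old-directory new-directory renames old-directory to new-directory.
rm, however, refuses to remove a directory by default, so that you cannot delete a whole folder by accident. Use rmdir instead, which only works if the directory is empty. To create a directory, use mkdir.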
$ ls
backup bin course.txt people seasonal
$ rm people/agarwal.txt
$ rm people
rm: cannot remove 'people': Is a directory
$ rmdir people
$ mkdir ~/yearly
$ ls
backup bin course.txt seasonal yearly
$ mkdir yearly
mkdir: cannot create directory 'yearly': File exists
$ mkdir yearly/2017
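A sketch of the rm / rmdir / mkdir sequence from the session, in a throwaway mktemp -d directory: rmdir refuses while the directory still has a file in it, and mkdir needs the parent directory to exist already.

```shell
work=$(mktemp -d)
cd "$work"
mkdir people
touch people/agarwal.txt

rmdir people 2>/dev/null || echo "rmdir refused: people is not empty"
rm people/agarwal.txt   # empty the directory first
rmdir people            # now it succeeds
mkdir yearly
mkdir yearly/2017       # works because yearly already exists
ls -R
```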
Wrapping up | Shell
$ ls
backup bin course.txt people seasonal
$ cd /tmp
$ ls
tmpd9gse9kn tmpfadc8q5e tmpzafhyewb tmpzx3fpfmb
$ mkdir scratch
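$ mv ~/people/agarwal.txt /tmp/scratch
Careful with ~/tmp: it means /home/repl/tmp, not /tmp. If no directory called ~/tmp exists, mv ~/people/agarwal.txt ~/tmp just renames the file to /home/repl/tmp instead of moving it into /tmp/scratch. Running the same mv a second time fails with mv: cannot stat '/home/repl/people/agarwal.txt': No such file or directory, which simply means the file has already been moved away.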
How can I view a file’s contents? | Shell
concatenate: cat. It prints the contents of all the files whose names you give it.
$ cat course.txt
Introduction to the Unix Shell for Data Science
The Unix command line has survived and thrived for almost fifty years
because it lets people to do complex things with just a few
keystrokes. Sometimes called "the duct tape of programming", it helps
users combine existing programs in new ways, automate repetitive
tasks, and run programs on clusters and clouds that may be halfway
around the world. This lesson will introduce its key elements and show
you how to use them efficiently.
How can I view a file’s contents piece by piece? | Shell
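This one is a bit more involved.
- less shows a file one page at a time; press the spacebar to go to the next page and q to quit.
- :n switches to the next file.
- :p switches back to the previous file.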
$ less seasonal/spring.csv seasonal/summer.csv
How can I look at the start of a file? | Shell
$ head seasonal/summer.csv
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
2017-01-21,bicuspid
2017-02-02,molar
2017-02-27,wisdom
2017-02-27,wisdom
2017-03-07,bicuspid
2017-03-15,wisdom
2017-03-20,canine
head prints the first 10 lines of a file.
How can I type less? | Shell
Tab completion works like the Tab autocomplete found in many programs.
For example, if you type sea and press tab, it will fill in the word seasonal. If you then type a and tab, it will complete the path as seasonal/autumn.csv. If the path is ambiguous, such as seasonal/s, pressing tab a second time will display a list of possibilities.
How can I control what commands do? | Shell
$ head -n 5 seasonal/winter.csv
Date,Tooth
2017-01-03,bicuspid
2017-01-05,incisor
2017-01-21,wisdom
2017-02-05,molar
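head -n 5 shows the first 5 lines instead of the default 10.
A leading - marks a flag (an option); flags will be used a lot from here on.
-n stands for "number of lines", and the number follows right after it.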
How can I list everything below a directory? | Shell
recursive: -R lists everything below a directory, level by level, much like a tree. See the example.
ls -R works with or without a path; without one, it starts from the current directory.
$ ls -R /home/repl/
/home/repl/:
backup bin course.txt people seasonal
/home/repl/backup:
/home/repl/bin:
/home/repl/people:
agarwal.txt
/home/repl/seasonal:
autumn.csv spring.csv summer.csv winter.csv
$ ls -R
.:
backup bin course.txt people seasonal
./backup:
./bin:
./people:
agarwal.txt
./seasonal:
autumn.csv spring.csv summer.csv winter.csv
How can I get help for a command? | Shell
manual: man, similar to help; usage: man head.
HEAD(1) BSD General Commands Manual HEAD(1)
NAME
head -- display first lines of a file
SYNOPSIS
head [-n count | -c bytes] [file ...]
DESCRIPTION
This filter displays the first count lines or bytes of each of
the specified files, or of the standard input if no files are
specified. If count is omitted it defaults to 10.
If more than a single file is specified, each file is preceded by
a header consisting of the string ``==> XXX <=='' where ``XXX''
is the name of the file.
SEE ALSO
tail(1)
man automatically invokes less, so keep pressing the spacebar to page through the information, and press q to quit.
Use tail to display all but the first six lines of seasonal/spring.csv:
tail -n +7 seasonal/spring.csv
The + means "start from line 7" rather than "the last 7 lines".
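A sketch contrasting head -n, tail -n and tail -n +N on a ten-line file generated with seq (made-up stand-in data in a mktemp -d directory).

```shell
work=$(mktemp -d)
cd "$work"
seq 1 10 > lines.txt     # ten numbered lines as stand-in data

head -n 3 lines.txt      # first 3 lines: 1 2 3
tail -n 2 lines.txt      # last 2 lines: 9 10
tail -n +7 lines.txt     # everything from line 7 on: 7 8 9 10
```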
How can I select columns from a file? | Shell
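In cut -f 2-5,8 -d , values.csv:
- cut selects columns;
- -f (field) lists the columns to keep; 2-5,8 means columns 2 through 5 plus column 8. Columns are numbered from 1, unlike Python's 0-based indexing;
- -d sets the delimiter, such as a tab, colon, or space (the default is a tab).
cut -d , -f 1 seasonal/spring.csv and cut -d, -f1 seasonal/spring.csv do the same thing: the order of the flags does not matter, and the space between a flag and its value is optional.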
My working:
$ pwd
/home/repl
$ ls
backup bin course.txt people seasonal
$ cd seasonal/
$ pwd
/home/repl/seasonal
$ ls
autumn.csv spring.csv summer.csv winter.csv
$ head spring.csv
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine
2017-02-24,canine
2017-02-28,wisdom
2017-03-04,incisor
2017-03-12,wisdom
2017-03-14,incisor
2017-03-21,molar
2017-04-29,wisdom
$ cut -f 1 -d, spring.csv
Date
2017-01-25
2017-02-19
2017-02-24
2017-02-28
2017-03-04
2017-03-12
2017-03-14
2017-03-21
2017-04-29
2017-05-08
2017-05-20
2017-05-21
2017-05-25
2017-06-04
2017-06-13
2017-06-14
2017-07-10
2017-07-16
2017-07-23
2017-08-13
2017-08-13
2017-08-13
2017-09-07
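A sketch of cut on a small made-up three-column CSV in a mktemp -d directory, selecting one column and then two, with the flags in different orders.

```shell
work=$(mktemp -d)
cd "$work"
# Tiny hypothetical CSV with three columns.
printf 'Date,Tooth,Count\n2017-01-25,wisdom,1\n2017-02-19,canine,2\n' > sample.csv

cut -d , -f 2 sample.csv      # just the Tooth column
cut -f 1,3 -d, sample.csv     # columns 1 and 3; flag order does not matter
```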
What can’t cut do? | Shell
However, cut does not really understand the data it splits; see the example.
Name,Age
"Johel,Ranjit",28
"Sharma,Rupinder",26
When splitting on commas, cut also splits at the comma inside "Johel,Ranjit".
cut -f 2 -d , everyone.csv
returns
Age
Ranjit"
Rupinder"
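A sketch reproducing the problem, assuming the everyone.csv contents shown above written into a mktemp -d directory: field 2 turns out to be the second half of each name, not the age.

```shell
work=$(mktemp -d)
cd "$work"
printf 'Name,Age\n"Johel,Ranjit",28\n"Sharma,Rupinder",26\n' > everyone.csv

# cut splits on every comma, including the one inside the quoted name.
cut -f 2 -d , everyone.csv
```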
How can I repeat commands? | Shell
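You can press the up-arrow key to step back through earlier commands. There are also a few other ways, shown in the example:
- ! followed by a command name re-runs the most recent command that started with that name.
- history prints the list of commands you have run; ! followed by one of those numbers re-runs that command.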
Run head summer.csv in your home directory (which should fail).
$ head summer.csv
head: cannot open 'summer.csv' for reading: No such file or directory
Change directory to seasonal.
$ cd seasonal
Re-run the head command using ! followed by the command name. Do not type any spaces between ! and what follows.
$ !head
head summer.csv
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
2017-01-21,bicuspid
2017-02-02,molar
2017-02-27,wisdom
2017-02-27,wisdom
2017-03-07,bicuspid
2017-03-15,wisdom
2017-03-20,canine
Use history to look at what you have done.
$ history
1 head summer.csv
2 cd seasonal
3 head summer.csv
4 history
Re-run head again using ! followed by a command number. Do not type any spaces between ! and what follows.
$ !1
head summer.csv
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
2017-01-21,bicuspid
2017-02-02,molar
2017-02-27,wisdom
2017-02-27,wisdom
2017-03-07,bicuspid
2017-03-15,wisdom
2017-03-20,canine
How can I select lines containing particular values? | Shell
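- grep selects the lines that contain a given pattern (a regular expression), similar to R's str_subset function. For example: grep bicuspid seasonal/winter.csv.
A summary, since this section packs in a lot: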
$ grep -c incisor seasonal/autumn.csv seasonal/winter.csv
seasonal/autumn.csv:3
seasonal/winter.csv:6
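-c counts the lines containing incisor in each of the two files separately.
With several files, each count is printed after its file name, so the file name works like an index and the count like a column.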
A neat trick: after typing seasonal/, press tab and the shell shows which files you can choose from. Very handy.
$ grep -c incisor seasonal/
autumn.csv spring.csv summer.csv winter.csv
$ grep -v -n molar seasonal/spring.csv
1:Date,Tooth
2:2017-01-25,wisdom
3:2017-02-19,canine
4:2017-02-24,canine
5:2017-02-28,wisdom
6:2017-03-04,incisor
7:2017-03-12,wisdom
8:2017-03-14,incisor
10:2017-04-29,wisdom
11:2017-05-08,canine
12:2017-05-20,canine
13:2017-05-21,canine
14:2017-05-25,canine
16:2017-06-13,bicuspid
17:2017-06-14,canine
18:2017-07-10,incisor
19:2017-07-16,bicuspid
20:2017-07-23,bicuspid
21:2017-08-13,bicuspid
22:2017-08-13,incisor
23:2017-08-13,wisdom
Here:
- -n: print line numbers for matching lines
- -v: invert the match, i.e., only show lines that don't match
Without -n, the same lines are printed without line numbers:
$ grep -v molar seasonal/autumn.csv
Date,Tooth
2017-01-05,canine
2017-01-17,wisdom
2017-01-18,canine
2017-02-22,bicuspid
2017-03-10,canine
2017-03-13,canine
2017-04-30,incisor
2017-05-02,canine
2017-05-10,canine
2017-05-19,bicuspid
2017-06-22,wisdom
2017-06-25,canine
2017-07-10,incisor
2017-07-10,wisdom
2017-07-20,incisor
2017-07-21,bicuspid
2017-08-09,canine
2017-08-16,canine
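A sketch of the grep flags from this section on two tiny made-up files (not the real course data) in a mktemp -d directory: -c for per-file counts, -v -n for numbered non-matching lines, and -h to drop the file-name prefix when searching several files.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
printf 'Date,Tooth\n2017-01-05,incisor\n2017-01-17,molar\n2017-02-01,incisor\n' > seasonal/autumn.csv
printf 'Date,Tooth\n2017-01-03,molar\n2017-01-05,incisor\n' > seasonal/winter.csv

grep -c incisor seasonal/autumn.csv seasonal/winter.csv   # file:count for each file
grep -v -n molar seasonal/autumn.csv                       # non-matching lines, numbered
grep -h incisor seasonal/autumn.csv seasonal/winter.csv    # matches without file names
```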
Why isn’t it always safe to treat data as text? | Shell
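paste joins files side by side, line by line, putting a tab between them by default. Despite the resemblance, it is not like pandas' merge: it does not match rows on a key, it simply puts line 1 next to line 1, line 2 next to line 2, and so on.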
$ paste seasonal/autumn.csv seasonal/winter.csv
Date,Tooth Date,Tooth
2017-01-05,canine 2017-01-03,bicuspid
2017-01-17,wisdom 2017-01-05,incisor
2017-01-18,canine 2017-01-21,wisdom
2017-02-01,molar 2017-02-05,molar
2017-02-22,bicuspid 2017-02-17,incisor
2017-03-10,canine 2017-02-25,bicuspid
2017-03-13,canine 2017-03-12,incisor
2017-04-30,incisor 2017-03-25,molar
2017-05-02,canine 2017-03-26,incisor
2017-05-10,canine 2017-04-04,canine
2017-05-19,bicuspid 2017-04-18,canine
2017-05-25,molar 2017-04-26,canine
2017-06-22,wisdom 2017-04-26,molar
2017-06-25,canine 2017-04-26,wisdom
2017-07-10,incisor 2017-04-27,canine
2017-07-10,wisdom 2017-05-08,molar
2017-07-20,incisor 2017-05-13,bicuspid
2017-07-21,bicuspid 2017-05-14,wisdom
2017-08-09,canine 2017-06-17,canine
2017-08-16,canine 2017-07-01,incisor
2017-07-17,canine
2017-08-10,incisor
2017-08-11,bicuspid
2017-08-11,wisdom
2017-08-13,canine
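Ugly. The files have different numbers of lines, so the last rows only contain data from winter.csv and the columns no longer line up.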
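A sketch of paste with -d , so the output stays comma-separated, on two made-up files of different lengths in a mktemp -d directory. The last line starts with a bare comma, which is exactly the misalignment described above.

```shell
work=$(mktemp -d)
cd "$work"
printf 'Date,Tooth\n2017-01-05,canine\n' > autumn.csv
printf 'Date,Tooth\n2017-01-03,bicuspid\n2017-01-05,incisor\n' > winter.csv

# -d , joins the files with a comma instead of the default tab.
paste -d , autumn.csv winter.csv
```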
How can I store a command’s output in a file? | Shell
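In head -n 5 seasonal/winter.csv > top.csv:
- head -n 5 should be familiar by now.
- > means redirect; it is handled by the shell and is not an option of head.
- top.csv is the name of the new file the output is saved to (an existing top.csv would be overwritten).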
How can I use one command’s output as the input to another command? | Shell
Use head and tail together:
head -n 5 seasonal/winter.csv > top.csv
tail -n 3 top.csv
This gets lines 3-5 of seasonal/winter.csv.
What’s a better way to use one command’s output as another command’s input? | Shell
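That approach has two problems: it creates lots of intermediate files, and the steps end up scattered through the history.
The answer is a pipe, |, which follows the same idea as the pipe in R: the output of the command on the left becomes the input of the command on the right.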
$ cut -f 2 -d, seasonal/summer.csv | grep -v Tooth
canine
wisdom
bicuspid
molar
wisdom
wisdom
bicuspid
wisdom
canine
molar
bicuspid
wisdom
canine
canine
incisor
incisor
canine
incisor
incisor
incisor
canine
canine
bicuspid
canine
Don't forget that the comma goes right after -d.
grep -v Tooth also strips out the header line. Powerful.
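A sketch comparing the two-step "lines 3-5" recipe with the equivalent pipe, using seq output as made-up stand-in data in a mktemp -d directory. Both print the same three lines; the pipe just skips the intermediate file.

```shell
work=$(mktemp -d)
cd "$work"
seq 1 10 > winter.txt   # hypothetical stand-in for seasonal/winter.csv

# Two steps with an intermediate file...
head -n 5 winter.txt > top.txt
tail -n 3 top.txt
# ...or one pipeline with no intermediate file.
head -n 5 winter.txt | tail -n 3
```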
How can I combine many commands? | Shell
Now things start getting efficient!
$ cut -f 2 -d, seasonal/autumn.csv | grep -v Tooth | head -n 1
canine
How can I count the records in a file? | Shell
- word count: wc
- number of characters (strictly, bytes): -c
- number of words: -w
- number of lines: -l
$ grep 2017-07 seasonal/spring.csv
2017-07-10,incisor
2017-07-16,bicuspid
2017-07-23,bicuspid
$ grep 2017-07 seasonal/spring.csv | wc -w
3
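No cut is needed to pick a column here; grep 2017-07 on whole lines is enough.
wc -w counts words. Each record contains no spaces and so counts as one word, which is why -w gives the same answer as -l here; wc -l is the more direct way to count records.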
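A sketch counting July records with grep piped into wc, on a made-up four-line CSV in a mktemp -d directory. The tr -d ' ' in the test only strips the padding some wc versions add.

```shell
work=$(mktemp -d)
cd "$work"
printf 'Date,Tooth\n2017-07-10,incisor\n2017-07-16,bicuspid\n2017-08-13,molar\n' > spring.csv

grep 2017-07 spring.csv | wc -l    # number of July records
grep 2017-07 spring.csv | wc -w    # same here, since each line is one "word"
```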
How can I specify many files with a single command? | Shell
$ head -n 3 seasonal/s*.csv
==> seasonal/spring.csv <==
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine
==> seasonal/summer.csv <==
Date,Tooth
2017-01-11,canine
2017-01-18,wisdom
This applies one command to two files at once.
* matches any number of characters (including none).
What other wildcards can I use? | Shell
- ? matches a single character.
- [...] matches any one of the characters inside the square brackets, so 201[78].txt matches 2017.txt or 2018.txt, but not 2016.txt.
- {...} matches any of the comma-separated patterns inside the curly brackets, so {*.txt,*.csv} matches any file whose name ends with .txt or .csv, but not files with other endings. There must be no spaces inside the braces.
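A sketch of the three wildcards, assuming bash (brace expansion is a bash feature) and some made-up empty files in a mktemp -d directory. echo shows what each pattern expands to.

```shell
work=$(mktemp -d)
cd "$work"
touch 2016.txt 2017.txt 2018.txt notes.csv report.pdf   # hypothetical file names

echo 201?.txt          # ? matches exactly one character
echo 201[78].txt       # [...] matches one of the listed characters
echo {*.txt,*.csv}     # {...} tries each comma-separated pattern; report.pdf is left out
```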
How can I sort lines of text? | Shell
Now for the equivalent of R's arrange.
cut -f 2 -d , seasonal/winter.csv | grep -v Tooth | sort -r
- sort: sort lines (alphabetically by default)
- -n: sort numerically
- -r: reverse the order
-b tells sort to ignore leading blanks, and -f tells it to fold case (i.e., be case-insensitive).
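A sketch of why -n matters: the default sort compares text character by character, so 100 lands between 10 and 9. The three numbers are made up for illustration.

```shell
# Alphabetical (the default) versus numeric ordering.
printf '10\n9\n100\n' | sort        # 10, 100, 9: compared as text
printf '10\n9\n100\n' | sort -n     # 9, 10, 100: compared as numbers
printf '10\n9\n100\n' | sort -n -r  # 100, 10, 9: numeric, reversed
```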
How can I remove duplicate lines? | Shell
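Remove duplicates: uniq. It only removes adjacent duplicate lines, which is why its input is usually sorted first; in exchange, it only has to keep the most recent unique line in memory.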
$ cut -d , -f 2 seasonal/* | grep -v Tooth | sort | uniq -c
15 bicuspid
31 canine
18 incisor
11 molar
17 wisdom
$ cut -d , -f 2 seasonal/* | grep -v Tooth | sort | uniq
bicuspid
canine
incisor
molar
wisdom
-c prefixes each line with the number of times it occurred.
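A sketch showing that uniq only collapses neighbouring duplicates, which is why the pipelines above sort first. The three tooth names are made up.

```shell
# uniq only collapses duplicates that sit next to each other.
printf 'canine\nmolar\ncanine\n' | uniq              # canine, molar, canine
printf 'canine\nmolar\ncanine\n' | sort | uniq       # canine, molar
printf 'canine\nmolar\ncanine\n' | sort | uniq -c    # 2 canine, 1 molar
```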
Wrapping up | Shell
$ wc seasonal/
autumn.csv spring.csv summer.csv winter.csv
$ wc seasonal/*
21 21 378 seasonal/autumn.csv
24 24 434 seasonal/spring.csv
25 25 454 seasonal/summer.csv
26 26 471 seasonal/winter.csv
96 96 1737 total
$ wc seasonal/* | grep -v total | sort -n | head -n 1
21 21 378 seasonal/autumn.csv
How does the shell store information? | Shell
Some of the environment variables the shell keeps:
Variable  Purpose                            Value
HOME      User's home directory              /home/repl
PWD       Present working directory          Same as pwd command
SHELL     Which shell program is being used  /bin/bash
USER      User's ID                          repl
set lists all of them, but the output is messy.
$ set | grep HISTFILESIZE
HISTFILESIZE=2000
so it works well combined with grep.
How can I print a variable’s value? | Shell
echo prints its arguments, similar to Python's print.
$ echo $OSTYPE
linux-gnu
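Putting $ in front of a variable's name gets its value; without the $, echo OSTYPE would just print the word OSTYPE.
OSTYPE holds the type of operating system.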
How else does the shell store information? | Shell
$ testing=seasonal/winter.csv
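$ head -n 1 $testing
Assigning a variable just uses =, as in Python, except that there must be no spaces around the =. Reading it back needs the $; head -n 1 testing would look for a file literally called testing.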
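A sketch of setting and using a shell variable, in a mktemp -d directory with a made-up one-row winter.csv. The last line deliberately leaves out the $ to show the error.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
printf 'Date,Tooth\n2017-01-03,bicuspid\n' > seasonal/winter.csv

testing=seasonal/winter.csv        # no spaces around =
head -n 1 $testing                 # with $: reads seasonal/winter.csv
head -n 1 testing 2>&1 || true     # without $: no file literally named "testing"
```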
How can I repeat a command many times? | Shell
$ for suffix in gif jpg png; do echo $suffix; done
gif
jpg
png
How can I repeat a command once for each file? | Shell
$ for x in seasonal/*.csv; do echo $x; done
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv
How can I record the names of a set of files? | Shell
$ files=seasonal/*.csv
$ for f in $files; do echo $f; done
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv
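Without the $:
files=seasonal/*.csv
for f in files; do echo $f; done
prints only the single word files, because the loop then iterates over the literal word instead of the variable's value.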
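A sketch of the $files versus files difference, using two made-up empty CSV files in a mktemp -d directory. The glob stored in files is only expanded when $files is used unquoted.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
touch seasonal/autumn.csv seasonal/spring.csv   # hypothetical empty files

files=seasonal/*.csv
for f in $files; do echo $f; done   # $files expands, then the glob matches both files
for f in files; do echo $f; done    # just the literal word "files"
```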
$ for f in seasonal/*.csv; do echo $f; head -n 2 $f | tail -n 1; done
seasonal/autumn.csv
2017-01-05,canine
seasonal/spring.csv
2017-01-25,wisdom
seasonal/summer.csv
2017-01-11,canine
seasonal/winter.csv
2017-01-03,bicuspid
Leaving out the semicolon after echo $f changes the meaning: echo then produces one line that includes the filename twice, which tail then copies.
$ for f in seasonal/*.csv; do echo $f head -n 2 $f | tail -n 1; done
seasonal/autumn.csv head -n 2 seasonal/autumn.csv
seasonal/spring.csv head -n 2 seasonal/spring.csv
seasonal/summer.csv head -n 2 seasonal/summer.csv
seasonal/winter.csv head -n 2 seasonal/winter.csv
How can I run many commands in a single loop? | Shell
$ for file in seasonal/*.csv; do echo $file; grep -h 2017-07 $file; done
seasonal/autumn.csv
2017-07-10,incisor
2017-07-10,wisdom
2017-07-20,incisor
2017-07-21,bicuspid
seasonal/spring.csv
2017-07-10,incisor
2017-07-16,bicuspid
2017-07-23,bicuspid
seasonal/summer.csv
2017-07-25,canine
seasonal/winter.csv
2017-07-01,incisor
2017-07-17,canine
$ for file in seasonal/*.csv; do grep 2017-07 $file; done
2017-07-10,incisor
2017-07-10,wisdom
2017-07-20,incisor
2017-07-21,bicuspid
2017-07-10,incisor
2017-07-16,bicuspid
2017-07-23,bicuspid
2017-07-25,canine
2017-07-01,incisor
2017-07-17,canine
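The two loops differ: the first also echoes each file name before its matches, so you can tell which file each line came from, while the second prints only the matching lines. (grep prints no file name when given a single file, so -h makes no difference here.)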
Why shouldn’t I use spaces in filenames? | Shell
mv 'July 2017.csv' '2017 July data.csv'
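Put quotes around names that contain spaces; otherwise the shell splits them at the spaces and treats each piece as a separate file name.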
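A sketch of renaming a file whose name contains a space, in a mktemp -d directory. Unquoted, mv would see five separate arguments (July, 2017.csv, 2017, July, data.csv) and fail because data.csv is not a directory.

```shell
work=$(mktemp -d)
cd "$work"
touch 'July 2017.csv'

# Quoted: each name is a single argument, so this is a plain rename.
mv 'July 2017.csv' '2017 July data.csv'
ls
```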
How can I edit a file? | Shell
nano names.txt
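opens the file for editing, or creates it if it does not exist.
On Windows, use copy to create an empty file instead:
copy NUL name.Rmd
- Ctrl-K: delete a line.
- Ctrl-U: un-delete a line.
- Ctrl-O: save the file ('O' stands for 'output'), then press Enter to confirm.
- Ctrl-X: exit the editor.
I'll set the shell aside for now: the terminal is too slow on my home connection to keep up with the exercises, and I'm not sure what is going on with the code. Moving on; I'll come back and work through it properly later.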
How can I record what I just did? | Shell
- Run history.
- Pipe its output to tail -n 10 (or however many recent steps you want to save).
- Redirect that to a file called something like figure-5.history.
Here figure-5.history is just the name of the output file.
This is better than writing things down in a lab notebook because it is guaranteed not to miss any steps. It also illustrates the central idea of the shell: simple tools that produce and consume lines of text can be combined in a wide variety of ways to solve a broad range of problems.
$ cp seasonal/spring.csv seasonal/summer.csv ~/
$ grep -h -v Tooth spring.csv summer.csv
.bash_logout .profile bin/ people/ spring.csv
.bashrc backup/ course.txt seasonal/ summer.csv
$ grep -h -v Tooth spring.csv summer.csv > temp.csv
$ history | tail -n 3 > steps.txt
Here ~/ is the home directory.
How can I save commands to re-run later? | Shell
Save the command head -n 1 seasonal/*.csv in a file called headers.sh; running bash headers.sh then executes it. You can also use nano to create or edit the file.
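A sketch of saving a command to a script and running it with bash. It writes headers.sh with echo instead of nano and uses two made-up CSV files in a mktemp -d directory; single quotes keep the * from being expanded while the file is written.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
printf 'Date,Tooth\n2017-01-05,canine\n' > seasonal/autumn.csv
printf 'Date,Tooth\n2017-01-25,wisdom\n' > seasonal/spring.csv

# Write the command into a script file, then run it.
echo 'head -n 1 seasonal/*.csv' > headers.sh
bash headers.sh
```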
How can I re-use pipes? | Shell
bash all-dates.sh > dates.out saves the script's output to a new file.
$ nano teeth.sh
$ bash teeth.sh > teeth.out
$ cat teeth.out
15 bicuspid
31 canine
18 incisor
11 molar
17 wisdom
cat is used here just to look at the result.
How can I pass filenames to scripts? | Shell
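Put simply: how do you run a script containing commands you have written on whichever files you choose?
$@ is a placeholder for all the command-line arguments given to the script.
If a file unique-lines.sh contains the command
sort $@ | uniq
then running
bash unique-lines.sh seasonal/summer.csv
is the same as running
sort seasonal/summer.csv | uniq
An example: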
$ bash count-records.sh seasonal/*.csv > num-records.out
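A sketch of the unique-lines.sh idea above, with a made-up teeth.txt in a mktemp -d directory. The single quotes keep $@ literal in the file, so it is only filled in when the script runs ("$@" in double quotes would also cope with file names containing spaces).

```shell
work=$(mktemp -d)
cd "$work"
printf 'canine\nmolar\ncanine\n' > teeth.txt   # hypothetical input

# $@ stands for every argument passed to the script.
echo 'sort $@ | uniq' > unique-lines.sh
bash unique-lines.sh teeth.txt
```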
How can I process a single argument? | Shell
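$1, $2, and so on are like $@, but each refers to a single argument: $1 is the first, $2 the second.
Create a file column.sh containing the command
cut -d , -f $2 $1
and then run the script with
bash column.sh seasonal/autumn.csv 1
The script may use $2 before $1; what counts is the order of the arguments on the command line (file name first, then column number).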
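A sketch of column.sh as described above, run on a made-up one-row autumn.csv in a mktemp -d directory.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
printf 'Date,Tooth\n2017-01-05,canine\n' > seasonal/autumn.csv

# $1 is the file, $2 the column number, even though $2 appears first in the command.
echo 'cut -d , -f $2 $1' > column.sh
bash column.sh seasonal/autumn.csv 1   # Date column
bash column.sh seasonal/autumn.csv 2   # Tooth column
```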
How can one shell script do many things? | Shell
Just put several lines of commands in the script, using $@ in place of the file names.
How can I write loops in a shell script? | Shell
# Print the first and last data records of each file.
for filename in $@
do
    head -n 2 $filename | tail -n 1
    tail -n 1 $filename
done
The loop can be written on one line with semicolons, or over several lines with indentation as above.
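A sketch that saves the loop above as a script and runs it on two made-up CSV files in a mktemp -d directory. The script name first-last.sh is hypothetical; the quoted heredoc ('EOF') keeps $@ and $filename literal so they are expanded only when the script runs.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
printf 'Date,Tooth\n2017-01-05,canine\n2017-08-16,molar\n' > seasonal/autumn.csv
printf 'Date,Tooth\n2017-01-25,wisdom\n2017-09-07,incisor\n' > seasonal/spring.csv

cat > first-last.sh <<'EOF'
# Print the first and last data records of each file.
for filename in $@
do
    head -n 2 $filename | tail -n 1
    tail -n 1 $filename
done
EOF
bash first-last.sh seasonal/*.csv
```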
How can I stop a running program? | Shell
Ctrl-C (shown as ^C) cancels a running program.