Extracting Strings with grep and Regular Expressions

Posted on Oct 8, 2021 in Unix-like Command-Line Tutorials by Amo Chen ‐ 2 min read

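grep is an evergreen, practical command. Whenever you need to find the lines that contain a specific string in a large volume of logs or data, grep is the tool to reach for.

However, grep prints the whole matching line. To extract only the matched string, you need to combine it with a few regular expression techniques.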

Environment

Ubuntu
grep 3.1

Test data

The test data used in this article is stored in a file named test.txt with the following contents:

abc [123] abc
def [def] def
ghi [789] ghi

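Extracting strings

Using the test data above, suppose we want to find the lines that have digits inside square brackets. We might search with the regular expression \[[0-9]*\], for example: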

$ grep "\[[0-9]*\]" test.txt
abc [123] abc
ghi [789] ghi

The output shows that only the lines with digits inside the brackets are listed.
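
One caveat about the pattern: * means "zero or more", so \[[0-9]*\] would also match an empty pair of brackets. The following is a minimal sketch of that behavior, using a made-up file named brackets.txt (an assumption for illustration, not part of the article's test data), together with a portable way to require at least one digit:

```shell
# Hypothetical sample input: one empty pair of brackets, one with digits.
printf 'x [] x\ny [42] y\n' > brackets.txt

# [0-9]* allows zero digits, so both lines match.
grep "\[[0-9]*\]" brackets.txt

# [0-9][0-9]* requires at least one digit, so only "y [42] y" matches.
grep "\[[0-9][0-9]*\]" brackets.txt
```

With the article's test.txt the two patterns give the same result, because it contains no empty brackets.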

But what if we want to show only the bracketed number itself? Add the -o option to print only the matched part:

$ grep -o "\[[0-9]*\]" test.txt
[123]
[789]

-o , --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
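
The phrase "each such part on a separate output line" matters when a single line contains several matches. A small sketch with a made-up input line (not taken from test.txt):

```shell
# Hypothetical one-line input containing two bracketed numbers.
printf 'a [1] b [22] c\n' | grep -o "\[[0-9]*\]"
# Each match is printed on its own line:
# [1]
# [22]
```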

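What if we want to go one step further and show only the digits inside the brackets?

This is where some more advanced regular expression features come in!

Before we start, note that grep supports three regular expression syntaxes: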

grep understands three different versions of regular expression syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE).

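The three syntaxes differ from one another. This article uses PCRE for its examples, so grep -P in the following examples means we are using PCRE regular expressions.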
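
To give a rough feel for the differences, here is a sketch that writes "a bracketed run of one or more digits" in each syntax. It assumes GNU grep (\+ in basic syntax is a GNU extension, and -P is only available when grep was built with PCRE support) and recreates the article's test.txt so it can be run on its own:

```shell
# Recreate the article's test data so the commands can be run as-is.
printf 'abc [123] abc\ndef [def] def\nghi [789] ghi\n' > test.txt

# BRE (the default): "one or more" is written \+ (GNU extension).
grep -o "\[[0-9]\+\]" test.txt

# ERE (-E): + is a quantifier without the backslash.
grep -Eo "\[[0-9]+\]" test.txt

# PCRE (-P): adds Perl-style shorthands such as \d for a digit.
grep -Po "\[\d+\]" test.txt
```

All three commands print [123] and [789] for this data; only the way the pattern is spelled changes.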

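PCRE provides the (?= syntax (a lookahead assertion), which lets you require that a specific string follows the match without including that string in the result. For example, \[[0-9]*\] matches digits enclosed in square brackets, such as [123]. If the closing-bracket part \] is changed to (?=]), the pattern still finds the same strings, but the final match no longer includes the closing bracket. Running the following command makes this clearer: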

Lookahead assertions start with (?= for positive assertions and (?! for negative assertions. For example, \w+(?=;) matches a word followed by a semicolon, but does not include the semicolon in the match

$ grep -Po "\[[0-9]*(?=])" test.txt
[123
[789

The output shows that the closing bracket has been dropped.
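
The \w+(?=;) pattern from the PCRE documentation quoted above can be tried the same way. A minimal sketch with a made-up input line:

```shell
# Hypothetical input: "foo" and "baz" are followed by a semicolon, "bar" is not.
printf 'foo; bar baz;\n' | grep -Po '\w+(?=;)'
# Only the words directly followed by ";" are printed, without the ";":
# foo
# baz
```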

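Next, let's remove the opening bracket!

PCRE provides the \K syntax, which lets the regex engine change where the reported match starts. For example, \[\K means that once the opening bracket has been matched, only the text after it (excluding the bracket itself) is kept in the match result, so the outcome looks as if the opening bracket had been removed. Running the following command makes this clearer: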

\K tells the engine to pretend that the match attempt started at this position.

$ grep -Po "\[\K[0-9]*\]" test.txt
123]
789]

The output shows that the opening bracket has been dropped.
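
The same idea works for any fixed prefix, not just a bracket. A small sketch using a made-up key=value line (the field names are assumptions for illustration):

```shell
# Hypothetical log-style input; "id=" must be present but is not part of the output.
printf 'user=alice id=42\n' | grep -Po 'id=\K[0-9]+'
# 42
```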

Finally, combining the two techniques above gives us the string extraction we want:

$ grep -Po "\[\K[0-9]*(?=])" test.txt
123
789
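
As an aside, the opening bracket can also be excluded with a lookbehind assertion, (?<=, instead of \K. This is a sketch of an alternative spelling, not the approach the article uses. It assumes grep -P is available and recreates test.txt so it can be run on its own:

```shell
# Recreate the article's test data so the command can be run as-is.
printf 'abc [123] abc\ndef [def] def\nghi [789] ghi\n' > test.txt

# (?<=\[) requires "[" just before the match; (?=]) requires "]" just after it.
grep -Po "(?<=\[)[0-9]*(?=])" test.txt
# 123
# 789
```

One practical difference: a PCRE lookbehind generally has to match a fixed-length string, whereas the part before \K can be any pattern.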

That is how to extract strings using nothing but grep and regular expressions!

Happy coding!

References

https://stackoverflow.com/questions/33573920/what-does-k-mean-in-this-regex

https://stackoverflow.com/questions/16675179/how-to-use-sed-to-extract-substring

https://www.pcre.org/original/doc/html/pcrepattern.html
