What is AWK Command in Linux? Explain with Examples
Introduction
AWK is a highly valuable tool for Linux users and system administrators, enabling quick data manipulation and analysis from the command line or within scripts. The tool reads input line by line, divides each line into fields based on a delimiter (whitespace by default), and lets you act on those fields through pairs of patterns and actions.
In this article, we’ll explore the awk command in Linux: its syntax, how it works, and the patterns, variables, and actions it uses to manipulate and analyze text data in files or streams.
AWK Command Syntax
Let’s have a look at the basic syntax of the awk command in Linux below:
awk 'pattern { action }' filename
The “pattern” acts as a filter: awk runs the action only on lines that match it. If the pattern is omitted, the action runs on every line. The “{ action }” part is the instruction awk carries out on each matching line, such as printing fields or performing a calculation. If the action is omitted, awk prints the matching line unchanged. The last term, “filename,” names the file to process; when no file is given, awk reads from standard input.
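The three forms of the syntax can be sketched as follows. The file people.txt and its contents are hypothetical and exist only for this illustration; the results are stored in shell variables so they can be printed side by side.

```shell
# Hypothetical sample file: a name and an age on each line.
workdir=$(mktemp -d)
printf 'alice 30\nbob 25\ncarol 41\n' > "$workdir/people.txt"

# Pattern and action: print the first field of lines whose second field exceeds 28.
with_both=$(awk '$2 > 28 { print $1 }' "$workdir/people.txt")

# Action only: with no pattern, the action runs on every line.
action_only=$(awk '{ print $1 }' "$workdir/people.txt")

# Pattern only: with no action, matching lines are printed whole.
pattern_only=$(awk '$2 > 28' "$workdir/people.txt")

printf '%s\n---\n%s\n---\n%s\n' "$with_both" "$action_only" "$pattern_only"
rm -rf "$workdir"
```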
How Does the AWK Command Work?
AWK is like a detective that reads lines of text one by one and searches for specific patterns that you instruct it to find. Once it finds something that matches, it performs the actions you have specified. For example, if you want to find lines with the word “error” and display only those lines, AWK can do that for you. It can even add up numbers in a column if you need it to.
This tool is particularly useful when you need to analyze messy text files, such as logs, and extract only the relevant information. It’s like having an intelligent assistant that can sort through a mountain of words and pinpoint exactly what you’re searching for. This makes working with large amounts of data a lot easier and more comprehensible.
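As a rough sketch of the two tasks mentioned above, the following filters lines containing “error” and adds up a numeric column. The log file app.log and its layout (service, message, duration in milliseconds) are assumptions made up for this example.

```shell
workdir=$(mktemp -d)
# Hypothetical log: service, two message words, duration in milliseconds.
cat > "$workdir/app.log" <<'EOF'
web request ok 120
db error timeout 300
web error refused 45
cache request ok 10
EOF

# Keep only the lines that contain the word "error".
error_lines=$(awk '/error/' "$workdir/app.log")

# Add up the fourth column; the END block prints the total after the last line.
total_ms=$(awk '{ sum += $4 } END { print sum }' "$workdir/app.log")

printf '%s\n' "$error_lines"
printf 'total: %s ms\n' "$total_ms"
rm -rf "$workdir"
```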
AWK Operations
As a user, you can easily perform a variety of operations on a specified file through the awk command in Linux. Take a look at some of them below.
In AWK, printing text or fields from input is the most basic action. It can perform mathematical operations like addition, subtraction, multiplication, etc., on numbers found in the data. AWK executes actions based on specific conditions, such as the appearance of a certain word or meeting a specific criterion for a number. Additionally, it can extract specific parts of data, rearrange it, and format it differently.
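The operations listed above might look like this in practice. The file orders.txt (item, quantity, unit price) is a hypothetical example, not something the awk command requires.

```shell
workdir=$(mktemp -d)
# Hypothetical order list: item, quantity, unit price.
printf 'pen 3 1.50\nbook 2 12.00\ncup 10 0.80\n' > "$workdir/orders.txt"

# Arithmetic and formatting: quantity times price, shown with two decimals.
line_totals=$(awk '{ printf "%s %.2f\n", $1, $2 * $3 }' "$workdir/orders.txt")

# Conditional action: only items ordered three or more times.
bulk_items=$(awk '$2 >= 3 { print $1 }' "$workdir/orders.txt")

# Rearranging: quantity first, then the item name.
swapped=$(awk '{ print $2, $1 }' "$workdir/orders.txt")

printf '%s\n---\n%s\n---\n%s\n' "$line_totals" "$bulk_items" "$swapped"
rm -rf "$workdir"
```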
AWK Statements
The awk command runs small programs, which are composed of statements. These statements tell awk how to process the data or text within a particular file. Besides simple statements such as print, awk supports control flow statements such as if, else, while, and for, which decide what the program does next based on a condition. In simpler terms, they direct how a program moves through different parts of its code. For a clearer understanding, see the Linux awk command example below.
awk -F ',' '{if($2==$3){print $1","$2","$3} else {print "No Duplicates"}}' answers.txt
To execute the above command, you need to provide input from the file named answers.txt. This file should have multiple lines, each containing three fields separated by commas. The command verifies whether the second and third fields of each line are identical. If they match, the command prints all three fields separated by commas. On the other hand, if they do not match, the command prints “No Duplicates” for that particular line.
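To try the command, a small answers.txt can be created first. The three sample lines below are invented for illustration; any file with three comma-separated fields per line would do.

```shell
workdir=$(mktemp -d)
# Hypothetical input: a question id followed by two answers.
printf 'q1,yes,yes\nq2,no,yes\nq3,maybe,maybe\n' > "$workdir/answers.txt"

# Same command as in the article, run against the sample file.
result=$(awk -F ',' '{if($2==$3){print $1","$2","$3} else {print "No Duplicates"}}' "$workdir/answers.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Lines q1 and q3 have matching second and third fields and are printed in full, while q2 produces the “No Duplicates” message.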
AWK Patterns
AWK patterns act as filters or conditions that determine which lines of data or text the AWK command should operate on. These patterns essentially specify the criteria for selecting specific portions of the input for further action. The pattern types below can be used to narrow down which lines a command processes.
Regular Expression Patterns
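A regular expression pattern is written between slashes, as in /error/, and matches any line containing text that fits the expression. Because a regular expression can describe anything from a fixed word to a complex text shape, this is the most flexible pattern type for searching and editing.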
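A minimal sketch of regular expression patterns, assuming a hypothetical access.txt with a host name, a status code, and a path on each line:

```shell
workdir=$(mktemp -d)
# Hypothetical access list: host, status code, path.
printf 'web01 200 /index.html\ndb01 500 /query\nweb02 404 /missing\n' > "$workdir/access.txt"

# Match against the whole line: lines that start with "web".
web_lines=$(awk '/^web/' "$workdir/access.txt")

# Match against one field with the ~ operator: status codes starting with 4 or 5.
failing_hosts=$(awk '$2 ~ /^[45]/ { print $1 }' "$workdir/access.txt")

printf '%s\n---\n%s\n' "$web_lines" "$failing_hosts"
rm -rf "$workdir"
```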
Relational Expression Patterns
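A relational expression pattern compares a field or variable with a value using operators such as == (equal to), != (not equal to), > (greater than), and <= (less than or equal to). The action runs only on lines where the comparison is true.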
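The comparisons can be sketched like this, using a made-up scores.txt (student, subject, score) as input:

```shell
workdir=$(mktemp -d)
# Hypothetical score sheet: student, subject, score.
printf 'ann math 72\nben art 48\ncy math 50\n' > "$workdir/scores.txt"

# Numeric comparison: students with a score of 50 or more.
passed=$(awk '$3 >= 50 { print $1 }' "$workdir/scores.txt")

# String comparison: rows whose subject is exactly "math".
math_rows=$(awk '$2 == "math"' "$workdir/scores.txt")

# Inequality: students whose subject is anything other than "math".
other_subjects=$(awk '$2 != "math" { print $1 }' "$workdir/scores.txt")

printf '%s\n---\n%s\n---\n%s\n' "$passed" "$math_rows" "$other_subjects"
rm -rf "$workdir"
```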
Range Patterns
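A range pattern consists of two patterns separated by a comma. It selects every line from the one that matches the first pattern through the next one that matches the second. To select lines by number, compare against the built-in line counter NR, for example NR==2, NR==4.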
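Both styles of range can be illustrated with a short sketch. The file notes.txt and its START and END marker lines are assumptions chosen for the example.

```shell
workdir=$(mktemp -d)
# Hypothetical file with marker lines around the part of interest.
printf 'header\nSTART\nline a\nline b\nEND\nfooter\n' > "$workdir/notes.txt"

# Range between two regular expressions, including both marker lines.
between_markers=$(awk '/START/,/END/' "$workdir/notes.txt")

# Range by line number: NR holds the number of the current line.
lines_2_to_4=$(awk 'NR==2, NR==4' "$workdir/notes.txt")

printf '%s\n---\n%s\n' "$between_markers" "$lines_2_to_4"
rm -rf "$workdir"
```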
Special Expression Patterns
The two standard special patterns are BEGIN and END, which must be written in uppercase. The BEGIN pattern runs its action once before any input is read, and the END pattern runs its action once after all input has been processed.
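A small sketch of how BEGIN and END wrap the per-line processing. The input lines and the header text are invented for the example.

```shell
# Three hypothetical "item quantity" lines are piped straight into awk.
report=$(printf 'a 1\nb 2\nc 3\n' | awk '
BEGIN { print "item qty" }      # runs once, before the first line is read
      { total += $2; print }    # runs for every input line
END   { print "total", total }  # runs once, after the last line
')

printf '%s\n' "$report"
```

The header comes from BEGIN, the three data lines are echoed by the middle rule, and END prints the accumulated total.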
Combining Patterns
Sometimes, you need a more complex match that combines several patterns. Patterns can be combined with the logical operators && (and), || (or), and ! (not).
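The three logical operators might be used as follows. The log file and its four-column layout are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical log: service, two message words, duration in milliseconds.
cat > "$workdir/app.log" <<'EOF'
web request ok 120
db error timeout 300
web error refused 45
cache request ok 10
EOF

# AND: lines that contain "error" and took longer than 100 ms.
slow_errors=$(awk '/error/ && $4 > 100' "$workdir/app.log")

# OR: lines whose service is either "db" or "cache".
db_or_cache=$(awk '$1 == "db" || $1 == "cache" { print $1 }' "$workdir/app.log")

# NOT: services on lines that do not contain "error".
healthy=$(awk '!/error/ { print $1 }' "$workdir/app.log")

printf '%s\n---\n%s\n---\n%s\n' "$slow_errors" "$db_or_cache" "$healthy"
rm -rf "$workdir"
```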
AWK Variables
Variables let awk organize, manipulate, and analyze data at a finer level by breaking each line of text into smaller segments. The built-in field variables of the Linux awk command give direct access to individual parts of a line. Take a look at the following variables to understand their role in the command.
- $0: This variable represents the entire current line.
- $1, $2, $3: In AWK, we can think of the data that we are processing as a cabinet with drawers. Each $ symbol, followed by a number, represents a different drawer (field) where AWK stores a specific part of the line. For instance, $1 refers to the first drawer, which stores the first piece of information; $2 points to the second drawer, where the next bit is kept; and so on.
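The field variables can be tried on a single line of text. The sample strings below are arbitrary, and the -F option in the last command sets a colon as the delimiter instead of the default whitespace.

```shell
line='alpha beta gamma'

# $0 is the whole line; $1 and $3 are the first and third fields.
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')
first=$(printf '%s\n' "$line" | awk '{ print $1 }')
third=$(printf '%s\n' "$line" | awk '{ print $3 }')

# With -F ':' the same variables refer to colon-separated fields.
user=$(printf 'root:x:0:0\n' | awk -F ':' '{ print $1 }')

printf '%s\n%s\n%s\n%s\n' "$whole" "$first" "$third" "$user"
```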
AWK Actions
Up until now, we have discussed how to select specific segments of data using variables and patterns. In this section, we look at actions, which define what awk does with the lines a pattern selects. Actions enable precise manipulation, analysis, and formatting of information within a file or data stream. For instance, the following command prints the second and fourth fields of each line in a file.
awk '{print $2, $4}' file.txt
Here, ‘{print $2, $4}’ displays the second and fourth fields of each line that AWK processes. The comma between the two fields tells print to separate them with a space in the output.
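A quick way to see this action at work is to run it on a small sample. The file contents below are hypothetical; the second command drops the comma to show how the output changes.

```shell
workdir=$(mktemp -d)
# Hypothetical input with four whitespace-separated fields per line.
printf 'a b c d\n1 2 3 4\n' > "$workdir/file.txt"

# With a comma, print separates the two fields with a space.
with_comma=$(awk '{print $2, $4}' "$workdir/file.txt")

# Without a comma, the two fields are joined with nothing between them.
without_comma=$(awk '{print $2 $4}' "$workdir/file.txt")

printf '%s\n---\n%s\n' "$with_comma" "$without_comma"
rm -rf "$workdir"
```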
Conclusion
The AWK command is a powerful tool for text processing in Linux. It allows users to manipulate and analyze data quickly and effectively. The command works by breaking down input text into fields, which enables specific actions to be performed based on patterns and conditions. The syntax of the AWK command in Linux includes patterns, actions, and filenames, which makes data handling and manipulation precise. Patterns act as filters, specifying criteria for selecting data, while actions dictate operations to perform on the selected data. With a range of built-in variables and pattern types, AWK offers a versatile approach to text manipulation, from simple printing to complex data analysis. Its ability to process and transform text in a structured and controlled manner makes it invaluable for tasks involving data extraction, formatting, and analysis within the Linux environment.