pandas provides several ways to select multiple columns from a dataframe. This answer covers three commonly used ones: bracket notation, the loc[] indexer, and the iloc[] indexer.
Method 1: Using Bracket Notation
The simplest and most common way to select multiple columns in a pandas dataframe is bracket notation: pass a list of column names inside the brackets to select those specific columns.
Here’s an example:
import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Mike', 'Emily'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)

# Select multiple columns using bracket notation
selected_columns = df[['Name', 'Age', 'Salary']]
print(selected_columns)
Output:
    Name  Age  Salary
0   John   25   50000
1   Jane   30   60000
2   Mike   35   70000
3  Emily   40   80000
In the above example, we created a dataframe with four columns: ‘Name’, ‘Age’, ‘City’, and ‘Salary’. We then used bracket notation to select the ‘Name’, ‘Age’, and ‘Salary’ columns. The resulting dataframe, selected_columns, contains only those three columns.
Method 2: Using the loc[] Indexer
Another way to select multiple columns in a pandas dataframe is the loc[] indexer. Although it is often called a method, loc[] is used with square brackets and selects rows and columns by label.
Here’s an example:
import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Mike', 'Emily'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)

# Select multiple columns using loc[]
selected_columns = df.loc[:, ['Name', 'Age', 'Salary']]
print(selected_columns)
Output:
    Name  Age  Salary
0   John   25   50000
1   Jane   30   60000
2   Mike   35   70000
3  Emily   40   80000
In the above example, we used loc[] with a bare : in the row position to select all rows, and the list ['Name', 'Age', 'Salary'] in the column position to select the desired columns. The resulting dataframe, selected_columns, contains only those columns.
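Because loc[] works with labels, it also accepts a label slice for adjacent columns. The snippet below is a minimal sketch using the same sample data as the examples above; the variable name name_to_city is only illustrative. Unlike positional slices, a label slice includes both endpoints.

```python
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mike', 'Emily'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Label slice: every column from 'Name' through 'City', both ends included
name_to_city = df.loc[:, 'Name':'City']
print(list(name_to_city.columns))  # ['Name', 'Age', 'City']
```

This relies on the columns being in the order shown; if the dataframe's column order changes, the slice covers a different set of columns.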
Method 3: Using the iloc[] Indexer
The iloc[] indexer is similar to loc[], but instead of labels it uses zero-based integer positions to select rows and columns. You can pass a list of column positions to select multiple columns in a pandas dataframe.
Here’s an example:
import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Mike', 'Emily'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)

# Select multiple columns using iloc[]
selected_columns = df.iloc[:, [0, 1, 3]]
print(selected_columns)
Output:
    Name  Age  Salary
0   John   25   50000
1   Jane   30   60000
2   Mike   35   70000
3  Emily   40   80000
In the above example, we used iloc[] with a bare : in the row position to select all rows, and the list [0, 1, 3] in the column position to select the columns at positions 0, 1, and 3. The resulting dataframe, selected_columns, contains only those columns.
Best Practices and Additional Tips
– When selecting multiple columns using bracket notation or loc[], make sure to pass the column names as a list.
– The order of the columns in the selected dataframe will be the same as the order of the column names in the list.
– If you want to select consecutive columns, you can use a : slice of column positions with iloc[]. For example, df.iloc[:, 0:3] will select columns 0, 1, and 2; the end position is excluded.
– If you want to select all columns except a few, you can use the drop() method. For example, df.drop(['Column1', 'Column2'], axis=1) returns a new dataframe without columns ‘Column1’ and ‘Column2’; by default the original dataframe is left unchanged.
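The tips above can be tried out in a few lines. This is a minimal sketch that reuses the sample data from the earlier examples; the variable names reordered, first_three, and without_city are only illustrative.

```python
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mike', 'Emily'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Column order follows the list you pass, not the original dataframe
reordered = df[['Salary', 'Name']]
print(list(reordered.columns))    # ['Salary', 'Name']

# Positional slice: columns 0, 1, and 2 (position 3 is excluded)
first_three = df.iloc[:, 0:3]
print(list(first_three.columns))  # ['Name', 'Age', 'City']

# Everything except 'City'; drop() returns a new dataframe
without_city = df.drop(['City'], axis=1)
print(list(without_city.columns)) # ['Name', 'Age', 'Salary']

# The original dataframe still has all four columns
print(list(df.columns))           # ['Name', 'Age', 'City', 'Salary']
```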
These are some of the commonly used methods for selecting multiple columns in a pandas dataframe. You can choose the method that best suits your needs and coding style.