How to Select Multiple Columns in a Pandas Dataframe

By squashlabs, Last Updated: October 17, 2023

The pandas library in Python offers several ways to select multiple columns from a dataframe. This article covers three commonly used ones: bracket notation, label-based selection with loc[], and position-based selection with iloc[].

Method 1: Using Bracket Notation

Bracket notation is one of the simplest and most common ways to select multiple columns in a pandas dataframe. Pass a list of column names inside the brackets, and pandas returns a dataframe containing only those columns.

Here’s an example:

import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Mike', 'Emily'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Salary': [50000, 60000, 70000, 80000]}

df = pd.DataFrame(data)

# Select multiple columns using bracket notation
selected_columns = df[['Name', 'Age', 'Salary']]

print(selected_columns)

Output:

    Name  Age  Salary
0   John   25   50000
1   Jane   30   60000
2   Mike   35   70000
3  Emily   40   80000

In the above example, we created a dataframe with four columns: ‘Name’, ‘Age’, ‘City’, and ‘Salary’. We then used the bracket notation to select the ‘Name’, ‘Age’, and ‘Salary’ columns. The resulting dataframe selected_columns contains only those selected columns.
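A detail worth checking with bracket notation is what type comes back. The sketch below reuses the sample data from the example above; the variable names age_series, age_frame, and reordered are illustrative only. It shows that a single label returns a Series, while a list (even a one-element list) returns a dataframe, and that the list controls the column order.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mike', 'Emily'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# A single label returns a Series
age_series = df['Age']

# A list of labels returns a DataFrame, even with one element
age_frame = df[['Age']]

# Columns come back in the order given in the list
reordered = df[['Salary', 'Name']]

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(list(reordered.columns))    # ['Salary', 'Name']
```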

Method 2: Using the loc[] Indexer

Another way to select multiple columns in a pandas dataframe is the loc[] indexer, which selects rows and columns by label. Strictly speaking, loc is an indexer used with square brackets rather than a method called with parentheses.

Here’s an example:

import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Mike', 'Emily'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Salary': [50000, 60000, 70000, 80000]}

df = pd.DataFrame(data)

# Select multiple columns using loc[]
selected_columns = df.loc[:, ['Name', 'Age', 'Salary']]

print(selected_columns)

Output:

    Name  Age  Salary
0   John   25   50000
1   Jane   30   60000
2   Mike   35   70000
3  Emily   40   80000

In the above example, the : slice in the row position of loc[] selects all rows, and the list ['Name', 'Age', 'Salary'] in the column position selects the desired columns. The resulting dataframe selected_columns contains only those columns.
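Because loc[] works with labels, it also accepts a label slice for the columns and a boolean condition for the rows. The following is a minimal sketch on the same sample data; the names name_to_city and over_30 are made up for illustration. Note that a label slice in loc[] includes both endpoints.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mike', 'Emily'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Label slice: selects 'Name' through 'City', both endpoints included
name_to_city = df.loc[:, 'Name':'City']

# Boolean row filter combined with a list of column labels
over_30 = df.loc[df['Age'] > 30, ['Name', 'Salary']]

print(list(name_to_city.columns))  # ['Name', 'Age', 'City']
print(over_30['Name'].tolist())    # ['Mike', 'Emily']
```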

Method 3: Using the iloc[] Indexer

The iloc[] indexer is similar to loc[], but instead of labels it uses integer positions to select rows and columns. Positions start at 0, so the first column is at position 0. You can pass a list of column positions to select multiple columns in a pandas dataframe.

Here’s an example:

import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Mike', 'Emily'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Salary': [50000, 60000, 70000, 80000]}

df = pd.DataFrame(data)

# Select multiple columns using iloc[]
selected_columns = df.iloc[:, [0, 1, 3]]

print(selected_columns)

Output:

    Name  Age  Salary
0   John   25   50000
1   Jane   30   60000
2   Mike   35   70000
3  Emily   40   80000

In the above example, the : slice in the row position of iloc[] selects all rows, and the list [0, 1, 3] selects the columns at positions 0, 1, and 3 ('Name', 'Age', and 'Salary'). The resulting dataframe selected_columns contains only those columns.
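Hard-coded positions can be fragile if the column order changes. One possible approach, sketched below on the same sample data, is to look up the positions from the column names with df.columns.get_indexer() and then pass them to iloc[]. The variable names positions, by_position, and first_and_last are illustrative. Negative positions count from the end, as with Python lists.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mike', 'Emily'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Translate column labels into integer positions.
# get_indexer returns -1 for a label that does not exist,
# so check the result before using it with iloc[].
positions = df.columns.get_indexer(['Name', 'Salary'])
by_position = df.iloc[:, positions]

# Negative positions count from the end: -1 is the last column
first_and_last = df.iloc[:, [0, -1]]

print(positions.tolist())            # [0, 3]
print(list(by_position.columns))     # ['Name', 'Salary']
print(list(first_and_last.columns))  # ['Name', 'Salary']
```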

Best Practices and Additional Tips

– When selecting multiple columns using bracket notation, pass the column names as a list, as in df[['Name', 'Age']]. Writing df['Name', 'Age'] raises a KeyError, because pandas then looks for a single column labelled ('Name', 'Age'). The loc[] indexer accepts a list of column names in the same way.
– The order of the columns in the selected dataframe will be the same as the order of the column names in the list.
– If you want to select consecutive columns, you can use a slice of column positions with iloc[]. For example, df.iloc[:, 0:3] will select columns 0, 1, and 2; the end position is excluded.
– If you want to select all columns except a few, you can use the drop() method. For example, df.drop(['Column1', 'Column2'], axis=1) returns a new dataframe without ‘Column1’ and ‘Column2’. The original dataframe is left unchanged unless you assign the result back.
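The slicing and drop() tips above can be sketched as follows, using the sample data from the earlier examples rather than the placeholder names ‘Column1’ and ‘Column2’. The variable names are illustrative. The columns= keyword shown at the end is an equivalent spelling of axis=1.

```python
import pandas as pd

# Same sample data as in the earlier examples
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mike', 'Emily'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Consecutive columns by position: 0, 1, and 2 (end position excluded)
first_three = df.iloc[:, 0:3]

# All columns except 'City'; drop() returns a new DataFrame
without_city = df.drop(['City'], axis=1)

# Equivalent spelling using the columns= keyword
same_result = df.drop(columns=['City'])

print(list(first_three.columns))   # ['Name', 'Age', 'City']
print(list(without_city.columns))  # ['Name', 'Age', 'Salary']
print(list(df.columns))            # ['Name', 'Age', 'City', 'Salary']
```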

These are some of the commonly used methods for selecting multiple columns in a pandas dataframe. You can choose the method that best suits your needs and coding style.
