Pandas: Top Interview Questions and Answers

Top 50 Pandas Interview Questions and Answers

1. What is Pandas in Python?
2. What are the key features of Pandas?
3. How do you install Pandas in Python?
4. What is a DataFrame in Pandas?
5. What is a Series in Pandas?
6. How do you create a DataFrame in Pandas?
7. What are the differences between a DataFrame and a Series?
8. How do you handle missing data in Pandas?
9. What is the use of the .head() method in Pandas?
10. How do you select a subset of columns in Pandas?
11. What are some common methods to read data in Pandas?
12. How do you filter rows based on conditions in Pandas?
13. What is the .loc[] method in Pandas?
14. What is the .iloc[] method in Pandas?
15. How do you group data in Pandas?
16. What is the .groupby() function in Pandas used for?
17. What are Pandas pivot tables and how are they used?
18. What is the .apply() function in Pandas?
19. How do you merge two DataFrames in Pandas?
20. What are the different types of joins in Pandas?
21. How do you handle duplicate data in Pandas?
22. What is the .dropna() method in Pandas?
23. How do you reset the index of a DataFrame in Pandas?
24. What is the purpose of the .pivot() method in Pandas?
25. How do you rename columns in a Pandas DataFrame?
26. How do you sort data in a DataFrame?
27. What is the .duplicated() function in Pandas?
28. What is the .drop() method in Pandas?
29. How do you change the data type of a column in Pandas?
30. How do you check for missing values in Pandas?
31. What is the .fillna() method in Pandas?
32. What is the .astype() method in Pandas?
33. How do you concatenate two DataFrames in Pandas?
34. How do you create a new column in a DataFrame based on other columns?
35. How do you filter a DataFrame based on a condition in Pandas?
36. How do you calculate the correlation between two columns in Pandas?
37. What is the difference between .loc[] and .iloc[] in Pandas?
38. How do you convert a column of dates to datetime format in Pandas?
39. How do you concatenate multiple DataFrames along rows and columns in Pandas?
40. What is the purpose of the .rolling() function in Pandas?
41. How do you deal with outliers in Pandas?
42. What is the .map() function in Pandas?
43. How do you perform aggregation operations on data in Pandas?
44. What is the .transform() function in Pandas?
45. How do you change the order of rows in a Pandas DataFrame?
46. How do you handle string data in Pandas?
47. What is the purpose of .str.contains() in Pandas?
48. How do you perform vectorized operations in Pandas?
49. How do you work with categorical data in Pandas?
50. What is the purpose of .merge() in Pandas?

1. What is Pandas in Python?

Pandas is a powerful data analysis and manipulation library in Python. It provides data structures like Series and DataFrame for handling structured data efficiently.

2. What are the key features of Pandas?

Key features of Pandas include data cleaning, reshaping, merging, and aggregation. It supports time series data, offers easy-to-use labeled data structures (Series and DataFrame), and is fast for most tabular work because its core operations are vectorized on top of NumPy instead of running as Python-level loops.

3. How do you install Pandas in Python?

You can install Pandas using pip with the following command:

pip install pandas

4. What is a DataFrame in Pandas?

A DataFrame is a two-dimensional, labeled data structure in Pandas, with a row index and named columns. Each column can hold a different data type (for example, integers in one column and strings in another).

5. What is a Series in Pandas?

A Series is a one-dimensional labeled array capable of holding any data type. Each column of a DataFrame is a Series.

6. How do you create a DataFrame in Pandas?

You can create a DataFrame by passing a dictionary of lists, a list of dictionaries, a two-dimensional NumPy array, or another DataFrame to pd.DataFrame(). For example:

import pandas as pd
data = {'name': ['Alice', 'Bob'], 'age': [25, 30]}
df = pd.DataFrame(data)

7. What are the differences between a DataFrame and a Series?

A DataFrame is a two-dimensional structure with rows and columns, whereas a Series is a one-dimensional structure. A DataFrame can be thought of as a collection of Series (its columns) that share the same row index.

8. How do you handle missing data in Pandas?

Pandas offers several methods to handle missing data:
- .dropna() to remove missing values
- .fillna() to fill missing values
- .isna() to check for missing values
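A minimal sketch of the three methods on a small DataFrame; the names and values are made up for illustration only:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing name and one missing age
df = pd.DataFrame({
    "name": ["Alice", "Bob", None, "Dana"],
    "age": [25, np.nan, 35, 40],
})

# .isna() marks missing cells; .sum() then counts them per column
missing_per_column = df.isna().sum()

# .dropna() keeps only the rows with no missing values
complete_rows = df.dropna()

# .fillna() with a dict fills each listed column with its own value
filled = df.fillna({"age": 0})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Which approach fits depends on the data: dropping rows discards information, while filling with a constant such as 0 can distort statistics like the mean.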

9. What is the use of the .head() method in Pandas?

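The .head() method returns the first 5 rows of a DataFrame by default; pass a number to get a different count, e.g. df.head(10). It is useful for quickly inspecting the data, and .tail() does the same for the last rows.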

10. How do you select a subset of columns in Pandas?

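You can select a subset of columns by passing a list of column names inside the DataFrame's square brackets. The result is a new DataFrame:

df[['name', 'age']]

Passing a single name without the inner list, as in df['name'], returns a Series instead.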

11. What are some common methods to read data in Pandas?

Common functions to read data in Pandas include:
- pd.read_csv() for reading CSV files
- pd.read_excel() for reading Excel files
- pd.read_sql() for reading the result of a SQL query (or a database table) through a database connection

12. How do you filter rows based on conditions in Pandas?

You can filter rows by placing a boolean condition inside the DataFrame's square brackets (boolean indexing). For example:

df[df['age'] > 30]

This filters rows where the 'age' column is greater than 30.
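Several conditions can be combined with & (and), | (or), and ~ (not). The sketch below uses made-up data and a hypothetical 'city' column. Each condition needs its own parentheses because these operators bind more tightly than comparisons, and Python's plain and/or raise a ValueError when applied to a Series.

```python
import pandas as pd

# Made-up data for illustration; 'city' is a hypothetical column
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan"],
    "age": [25, 32, 41, 29],
    "city": ["Paris", "Berlin", "Paris", "Rome"],
})

# AND: both conditions must hold
older_in_paris = df[(df["age"] > 30) & (df["city"] == "Paris")]

# OR: either condition may hold
young_or_rome = df[(df["age"] < 26) | (df["city"] == "Rome")]

# NOT: invert a condition
not_paris = df[~(df["city"] == "Paris")]

# .isin() is a compact alternative to chaining many == checks with |
berlin_or_rome = df[df["city"].isin(["Berlin", "Rome"])]
```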

13. What is the .loc[] method in Pandas?

The .loc[] indexer (used with square brackets rather than called like a method) selects rows and columns by label. For example:

df.loc[0, 'age']

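This selects the value in the row whose index label is 0 and in the 'age' column. With the default integer index (0, 1, 2, ...) that is the first row, but if the index has other labels, .loc[0] refers to the label 0, not to the first position.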

14. What is the .iloc[] method in Pandas?

The .iloc[] indexer is used for position-based indexing. It selects rows and columns by their zero-based integer positions:

df.iloc[0, 1]

This selects the value in the first row and second column, regardless of the index labels.

15. How do you group data in Pandas?

You can group data in Pandas using the .groupby() method. For example:

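df.groupby('department')['salary'].mean()

Assuming the DataFrame has 'department' and 'salary' columns, this groups the rows by department and calculates the mean salary for each group. Selecting the column before aggregating matters: calling .mean() on a whole grouped DataFrame that contains non-numeric columns raises a TypeError in pandas 2.x unless you pass numeric_only=True.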

16. What is the .groupby() function in Pandas used for?

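The .groupby() function splits the data into groups based on some criteria (e.g., the values of one or more columns), applies a function to each group, and combines the results. The applied function is usually an aggregation such as sum or mean, but it can also be a transformation (.transform()) or a filter (.filter()).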

17. What are Pandas pivot tables and how are they used?

A pivot table is used to summarize and aggregate data. The .pivot_table() method is used to create a pivot table. For example:

df.pivot_table(values='age', index='name', aggfunc='mean')

This creates a pivot table with 'name' as the index and the average 'age' for each name.

18. What is the .apply() function in Pandas?

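The .apply() function applies a function to the values of a Series or along an axis of a DataFrame. On a Series, the function receives one element at a time; on a DataFrame, it receives each column (axis=0, the default) or each row (axis=1) as a Series. For example:

df['age'].apply(lambda x: x + 1)

This adds 1 to each value in the 'age' column. For simple arithmetic like this, the vectorized form df['age'] + 1 gives the same result and is much faster.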
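A short sketch contrasting element-wise, row-wise, and column-wise .apply(), using made-up 'price' and 'quantity' columns:

```python
import pandas as pd

# Hypothetical order data for illustration
df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "quantity": [3, 1, 4]})

# Series.apply: the function receives one element at a time
doubled = df["price"].apply(lambda x: x * 2)

# DataFrame.apply with axis=1: the function receives one row (as a Series)
totals = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# DataFrame.apply with axis=0 (default): the function receives one column
column_max = df.apply(lambda col: col.max())

# Vectorized equivalent of `totals`, usually much faster on large data
totals_vectorized = df["price"] * df["quantity"]
```

Row-wise apply with axis=1 runs a Python function once per row, so it is best kept for logic that cannot be expressed as column arithmetic.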

19. How do you merge two DataFrames in Pandas?

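You can merge two DataFrames using the pd.merge() function or the equivalent DataFrame method df1.merge(df2, ...). For example:

pd.merge(df1, df2, on='id')

This merges the two DataFrames on their shared 'id' column. By default, merge() performs an inner join, so only 'id' values present in both DataFrames are kept.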

20. What are the different types of joins in Pandas?

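The how parameter of merge() supports four main types of joins:
- Inner join (how='inner', the default): Returns only rows whose keys appear in both DataFrames.
- Left join (how='left'): Returns all rows from the left DataFrame and matching rows from the right DataFrame.
- Right join (how='right'): Returns all rows from the right DataFrame and matching rows from the left DataFrame.
- Outer join (how='outer'): Returns all rows from both DataFrames, filling with NaN where there is no match.
Pandas 1.2 and later also support how='cross', which returns the Cartesian product of the two DataFrames.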
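A sketch of the four join types on two small, made-up DataFrames that share only some 'id' values. The indicator=True option adds a '_merge' column recording whether each row came from the left side, the right side, or both, which is handy for auditing a join.

```python
import pandas as pd

# Made-up data: ids 2 and 3 appear on both sides, 1 only left, 4 only right
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})

inner = pd.merge(left, right, on="id", how="inner")       # ids 2, 3
left_join = pd.merge(left, right, on="id", how="left")    # ids 1, 2, 3 (score NaN for 1)
right_join = pd.merge(left, right, on="id", how="right")  # ids 2, 3, 4 (name NaN for 4)
outer = pd.merge(left, right, on="id", how="outer")       # ids 1, 2, 3, 4

# '_merge' holds "left_only", "right_only", or "both" for each row
audit = pd.merge(left, right, on="id", how="outer", indicator=True)
```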

21. How do you handle duplicate data in Pandas?

You can remove duplicate rows using the .drop_duplicates() method:

df.drop_duplicates()

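By default (keep='first'), the first occurrence of each duplicated row is kept. Use keep='last' to keep the last occurrence instead, keep=False to drop every copy, and subset to compare only selected columns:

df.drop_duplicates(subset=['name'], keep='last')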

22. What is the .dropna() method in Pandas?

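The .dropna() method is used to remove missing values (NaN) from a DataFrame or Series. For example:

df.dropna()

This drops every row that contains at least one missing value. Pass axis=1 to drop columns instead, how='all' to drop only rows in which every value is missing, or subset with a list of column names to check only those columns.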

23. How do you reset the index of a DataFrame in Pandas?

You can reset the index of a DataFrame using the .reset_index() method:

df.reset_index()

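This moves the current index into a regular column and replaces it with a default integer index (0, 1, 2, ...). Pass drop=True to discard the old index instead of keeping it as a column. Like most Pandas methods, it returns a new DataFrame unless you pass inplace=True.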
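A common case is renumbering rows after filtering, since filtering keeps the original index labels and leaves gaps. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"name": ["Alice", "Bob", "Carol", "Dan"], "age": [25, 32, 41, 29]})

# Filtering keeps the original labels: 1, 2, 3 (label 0 is gone)
over_28 = df[df["age"] > 28]

# Default: the old index is kept as a new column named "index"
with_old_index = over_28.reset_index()

# drop=True discards the old index and renumbers from 0
clean = over_28.reset_index(drop=True)
```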

24. What is the purpose of the .pivot() method in Pandas?

The .pivot() method is used to reshape data by converting unique values from one column into multiple columns. For example:

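df.pivot(index='name', columns='date', values='sales')

Here each unique 'name' becomes a row, each unique 'date' becomes a column, and the cells hold the 'sales' values. .pivot() does not aggregate: if a (name, date) pair occurs more than once it raises a ValueError, and .pivot_table() should be used instead.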
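A sketch of .pivot() on made-up long-format sales data, plus the duplicate-key case where .pivot() fails and .pivot_table() aggregates instead (the column names 'name', 'date', and 'sales' match the example above; the values are invented):

```python
import pandas as pd

# Made-up long-format data: one row per (name, date)
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Bob", "Bob"],
    "date": ["2024-01", "2024-02", "2024-01", "2024-02"],
    "sales": [100, 120, 90, 95],
})

# Wide format: names as rows, dates as columns, sales in the cells
wide = df.pivot(index="name", columns="date", values="sales")

# Add a second (Alice, 2024-01) row so the key is no longer unique
extra = pd.DataFrame({"name": ["Alice"], "date": ["2024-01"], "sales": [10]})
dup = pd.concat([df, extra], ignore_index=True)

try:
    dup.pivot(index="name", columns="date", values="sales")
    pivot_failed = False
except ValueError:
    # pivot() cannot place two values in one cell
    pivot_failed = True

# pivot_table() combines duplicates with an aggregation function
summed = dup.pivot_table(index="name", columns="date", values="sales", aggfunc="sum")
```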

25. How do you rename columns in a Pandas DataFrame?

You can rename columns using the .rename() method. For example:

df.rename(columns={'old_name': 'new_name'}, inplace=True)

26. How do you sort data in a DataFrame?

You can sort data using the .sort_values() method:

df.sort_values(by='age')

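This sorts the DataFrame by the 'age' column in ascending order. Pass ascending=False for descending order, or a list such as by=['age', 'name'] to sort by several columns.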

27. What is the .duplicated() function in Pandas?

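The .duplicated() function returns a boolean Series that indicates whether each row is a duplicate of an earlier row. With the default keep='first', the first occurrence is marked False and later repeats are marked True.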
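A sketch showing how the keep and subset parameters change which rows are flagged, on made-up data:

```python
import pandas as pd

# Made-up data: rows 0 and 2 are identical; rows 0, 2, 3 share a name
df = pd.DataFrame({"name": ["Alice", "Bob", "Alice", "Alice"], "age": [25, 30, 25, 26]})

# Default keep="first": only later repeats are True
full_row = df.duplicated()                  # [False, False, True, False]

# Compare only the 'name' column
by_name = df.duplicated(subset=["name"])    # [False, False, True, True]

# keep=False flags every member of a duplicated group
all_copies = df.duplicated(keep=False)      # [True, False, True, False]

# Count full-row duplicates, then remove them
n_dupes = df.duplicated().sum()
deduped = df.drop_duplicates()
```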

28. What is the .drop() method in Pandas?

The .drop() method is used to remove rows or columns from a DataFrame. For example:

df.drop('column_name', axis=1)

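This returns a new DataFrame without the 'column_name' column (df.drop(columns='column_name') is equivalent). The original DataFrame is unchanged unless you reassign the result or pass inplace=True. To drop rows instead, pass their index labels with the default axis=0.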

29. How do you change the data type of a column in Pandas?

You can change the data type of a column using the .astype() method:

df['age'] = df['age'].astype('float')
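.astype() raises an error if any value cannot be converted, which is common with text columns read from files. A sketch, using a made-up 'age' column stored as strings, of pd.to_numeric() with errors='coerce' as a more forgiving alternative:

```python
import pandas as pd

# Made-up column stored as strings, with one unparseable entry
df = pd.DataFrame({"age": ["25", "30", "unknown"]})

# astype() fails on the first value it cannot parse
try:
    df["age"].astype("float")
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric(errors="coerce") converts what it can and turns the rest into NaN
df["age"] = pd.to_numeric(df["age"], errors="coerce")
```

Coercing hides bad values as NaN, so it is worth checking df['age'].isna().sum() afterwards.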

30. How do you check for missing values in Pandas?

You can check for missing values using the .isna() or .isnull() method:

df.isna()

This returns a DataFrame of boolean values that is True wherever a cell is missing. Chaining .sum(), as in df.isna().sum(), counts the missing values in each column.

31. What is the .fillna() method in Pandas?

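The .fillna() method is used to fill missing values (NaN) in a DataFrame or Series with a specified value. You can fill missing values with a constant or with a statistic such as the column mean; to propagate neighboring values, use the related .ffill() and .bfill() methods.
Example: df['column_name'] = df['column_name'].fillna(0)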
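A sketch comparing common fill strategies on a made-up Series. .ffill() and .bfill() are separate methods; the older fillna(method='ffill') form is deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up values with two gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

with_constant = s.fillna(0)           # 1, 0, 3, 0, 5
with_mean = s.fillna(s.mean())        # mean of non-missing values is 3.0
forward = s.ffill()                   # carry the last valid value forward
backward = s.bfill()                  # use the next valid value
```

Forward and backward filling suit ordered data such as time series; a mean fill suits independent observations but shrinks the variance.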

32. What is the .astype() method in Pandas?

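The .astype() method is used to cast a pandas object (DataFrame or Series) to a specified data type.
Example: df['column_name'] = df['column_name'].astype(int)
Casting to int raises an error if the column contains missing values; the nullable integer dtype 'Int64' can hold them instead.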

33. How do you concatenate two DataFrames in Pandas?

To concatenate two DataFrames, use pd.concat(). You can concatenate along rows (axis=0) or columns (axis=1).
Example for concatenating along rows: pd.concat([df1, df2], axis=0)
Example for concatenating along columns: pd.concat([df1, df2], axis=1)
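By default, concatenating along rows keeps each DataFrame's original index labels, which can produce duplicate labels, while concatenating along columns aligns rows by index label. A sketch with made-up data:

```python
import pandas as pd

# Made-up frames, each with the default index 0, 1
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Row-wise: labels 0, 1, 0, 1 (duplicates)
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True renumbers the result 0..3
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

# Column-wise: rows are matched by label; label 2 exists only in df3, so 'a' is NaN there
df3 = pd.DataFrame({"b": [10, 20, 30]})
side_by_side = pd.concat([df1, df3], axis=1)
```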

34. How do you create a new column in a DataFrame based on other columns?

You can create a new column by performing operations on existing columns.
Example: df['new_column'] = df['col1'] + df['col2']
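Beyond arithmetic, a new column is often derived from a condition. A sketch using numpy.where() for a conditional column and DataFrame.assign() for method chaining; the column names 'total', 'size', and 'ratio' are hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up data using the column names from the example above
df = pd.DataFrame({"col1": [1, 5, 10], "col2": [2, 3, 4]})

# Arithmetic on columns is vectorized and aligned by index
df["total"] = df["col1"] + df["col2"]

# Conditional column: np.where(condition, value_if_true, value_if_false)
df["size"] = np.where(df["total"] > 7, "large", "small")

# assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(ratio=df["col1"] / df["col2"])
```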

35. How do you filter a DataFrame based on a condition in Pandas?

You can filter rows based on a condition using boolean indexing.
Example: df[df['column'] > 50]

36. How do you calculate the correlation between two columns in Pandas?

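You can calculate the correlation between two columns using the .corr() method. It computes the Pearson correlation coefficient by default; pass method='spearman' or method='kendall' for rank-based correlation.
Example: df['col1'].corr(df['col2'])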
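A sketch on made-up columns showing the difference between Pearson (linear) and Spearman (rank) correlation, plus the full correlation matrix from DataFrame.corr():

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": [2, 4, 6, 8, 10],   # exactly 2 * col1
    "col3": [9, 7, 4, 3, 1],    # always decreasing, but not along a straight line
})

pearson = df["col1"].corr(df["col2"])     # 1.0: perfect linear relation
negative = df["col1"].corr(df["col3"])    # close to, but above, -1.0

# Spearman uses ranks, so any strictly decreasing relation scores -1.0
rank_matrix = df.corr(method="spearman")

# Pairwise Pearson correlations of all columns
pearson_matrix = df.corr()
```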

37. What is the difference between .loc[] and .iloc[] in Pandas?

.loc[] is label-based indexing, meaning you use row and column labels to select data.
Example: df.loc[0, 'column_name']
.iloc[] is integer-position-based indexing, meaning you use zero-based row and column positions to select data.
Example: df.iloc[0, 1]
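The difference only becomes visible when index labels differ from positions. A sketch with a made-up DataFrame whose index is 10, 20, 30, which also shows that .loc slices include the end label while .iloc slices exclude the end position:

```python
import pandas as pd

# Made-up data; index labels deliberately differ from positions
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Carol"], "age": [25, 30, 35]},
    index=[10, 20, 30],
)

by_label = df.loc[20, "age"]    # row labeled 20 -> 30
by_position = df.iloc[0, 1]     # first row, second column -> 25

# .loc slicing includes the end label: labels 10 and 20
loc_slice = df.loc[10:20]

# .iloc slicing excludes the end position: position 0 only
iloc_slice = df.iloc[0:1]

# There is no row labeled 0, so label-based lookup fails
try:
    df.loc[0, "age"]
    label_zero_found = True
except KeyError:
    label_zero_found = False
```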

38. How do you convert a column of dates to datetime format in Pandas?

You can convert a column to datetime using the pd.to_datetime() function.
Example: df['date_column'] = pd.to_datetime(df['date_column'])
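A sketch of two commonly used pd.to_datetime() options, an explicit format string and errors='coerce', followed by the .dt accessor for extracting date parts. The data and the 'year'/'month' column names are made up:

```python
import pandas as pd

# Made-up strings, one of which is not a valid date
df = pd.DataFrame({"date_column": ["2024-01-15", "2024-02-20", "not a date"]})

# An explicit format avoids ambiguous guessing; errors="coerce" yields NaT for bad values
df["date_column"] = pd.to_datetime(df["date_column"], format="%Y-%m-%d", errors="coerce")

# The .dt accessor exposes components of a datetime column
df["year"] = df["date_column"].dt.year
df["month"] = df["date_column"].dt.month
```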

39. How do you concatenate multiple DataFrames along rows and columns in Pandas?

You can concatenate multiple DataFrames using pd.concat(). Specify axis=0 to concatenate along rows and axis=1 to concatenate along columns.
Example: pd.concat([df1, df2, df3], axis=0)

40. What is the purpose of the .rolling() function in Pandas?

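The .rolling() function is used to apply window-based functions, such as mean or sum, over a sliding window of consecutive rows in a DataFrame or Series.
Example: df['col'].rolling(3).mean() (calculates the rolling mean with a window size of 3; the first two results are NaN because the window is not yet full, unless you pass min_periods)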
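A sketch on a made-up Series showing where the NaN values come from and how min_periods changes them:

```python
import pandas as pd

# Made-up values for illustration
s = pd.Series([1, 2, 3, 4, 5])

# Full window of 3 required: NaN, NaN, 2.0, 3.0, 4.0
rolling_mean = s.rolling(3).mean()

# min_periods=1 averages whatever is available: 1.0, 1.5, 2.0, 3.0, 4.0
partial = s.rolling(3, min_periods=1).mean()

# Window of 2 summed: NaN, 3.0, 5.0, 7.0, 9.0
rolling_sum = s.rolling(2).sum()
```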

41. How do you deal with outliers in Pandas?

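You can handle outliers by:
- Detecting them with statistical rules such as z-scores or the IQR (interquartile range) rule.
- Removing them with boolean filtering, or with .drop() on their index labels.
- Replacing them, for example by capping values with .clip(), or by masking them to NaN with .mask() and then filling with .fillna().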
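A sketch of the IQR rule, which flags values more than 1.5 × IQR below the first quartile or above the third, followed by three ways to handle the flagged values. The data is made up, and the 1.5 multiplier is the conventional choice rather than a fixed requirement:

```python
import pandas as pd

# Made-up values with one obvious outlier (95)
s = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (s < lower) | (s > upper)

removed = s[~is_outlier]                         # option 1: drop them
capped = s.clip(lower, upper)                    # option 2: cap at the fences
replaced = s.mask(is_outlier).fillna(s.median()) # option 3: NaN, then fill with the median
```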

42. What is the .map() function in Pandas?

The .map() function is used to map values of a Series using a dictionary, Series, or function. When mapping with a dictionary, values that have no matching key become NaN.
Example: df['column_name'] = df['column_name'].map({'A': 1, 'B': 2})

43. How do you perform aggregation operations on data in Pandas?

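You can perform aggregation operations like sum, mean, and count by calling aggregation methods on the whole DataFrame (e.g. df.sum()) or, per group, on the result of .groupby(). The .agg() method applies several aggregations at once.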
Example: df.groupby('column_name').sum()
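A sketch of single, multiple, and named aggregations after .groupby(). The 'department' and 'salary' columns and their values are made up:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "IT"],
    "salary": [50, 60, 70, 80, 90],
})

# One aggregation on one column
mean_salary = df.groupby("department")["salary"].mean()

# Several aggregations at once; one output column per function
stats = df.groupby("department")["salary"].agg(["sum", "mean", "count"])

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("department").agg(
    total=("salary", "sum"),
    highest=("salary", "max"),
)
```

Named aggregation keeps the output column names explicit, which avoids the multi-level column headers produced when passing a dict of lists.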

44. What is the .transform() function in Pandas?

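The .transform() function applies a function and returns a result with the same shape and index as the input. It is most useful after .groupby(), where it broadcasts each group's result (such as the group mean) back to every row of that group.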
Example: df['column_name'].transform(lambda x: x + 1)
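A sketch contrasting a grouped aggregation (one value per group) with a grouped transform (one value per original row), using made-up 'team' and 'points' columns:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 10],
})

# Aggregation: one row per group (2 rows)
team_mean = df.groupby("team")["points"].mean()

# Transform: one value per original row (5 rows), aligned to df's index
df["team_mean"] = df.groupby("team")["points"].transform("mean")

# Typical use: each row's share of its group total
df["share_of_team"] = df["points"] / df.groupby("team")["points"].transform("sum")
```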

45. How do you change the order of rows in a Pandas DataFrame?

You can change the order of rows using .sample() for random reordering or by sorting with .sort_values().
Example: df.sample(frac=1) (random shuffle; pass random_state, e.g. random_state=42, for a reproducible order)
Example: df.sort_values(by='column_name') (sort by column)

46. How do you handle string data in Pandas?

Pandas provides the .str accessor to perform vectorized string operations, such as splitting, replacing, and finding substrings.
Example: df['column'].str.lower() (converts all strings to lowercase)

47. What is the purpose of .str.contains() in Pandas?

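The .str.contains() method checks whether a pattern occurs in each string element of a Series and returns a boolean Series, which is often used to filter rows. The pattern is treated as a regular expression by default (pass regex=False for a literal match), case=False makes the match case-insensitive, and na=False treats missing values as non-matches.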
Example: df['column'].str.contains('substring')
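A sketch of the case, na, and regex options on made-up strings. Note how "." behaves differently as a regular expression (any character) and as literal text:

```python
import pandas as pd

# Made-up strings, including a missing value
df = pd.DataFrame({"column": ["Apple pie", "banana split", None, "apple.com"]})

# Case-insensitive match; the missing value counts as "no match"
mask = df["column"].str.contains("apple", case=False, na=False)
apples = df[mask]

# regex=False searches for a literal "." instead of "any character"
literal_dot = df["column"].str.contains(".", regex=False, na=False)
```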

48. How do you perform vectorized operations in Pandas?

Vectorized operations in Pandas apply an operation to an entire column at once instead of looping over values in Python. Because the work runs in optimized compiled code (NumPy), they are typically much faster than explicit loops or .apply().
Example: df['column'] + 10 (adds 10 to each value in the column)

49. How do you work with categorical data in Pandas?

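Categorical data can be represented in Pandas using the category dtype, which stores each distinct value once and represents each row with a small integer code. This reduces memory usage, and can speed up operations such as grouping, when a column has few distinct values repeated many times.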
Example: df['column_name'] = df['column_name'].astype('category')
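A sketch, on a made-up column with three distinct values repeated many times, that measures the memory saving and shows an ordered categorical, which supports comparisons such as > (the 'level' column is hypothetical):

```python
import pandas as pd

# Made-up column: three distinct strings repeated 10,000 times each
df = pd.DataFrame({"column_name": ["low", "medium", "high"] * 10_000})

# deep=True counts the actual string storage
before_bytes = df["column_name"].memory_usage(deep=True)
df["column_name"] = df["column_name"].astype("category")
after_bytes = df["column_name"].memory_usage(deep=True)

# Each distinct value is stored once as a category
categories = list(df["column_name"].cat.categories)

# An ordered dtype defines low < medium < high, enabling comparisons
levels = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)
df["level"] = df["column_name"].astype(levels)
high_rows = df[df["level"] > "medium"]
```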

50. What is the purpose of .merge() in Pandas?

The .merge() function is used to combine two DataFrames based on common columns or indices, similar to SQL joins (inner, left, right, and outer joins).
Example: pd.merge(df1, df2, on='column_name', how='inner')