Chapter 2: Data Handling Using Pandas - I: || Informatics Practices (IP) || Class 12th || NCERT CBSE || NOTES IN ENGLISH || 2024-25



Introduction to Python Libraries

  • Python is widely used in data science and analytics due to its extensive libraries designed for efficient data processing.

  • Primary Libraries for Data Science:

    • NumPy: Used for numerical computations and working with arrays.

    • Pandas: A high-level data manipulation tool, providing data structures like Series and DataFrame.

    • Matplotlib: A visualization library for plotting graphs and charts.

Difference between Pandas and NumPy

  1. Data Types:

    • NumPy arrays are homogeneous, meaning all elements must be of the same data type.

    • Pandas DataFrames can contain multiple data types, allowing for more flexible data handling.

  2. Data Manipulation:

    • Pandas offers higher-level functionality like grouping, merging, and reshaping, which are either limited or unavailable in NumPy.

  3. Tabular Data:

    • Pandas is optimized for data in rows and columns, making it a better choice for handling structured data.
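
The three differences above can be seen in a minimal sketch. The names, sections, and marks are made-up sample data, and the variable names (arr, df, avg_marks) are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type. Mixing numbers and text
# converts every element to a string.
arr = np.array([1, 2.5, 'three'])
print(arr.dtype.kind)   # 'U' stands for a Unicode string type

# A DataFrame keeps a separate data type for each column.
# (Sample data, made up for illustration.)
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meena'],
    'Section': ['A', 'B', 'A'],
    'Marks': [82.5, 74.0, 91.0],
})
print(df.dtypes)

# Grouping rows and averaging a column is a single chained call in Pandas.
avg_marks = df.groupby('Section')['Marks'].mean()
print(avg_marks)
```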

Installing Pandas

Install Pandas with pip, the Python package manager, by running this command in a terminal or command prompt (not inside the Python interpreter):
pip install pandas
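
Once the installation finishes, a quick way to confirm it worked is to import the library and print its version. This is a small sketch; the version string on your machine will depend on what pip installed.

```python
import pandas as pd

# __version__ holds the installed version as a string.
version = pd.__version__
print(version)
```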



Series in Pandas

  • A Series is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, etc.). Each element in a Series is associated with a label or index.

Creating a Series

  1. From a Scalar Value:

    • A single value is repeated for every label in the given index.

    • Example:
      import pandas as pd
      s = pd.Series(5, index=[0, 1, 2])

  2. From a List or Array:

    • Series can be created from lists, where each element in the list becomes an element in the Series.

    • Example:
      data = [10, 20, 30]
      s = pd.Series(data)

  3. From a Dictionary:

    • The dictionary keys become the index of the Series, and values become the Series data.

    • Example:
      data = {'a': 10, 'b': 20, 'c': 30}
      s = pd.Series(data)

Accessing Elements in a Series

  • Indexing:

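    • Use s[label] or s.loc[label] to access an element by label, and s.iloc[position] to access it by position. Writing s[0] to mean "first element" on a Series with non-integer labels is deprecated in recent Pandas versions.

    • Example:
      print(s.iloc[0])  # Access by position
      print(s['a'])  # Access by label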

  • Slicing:

    • Allows retrieval of a subset of elements using start:end. When slicing by position, the end position is excluded; when slicing by label, the end label is included.

    • Example:
      print(s[1:3])  # Returns the elements at positions 1 and 2
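
The difference between position-based and label-based access can be sketched as follows. The Series is rebuilt here so the example runs on its own; the variable names are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

first = s.iloc[0]         # by position
also_first = s.loc['a']   # by label

by_position = s.iloc[0:2]   # positions 0 and 1; the end position is excluded
by_label = s.loc['a':'c']   # labels 'a' to 'c'; the end label is included

print(first, also_first)
print(by_position.tolist())
print(by_label.tolist())
```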


Series Attributes

  1. index: Returns the labels (index) of the Series.

  2. values: Returns the Series values as an array.

  3. size: Number of elements in the Series.

  4. dtype: Data type of the Series elements.

  5. empty: Checks if the Series is empty.
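
A short sketch of these attributes on a small sample Series (the values are made up for illustration). Note that attributes are written without parentheses.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.index)    # the labels 'a', 'b', 'c'
print(s.values)   # the data as a NumPy array
print(s.size)     # number of elements
print(s.dtype)    # data type of the elements (an integer type here)
print(s.empty)    # False, because the Series has elements
```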

Series Methods

  • head(n): Returns the first n elements.

  • tail(n): Returns the last n elements.

  • count(): Counts non-null values.

  • sum(): Returns the sum of elements.

  • mean(): Calculates the average value.
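
The methods above can be tried on a sample Series that contains one missing value, which shows how count(), sum(), and mean() treat nulls. The data is made up for illustration.

```python
import pandas as pd

# None is stored as NaN (a missing value), so the Series becomes float.
s = pd.Series([10, 20, None, 40, 50])

print(s.head(2))    # first two elements
print(s.tail(2))    # last two elements
print(s.count())    # the missing value is not counted
print(s.sum())      # missing values are skipped when adding
print(s.mean())     # sum divided by the number of non-null values
```

If n is left out, head() and tail() return five elements by default.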


DataFrame in Pandas

  • A DataFrame is a two-dimensional data structure, similar to a table with rows and columns.

Creating a DataFrame

  1. From a Dictionary of Lists:

    • Keys are column names, and values are lists representing column data.

    • Example:
      data = {'Name': ['John', 'Anna'], 'Age': [25, 28]}
      df = pd.DataFrame(data)

  2. From a List of Dictionaries:

    • Each dictionary represents a row, and keys serve as column names.

    • Example:
      data = [{'Name': 'John', 'Age': 25}, {'Name': 'Anna', 'Age': 28}]
      df = pd.DataFrame(data)

  3. From a NumPy Array:

    • Directly creating DataFrame from arrays with specified column names.

    • Example:
      import numpy as np
      data = np.array([[1, 2], [3, 4]])
      df = pd.DataFrame(data, columns=['A', 'B'])

Operations on DataFrames

  1. Adding Columns:

    • New columns can be added directly by specifying the column name and assigning values.

    • Example:
      df['Salary'] = [50000, 60000]


  2. Deleting Rows/Columns:

    • Use the drop() method to delete rows or columns by label (axis=0 for rows, axis=1 for columns).

    • Example:
      df.drop('Age', axis=1, inplace=True)  # Deletes the 'Age' column


  3. Renaming Columns:

    • The rename() method allows renaming of column labels.

    • Example:
      df.rename(columns={'Name': 'Employee Name'}, inplace=True)
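
The three operations can be combined in one runnable sketch. It leaves out inplace=True, so drop() and rename() return new DataFrames and the original df stays available for later steps. The variable names (without_age, renamed, without_first_row) are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [25, 28]})

# Add a column: the list needs one value per row.
df['Salary'] = [50000, 60000]

# Without inplace=True, drop() and rename() return a new DataFrame
# and leave df itself unchanged.
without_age = df.drop('Age', axis=1)
renamed = df.rename(columns={'Name': 'Employee Name'})

# Delete a row by its index label (axis=0 is the default).
without_first_row = df.drop(0)

print(list(without_age.columns))
print(list(renamed.columns))
print(without_first_row)
```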


Accessing DataFrame Elements

  1. Label-based Indexing:

    • Access specific columns or rows using labels.

    • Example:
      df['Name']  # Accesses the 'Name' column


  2. Boolean Indexing:

    • Filter rows based on conditions.

    • Example:
      df[df['Age'] > 25]  # Rows where Age > 25


  3. Slicing:

    • Use loc with a label range to select a subset of rows and columns. A label slice includes its end label.

    • Example:
      df.loc[0:1, ['Name', 'Age']]  # Rows labeled 0 to 1 (both included), only 'Name' and 'Age' columns
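
The access methods above can be run together on a fresh DataFrame. The third row (Ravi) is made-up sample data added so that the filter and the slice return different results; iloc is included as the position-based counterpart of loc.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Ravi'], 'Age': [25, 28, 31]})

names = df['Name']                      # one column, returned as a Series
older = df[df['Age'] > 25]              # only rows where the condition is True
subset = df.loc[0:1, ['Name', 'Age']]   # row labels 0 to 1, both included
first_row = df.iloc[0]                  # first row, selected by position

print(names.tolist())
print(older)
print(subset)
print(first_row['Name'])
```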


Joining, Merging, and Concatenation

  1. Appending Data:

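    • append() adds the rows of one DataFrame to another, but it was deprecated in Pandas 1.4 and removed in Pandas 2.0. In current versions, use pd.concat() to get the same result.

    • Example:
      df = pd.concat([df1, df2], ignore_index=True)  # Rows of df2 are placed below the rows of df1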


  2. Merging:

    • merge() combines data from two DataFrames based on common columns or indexes.

  3. Concatenation:

    • concat() joins multiple DataFrames along a particular axis (row-wise with axis=0, column-wise with axis=1).
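
Merging and concatenation can be sketched with a few small DataFrames. All table names, column names, and values below are made-up sample data for illustration.

```python
import pandas as pd

# Sample data, made up for illustration.
students = pd.DataFrame({'RollNo': [1, 2, 3], 'Name': ['Asha', 'Ravi', 'Meena']})
marks = pd.DataFrame({'RollNo': [1, 2, 4], 'Marks': [82, 74, 65]})

# Merge on the common column. The default (inner) merge keeps only
# the roll numbers that appear in both DataFrames.
merged = pd.merge(students, marks, on='RollNo')

# Row-wise concatenation: new rows are placed below the existing ones.
more_students = pd.DataFrame({'RollNo': [4], 'Name': ['Kabir']})
stacked = pd.concat([students, more_students], ignore_index=True)

# Column-wise concatenation: rows are matched by index label.
sections = pd.DataFrame({'Section': ['A', 'B', 'A']})
side_by_side = pd.concat([students, sections], axis=1)

print(merged)
print(stacked)
print(side_by_side)
```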

DataFrame Attributes

  1. index: Lists row labels.

  2. columns: Lists column labels.

  3. dtypes: Data types of each column.

  4. shape: Returns the DataFrame’s dimensions.

  5. values: Returns data in the DataFrame as a NumPy array.
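
A short sketch of these attributes on the two-row DataFrame used earlier in these notes, rebuilt here so the example runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [25, 28]})

print(df.index)     # row labels: 0 and 1
print(df.columns)   # column labels: 'Name' and 'Age'
print(df.dtypes)    # one data type per column
print(df.shape)     # (rows, columns)
print(df.values)    # the data as a two-dimensional NumPy array
```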


Importing and Exporting Data between CSV Files and DataFrames

  • Importing Data:

    • read_csv(): Reads data from a CSV file into a DataFrame.

    • Example:
      df = pd.read_csv('data.csv')


  • Exporting Data:

    • to_csv(): Exports DataFrame contents to a CSV file.

    • Example:
      df.to_csv('output.csv', index=False)
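
Exporting and importing can be checked together with a round trip: write a DataFrame to a CSV file, then read it back. This sketch writes to a temporary folder (an assumption made here so that no existing file is overwritten); in practice you would pass your own file path.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [25, 28]})

# The temporary folder is deleted automatically when the block ends.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'output.csv')

    # index=False leaves the row labels (0, 1) out of the file.
    df.to_csv(path, index=False)

    # Read the same file back into a new DataFrame.
    df_back = pd.read_csv(path)

print(df_back)
```

Without index=False, the row labels are written as an extra first column, which then appears as an unnamed column when the file is read back.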



Pandas Series vs NumPy ndarray

  • Series:

    • Can hold elements of different types (stored with the object dtype) and can have non-numeric index labels.

    • Allows automatic alignment by index labels, which is useful for data manipulation.

  • ndarray:

    • A NumPy array with fixed-size elements of the same type.

    • Optimized for mathematical operations but lacks the flexible indexing available in Series.
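
Index alignment is the clearest practical difference, and it can be sketched with two Series whose labels are in a different order. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'b', 'a'])

# Series are matched by label before adding:
# a: 1 + 30, b: 2 + 20, c: 3 + 10
total = s1 + s2
print(total)

# ndarrays have no labels, so elements are added by position.
a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])
by_position = a1 + a2
print(by_position)
```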


