Chapter 3: Data Handling using Pandas - II || Informatics Practices (IP) || Class 12th || NCERT CBSE || NOTES IN ENGLISH || 2024-25

 Chapter 3: Data Handling using Pandas - II 


Introduction

  • Pandas is a Python library used for data manipulation, processing, and analysis. Building on the DataFrame basics covered earlier, this chapter introduces descriptive statistics, aggregation, sorting, grouping, altering the index, reshaping, and handling missing values.


Descriptive Statistics

  • Descriptive statistics summarize data and give insights into basic properties.

  • Key Statistical Functions:

    • max(): Finds the highest values in columns.

    • min(): Finds the lowest values.

    • sum(): Calculates column-wise totals.

    • count(): Counts non-null values.

    • mean(): Calculates the average.

    • median(): Finds the middle value.

    • mode(): Finds the most frequently occurring values.

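    • quantile(): Returns the value below which a given fraction of the data falls. For example, quantile(0.25), quantile(0.5), and quantile(0.75) give the three quartiles; the default, 0.5, is the median.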

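    • var(): Calculates the variance, a measure of how far the values spread from the mean. Pandas computes the sample variance by default.

    • std(): Calculates the standard deviation, the square root of the variance, expressed in the same units as the data.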
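
The functions listed above can be tried on a small DataFrame. The sketch below uses a made-up marks table (the column names and values are assumptions, invented only for illustration) and calls each function once.

```python
import pandas as pd

# Hypothetical marks data, invented purely for illustration
marks = pd.DataFrame({
    "Maths": [80, 90, 70, 90],
    "Science": [60, 75, 85, 80],
})

highest = marks.max()                 # largest value in each column
lowest = marks.min()                  # smallest value in each column
total = marks.sum()                   # column-wise totals
n_values = marks.count()              # number of non-null values per column
average = marks.mean()                # arithmetic mean of each column
middle = marks.median()               # middle value of each column
most_common = marks["Maths"].mode()   # most frequent value(s), returned as a Series
q1 = marks.quantile(0.25)             # first quartile of each column
variance = marks.var()                # sample variance (divides by n - 1)
std_dev = marks.std()                 # sample standard deviation

# describe() prints several of these statistics in one table
print(marks.describe())
```

Note that mode() returns a Series rather than a single value, because more than one value can share the highest frequency.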


Data Aggregations

  • Aggregation combines multiple values to return a single output using functions like max(), min(), sum(), count(), std(), and var().

  • Aggregations can be applied to one or more columns, producing summary statistics.
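
As a rough illustration of aggregating one or more columns, the sketch below uses pandas' agg() method on a made-up sales table. The table, its column names, and the choice of agg() are assumptions for this example; calling sum() or max() directly on a column works as well.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration
sales = pd.DataFrame({
    "Store": ["S1", "S2", "S3"],
    "Units": [10, 20, 30],
    "Revenue": [100.0, 250.0, 300.0],
})

# One aggregate on one column returns a single value
total_units = sales["Units"].sum()

# Several aggregates on several columns return a small summary table
summary = sales[["Units", "Revenue"]].agg(["min", "max", "sum"])
print(summary)

# A dictionary applies a different aggregate to each column
per_column = sales.agg({"Units": "sum", "Revenue": "max"})
print(per_column)
```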


Sorting a DataFrame

  • Sorting arranges data by specified columns, either in ascending or descending order, using sort_values().

  • Syntax: DataFrame.sort_values(by=[column], axis=0, ascending=True).

  • Sorting can be performed on multiple columns, with secondary columns used when primary columns have identical values.
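
A minimal sketch of sort_values() on one column and on two columns follows. The student records are made up for illustration. In the second call, Class is the primary key and Marks decides the order among rows with the same Class.

```python
import pandas as pd

# Hypothetical student records, invented purely for illustration
students = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "Kabir"],
    "Class": [12, 11, 12, 11],
    "Marks": [88, 92, 75, 85],
})

# Single column, highest marks first
by_marks = students.sort_values(by="Marks", ascending=False)

# Two columns: Class ascending, then Marks descending within each class
by_class_then_marks = students.sort_values(
    by=["Class", "Marks"], ascending=[True, False]
)

print(by_marks)
print(by_class_then_marks)
```

sort_values() returns a new DataFrame and leaves the original unchanged, so the result has to be assigned to a variable to be kept.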


GROUP BY Functions

  • The groupby() function splits a DataFrame into groups based on a criterion, such as the values in a column, and applies a function like sum(), mean(), or max() to each group.

  • Steps in GROUP BY:

    1. Split: Break data into groups based on a criterion.

    2. Apply: Perform operations like sum or count on each group.

    3. Combine: Collect the per-group results into a new Series or DataFrame.
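
The split, apply, and combine steps above can be sketched as follows. The region-wise sales figures are made up for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "East"],
    "Amount": [100, 150, 200, 50, 300],
})

# Split: one group per distinct Region value
grouped = sales.groupby("Region")

# Apply and combine: the total Amount of each group, collected in a Series
totals = grouped["Amount"].sum()
print(totals)

# Several functions at once give a DataFrame with one row per group
summary = grouped["Amount"].agg(["sum", "mean", "count"])
print(summary)
```

The group labels (here the Region values) become the index of the result.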


Altering the Index

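  • The index labels each row and allows efficient data access and retrieval. An existing column can be set as the index for better data organization.

  • reset_index(): Replaces the current index with the default continuous integer index (0, 1, 2, ...). The old index is kept as a column unless drop=True is passed.

  • set_index(): Makes an existing column the index.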
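
A short sketch of both functions follows, using made-up student records. It also shows a common reason for resetting the index: filtering rows leaves gaps in the original index.

```python
import pandas as pd

# Hypothetical student records, invented purely for illustration
df = pd.DataFrame({
    "RollNo": [101, 102, 103],
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [88, 92, 75],
})

# Use RollNo as the index, then look up a row by its roll number
indexed = df.set_index("RollNo")
ravi_marks = indexed.loc[102, "Marks"]

# Restore the default index; RollNo becomes an ordinary column again
restored = indexed.reset_index()

# Filtering keeps the old labels (0 and 2 here), leaving a gap
below_90 = df[df["Marks"] < 90]

# drop=True discards the old labels instead of keeping them as a column
renumbered = below_90.reset_index(drop=True)

print(indexed)
print(renumbered)
```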


Other DataFrame Operations

Reshaping Data

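  • Pivot: pivot() rearranges a DataFrame so that the values of one column become the row labels and the values of another become the column headers. Example: Transforming year-wise sales data into a format with stores as rows and years as columns.

  • Pivot Table: pivot_table() works like pivot() but also handles duplicate entries for the same row and column pair by combining them with an aggregate function like sum or mean.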
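
The store and year example can be sketched as below. The sales figures are made up for illustration. The second table contains two entries for the same store and year, which is the case where pivot() fails and pivot_table() is needed.

```python
import pandas as pd

# Hypothetical year-wise sales, one row per store and year
sales = pd.DataFrame({
    "Store": ["S1", "S1", "S2", "S2"],
    "Year": [2022, 2023, 2022, 2023],
    "Sales": [100, 120, 90, 110],
})

# Stores become rows, years become columns
wide = sales.pivot(index="Store", columns="Year", values="Sales")
print(wide)

# Hypothetical data with a duplicate entry for store S1 in 2022
sales_dup = pd.DataFrame({
    "Store": ["S1", "S1", "S1", "S2", "S2"],
    "Year": [2022, 2022, 2023, 2022, 2023],
    "Sales": [100, 50, 120, 90, 110],
})

# pivot() would raise a ValueError here because of the duplicate entry;
# pivot_table() combines the duplicates with the chosen aggregate function
table = sales_dup.pivot_table(
    index="Store", columns="Year", values="Sales", aggfunc="sum"
)
print(table)
```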


Handling Missing Values

  • Missing values can affect data analysis. Methods to address this include:

    • Dropping Rows: dropna() removes every row that contains a missing value. With axis=1 it removes columns instead.

    • Filling Missing Values: fillna() replaces NaN entries with a chosen value, such as zero, the column average, or any other custom value.
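
Both methods can be sketched on a small made-up table with one missing mark. The data is invented for illustration, and isnull() is used here only to count the missing entries first.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with one missing value, invented for illustration
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [88.0, np.nan, 76.0],
})

# Count the missing values in each column
missing_per_column = df.isnull().sum()

# Drop every row that has at least one missing value
dropped = df.dropna()

# Replace the missing mark with zero
filled_zero = df["Marks"].fillna(0)

# Replace the missing mark with the average of the available marks
filled_mean = df["Marks"].fillna(df["Marks"].mean())

print(missing_per_column)
print(dropped)
print(filled_mean)
```

Dropping rows loses the rest of the information in those rows, while filling keeps the rows but introduces estimated values, so the choice depends on the analysis.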



