📖 Guide
Pandas — Complete Reference
Pandas cheat sheet for data manipulation, analysis, and transformation in Python.
86 commands across 11 categories
Creating DataFramesReading/Writing DataSelection & IndexingFilteringSortingGrouping & AggregationMerging & JoiningMissing DataString MethodsDate/TimeCommon Operations
Creating DataFrames
| Command | Description |
|---|---|
pd.DataFrame({'a': [1,2], 'b': [3,4]}) | Create DataFrame from dictionary |
pd.DataFrame(np.array([[1,2],[3,4]]), columns=['a','b']) | Create from NumPy array with column names |
pd.Series([1, 2, 3], name='x') | Create a named Series |
pd.DataFrame(list_of_dicts) | Create from list of dictionaries |
df.copy() | Create a deep copy of a DataFrame |
pd.DataFrame(index=range(5), columns=['a','b']) | Create empty DataFrame with shape |
Reading/Writing Data
| Command | Description |
|---|---|
pd.read_csv('file.csv') | Read CSV file into DataFrame |
pd.read_csv('file.csv', sep=';', encoding='utf-8') | Read CSV with custom separator and encoding |
df.to_csv('out.csv', index=False) | Write DataFrame to CSV without row index |
pd.read_excel('file.xlsx', sheet_name='Sheet1') | Read specific Excel sheet |
df.to_excel('out.xlsx', index=False) | Write DataFrame to Excel |
pd.read_json('file.json') | Read JSON file into DataFrame |
df.to_json('out.json', orient='records') | Write as JSON array of objects |
pd.read_sql('SELECT * FROM t', conn) | Read from SQL database connection |
Selection & Indexing
| Command | Description |
|---|---|
df['col'] | Select single column (returns Series) |
df[['col1', 'col2']] | Select multiple columns (returns DataFrame) |
df.loc[0] | Select row by label/index |
df.loc[0:5, 'col1':'col3'] | Select rows and columns by label (inclusive) |
df.iloc[0] | Select row by integer position |
df.iloc[0:5, 0:3] | Select rows and columns by integer position |
df.at[0, 'col'] | Fast access to single scalar value by label |
df.head(10) / df.tail(10) | First or last 10 rows |
df.sample(5) | Random sample of 5 rows |
Filtering
| Command | Description |
|---|---|
df[df['col'] > 5] | Filter rows where column value exceeds 5 |
df[(df['a'] > 1) & (df['b'] < 10)] | Multiple conditions with & (and) |
df[(df['a'] > 1) | (df['b'] < 10)] | Multiple conditions with | (or) |
df[df['col'].isin(['x', 'y'])] | Filter rows where column is in list |
df[df['col'].str.contains('pattern')] | Filter by string pattern |
df[df['col'].between(10, 20)] | Filter values in range (inclusive) |
df.query('col > 5 and col2 == "x"') | Filter using query string syntax |
df[~df['col'].isna()] | Filter out rows with NaN in column |
Sorting
| Command | Description |
|---|---|
df.sort_values('col') | Sort by column ascending |
df.sort_values('col', ascending=False) | Sort by column descending |
df.sort_values(['a', 'b'], ascending=[True, False]) | Sort by multiple columns with mixed order |
df.sort_index() | Sort by row index |
df.nlargest(5, 'col') | Top 5 rows by column value |
df.nsmallest(5, 'col') | Bottom 5 rows by column value |
df.rank() | Rank values (1 = smallest) |
Grouping & Aggregation
| Command | Description |
|---|---|
df.groupby('col').sum() | Group by column and sum all numeric columns |
df.groupby('col')['val'].mean() | Group by column, compute mean of specific column |
df.groupby('col').agg(['mean', 'sum', 'count']) | Multiple aggregations at once |
df.groupby('col').agg({'a': 'sum', 'b': 'mean'}) | Different aggregations per column |
df.groupby(['a', 'b']).size() | Group by multiple columns, count rows |
df.pivot_table(values='v', index='a', columns='b', aggfunc='mean') | Create pivot table |
df.value_counts('col') | Count occurrences of each value |
df.groupby('col').apply(lambda g: g.head(2)) | Apply custom function per group |
Merging & Joining
| Command | Description |
|---|---|
pd.merge(df1, df2, on='key') | Inner join on common column |
pd.merge(df1, df2, on='key', how='left') | Left join — keep all rows from df1 |
pd.merge(df1, df2, left_on='a', right_on='b') | Join on differently named columns |
pd.concat([df1, df2]) | Stack DataFrames vertically (append rows) |
pd.concat([df1, df2], axis=1) | Concatenate side by side (add columns) |
df1.join(df2, on='key', how='inner') | Join using index or key column |
pd.merge(df1, df2, on='key', how='outer') | Full outer join — keep all rows from both |
Missing Data
| Command | Description |
|---|---|
df.isna().sum() | Count NaN values per column |
df.dropna() | Drop rows with any NaN |
df.dropna(subset=['col']) | Drop rows where specific column is NaN |
df.fillna(0) | Replace all NaN with 0 |
df['col'].fillna(df['col'].mean()) | Fill NaN with column mean |
df.ffill() | Forward-fill: propagate last valid value |
df.interpolate() | Fill NaN by linear interpolation |
df.replace([np.inf, -np.inf], np.nan) | Replace infinity with NaN |
String Methods
| Command | Description |
|---|---|
df['col'].str.lower() | Convert to lowercase |
df['col'].str.upper() | Convert to uppercase |
df['col'].str.strip() | Remove leading/trailing whitespace |
df['col'].str.replace('old', 'new') | Replace substring |
df['col'].str.contains('pattern', regex=True) | Check if string contains pattern |
df['col'].str.split('-', expand=True) | Split into multiple columns |
df['col'].str.len() | Length of each string |
df['col'].str.extract(r'(\d+)') | Extract first regex group match |
Date/Time
| Command | Description |
|---|---|
pd.to_datetime(df['col']) | Convert column to datetime |
df['col'].dt.year | Extract year from datetime column |
df['col'].dt.month | Extract month |
df['col'].dt.day_name() | Get day name (Monday, Tuesday, ...) |
df['col'].dt.date | Extract date part (without time) |
pd.date_range('2024-01-01', periods=30, freq='D') | Generate date range |
df.resample('M', on='date').sum() | Resample time series by month |
(df['end'] - df['start']).dt.days | Calculate difference in days between two date columns |
Common Operations
| Command | Description |
|---|---|
df.describe() | Summary statistics for numeric columns |
df.info() | Column types, non-null counts, memory usage |
df.rename(columns={'old': 'new'}) | Rename columns |
df.drop(columns=['col']) | Drop columns |
df.drop_duplicates() | Remove duplicate rows |
df.apply(lambda x: x * 2) | Apply function column-wise (or row-wise with axis=1) |
df['col'].map({'a': 1, 'b': 2}) | Map values using dictionary |
df.reset_index(drop=True) | Reset index to 0, 1, 2, ... |
df.set_index('col') | Set column as index |
📖 Free, searchable command reference. Bookmark this page for quick access.