Categories We Write About

Batch split large CSVs

If you have large CSV files and need to split them into smaller batches, you can do it using a variety of methods depending on the tools or programming languages you’re comfortable with. Here are some common approaches:

1. Using Python (Pandas)

If you prefer to use Python, the pandas library makes this task quite straightforward. Here’s how you can split a large CSV into smaller ones:

Code Example:

python
import pandas as pd

# Load the large CSV
df = pd.read_csv('large_file.csv')

# Specify the number of rows per split
batch_size = 10000

# Split the DataFrame and save each batch as a separate CSV file
for i in range(0, len(df), batch_size):
    batch_df = df.iloc[i:i + batch_size]
    batch_df.to_csv(f'batch_{i // batch_size + 1}.csv', index=False)

In this script:

  • batch_size controls how many rows each smaller CSV file will have.

  • range(0, len(df), batch_size) yields the starting row of each batch, and each batch is saved as a new CSV file (batch_1.csv, batch_2.csv, etc.). The last file holds whatever rows remain, so it may be smaller than the others.
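To make the batch arithmetic concrete, here is a minimal sketch that lists the row ranges the loop would produce. The helper name batch_bounds and the 25,000-row figure are illustrative assumptions, not part of the script above.

```python
def batch_bounds(total_rows, batch_size):
    """Return (start, stop) row positions for each batch; stop is exclusive."""
    return [
        (start, min(start + batch_size, total_rows))
        for start in range(0, total_rows, batch_size)
    ]


# A hypothetical 25,000-row file split into batches of 10,000 rows
print(batch_bounds(25_000, 10_000))
# [(0, 10000), (10000, 20000), (20000, 25000)]
```

The final range is shorter than the rest, which is why the last output file usually has fewer rows.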

Notes:

  • If you don’t have pandas installed, you can install it with:
    pip install pandas

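  • This method is simple and highly customizable, but pd.read_csv loads the entire file into memory, so it suits files that fit comfortably in RAM. For larger files, pass chunksize to read_csv so the file is processed in pieces.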
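When the file is too big to load in one go, pandas can read it in pieces. The following is a minimal sketch, assuming the same 10,000-row batches as above. The function name split_csv_in_chunks and the prefix parameter are illustrative choices, not an established API.

```python
import pandas as pd


def split_csv_in_chunks(src, batch_size=10_000, prefix="batch_"):
    """Stream src in batch_size-row chunks and write each chunk to its own CSV."""
    written = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    for n, chunk in enumerate(pd.read_csv(src, chunksize=batch_size), start=1):
        out_path = f"{prefix}{n}.csv"
        chunk.to_csv(out_path, index=False)  # each output file keeps the header
        written.append(out_path)
    return written


# Example call (assumes large_file.csv exists in the working directory):
# split_csv_in_chunks("large_file.csv")
```

Only one chunk is held in memory at a time, so peak memory use depends on batch_size rather than on the size of the input file.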


2. Using Unix/Linux Command Line (split Command)

If you’re working on a Unix-like OS (e.g., Linux, macOS), you can use the split command directly from the command line to split large CSVs.

Example Command:

bash
split -l 10000 large_file.csv batch_
  • The -l 10000 option specifies that each split file should contain 10,000 lines.

  • batch_ is the prefix for the output files (e.g., batch_aa, batch_ab, etc.).

Notes:

  • This is very fast and requires no coding knowledge.

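  • split treats the header as an ordinary line, so only the first output file will contain it. If every batch needs the header, you will have to add it to the other files yourself. split also cuts purely by line count, so a quoted field that contains a line break can end up divided between two files.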
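If every batch needs the header row, one option is a short script instead of split. The sketch below uses only Python’s standard csv module, reads the file one record at a time, and repeats the header in each output file. Because it parses records rather than raw lines, quoted fields containing line breaks stay intact. The function name split_csv_keep_header, the UTF-8 encoding, and the output naming are assumptions made for illustration.

```python
import csv


def split_csv_keep_header(src, batch_size=10_000, prefix="batch_"):
    """Split src into files of at most batch_size data rows, each with the header."""
    written = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty input: nothing to write
            return written
        out = None
        writer = None
        try:
            for row_number, row in enumerate(reader):
                # Start a new output file every batch_size data rows
                if row_number % batch_size == 0:
                    if out is not None:
                        out.close()
                    path = f"{prefix}{len(written) + 1}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    written.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return written


# Example call (assumes large_file.csv exists in the working directory):
# split_csv_keep_header("large_file.csv")
```

This is slower than split, but it never holds more than one row in memory and each batch is a valid CSV on its own.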


3. Using Excel (for smaller files)

If the file isn’t too large (an Excel worksheet holds at most 1,048,576 rows), you can open the CSV in Excel, manually split it across multiple sheets, and save each sheet as its own CSV. This method isn’t suitable for massive files but works well for more manageable sizes.

4. Using R

If you prefer using R, the following approach can split large CSV files:

Code Example:

r
library(data.table)

# Load the large CSV file
data <- fread("large_file.csv")

# Specify the number of rows per split
batch_size <- 10000

# Split the data and write to separate files
for (i in seq(1, nrow(data), by = batch_size)) {
  write.csv(
    data[i:min(i + batch_size - 1, nrow(data)), ],
    paste0("batch_", (i - 1) %/% batch_size + 1, ".csv"),
    row.names = FALSE
  )
}

In this script:

  • fread from the data.table package is used for fast loading of large CSV files.

  • Similar to the Python approach, it writes batches of rows into new CSV files.


5. Using PowerShell (Windows)

For Windows users, PowerShell can be a handy tool to split CSV files.

Example PowerShell Command:

powershell
$batch_size = 10000
$counter = 1

Import-Csv "large_file.csv" | ForEach-Object -Begin {
    $batch = @()
} -Process {
    # Collect rows until the batch is full
    $batch += $_
    if ($batch.Count -ge $batch_size) {
        $batch | Export-Csv "batch_$counter.csv" -NoTypeInformation
        $counter++
        $batch = @()
    }
} -End {
    # Write any remaining rows as the final batch
    if ($batch.Count -gt 0) {
        $batch | Export-Csv "batch_$counter.csv" -NoTypeInformation
    }
}

This PowerShell script:

  • Reads the large CSV file with Import-Csv.

  • Collects the rows into batches, and once the batch reaches the specified size, it writes them to a new CSV file.


6. Using Online Tools

If your CSV file is small (a few MB at most), an online CSV splitter can do the job.

These tools can be convenient for quick and small tasks but aren’t recommended for files that are too large due to upload limits.


Conclusion

For large CSV files, the Python (Pandas) method and the Unix split command are typically the most practical choices. split is the fastest if you don’t need the header in every file, while Python offers more flexibility and control over how the files are split. On Windows, PowerShell does the same job without extra installs, and for small files, Excel or an online tool is a quick alternative.
