Bash Shell: Working with large text files

When working with multi-GB text files, I use these commands:

1. Get the first line, which often contains the column names, and dump it into a small text file



uki $ head -n 1 source_file_name.txt > header_line.txt
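To see those column names one per line, you can split the header file on the delimiter. This is just a quick sketch assuming a comma-separated file; swap the comma for '\t' if your file is tab-delimited:

uki $ tr ',' '\n' < header_line.txt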



2. Get the first data record after the header line and dump it into a small text file

uki $ head -n 2 source_file_name.txt | tail -n 1 > first_data_line.txt

3. Finally, when developing against large files, I take a sample of 1000 records (out of millions) to speed up development time. I use 1000 because that is the default SELECT * row limit in MySQL Workbench, but you can use any other size; just do not go too small, or you may not catch memory-related errors. The offset of 2500 in this example is one I change occasionally to pull a different sample, because you do want to sample your data from different places in the file; a parameterized version is sketched after the command below.


uki $ head -n 2500 source_file_name.txt | tail -n 1000 > sample_1000_records.txt
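If you resample often, the same pipeline can be parameterized with shell variables. This is a minimal sketch assuming Bash and GNU coreutils; the variable names OFFSET and SAMPLE_SIZE are mine, not part of the original commands:

OFFSET=2500         # last line of the sample; change occasionally to sample a different region
SAMPLE_SIZE=1000    # keep this large enough to surface memory problems during development
head -n "$OFFSET" source_file_name.txt | tail -n "$SAMPLE_SIZE" > "sample_${SAMPLE_SIZE}_records.txt"

With OFFSET=2500 and SAMPLE_SIZE=1000 this pulls the same lines 1501-2500 as the command above.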

Resulting files: header_line.txt, first_data_line.txt, and sample_1000_records.txt.
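A quick way to confirm the three files, assuming they were written to the current directory, is to check their line counts, which should be 1, 1, and 1000:

uki $ wc -l header_line.txt first_data_line.txt sample_1000_records.txt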