Showing posts with label Bash. Show all posts

A very BIG ML dataset un-TAR GZIP command

None of my GUI Mac programs were able to expand the 13 GB dataset; the command line, however, had no problem reading it.


$ tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz

It would be great if it were this simple!

The command failed because I ran out of disk space (41 GB free) before the archive was fully expanded.

Alternatively, I considered going one directory at a time,

$ tar xvfz BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz /directory_path


with a script that traverses the directories. This way I can keep track of which directories were correctly expanded.
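
A minimal sketch of such a script follows. It is not the script used for this dataset: the names extract_by_directory and extracted.log are made up for illustration, and it assumes the archive stores its top-level directories without a leading "./" and without newlines in their names.

```shell
#!/bin/bash
# Sketch: expand a .tar.gz one top-level directory at a time.
# extract_by_directory and extracted.log are hypothetical names.
extract_by_directory() {
    local archive="$1" dest="$2" log="$2/extracted.log"
    mkdir -p "$dest"
    touch "$log"
    # list member names, keep the first path component, de-duplicate
    tar tzf "$archive" | cut -d/ -f1 | sort -u |
    while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        # skip directories already recorded as done (allows resuming)
        if grep -qxF -- "$dir" "$log"; then
            continue
        fi
        # extract just this directory; record it only on success
        if tar xzf "$archive" -C "$dest" "$dir"; then
            echo "$dir" >> "$log"
        else
            echo "failed: $dir" >&2
        fi
    done
}
```

Calling it as extract_by_directory BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz /some/destination would leave a list of finished directories in /some/destination/extracted.log. Every pass decompresses the whole archive again, so this trades time for the ability to stop, move to another disk, and resume.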

At this point, I had ended up with multiple directories on various disks, so a directory merging tool is very useful:

# parameters:
# -a, --archive; copy everything recursively
# -i, --itemize-changes; print an update about each file
# -h, --human-readable
# -W, --whole-file; avoid file deltas
# --progress; show progress in the terminal
# --log-file=XYZ.log; log the progress to a file, which might be useful when resuming
$ rsync -aW source_directory/ destination_directory/


References:

  • https://www.thegeekstuff.com/2010/04/unix-tar-command-examples/
  • https://medium.com/@sethgoldin/a-gentle-introduction-to-rsync-a-free-powerful-tool-for-media-ingest-86761ca29c34









Windows 10 Ubuntu Bash

My company uses mostly Windows and Ubuntu servers. I requested a MacBook, obviously.

However, certain internally written programs run only on Windows, so I run the Parallels hypervisor with Windows 10 and Ubuntu.

On Windows 10, I have a hard time with the Command Prompt. The last time I used DOS was around 1999, so I opted to try installing Bash.

$ ping Google.com
PING Google.com (74.125.138.102) 56(84) bytes of data.
64 bytes from yi-in-f102.1e100.net (74.125.138.102): icmp_seq=1 ttl=128 time=30.4 ms

$ ssh dummy@74.125.138.102
ssh: connect to host 74.125.138.102 port 22: Connection refused

Checking for Python


$ python --version

Command 'python' not found, but can be installed with:

sudo apt install python3
sudo apt install python
sudo apt install python-minimal

You also have python3 installed, you can run 'python3' instead.

$ python3 --version
Python 3.6.5

$ python3
Python 3.6.5 (default, Apr  1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello World")
Hello World
>>> exit()
$

Attempting to get Anaconda


$ wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.3.0-Linux-x86_64.sh
$ chmod +x Anaconda-2.3.0-Linux-x86_64.sh
$ . Anaconda-2.3.0-Linux-x86_64.sh
-bash: .: Anaconda-2.3.0-Linux-x86_64.sh: cannot execute binary file

The leading dot sources the file into the current shell instead of running it as a program, and Bash refuses to source a file that looks binary. The installer is normally started with "bash Anaconda-2.3.0-Linux-x86_64.sh" instead.


$ java

Command 'java' not found, but can be installed with:

$ sudo apt install default-jre
$ sudo apt install openjdk-11-jre-headless
$ sudo apt install openjdk-8-jre-headless
$ sudo apt install openjdk-11-jre-headless

$ java --version
openjdk 10.0.1 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 10.0.1+10-Ubuntu-3ubuntu1, mixed mode)

Find files that are too big for GitHub


Command explanation

  • Find
    • in current directory
    • files bigger than 50 MB (100 MB is too big for GitHub)
    • last modified more than 365 days ago
    • the type will be a file
  • execute list files
    • list format
    • human readable
    • sort by size (not working: "\;" runs ls once per file, so there is nothing to sort)





$ find . -size +50M -mtime +365 -type f -exec ls -lhS {} \;



-rw-r--r--@ 1 uki  admin    49M Nov  6  2016 ./Coursera/UW/ML/Week3/amazon_baby.gl/m_bfaa91c17752f745.0000
-rw-r--r--@ 1 uki  admin    69M Nov  6  2016 ./Coursera/UW/ML/Week4/people_wiki.gl/m_4549381c276b46c6.0000
-rw-r--r--@ 1 uki  admin    60M Nov  6  2016 ./Coursera/UW/ML/Week5/song_data.gl/m_cccc16853452d1ed.0000
-rw-r--r--@ 1 uki  admin    63M Nov  6  2016 ./Coursera/UW/ML/Week6/image_test_data/m_e16f5ffd2c088370.0000

-rw-r--r--@ 1 uki  admin    31M Nov  6  2016 ./Coursera/UW/ML/Week6/image_train_data/m_504edbda459b24ff.0000
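
One possible way to get the size sorting to work is to end the -exec clause with "+" instead of "\;", so that find hands many file names to a single ls. The sketch below wraps that in a helper; big_files is a made-up name, and the age filter is left out to keep it short.

```shell
#!/bin/bash
# Sketch: list files above a size threshold, largest first.
# big_files is a hypothetical helper name; the threshold is in MB.
big_files() {
    local dir="${1:-.}" min_mb="${2:-50}"
    # "+" passes many file names to one ls, so -S can sort them by size
    find "$dir" -type f -size +"${min_mb}M" -exec ls -lhS {} +
}
```

Running big_files . 50 from the top of a repository lists the candidates that are getting close to the GitHub limit. With a very large number of matches find may still start ls more than once, in which case each batch is sorted separately.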


The "ls" command with sorting

The Mac/Unix "ls" command did not sort file names the way I wanted.

To get it done I added a new function in Bash to my ~/.bash_profile


# ls with sort - updated: February 13, 2018
function list()
{
    if [ -n "$1" ]; then
        # an argument was given: list that directory only
        ls -a "$1" | sort
    else
        # no argument: list the current directory
        ls -a | sort
    fi
}

sed command line tool

The sed command line tool lets you pipe in a string of text and substitute part of it.

Note that by default it replaces only the first occurrence on each line:



$ echo "my, this is my sentence" | sed 's/my/My/'
My, this is my sentence


Escaping forward slashes with the backslashes:


$ echo "convert /usr/local/bin to /common/bin" | sed "s/\/usr\/local\/bin/\/common\/bin/"
convert /common/bin to /common/bin

An ampersand (&) in the replacement stands for the matched text:



$ echo "Repeat me 5 times." | sed 's/[0-9]/& & & & &/'

Repeat me 5 5 5 5 5 times.
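
Two standard variations on the examples above, shown as a short sketch: the "g" flag replaces every occurrence on a line, and a delimiter other than "/" avoids escaping the slashes in paths.

```shell
#!/bin/bash
# "g" flag: replace every occurrence on the line, not just the first
echo "my, this is my sentence" | sed 's/my/My/g'

# the character right after "s" is the delimiter; "|" avoids escaping slashes
echo "convert /usr/local/bin to /common/bin" | sed 's|/usr/local/bin|/common/bin|'
```

The first command prints "My, this is My sentence" and the second prints the same "convert /common/bin to /common/bin" as the escaped version.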

Making symbolic link named ".m2" to Gradle JAR repo

After many years of using Maven, I am used to finding my JARs in the ~/.m2/ directory, so I created a link as follows:


$ ln -s ~/.gradle/caches/modules-2/files-2.1 ~/.m2 

$ ls -alt ~/.m2/
total 0
drwxr-xr-x  65 ukilucas  staff  2210 Apr 28 23:18 .
drwxr-xr-x   3 ukilucas  staff   102 Apr 28 23:18 junit
drwxr-xr-x   4 ukilucas  staff   136 Apr 28 23:18 org.hamcrest
drwxr-xr-x   3 ukilucas  staff   102 Apr 28 14:15 com.android.tools.external.lombok
drwxr-xr-x   3 ukilucas  staff   102 Apr 28 14:15 org.abego.treelayout

drwxr-xr-x   3 ukilucas  staff   102 Apr 28 14:15 com.intellij
..

Running Groovy with multiple command line parameters


Source for ArgumentsTest.groovy


#!/bin/bash
//usr/bin/env groovy  -cp extra.jar:spring.jar:etc.jar -d -Dlog4j.configuration=file:/etc/myapp/log4j.xml "$0" "$@"; exit $?


def params = ""
args.each() {
    if (it) {
        params +=   it + "! "
    }
}

println "Hello World " + params

Execute permission

$ sudo chmod +x ArgumentsTest.groovy
Password:

Output


$ ./ArgumentsTest.groovy Uki Natalia Zoe
Hello World Uki! Natalia! Zoe!

git: updating all repos using Bash

I have a lot (hundreds) of repositories that I want to keep updated on a daily basis. Here is a handy script I use:


# saving current working directory
cwd=$(pwd)

echo "reading each repo directory in $cwd"
# the trailing slash limits the loop to directories
for repo in */
do
    echo '#################################################'
    # change to the given repo directory, skip it if that fails
    cd "$repo" || continue
    # print repo url
    git config --get remote.origin.url
    git fetch
    git status
    # go back to the directory you started with
    cd "$cwd"
done


Linux tail command

To constantly monitor log files being appended, you can use:


tail -f /var/log/xyz*.log

If you want to see the last 200 lines:

tail -n 200 /var/log/xyz*.log

Note the asterisk "*" symbol: it matches ALL logs that fit the pattern, which is helpful with log names ending in a date.

Bash: GIT repetitive tasks

I have too many Git repos to remember which ones I need to keep updated, so I have a script that updates what I need.

GIT: update ALL REPOS in current directory

If you have a ZILLION Git repositories in your current workspace, you may benefit from a Bash shell command that operates on all of them in one fell swoop.
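
One way such a command could look is sketched below. The helper name each_repo is invented for this example, and it assumes every repository sits directly under the current directory and contains a .git entry.

```shell
#!/bin/bash
# Sketch: run one command in every Git repository directly below
# the current directory. each_repo is a hypothetical helper name.
each_repo() {
    local dir
    for dir in */; do
        # only directories that contain a .git entry count as repos
        [ -e "${dir}.git" ] || continue
        echo "### ${dir%/}"
        # subshell: the cd does not leak into the calling shell
        ( cd "$dir" && "$@" )
    done
}
```

With this in ~/.bash_profile, "each_repo git fetch" or "each_repo git status --short" covers the whole workspace. Passing the command as arguments keeps the loop reusable for any Git operation.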

Bash: syntax error: unexpected end of file

When you get:

syntax error: unexpected end of file

You should make sure that you don't have problems with end-of-line characters (for example Windows line endings). However, an error in the structure of your code, such as a missing "fi", "done", closing brace, or closing quote, will cause that as well.

Sometimes it is easier to comment out all lines with "# your code" and turn them back on little by little.
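
Bash can also parse a script without running it, which narrows things down faster than commenting lines out. The sketch below reproduces the error with a deliberately broken temporary script (an "if" with no "fi"); the script content is made up for the demonstration.

```shell
#!/bin/bash
# Sketch: reproduce the error with a deliberately broken script,
# then let "bash -n" parse it without executing anything.
broken=$(mktemp)
printf 'if true; then\n    echo "missing fi"\n' > "$broken"

# -n only reads and checks the syntax; the message goes to stderr
bash -n "$broken" 2>&1 || echo "bash -n reported a problem"
rm -f "$broken"
```

Running "bash -n your_script.bash" on a real script prints nothing when the syntax is fine.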

Unix: dos2unix on Mac

Removing Windows line endings on Mac:

In Terminal.app

$ cat file1.txt | col -b > file2.txt

Warning: Do not use the same file as both source and destination. The shell truncates the destination before cat reads it, so the file ends up empty.
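
An alternative that targets the carriage returns directly is "tr". The sketch below wraps it in a helper that refuses to overwrite its own input; strip_cr is a made-up name, and note that it removes every carriage return in the file, not only those at line ends.

```shell
#!/bin/bash
# Sketch: strip carriage returns (the "\r" of Windows "\r\n" endings).
# strip_cr is a hypothetical helper name.
strip_cr() {
    local src="$1" dst="$2"
    # refuse to overwrite the input: the redirect would empty it first
    if [ "$src" -ef "$dst" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    tr -d '\r' < "$src" > "$dst"
}
```

Usage mirrors the command above: strip_cr file1.txt file2.txt.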

UNIX: -bash: xyz.bash: cannot execute binary file

You create your script file but it does not want to execute...

Terminal.app error:

 -bash: xyz.bash: cannot execute binary file


Step 1:

Make sure it has execute permissions

$ chmod +x xyz.bash

Step 2:

Make sure the script does not contain errors or special characters that prevent execution.
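
A rough way to check for such characters is to count the bytes that are neither printable ASCII nor whitespace. This is only a sketch: count_odd_bytes is a made-up helper name, a non-zero result is a hint rather than proof (accented letters in comments are counted too), and carriage returns count as whitespace here, so they are not reported.

```shell
#!/bin/bash
# Sketch: count bytes that are neither printable ASCII nor whitespace.
# count_odd_bytes is a hypothetical helper name.
count_odd_bytes() {
    # LC_ALL=C makes tr work on single bytes; whatever survives the
    # deletion is a byte a plain-text script would not normally contain
    LC_ALL=C tr -d '[:print:][:space:]' < "$1" | wc -c | tr -d ' '
}
```

For a clean script, "count_odd_bytes xyz.bash" prints 0. Anything else is worth inspecting, for example with "od -c xyz.bash".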