[Bash] Quick file renaming

I had a bunch of .deb packages from the apt archive. The problem was that the package file names had a ‘%’ (as in ‘%3a’) in the places where there should have been a ‘_’. We found that the symbol corresponded to the underscore character and was somehow getting replaced in the name. So we needed to rename all the files, changing each ‘%3a’ to ‘_3a’, and I was asked to write a quick script. As my usual bash buddy was busy, I resorted to some Googling and found this.

I started by echoing each file name, then replaced the ‘%3a’ with ‘_3a’ and printed the result, and finally renamed the file.

for FILE in *; do echo "$FILE"; done
for FILE in *; do NEWFILE=$(echo "$FILE" | sed 's/%3a/_3a/g'); echo "$NEWFILE"; done
for FILE in *; do NEWFILE=$(echo "$FILE" | sed 's/%3a/_3a/g'); mv "$FILE" "$NEWFILE"; done

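But when I executed my final script, `mv` complained that the old file and the new file were the same. That happens for every file whose name has no ‘%3a’ in it: `sed` only edits the text piped into it and never renames anything, so for those files the new name comes out identical to the old one. Anyway, it did the job I wanted it to 🙂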

Updates from floyd_n_milan:

for i in *%3a*; do mv -v "$i" "${i//\%3a/_3a}"; done
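
A slightly more defensive take on the same idea, as a sketch only: the function name `rename_percent` and the `DRY_RUN` variable are made up for illustration and are not from floyd_n_milan. It only touches names that actually contain %3a, so `mv` is never asked to move a file onto itself, and it can print what it would do before doing it.

```bash
#!/usr/bin/env bash
# rename_percent DIR: rename every file in DIR whose name contains "%3a",
# replacing each "%3a" with "_3a". Hypothetical helper for illustration.
rename_percent() {
    local dir=$1 path name new
    for path in "$dir"/*%3a*; do
        [ -e "$path" ] || continue        # the glob matched nothing
        name=${path##*/}                  # strip the directory part
        new=${name//\%3a/_3a}             # \% keeps % literal; // replaces all
        if [ -n "${DRY_RUN:-}" ]; then
            echo "would rename: $name -> $new"
        else
            mv -v -- "$path" "$dir/$new"
        fi
    done
}
```

Running `DRY_RUN=1 rename_percent .` first shows the planned renames; `rename_percent .` then performs them.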


(Bash) My Top 10 Commands – September 2007

It had been a while since I last looked at which commands I use most, so I re-ran that small command I learned some months ago. The result shows I am working a lot with files, and on another machine over the network at that 😉

$ history | awk '{print $2}' | awk 'BEGIN {FS="|"}{print $1}' | sort | uniq -c | sort -n | tail | sort -nr

142 ls
64 cd
60 vim
36 exit
25 sudo
24 ssh
15 scp
14 find
13 svn
13 apt-cache
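
For anyone wondering what each stage of that pipeline does, here is roughly the same thing wrapped in a function that reads `history`-style lines from standard input. This is a sketch: the name `top_commands` is invented, and it uses one awk call plus `head` where the original uses two awk calls plus `tail` and a second sort.

```bash
# top_commands [N]: read "history"-style lines on stdin and print the N
# (default 10) most frequent command names, most frequent first.
top_commands() {
    # Field 1 is the history number, field 2 the command word.
    # Cut the word at the first "|" so "ls|wc" is counted as "ls".
    awk '{ cmd = $2; sub(/[|].*/, "", cmd); print cmd }' |
        sort |                 # group identical names together
        uniq -c |              # prefix each name with its count
        sort -rn |             # highest count first
        head -n "${1:-10}"     # keep the top N
}
```

In an interactive shell it would be used as `history | top_commands`, or `history | top_commands 5` for a shorter list.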

(Bash) Copy with Find

I wanted to copy all my EPM .list files to a separate folder so I could edit and test them. I wanted to do it in true geek mode and hence decided to use the find command (thanks to floyd_n_milan for introducing me to this wonderful command). Since it had been some time since I really used `find`, I had to do a bit of Googling to find what I needed. This might be a piece of piss for seasoned Bash’ers, but for me it was a chance to revive my deteriorating Bash knowledge and, as always, I love to post newbie stuff.

$ find ./myproj/pkgs -name \*.list -exec cp {} epm-list/ \;

This command searches the myproj/pkgs directory (within current directory) and its sub-directories for files ending with .list. Upon each file found, it executes the `cp` command copying the file to the directory epm-list. The {} is replaced by the name of the file found (each time).

Please note the escaping ‘\’ before the ‘*’ in the file name pattern (a shell glob, not a regular expression) and before the trailing ‘;’. Without the first backslash, the shell may expand *.list itself before find ever sees it; if you miss the trailing semicolon or its escaping, find will complain that -exec is missing its argument. We need not write `epm-list/{}`; cp keeps the file name automatically when the destination is a directory. (If we do use {} there, cp tries to put the file in epm-list/<path-where-the-file-is-actually-found>, which ends in a path-not-found error.)
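
To see the behaviour without touching real files, here is a throwaway sketch that builds a small tree under a temporary directory and runs the same find invocation against it. All the file and directory names are invented for the demo; quoting the pattern as '*.list' does the same job as escaping the asterisk.

```bash
# Build a scratch tree: two .list files at different depths, one other file.
demo=$(mktemp -d)
mkdir -p "$demo/myproj/pkgs/a/b" "$demo/epm-list"
touch "$demo/myproj/pkgs/top.list" \
      "$demo/myproj/pkgs/a/b/deep.list" \
      "$demo/myproj/pkgs/a/readme.txt"

# The quotes stop the shell from expanding the pattern before find sees it.
# {} is replaced by the full path of each match; cp then drops the file,
# under its base name, into the target directory.
find "$demo/myproj/pkgs" -name '*.list' -exec cp {} "$demo/epm-list/" \;

ls "$demo/epm-list"    # lists deep.list and top.list
```

Both files end up directly inside epm-list, regardless of how deep they were found.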

(Bash) My Top 10 Commands – June 2007

Following the posts in Planet Ubuntu, I thought of trying out the same to find out which commands I have often used recently. So I tried..

$ history | awk '{print $2}' | awk 'BEGIN {FS="|"}{print $1}' | sort | uniq -c | sort -n | tail | sort -nr

And the result was..

81 ls
68 sudo
64 cd
33 apt-cache
26 exit
24 screen
22 vim
12 ping
10 cat
8 svn

Positional Parameters in Bash

It has been long since I posted something on Bash and, as usual, I again post something from Mrugesh’s (floyd_n_milan) keyboard. This time it’s an explanation of positional parameters in Bash, which was actually a reply to a question on the GLUG-BOM mailing list. I found the explanation clearer than what I found in many books on the subject, so here it is (slightly edited).

$* expands the positional parameters. 

Positional parameters are arguments passed to a command.
When you run your script by doing, say, 

$ ./image-resize image1 image2 image3 

the positional parameters get set to ./image-resize,
image1, image2 and image3. 

The positional parameters can be used as the variables
$0 to $n where n is any non-negative integer (beyond $9
you need braces, as in ${10}). $0 is set to the name the
script was invoked with, in this case ./image-resize.
The parameters from $1 onwards are the actual
arguments. So in this case, 

$1 == image1
$2 == image2 and
$3 == image3 

These are read only variables. You can't manually
assign a value to them. You must use either set or shift. 
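
A small sketch of what "use set or shift" looks like in practice.
The function name show_args is invented; inside a function the
positional parameters are the function's own arguments, which
makes it easy to experiment without writing a separate script.

```bash
show_args() {
    echo "count=$# first=$1"
    shift                      # drop $1; the old $2 becomes $1, and so on
    echo "count=$# first=$1"
    set -- x y                 # replace all positional parameters at once
    echo "count=$# first=$1 second=$2"
}

show_args image1 image2 image3
```

This prints count=3 first=image1, then count=2 first=image2,
then count=2 first=x second=y.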

When you process the arguments, you can either call them
individually, or, as is most common, process them in a
loop, one by one. You've used the second method. 

$* and $@ are two variables that can be used to query
all the positional parameters together. When unquoted,
they are subject to IFS word splitting like any other
variable in bash. What this means is that if your arguments
have spaces or newlines embedded in them, they'll be
split up to be two different entities. 

The safe way is to double quote ( " ) the variables,
which prevents this internal splitting. 

Mind the double quotes. VERY important. 

Here's an example: 

$ set "foo bar" baz blah
$ # "foo bar" is intended to be a single argument.
$ # set can be used to set various bash options and
$ # positional parameters.
$ echo $1
foo bar
$ echo $2
baz
$ echo $3
blah
$ for i in $@; do echo '==> '${i}' <=='; done
==> foo <==
==> bar <==
==> baz <==
==> blah <==
$ # Notice that foo and bar have been split up. Undesirable.
$ for i in $*; do echo '==> '${i}' <=='; done
==> foo <==
==> bar <==
==> baz <==
==> blah <==
$ # When unquoted $@ and $* produce the same result.
$ for i in "$@"; do echo '==> '${i}' <=='; done
==> foo bar <==
==> baz <==
==> blah <==
$ # I suppose this is what you wanted?
$ for i in "$*"; do echo '==> '${i}' <=='; done
==> foo bar baz blah <==
$ # When double quoted, $@ and $* differ vastly..
$ # And that's not the result you wanted either.
$ # Always use "$@".

And here is a small addition from cydork,

<cydork> add that when you use “$*” the positional parameters will be separated with first char of IFS 😉
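
cydork's point makes for a handy little trick. Here is a sketch of it; the function name join_by is invented for illustration.

```bash
# join_by SEP ARG...: join the arguments using the first character of SEP.
join_by() {
    local IFS=$1     # "$*" joins with the first character of IFS only
    shift            # drop the separator, leaving just the items
    echo "$*"
}

join_by , "foo bar" baz blah    # prints: foo bar,baz,blah
join_by ':-' a b                # prints: a:b  (only ':' is used)
```

Because IFS is declared local, the change does not leak out of the function.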

And one additional point from Mrugesh himself,

the ${array[@]} and ${array[*]} behave in a similar manner
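
A quick way to check that for yourself, as a sketch (the array name files and the helper count_words are made up): count how many words each form expands to.

```bash
files=("foo bar" baz blah)

# Prints the number of arguments it receives.
count_words() { echo $#; }

count_words "${files[@]}"   # prints 3: each element stays one word
count_words "${files[*]}"   # prints 1: all elements joined into one word
count_words ${files[@]}     # prints 4: unquoted, "foo bar" is split in two
```

So the advice carries over unchanged: use "${array[@]}" unless you really want a single joined string.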

Newlines and Sed

This was actually written by floyd_n_milan, but he had some problem posting it in blogger and hence I do him this favor of posting it in my blog 🙂

This is a not so nice little problem.

Say, you need to rid an HTML file of all the tags. Seems pretty simple if the
entire tag is contained on one single line, like this:

<html>

Could be even two tags on the same line, like this:

<html><head>

That’s not a problem since both tags start and end on the same line. Here’s
what you can do in sed to get rid of them:

sed -e 's/<[^>]*>//g'

It reads: substitute a pattern that starts with a <, followed by zero or more
characters that are not >, and ends with a >, with nothing, for all instances
on the line.

Why’s the [^>]* necessary then? Why not just <.*>?

Consider this:

<title>Hello World</title>

What would happen with <.*> here? sed will see the < from <title> and the >
from </title>, because by default it matches the longest possible match.
This’ll remove the entire line, which is not what we want. We just want to
remove the <title> and </title>, keeping the Hello World intact.

Hence, we use [^>]*. This means, match zero or more characters that are not >.
So, in essence, we’re matching the shortest pair of <>. Inside this pair,
after the initial <, there can be no >, unless it’s the end of the pattern.
This’ll match both <title> and </title> separately and keep Hello World
intact.
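
The difference is easy to see by running both patterns on that one
line (a small sketch; the variable name line is just for the demo):

```bash
line='<title>Hello World</title>'

# Greedy: <.*> stretches from the first < to the last >, so nothing is left.
printf '%s\n' "$line" | sed -e 's/<.*>//g'       # prints an empty line

# Negated class: each match stops at the first >, so the text survives.
printf '%s\n' "$line" | sed -e 's/<[^>]*>//g'    # prints: Hello World
```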

The problem still remains though. The above sed command will only work if the
entire tag is on one single line, because sed can read the file only line by
line. So, if something like this comes up:

<p><font size="5"
face="blah">Blah blah blah</font></p>

sed will remove the <p> fine. Then it’ll find the <font but won’t find the
corresponding > on the same line. On the next line, it’ll remove the </font>
and the </p> at the end, but won’t remove the face="blah"> at the start
because it can’t find the initial <.

This problem can be solved using the multiline pattern space in sed. This
script will work:

sed -e '/</{
N
s/<[^>]*>//g
}'

First /</ takes sed to a line that has a <. The commands inside the {} will
then operate on this line.

The N command causes sed to read in the next line, keeping the initial and the
newly read in line, both in the pattern space. So the content that sed
operates upon, now looks like this:

<p><font size="5"\nface="blah">Blah blah blah</font></p>

with \n being just another character in the line. The contents then match the
<[^>]*> used by the substitute command properly.
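
One caveat: N pulls in exactly one extra line, so a tag spread over
three or more lines would still slip through. A possible way around
that, offered as a sketch and not as part of the original write-up,
is to keep appending lines for as long as the pattern space ends in
an unclosed tag. The function name strip_tags and the label name
again are invented here.

```bash
# strip_tags: remove HTML-like tags from stdin, including tags that
# span several lines.
strip_tags() {
    sed -e ':again' \
        -e '/<[^>]*$/{' \
        -e 'N' \
        -e 'b again' \
        -e '}' \
        -e 's/<[^>]*>//g'
}

# A tag broken over three lines:
printf '%s\n' '<p><font size="5"' 'face="blah"' 'color="red">Blah blah blah</font></p>' |
    strip_tags    # prints: Blah blah blah
```

/<[^>]*$/ matches when a < has no > after it before the end of the
pattern space; N then appends the next line and b again jumps back
to re-check. Note that a stray, never-closed < in the text makes it
swallow lines up to the next > (or the end of the file).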