I need some help with 'wget'

Hi,

I am trying to download the Hindu sacred text "Rig Veda" from this website: http://www.sacred-texts.com/hin/rigveda/index.htm. The directory structure is simple. The parts of the text are stored like this:

http://www.sacred-texts.com/hin/rigveda/index.htm (main index)

http://www.sacred-texts.com/hin/rigveda/rvi01.htm (book 1)
http://www.sacred-texts.com/hin/rigveda/rv01001.htm (hymn 1)
http://www.sacred-texts.com/hin/rigveda/rv01002.htm (hymn 2)
etc.

http://www.sacred-texts.com/hin/rigveda/rvi02.htm (book 2)
http://www.sacred-texts.com/hin/rigveda/rv02001.htm (hymn 1)
http://www.sacred-texts.com/hin/rigveda/rv02002.htm (hymn 2)
etc.

It goes on for a total of 10 books.

Alas, at the top of each page there are links to other parts of the website including the home page (http://www.sacred-texts.com/index.htm). So when I use 'wget -r -l 2 http://www.sacred-texts.com/hin/rigveda/index.htm' to get everything two levels down from http://www.sacred-texts.com/hin/rigveda/index.htm I ALSO get two levels down from the home page and the rest of the links.

I tried 'wget -l 2 -I http://www.sacred-texts.com/hin/rigveda/*' but that did not work and I only got a "missing URL" message.

How do I 'pump' only the pages that I need, i.e. the full Rig Veda and not the rest of the world's spirituality?

Many thanks for any pointers,

VS

relative and no-parent

Hi there

In addition to your -r -l 2 options, try --relative --no-parent.

For more info on what that means, try the man page.
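
As a rough sketch, the full command could look like the one below. The function name fetch_rigveda is only an illustration, and the URL is the one from the question. Whether the hymn links on that site are written as relative links is an assumption; if they turn out to be absolute, --relative would skip them and should be dropped.

```shell
#!/bin/sh
# Hypothetical helper; the function name is an illustration only.
fetch_rigveda() {
    # -r -l 2      recurse, following links at most two levels deep
    #              (main index -> book index -> hymn)
    # --no-parent  never ascend above /hin/rigveda/
    # --relative   follow relative links only
    wget -r -l 2 --no-parent --relative \
        "http://www.sacred-texts.com/hin/rigveda/index.htm"
}
```

Call fetch_rigveda to start the download. As for the "missing URL" message: -I expects a comma-separated list of directory paths such as /hin/rigveda, not a URL, so the address was most likely taken as the argument to -I and wget was left with nothing to fetch.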

Good luck,
Georg

wget

If you use Firefox you could get the extension "DownThemAll", which downloads all links from a page. You can uncheck any links you do not want. This is probably less hassle than writing a script to do it.
Alternatively, you could download the index pages, extract the links from them, make a list of all URLs to be downloaded, and pass that list to wget.
I recommend my first suggestion.
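
For the second suggestion, a minimal sketch might look like this. It assumes the book indexes are named rvi01.htm through rvi10.htm (only the first two are confirmed in the question), that hymn links are plain relative hrefs such as href="rv01001.htm", and that your grep supports -o (GNU and BSD grep do). The function names and the file name hymn-urls.txt are made up for illustration.

```shell
#!/bin/sh
# Base URL taken from the question; the hymn pages live directly under it.
BASE='http://www.sacred-texts.com/hin/rigveda'

# Read HTML on stdin and print one absolute URL per hymn link.
# Book indexes (rvi01.htm) and site navigation links do not match.
extract_hymn_urls() {
    grep -o -i 'href="rv[0-9][0-9]*\.htm"' |
        sed -e 's/^[^"]*"//' -e 's/"$//' -e "s|^|$BASE/|" |
        sort -u
}

# Hypothetical driver: fetch the ten book indexes, collect the hymn
# URLs they list, then hand the list to wget.
fetch_all_hymns() {
    for n in 01 02 03 04 05 06 07 08 09 10; do
        wget -q -O - "$BASE/rvi$n.htm"
    done | extract_hymn_urls > hymn-urls.txt
    wget -i hymn-urls.txt
}
```

Run fetch_all_hymns to download everything, or pipe a saved index page into extract_hymn_urls first to check that the list looks right.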

Cheap trick

If gromit comes up with such a cheap trick, I will give you two too. Install Gwget or KGet.

I will try them all ;-)

Thanks for the pointers guys!

Cheers,

VS

Motto: chown -R linux:GNU world
Distros: Debian, Kanotix, Frenzy, Damn Small Linux
