HTTrack from the command line, some preliminary notes
Update
Use apt-get install webhttrack on Debian or Ubuntu, then use its fine graphical user interface.
Old notes
Warning: none of the approaches below worked for us. We switched away from a Mac in order to use an httrack GUI client.
From http://www.httrack.com/html/fcguide.html
Example:
httrack http://www.shoesizes.com/bob/ -O /tmp/shoesizes -N "%h%p/%n.%t"
[where] %h stands for the host name (e.g., www.shoesizes.com or ftp.shoesizes.com), %p stands for the pathname without a trailing slash (e.g., /bob), %n stands for the name of the file, and %t stands for the type (file extension).
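To make the placeholders concrete, here is a small bash sketch of how a -N template such as "%h%p/%n.%t" could turn a URL into a local path. It is an illustration only, not HTTrack's code: expand_n_template is a made-up name, it handles just %h, %p, %n and %t, and it assumes (following HTTrack's option reference) that %h is the bare host name and %p is the directory path without a trailing slash. Mapping a directory URL to index.html is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of HTTrack-style -N placeholders; NOT HTTrack's code.
# Supports only %h (host), %p (directory path, no trailing /),
# %n (file name without extension) and %t (extension).
expand_n_template() {
  local template="$1" url="$2"
  local rest host path dir file name type

  rest="${url#*://}"                  # drop the scheme, e.g. "http://"
  rest="${rest%%\?*}"                 # drop any query string
  host="${rest%%/*}"                  # everything before the first slash
  path="/${rest#*/}"                  # everything after the host
  [ "$rest" = "$host" ] && path="/"   # URL had no path at all

  dir="${path%/*}"                    # directory part, without trailing slash
  file="${path##*/}"                  # last path segment
  [ -z "$file" ] && file="index.html" # assumption: directory URLs map to index.html

  name="${file%.*}"                   # strip the last extension
  if [ "$file" = "$name" ]; then type=""; else type="${file##*.}"; fi

  # Replace each placeholder literally (the backslash keeps % from being special).
  local out="$template"
  out=${out//\%h/$host}
  out=${out//\%p/$dir}
  out=${out//\%n/$name}
  out=${out//\%t/$type}
  printf '%s\n' "$out"
}

expand_n_template "%h%p/%n.%t" "http://www.shoesizes.com/bob/"           # www.shoesizes.com/bob/index.html
expand_n_template "%h%p/%n.%t" "http://www.shoesizes.com/bob/shoes.html" # www.shoesizes.com/bob/shoes.html
```

The output is a path relative to the -O directory rather than a URL, which is worth keeping in mind when reading the "http://archive..." templates below.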
So I think I want to use this option:
-N "http://archive.example.org%p/%n.%t"
Details: User-defined option N
%[param] param variable in query string
This new option is important: you can include query-string content when forming the destination filename!
www.foo.com/catalog.php3?page=engineering
[...] Then you can use the -N option:
httrack www.foo.com -N "%h%p/%n%[page].%t"
If found, the "page" parameter will be included after the filename, and the URLs above will be saved as:
/home/mywebsites/foo/www.foo.com/catalogengineering.php3
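A similar sketch for %[param]: query_param is a hypothetical bash helper (not part of HTTrack) that pulls one parameter out of a URL's query string, and the last line rebuilds the guide's %n%[page].%t example by hand. It assumes, following the guide's "if found" wording, that a missing parameter contributes nothing, and it does no percent-decoding; HTTrack's own handling may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of query parameter NAME from URL,
# or nothing if it is absent. No percent-decoding is done.
query_param() {
  local url="$1" name="$2" query pair key
  local -a pairs
  case "$url" in
    *\?*) query="${url#*\?}" ;;     # text after the first "?"
    *) return 0 ;;                  # no query string at all
  esac
  query="${query%%#*}"              # ignore any #fragment
  IFS='&' read -r -a pairs <<< "$query"   # split on "&" without globbing
  for pair in "${pairs[@]}"; do
    key="${pair%%=*}"
    if [ "$key" = "$name" ]; then
      # Print the value only if the pair actually contains "=".
      [ "$key" != "$pair" ] && printf '%s' "${pair#*=}"
      return 0
    fi
  done
}

# Rebuild the guide's example by hand: %n, then %[page], then ".%t".
url="http://www.foo.com/catalog.php3?page=engineering"
file="${url%%\?*}"; file="${file##*/}"   # catalog.php3
echo "${file%.*}$(query_param "$url" page).${file##*.}"   # catalogengineering.php3
```

For the article URLs below, query_param "http://example.org/tiki-read_article.php?articleId=2058" articleId would print 2058.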
So the URLs I need to handle look like this:
http://example.org/tiki-read_article.php?articleId=2058
http://coanews.org/tiki-view_articles.php?topic=2&topicName=Media
Putting it all together, I'm trying:
httrack "http://coanews.org/index.php" -O "coanews.org-static-copy" "+*.coanews.org/*" -v -n -N "http://archive.coanews.org%p/%n.%t"
Subject: Re: way to grab what's after a ? in the file name
http://forum.httrack.com/readmsg/4123/4022/index.html