HTTrack from the command line, some preliminary notes

Update

On Debian or Ubuntu, install WebHTTrack with apt-get install webhttrack and use its fine graphical user interface.

Old notes

Warning: None of the commands below were used successfully. We moved off the Mac so that we could use an HTTrack GUI client instead.

From http://www.httrack.com/html/fcguide.html

Example:

httrack http://www.shoesizes.com/bob/ -O /tmp/shoesizes -N "%h%p/%n.%t"

[where] %h is the host name (e.g., www.shoesizes.com or ftp.shoesizes.com), %p stands for the pathname (e.g., /bob/), %n stands for the name of the file, and %t stands for the type (file extension).
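
To make the placeholders concrete, here is a rough shell sketch that splits a URL into the same four parts and reassembles them in the "%h%p/%n.%t" layout. It only illustrates the naming idea and is not HTTrack's own logic. The function name expand_name and the file page.html are made up for this example, and it assumes %p is used without its trailing slash, since the pattern supplies its own "/".

```shell
#!/bin/sh
# Illustrative only: mimic the "%h%p/%n.%t" layout for a single URL.
# expand_name is a hypothetical helper, not part of HTTrack.
expand_name() (
    url="$1"
    rest="${url#*://}"   # drop the scheme: www.shoesizes.com/bob/page.html
    host="${rest%%/*}"   # %h: www.shoesizes.com
    path="/${rest#*/}"   # full path: /bob/page.html
    file="${path##*/}"   # last path segment: page.html
    dir="${path%/*}"     # %p (assumed without trailing slash): /bob
    name="${file%.*}"    # %n: page
    type="${file##*.}"   # %t: html
    echo "${host}${dir}/${name}.${type}"
)

# page.html is a made-up file name, not taken from the HTTrack guide.
expand_name "http://www.shoesizes.com/bob/page.html"
# prints: www.shoesizes.com/bob/page.html
```

The sketch expects a URL that has both a directory and a file name with an extension; it does not handle query strings or bare host names.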

So I think I want to use this option:

-N "http://archive.example.org%p/%n.%t"

Details: User-defined option N
%[param]: the "param" variable in the query string

This new option is important: you can include query-string content when forming the destination filename!

www.foo.com/catalog.php3?page=engineering
[...]

Then you can use the -N option:

httrack www.foo.com -N "%h%p/%n%[page].%t"

If found, the "page" parameter will be included after the filename, and the URLs above will be saved as:

/home/mywebsites/foo/www.foo.com/catalogengineering.php3

So, for URLs like these:

http://example.org/tiki-read_article.php?articleId=2058
http://coanews.org/tiki-view_articles.php?topic=2&topicName=Media
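
Applied to the first of these URLs, the %[param] idea might look like the sketch below. It only prints the command for review rather than running it. The helper name build_cmd, the output directory, and the choice of articleId as the parameter are all assumptions, and like everything else in these notes it is untested against a real site.

```shell
#!/bin/sh
# Sketch: print (do not run) an httrack command whose -N pattern appends
# one query-string parameter to each saved file name.
# build_cmd is a hypothetical helper; the output directory is an assumption.
build_cmd() (
    url="$1"      # start URL
    outdir="$2"   # mirror destination, passed to -O
    param="$3"    # query-string parameter to append via %[param]
    # %% in the printf format emits a literal % for the HTTrack placeholders.
    printf 'httrack "%s" -O "%s" -N "%%h%%p/%%n%%[%s].%%t"\n' "$url" "$outdir" "$param"
)

build_cmd "http://example.org/tiki-read_article.php?articleId=2058" \
          "/tmp/example-static-copy" "articleId"
```

By analogy with the catalogengineering.php3 example in the guide, the saved file would presumably be named something like tiki-read_article2058.php, but that is a guess, not an observed result.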

Putting it all together, I'm trying:

httrack "http://coanews.org/index.php" -O "coanews.org-static-copy" "+*.coanews.org/*" -v -n -N "http://archive.coanews.org%p/%n.%t"

Subject: Re: way to grab what's after a ? in the file name
http://forum.httrack.com/readmsg/4123/4022/index.html
