Using GNU Wget


  1. Using Wget

    Install Wget, a handy tool for fetching large files, or large numbers of files, over HTTP and FTP.

  2. Download

    This walkthrough downloads wget-1.6.tar.gz from ftp.gnu.org and installs it.

  3. Compile and install

    Unpack the archive, then compile and install it.
    It goes quickly, like this:

    % pwd
    /opt/src/wget-1.6
    % ./configure --prefix=/opt/local
    :
    :
    % gmake
    :
    :
    % su
    :
    :
    # pwd
    /opt/src/wget-1.6
    # gmake install
    :
    :
    #
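
    The transcript starts inside the already unpacked source tree. The unpacking step itself might look like the sketch below, assuming a gzip-compressed tarball and the standard gzip and tar tools. The helper name unpack_src is made up for this example.

```shell
# unpack_src ARCHIVE DESTDIR
# Extract a gzip-compressed tarball under DESTDIR (hypothetical helper).
unpack_src() {
    archive=$1
    destdir=$2
    mkdir -p "$destdir" || return 1
    # gzip -dc writes the decompressed tar stream to stdout;
    # tar, running inside DESTDIR, reads that stream from stdin.
    gzip -dc "$archive" | (cd "$destdir" && tar xf -)
}

# To end up with /opt/src/wget-1.6 as in the transcript (assuming the
# tarball unpacks into a wget-1.6 directory):
#   unpack_src wget-1.6.tar.gz /opt/src
```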

    The executable is installed in <prefix_dir>/bin (/opt/local/bin in the example above).
    While you are at it, add that directory to your path.
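
    One way to do that in a Bourne-style shell is sketched below, assuming the /opt/local prefix from the configure example. The variable name PREFIX is only a convenience for this sketch.

```shell
# Prefix used in the configure example above.
PREFIX=/opt/local

# Prepend $PREFIX/bin to PATH unless it is already listed.
case ":$PATH:" in
    *":$PREFIX/bin:"*) ;;                  # already on PATH: nothing to do
    *) PATH="$PREFIX/bin:$PATH" ;;
esac
export PATH
```

    In csh or tcsh the equivalent would be: set path = (/opt/local/bin $path)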

  4. Usage

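  5. The wgetrc startup file

    You can use a startup file to change Wget's default behavior or to avoid repeating options on the command line. <prefix>/etc/wgetrc holds settings for all users, and $HOME/.wgetrc holds settings for a single user. A sample is installed as <prefix>/etc/wgetrc, which you can use as a starting point.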
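
    A minimal sketch of a per-user file, using only commands that appear in the wgetrc command list in section 7. The values are arbitrary examples, and rcfile is a stand-in path so that the sketch does not overwrite a real $HOME/.wgetrc.

```shell
# Stand-in location for this sketch; the real per-user file is $HOME/.wgetrc.
rcfile=${rcfile:-./wgetrc.sample}

cat > "$rcfile" <<'EOF'
# Retry each URL up to 5 times.
tries = 5
# Pause 2 seconds between retrievals.
wait = 2
# Skip files that are not newer than the local copy.
timestamping = on
# Stop retrieving once more than 5 megabytes have been downloaded.
quota = 5m
EOF
```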

  6. Command-line options

    These are the options listed by wget --help.

    table1 - Startup
    option                          notes
    -V, --version                   display the Wget version.
    -h, --help                      display help.
    -b, --background                run in the background.
    -e, --execute=COMMAND           execute the given wgetrc command.

    table2 - Logging and input file
    option                          notes
    -o, --output-file=FILE          write the log to FILE.
    -a, --append-output=FILE        append the log to FILE.
    -d, --debug                     print debug output.
    -q, --quiet                     print no log output.
    -v, --verbose                   print verbose log output (the default).
    -nv, --non-verbose              print terse log output.
    -i, --input-file=FILE           read the URLs to fetch from FILE.
    -F, --force-html                treat the input file as HTML (affects some other options).
    -B, --base=URL                  prepend URL to relative links in the -F -i file.
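
    A minimal sketch that combines the logging and input-file options above. The helper name fetch_list and the file names in the usage comment are hypothetical.

```shell
# fetch_list URLFILE LOGFILE
# Fetch every URL listed in URLFILE (-i), writing a terse log (-nv)
# to LOGFILE (-o) instead of the terminal.
fetch_list() {
    wget -nv -i "$1" -o "$2"
}

# Example:
#   fetch_list urls.txt wget.log
```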

    table3 - Download
    option                          notes
    --bind-address=ADDRESS          bind to ADDRESS (hostname or IP) on local host.
    -t, --tries=NUMBER              set number of retries to NUMBER (0 means unlimited).
    -O, --output-document=FILE      write documents to FILE.
    -nc, --no-clobber               don't clobber existing files or use .# suffixes.
    -c, --continue                  resume getting a partially downloaded file.
    --dot-style=STYLE               set retrieval display style.
    -N, --timestamping              don't retrieve files that are no newer than the local copy.
    -S, --server-response           print server response.
    --spider                        don't download anything.
    -T, --timeout=SECONDS           set the read timeout to SECONDS.
    -w, --wait=SECONDS              wait SECONDS between retrievals.
    --waitretry=SECONDS             wait 1...SECONDS between retries of a retrieval.
    -Y, --proxy=on/off              turn proxy on or off.
    -Q, --quota=NUMBER              set retrieval quota to NUMBER.
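
    For large files over an unreliable link, the download options above can be combined roughly as follows. This is a sketch only: the helper name fetch_resumable and the retry counts are arbitrary choices.

```shell
# fetch_resumable URL
# Resume a partial download (-c), try up to 10 times (-t 10), and wait
# between 1 and 30 seconds between retries (--waitretry=30).
fetch_resumable() {
    wget -c -t 10 --waitretry=30 "$1"
}

# Example (hypothetical URL):
#   fetch_resumable http://www.example.org/pub/big-file.tar.gz
```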

    table4 - Local directory options
    option                          notes
    -nd, --no-directories           don't create directories locally (all files are saved in the current directory).
    -x, --force-directories         force creation of directories (for example, to fetch a single file while keeping its directory structure).
    -nH, --no-host-directories      don't create a host-name directory (directories below the host are still created).
    -P, --directory-prefix=PREFIX   save files under the directory PREFIX.
    --cut-dirs=NUMBER               ignore NUMBER remote directory components.
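
    The interaction between -nH and --cut-dirs is easier to see with concrete paths. The function below is a toy model of the resulting local directory, written for illustration. It is not Wget's own code, and the host and path in the usage comment are hypothetical.

```shell
# local_dir HOST REMOTE_DIR NO_HOST CUT
# Print the local directory that a recursive fetch of REMOTE_DIR on HOST
# would populate. NO_HOST is "yes" when -nH is given; CUT is the value
# passed to --cut-dirs (0 when the option is absent).
local_dir() {
    host=$1; dir=$2; no_host=$3; cut=$4
    dir=${dir#/}; dir=${dir%/}          # trim one leading and trailing slash
    # Drop CUT leading directory components.
    while [ "$cut" -gt 0 ] && [ -n "$dir" ]; do
        case $dir in
            */*) dir=${dir#*/} ;;
            *)   dir= ;;
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" = yes ]; then
        out=$dir
    else
        out=$host${dir:+/$dir}
    fi
    printf '%s\n' "${out:-.}"           # nothing left means the current directory
}

# local_dir ftp.example.org /pub/project no  0   prints ftp.example.org/pub/project
# local_dir ftp.example.org /pub/project yes 0   prints pub/project
# local_dir ftp.example.org /pub/project yes 1   prints project
# local_dir ftp.example.org /pub/project yes 2   prints .
```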

    table5 - HTTP options
    option                          notes
    --http-user=USER                set http user to USER.
    --http-passwd=PASS              set http password to PASS.
    -C, --cache=on/off              (dis)allow server-cached data (normally allowed).
    -E, --html-extension            save all text/html documents with .html extension.
    --ignore-length                 ignore `Content-Length' header field.
    --header=STRING                 insert STRING among the headers.
    --proxy-user=USER               set USER as proxy username.
    --proxy-passwd=PASS             set PASS as proxy password.
    --referer=URL                   include `Referer: URL' header in HTTP request.
    -s, --save-headers              save the HTTP headers to file.
    -U, --user-agent=AGENT          identify as AGENT instead of Wget/VERSION.

    table6 - FTP options
    option                          notes
    --retr-symlinks                 when recursing, retrieve linked-to files (not dirs).
    -g, --glob=on/off               turn file name globbing on or off.
    --passive-ftp                   use the "passive" transfer mode.

    table7 - Recursive retrieval options
    option                          notes
    -r, --recursive                 retrieve recursively (usually combined with -l).
    -l, --level=NUMBER              set the recursion depth (inf or 0 means unlimited depth).
    --delete-after                  delete files locally after downloading them.
    -k, --convert-links             convert non-relative links to relative.
    -K, --backup-converted          before converting file X, back up as X.orig.
    -m, --mirror                    equivalent to -r -N -l inf -nr (for building mirror sites).
    -nr, --dont-remove-listing      don't remove `.listing' files.
    -p, --page-requisites           get all images, etc. needed to display HTML page.
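
    As a sketch of the mirroring shorthand, the helper below keeps a local copy of a site under a chosen directory. The name mirror_site is hypothetical, and -P is the directory-prefix option from the local directory table.

```shell
# mirror_site URL DESTDIR
# Mirror URL into DESTDIR. -m stands for -r -N -l inf -nr, so a repeated
# run only re-fetches files that are newer than the local copies.
mirror_site() {
    wget -m -P "$2" "$1"
}

# Example (hypothetical URL and directory):
#   mirror_site http://www.example.org/ /opt/mirror
```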

    table8 - Recursive accept/reject options
    option                          notes
    -A, --accept=LIST               comma-separated list of accepted suffixes.
    -R, --reject=LIST               comma-separated list of rejected suffixes.
    -D, --domains=LIST              comma-separated list of accepted domains.
    --exclude-domains=LIST          comma-separated list of rejected domains.
    --follow-ftp                    follow FTP links from HTML documents.
    --follow-tags=LIST              comma-separated list of followed HTML tags.
    -G, --ignore-tags=LIST          comma-separated list of ignored HTML tags.
    -H, --span-hosts                retrieve from other hosts as well.
    -L, --relative                  follow relative links only.
    -I, --include-directories=LIST  list of allowed directories.
    -X, --exclude-directories=LIST  list of excluded directories.
    -nh, --no-host-lookup           don't DNS-lookup hosts.
    -np, --no-parent                don't ascend to the parent directory.
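
    A typical combination of these limits is to collect one kind of file from below a starting directory. The sketch below assumes a depth of two levels and the suffixes pdf and ps. The helper name fetch_docs is made up for the example.

```shell
# fetch_docs URL
# Recurse two levels below URL (-r -l 2) without ascending to the parent
# directory (-np), keeping only files whose names end in pdf or ps (-A).
fetch_docs() {
    wget -r -l 2 -np -A pdf,ps "$1"
}

# Example (hypothetical URL):
#   fetch_docs http://www.example.org/docs/
```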

  7. wgetrc commands

    These are the wgetrc commands described in the manual.

    table9 - Wgetrc Commands
    command                             notes
    accept/reject = string              Same as `-A'/`-R' (see section Types of Files).
    add_hostdir = on/off                Enable/disable host-prefixed file names. `-nH' disables it.
    continue = on/off                   Enable/disable continuation of the retrieval--the same as `-c' (which enables it).
    background = on/off                 Enable/disable going to background--the same as `-b' (which enables it).
    backup_converted = on/off           Enable/disable saving pre-converted files with the suffix `.orig'--the same as `-K' (which enables it).
    base = string                       Consider relative URLs in URL input files forced to be interpreted as HTML as being relative to string--the same as `-B'.
    bind_address = address              Bind to address, like the `--bind-address' option.
    cache = on/off                      When set to off, disallow server-caching. See the `-C' option.
    convert_links = on/off              Convert non-relative links locally. The same as `-k'.
    cut_dirs = n                        Ignore n remote directory components.
    debug = on/off                      Debug mode, same as `-d'.
    delete_after = on/off               Delete after download--the same as `--delete-after'.
    dir_prefix = string                 Top of directory tree--the same as `-P'.
    dirstruct = on/off                  Turning dirstruct on or off--the same as `-x' or `-nd', respectively.
    domains = string                    Same as `-D' (see section Domain Acceptance).
    dot_bytes = n                       Specify the number of bytes "contained" in a dot, as seen throughout the retrieval (1024 by default). You can postfix the value with `k' or `m', representing kilobytes and megabytes, respectively. With dot settings you can tailor the dot retrieval to suit your needs, or you can use the predefined styles (see section Download Options).
    dots_in_line = n                    Specify the number of dots that will be printed in each line throughout the retrieval (50 by default).
    dot_spacing = n                     Specify the number of dots in a single cluster (10 by default).
    dot_style = string                  Specify the dot retrieval style, as with `--dot-style'.
    exclude_directories = string        Specify a comma-separated list of directories you wish to exclude from download--the same as `-X' (see section Directory-Based Limits).
    exclude_domains = string            Same as `--exclude-domains' (see section Domain Acceptance).
    follow_ftp = on/off                 Follow FTP links from HTML documents--the same as `--follow-ftp'.
    follow_tags = string                Only follow certain HTML tags when doing a recursive retrieval, just like `--follow-tags'.
    force_html = on/off                 If set to on, force the input filename to be regarded as an HTML document--the same as `-F'.
    ftp_proxy = string                  Use string as FTP proxy, instead of the one specified in environment.
    glob = on/off                       Turn globbing on/off--the same as `-g'.
    header = string                     Define an additional header, like `--header'.
    html_extension = on/off             Add a `.html' extension to `text/html' files without it, like `-E'.
    http_passwd = string                Set HTTP password.
    http_proxy = string                 Use string as HTTP proxy, instead of the one specified in environment.
    http_user = string                  Set HTTP user to string.
    ignore_length = on/off              When set to on, ignore Content-Length header; the same as `--ignore-length'.
    ignore_tags = string                Ignore certain HTML tags when doing a recursive retrieval, just like `-G' / `--ignore-tags'.
    include_directories = string        Specify a comma-separated list of directories you wish to follow when downloading--the same as `-I'.
    input = string                      Read the URLs from string, like `-i'.
    kill_longer = on/off                Consider data longer than specified in content-length header as invalid (and retry getting it). The default behaviour is to save as much data as there is, provided there is more than or equal to the value in Content-Length.
    logfile = string                    Set logfile--the same as `-o'.
    login = string                      Your user name on the remote machine, for FTP. Defaults to `anonymous'.
    mirror = on/off                     Turn mirroring on/off. The same as `-m'.
    netrc = on/off                      Turn reading netrc on or off.
    noclobber = on/off                  Same as `-nc'.
    no_parent = on/off                  Disallow retrieving outside the directory hierarchy, like `--no-parent' (see section Directory-Based Limits).
    no_proxy = string                   Use string as the comma-separated list of domains to avoid in proxy loading, instead of the one specified in environment.
    output_document = string            Set the output filename--the same as `-O'.
    page_requisites = on/off            Download all ancillary documents necessary for a single HTML page to display properly--the same as `-p'.
    passive_ftp = on/off/always/never   Set passive FTP--the same as `--passive-ftp'. Some scripts and `.pm' (Perl module) files download files using `wget --passive-ftp'. If your firewall does not allow this, you can set `passive_ftp = never' to override the command line.
    passwd = string                     Set your FTP password to password. Without this setting, the password defaults to `username@hostname.domainname'.
    proxy_user = string                 Set proxy authentication user name to string, like `--proxy-user'.
    proxy_passwd = string               Set proxy authentication password to string, like `--proxy-passwd'.
    referer = string                    Set HTTP `Referer:' header just like `--referer'. (Note it was the folks who wrote the HTTP spec who got the spelling of "referrer" wrong.)
    quiet = on/off                      Quiet mode--the same as `-q'.
    quota = quota                       Specify the download quota, which is useful to put in the global `wgetrc'. When download quota is specified, Wget will stop retrieving after the download sum has become greater than quota. The quota can be specified in bytes (default), kbytes (`k' appended) or mbytes (`m' appended). Thus `quota = 5m' will set the quota to 5 mbytes. Note that the user's startup file overrides system settings.
    reclevel = n                        Recursion level--the same as `-l'.
    recursive = on/off                  Recursive on/off--the same as `-r'.
    relative_only = on/off              Follow only relative links--the same as `-L' (see section Relative Links).
    remove_listing = on/off             If set to on, remove FTP listings downloaded by Wget. Setting it to off is the same as `-nr'.
    retr_symlinks = on/off              When set to on, retrieve symbolic links as if they were plain files; the same as `--retr-symlinks'.
    robots = on/off                     Use (or not) `/robots.txt' file (see section Robots). Be sure to know what you are doing before changing the default (which is `on').
    server_response = on/off            Choose whether or not to print the HTTP and FTP server responses--the same as `-S'.
    simple_host_check = on/off          Same as `-nh' (see section Host Checking).
    span_hosts = on/off                 Same as `-H'.
    timeout = n                         Set timeout value--the same as `-T'.
    timestamping = on/off               Turn timestamping on/off. The same as `-N' (see section Time-Stamping).
    tries = n                           Set number of retries per URL--the same as `-t'.
    use_proxy = on/off                  Turn proxy support on/off. The same as `-Y'.
    verbose = on/off                    Turn verbose on/off--the same as `-v'/`-nv'.
    wait = n                            Wait n seconds between retrievals--the same as `-w'.
    waitretry = n                       Wait up to n seconds between retries of failed retrievals only--the same as `--waitretry'. Note that this is turned on by default in the global `wgetrc'.

Last modified: 04/09/01 00:41:13 JST