Showing posts with label commandline tools.

plowprobe


plowdown dropped the -c option; plowprobe was added in its place.

plowprobe can output far more formats than plowdown -c could.


The more important format specifiers for the printf option:
--------------------------------------------------------
* %c probe function return status (0 for success, 13 for dead link, see list below)
* %f filename (can be empty string)
* %s filesize in bytes (can be an empty string if not available). Note: it is often approximate.


Return Code
--------------------------------------------------------
0 Success.
1 Fatal error. Upstream site updated or unexpected result.
2 No available module (provided URL is not supported).
3 Network error. Mostly curl related.
4 Authentication failed (bad login/password).
5 Timeout reached (refer to -t/--timeout command-line option).
6 Maximum tries reached (refer to -r/--max-retries command-line option).
7 Captcha generic error.
8 System generic error.
10 Link alive but temporarily unavailable.
11 Link alive but requires a password.
12 Link alive but requires some authentication (private or premium link).
13 Link is dead.
14 Can't download link because file is too big (need permissions).
15 Unknown command line parameter or incompatible options.


The return codes are the same as plowdown's. The most important ones are 0 and 13: 0 means OK, 13 means a dead link.
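Putting the format specifiers and return codes together, a small wrapper can probe a link and report its status. This is just a sketch: plowprobe must be installed, and probe_link's URL handling is minimal.

```shell
# Map plowprobe's exit code to a short human-readable status.
describe_status() {
  case "$1" in
    0)  echo "OK" ;;
    13) echo "dead link" ;;
    *)  echo "error (code $1)" ;;
  esac
}

# Probe one URL: print filename and filesize via --printf, then the status.
# Requires plowprobe; e.g.: probe_link 'http://example.com/file/123'
probe_link() {
  plowprobe --printf '%f\t%s\n' "$1"
  describe_status "$?"
}
```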

A simple g.e-hentai downloader

#!/usr/bin/env python
#-*- coding:utf-8 -*-

import os
import urllib2
import sys
from BeautifulSoup import BeautifulSoup

try:
    url = sys.argv[1]
except IndexError:
    print "geh - A simple g.e-hentai downloader\nUsage: geh.py [url]"
    sys.exit(1)


def find_next_page_link(tag):
    # The '>' anchor is the "next page" link in the gallery index.
    try:
        return tag.name == 'a' and tag.text == '>'
    except TypeError:
        return False
               

def find_image(html):
    # The full-size image on a viewer page is the <img> with id="img".
    soup = BeautifulSoup(html)
    return soup.find('img', {'id': 'img'})['src']
    

def parse_index(url):
    # Parse one gallery index page: the gallery title, the "next page"
    # link (None on the last page), and the per-image viewer page URLs.
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)

    title = soup.h1.text
    next_page_url = soup.find(find_next_page_link)
    image_list = [node['href'] for node in soup.find('div', {'id': 'gdt'}).findAll('a')]

    return title, next_page_url, image_list

c = 1
while True:    
    title, next_page_url, image_list = parse_index(url)

    dst_dir = title
    if not os.path.exists(dst_dir):
        os.mkdir(dst_dir)
    
    for page in image_list:        
        html = urllib2.urlopen(page).read()
        image_url = find_image(html)

        fn = '{0}.{1}'.format(str(c).zfill(3), image_url.split('.')[-1])
        fn = os.path.join(dst_dir, fn)
        
        print '{0}: {1} ... '.format(dst_dir, str(c).zfill(3)),
        with open(fn, 'wb') as f:
            image_data = urllib2.urlopen(image_url).read()
            f.write(image_data)
        print 'done'            
        c += 1

    if not next_page_url:
        break
    
    url = next_page_url['href']

curl: handling a 302 redirect after a POST


When curl POSTs data to a server and the server responds with a 302, using only the -L option will not get you the data you want: curl follows the redirect with a GET.
Add the --post302 option so the redirected request is re-sent as a POST.

curl -L --post302 -d '{POST data}' {url}
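A concrete sketch of the command above; the URL and form fields are placeholders. -d supplies the POST body; --post302 keeps both the POST method and the body across the redirect.

```shell
# Shown rather than executed, since it needs a live server that returns 302.
cmd="curl -L --post302 -d 'user=alice&pass=secret' 'http://example.com/login'"
echo "$cmd"
```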