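I recently wrote a program that simulates logging in to Baidu to perform operations such as downloading Wenku documents, using the network disk, and editing personal information. The analysis was painful, but looking back it was quite interesting. The code is a bit messy and I have not tidied it up; here I share how the simulated Baidu login works, the related code, and a few caveats. I enjoy packet capture of all kinds and previously simulated Sina Weibo login, posting, reposting, and commenting; interested programmers are welcome to get in touch.
I use Java for the simulation, with the Apache HttpClient core package. For packet capture I use HttpWatch Pro and Firebug.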
HttpClient official API: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/
HttpClient libraries and demo downloads: http://hc.apache.org/downloads.cgi
Cleaned-up login code: http://blog.csdn.net/programmer_sir/article/details/43193611
Baidu login source code and libraries download: http://download.csdn.net/detail/programmer_sir/8400495
I. How the Baidu login works.
1. Visit http://www.baidu.com to generate cookies; one of them is BAIDUID.
2. Visit https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login&logintype=dialogLogin to obtain the token. The response looks like this:
{"errInfo":{ "no": "0" }, "data": { "rememberedUserName" : "sir_belen", "codeString" : "", "token" : "a5ff667c0360dbdcb43cae85018e3e97", "cookie" : "1", "usernametype":"1", "spLogin" : "rate", "disable":"", "loginrecord":{ 'email':[ ], 'phone':[ ] } }}
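The token then has to be pulled out of that response body. The sample above mixes double-quoted and single-quoted strings, which a strict JSON parser may reject, so the following is a minimal sketch that uses a regular expression instead. The class name TokenExtractor and the method extractToken are hypothetical and not part of the original program; the sketch also assumes the token is a hexadecimal string, as in the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenExtractor {
    // Matches "token" : "<hex digits>" with optional whitespace around the colon.
    // Assumption: the token is hexadecimal, as in the sample response.
    private static final Pattern TOKEN =
            Pattern.compile("\"token\"\\s*:\\s*\"([0-9a-fA-F]+)\"");

    /** Returns the token value, or null when no hexadecimal token field is found. */
    static String extractToken(String responseBody) {
        Matcher m = TOKEN.matcher(responseBody);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened copy of the sample response shown above.
        String body = "{\"errInfo\":{ \"no\": \"0\" }, \"data\": { \"codeString\" : \"\", "
                + "\"token\" : \"a5ff667c0360dbdcb43cae85018e3e97\", "
                + "\"loginrecord\":{ 'email':[ ], 'phone':[ ] } }}";
        System.out.println(extractToken(body));
    }
}
```

A null result is a hint that the preliminary visit to http://www.baidu.com was skipped or that the response format has changed.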
3. Send a POST request to https://passport.baidu.com/v2/api/?login, passing the required login parameters.
// POST parameters for the login request.
private List<NameValuePair> produceFormEntity() throws UnsupportedEncodingException {
    List<NameValuePair> list = new ArrayList<NameValuePair>();
    list.add(new BasicNameValuePair("tt", "" + System.currentTimeMillis()));
    list.add(new BasicNameValuePair("tpl", "mn"));
    list.add(new BasicNameValuePair("token", token));
    list.add(new BasicNameValuePair("isPhone", ""));
    list.add(new BasicNameValuePair("username", username));
    list.add(new BasicNameValuePair("password", password));
    list.add(new BasicNameValuePair("verifycode", verifycode));
    list.add(new BasicNameValuePair("codestring", codestring));
    return list;
}
4. Visit http://www.baidu.com again and search the page for the text 登录 ("log in"). If it is absent, the login succeeded.
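II. Notes:
1. To obtain the token you must first visit Baidu once; the token can only be obtained after BAIDUID has been received.
2. The login request returns the following: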
<!DOCTYPE html>
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=UTF-8">
</head>
<body>
<script>
var href = decodeURIComponent("http:\/\/www.baidu.com\/cache\/user\/html\/v3Jump.html")+"?"
var accounts = '&accounts='
href += "err_no=0&callback=parent.bd__pcbs__3tp8m3&codeString=&userName=sir_belen&phoneNumber=&mail=&hao123Param=UkxUbFU0V2tWSlpscC1OR3RLTkhsU2RVTlVWVFpsYVhsbFRVbHhka3BWUVhVNFkzbEpOSEJyUlU1alEzQlZRVkZCUVVGQkpDUUFBQUFBQUFBQUFBRUFBQUFDRFRJZGMybHlYMkpsYkdWdUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBM2pBbFFONHdKVVNE&u=http://www.baidu.com/s%3Fbs%3D%25E7%2599%25BE%25E5%25BA%25A6%25E7%25BD%2591%25E7%259B%2598%25E7%2599%25BB%25E9%2599%2586%26f%3D3%26rsv_bp%3D1%26rsv_spt%3D3%26oq%3D%25E7%2599%25BE%25E5%25BA%25A6%25E7%25BD%2591%25E7%259B%2598%26rsp%3D2%26inputT%3D1624%26tn%3Dbaidu%26ie%3Dutf-8%26wd%3D%25E7%2599%25BE%25E5%25BA%25A6%25E7%25BD%2591%25E7%259B%2598%25E6%2590%259C%25E7%25B4%25A2%25E5%25BC%2595%25E6%2593%258E%26rsv_sug3%3D6%26rsv_sug4%3D76%26rsv_sug1%3D6%26rsv_sug2%3D1%26rsv_sug%3D2&tpl=&secstate=&gotourl=&authtoken=&loginproxy=&resetpwd=&vcodetype=&lstr=&ltoken=&bckv=1&bcsync=W5zkqgOHIqvzP2dSHFvxQSKUiIr9D4rDjYJVtVgWV8R1%2FhxlTPdebvE9CnqW9QwX2Z06SDpX2ZS7bwbKRkzgXZZPS%2Fiy55wg39sQJJGA5bqxJG%2BOCgXLwcGk74YwxZNtxZev8gVFi3QzyRHn7gEoKx9S3lfyFIvxW6%2BOmu80ZjPghWWWYDzVFAUVJ3XHO3lqdQ40vDn5KPoMmNecbOGxSWnBT2vGt9lhcRrK%2B%2BVahmf8GVuUoYES145FnNR3ET1z3B8CsIXLI5QyQgSEHpQvv%2Fy6PwI5RwiwNjokhD3yie1%2BVOIuh2hOsr7FsJeoUKCAxmdAUgyI39H0GlZnn0D5Hw%3D%3D&bcchecksum=1367615822&bctime=1409475342"+accounts;
if(window.location){
window.location.replace(href);
}else{
document.location.replace(href);
}
</script>
Note the err_no and codeString parameters in the href += "..." statement.
The meaning of each err_no value is shown in the following code:
int code = Integer.parseInt((String) json.get("err_no"));
switch (code) {
    case 0: // Login succeeded.
        m.setStatusCode(200);
        break;
    case 4: // Wrong password.
        m.setStatusCode(6);
        m.setO(json);
        break;
    case 5: // Baidu flagged the account as abnormal; the phone number must be verified at www.baidu.com.
        m.setStatusCode(5);
        break;
    case 6: // Wrong captcha.
        m.setStatusCode(4);
        m.setO(json);
        break;
    case 257: // A captcha is required.
        m.setStatusCode(3);
        m.setO(json);
        break;
    case 120019: // Baidu flagged the account as abnormal; the phone number must be verified at www.baidu.com.
        m.setStatusCode(5);
        break;
    default: // Any other cause; contact the staff.
        m.setStatusCode(500);
        break;
}
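codeString identifies the captcha. If it is not empty, a captcha is required; in that case you can download the captcha image to your local server,
let the user read the image and type the captcha, and then send the typed captcha (verifycode) together with codeString in the login request.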
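Before err_no and codeString can be inspected, they have to be extracted from the href += "..." string in the login response. The following is a minimal sketch of one way to do that with the JDK only; the class name LoginResultParser and its parse method are hypothetical, and the values are returned exactly as they appear in the response, without URL decoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LoginResultParser {
    // Captures the quoted query string in: href += "err_no=...&codeString=...";
    private static final Pattern HREF_QUERY =
            Pattern.compile("href\\s*\\+=\\s*\"([^\"]*)\"");

    /** Splits the query string appended to href into name/value pairs (values are not decoded). */
    static Map<String, String> parse(String html) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = HREF_QUERY.matcher(html);
        if (!m.find()) {
            return params; // Unexpected response: no href += "..." statement.
        }
        for (String pair : m.group(1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Shortened, made-up response in the same shape as the one shown above.
        String html = "<script>var href = decodeURIComponent(\"x\")+\"?\"\n"
                + "href += \"err_no=257&callback=cb&codeString=abc123&userName=sir_belen\"+accounts;"
                + "</script>";
        Map<String, String> params = parse(html);
        System.out.println("err_no=" + params.get("err_no"));
        System.out.println("codeString=" + params.get("codeString"));
    }
}
```

A map like this could play the role of the json object that the switch statement above reads err_no from.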
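3. HttpClient can manage cookies automatically. During the simulated login, every request must share the same client state, and because GET and POST requests are issued from many places, it is worth wrapping HttpClient. I wrote the wrapper as a singleton so that all HTTP requests share one cookie store. After a successful login, managing or downloading Baidu content, such as fetching personal information or downloading Baidu Wenku documents, then works without problems.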
package com.baidu.service;

import java.util.List;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;

import com.baidu.util.Log;

// Singleton wrapper: every request goes through one HttpClient and one CookieStore.
@SuppressWarnings("deprecation")
public class BaiduConnectService {

    private CookieStore cookieStore = new BasicCookieStore();
    // Created once and reused for every request.
    private final HttpClient httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager());

    private BaiduConnectService() {}

    // Holder class: the instance is created lazily on first access.
    private static class BaiduConnectServiceContainer {
        private static final BaiduConnectService bc = new BaiduConnectService();
    }

    public static BaiduConnectService getInstance() {
        Log.logInfo("init BaiduConnectService.");
        return BaiduConnectServiceContainer.bc;
    }

    // GET request.
    public HttpResponse execute(String url) throws Exception {
        return this.execute(url, null);
    }

    // POST request when params is not null, GET request otherwise.
    public HttpResponse execute(String url, List<NameValuePair> params) throws Exception {
        HttpUriRequest request;
        if (params != null) {
            HttpPost httpPost = new HttpPost(url);
            HttpEntity postBodyEnt = new UrlEncodedFormEntity(params);
            httpPost.setEntity(postBodyEnt);
            request = httpPost;
        } else {
            request = new HttpGet(url);
        }
        // Attach the shared cookie store so cookies carry over between requests.
        HttpContext localContext = new BasicHttpContext();
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
        HttpResponse response = httpClient.execute(request, localContext);
        Log.logInfo("Request URL:" + url);
        Log.logRed("[Status:" + response.getStatusLine().getStatusCode() + "]");
        return response;
    }

    public CookieStore getCookieStore() {
        return cookieStore;
    }

    public void setCookieStore(CookieStore cookieStore) {
        this.cookieStore = cookieStore;
    }
}
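The reason for sharing one cookie store is that a cookie set by www.baidu.com (such as BAIDUID) has to be sent back on later requests to passport.baidu.com. The sketch below illustrates that idea offline with the JDK's own java.net.CookieManager rather than the HttpClient classes used in the wrapper above; the class name SharedCookieDemo, the cookie value ABC123, and the domain and path attributes of the Set-Cookie header are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedCookieDemo {
    /**
     * Stores a Set-Cookie header as if it had been received from `from`,
     * then returns the Cookie header value the manager would send to `to`.
     */
    static String cookiesFor(CookieManager manager, URI from, String setCookie, URI to) {
        try {
            Map<String, List<String>> responseHeaders = new HashMap<>();
            responseHeaders.put("Set-Cookie", Collections.singletonList(setCookie));
            manager.put(from, responseHeaders);
            List<String> cookies =
                    manager.get(to, new HashMap<String, List<String>>()).get("Cookie");
            return cookies == null ? "" : String.join("; ", cookies);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager manager = new CookieManager(); // one shared cookie store
        String sent = cookiesFor(manager,
                URI.create("http://www.baidu.com/"),
                "BAIDUID=ABC123; domain=.baidu.com; path=/", // made-up cookie
                URI.create("https://passport.baidu.com/v2/api/"));
        System.out.println("Cookie header for passport.baidu.com: " + sent);
    }
}
```

If each request used its own cookie store instead, the second request would go out without BAIDUID, which is the situation note 1 warns about.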
4. My code does not add any request headers such as Host or User-Agent; the Baidu login does not require them.
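5. Below are some of the feature pages of a small system I built, for reference: simulated Baidu login, captcha handling, searching and downloading from Baidu Wenku, searching and downloading from the network disk, and so on.
1) Log in to Baidu.
2) After the password has been entered incorrectly three times, Baidu shows a captcha; download the captcha image.
3) After the captcha has been entered, call Baidu's verification interface to check whether it is correct.
Request URL for refreshing the captcha: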
https://passport.baidu.com/v2/?reggetcodestr&token="+ getToken(getCookies()) + "&tpl=mn&apiver=v3&tt="+ System.currentTimeMillis() + "&fr=login
Request URL for checking whether the captcha is correct:
https://passport.baidu.com/v2/?checkvcode&token="+ getToken(getCookies()) + "&tpl=mn&apiver=v3&tt="+System.currentTimeMillis()+"&verifycode="+verifycode+"&codestring="+codeString
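4) After a successful login, search for JAVA and pick a file to download.
5) While downloading from Baidu Wenku, Baidu sometimes detects abnormal activity and asks for a captcha.
6) Enter the captcha and click "提交" (Submit).
7) The Baidu Wenku download succeeds!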
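The two captcha request URLs listed under item 3) are built by string concatenation. As a small sketch, they could be assembled by helper methods like the ones below. The class name CaptchaUrls and its method names are hypothetical, and URL-encoding the parameter values is an addition that the original concatenation does not perform.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CaptchaUrls {
    private static final String BASE = "https://passport.baidu.com/v2/";

    /** URL for requesting a new captcha (the reggetcodestr request). */
    static String refreshUrl(String token, long timestamp) {
        return BASE + "?reggetcodestr&token=" + enc(token)
                + "&tpl=mn&apiver=v3&tt=" + timestamp + "&fr=login";
    }

    /** URL for checking the captcha the user typed (the checkvcode request). */
    static String checkUrl(String token, String verifycode, String codeString, long timestamp) {
        return BASE + "?checkvcode&token=" + enc(token)
                + "&tpl=mn&apiver=v3&tt=" + timestamp
                + "&verifycode=" + enc(verifycode)
                + "&codestring=" + enc(codeString);
    }

    // Assumption: values are URL-encoded as UTF-8; the original passes them unencoded.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        String token = "a5ff667c0360dbdcb43cae85018e3e97"; // token from the sample response
        System.out.println(refreshUrl(token, now));
        System.out.println(checkUrl(token, "abcd", "someCodeString", now));
    }
}
```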