Logging in and scraping web pages with Java: accessing a given URL and fetching the page source

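1. How do I scrape data from a website that requires login, using Java?

Crawlers generally do not fetch pages behind a login. If you only need to scrape one particular site temporarily, you can simulate the login, keep the cookies the server sets once you are logged in, and then request the pages you need with those cookies.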
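The flow described above (simulate the login, keep the cookies, request the page) can be sketched roughly as follows. This is only a minimal sketch: it uses the JDK's own `java.net.http.HttpClient` (Java 11+) with a `CookieManager` rather than any particular crawler library, and the class name, the URLs passed on the command line and the form field names `username` and `password` are assumptions that must be replaced with whatever the target site really uses.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class LoginThenFetch {

    // Encode form fields as an application/x-www-form-urlencoded body.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Log in, then fetch a protected page with the same client, so the
    // cookies stored by the CookieManager are sent back automatically.
    static String fetchAfterLogin(String loginUrl, Map<String, String> fields, String pageUrl)
            throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        HttpRequest login = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        client.send(login, HttpResponse.BodyHandlers.discarding());

        HttpRequest page = HttpRequest.newBuilder(URI.create(pageUrl)).GET().build();
        return client.send(page, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: LoginThenFetch <loginUrl> <pageUrl> <user> <password>");
            return;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", args[2]); // hypothetical field name
        fields.put("password", args[3]); // hypothetical field name
        System.out.println(fetchAfterLogin(args[0], fields, args[1]));
    }
}
```

The important design point is that both requests go through the same client instance; a fresh client would start with an empty cookie store and the second request would look anonymous to the server.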

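2. "How can a Java web crawler fetch pages behind a login?" Hello, I urgently need this code too

I have never written a web crawler, but I did knock together a program that logs in to Mop (貓撲) automatically and checks in, which you can use as a reference. The required jars are commons-logging.jar, commons-net-1.4.1.jar, commons-codec-1.3.jar, log4j.jar and httpclient-4.3.1.jar. (HttpClient 4.x also needs its httpcore jar, which provides classes such as org.apache.http.HttpEntity.) The source code is below; I hope it helps.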
package com.ly.mainprocess;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.Consts;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class Test1 {
    public static void main(String[] args) {
        Test1 test1 = new Test1();
        System.out.println(test1.process("******", "******"));
    }

    // DefaultHttpClient is deprecated since HttpClient 4.3 but still works.
    @SuppressWarnings("deprecation")
    public boolean process(String username, String password) {
        boolean ret = false;
        DefaultHttpClient httpclient = new DefaultHttpClient();
        try {
            HttpResponse response;
            HttpEntity entity;

            // Build the login POST request
            HttpPost httppost = new HttpPost("http://login.hi.mop.com/Login.do"); // user login
            List<NameValuePair> nvps = new ArrayList<NameValuePair>();
            nvps.add(new BasicNameValuePair("nickname", username));
            nvps.add(new BasicNameValuePair("password", password));
            nvps.add(new BasicNameValuePair("origURL", "http://hi.mop.com/SysHome.do"));
            nvps.add(new BasicNameValuePair("loginregFrom", "index"));
            nvps.add(new BasicNameValuePair("ss", "10101"));

            httppost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
            httppost.addHeader("Referer", "http://hi.mop.com/SysHome.do");
            httppost.addHeader("Connection", "keep-alive");
            httppost.addHeader("Content-Type", "application/x-www-form-urlencoded");
            httppost.addHeader("Accept-Language", "zh-CN,zh;q=0.8");
            httppost.addHeader("Origin", "http://hi.mop.com");
            httppost.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
            response = httpclient.execute(httppost);
            // System.out.println("Login form get: " + response.getStatusLine());
            // Consume the body so the connection can be reused
            EntityUtils.consume(response.getEntity());

            // The client's cookie store now holds the cookies set by the login
            List<Cookie> cookies = httpclient.getCookieStore().getCookies();
            for (Cookie cookie : cookies) {
                // System.out.println("- " + cookie);
            }

            // Follow the redirect returned by the login request, if any
            Header locationHeader = response.getFirstHeader("Location");
            if (locationHeader != null) {
                String url = locationHeader.getValue(); // redirect target
                response = httpclient.execute(new HttpGet(url));
                // Login succeeded; read and discard the landing page
                // System.out.println(response.getStatusLine());
                EntityUtils.consume(response.getEntity());
            }

            // Automatic check-in: request a sub-page that needs the logged-in session
            HttpPost httppost1 = new HttpPost("http://home.hi.mop.com/ajaxGetContinusLoginAward.do"); // continuous-login award endpoint
            httppost1.addHeader("Content-Type", "text/plain;charset=UTF-8");
            httppost1.addHeader("Accept", "text/plain, */*");
            httppost1.addHeader("X-Requested-With", "XMLHttpRequest");
            httppost1.addHeader("Referer", "http://home.hi.mop.com/Home.do");
            response = httpclient.execute(httppost1);
            // System.out.println(response.getStatusLine());
            if (response.getStatusLine().getStatusCode() == 200) {
                ret = true;
            }

            // Print the response body, if there is one
            entity = response.getEntity();
            if (entity != null) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(entity.getContent(), "UTF-8"))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        } finally {
            httpclient.getConnectionManager().shutdown();
        }
        return ret;
    }
}

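3. How do I use HttpClient in Java to fetch a page after logging in?

This suggests that either your login did not succeed or you are not sending the cookies obtained after login. Log in through a browser and look at the response it gets, then print the headers and body of the response your program receives after login and check whether they match what the browser received. Pay particular attention to the cookie values.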
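To make that comparison with the browser easier, something along these lines can print the response headers and reduce the `Set-Cookie` values to plain name/value pairs. It is only a sketch: `CookieDebug`, `cookiePairs` and `dump` are made-up names, and the header map is assumed to have the shape returned by `HttpURLConnection.getHeaderFields()` or `java.net.http.HttpHeaders.map()`. The sample values in `main` are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CookieDebug {

    // Reduce raw Set-Cookie header values to name -> value, dropping
    // attributes such as Path, Expires and HttpOnly.
    static Map<String, String> cookiePairs(List<String> setCookieHeaders) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            String first = header.split(";", 2)[0].trim();
            int eq = first.indexOf('=');
            if (eq > 0) {
                pairs.put(first.substring(0, eq).trim(), first.substring(eq + 1).trim());
            }
        }
        return pairs;
    }

    // Print every response header, then the cookies on a line of their own.
    static void dump(Map<String, List<String>> headers) {
        List<String> setCookies = new ArrayList<>();
        headers.forEach((name, values) -> {
            System.out.println(name + ": " + values);
            if ("set-cookie".equalsIgnoreCase(name)) {
                setCookies.addAll(values);
            }
        });
        System.out.println("cookies: " + cookiePairs(setCookies));
    }

    public static void main(String[] args) {
        // Invented sample headers, standing in for a real login response.
        Map<String, List<String>> headers = new LinkedHashMap<>();
        headers.put("Content-Type", List.of("text/html; charset=UTF-8"));
        headers.put("Set-Cookie", List.of("JSESSIONID=abc123; Path=/; HttpOnly"));
        dump(headers);
    }
}
```

If a cookie that the browser receives is missing from this output, the login request itself is probably being rejected, for example because a form field or header is missing.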

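4. Logging in to a web page with a Java crawler

The simplest approach is to open the forum page, use the browser's "Inspect element" on the login button, and find the action of the form it belongs to. Then submit to that action from your program. For example, as I answer your question now, the "Submit answer" button below also has an action behind it.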
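The same lookup can be done from code once the page HTML has been downloaded. The sketch below pulls the `action` attribute of the first `<form>` with a regular expression and resolves it against the page URL. The class and method names are invented for illustration, and a regular expression is a deliberately crude stand-in here: it does not decode HTML entities or handle unquoted attributes, so a real HTML parser is the more robust choice.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FormAction {

    // Matches action="..." or action='...' inside the first <form ...> tag.
    private static final Pattern ACTION = Pattern.compile(
            "<form\\b[^>]*?\\saction\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the absolute URL the first form on the page submits to, if any.
    static Optional<URI> firstFormAction(String pageUrl, String html) {
        Matcher m = ACTION.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // A relative action such as "/Login.do" is resolved against the page URL.
        return Optional.of(URI.create(pageUrl).resolve(m.group(2)));
    }

    public static void main(String[] args) {
        // Invented sample page; example.com is a placeholder host.
        String html = "<form method=\"post\" action=\"/Login.do\">"
                + "<input name=\"nickname\"><input name=\"password\"></form>";
        System.out.println(firstFormAction("http://example.com/home/index.html", html));
    }
}
```

The names of the `<input>` elements inside the same form are the parameter names to send in the login request.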

5. Fetching the content of a page after login with Java and displaying it

Use a powerful Java HTML parser library (a jar).

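6. Accessing a given URL with Java and fetching the page source

1. Write the basic skeleton of the class. It contains only a void main() method, which takes the URL from the command-line arguments and prints the source of that URL through a buffered input stream.
2. Write the useSourceViewer class. The code is as follows: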
import java.net.*;
import java.io.*;

public class useSourceViewer {
    public static void main(String[] args) {
        if (args.length > 0) {
            try {
                // Parse the URL given on the command line
                URL u = new URL(args[0]);
                // Buffer the input stream for performance and wrap it in a reader
                // (this uses the platform default charset); close it when done
                try (Reader r = new InputStreamReader(new BufferedInputStream(u.openStream()))) {
                    int c;
                    while ((c = r.read()) != -1) {
                        System.out.print((char) c);
                    }
                }
                // getContent() opens a second connection and reports the content object type
                Object o = u.getContent();
                System.out.println("I got a " + o.getClass().getName());
            } catch (MalformedURLException e) {
                System.err.println(args[0] + " is not a parseable URL");
            } catch (IOException e) {
                System.err.println(e);
            }
        } // end if
    } // end main
} // end useSourceViewer
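One thing the listing above leaves to chance is the character encoding: `InputStreamReader` without a charset argument falls back to the platform default, which can garble Chinese pages. A possible refinement, sketched here under the assumption that the server announces its encoding in the `Content-Type` header, is to read the charset from that header and fall back to UTF-8. The class and method names are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SourceViewerWithCharset {

    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Pick the charset named in a Content-Type value, or UTF-8 if none is usable.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: fall through to the default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Download the page and decode it with the charset the server declared.
    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        StringBuilder sb = new StringBuilder();
        try (InputStream in = conn.getInputStream();
             BufferedReader r = new BufferedReader(
                     new InputStreamReader(in, charsetOf(conn.getContentType())))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.print(fetch(args[0]));
        }
    }
}
```

Pages that declare their encoding only in a `<meta>` tag are not covered by this sketch; they would still be decoded as UTF-8.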

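7. Implementing data retrieval from a login page in Java (urgent)

Here is something to go on.
Simulate the login with HttpClient and pass the username and password parameters that the login page expects. First inspect the login page to find the attribute names of its username and password fields, then submit your values under those names.

It is very simple; just search for "httpclient get or post submit".
On success the response data is returned to you. If you also need to analyze the page, add the jsoup library to parse it and extract the data you want.

Programming is about the way of thinking.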

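8. How does a Java crawler fetch page data after login?

Crawlers generally do not fetch pages behind a login.
If you only need to scrape one particular site temporarily, you can simulate the login, keep the cookies the server sets once you are logged in, and then request the pages you need with those cookies.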

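9. How can a Java web crawler fetch pages behind a login?

The principle is simply to save the cookie data.

Save the cookies you receive after logging in.

From then on, send those cookies in the request headers every time you fetch a page.

The system identifies the user by the cookies.

Having the cookies means having the logged-in state; every later request is treated as coming from the user those cookies belong to.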
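Sending the saved cookies by hand comes down to joining them into a single `Cookie` request header. The snippet below is a small sketch of that step using the JDK's `java.net.http.HttpRequest` (Java 11+); the class name, the method names, the cookie names and the URL are all placeholders, and it assumes the cookies have already been saved as name/value pairs.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class ManualCookieHeader {

    // Join saved cookies into the value of a Cookie header: "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    // Build a GET request that carries the saved cookies in its headers.
    static HttpRequest authenticatedGet(String url, Map<String, String> cookies) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (!cookies.isEmpty()) {
            builder.header("Cookie", cookieHeader(cookies));
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Map<String, String> saved = new LinkedHashMap<>();
        saved.put("JSESSIONID", "abc123"); // placeholder cookie saved after login
        HttpRequest request = authenticatedGet("http://example.com/profile", saved);
        System.out.println(request.headers().map());
    }
}
```

Letting the HTTP client manage a cookie store is usually less error-prone, but building the header manually is handy when the cookies were copied from a browser session.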

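Addendum: Java is an object-oriented programming language for writing cross-platform application software. Java technology offers excellent versatility, efficiency, platform portability and security. It is widely used in PCs, data centers, game consoles, scientific supercomputers, mobile phones and the Internet, and it has the world's largest professional developer community.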

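10. How do I write a web crawler in Java that fetches web pages?

The principle is simply to save the cookie data. Save the cookies you receive after logging in, and from then on send them in the request headers every time you fetch a page. The system identifies the user by the cookies. Having the cookies means having the logged-in state; every later request is treated as coming from the user those cookies belong to. Addendum: Java is an object-oriented programming language for writing cross-platform application software. Java technology offers excellent versatility, efficiency, platform portability and security. It is widely used in PCs, data centers, game consoles, scientific supercomputers, mobile phones and the Internet, and it has the world's largest professional developer community.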
