HTTP Operations in C#
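The System.Net.Http namespace provides a programming interface for modern HTTP applications and can be used to develop desktop applications. Collecting and analyzing information from the web also depends on HTTP operations. The most common ones are sending GET and POST requests and uploading content with POST.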
Sending a GET request
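GET is the most common way to access a resource, and the request is sent directly through the URL. Any parameters are appended to the end of the URL in the form ?key1=value1&key2=value2&key3=value3.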
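The parameter format described above can be sketched as a small helper. This is only an illustration under stated assumptions: the class name QueryStringSketch and the method BuildUrl are hypothetical and not part of the original code, and Uri.EscapeDataString is used to percent-encode keys and values so that characters such as # or & do not break the URL.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of the original article's code.
public static class QueryStringSketch
{
    // Appends the parameters to baseUrl as ?key1=value1&key2=value2,
    // percent-encoding every key and value.
    public static string BuildUrl(
        string baseUrl,
        IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var sb = new StringBuilder(baseUrl);
        bool first = true;
        foreach (var pair in parameters)
        {
            sb.Append(first ? '?' : '&');
            sb.Append(Uri.EscapeDataString(pair.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(pair.Value));
            first = false;
        }
        return sb.ToString();
    }
}
```

For example, calling BuildUrl with the base URL http://example.com/search and the pairs (q, C#) and (key2, a b) gives http://example.com/search?q=C%23&key2=a%20b. Without the encoding step, everything after the # would be treated as a URL fragment and would not reach the server.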
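First import the namespace with using System.Net; (the code below also uses System.Text and System.Collections.Generic). Next, define a function CreateGetHttpResponse() that obtains the HTTP response to a GET request. GET parameters are written into the URL, so if there are any we append them to the URL and then create the request. If the connection fails the function returns null; otherwise it returns the HttpWebResponse.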
    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Text;

    class HttpFunctions
    {
        public static HttpWebResponse CreateGetHttpResponse(
            string url,
            IDictionary<string, string> parameters = null,
            string token = null)
        {
            // Append the parameters as ?k1=v1&k2=v2, URL-encoding each key and value
            // so that characters such as '#' or '&' do not break the URL.
            StringBuilder buffer = new StringBuilder(url);
            if (parameters != null && parameters.Count > 0)
            {
                int i = 0;
                foreach (string key in parameters.Keys)
                {
                    buffer.Append(i == 0 ? "?" : "&");
                    buffer.AppendFormat("{0}={1}",
                        Uri.EscapeDataString(key),
                        Uri.EscapeDataString(parameters[key]));
                    i++;
                }
            }

            HttpWebRequest request = WebRequest.Create(buffer.ToString()) as HttpWebRequest;
            request.Method = "GET";
            // Optionally set the User-Agent and the timeout:
            // request.UserAgent = userAgent;
            // request.Timeout = timeout;
            if (token != null)
            {
                request.Headers.Add(HttpRequestHeader.Authorization, "Bearer " + token);
            }

            try
            {
                return request.GetResponse() as HttpWebResponse;
            }
            catch (WebException)
            {
                // The connection failed or the server returned an error status.
                return null;
            }
        }
    }
This gives us the HttpWebResponse. For example, in the calling code we can pass the parameter q=<search term> to Bing to search for the keyword "C#":
    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, string> myParams = new Dictionary<string, string>();
            myParams["q"] = "C#";
            var h1 = HttpFunctions.CreateGetHttpResponse("http://cn.bing.com/search", myParams);
            Console.WriteLine(h1.StatusCode);
        }
    }
The program prints OK, the name of HTTP status code 200, which shows that the HTTP response is normal.
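We now have the HttpWebResponse, but not yet the HTTP content. The content is best read from the response stream. Add a function to the HttpFunctions class that reads the text from that stream: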
    public static string responseText(HttpWebResponse h)
    {
        if (h == null)
        {
            return "";
        }
        // Decode the body as UTF-8 and read it 256 characters at a time.
        using (System.IO.Stream receiveStream = h.GetResponseStream())
        using (System.IO.StreamReader readStream =
            new System.IO.StreamReader(receiveStream, Encoding.UTF8))
        {
            StringBuilder ret = new StringBuilder();
            char[] read = new char[256];
            int count;
            while ((count = readStream.Read(read, 0, read.Length)) > 0)
            {
                ret.Append(read, 0, count);
            }
            return ret.ToString();
        }
    }
We requested an HTML page, which usually contains a lot of content, so in the test we print only the first 100 characters:
    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, string> myParams = new Dictionary<string, string>();
            myParams["q"] = "C#";
            var h1 = HttpFunctions.CreateGetHttpResponse("http://cn.bing.com/search", myParams);
            Console.WriteLine(h1.StatusCode);
            Console.WriteLine(HttpFunctions.responseText(h1).Substring(0, 100));
            h1.Dispose();
        }
    }
One line of HTML is printed, which shows that the HTML content was read successfully:
<!DOCTYPE html><html lang="zh" xml:lang="zh" xmlns="http://www.w3.org/1999/xhtml" xmlns:Web="http://
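The introduction mentions System.Net.Http, while the code above uses HttpWebRequest from System.Net. As a rough comparison, the same GET-and-read-text step might look like this with HttpClient. This is a sketch rather than the article's method: HttpClientGetSketch, GetTextAsync and CannedHandler are hypothetical names, and CannedHandler is only a stand-in server so the helper can be tried without network access.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names for illustration; not part of the original article's code.
public static class HttpClientGetSketch
{
    // Sends a GET request and returns the body as text,
    // or null if the request fails or the status is not 2xx.
    public static async Task<string> GetTextAsync(HttpClient client, string url)
    {
        try
        {
            using (HttpResponseMessage response = await client.GetAsync(url))
            {
                if (!response.IsSuccessStatusCode)
                {
                    return null;
                }
                return await response.Content.ReadAsStringAsync();
            }
        }
        catch (HttpRequestException)
        {
            return null; // connection-level failure
        }
    }

    // Offline stand-in for a server: always answers 200 with a fixed body.
    public sealed class CannedHandler : HttpMessageHandler
    {
        private readonly string body;

        public CannedHandler(string body)
        {
            this.body = body;
        }

        protected override Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            var response = new HttpResponseMessage(HttpStatusCode.OK)
            {
                Content = new StringContent(body)
            };
            return Task.FromResult(response);
        }
    }
}
```

In real use the client would be created with new HttpClient() and reused across requests; passing new HttpClient(new HttpClientGetSketch.CannedHandler("...")) instead makes GetTextAsync return the fixed body, which is convenient for trying the helper offline.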
Sending a POST request to log in
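URLs are limited in length, so passing long data as GET parameters is not appropriate. GET parameters are also part of the URL itself, which means they show up in browser history and server logs and can be cached, so GET is the less secure choice. Operations on sensitive data, such as logging in, should not use GET. POST keeps the data separate from the URL and is more commonly used for transmitting data.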
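When data is submitted to the server with POST, several encodings are available. The default is application/x-www-form-urlencoded, in which every character other than letters, digits and a few unreserved symbols is percent-encoded as the hexadecimal value of its bytes (for example, & becomes %26). If the form contains a large amount of such data this encoding is very inefficient, which is the case when uploading binary files. The encoding type should then be multipart/form-data: the input is not percent-encoded but sent as multiple parts following the MIME format, the same standard used for email.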
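To make the default encoding concrete, the sketch below produces an application/x-www-form-urlencoded body using FormUrlEncodedContent from System.Net.Http. It is an illustration only: FormEncodingSketch and EncodeForm are hypothetical names, and the article's own POST code builds the body by hand instead.

```csharp
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper for illustration; not part of the original article's code.
public static class FormEncodingSketch
{
    // Returns the application/x-www-form-urlencoded body for the given fields.
    public static string EncodeForm(IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (var content = new FormUrlEncodedContent(fields))
        {
            // The content is built in memory, so reading it synchronously is safe here.
            return content.ReadAsStringAsync().GetAwaiter().GetResult();
        }
    }
}
```

With the fields fname=Henry and note=a&b=c, EncodeForm returns fname=Henry&note=a%26b%3Dc: the & and = inside the value are percent-encoded so they cannot be confused with the separators between fields.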
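For example, the URL https://www.runoob.com/try/ajax/demo_post2.php accepts two POST parameters, fname and lname, and returns a greeting. Creating the HttpWebResponse for a POST request is similar to the GET case, except that the parameters are written as bytes into the request stream.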
    public static HttpWebResponse CreatePostHttpResponse(
        string url,
        IDictionary<string, string> parameters = null)
    {
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        if (parameters != null && parameters.Count > 0)
        {
            // Build the body as k1=v1&k2=v2, URL-encoding each key and value.
            StringBuilder paraString = new StringBuilder();
            foreach (var it in parameters)
            {
                if (paraString.Length > 0)
                {
                    paraString.Append("&");
                }
                paraString.AppendFormat("{0}={1}",
                    Uri.EscapeDataString(it.Key),
                    Uri.EscapeDataString(it.Value));
            }
            byte[] byteArray = Encoding.UTF8.GetBytes(paraString.ToString());
            using (System.IO.Stream stream1 = request.GetRequestStream())
            {
                stream1.Write(byteArray, 0, byteArray.Length); // write the parameters
            }
        }

        try
        {
            return request.GetResponse() as HttpWebResponse;
        }
        catch (WebException)
        {
            return null;
        }
    }
Call it from the main function:
    Dictionary<string, string> myParams = new Dictionary<string, string>();
    string url = "https://www.runoob.com/try/ajax/demo_post2.php";
    myParams["fname"] = "Henry";
    myParams["lname"] = "Lord";
    var h1 = HttpFunctions.CreatePostHttpResponse(url, myParams);
    var htContent = HttpFunctions.responseText(h1);
    Console.WriteLine(htContent);
    h1.Dispose();
The result is:
<p style='color:red;'>你好,Henry Lord,今天过得怎么样?</p>
Simulating a login with cookies
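Next, to demonstrate POST requests further, we use the test site provided by the book 《用Python编写网络爬虫》 (Web Scraping with Python), http://example.webscraping.com/places/default/user/login, as the example. After registering an account there, we simulate logging in to the system.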
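Open the site we want to log in to and inspect the login form. Besides the visible fields, the form contains a group of inputs styled with display:none; that the site uses to validate the submission. If the C# program sends only the email and password, the login will not succeed, so we need to extract every input entry in the form. In C#, we can first request the page and then use regular expressions to extract the <input /> elements: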
    private static List<string> showMatch(string text, string expr)
    {
        System.Text.RegularExpressions.MatchCollection mc =
            System.Text.RegularExpressions.Regex.Matches(text, expr);
        List<string> ret = new List<string>();
        foreach (System.Text.RegularExpressions.Match m in mc)
        {
            ret.Add(m.ToString());
        }
        return ret;
    }
    <input class="string" id="auth_user_email" name="email" type="text" value="" />
    <input class="password" id="auth_user_password" name="password" type="password" value="" />
    <input class="boolean" id="auth_user_remember_me" name="remember_me" type="checkbox" value="on" />
    <input type="submit" value="Log In" />
    <input name="_next" type="hidden" value="/places/default/index" />
    <input name="_formkey" type="hidden" value="fe10396d-3a5f-4d8c-b03e-2960a2820cac" />
    <input name="_formname" type="hidden" value="login" />
Apart from the submit button (type=submit), the other six name/value pairs are the data we need to send with POST.
    private static Dictionary<string, string> getinputParameters(string message)
    {
        Dictionary<string, string> ret = new Dictionary<string, string>();
        foreach (var it in showMatch(message, @"<input .*?/>"))
        {
            // Keep only inputs that carry both a name and a value attribute;
            // the capture group holds the text between the quotes.
            var name = System.Text.RegularExpressions.Regex.Match(it, "name=\"(.*?)\"");
            var value = System.Text.RegularExpressions.Regex.Match(it, "value=\"(.*?)\"");
            if (name.Success && value.Success)
            {
                ret[name.Groups[1].Value] = value.Groups[1].Value;
            }
        }
        return ret;
    }
With all the parameters collected, we can send the POST request:
    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, string> myParams = new Dictionary<string, string>();
            string url = "http://example.webscraping.com/places/default/user/login";
            var h1 = HttpFunctions.CreateGetHttpResponse(url);
            var htContent = HttpFunctions.responseText(h1);
            h1.Dispose();
            myParams = getinputParameters(htContent);
            myParams["email"] = "[email protected]";
            myParams["password"] = "88888888";
            var h2 = HttpFunctions.CreatePostHttpResponse(url, myParams);
            htContent = HttpFunctions.responseText(h2);
            Console.WriteLine(htContent);
            h2.Dispose();
        }
    }
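The parameters are all correct, yet the returned HTML contains no sign of a successful login (the login area still shows Log In instead of the user name). The reason is that the site keeps session information in cookies, and the program does not yet add any cookies to the request headers, so the login state cannot be maintained.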
The site we are logging in to sets two cookies, so we write a function that reads them:
    public static string getCookies(HttpWebResponse h1)
    {
        if (h1 == null)
        {
            return "";
        }
        // This site sends two Set-Cookie headers; keep only the name=value part of each.
        string[] setCookies = h1.Headers.GetValues("Set-Cookie");
        if (setCookies == null || setCookies.Length < 2)
        {
            return "";
        }
        return setCookies[0].Split(';')[0] + "; " + setCookies[1].Split(';')[0];
    }
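Then add a parameter string cookies = null to the POST function above, and add the following inside the function, before the request stream is opened:

    if (cookies != null)
    {
        request.Headers.Add(HttpRequestHeader.Cookie, cookies);
    }

The cookies are then sent along with the POST request.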
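The getCookies(HttpWebResponse h1) method here depends on the server and is not general. For any other site, the response has to be analyzed again and the fields extracted according to its structure. The method is only meant to show how cookies are added.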
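A less site-specific alternative, which is a different technique from the getCookies function above, is to let the framework manage cookies through CookieContainer. The sketch below assumes one container is shared by the GET and the POST request; CookieContainerSketch and CreateRequest are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of the original article's code.
public static class CookieContainerSketch
{
    // Creates a request bound to the given cookie container.
    // Cookies set by responses are stored in the container, and cookies
    // already in it that match the URL are sent with the request.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }
}
```

The idea is to create one CookieContainer, pass it to CreateRequest for the initial GET of the login page, and pass the same instance again for the POST, so the session cookies received in the first response are sent back automatically without parsing Set-Cookie by hand.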
Reading image content
In C#, the body of an HTTP response is exposed as a stream. If the URL points to an image on the server, for example, we only need to call Image.FromStream():
    public static void downloadPicture(string url)
    {
        // Dispose the response and the image once the file has been saved.
        using (var h1 = CreateGetHttpResponse(url))
        {
            if (h1 != null)
            {
                using (System.Drawing.Image image =
                    System.Drawing.Image.FromStream(h1.GetResponseStream()))
                {
                    image.Save("download.png", System.Drawing.Imaging.ImageFormat.Png);
                }
            }
        }
    }
If the URL is correct, download.png is saved in the program's working directory.
In a C# console project, System.Drawing is not available by default, so a reference to System.Drawing.Common.dll has to be added manually.
Uploading an image
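Files are usually sent to the server with a POST request whose Content-Type is "multipart/form-data", and the POST body is written according to the multipart/form-data format. The code below requires using System.IO;.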
    public static string Sys_uploadStudentPhoto(
        string url, string imageName, IDictionary<string, string> stringDict)
    {
        return HttpPostData(url, "file", imageName, stringDict);
    }

    private static string HttpPostData(
        string url, string fileKeyName, string filePath,
        IDictionary<string, string> stringDict)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        var boundary = "---------------" + DateTime.Now.Ticks.ToString("x");
        var beginBoundary = Encoding.ASCII.GetBytes("--" + boundary + "\r\n");
        var endBoundary = Encoding.ASCII.GetBytes("--" + boundary + "--\r\n");
        var newLine = Encoding.ASCII.GetBytes("\r\n");
        request.Method = "POST";
        request.ContentType = "multipart/form-data; boundary=" + boundary;

        using (var memStream = new MemoryStream())
        {
            // File part: boundary, part headers, raw file bytes, CRLF.
            const string filePartHeader =
                "Content-Disposition: form-data; name=\"{0}\"; filename=\"{1}\"\r\n" +
                "Content-Type: application/octet-stream\r\n\r\n";
            // Send only the file name, not the full local path.
            var header = string.Format(filePartHeader, fileKeyName, Path.GetFileName(filePath));
            var headerbytes = Encoding.UTF8.GetBytes(header);
            memStream.Write(beginBoundary, 0, beginBoundary.Length);
            memStream.Write(headerbytes, 0, headerbytes.Length);
            using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
            {
                fileStream.CopyTo(memStream);
            }
            memStream.Write(newLine, 0, newLine.Length);

            // Text parts: each one is boundary, part header, blank line, value, CRLF.
            const string stringKeyHeader =
                "Content-Disposition: form-data; name=\"{0}\"\r\n\r\n{1}\r\n";
            if (stringDict != null)
            {
                foreach (var item in stringDict)
                {
                    var formitembytes = Encoding.UTF8.GetBytes(
                        string.Format(stringKeyHeader, item.Key, item.Value));
                    memStream.Write(beginBoundary, 0, beginBoundary.Length);
                    memStream.Write(formitembytes, 0, formitembytes.Length);
                }
            }
            memStream.Write(endBoundary, 0, endBoundary.Length);

            // Copy the assembled body into the request.
            request.ContentLength = memStream.Length;
            using (var requestStream = request.GetRequestStream())
            {
                memStream.WriteTo(requestStream);
            }
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return responseText(response);
        }
    }
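For comparison, System.Net.Http can assemble the same kind of multipart/form-data body through MultipartFormDataContent, which generates the boundary and part headers itself. This is a sketch of an alternative, not the article's method: MultipartSketch and BuildForm are hypothetical names, and the field name "file" simply mirrors the key used in the code above.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper for illustration; not part of the original article's code.
public static class MultipartSketch
{
    // Builds a multipart/form-data body with one file part and any number of text parts.
    public static MultipartFormDataContent BuildForm(
        byte[] fileBytes,
        string fileName,
        IDictionary<string, string> fields)
    {
        var form = new MultipartFormDataContent();

        // File part, sent under the field name "file".
        var filePart = new ByteArrayContent(fileBytes);
        filePart.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        form.Add(filePart, "file", fileName);

        // Plain text parts, one per dictionary entry.
        foreach (var pair in fields)
        {
            form.Add(new StringContent(pair.Value), pair.Key);
        }
        return form;
    }
}
```

The returned content could then be sent with HttpClient.PostAsync(url, form). The Content-Type header, including the boundary, is set by MultipartFormDataContent, so it does not have to be written by hand.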