<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Space-X</title>
  
  <subtitle>X-Blog</subtitle>
  <link href="/atom.xml" rel="self"/>
  
  <link href="https://spaces-x.github.io/"/>
  <updated>2019-05-10T13:15:37.590Z</updated>
  <id>https://spaces-x.github.io/</id>
  
  <author>
    <name>[object Object]</name>
    
  </author>
  
  <generator uri="http://hexo.io/">Hexo</generator>
  
  <entry>
    <title>hadoopRPC</title>
    <link href="https://spaces-x.github.io/2019/05/10/hadoopRPC/"/>
    <id>https://spaces-x.github.io/2019/05/10/hadoopRPC/</id>
    <published>2019-05-10T12:56:12.000Z</published>
    <updated>2019-05-10T13:15:37.590Z</updated>
    
    <content type="html"><![CDATA[<h1 id="Hadoop-RPC模块源码分析"><a href="#Hadoop-RPC模块源码分析" class="headerlink" title="Hadoop RPC模块源码分析"></a>Hadoop RPC Module Source Code Analysis</h1><h2 id="RPC概述"><a href="#RPC概述" class="headerlink" title="RPC概述"></a>RPC Overview</h2><p>Reference article:</p><blockquote><p><a href="https://www.cnblogs.com/qq503665965/p/6708644.html" target="_blank" rel="noopener">https://www.cnblogs.com/qq503665965/p/6708644.html</a></p></blockquote><h2 id="RPC组成"><a href="#RPC组成" class="headerlink" title="RPC组成"></a>RPC Components</h2><p>Hadoop RPC consists of three main classes, RPC, Client, and Server, which correspond to the public programming interface, the client implementation, and the server implementation, respectively. The RPC code lives in the org.apache.hadoop.ipc package of hadoop-common.</p><h2 id="类结构关系详解"><a href="#类结构关系详解" class="headerlink" title="类结构关系详解"></a>Class Structure in Detail</h2><p>The class diagrams are from an older version; some method names have changed since, but the architecture is the same.</p><ol><li><p><strong>ipc.RPC</strong></p><p>Key class diagram:</p><p><a href="https://imgchr.com/i/E8xkOx" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/04/30/E8xkOx.md.png" alt="E8xkOx.md.png"></a> </p></li><li><p><strong>ipc.Client</strong></p><p>Key class diagram:</p><p><a href="https://imgchr.com/i/E8x0cn" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/04/30/E8x0cn.md.png" alt="E8x0cn.md.png"></a></p></li></ol><ol start="3"><li><p><strong>ipc.Server</strong></p><p><a href="https://imgchr.com/i/E8xBXq" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/04/30/E8xBXq.md.png" alt="E8xBXq.md.png"></a></p></li></ol><h2 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h2><h3 id="client-实现"><a href="#client-实现" class="headerlink" title="client 实现"></a>Client Implementation</h3><p>The structure of the client side is shown in the figure below. As the figure shows, Client contains two inner classes, Call and Connection.</p><p><img src="https://s2.ax1x.com/2019/04/30/EGkhgU.png" alt="EGkhgU.png"></p><ol><li><p><strong>static class Call (inner class)</strong></p><p>This class encapsulates a single RPC request. It has five core member variables: the unique identifier <strong>id</strong>, the call information <strong>rpcRequest</strong>, the return value <strong>rpcResponse</strong>, the exception information <strong>error</strong>, and the completion flag <strong>done</strong> (the version shown here also carries retry and rpcKind). Because the Hadoop RPC Server handles client requests asynchronously, the order in which calls are issued has no direct relationship to the order in which results come back, so the Client uses id to match each response to its call. When the client sends a request it only needs to fill in <strong>id</strong> and <strong>rpcRequest</strong>; the remaining three fields, <strong>rpcResponse, error, done</strong>, are filled in from the server's response, according to how the call executed.</p><figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">Call</span> </span>&#123;</span><br><span class="line">  <span class="keyword">final</span> <span class="keyword">int</span> id;               <span class="comment">// call id</span></span><br><span class="line">  <span class="keyword">final</span> <span class="keyword">int</span> retry;           <span class="comment">// retry count</span></span><br><span class="line">  <span class="keyword">final</span> Writable rpcRequest;  <span class="comment">// the serialized rpc request</span></span><br><span class="line">  Writable rpcResponse;       <span class="comment">// null if rpc has error</span></span><br><span class="line">  IOException error;          <span class="comment">// exception, null if success</span></span><br><span class="line">  
<span class="keyword">final</span> RPC.RpcKind rpcKind;      <span class="comment">// Rpc EngineKind</span></span><br><span class="line">  <span class="keyword">boolean</span> done;               <span class="comment">// true when call is done</span></span><br><span class="line">  ...</span><br><span class="line">  <span class="function"><span class="keyword">public</span> <span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">setRpcResponse</span><span class="params">(Writable rpcResponse)</span> </span>&#123;</span><br><span class="line">    <span class="keyword">this</span>.rpcResponse = rpcResponse;</span><br><span class="line">    callComplete();</span><br><span class="line">  &#125;</span><br><span class="line">  ...</span><br><span class="line">  <span class="function"><span class="keyword">protected</span> <span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">callComplete</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="keyword">this</span>.done = <span class="keyword">true</span>;</span><br><span class="line">    notify();                                 <span class="comment">// notify caller</span></span><br><span class="line">  &#125;  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The result of the RPC request is stored through Call's setRpcResponse method, which then calls Call's callComplete method to mark the call as done and wake the waiting caller.</p></li><li><p><strong>private class Connection extends Thread (inner class)</strong></p><p>The Client maintains one connection to each Server. The basic information and operations for that connection are encapsulated in the Connection class. The main fields are the connection's unique identifier remoteId, the Socket used to talk to the Server, the network input stream in, the network output stream out, and the hash table calls that holds the outstanding RPC requests.</p><figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="keyword">private</span> <span class="class"><span class="keyword">class</span> <span class="title">Connection</span> <span class="keyword">extends</span> <span class="title">Thread</span> </span>&#123;</span><br><span class="line">    <span class="keyword">private</span> InetSocketAddress server;             <span class="comment">// server ip:port</span></span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">final</span> ConnectionId remoteId;                <span class="comment">// connection id</span></span><br><span class="line">    <span class="keyword">private</span> Socket socket = <span class="keyword">null</span>;                 <span 
class="comment">// connected socket</span></span><br><span class="line">    <span class="keyword">private</span> DataInputStream in;</span><br><span class="line">    <span class="keyword">private</span> DataOutputStream out;</span><br><span class="line">    ...</span><br><span class="line">    <span class="keyword">private</span> Hashtable&lt;Integer, Call&gt; calls = <span class="keyword">new</span> Hashtable&lt;Integer, Call&gt;();</span><br><span class="line">    ...</span><br><span class="line">    <span class="function"><span class="keyword">private</span> <span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">setupIOstreams</span><span class="params">(</span></span></span><br><span class="line"><span class="function"><span class="params">        AtomicBoolean fallbackToSimpleAuth)</span> </span></span><br><span class="line"><span class="function">    </span>&#123;</span><br><span class="line">      <span class="keyword">if</span> (socket != <span class="keyword">null</span> || shouldCloseConnection.get()) &#123;</span><br><span class="line">        <span class="keyword">return</span>;</span><br><span class="line">      &#125; </span><br><span class="line">      <span class="keyword">try</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> (LOG.isDebugEnabled()) &#123;</span><br><span class="line">          LOG.debug(<span class="string">"Connecting to "</span>+server);</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">if</span> (Trace.isTracing()) &#123;</span><br><span class="line">          Trace.addTimelineAnnotation(<span class="string">"IPC client connecting to "</span> + server);</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">short</span> numRetries = <span class="number">0</span>;</span><br><span class="line">        Random rand = <span 
class="keyword">null</span>;</span><br><span class="line">        <span class="keyword">while</span> (<span class="keyword">true</span>) &#123;</span><br><span class="line">          <span class="comment">// Connect to the remote server by creating a Socket</span></span><br><span class="line">          setupConnection();</span><br><span class="line">          InputStream inStream = NetUtils.getInputStream(socket);<span class="comment">// get the input stream</span></span><br><span class="line">          OutputStream outStream = NetUtils.getOutputStream(socket); <span class="comment">// get the output stream</span></span><br><span class="line">          <span class="comment">// Send the RPC header to the server. On success the server sends no reply, because it only checks that the client and server RPC versions match; if the check fails it replies with a failure status and closes the connection</span></span><br><span class="line">          writeConnectionHeader(outStream);</span><br><span class="line">          ...</span><br><span class="line">          <span class="comment">// Wrap the input and output streams as in and out</span></span><br><span class="line">          <span class="keyword">this</span>.in = <span class="keyword">new</span> DataInputStream(<span class="keyword">new</span> BufferedInputStream(inStream));</span><br><span class="line">          <span class="keyword">if</span> (!(outStream <span class="keyword">instanceof</span> BufferedOutputStream)) &#123;</span><br><span class="line">            outStream = <span class="keyword">new</span> BufferedOutputStream(outStream);</span><br><span class="line">          &#125;</span><br><span class="line">          <span class="keyword">this</span>.out = <span class="keyword">new</span> DataOutputStream(outStream);</span><br><span class="line">          <span class="comment">// Call start() to launch the connection thread</span></span><br><span class="line">          start();</span><br><span class="line">          <span class="keyword">return</span>;</span><br><span class="line">        &#125;</span><br><span class="line">      &#125; <span class="keyword">catch</span> (Throwable t) &#123;...&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    ...</span><br><span class="line">    
</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Connection's setupIOstreams method establishes the connection to the server: it creates a Socket, opens a long-lived TCP connection, and wraps the socket's input and output streams. Finally it calls start() to launch the connection thread.</p></li></ol>   <figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">sendRpcRequest</span><span class="params">(<span class="keyword">final</span> Call call)</span></span></span><br><span class="line"><span class="function">        <span class="keyword">throws</span> InterruptedException, IOException </span>&#123;</span><br><span class="line">      <span class="keyword">if</span> (shouldCloseConnection.get()) &#123;</span><br><span class="line">        <span class="keyword">return</span>;</span><br><span class="line">      &#125;</span><br><span 
class="line">    ... <span class="comment">// elided: the request header and call.rpcRequest are serialized into the buffer d</span></span><br><span class="line">     <span class="keyword">synchronized</span> (sendRpcRequestLock) &#123;</span><br><span class="line">        Future&lt;?&gt; senderFuture = sendParamsExecutor.submit(<span class="keyword">new</span> Runnable() &#123;</span><br><span class="line">          <span class="meta">@Override</span></span><br><span class="line">          <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>&#123;</span><br><span class="line">            <span class="keyword">try</span> &#123;</span><br><span class="line">              <span class="keyword">synchronized</span> (Connection.<span class="keyword">this</span>.out) &#123;</span><br><span class="line">                <span class="comment">// RPC calls on the same OutputStream must be sent under synchronization, because multiple Calls share one connection and their bytes must not interleave</span></span><br><span class="line">                <span class="keyword">if</span> (shouldCloseConnection.get()) &#123;</span><br><span class="line">                  <span class="keyword">return</span>;</span><br><span class="line">                &#125;</span><br><span class="line">                </span><br><span class="line">                <span class="keyword">if</span> (LOG.isDebugEnabled())</span><br><span class="line">                  LOG.debug(getName() + <span class="string">" sending #"</span> + call.id);</span><br><span class="line">         </span><br><span class="line">                <span class="keyword">byte</span>[] data = d.getData();</span><br><span class="line">                <span class="keyword">int</span> totalLength = d.getLength();</span><br><span class="line">                out.writeInt(totalLength); <span class="comment">// Total length: 1. write the length (4 bytes) of the call ID plus the call parameters (method name, parameter types, parameter values)</span></span><br><span class="line">                out.write(data, <span class="number">0</span>, totalLength);<span class="comment">// RpcRequestHeader + RpcRequest: 2. write the call ID and the serialized call parameters (method name, parameter types, parameter values)</span></span><br><span class="line">                out.flush();</span><br><span class="line">              &#125;</span><br><span class="line">            &#125; <span class="keyword">catch</span> (IOException e) &#123;...&#125;</span><br><span class="line">          &#125;</span><br><span class="line">        &#125;);</span><br><span class="line">        <span class="keyword">try</span> &#123;</span><br><span class="line">          senderFuture.get(); <span class="comment">// wait until the request has been written</span></span><br><span class="line">        &#125; <span class="keyword">catch</span> (ExecutionException e) &#123;</span><br><span class="line">          Throwable cause = e.getCause(); ...</span><br><span class="line">        &#125; ...</span><br></pre></td></tr></table></figure><p>   When the client issues an RPC request, it first serializes the call (method name, parameters, and so on) into a byte stream and sends it to the server; the core code is shown above.</p>   <figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>&#123;</span><br><span class="line">      <span class="keyword">if</span> (LOG.isDebugEnabled())</span><br><span class="line">       
 LOG.debug(getName() + <span class="string">": starting, having connections "</span> </span><br><span class="line">            + connections.size());</span><br><span class="line"></span><br><span class="line">      <span class="keyword">try</span> &#123;</span><br><span class="line">        <span class="keyword">while</span> (waitForWork()) &#123;<span class="comment">//wait here for work - read or close connection</span></span><br><span class="line">          receiveRpcResponse();</span><br><span class="line">        &#125;</span><br><span class="line">      &#125; <span class="keyword">catch</span> (Throwable t) &#123;</span><br><span class="line">        <span class="comment">// This truly is unexpected, since we catch IOException in receiveResponse</span></span><br><span class="line">        <span class="comment">// -- this is only to be really sure that we don't leave a client hanging</span></span><br><span class="line">        <span class="comment">// forever.</span></span><br><span class="line">        LOG.warn(<span class="string">"Unexpected error reading responses on connection "</span> + <span class="keyword">this</span>, t);</span><br><span class="line">        markClosed(<span class="keyword">new</span> IOException(<span class="string">"Error reading responses"</span>, t));</span><br><span class="line">      &#125;</span><br><span class="line">      </span><br><span class="line">      close();</span><br><span class="line">      </span><br><span class="line">      <span class="keyword">if</span> (LOG.isDebugEnabled())</span><br><span class="line">        LOG.debug(getName() + <span class="string">": stopped, remaining connections "</span></span><br><span class="line">            + connections.size());</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><p>   The run method of Connection repeatedly calls receiveRpcResponse() to fetch results from the server.</p><p>   The key code of receiveRpcResponse is shown below. It reads the response header, determines the <strong>status</strong> of the RPC response from the header returned by the server, and reads the <strong>callId</strong>, which it maps to the corresponding <strong>Call</strong> object. It then reads the value from the input stream, removes the call from the calls map held by the Connection, and invokes the Call object's <strong>setRpcResponse()</strong> to set the call's <strong>rpcResponse</strong>.</p>   <figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">receiveRpcResponse</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="keyword">try</span> &#123;</span><br><span class="line">        <span class="keyword">int</span> totalLen = in.readInt();</span><br><span class="line">        RpcResponseHeaderProto header = </span><br><span class="line">            RpcResponseHeaderProto.parseDelimitedFrom(in);</span><br><span class="line">        checkResponse(header);</span><br><span class="line">        RpcStatusProto status = header.getStatus();</span><br><span class="line">        <span class="keyword">int</span> callId = header.getCallId();</span><br><span class="line">        Call call = 
calls.get(callId);</span><br><span class="line">        <span class="keyword">if</span> (status == RpcStatusProto.SUCCESS) &#123;</span><br><span class="line">             Writable value = ReflectionUtils.newInstance(valueClass, conf);</span><br><span class="line">             value.readFields(in);                 <span class="comment">// read value</span></span><br><span class="line">             calls.remove(callId);</span><br><span class="line">             call.setRpcResponse(value);</span><br><span class="line">            ...</span><br><span class="line">        &#125;</span><br><span class="line">    &#125; <span class="keyword">catch</span> (IOException e) &#123; markClosed(e); &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ol start="3"><li><p>Client main class</p><p><strong>call() method:</strong> wraps the RPC request in a Call, obtains or creates the connection for the given ConnectionId, and sends the request over that connection. After sending, inside a block synchronized on the call, it loops checking whether the call is done. If it is not, the caller blocks in wait() until the corresponding Connection's receiveRpcResponse() invokes call.setRpcResponse(value), which in turn invokes callComplete() and wakes the caller with notify().</p><figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> Writable <span class="title">call</span><span class="params">(RPC.RpcKind rpcKind, Writable rpcRequest,</span></span></span><br><span class="line"><span class="function"><span class="params">     ConnectionId remoteId, <span class="keyword">int</span> serviceClass,</span></span></span><br><span class="line"><span class="function"><span class="params">     AtomicBoolean fallbackToSimpleAuth)</span> <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">   <span class="keyword">final</span> Call call = createCall(rpcKind, rpcRequest);</span><br><span class="line">   Connection connection = getConnection(remoteId, call, serviceClass,</span><br><span class="line">     fallbackToSimpleAuth);</span><br><span class="line">   <span class="keyword">try</span> &#123;</span><br><span class="line">     connection.sendRpcRequest(call);                 <span class="comment">// send the rpc request</span></span><br><span class="line">   &#125; <span class="keyword">catch</span> (RejectedExecutionException e) &#123;</span><br><span class="line">     <span class="keyword">throw</span> <span class="keyword">new</span> IOException(<span class="string">"connection has been closed"</span>, e);</span><br><span class="line">   &#125; <span class="keyword">catch</span> (InterruptedException e) &#123;</span><br><span class="line">     Thread.currentThread().interrupt();</span><br><span class="line">     
LOG.warn(<span class="string">"interrupted waiting to send rpc request to server"</span>, e);</span><br><span class="line">     <span class="keyword">throw</span> <span class="keyword">new</span> IOException(e);</span><br><span class="line">   &#125;</span><br><span class="line">   </span><br><span class="line">   <span class="keyword">synchronized</span> (call) &#123;</span><br><span class="line">     <span class="keyword">while</span> (!call.done) &#123;</span><br><span class="line">       <span class="keyword">try</span> &#123;</span><br><span class="line">         call.wait();                           <span class="comment">// wait for the result</span></span><br><span class="line">       &#125; <span class="keyword">catch</span> (InterruptedException ie) &#123;</span><br><span class="line">         Thread.currentThread().interrupt();</span><br><span class="line">         <span class="keyword">throw</span> <span class="keyword">new</span> InterruptedIOException(<span class="string">"Call interrupted"</span>);</span><br><span class="line">       &#125;</span><br><span class="line">     &#125;</span><br><span class="line">   </span><br><span class="line">     <span class="keyword">if</span> (call.error != <span class="keyword">null</span>) &#123;</span><br><span class="line">       <span class="keyword">if</span> (call.error <span class="keyword">instanceof</span> RemoteException) &#123;</span><br><span class="line">         call.error.fillInStackTrace();</span><br><span class="line">         <span class="keyword">throw</span> call.error;</span><br><span class="line">       &#125; <span class="keyword">else</span> &#123; <span class="comment">// local exception</span></span><br><span class="line">         InetSocketAddress address = connection.getRemoteAddress();</span><br><span class="line">         <span class="keyword">throw</span> NetUtils.wrapException(address.getHostName(),</span><br><span class="line">                 address.getPort(),</span><br><span 
class="line">                 NetUtils.getHostname(),</span><br><span class="line">                 <span class="number">0</span>,</span><br><span class="line">                 call.error);</span><br><span class="line">       &#125;</span><br><span class="line">     &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">       <span class="keyword">return</span> call.getRpcResponse();</span><br><span class="line">     &#125;</span><br><span class="line">   &#125;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p><strong>getConnection() method:</strong> </p><p>It first looks up the ConnectionId in the client's connections table. If no Connection exists for it, a new one is created and added to connections. It then calls the Connection's setupIOstreams method, which wraps the input and output streams and calls start().</p><figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">private</span> Connection <span class="title">getConnection</span><span class="params">(ConnectionId remoteId,</span></span></span><br><span class="line"><span 
class="function"><span class="params">     Call call, <span class="keyword">int</span> serviceClass, AtomicBoolean fallbackToSimpleAuth)</span></span></span><br><span class="line"><span class="function">     <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">   <span class="keyword">if</span> (!running.get()) &#123;</span><br><span class="line">     <span class="comment">// the client is stopped</span></span><br><span class="line">     <span class="keyword">throw</span> <span class="keyword">new</span> IOException(<span class="string">"The client is stopped"</span>);</span><br><span class="line">   &#125;</span><br><span class="line">   Connection connection;</span><br><span class="line">   <span class="comment">/* we could avoid this allocation for each RPC by having a  </span></span><br><span class="line"><span class="comment">    * connectionsId object and with set() method. We need to manage the</span></span><br><span class="line"><span class="comment">    * refs for keys in HashMap properly. 
For now its ok.</span></span><br><span class="line"><span class="comment">    */</span></span><br><span class="line">   <span class="keyword">do</span> &#123;</span><br><span class="line">     <span class="keyword">synchronized</span> (connections) &#123;</span><br><span class="line">       connection = connections.get(remoteId);</span><br><span class="line">       <span class="keyword">if</span> (connection == <span class="keyword">null</span>) &#123;</span><br><span class="line">         connection = <span class="keyword">new</span> Connection(remoteId, serviceClass);</span><br><span class="line">         connections.put(remoteId, connection);</span><br><span class="line">       &#125;</span><br><span class="line">     &#125;</span><br><span class="line">   &#125; <span class="keyword">while</span> (!connection.addCall(call));</span><br><span class="line">   </span><br><span class="line">   <span class="comment">//we don't invoke the method below inside "synchronized (connections)"</span></span><br><span class="line">   <span class="comment">//block above. 
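The reason for that is if the server happens to be slow,</span></span><br><span class="line">   <span class="comment">//it will take longer to establish a connection and that will slow the</span></span><br><span class="line">   <span class="comment">//entire system down.</span></span><br><span class="line">   connection.setupIOstreams(fallbackToSimpleAuth);</span><br><span class="line">   <span class="keyword">return</span> connection;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure></li></ol><p>In summary, the client-side processing sequence is shown in the figure below:</p><p><a href="https://imgchr.com/i/EGAcse" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/04/30/EGAcse.png" alt="EGAcse.png"></a></p><h3 id="Server-实现"><a href="#Server-实现" class="headerlink" title="Server 实现"></a>Server implementation</h3><p>The structure of Server is shown in the figure below. The server side contains more inner classes: some mirror those on the client side, such as <strong>Call</strong>, while others are unique to Server, namely Reader (an inner class of Listener), Handler, Listener, and Responder. Their roles are as follows:</p><ul><li>Listener: the request listener; it listens for requests sent by clients. </li><li>Connection: the connection class; the actual logic for reading client requests lives in this class. </li><li>Reader: (an inner class of Listener) once the listener detects a client request, it lets a Reader read the request data. </li><li>Call: encapsulates a request sent by a client. </li><li>Handler: the request handler; it loops, blocking on callQueue to take Call objects, and processes them.</li><li>Responder: the RPC response class; once a request has been processed, Responder sends the result to the requesting client. </li></ul><p><a href="https://imgchr.com/i/EGEVF1" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/04/30/EGEVF1.png" alt="EGEVF1.png"></a></p><ol><li><p><strong>Request receiving phase</strong></p><p>The main task of this phase is to receive RPC requests from clients, wrap them in a fixed format (Call objects), and put them into a shared blocking queue, callQueue, for later processing. The phase has two sub-phases, request acceptance and request reading, carried out by two kinds of threads: Listener and Reader. The whole Server has only one Listener thread, which listens for connection requests from clients; whenever a new request arrives, it picks a Reader thread from the thread pool in round-robin fashion to handle it. Listener's run() method blocks waiting for clients to request a connection. The fields of the Listener class are shown first, followed by the core code of its run() method. 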
</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">private</span> <span class="class"><span class="keyword">class</span> <span class="title">Listener</span> <span class="keyword">extends</span> <span class="title">Thread</span> </span>&#123;</span><br><span class="line">   </span><br><span class="line">   <span class="keyword">private</span> ServerSocketChannel acceptChannel = <span class="keyword">null</span>; <span class="comment">//the accept channel</span></span><br><span class="line">   <span class="keyword">private</span> Selector selector = <span class="keyword">null</span>; <span class="comment">//the selector that we use for the server</span></span><br><span class="line">   <span class="keyword">private</span> Reader[] readers = <span class="keyword">null</span>;</span><br><span class="line">   <span class="keyword">private</span> <span class="keyword">int</span> currentReader = <span class="number">0</span>;</span><br><span class="line">   <span class="keyword">private</span> InetSocketAddress address; <span class="comment">//the address we bind at</span></span><br><span class="line">   ...</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Listener's run() method, which waits on the Selector:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>&#123;</span><br><span class="line">     LOG.info(Thread.currentThread().getName() + <span class="string">": starting"</span>);</span><br><span class="line">     SERVER.set(Server.<span class="keyword">this</span>);</span><br><span class="line">     connectionManager.startIdleScan();</span><br><span class="line">     <span class="keyword">while</span> (running) &#123;</span><br><span class="line">       SelectionKey key = <span class="keyword">null</span>;</span><br><span class="line">       <span class="keyword">try</span> &#123;</span><br><span class="line">         getSelector().select();<span class="comment">// blocks here until the ServerSocketChannel registered with the Selector has a new socket connection request  </span></span><br><span class="line">         Iterator&lt;SelectionKey&gt; iter = getSelector().selectedKeys().iterator();</span><br><span class="line">         <span class="keyword">while</span> (iter.hasNext()) &#123;</span><br><span class="line">           key = iter.next();</span><br><span class="line">     
      iter.remove();</span><br><span class="line">           <span class="keyword">try</span> &#123;</span><br><span class="line">             <span class="keyword">if</span> (key.isValid()) &#123;</span><br><span class="line">               <span class="keyword">if</span> (key.isAcceptable())</span><br><span class="line">                 doAccept(key);<span class="comment">// handle the accept (new connection) event  </span></span><br><span class="line">             &#125;</span><br><span class="line">           &#125; <span class="keyword">catch</span> (IOException e) &#123;</span><br><span class="line">           &#125;</span><br><span class="line">           key = <span class="keyword">null</span>;</span><br><span class="line">         &#125;</span><br><span class="line">       &#125; <span class="keyword">catch</span> (OutOfMemoryError e) &#123;</span><br><span class="line">         <span class="comment">// we can run out of memory if we have too many threads</span></span><br><span class="line">         <span class="comment">// log the event and sleep for a minute and give </span></span><br><span class="line">         <span class="comment">// some thread(s) a chance to finish</span></span><br><span class="line">         LOG.warn(<span class="string">"Out of Memory in server select"</span>, e);</span><br><span class="line">         closeCurrentConnection(key, e);</span><br><span class="line">         connectionManager.closeIdle(<span class="keyword">true</span>);</span><br><span class="line">         <span class="keyword">try</span> &#123; Thread.sleep(<span class="number">60000</span>); &#125; <span class="keyword">catch</span> (Exception ie) &#123;&#125;</span><br><span class="line">       &#125; <span class="keyword">catch</span> (Exception e) &#123;</span><br><span class="line">         closeCurrentConnection(key, e);</span><br><span class="line">       &#125;</span><br><span class="line">     &#125;</span><br><span class="line">    ...</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p>紧接着具体的请求接收处理是在Listener的doAccept()方法中处理的，获取连接后会往Reader线程中的多路复用器Selector注册连接，Listener的doAccept方法的核心代码如下：</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">doAccept</span><span class="params">(SelectionKey key)</span> <span class="keyword">throws</span> InterruptedException, IOException,  OutOfMemoryError </span>&#123;</span><br><span class="line">      ServerSocketChannel server = (ServerSocketChannel) key.channel();<span class="comment">// 拿到ServerSocketchannel  </span></span><br><span class="line">      SocketChannel channel;<span class="comment">// 拿到Socketchannel  </span></span><br><span class="line">      <span class="keyword">while</span> ((channel = server.accept()) != <span class="keyword">null</span>) &#123; <span class="comment">// 非阻塞的拿到SocketChannel  </span></span><br><span class="line"></span><br><span class="line">        channel.configureBlocking(<span class="keyword">false</span>);<span class="comment">// 把SocketChannel设置为非阻塞模式  </span></span><br><span class="line">        
channel.socket().setTcpNoDelay(tcpNoDelay);</span><br><span class="line">        channel.socket().setKeepAlive(<span class="keyword">true</span>);</span><br><span class="line">        </span><br><span class="line">        Reader reader = getReader();<span class="comment">// pick a Reader thread in round-robin order  </span></span><br><span class="line">        Connection c = connectionManager.register(channel);</span><br><span class="line">        <span class="comment">// If the connectionManager can't take it, close the connection.</span></span><br><span class="line">        <span class="keyword">if</span> (c == <span class="keyword">null</span>) &#123;</span><br><span class="line">          <span class="keyword">if</span> (channel.isOpen()) &#123;</span><br><span class="line">            IOUtils.cleanup(<span class="keyword">null</span>, channel);</span><br><span class="line">          &#125;</span><br><span class="line">          connectionManager.droppedConnections.getAndIncrement();</span><br><span class="line">          <span class="keyword">continue</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        key.attach(c);  <span class="comment">// so closeCurrentConnection can get the object</span></span><br><span class="line">        reader.addConnection(c);</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><p>Once the connection between client and server is established, the server's Reader thread holds it, and data can be transferred over it. The Reader thread's run() method blocks waiting for request data from clients. As soon as a connection has readable data, the Reader thread is woken up; it then deserializes the request from the byte stream, wraps it in a Call object, and puts it into the callQueue blocking queue. The core code of Reader's run() method is as follows:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>&#123;</span><br><span class="line">       LOG.info(<span class="string">"Starting "</span> + Thread.currentThread().getName());</span><br><span class="line">       <span class="keyword">try</span> &#123;</span><br><span class="line">         doRunLoop();</span><br><span class="line">       &#125; <span class="keyword">finally</span> &#123;</span><br><span class="line">         <span class="keyword">try</span> &#123;</span><br><span class="line">           readSelector.close();</span><br><span class="line">         &#125; <span class="keyword">catch</span> (IOException ioe) &#123;</span><br><span class="line">           LOG.error(<span class="string">"Error closing read selector in "</span> + Thread.currentThread().getName(), ioe);</span><br><span class="line">         &#125;</span><br><span class="line">       &#125;</span><br><span class="line">     
&#125;</span><br><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">doRunLoop</span><span class="params">()</span> </span>&#123;</span><br><span class="line">       <span class="keyword">while</span> (running) &#123;</span><br><span class="line">         SelectionKey key = <span class="keyword">null</span>;</span><br><span class="line">         <span class="keyword">try</span> &#123;</span><br><span class="line">           <span class="comment">// consume as many connections as currently queued to avoid</span></span><br><span class="line">           <span class="comment">// unbridled acceptance of connections that starves the select</span></span><br><span class="line">           <span class="keyword">int</span> size = pendingConnections.size();</span><br><span class="line">           <span class="keyword">for</span> (<span class="keyword">int</span> i=size; i&gt;<span class="number">0</span>; i--) &#123;</span><br><span class="line">             Connection conn = pendingConnections.take();</span><br><span class="line">             conn.channel.register(readSelector, SelectionKey.OP_READ, conn);</span><br><span class="line">           &#125;</span><br><span class="line">           readSelector.select();<span class="comment">// blocks here until one of the SocketChannels registered with the Selector has readable data  </span></span><br><span class="line">   </span><br><span class="line">           Iterator&lt;SelectionKey&gt; iter = readSelector.selectedKeys().iterator();</span><br><span class="line">           <span class="keyword">while</span> (iter.hasNext()) &#123;</span><br><span class="line">             key = iter.next();</span><br><span class="line">             iter.remove();</span><br><span class="line">             <span class="keyword">try</span> &#123;</span><br><span class="line">               <span class="keyword">if</span> (key.isReadable()) &#123; <span class="comment">// 
the SocketChannel has readable data  </span></span><br><span class="line">                 doRead(key);</span><br><span class="line">               &#125;</span><br><span class="line">                 ...</span><br><span class="line">             &#125;</span><br><span class="line">           &#125;</span><br><span class="line">         &#125;</span><br><span class="line">       &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Reader's run() calls doRunLoop(), which registers the pending connections with readSelector and calls doRead() to read the data (if any) from a SocketChannel. doRead() delegates the actual reading and parsing of request data to Connection. The core code is as follows:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">doRead</span><span class="params">(SelectionKey key)</span> <span class="keyword">throws</span> InterruptedException </span>&#123;</span><br><span class="line">      <span class="keyword">int</span> count = <span class="number">0</span>;</span><br><span class="line">      Connection c = (Connection)key.attachment();</span><br><span class="line">      <span class="keyword">if</span> (c == <span class="keyword">null</span>) &#123;</span><br><span class="line">        <span class="keyword">return</span>;  </span><br><span class="line">      &#125;</span><br><span class="line">      c.setLastContact(Time.now());</span><br><span class="line">      
</span><br><span class="line">      <span class="keyword">try</span> &#123;</span><br><span class="line">        count = c.readAndProcess();</span><br><span class="line">      &#125; <span class="keyword">catch</span> (InterruptedException ieo) &#123;</span><br><span class="line">        LOG.info(Thread.currentThread().getName() + <span class="string">": readAndProcess caught InterruptedException"</span>, ieo);</span><br><span class="line">        <span class="keyword">throw</span> ieo;</span><br><span class="line">      &#125; <span class="keyword">catch</span> (Exception e) &#123;...</span><br><span class="line">                            &#125;</span><br><span class="line"> &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>doRead() calls Connection's readAndProcess() method, which mainly reads the request data from the connection. Its core code is as follows:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">readAndProcess</span><span class="params">()</span></span></span><br><span class="line"><span 
class="function">       <span class="keyword">throws</span> WrappedRpcServerException, IOException, InterruptedException </span>&#123;</span><br><span class="line">     <span class="keyword">while</span> (<span class="keyword">true</span>) &#123;</span><br><span class="line">         ...;</span><br><span class="line">        <span class="keyword">if</span> (data == <span class="keyword">null</span>) &#123;</span><br><span class="line">         dataLengthBuffer.flip();</span><br><span class="line">         dataLength = dataLengthBuffer.getInt();</span><br><span class="line">         checkDataLength(dataLength);</span><br><span class="line">         data = ByteBuffer.allocate(dataLength);<span class="comment">// allocate a buffer of dataLength bytes to read the data into  </span></span><br><span class="line">       &#125;</span><br><span class="line">       </span><br><span class="line">       count = channelRead(channel, data);<span class="comment">// read the header of the first request, or the request data  </span></span><br><span class="line">       </span><br><span class="line">       <span class="keyword">if</span> (data.remaining() == <span class="number">0</span>) &#123;</span><br><span class="line">         dataLengthBuffer.clear();<span class="comment">// reset dataLengthBuffer  </span></span><br><span class="line">         data.flip();</span><br><span class="line">         <span class="keyword">boolean</span> isHeaderRead = connectionContextRead;</span><br><span class="line">         processOneRpc(data.array());<span class="comment">// process the RPC request: wrap it in a Call and put it into the callQueue blocking queue  </span></span><br><span class="line">         data = <span class="keyword">null</span>;</span><br><span class="line">         <span class="keyword">if</span> (!isHeaderRead) &#123; <span class="comment">// after reading the header of the first RPC request, continue to read the request data</span></span><br><span class="line">           <span class="keyword">continue</span>;</span><br><span class="line">         &#125;</span><br><span class="line">       &#125; </span><br><span class="line">   
    <span class="keyword">return</span> count;</span><br><span class="line">     &#125;</span><br></pre></td></tr></table></figure><p>readAndProcess() calls processOneRpc() to handle the RPC request; processOneRpc() in turn calls processRpcRequest(), which parses the request, wraps it in a server-side Call object, and adds it to callQueue.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">processOneRpc</span><span class="params">(<span class="keyword">byte</span>[] buf)</span></span></span><br><span class="line"><span class="function">    <span class="keyword">throws</span> IOException, WrappedRpcServerException, InterruptedException </span>&#123;</span><br><span class="line">    <span class="keyword">int</span> callId = -<span class="number">1</span>;</span><br><span class="line">    <span class="comment">// read buf through an input stream</span></span><br><span 
class="line">    <span class="keyword">final</span> DataInputStream dis =</span><br><span class="line">            <span class="keyword">new</span> DataInputStream(<span class="keyword">new</span> ByteArrayInputStream(buf));</span><br><span class="line">    <span class="comment">// decode the header from the stream</span></span><br><span class="line">    <span class="keyword">final</span> RpcRequestHeaderProto header =</span><br><span class="line">            decodeProtobufFromStream(RpcRequestHeaderProto.newBuilder(), dis);</span><br><span class="line">    callId = header.getCallId();</span><br><span class="line">    callId = header.getCallId();</span><br><span class="line">    retry = header.getRetryCount();</span><br><span class="line">    .......;</span><br><span class="line">    <span class="keyword">if</span> (callId &lt; <span class="number">0</span>) &#123; <span class="comment">// callIds typically used during connection setup</span></span><br><span class="line">          processRpcOutOfBandRequest(header, dis);</span><br><span class="line">        &#125; <span class="keyword">else</span> <span class="keyword">if</span> (!connectionContextRead) &#123;</span><br><span class="line">          <span class="keyword">throw</span> <span class="keyword">new</span> WrappedRpcServerException(</span><br><span class="line">              RpcErrorCodeProto.FATAL_INVALID_RPC_HEADER,</span><br><span class="line">              <span class="string">"Connection context not established"</span>);</span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="comment">// valid callId: call processRpcRequest</span></span><br><span class="line">          processRpcRequest(header, dis);</span><br><span class="line">        &#125;</span><br><span class="line">      &#125; <span class="keyword">catch</span> (WrappedRpcServerException wrse) &#123; <span class="comment">// inform client of error</span></span><br><span class="line">        
Throwable ioe = wrse.getCause();</span><br><span class="line">    <span class="comment">// build an error Call and use setupResponse to report the error to the client</span></span><br><span class="line">        <span class="keyword">final</span> Call call = <span class="keyword">new</span> Call(callId, retry, <span class="keyword">null</span>, <span class="keyword">this</span>);</span><br><span class="line">        setupResponse(authFailedResponse, call,</span><br><span class="line">            RpcStatusProto.FATAL, wrse.getRpcErrorCodeProto(), <span class="keyword">null</span>,</span><br><span class="line">            ioe.getClass().getName(), ioe.getMessage());</span><br><span class="line">        call.sendResponse();</span><br><span class="line">        <span class="keyword">throw</span> wrse;</span><br><span class="line">      &#125;</span><br></pre></td></tr></table></figure></li></ol>   <figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span 
class="title">processRpcRequest</span><span class="params">(RpcRequestHeaderProto header,</span></span></span><br><span class="line"><span class="function"><span class="params">       DataInputStream dis)</span> <span class="keyword">throws</span> WrappedRpcServerException,</span></span><br><span class="line"><span class="function">       InterruptedException </span>&#123;</span><br><span class="line">     Writable rpcRequest;</span><br><span class="line">     <span class="keyword">try</span> &#123; <span class="comment">//Read the rpc request</span></span><br><span class="line">       rpcRequest = ReflectionUtils.newInstance(rpcRequestClass, conf);</span><br><span class="line">       rpcRequest.readFields(dis);</span><br><span class="line">     &#125;<span class="keyword">catch</span>()&#123;&#125;</span><br><span class="line">          ......;</span><br><span class="line">  <span class="comment">// construct a new Call</span></span><br><span class="line">     Call call = <span class="keyword">new</span> Call(header.getCallId(), header.getRetryCount(),</span><br><span class="line">         rpcRequest, <span class="keyword">this</span>, ProtoUtil.convert(header.getRpcKind()),</span><br><span class="line">         header.getClientId().toByteArray(), traceSpan);</span><br><span class="line">     <span class="comment">// add the call to the queue</span></span><br><span class="line">      <span class="keyword">if</span> (callQueue.isClientBackoffEnabled()) &#123;</span><br><span class="line">       <span class="comment">// if RPC queue is full, we will ask the RPC client to back off by</span></span><br><span class="line">       <span class="comment">// throwing RetriableException. 
Whether RPC client will honor</span></span><br><span class="line">       <span class="comment">// RetriableException and retry depends on client ipc retry policy.</span></span><br><span class="line">       <span class="comment">// For example, FailoverOnNetworkExceptionRetry handles</span></span><br><span class="line">       <span class="comment">// RetriableException.</span></span><br><span class="line">       queueRequestOrAskClientToBackOff(call);</span><br><span class="line">     &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">       callQueue.put(call);              <span class="comment">// queue the call; maybe blocked here</span></span><br><span class="line">     &#125;</span><br><span class="line">     incRpcCount();  <span class="comment">// Increment the rpc count</span></span><br><span class="line">     </span><br><span class="line">           </span><br><span class="line">       &#125;</span><br></pre></td></tr></table></figure><p>   <strong>This completes the request-receiving phase.</strong></p><hr><ol start="2"><li><p><strong>Request processing</strong></p><p>The main task of this phase is to take Call objects from the shared queue callQueue, execute the corresponding function calls, and return the results to the clients; all of this is done by Handler threads. A Server can run multiple Handler threads at the same time. They take Call objects from the shared queue in parallel and, after executing the corresponding function call, try to send the result directly back to the client. However, when a function call returns a large result or the network is slow, it may not be possible to send the whole result in one go; in that case the Handler tries to hand the remaining send work over to the Responder thread. Handler's run() method blocks until callQueue contains a request. The core code of Handler's run() is as follows:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>&#123; </span><br><span class="line">    LOG.debug(Thread.currentThread().getName() + <span class="string">": starting"</span>);</span><br><span class="line">      SERVER.set(Server.<span class="keyword">this</span>);</span><br><span class="line">      ByteArrayOutputStream buf = </span><br><span class="line">        <span class="keyword">new</span> ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);</span><br><span class="line">    <span class="keyword">while</span> (running) &#123;</span><br><span class="line"> 
           TraceScope traceScope = <span class="keyword">null</span>;</span><br><span class="line">            <span class="keyword">try</span> &#123;</span><br><span class="line">              <span class="keyword">final</span> Call call = callQueue.take(); <span class="comment">// pop the queue; maybe blocked here</span></span><br><span class="line">              String errorClass = <span class="keyword">null</span>;</span><br><span class="line">              String error = <span class="keyword">null</span>;</span><br><span class="line">              RpcStatusProto returnStatus = RpcStatusProto.SUCCESS;</span><br><span class="line">              RpcErrorCodeProto detailedErr = <span class="keyword">null</span>;</span><br><span class="line">              Writable value = <span class="keyword">null</span>;</span><br><span class="line"></span><br><span class="line">              CurCall.set(call);</span><br><span class="line">              ...;</span><br><span class="line"></span><br><span class="line">              <span class="keyword">try</span> &#123;</span><br><span class="line">                <span class="comment">// Make the call as the user via Subject.doAs, thus associating</span></span><br><span class="line">                <span class="comment">// the call with the Subject</span></span><br><span class="line">                <span class="keyword">if</span> (call.connection.user == <span class="keyword">null</span>) &#123;</span><br><span class="line">                  value = call(call.rpcKind, call.connection.protocolName, call.rpcRequest, call.timestamp);</span><br><span class="line">                &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">                  value = </span><br><span class="line">                    call.connection.user.doAs</span><br><span class="line">                      (<span class="keyword">new</span> PrivilegedExceptionAction&lt;Writable&gt;() &#123;</span><br><span class="line">                 
        <span class="meta">@Override</span></span><br><span class="line">                         <span class="function"><span class="keyword">public</span> Writable <span class="title">run</span><span class="params">()</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">                           <span class="comment">// make the call</span></span><br><span class="line">                          <span class="comment">// Invoke the target service via reflection and return the result as an ObjectWritable; the interface's Class object held by the Connection arrived with the client's first request after the connection was established</span></span><br><span class="line">                           <span class="keyword">return</span> call(call.rpcKind, call.connection.protocolName, </span><br><span class="line">                                       call.rpcRequest, call.timestamp);</span><br><span class="line"></span><br><span class="line">                         &#125;</span><br><span class="line">                       &#125;</span><br><span class="line">                      );</span><br><span class="line">                &#125;<span class="keyword">catch</span>(Exception e)&#123;...&#125;</span><br><span class="line">                ...;</span><br><span class="line">                 CurCall.set(<span class="keyword">null</span>);</span><br><span class="line">              <span class="keyword">synchronized</span> (call.connection.responseQueue) &#123;</span><br><span class="line">                  <span class="comment">// Multiple responses on the same connection must be sent under synchronization</span></span><br><span class="line">                setupResponse(buf, call, returnStatus, detailedErr,</span><br><span class="line">                    value, errorClass, error);<span class="comment">// Build the response packet for the client (client call ID + status + RPC return value) and store it in the Call object</span></span><br><span class="line"></span><br><span class="line">                <span class="comment">// Discard the large buf and reset it back to smaller size</span></span><br><span class="line">                <span class="comment">// to free up 
heap.</span></span><br><span class="line">                <span class="keyword">if</span> (buf.size() &gt; maxRespSize) &#123;</span><br><span class="line">                  LOG.warn(<span class="string">"Large response size "</span> + buf.size() + <span class="string">" for call "</span></span><br><span class="line">                      + call.toString());</span><br><span class="line">                  buf = <span class="keyword">new</span> ByteArrayOutputStream(INITIAL_RESP_BUF_SIZE);</span><br><span class="line">                &#125;</span><br><span class="line">                call.sendResponse();</span><br><span class="line">              &#125;<span class="keyword">catch</span>(Exception e)&#123;...&#125;</span><br><span class="line"></span><br><span class="line">            &#125;</span><br><span class="line">     &#125;</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><p>Once the server has the call parameters, it invokes the target service via reflection and returns the method's return value.</p></li></ol><ol start="3"><li><p><strong>Sending the response</strong></p><p>After a Handler thread finishes the call, it first tries to send the result back to the client itself. In special cases, such as an oversized result or a network problem, the send is handed over to the Responder thread. The server has only one Responder thread, which holds a Selector (a multiplexer) that listens for SelectionKey.OP_WRITE events. When a Handler cannot send a result to the client in one go, it registers SelectionKey.OP_WRITE with that Selector, and the Responder thread then sends the remainder asynchronously. The core code is as follows:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span 
class="params">()</span> </span>&#123;</span><br><span class="line">      LOG.info(Thread.currentThread().getName() + <span class="string">": starting"</span>);</span><br><span class="line">      SERVER.set(Server.<span class="keyword">this</span>);</span><br><span class="line">      <span class="keyword">try</span> &#123;</span><br><span class="line">        doRunLoop();</span><br><span class="line">      &#125; <span class="keyword">finally</span> &#123;</span><br><span class="line">        LOG.info(<span class="string">"Stopping "</span> + Thread.currentThread().getName());</span><br><span class="line">        <span class="keyword">try</span> &#123;</span><br><span class="line">          writeSelector.close();</span><br><span class="line">        &#125; <span class="keyword">catch</span> (IOException ioe) &#123;</span><br><span class="line">          LOG.error(<span class="string">"Couldn't close write selector in "</span> + Thread.currentThread().getName(), ioe);</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><p>Now look at what doRunLoop does: it picks up from the Selector the results that Handlers could not finish sending and calls doAsyncWrite to send them asynchronously.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">doRunLoop</span><span class="params">()</span> </span>&#123;</span><br><span class="line">  <span class="keyword">long</span> lastPurgeTime = <span class="number">0</span>;   <span class="comment">// last check for old calls.</span></span><br><span class="line"></span><br><span class="line">  <span class="keyword">while</span> (running) &#123;</span><br><span class="line">    <span class="keyword">try</span> &#123;</span><br><span class="line">      waitPending();     <span class="comment">// If a channel is being registered, wait.</span></span><br><span class="line">      writeSelector.select(PURGE_INTERVAL);</span><br><span class="line">      Iterator&lt;SelectionKey&gt; iter = writeSelector.selectedKeys().iterator();</span><br><span class="line">      <span class="keyword">while</span> (iter.hasNext()) &#123;</span><br><span class="line">        SelectionKey key = iter.next();</span><br><span class="line">        iter.remove();</span><br><span class="line">        <span class="keyword">try</span> &#123;</span><br><span class="line">          <span class="keyword">if</span> (key.isWritable()) &#123;</span><br><span class="line">            doAsyncWrite(key);</span><br><span class="line">          &#125;</span><br><span class="line">        &#125; <span class="keyword">catch</span> (CancelledKeyException cke) &#123;...&#125;</span><br><span class="line">        ...;</span><br><span class="line"></span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Next, look inside doAsyncWrite():</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">doAsyncWrite</span><span class="params">(SelectionKey key)</span> <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">  Call call = (Call)key.attachment();</span><br><span class="line">  <span class="keyword">if</span> (call == <span class="keyword">null</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span>;</span><br><span class="line">  &#125;</span><br><span class="line">  <span class="keyword">if</span> (key.channel() != call.connection.channel) &#123;</span><br><span class="line">    <span class="keyword">throw</span> <span class="keyword">new</span> IOException(<span class="string">"doAsyncWrite: bad channel"</span>);</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="keyword">synchronized</span>(call.connection.responseQueue) &#123;<span class="comment">// multiple responses on the same connection must be sent under synchronization</span></span><br><span class="line">    <span class="keyword">if</span> (processResponse(call.connection.responseQueue, <span class="keyword">false</span>)) &#123;</span><br><span class="line">      <span 
class="keyword">try</span> &#123;</span><br><span class="line">        key.interestOps(<span class="number">0</span>);</span><br><span class="line">      &#125; <span class="keyword">catch</span> (CancelledKeyException e) &#123;</span><br><span class="line">        <span class="comment">/* The Listener/reader might have closed the socket.</span></span><br><span class="line"><span class="comment">         * We don't explicitly cancel the key, so not sure if this will</span></span><br><span class="line"><span class="comment">         * ever fire.</span></span><br><span class="line"><span class="comment">         * This warning could be removed.</span></span><br><span class="line"><span class="comment">         */</span></span><br><span class="line">        LOG.warn(<span class="string">"Exception while changing ops : "</span> + e);</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The server-side state transition diagram is shown below:</p><p><a href="https://imgchr.com/i/EWF8gI" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/05/10/EWF8gI.md.png" alt="EWF8gI.md.png"></a></p></li></ol>]]></content>
    
    <summary type="html">
    
      
      
        &lt;h1 id=&quot;Hadoop-RPC模块源码分析&quot;&gt;&lt;a href=&quot;#Hadoop-RPC模块源码分析&quot; class=&quot;headerlink&quot; title=&quot;Hadoop RPC模块源码分析&quot;&gt;&lt;/a&gt;Hadoop RPC Module Source Code Analysis&lt;/h1&gt;&lt;h2 id=&quot;RPC概述&quot;&gt;&lt;a
      
    
    </summary>
    
      <category term="RPC" scheme="https://spaces-x.github.io/categories/RPC/"/>
    
      <category term="Hadoop" scheme="https://spaces-x.github.io/categories/RPC/Hadoop/"/>
    
    
      <category term="Hadoop" scheme="https://spaces-x.github.io/tags/Hadoop/"/>
    
      <category term="RPC" scheme="https://spaces-x.github.io/tags/RPC/"/>
    
  </entry>
  
  <entry>
    <title>Memory Paging</title>
    <link href="https://spaces-x.github.io/2019/04/04/memorypaging/"/>
    <id>https://spaces-x.github.io/2019/04/04/memorypaging/</id>
    <published>2019-04-04T05:03:25.000Z</published>
    <updated>2019-04-04T05:42:56.985Z</updated>
    
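    <content type="html"><![CDATA[<h1 id="一、分页内存管理"><a href="#一、分页内存管理" class="headerlink" title="一、分页内存管理"></a>1. Paged Memory Management</h1><h2 id="1-1-解决问题之道"><a href="#1-1-解决问题之道" class="headerlink" title="1.1 解决问题之道"></a>1.1 How Paging Solves the Problem</h2><p>Paging was introduced to overcome the shortcomings of swapping. Its core idea is: <strong>divide both the virtual address space and physical memory into pages of the same size, such as 4 KB, 8 KB, or 16 KB, and use the page as the smallest unit of memory allocation; any page of a program can be placed in any physical page</strong>.</p><p>(1) Solving wasted space and fragmentation</p><p>Because virtual and physical memory are both divided into units of a fixed size, called pages, and memory is allocated page by page, external fragmentation is eliminated.</p><p>(2) Removing the limit on program size</p><p>A program's growth is limited when the whole program must be loaded into memory before it can run, so the fix is to let a program run without being fully loaded. Paging makes this possible: only the pages currently needed stay in memory, while the rest are kept on disk. The program then occupies both memory and disk, which greatly increases its room to grow. Moreover, when a program needs more space, it is simply given a new page; there is no need to swap the whole program out and back in, so growing is cheap.</p><h2 id="1-2-虚拟地址的构成与地址翻译"><a href="#1-2-虚拟地址的构成与地址翻译" class="headerlink" title="1.2 虚拟地址的构成与地址翻译"></a>1.2 Structure of a Virtual Address and Address Translation</h2><p>(1) Structure of a virtual address</p><p>In a paging system, <strong>a virtual address issued by a program consists of two parts: a page number and an offset within the page</strong>, as shown below:</p><p><img src="https://s2.ax1x.com/2019/04/04/AgXsmQ.jpg" alt="AgXsmQ.jpg"></p><p>For example, on a system with 32-bit addresses and 4 KB pages, the page number takes 20 bits and the offset takes 12 bits.</p><p>(2) Address translation: virtual address → physical address</p><p><strong>The core of a paging system is page translation, that is, the mapping from virtual pages to physical pages</strong>. The translation follows this pseudocode:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span>(virtual page is invalid, not in memory, or protected)</span><br><span class="line">&#123;</span><br><span class="line">    trap into the operating system's fault handler</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">else</span></span><br><span class="line">&#123;</span><br><span class="line">    convert the virtual page number to a physical page number</span><br><span class="line">    form the final physical address from the physical page number</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>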
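</p><p>The translation step above can be sketched in Java. This is a minimal sketch under stated assumptions, not how any particular MMU works: it assumes 32-bit virtual addresses, 4 KB pages, and a flat page table stored as an <code>int</code> array in which <code>-1</code> marks a page that is not in memory. The names <code>PagingSketch</code>, <code>translate</code>, <code>NOT_PRESENT</code>, and <code>PageFaultException</code> are hypothetical and exist only for illustration.</p>

```java
// Hypothetical sketch: the class, method, and constant names are illustrative assumptions.
final class PagingSketch {
    static final int OFFSET_BITS = 12;                      // assumed 4 KB pages: 2^12 bytes
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;  // low 12 bits select the offset
    static final int NOT_PRESENT = -1;                      // assumed marker for "not in memory"

    /** Stands in for the trap into the operating system's fault handler. */
    static final class PageFaultException extends RuntimeException {
        PageFaultException(String message) {
            super(message);
        }
    }

    /**
     * Translates a virtual address using a flat page table.
     * pageTable[vpn] holds the physical frame number of virtual page vpn,
     * or NOT_PRESENT if that page is not in memory.
     */
    static int translate(int[] pageTable, int virtualAddress) {
        int vpn = virtualAddress >>> OFFSET_BITS;    // upper 20 bits: virtual page number
        int offset = virtualAddress & OFFSET_MASK;   // lower 12 bits: offset within the page
        if (vpn >= pageTable.length || pageTable[vpn] == NOT_PRESENT) {
            throw new PageFaultException("page fault on virtual page " + vpn);
        }
        // Physical address = frame number followed by the unchanged offset.
        return (pageTable[vpn] << OFFSET_BITS) | offset;
    }

    public static void main(String[] args) {
        int[] pageTable = {5, NOT_PRESENT, 9};       // page 0 -> frame 5, page 2 -> frame 9
        System.out.println("0x" + Integer.toHexString(translate(pageTable, 0x2ABC))); // prints 0x9abc
    }
}
```

<p>With the sample table in <code>main</code>, virtual address 0x2ABC lies in virtual page 2 at offset 0xABC, and page 2 maps to frame 9, so the result is 0x9ABC. An address in page 1 takes the page fault path instead.</p><p>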
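This translation is performed by the memory management unit (MMU). The MMU receives the virtual address issued by the CPU, translates it into a physical address, and sends that to memory. Memory then reads or writes the data at that physical address, as shown below:</p><p><a href="https://imgchr.com/i/A6Xx1S" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/04/02/A6Xx1S.jpg" alt="A6Xx1S.jpg"></a></p><p>So how is the translation implemented? The answer is <strong>a page table lookup</strong>. A page table is maintained for each program and consulted by the MMU; it stores the mapping from virtual pages to physical pages. Whenever a physical page is found for a virtual page, a record is added to the page table to preserve that mapping. As virtual pages move in and out of physical memory, the contents of the page table keep changing.</p><p><img src="https://s2.ax1x.com/2019/04/04/AgX46U.jpg" alt="AgX46U.jpg"></p><h2 id="1-3-页表"><a href="#1-3-页表" class="headerlink" title="1.3 页表"></a>1.3 The Page Table</h2><p><strong>The basic function of the page table is to provide the mapping from virtual pages to physical pages</strong>. The number of entries in the page table therefore equals the number of virtual pages. In addition, <strong>the MMU relies on the page table for all page-related management</strong>, including checking whether a given page is in memory, whether it is protected, and whether it lies in an invalid region.</p><p>The contents of a page table entry are shown below:</p><p><img src="https://s2.ax1x.com/2019/04/04/AgjPAA.jpg" alt="AgjPAA.jpg"></p><p>Because of its special role, the page table is supported directly by hardware; in that sense it is a hardware data structure.</p><h2 id="1-4-分页系统的优缺点"><a href="#1-4-分页系统的优缺点" class="headerlink" title="1.4 分页系统的优缺点"></a>1.4 Advantages and Disadvantages of Paging</h2><p>Advantages:</p><p>(1) Paging produces no external fragmentation, the memory occupied by a process need not be contiguous, and a process's virtual pages can be kept on disk while they are not needed.</p><p>(2) Paging allows sharing at a fine granularity, namely page sharing: it only takes a corresponding record in the page table entry for the given page.</p><p>Disadvantage: the page table is large and takes up a lot of memory. With 32-bit addresses and 4 KB pages, for example, a flat page table has 2^20 entries.</p><h2 id="1-5-缺页中断处理"><a href="#1-5-缺页中断处理" class="headerlink" title="1.5 缺页中断处理"></a>1.5 Page Fault Handling</h2><p>In a paging system, a virtual page may be in physical memory or on disk. <strong>If the page for a virtual address issued by the CPU is not in physical memory, a page fault is raised, and the page fault handler is responsible for finding the required virtual page and loading it into memory</strong>. The handling steps are shown below; many intermediate steps are omitted and only the core ones are kept:</p><p><img src="https://s2.ax1x.com/2019/04/04/AgjFht.jpg" alt="AgjFht.jpg"></p><h1 id="二、页面置换算法"><a href="#二、页面置换算法" class="headerlink" title="二、页面置换算法"></a>2. Page Replacement Algorithms</h1><p>When a page fault occurs, the required page must be brought in from disk. If memory has no free space, one of the resident pages must be chosen for replacement. Different page replacement algorithms evict pages in different orders. If the chosen page is one that will be accessed again soon, the system will quickly fault again, and because disk access is far slower than memory access, a page fault is very expensive. Choosing which page to replace is therefore not arbitrary; it has to meet certain requirements.</p><h2 id="2-1-页面置换的目标"><a href="#2-1-页面置换的目标" class="headerlink" title="2.1 页面置换的目标"></a>2.1 The Goal of Page Replacement</h2><p>The main goal when choosing a page to replace is to <strong>reduce the number or probability of subsequent page faults</strong>.</p><p>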
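The chosen page should therefore be one that will not be accessed for a long time, ideally never again. Where possible, it is also better to choose a page that has not been modified, so that its contents need not be written back to disk on replacement, which further speeds up page fault handling.</p><p>To that end, a variety of page replacement algorithms have been designed; the following sections go through them.</p><h2 id="2-2-随机更换算法"><a href="#2-2-随机更换算法" class="headerlink" title="2.2 随机更换算法"></a>2.2 Random Replacement</h2><p>When a page must be replaced, generate a random page number and replace the corresponding physical page. Unfortunately, a randomly chosen page is unlikely to be one that will stay unused for a long time, so this algorithm cannot be relied on to minimize subsequent page faults. In practice it performs quite poorly.</p><h2 id="2-3-先进先出算法"><a href="#2-3-先进先出算法" class="headerlink" title="2.3 先进先出算法"></a>2.3 First-In, First-Out (FIFO)</h2><p>As the name suggests, the first-in, first-out (FIFO) algorithm replaces the page that entered memory earliest. It is implemented with a linked list that orders all resident pages by arrival time: each replacement evicts the page at the head of the list, and newly loaded pages are appended to the tail, as shown below:</p><p><img src="https://s2.ax1x.com/2019/04/04/AgjA9P.jpg" alt="AgjA9P.jpg"></p><p>FIFO is simple and easy to implement. Its drawback is that if the earliest-loaded page is also a frequently accessed one, that page gets evicted to disk and soon triggers another page fault, which hurts efficiency.</p><h2 id="2-4-第二次机会算法"><a href="#2-4-第二次机会算法" class="headerlink" title="2.4 第二次机会算法"></a>2.4 Second Chance</h2><p>FIFO considers only when a page entered memory, not how often it is accessed, so it may evict a frequently used page. It can be improved as follows: <strong>when FIFO is about to replace a page, check whether the page has been accessed recently. If not, replace it. If it has (as indicated by its access bit), do not replace it; instead move it to the tail of the list, set its load time to the current time, and clear its access bit</strong>. A recently accessed page thus gets a second chance.</p><p>For example, if page A was accessed recently, that is, its access bit R is 1, the list looks as follows after the second chance algorithm runs:</p><p><img src="https://s2.ax1x.com/2019/04/04/AgjnBQ.jpg" alt="AgjnBQ.jpg"></p><p>The second chance algorithm is simple, fair, and easy to implement. However, moving a page to the tail of the list each time it gets a second chance costs time. In addition, a page's access bit is cleared only during a replacement scan, so temporal locality is captured poorly: a page whose access bit is 1 may have been accessed long ago. This coarse time resolution degrades the replacement decisions.</p><h2 id="2-5-时钟算法"><a href="#2-5-时钟算法" class="headerlink" title="2.5 时钟算法"></a>2.5 Clock</h2><p>The clock algorithm was proposed to address these drawbacks. Its core idea is: <strong>arrange the pages in a circle like a clock face with a single hand. Whenever a page must be replaced, start checking at the page the hand points to. If that page's access bit is 0, meaning it has not been accessed since the last check, replace it. Otherwise, clear its access bit and advance the hand clockwise to the next page. Repeat until a page with access bit 0 is found.</strong></p><p>In the clock shown below, the hand points to page F, so F is the first candidate for replacement. If F's access bit is 0, F is replaced. If it is 1, the bit is cleared and the hand moves on to page G.</p><p><img src="https://images2015.cnblogs.com/blog/381412/201601/381412-20160102114151948-149680362.jpg" alt="img"></p><p>On the surface it resembles second chance: replace when the access bit is 0, otherwise give the page another chance. There are, however, a few differences:</p><p>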
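(1) The data structures differ: second chance uses a linked list, while the clock algorithm uses an index (an integer pointer). Their memory usage therefore differs.</p><p>(2) Second chance needs extra memory, whereas the clock algorithm can work directly on the page table. Using the page table needs no extra space and, more importantly, the access bits are cleared automatically at regular intervals, which gives the clock algorithm a finer time resolution than second chance and therefore better replacement decisions.</p><p>The essence of the clock algorithm is still second chance, so it shares the same weakness: it is too fair. It ignores differences in how often pages are used, may evict pages that should not or must not be evicted, and may even loop indefinitely.</p><blockquote><p><strong>PS:</strong> This concludes random, FIFO, second chance, and clock. All four are "fair" algorithms: every page is treated more or less equally and none gets special treatment. This fairness costs efficiency, because pages are not distinguished by how much they contribute to the system; treating large and small contributors alike naturally lowers overall efficiency.</p></blockquote><h2 id="2-6-最优更换算法"><a href="#2-6-最优更换算法" class="headerlink" title="2.6 最优更换算法"></a>2.6 Optimal Replacement</h2><p>The ideal algorithm replaces a page that will never be accessed again. If no such page exists, it replaces the page that will go unaccessed for the longest time. This minimizes the number or probability of subsequent page faults, and it is called the optimal replacement algorithm.</p><p>However, there is no way to know how long a page will remain unaccessed, so the optimal algorithm cannot be implemented in practice. Why introduce it then? It serves as a benchmark against which the other algorithms are judged.</p><h2 id="2-7-NRU（最近未被使用）算法"><a href="#2-7-NRU（最近未被使用）算法" class="headerlink" title="2.7 NRU（最近未被使用）算法"></a>2.7 NRU (Not Recently Used)</h2><p>As the name suggests, <strong>NRU replaces a page that has not been accessed recently</strong>, relying on the temporal and spatial locality of program accesses: by the locality principle, a page that has not been accessed recently is unlikely to be accessed in the near future. NRU is implemented using each page's access and modified bits.</p><p>Every page has an access bit and a modified bit. The access bit is set to 1 on any read or write of the page; the modified bit is set to 1 when the page is written. Classifying pages by these two bits yields four classes, numbered 1 to 4:</p><p><img src="https://s2.ax1x.com/2019/04/04/AgjMAs.jpg" alt="AgjMAs.jpg"></p><p>Given this classification, NRU searches the four classes in order for a page to replace. If every page has been both accessed and modified, it still has to pick one of them, so NRU always terminates.</p><p>This classification is coarse: within one class there is no way to tell which page was accessed more recently. In some cases, then, the page replaced may not really be one that has gone unused recently.</p><h2 id="2-8-LRU（最近最少使用）算法"><a href="#2-8-LRU（最近最少使用）算法" class="headerlink" title="2.8 LRU（最近最少使用）算法"></a>2.8 LRU (Least Recently Used)</h2><p>Compared with NRU, <strong>LRU considers not only whether a page has been used recently but also how long ago it was last used</strong>. It predicts the future from the past: a page that has gone unused for a long time will probably not be needed soon.</p><p>An LRU implementation must somehow track the order in which pages were last accessed, which is a considerable amount of work. The simplest approach is to add a counter field to each page table entry and, on every access to a page, store the current value of a global access counter in that field. When a page must be replaced, the page with the smallest value is the least recently used one. Another simple approach is a linked list of all pages, with the most recently used page at the head and the least recently used at the tail, updated on every page access to keep it in that order.</p><p>LRU works well, but it is costly to implement (it must determine which of all pages is the least recently used) and costly at run time (the bookkeeping is updated on every page access). For these reasons, commercial operating systems generally do not adopt the LRU page replacement algorithm.</p><h2 id="2-9-工作集算法"><a 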
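href="#2-9-工作集算法" class="headerlink" title="2.9 工作集算法"></a>2.9 Working Set</h2><p>Since it is impractical to determine exactly which page is the least recently used, we can skip that effort and keep just enough information to make it unlikely that the page we evict will be needed again right away. That small amount of information is the <strong>working set</strong>.</p><p>The working set concept comes from the locality of program accesses: over a period of time, the pages a program accesses are confined to a particular set. For example, if the most recent k accesses all fall on some m pages, those m pages are the working set for parameter k. We write w(k,t) for the number of pages touched by the k most recent accesses at time t.</p><p>Clearly w(k,t) grows as k grows, but once k reaches a certain value, w(k,t) grows extremely slowly or nearly stops, and stays stable for a while, as shown below:</p><p><img src="https://s2.ax1x.com/2019/04/04/Agjlhq.jpg" alt="Agjlhq.jpg"></p><p>The figure shows that <strong>if the number of a program's pages in memory equals or exceeds its working set size, the program can run for a while without page faults</strong>. If it has fewer pages in memory than its working set, page faults become more frequent and thrashing may occur.</p><p>The goal of the working set algorithm is therefore to <strong>keep the pages of the current working set in physical memory. On each replacement, find a page that does not belong to the current working set and replace it</strong>. The search only needs to split pages into two classes: those inside the current working set and those outside it. Any page outside the current working set can be replaced.</p><p>Advantages of the working set algorithm: it is simple to implement, needing only a virtual time field in each page table entry. That field is not updated on every access; it is modified by the replacement algorithm when a page must be replaced, so the time cost is small as well.</p><p>Disadvantages of the working set algorithm: each replacement scan may have to walk the entire page table, yet not all pages are in memory, so much of the scan is wasted work. Also, because the data structure is linear, every scan proceeds in the same order, which is somewhat unfair.</p><h2 id="2-10-工作集时钟算法"><a href="#2-10-工作集时钟算法" class="headerlink" title="2.10 工作集时钟算法"></a>2.10 Working Set Clock</h2><p>To address these drawbacks, the working set and clock algorithms were combined into the working set clock algorithm: <strong>it applies the working set principle but organizes the scan order as a clock, so that each replacement scan starts at the page the hand points to, which is fairer</strong>. Moreover, <strong>only pages in memory are placed on the clock; pages outside memory are left off the circle, which makes the implementation more efficient</strong>.</p><p><strong>Thanks to its time and space advantages, the working set clock algorithm has been adopted by most commercial operating systems</strong>.</p><h1 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h1><p><img src="https://images2015.cnblogs.com/blog/381412/201511/381412-20151125223110077-842709175.jpg" alt="img"></p><p>邹恒明，《操作系统之哲学原理》，机械工业出版社</p>]]></content>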
    
    <summary type="html">
    
      
      
        &lt;h1 id=&quot;一、分页内存管理&quot;&gt;&lt;a href=&quot;#一、分页内存管理&quot; class=&quot;headerlink&quot; title=&quot;一、分页内存管理&quot;&gt;&lt;/a&gt;1. Paged Memory Management&lt;/h1&gt;&lt;h2 id=&quot;1-1-解决问题之道&quot;&gt;&lt;a href=&quot;#1-1-解决问题之道&quot; class=&quot;
      
    
    </summary>
    
      <category term="OS" scheme="https://spaces-x.github.io/categories/OS/"/>
    
    
      <category term="linux" scheme="https://spaces-x.github.io/tags/linux/"/>
    
      <category term="paging" scheme="https://spaces-x.github.io/tags/paging/"/>
    
      <category term="memory" scheme="https://spaces-x.github.io/tags/memory/"/>
    
  </entry>
  
  <entry>
    <title>DecoratorMode</title>
    <link href="https://spaces-x.github.io/2019/03/31/DecoratorMode/"/>
    <id>https://spaces-x.github.io/2019/03/31/DecoratorMode/</id>
    <published>2019-03-31T08:26:40.000Z</published>
    <updated>2019-03-31T08:43:51.483Z</updated>
    
    <content type="html"><![CDATA[<h1 id="java-装饰器模式"><a href="#java-装饰器模式" class="headerlink" title="java 装饰器模式"></a>The Decorator Pattern in Java</h1><h2 id="结构"><a href="#结构" class="headerlink" title="结构"></a>Structure</h2><p>Pattern structure:</p><p><a href="https://imgchr.com/i/AruOFU" target="_blank" rel="noopener"><img src="https://s2.ax1x.com/2019/03/31/AruOFU.md.png" alt="AruOFU.md.png"></a></p><ul><li>Component (abstract component role): the real object and the decorator share the same interface, so client code can interact with a decorator in exactly the same way as with the real object.</li><li>ConcreteComponent (concrete component role, the real object): defines the class that will receive additional responsibilities.</li><li>Decorator (decorator role): holds a reference to an abstract component. The decorator accepts all client requests and forwards them to the real object, which makes it possible to add behavior before or after the call to the real object.</li><li>ConcreteDecorator (concrete decorator role): responsible for adding new behavior to the component object.</li></ul><p>Code structure:</p><p><img src="https://s2.ax1x.com/2019/03/31/AruXYF.png" alt="AruXYF.png"></p><h2 id="代码"><a href="#代码" class="headerlink" title="代码"></a>Code</h2><h3 id="Person-java"><a href="#Person-java" class="headerlink" title="Person.java"></a>Person.java</h3><p>Component (abstract component role):</p><p>Simply an interface that declares the behavior.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.user;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">interface</span> <span class="title">Person</span> </span>&#123;</span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">eat</span><span class="params">()</span></span>;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Man-java"><a href="#Man-java" class="headerlink" title="Man.java"></a>Man.java</h3><p>ConcreteComponent (concrete component role, the real object):</p><p>Defines the concrete class to be decorated (Man); it must implement the interface above (that is, implement eat).</p><figure class="highlight java"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.user;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Man</span> <span class="keyword">implements</span> <span class="title">Person</span> </span>&#123;</span><br><span class="line"></span><br><span class="line"><span class="meta">@Override</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">eat</span><span class="params">()</span> </span>&#123;</span><br><span class="line"><span class="comment">// the undecorated base behavior</span></span><br><span class="line">System.out.println(<span class="string">"The man is eating"</span>);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Decorator-java"><a href="#Decorator-java" class="headerlink" title="Decorator.java"></a>Decorator.java</h3><p>Decorator (decorator role):</p><p>Holds a reference to an abstract component. The decorator accepts all client requests and forwards them to the real object, which makes it possible to add behavior before or after the call to the real object.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.user;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">abstract</span> <span class="class"><span class="keyword">class</span> <span class="title">Decorator</span> <span class="keyword">implements</span> <span class="title">Person</span> </span>&#123;</span><br><span class="line"></span><br><span class="line">Person person;   <span class="comment">//  holds a reference to the interface; package-private works here, protected would also expose it to subclasses in other packages</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="title">Decorator</span><span class="params">(Person person)</span>   <span class="comment">// the original setter is replaced by a constructor</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line"><span class="keyword">this</span>.person = person;</span><br><span class="line">&#125;</span><br><span class="line"><span class="meta">@Override</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">eat</span><span class="params">()</span> </span>&#123;</span><br><span class="line"><span class="comment">// forward the call to the wrapped Person</span></span><br><span class="line">person.eat();</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This decorator holds an object of the abstract interface (Person) and initializes it through the constructor. The abstract decorator must also implement the Person interface defined above, but in a special way: it implements the interface method by calling the same method on the Person object it holds.</p><p>So what counts as an object of an interface? Any class that implements the interface can be instantiated to produce one. For example, with Man me = new Man();, me is a Person object because Man implements Person, so passing me to the constructor of the concrete decorator DecoratorA(Person person) compiles without error.</p><h3 
id="DecoratorA-java"><a href="#DecoratorA-java" class="headerlink" title="DecoratorA.java"></a>DecoratorA.java</h3><p>ConcreteDecorator (concrete decorator role):</p><p>Responsible for adding new behavior to the component object.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.user;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">DecoratorA</span> <span class="keyword">extends</span> <span class="title">Decorator</span> </span>&#123;  <span class="comment">// the concrete decorator extends the abstract decorator</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="title">DecoratorA</span><span class="params">(Person person)</span> </span>&#123;</span><br><span class="line"><span class="keyword">super</span>(person);   <span class="comment">// calls the superclass constructor, which sets this.person = person</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Aeat</span><span class="params">()</span></span>&#123;</span><br><span class="line"><span class="comment">// the new behavior wrapped around eat</span></span><br><span class="line">System.out.println(<span class="string">"Eat A Balabala"</span>);</span><br><span 
class="line">&#125;</span><br><span class="line"><span class="meta">@Override</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">eat</span><span class="params">()</span></span>&#123;</span><br><span class="line"><span class="keyword">super</span>.eat(); <span class="comment">// call the parent's eat()</span></span><br><span class="line">Aeat();      <span class="comment">// then run the new behavior wrapped in Aeat()</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="DecoratorB-java"><a href="#DecoratorB-java" class="headerlink" title="DecoratorB.java"></a>DecoratorB.java</h3><p>ConcreteDecorator (the concrete decorator role):</p><p>Responsible for adding new functionality to the component object.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.user;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">DecoratorB</span> <span class="keyword">extends</span> <span class="title">Decorator</span> </span>&#123; <span class="comment">// the concrete decorator extends the abstract decorator</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="title">DecoratorB</span><span class="params">(Person person)</span> 
</span>&#123;</span><br><span class="line"><span class="keyword">super</span>(person);<span class="comment">// runs this.person = person in the parent constructor</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">eatB</span><span class="params">()</span></span>&#123;</span><br><span class="line"><span class="comment">// a different new behavior wrapped around eat</span></span><br><span class="line">System.out.println(<span class="string">"eat B balabala"</span>);</span><br><span class="line">&#125;</span><br><span class="line"><span class="meta">@Override</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">eat</span><span class="params">()</span></span>&#123;</span><br><span class="line"><span class="keyword">super</span>.eat();</span><br><span class="line">eatB();        <span class="comment">// then run the other wrapped behavior</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Steps of the decorator pattern:</p><ol><li>Define the abstract interface (Person) that declares the behavior to be decorated (eat).</li><li>Define the concrete class to be decorated (Man), which implements that interface (implements eat).</li><li>Define the abstract decorator (Decorator). It holds an object of the Person interface from step 1, initialized through its constructor, and it also implements Person itself by delegating each method to the object it holds. Any instance of a class that implements Person counts as such an object, for example Man me = new Man();, so me can be passed to the DecoratorA(Person person) constructor without error.</li><li>Define the concrete decorators (DecoratorA/B/C…), which wrap eat and add new behavior on top of the super call.</li></ol><h2 id="优缺点"><a href="#优缺点" class="headerlink" title="优缺点"></a>Advantages and Disadvantages</h2><h3 id="优点"><a href="#优点" class="headerlink" title="优点"></a>Advantages</h3><ul><li>New functionality is added without changing the Man class and without creating a Man subclass for each feature, so the number of classes stays small.</li><li>One object can be decorated several times, producing different combined behaviors.</li></ul><h3 id="缺点"><a href="#缺点" class="headerlink" 
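title="缺点"></a>Disadvantages</h3><ul><li>It creates many decorator objects, such as DecoratorA da and DecoratorB db here, which take up a little extra memory.</li><li>The decorator pattern is error-prone, and debugging a problem through the layers of wrapping is tedious.</li></ul>]]></content>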
    
    <summary type="html">
    
      
      
        &lt;h1 id=&quot;java-装饰器模式&quot;&gt;&lt;a href=&quot;#java-装饰器模式&quot; class=&quot;headerlink&quot; title=&quot;java 装饰器模式&quot;&gt;&lt;/a&gt;java 装饰器模式&lt;/h1&gt;&lt;h2 id=&quot;结构&quot;&gt;&lt;a href=&quot;#结构&quot; class=&quot;headerli
      
    
    </summary>
    
      <category term="java设计模式" scheme="https://spaces-x.github.io/categories/java%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F/"/>
    
    
      <category term="装饰器模式" scheme="https://spaces-x.github.io/tags/%E8%A3%85%E9%A5%B0%E5%99%A8%E6%A8%A1%E5%BC%8F/"/>
    
      <category term="java" scheme="https://spaces-x.github.io/tags/java/"/>
    
  </entry>
  
  <entry>
    <title>Algorithm1</title>
    <link href="https://spaces-x.github.io/2019/01/18/alogorithm/"/>
    <id>https://spaces-x.github.io/2019/01/18/alogorithm/</id>
    <published>2019-01-18T14:55:42.000Z</published>
    <updated>2019-01-19T09:03:03.170Z</updated>
    
    <content type="html"><![CDATA[<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><h1 id="动态规划"><a href="#动态规划" class="headerlink" title="动态规划"></a>Dynamic Programming</h1><h2 id="第一小题"><a href="#第一小题" class="headerlink" title="第一小题"></a>Problem 1</h2><h3 id="题目："><a href="#题目：" class="headerlink" title="题目："></a>Problem:</h3><p>A factory has surveyed the market and estimates the demand for its product over the next four months as shown in the table below.</p><table><thead><tr><th style="text-align:center">Period (month)</th><th style="text-align:center">Demand (product units)</th></tr></thead><tbody><tr><td style="text-align:center">1</td><td style="text-align:center">2</td></tr><tr><td style="text-align:center">2</td><td style="text-align:center">3</td></tr><tr><td style="text-align:center">3</td><td style="text-align:center">2</td></tr><tr><td style="text-align:center">4</td><td style="text-align:center">4</td></tr></tbody></table><p>Given: in each month, producing a batch incurs a fixed cost of 3 (thousand yuan); if nothing is produced, the fixed cost is zero. Each unit produced costs 1 (thousand yuan). In any single month, production capacity limits the batch to at most 6 units. The holding cost is 0.5 (thousand yuan) per unit per month, and there must be no inventory at the start of the first month or at the end of the fourth month. Question: under these conditions, how should the factory schedule production and inventory in each period so that the total cost is minimized? Required: state the variables, the state transition equation, the recurrence relation, and the detailed calculation steps.</p><h3 id="解："><a href="#解：" class="headerlink" title="解："></a>Solution:</h3><p>See the figures below:</p><p><a href="https://imgchr.com/i/FN7KZ8" target="_blank" rel="noopener"><img src="https://s1.ax1x.com/2018/12/14/FN7KZ8.md.jpg" alt="FN7KZ8.md.jpg"></a></p><p><img src="https://s1.ax1x.com/2018/12/14/FN7QIg.jpg" alt="FN7QIg.jpg"></p><p><img src="https://s1.ax1x.com/2018/12/14/FNqzfe.jpg" alt="FNqzfe.jpg"></p><h2 id="第二小题"><a href="#第二小题" class="headerlink" title="第二小题"></a>Problem 2</h2><h3 id="题目：-1"><a href="#题目：-1" class="headerlink" title="题目："></a>Problem:</h3><p>A salesman starts from city v1 and must visit each of the other cities v2, v3, …, v6 exactly once,</p><p>and finally return to 
v1. D is the distance matrix between the cities. Question: how should the salesman choose his route so that the total distance traveled is as short as possible?</p><p>The distance matrix D between nodes v1, v2, …, v6 is:</p><p>$$D = \left[ \begin{matrix} 0 &amp; 10 &amp; 20 &amp; 30 &amp; 40 &amp; 50 \\ 12 &amp; 0 &amp; 18 &amp; 30 &amp; 25 &amp; 21 \\ 23 &amp; 19 &amp; 0 &amp; 5 &amp; 10 &amp; 15 \\ 34 &amp; 32 &amp; 4 &amp; 0 &amp; 8 &amp; 16 \\ 45 &amp; 27 &amp; 11 &amp; 10 &amp; 0 &amp; 18 \\ 56 &amp; 22 &amp; 16 &amp; 20 &amp; 12 &amp; 0 \end{matrix} \right] \tag{1}$$</p><h3 id="解"><a href="#解" class="headerlink" title="解:"></a>Solution:</h3><p>Let L(v, U) denote the length of the shortest path that starts from v, visits every node in U exactly once, and then returns to the origin v_1. It satisfies the following recurrence:</p><p>$$L(v_i,U_i) = \min_{v_{i+1} \in U_i} \left\{ L(v_{i+1}, U_i \setminus \{v_{i+1}\}) + D[v_i][v_{i+1}] \right\} \tag{2}$$</p><p>In particular, when no nodes remain to be visited:</p><p>$$L(v_i, \emptyset) = D[v_i][v_1] \tag{3}$$</p><p>The function <code>min_len(v_i, u_i)</code> implements L.</p><p>min_len() takes the starting city and the set of cities still to be visited, and returns the tuple (minimum length, next-hop node). Its flowchart is shown below:</p><p><a href="https://imgchr.com/i/FNaoB8" target="_blank" rel="noopener"><img src="https://s1.ax1x.com/2018/12/13/FNaoB8.png" alt="FNaoB8.png"></a><br>The main routine prints the route hop by hop together with the minimum length.</p><p>Output:</p><p>Minimum path length: 80<br>Route: V1 -&gt; V2 -&gt; V6 -&gt; V5 -&gt; V4 -&gt; V3 -&gt; V1</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># encoding: utf-8</span></span><br><span class="line"><span class="comment"># Name: 魏翔</span></span><br><span class="line"><span class="comment"># Student ID: ZY1806220</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">dislist = [</span><br><span class="line">    [<span class="number">0</span>,<span class="number">10</span>,<span class="number">20</span>,<span class="number">30</span>,<span class="number">40</span>,<span class="number">50</span>],</span><br><span class="line">    [<span class="number">12</span>,<span class="number">0</span>,<span class="number">18</span>,<span class="number">30</span>,<span class="number">25</span>,<span class="number">21</span>],</span><br><span class="line">    [<span class="number">23</span>,<span class="number">19</span>,<span class="number">0</span>,<span class="number">5</span>,<span class="number">10</span>,<span class="number">15</span>],</span><br><span class="line">    [<span class="number">34</span>,<span class="number">32</span>,<span class="number">4</span>,<span class="number">0</span>,<span class="number">8</span>,<span 
class="number">16</span>],</span><br><span class="line">    [<span class="number">45</span>,<span class="number">27</span>,<span class="number">11</span>,<span class="number">10</span>,<span class="number">0</span>,<span class="number">18</span>],</span><br><span class="line">    [<span class="number">56</span>,<span class="number">22</span>,<span class="number">16</span>,<span class="number">20</span>,<span class="number">12</span>,<span class="number">0</span>]</span><br><span class="line">]</span><br><span class="line">U = set([<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>])</span><br><span class="line"></span><br><span class="line"><span class="comment"># Input parameters:</span></span><br><span class="line"><span class="comment"># the start node v_i and the set u_i of nodes still to be visited</span></span><br><span class="line"><span class="comment"># Returns:</span></span><br><span class="line"><span class="comment"># the minimum length of a path that starts at v_i, visits all of u_i and returns to the origin v_1, plus the index of the next hop</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">min_len</span><span class="params">(v_i,u_i)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> len(u_i)==<span class="number">0</span>:              <span class="comment">#  if u_i is empty, return (distance from v_i back to node 0, next hop: 0)</span></span><br><span class="line">        <span class="keyword">return</span> dislist[v_i][<span class="number">0</span>],<span class="number">0</span> </span><br><span class="line">    results = []</span><br><span class="line">    <span class="keyword">for</span> item <span class="keyword">in</span> u_i:             <span class="comment">#  try every unvisited node as the next hop</span></span><br><span class="line">        temp = (min_len(item,u_i-&#123;item&#125;)[<span class="number">0</span>]+dislist[v_i][item],item) <span class="comment"># the recurrence</span></span><br><span class="line">        
results.append(temp)</span><br><span class="line">    result = min(results)        <span class="comment"># pick the smallest path length and its next hop</span></span><br><span class="line">    <span class="keyword">return</span> result</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line">    U=U-&#123;<span class="number">0</span>&#125;</span><br><span class="line">    print(<span class="string">"最小路径长度："</span>,min_len(<span class="number">0</span>,U)[<span class="number">0</span>])  <span class="comment"># length of the shortest tour</span></span><br><span class="line">    print(<span class="string">"路径为:\n\nV1 -&gt; "</span>)</span><br><span class="line">    index=<span class="number">0</span></span><br><span class="line">    <span class="keyword">while</span> len(U)!=<span class="number">0</span>:                        <span class="comment"># print the route one next hop at a time</span></span><br><span class="line">        result,index = min_len(index,U)</span><br><span class="line">        print(<span class="string">'V'</span> + str(index+<span class="number">1</span>)+<span class="string">' -&gt; '</span>)    <span class="comment"># add 1: list indices start at 0, the problem numbers cities from 1</span></span><br><span class="line">        U.remove(index)</span><br><span class="line"></span><br><span class="line">    print(<span class="string">'V1'</span>)                             <span class="comment"># finally return to node v1</span></span><br></pre></td></tr></table></figure><h1 id="分支定界"><a href="#分支定界" class="headerlink" title="分支定界"></a>Branch and Bound</h1><h3 id="问题描述"><a href="#问题描述" class="headerlink" title="问题描述"></a>Problem Description</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9sKln.png" alt="k9sKln.png"></p><h3 id="直接上代码"><a href="#直接上代码" class="headerlink" title="直接上代码"></a>The Code</h3><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span 
class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span 
class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br></pre></td><td class="code"><pre><span class="line"> <span class="comment">//ZY1806220 魏翔</span></span><br><span class="line"><span class="comment">/*</span></span><br><span class="line"><span class="comment"> * @Description: Assignment 2</span></span><br><span class="line"><span class="comment"> * @Author: ZY1806220_魏翔</span></span><br><span class="line"><span class="comment"> * @Date: 2019-01-08 10:34:46</span></span><br><span class="line"><span class="comment"> * @LastEditTime: 2019-01-08 15:32:16</span></span><br><span class="line"><span class="comment"> * @LastEditors: Please set LastEditors</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string">&lt;stdio.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string">&lt;iostream&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string">&lt;fstream&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> max_vexNum 50</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAX_INT 0x7FFFFFFF</span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">typedef</span> <span class="class"><span 
class="keyword">struct</span>&#123;</span></span><br><span class="line">    <span class="keyword">bool</span> is_visited[max_vexNum];                <span class="comment">// whether a node is already on the current search path</span></span><br><span class="line">    <span class="keyword">int</span> dist[max_vexNum][max_vexNum];           <span class="comment">// adjacency matrix of distances</span></span><br><span class="line">    <span class="keyword">int</span> cost[max_vexNum][max_vexNum];           <span class="comment">// adjacency matrix of costs</span></span><br><span class="line">    <span class="keyword">int</span> path[max_vexNum];                       <span class="comment">// path that achieves the global minimum distance</span></span><br><span class="line">    <span class="keyword">int</span> sumCost;                                <span class="comment">// total cost of that minimum-distance path</span></span><br><span class="line">    <span class="keyword">int</span> min_sumDist;                            <span class="comment">// global minimum distance found so far</span></span><br><span class="line">&#125;Graph;</span><br><span class="line"><span class="keyword">int</span> path[max_vexNum] = &#123;<span class="number">0</span>&#125;;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * @description: depth-first traversal of the graph with pruning; finds the shortest path that satisfies the constraints, updates the global minimum distance and saves the path</span></span><br><span class="line"><span class="comment"> * @param &#123;Graph &amp;G: reference to the graph, int start_vex: current node, int dist: distance used from node 0 to the current node, int cost: cost used from node 0 to the current node, int deep: current depth&#125;   </span></span><br><span class="line"><span class="comment"> * @return: void</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">DFS</span><span class="params">(Graph &amp;G,<span class="keyword">int</span> start_vex,<span class="keyword">int</span> dist,<span class="keyword">int</span> cost,<span class="keyword">int</span> deep)</span></span></span><br><span 
class="line"><span class="function"></span>&#123;</span><br><span class="line">    G.is_visited[start_vex]=<span class="literal">true</span>;   <span class="comment">// mark the current node as visited</span></span><br><span class="line">    path[deep] = start_vex+<span class="number">1</span>;       <span class="comment">// 1-based node number at this depth of the current path</span></span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> (start_vex==max_vexNum<span class="number">-1</span>) &#123;            <span class="comment">// reached the last vertex with a feasible path no longer than the best so far: update the global best</span></span><br><span class="line">        <span class="comment">/* record the new best distance, cost and path */</span></span><br><span class="line">        G.min_sumDist = dist;</span><br><span class="line">        G.sumCost = cost;</span><br><span class="line">        <span class="keyword">for</span> (<span class="keyword">int</span> i=<span class="number">0</span>; i&lt;max_vexNum;i++)</span><br><span class="line">        &#123;</span><br><span class="line">            G.path[i]=<span class="number">0</span>;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">for</span> (<span class="keyword">int</span> i=<span class="number">0</span>; i&lt;=deep;i++)</span><br><span class="line">        &#123;</span><br><span class="line">            G.path[i] = path[i];</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span>(<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; max_vexNum; i++)</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="comment">/* only follow edges with 0 &lt; dist &lt; 9999 to unvisited nodes */</span></span><br><span class="line">        <span class="keyword">if</span>((G.dist[start_vex][i]&gt;<span class="number">0</span>) &amp;&amp; (G.dist[start_vex][i]&lt;<span class="number">9999</span>) &amp;&amp; (G.is_visited[i]==<span class="literal">false</span>))</span><br><span 
class="line">        &#123;</span><br><span class="line">            <span class="keyword">int</span> new_dist = dist+G.dist[start_vex][i];</span><br><span class="line">            <span class="keyword">int</span> new_cost = cost+G.cost[start_vex][i];</span><br><span class="line">            <span class="keyword">if</span>( (new_cost&gt;<span class="number">1500</span>) || (new_dist&gt;G.min_sumDist))&#123;       <span class="comment">// pruning condition met</span></span><br><span class="line">                <span class="keyword">continue</span>;</span><br><span class="line">                <span class="comment">// This bound is still not tight enough.</span></span><br><span class="line">                <span class="comment">// Floyd could first give the shortest distance from every node to B (a lower bound on distance)</span></span><br><span class="line">                <span class="comment">// and the minimum cost from every node to B (a lower bound on cost).</span></span><br><span class="line">                <span class="comment">// If  cost so far + minimum cost from the current node to B &gt; 1500 || </span></span><br><span class="line">                <span class="comment">//     distance so far + shortest distance from the current node to B &gt; G.min_sumDist </span></span><br><span class="line">                <span class="comment">// then prune.</span></span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">else</span>&#123;</span><br><span class="line">                DFS(G,i,new_dist,new_cost,deep+<span class="number">1</span>);</span><br><span class="line">                G.is_visited[i]=<span class="literal">false</span>;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * @description: initialize the graph </span></span><br><span class="line"><span class="comment"> * @param:</span></span><br><span class="line"><span 
class="comment"> * reference to the Graph, Graph &amp;</span></span><br><span class="line"><span class="comment"> * @return: void</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">initial_Graph</span><span class="params">(Graph &amp;G)</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">    ifstream in_dist;</span><br><span class="line">    ifstream in_cost;</span><br><span class="line">    in_dist.open(<span class="string">"m1.txt"</span>);</span><br><span class="line">    <span class="keyword">if</span>(!in_dist.is_open())&#123;</span><br><span class="line">        <span class="built_in">cout</span>&lt;&lt;<span class="string">"Open file m1.txt failure"</span>&lt;&lt;<span class="built_in">endl</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    in_cost.open(<span class="string">"m2.txt"</span>);</span><br><span class="line">    <span class="keyword">if</span>(!in_cost.is_open())&#123;</span><br><span class="line">        <span class="built_in">cout</span>&lt;&lt;<span class="string">"Open file m2.txt failure"</span>&lt;&lt;<span class="built_in">endl</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>; i&lt;max_vexNum; i++)</span><br><span class="line">    &#123;</span><br><span class="line">        G.is_visited[i]=<span class="literal">false</span>;</span><br><span class="line">        G.path[i]=<span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    G.sumCost = <span class="number">0</span>;</span><br><span class="line">    G.min_sumDist = MAX_INT;</span><br><span class="line">    <span class="keyword">for</span>(<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; 
max_vexNum; i++)</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="comment">/* read row i of both matrices */</span></span><br><span class="line">        <span class="keyword">for</span>(<span class="keyword">int</span> j =<span class="number">0</span>; j&lt;max_vexNum; j++)</span><br><span class="line">        &#123;</span><br><span class="line">            in_dist &gt;&gt; G.dist[i][j];</span><br><span class="line">            in_cost &gt;&gt; G.cost[i][j];</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * @description: print the minimum distance in graph G, its cost, and one path that achieves it</span></span><br><span class="line"><span class="comment"> * @param &#123;Graph &amp;&#125; </span></span><br><span class="line"><span class="comment"> * @return: void</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">printGraph</span><span class="params">(Graph &amp;G)</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">"最小距离为:%d;\t其花费为:%d\n"</span>, G.min_sumDist, G.sumCost);</span><br><span class="line">    <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; max_vexNum &amp;&amp; G.path[i] != <span class="number">0</span>; i++)</span><br><span class="line">    &#123;</span><br><span class="line">        <span class="keyword">if</span> (i == <span class="number">0</span>) <span class="built_in">printf</span>(<span class="string">"路径:%d"</span>, G.path[i]);</span><br><span class="line">        <span class="keyword">else</span> <span class="built_in">printf</span>(<span class="string">"-&gt;%d"</span>, 
G.path[i]);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> <span class="keyword">const</span> *argv[])</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">    Graph G;</span><br><span class="line">    initial_Graph(G);</span><br><span class="line">    DFS(G,<span class="number">0</span>,<span class="number">0</span>,<span class="number">0</span>,<span class="number">0</span>);</span><br><span class="line">    printGraph(G);</span><br><span class="line">    <span class="built_in">cout</span>&lt;&lt;<span class="built_in">endl</span>;</span><br><span class="line">    system(<span class="string">"pause"</span>);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>]]></content>
    
    <summary type="html">
    
      
      
        &lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default&quot;&gt;&lt;/script&gt;

&lt;h1 id=&quot;动态规划&quot;&gt;&lt;a href=&quot;#动态规划
      
    
    </summary>
    
      <category term="算法" scheme="https://spaces-x.github.io/categories/%E7%AE%97%E6%B3%95/"/>
    
    
      <category term="分支定界" scheme="https://spaces-x.github.io/tags/%E5%88%86%E6%94%AF%E5%AE%9A%E7%95%8C/"/>
    
      <category term="动态规划" scheme="https://spaces-x.github.io/tags/%E5%8A%A8%E6%80%81%E8%A7%84%E5%88%92/"/>
    
  </entry>
  
  <entry>
    <title>Nanjing</title>
    <link href="https://spaces-x.github.io/2019/01/18/nanjing/"/>
    <id>https://spaces-x.github.io/2019/01/18/nanjing/</id>
    <published>2019-01-18T14:29:29.000Z</published>
    <updated>2019-01-19T08:14:25.257Z</updated>
    
    <content type="html"><![CDATA[<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><h1 id="南京游记"><a href="#南京游记" class="headerlink" title="南京游记"></a>Nanjing Travel Notes</h1><p>A not-so-innocent trip to Nanjing, with an interesting girl~</p><h3 id="出发"><a href="#出发" class="headerlink" title="出发"></a>Departure</h3><p>Beijing to Nanjing South</p><p><img src="https://s2.ax1x.com/2019/01/18/k90iUH.jpg" alt="k90iUH.jpg"></p><h3 id="玄武湖-amp-城墙"><a href="#玄武湖-amp-城墙" class="headerlink" title="玄武湖&amp;城墙"></a>Xuanwu Lake &amp; the City Wall</h3><p>Such a huge park, and admission is free~~~</p><p>The trees are still green in winter, but the autumn ginkgo avenue is no longer to be seen...</p><p>And the little yellow duck in the middle of the lake.</p><p><img src="https://s2.ax1x.com/2019/01/18/k90UqU.jpg" alt="k90UqU.jpg"></p><p>The inscriptions on the city wall: what do they mean???</p><p><img src="https://s2.ax1x.com/2019/01/18/k900IJ.jpg" alt="k900IJ.jpg"></p><p>The sun is about to set: the kind of sunset that, according to a certain someone, only Dalian and Nanjing have~</p><p><img src="https://s2.ax1x.com/2019/01/18/k90fde.jpg" alt="k90fde.jpg"></p><p>Blur it, blur it: one of the few photos without a fake smile~</p><p><img src="https://s2.ax1x.com/2019/01/18/k9BSWn.jpg" alt="k9BSWn.jpg"></p><h3 id="网红书店"><a href="#网红书店" class="headerlink" title="网红书店"></a>The Internet-Famous Bookstore</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9BD0S.jpg" alt="k9BD0S.jpg"></p><p>Since getting to know you all, I feel practically illiterate (facepalm)... I mainly came to take photos, hahaha~</p><h3 id="南京博物馆-amp-第一场雪"><a href="#南京博物馆-amp-第一场雪" class="headerlink" title="南京博物馆 &amp; 第一场雪"></a>Nanjing Museum &amp; the First Snow</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9DVnf.jpg" alt="k9DVnf.jpg"></p><p>Vast and profound, the halls below the ground; so many galleries catch the eye.</p><p>Finely crafted bronze ding vessels; a dugout canoe preserved intact.</p><p>The ancient city's elegance is written in its books; the rise and fall of the Six Dynasties fills the annals.</p><p>A glorious history made by the people moves me to take up the brush and sing of a golden age.</p><p>I didn't know what to write, so here is a quotation to add a bit of class.</p><h3 id="夜游南大"><a href="#夜游南大" class="headerlink" title="夜游南大"></a>Nanjing University by Night</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9DYHU.jpg" alt="k9DYHU.jpg"></p><p>Emmm, let's pretend the whole wall is lush and green~</p><h3 id="中山陵"><a href="#中山陵" class="headerlink" title="中山陵"></a>Sun Yat-sen Mausoleum</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9DwC9.jpg" alt="k9DwC9.jpg"></p><p>The Three Principles of the People from the history books?</p><p><img src="https://s2.ax1x.com/2019/01/18/k9D6HO.jpg" alt="k9D6HO.jpg"></p><p>Oh my, I'm afraid of heights~</p><h3 id="夫子庙"><a 
href="#夫子庙" class="headerlink" title="夫子庙"></a>Confucius Temple (Fuzimiao)</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9DfCd.jpg" alt="k9DfCd.jpg"></p><p>Would it look better at night?</p><p><img src="https://s2.ax1x.com/2019/01/18/k9Dh8A.jpg" alt="k9Dh8A.jpg"></p><p>I don't want to just pass by; I want to spend the long years with you, hee hee~</p><h3 id="火锅-amp-跨年"><a href="#火锅-amp-跨年" class="headerlink" title="火锅&amp;跨年"></a>Hot Pot &amp; New Year's Eve</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9DoKP.jpg" alt="k9DoKP.jpg"></p><p>Piping hot, delicious hot pot at Lilaishun, whose style and menu are remarkably like Donglaishun's~</p><p><img src="https://s2.ax1x.com/2019/01/18/k9DqUg.jpg" alt="k9DqUg.jpg"></p><p>A really unconvincing New Year's Eve with no countdown. That night, while watching Proposal Daisakusen, I also saw a New Year's Eve scene:</p><p>"The precious time spent with your companions is priceless."</p><h3 id="风筝和她"><a href="#风筝和她" class="headerlink" title="风筝和她"></a>The Kite and Her</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9r9bT.jpg" alt="k9r9bT.jpg"></p><p>Mysteriously priced, a little kite that could only fly free by the lake near the entrance~</p><p><img src="https://s2.ax1x.com/2019/01/18/k9rnr6.jpg" alt="k9rnr6.jpg"></p><p>Her silhouette from behind; quietly liking it for a moment~~~</p><h3 id="南京站"><a href="#南京站" class="headerlink" title="南京站"></a>Nanjing Railway Station</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9rhoF.jpg" alt="k9rhoF.jpg"></p><p>Nanjing Station, where I nearly cried myself faint...</p><h3 id="还有二刷南京大学呦"><a href="#还有二刷南京大学呦" class="headerlink" title="还有二刷南京大学呦~"></a>And a Second Visit to Nanjing University~</h3><p><img src="https://s2.ax1x.com/2019/01/18/k9rxFe.jpg" alt="k9rxFe.jpg"></p><p>A second visit full of "joy", "anticipation", and "impulse"~</p>]]></content>
    
    <summary type="html">
    
      
      
        &lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default&quot;&gt;&lt;/script&gt;

&lt;h1 id=&quot;南京游记&quot;&gt;&lt;a href=&quot;#南京游记
      
    
    </summary>
    
      <category term="游玩" scheme="https://spaces-x.github.io/categories/%E6%B8%B8%E7%8E%A9/"/>
    
    
      <category term="travel" scheme="https://spaces-x.github.io/tags/travel/"/>
    
      <category term="love" scheme="https://spaces-x.github.io/tags/love/"/>
    
  </entry>
  
  <entry>
    <title>Hadoop Day 3</title>
    <link href="https://spaces-x.github.io/2018/08/19/hadoop-d-3/"/>
    <id>https://spaces-x.github.io/2018/08/19/hadoop-d-3/</id>
    <published>2018-08-19T02:31:31.000Z</published>
    <updated>2018-08-23T13:55:50.953Z</updated>
    
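    <content type="html"><![CDATA[<h3 id="读取数据部分的关系图"><a href="#读取数据部分的关系图" class="headerlink" title="读取数据部分的关系图"></a>Relationship diagram of the data-reading components</h3><p><img src="https://7n.w3cschool.cn/attachments/image/wk/hadoop/mapreduce-inputformat.png" alt=""></p><h3 id="InputFormat"><a href="#InputFormat" class="headerlink" title="InputFormat"></a>InputFormat</h3><p>Let us first look at how the official documentation describes <a href="http://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/mapreduce/InputFormat.html" target="_blank" rel="noopener">InputFormat</a>.</p><p>According to the API documentation, InputFormat does three main things:</p><ol><li>Validate the job's input specification, such as its format.</li><li>Split the input files into logical splits (InputSplit); each InputSplit is assigned to an individual Map task.</li><li>Provide the RecordReader implementation that reads key-value pairs from an InputSplit for the Mapper to consume.</li></ol><p>The default behavior of file-based InputFormats (typically subclasses of FileInputFormat) is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem block size of the input files is treated as an upper bound for the split size. A lower bound on the split size can be set via mapreduce.input.fileinputformat.split.minsize.</p><p>Clearly, a logical split based purely on input size is insufficient for many applications, because record boundaries must be respected. In such cases the application must also implement a RecordReader, which is responsible for respecting record boundaries and for presenting a record-oriented view of the logical InputSplit to the individual task.</p><p>Methods:</p><ol><li>List&lt;InputSplit&gt; getSplits(): computes the input splits (InputSplit) from the input files; this handles cutting the data or files into pieces.</li><li>RecordReader&lt;K,V&gt; createRecordReader(): creates the RecordReader that reads data from an InputSplit; this handles reading the data inside a split.</li></ol><p><strong>TextInputFormat:</strong> each line of the input file is one record. The key is the byte offset of the line and the value is the content of the line.</p><p><strong>KeyValueTextInputFormat:</strong> each line of the input file is one record, and the first separator character divides the line: everything before it is the key and everything after it is the value. The separator is set through the key.value.separator.in.input.line property and defaults to the tab character (\t).</p><p><strong>NLineInputFormat:</strong> the same as TextInputFormat, except that each split must contain exactly N lines. N is set through the mapred.line.input.format.linespermap property and defaults to 1.</p><p><strong>SequenceFileInputFormat:</strong> an InputFormat for reading sequence files, whose &lt;key,value&gt; types are user-defined. A sequence file is Hadoop's own binary data format with support for compression. It is used to optimize passing data from the output of one MapReduce job to the input of another.</p><h4 id="FileInputFormat"><a href="#FileInputFormat" class="headerlink" title="FileInputFormat"></a>FileInputFormat</h4><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 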
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/** </span></span><br><span class="line"><span class="comment"> * A base class for file-based &#123;<span class="doctag">@link</span> InputFormat&#125;s.</span></span><br><span class="line"><span class="comment"> * </span></span><br><span class="line"><span class="comment"> * &lt;p&gt;&lt;code&gt;FileInputFormat&lt;/code&gt; is the base class for all file-based </span></span><br><span class="line"><span class="comment"> * &lt;code&gt;InputFormat&lt;/code&gt;s. 
This provides a generic implementation of</span></span><br><span class="line"><span class="comment"> * &#123;<span class="doctag">@link</span> #getSplits(JobContext)&#125;.</span></span><br><span class="line"><span class="comment"> * Subclasses of &lt;code&gt;FileInputFormat&lt;/code&gt; can also override the </span></span><br><span class="line"><span class="comment"> * &#123;<span class="doctag">@link</span> #isSplitable(JobContext, Path)&#125; method to ensure input-files are</span></span><br><span class="line"><span class="comment"> * not split-up and are processed as a whole by &#123;<span class="doctag">@link</span> Mapper&#125;s.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="meta">@InterfaceAudience</span>.Public</span><br><span class="line"><span class="meta">@InterfaceStability</span>.Stable</span><br><span class="line"><span class="keyword">public</span> <span class="keyword">abstract</span> <span class="class"><span class="keyword">class</span> <span class="title">FileInputFormat</span>&lt;<span class="title">K</span>, <span class="title">V</span>&gt; <span class="keyword">extends</span> <span class="title">InputFormat</span>&lt;<span class="title">K</span>, <span class="title">V</span>&gt; </span>&#123;</span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String INPUT_DIR = </span><br><span class="line">    <span class="string">"mapreduce.input.fileinputformat.inputdir"</span>;                 <span class="comment">// configuration key for the input paths</span></span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String SPLIT_MAXSIZE = </span><br><span class="line">    <span class="string">"mapreduce.input.fileinputformat.split.maxsize"</span>;            <span class="comment">// configuration key for the maximum split size</span></span><br><span class="line">  <span 
class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String SPLIT_MINSIZE = </span><br><span class="line">    <span class="string">"mapreduce.input.fileinputformat.split.minsize"</span>;            <span class="comment">// configuration key for the minimum split size</span></span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String PATHFILTER_CLASS = </span><br><span class="line">    <span class="string">"mapreduce.input.pathFilter.class"</span>;</span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String NUM_INPUT_FILES =</span><br><span class="line">    <span class="string">"mapreduce.input.fileinputformat.numinputfiles"</span>;</span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String INPUT_DIR_RECURSIVE =</span><br><span class="line">    <span class="string">"mapreduce.input.fileinputformat.input.dir.recursive"</span>;       <span class="comment">// configuration key for the boolean that enables recursive directory listing</span></span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String LIST_STATUS_NUM_THREADS = </span><br><span class="line">    <span class="string">"mapreduce.input.fileinputformat.list-status.num-threads"</span>;    <span class="comment">// configuration key for the number of threads used to list file statuses</span></span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="keyword">int</span> DEFAULT_LIST_STATUS_NUM_THREADS = <span class="number">1</span>;    <span class="comment">// by default, a single thread lists the file statuses</span></span><br><span class="line"></span><br><span class="line">  <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> Log LOG = 
LogFactory.getLog(FileInputFormat.class);</span><br><span class="line"></span><br><span class="line">  <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="keyword">double</span> SPLIT_SLOP = <span class="number">1.1</span>;   <span class="comment">// 10% slop: the remaining bytes are not split further once they are at most SPLIT_SLOP * splitSize</span></span><br><span class="line">  </span><br><span class="line">  <span class="meta">@Deprecated</span></span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">enum</span> Counter &#123; </span><br><span class="line">    BYTES_READ</span><br><span class="line">  &#125;</span><br><span class="line">  ...</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The class's main member variables are Strings that hold configuration names. Let us now go through its main methods.</p><p>List&lt;FileStatus&gt; listStatus(JobContext job) lists the input directories. Subclasses may override this method, for example to select only the input paths that match a particular regular expression.</p><blockquote><ul><li>@param job the job to list input paths for</li><li>@return array of FileStatus objects</li><li>@throws IOException if zero items.</li></ul></blockquote><p>listStatus reads the configuration through the JobContext and uses it to build the PathFilter for the input files. If numThreads == 1 in the configuration, it calls singleThreadedListStatus() to obtain the List&lt;FileStatus&gt;; otherwise it creates a LocatedFileStatusFetcher, which builds the List&lt;FileStatus&gt; using multiple threads.<br><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">protected</span> List&lt;FileStatus&gt; <span class="title">listStatus</span><span class="params">(JobContext job</span></span></span><br><span class="line"><span class="function"><span class="params">                                      )</span> <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">  Path[] dirs = getInputPaths(job);</span><br><span class="line">  <span class="keyword">if</span> (dirs.length == <span class="number">0</span>) &#123;</span><br><span class="line">    <span class="keyword">throw</span> <span class="keyword">new</span> IOException(<span class="string">"No input paths specified in job"</span>);</span><br><span class="line">  &#125;</span><br><span class="line">  </span><br><span class="line">  <span class="comment">// get tokens for all the required FileSystems..</span></span><br><span class="line">  
TokenCache.obtainTokensForNamenodes(job.getCredentials(), dirs, </span><br><span class="line">                                      job.getConfiguration());</span><br><span class="line"></span><br><span class="line">  <span class="comment">// Whether we need to recursive look into the directory structure</span></span><br><span class="line">  <span class="keyword">boolean</span> recursive = getInputDirRecursive(job);</span><br><span class="line"></span><br><span class="line">  <span class="comment">// creates a MultiPathFilter with the hiddenFileFilter and the</span></span><br><span class="line">  <span class="comment">// user provided one (if any).</span></span><br><span class="line">  List&lt;PathFilter&gt; filters = <span class="keyword">new</span> ArrayList&lt;PathFilter&gt;();</span><br><span class="line">  filters.add(hiddenFileFilter);</span><br><span class="line">  PathFilter jobFilter = getInputPathFilter(job);</span><br><span class="line">  <span class="keyword">if</span> (jobFilter != <span class="keyword">null</span>) &#123;</span><br><span class="line">    filters.add(jobFilter);</span><br><span class="line">  &#125;</span><br><span class="line">  PathFilter inputFilter = <span class="keyword">new</span> MultiPathFilter(filters);</span><br><span class="line">  </span><br><span class="line">  List&lt;FileStatus&gt; result = <span class="keyword">null</span>;</span><br><span class="line"></span><br><span class="line">  <span class="keyword">int</span> numThreads = job.getConfiguration().getInt(LIST_STATUS_NUM_THREADS,</span><br><span class="line">      DEFAULT_LIST_STATUS_NUM_THREADS);        <span class="comment">// read the thread count from the configuration</span></span><br><span class="line">  Stopwatch sw = <span class="keyword">new</span> Stopwatch().start();</span><br><span class="line">  <span class="keyword">if</span> (numThreads == <span class="number">1</span>) &#123;</span><br><span class="line">    result = singleThreadedListStatus(job, dirs, inputFilter, recursive); <span 
class="comment">// list the file statuses on a single thread</span></span><br><span class="line">  &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">    Iterable&lt;FileStatus&gt; locatedFiles = <span class="keyword">null</span>;</span><br><span class="line">    <span class="keyword">try</span> &#123;</span><br><span class="line">      LocatedFileStatusFetcher locatedFileStatusFetcher = <span class="keyword">new</span> LocatedFileStatusFetcher(</span><br><span class="line">          job.getConfiguration(), dirs, recursive, inputFilter, <span class="keyword">true</span>);     <span class="comment">// multi-threaded case: LocatedFileStatusFetcher lists the file statuses in parallel; it creates its thread pool with Executors.newFixedThreadPool()</span></span><br><span class="line">      locatedFiles = locatedFileStatusFetcher.getFileStatuses();</span><br><span class="line">    &#125; <span class="keyword">catch</span> (InterruptedException e) &#123;</span><br><span class="line">      <span class="keyword">throw</span> <span class="keyword">new</span> IOException(<span class="string">"Interrupted while getting file statuses"</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    result = Lists.newArrayList(locatedFiles);</span><br><span class="line">  &#125;</span><br><span class="line">  </span><br><span class="line">  sw.stop();</span><br><span class="line">  <span class="keyword">if</span> (LOG.isDebugEnabled()) &#123;</span><br><span class="line">    LOG.debug(<span class="string">"Time taken to get FileStatuses: "</span> + sw.elapsedMillis());</span><br><span class="line">  &#125;</span><br><span class="line">  LOG.info(<span class="string">"Total input paths to process : "</span> + result.size()); </span><br><span class="line">  <span class="keyword">return</span> result;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p><p>getSplits() cuts the input files into splits and returns them.</p><blockquote><ul><li>Generate the list of files and make them 
into FileSplits.</li><li>@param job the job context</li><li>@throws IOException</li></ul></blockquote><p>The split's maxSize and minSize are read from the configuration, and blockSize comes from the file system. From these three values the split size is computed as splitSize = Math.max(minSize, Math.min(maxSize, blockSize)). For example, assuming a 128 MB block size, a minSize of 1 and a maxSize of Long.MAX_VALUE, splitSize works out to 128 MB. The List&lt;FileStatus&gt; is then obtained from listStatus() above, and each FileStatus in the list is iterated over and cut into splits according to the file's length and splitSize.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span 
class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> List&lt;InputSplit&gt; <span class="title">getSplits</span><span class="params">(JobContext job)</span> <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">   Stopwatch sw = <span class="keyword">new</span> Stopwatch().start();</span><br><span class="line">   <span class="keyword">long</span> minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));</span><br><span class="line">   <span class="keyword">long</span> maxSize = getMaxSplitSize(job);</span><br><span class="line"></span><br><span class="line">   <span class="comment">// generate splits</span></span><br><span class="line">   List&lt;InputSplit&gt; splits = <span class="keyword">new</span> ArrayList&lt;InputSplit&gt;();</span><br><span class="line">   List&lt;FileStatus&gt; files = listStatus(job);</span><br><span class="line">   <span class="keyword">for</span> (FileStatus file: files) &#123;</span><br><span class="line">     Path path = file.getPath();</span><br><span class="line">     <span class="keyword">long</span> length = file.getLen();</span><br><span class="line">     <span class="keyword">if</span> (length != <span class="number">0</span>) &#123;</span><br><span class="line">       BlockLocation[] blkLocations;</span><br><span class="line">       <span class="comment">// obtain blkLocations either from LocatedFileStatus.getBlockLocations()</span></span><br><span class="line">       <span class="comment">// or from FileSystem.getFileBlockLocations()</span></span><br><span class="line">       <span class="keyword">if</span> (file <span 
class="keyword">instanceof</span> LocatedFileStatus) &#123;</span><br><span class="line">         blkLocations = ((LocatedFileStatus) file).getBlockLocations();</span><br><span class="line">       &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">         FileSystem fs = path.getFileSystem(job.getConfiguration());</span><br><span class="line">         blkLocations = fs.getFileBlockLocations(file, <span class="number">0</span>, length);</span><br><span class="line">       &#125;</span><br><span class="line">       <span class="keyword">if</span> (isSplitable(job, path)) &#123;</span><br><span class="line">         <span class="keyword">long</span> blockSize = file.getBlockSize();</span><br><span class="line">         <span class="keyword">long</span> splitSize = computeSplitSize(blockSize, minSize, maxSize);</span><br><span class="line"></span><br><span class="line">         <span class="keyword">long</span> bytesRemaining = length;</span><br><span class="line">         <span class="comment">// cut the current file into splits of splitSize bytes each</span></span><br><span class="line">         <span class="keyword">while</span> (((<span class="keyword">double</span>) bytesRemaining)/splitSize &gt; SPLIT_SLOP) &#123;</span><br><span class="line">           <span class="keyword">int</span> blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);  <span class="comment">// index of the block that contains this offset</span></span><br><span class="line">           splits.add(makeSplit(path, length-bytesRemaining, splitSize,</span><br><span class="line">                       blkLocations[blkIndex].getHosts(),</span><br><span class="line">                       blkLocations[blkIndex].getCachedHosts()));</span><br><span class="line">           bytesRemaining -= splitSize;</span><br><span class="line">         &#125;</span><br><span class="line"></span><br><span class="line">         <span class="keyword">if</span> (bytesRemaining != <span class="number">0</span>) &#123;</span><br><span 
class="line">           <span class="keyword">int</span> blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);</span><br><span class="line">           splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,</span><br><span class="line">                      blkLocations[blkIndex].getHosts(),</span><br><span class="line">                      blkLocations[blkIndex].getCachedHosts()));</span><br><span class="line">         &#125;</span><br><span class="line">       &#125; <span class="keyword">else</span> &#123; <span class="comment">// not splitable: the whole file becomes a single split</span></span><br><span class="line">         splits.add(makeSplit(path, <span class="number">0</span>, length, blkLocations[<span class="number">0</span>].getHosts(),</span><br><span class="line">                     blkLocations[<span class="number">0</span>].getCachedHosts()));</span><br><span class="line">       &#125;</span><br><span class="line">     &#125; <span class="keyword">else</span> &#123; </span><br><span class="line">       <span class="comment">//Create empty hosts array for zero length files</span></span><br><span class="line">       splits.add(makeSplit(path, <span class="number">0</span>, length, <span class="keyword">new</span> String[<span class="number">0</span>]));</span><br><span class="line">     &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="comment">// Save the number of input files for metrics/loadgen</span></span><br><span class="line">   job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());</span><br><span class="line">   sw.stop();</span><br><span class="line">   <span class="keyword">if</span> (LOG.isDebugEnabled()) &#123;</span><br><span class="line">     LOG.debug(<span class="string">"Total # of splits generated by getSplits: "</span> + splits.size()</span><br><span class="line">         + <span class="string">", TimeTaken: "</span> + sw.elapsedMillis());</span><br><span class="line">   
&#125;</span><br><span class="line">   <span class="keyword">return</span> splits;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>Creating a split<br><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"> <span class="comment">/*</span></span><br><span class="line"><span class="comment"> * Create a split: makeSplit calls the FileSplit constructor to build it</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="function"><span class="keyword">protected</span> FileSplit <span class="title">makeSplit</span><span class="params">(Path file, <span class="keyword">long</span> start, <span class="keyword">long</span> length, </span></span></span><br><span class="line"><span class="function"><span class="params">                               String[] hosts, String[] inMemoryHosts)</span> </span>&#123;</span><br><span class="line">   <span class="keyword">return</span> <span class="keyword">new</span> FileSplit(file, start, length, hosts, inMemoryHosts);</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure></p><p>The FileSplit constructor records the slice of the file together with its cached-blocks information<br><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"> <span class="comment">/** Constructs a split with host and cached-blocks information</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@param</span> file the file name</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@param</span> start the position of the first byte in the file to process</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@param</span> length the number of bytes in the file to process</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@param</span> hosts the list of hosts containing the block</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@param</span> inMemoryHosts the list of hosts containing the block in memory</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="title">FileSplit</span><span class="params">(Path file, <span class="keyword">long</span> start, <span class="keyword">long</span> length, String[] hosts,</span></span></span><br><span class="line"><span class="function"><span class="params">    String[] inMemoryHosts)</span> </span>&#123;</span><br><span class="line">  <span class="keyword">this</span>(file, start, length, hosts);</span><br><span class="line">  hostInfos = <span class="keyword">new</span> SplitLocationInfo[hosts.length];</span><br><span class="line">  <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; hosts.length; i++) 
&#123;</span><br><span class="line">    <span class="comment">// because N will be tiny, scanning is probably faster than a HashSet</span></span><br><span class="line">    <span class="keyword">boolean</span> inMemory = <span class="keyword">false</span>;</span><br><span class="line">    <span class="keyword">for</span> (String inMemoryHost : inMemoryHosts) &#123;</span><br><span class="line">      <span class="keyword">if</span> (inMemoryHost.equals(hosts[i])) &#123;</span><br><span class="line">        inMemory = <span class="keyword">true</span>;</span><br><span class="line">        <span class="keyword">break</span>;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    hostInfos[i] = <span class="keyword">new</span> SplitLocationInfo(hosts[i], inMemory);</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p><p>LineRecordReader: tracks how far line reading has progressed within a split. Its constructor takes:</p><blockquote><ul><li>param Configuration, job configuration.</li><li>param FileSplit, split to read.</li><li>param byte[], delimiter bytes.</li></ul></blockquote><p>The constructor obtains the file system fs from the job configuration and opens the file that contains the split as an input stream. It then checks whether the input is compressed. If it is, the stream is wrapped in a decompressor and, for a splittable codec, start and end are re-adjusted. If it is not compressed the case is simple: the split's start and end become the LineRecordReader's start and end, and the file is positioned at the split's start. Once that is done, the constructor checks whether the current split is the first one in the file (start == 0?); if it is not, the first line is skipped.<br>Why must a non-first split skip its first line? Splits are cut by block size and are very likely not aligned to line boundaries, whereas LineRecordReader works on whole lines, so line alignment is essential. Skipping the first line of every non-first split achieves that alignment, as shown in the figure below.</p><p><img src="https://7n.w3cschool.cn/attachments/image/wk/hadoop/mapreduce-split.png" alt="行对齐"></p><p>How, then, is the amount to skip computed, that is, how far must start move forward in a non-first split? This is done mainly through readLine(), which returns the number of bytes consumed up to the end of the line. It is analyzed further below.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="title">LineRecordReader</span><span class="params">(Configuration job, FileSplit split,</span></span></span><br><span class="line"><span class="function"><span class="params">    <span class="keyword">byte</span>[] recordDelimiter)</span> <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">  <span class="keyword">this</span>.maxLineLength = job.getInt(org.apache.hadoop.mapreduce.lib.input.</span><br><span class="line">    LineRecordReader.MAX_LINE_LENGTH, Integer.MAX_VALUE);</span><br><span class="line">  start = split.getStart();</span><br><span class="line">  end = start + split.getLength();</span><br><span class="line">  <span class="keyword">final</span> Path file = 
split.getPath();</span><br><span class="line">  compressionCodecs = <span class="keyword">new</span> CompressionCodecFactory(job);</span><br><span class="line">  codec = compressionCodecs.getCodec(file);</span><br><span class="line"></span><br><span class="line">  <span class="comment">// open the file and seek to the start of the split</span></span><br><span class="line">  <span class="keyword">final</span> FileSystem fs = file.getFileSystem(job);</span><br><span class="line">  fileIn = fs.open(file);</span><br><span class="line">  <span class="keyword">if</span> (isCompressedInput()) &#123;</span><br><span class="line">    decompressor = CodecPool.getDecompressor(codec);</span><br><span class="line">    <span class="keyword">if</span> (codec <span class="keyword">instanceof</span> SplittableCompressionCodec) &#123;</span><br><span class="line">      <span class="keyword">final</span> SplitCompressionInputStream cIn =</span><br><span class="line">        ((SplittableCompressionCodec)codec).createInputStream(</span><br><span class="line">          fileIn, decompressor, start, end,</span><br><span class="line">          SplittableCompressionCodec.READ_MODE.BYBLOCK);</span><br><span class="line">      in = <span class="keyword">new</span> CompressedSplitLineReader(cIn, job, recordDelimiter);</span><br><span class="line">      start = cIn.getAdjustedStart();</span><br><span class="line">      end = cIn.getAdjustedEnd();</span><br><span class="line">      filePosition = cIn; <span class="comment">// take pos from compressed stream</span></span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">      in = <span class="keyword">new</span> SplitLineReader(codec.createInputStream(fileIn,</span><br><span class="line">          decompressor), job, recordDelimiter);</span><br><span class="line">      filePosition = fileIn;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125; <span 
class="keyword">else</span> &#123;</span><br><span class="line">    fileIn.seek(start);</span><br><span class="line">    in = <span class="keyword">new</span> UncompressedSplitLineReader(</span><br><span class="line">        fileIn, job, recordDelimiter, split.getLength());</span><br><span class="line">    filePosition = fileIn;</span><br><span class="line">  &#125;</span><br><span class="line">  <span class="comment">// If this is not the first split, we always throw away first record</span></span><br><span class="line">  <span class="comment">// because we always (except the last split) read one extra line in</span></span><br><span class="line">  <span class="comment">// next() method.</span></span><br><span class="line">  <span class="keyword">if</span> (start != <span class="number">0</span>) &#123;              <span class="comment">// 如果不是文件中的第一个分片</span></span><br><span class="line">    start += in.readLine(<span class="keyword">new</span> Text(), <span class="number">0</span>, maxBytesToConsume(start));    <span class="comment">//忽略文件中的第一行 start后移。这里调用了readLine来计算后移的偏量</span></span><br><span class="line">  &#125;</span><br><span class="line">  <span class="keyword">this</span>.pos = start;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>接下来我们看看readLine函数,它主要通过判断是否设定分隔符来返回 自定义和默认方式的readline。<br><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">readLine</span><span class="params">(Text str, <span class="keyword">int</span> maxLineLength,</span></span></span><br><span class="line"><span 
class="function"><span class="params">                    <span class="keyword">int</span> maxBytesToConsume)</span> <span class="keyword">throws</span> IOException </span>&#123;</span><br><span class="line">  <span class="keyword">if</span> (<span class="keyword">this</span>.recordDelimiterBytes != <span class="keyword">null</span>) &#123;     <span class="comment">// non-null means a custom record delimiter was configured</span></span><br><span class="line">    <span class="keyword">return</span> readCustomLine(str, maxLineLength, maxBytesToConsume);  <span class="comment">// read using the custom delimiter</span></span><br><span class="line">  &#125; <span class="keyword">else</span> &#123;                                     </span><br><span class="line">    <span class="keyword">return</span> readDefaultLine(str, maxLineLength, maxBytesToConsume); <span class="comment">// otherwise use the default line terminators: '\n', '\r' or "\r\n"</span></span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>]]></content>
    
    <summary type="html">
    
      
      
        &lt;h3 id=&quot;读取数据部分的关系图&quot;&gt;&lt;a href=&quot;#读取数据部分的关系图&quot; class=&quot;headerlink&quot; title=&quot;读取数据部分的关系图&quot;&gt;&lt;/a&gt;读取数据部分的关系图&lt;/h3&gt;&lt;p&gt;&lt;img src=&quot;https://7n.w3cschool.cn/atta
      
    
    </summary>
    
      <category term="hadoop" scheme="https://spaces-x.github.io/categories/hadoop/"/>
    
    
      <category term="hadoop" scheme="https://spaces-x.github.io/tags/hadoop/"/>
    
      <category term="读取数据" scheme="https://spaces-x.github.io/tags/%E8%AF%BB%E5%8F%96%E6%95%B0%E6%8D%AE/"/>
    
  </entry>
  
  <entry>
    <title>HBase</title>
    <link href="https://spaces-x.github.io/2018/08/08/hbase/"/>
    <id>https://spaces-x.github.io/2018/08/08/hbase/</id>
    <published>2018-08-08T01:57:10.000Z</published>
    <updated>2018-08-28T04:45:27.179Z</updated>
    
    <content type="html"><![CDATA[<script type="text/javascript" src="toc/js/jquery-1.4.4.min.js"></script><script type="text/javascript" src="toc/js/jquery.ztree.all-3.5.min.js"></script><script type="text/javascript" src="toc/js/ztree_toc.js"></script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><h3 id="Hbase安装"><a href="#Hbase安装" class="headerlink" title="Hbase安装"></a>HBase Installation</h3><h4 id="准备环境"><a href="#准备环境" class="headerlink" title="准备环境"></a>Prerequisites</h4><blockquote><ul><li>Install and configure a Hadoop cluster</li><li>Install and configure ZooKeeper 3.4.X</li><li>Download and unpack HBase, edit its configuration files, and distribute it to the cluster</li></ul></blockquote><p><a href="https://spaces-x.github.io/2018/07/26/hadoop-d-2/#Hadoop%E9%9B%86%E7%BE%A4%E9%83%A8%E7%BD%B2">See the Hadoop cluster installation and configuration guide</a></p><h4 id="ZooKeeper-3-4-X-安装与配置"><a href="#ZooKeeper-3-4-X-安装与配置" class="headerlink" title="ZooKeeper 3.4.X 安装与配置"></a>ZooKeeper 3.4.X Installation and Configuration</h4><p>ZooKeeper can be deployed in three ways: standalone, pseudo-distributed cluster, and fully distributed cluster. This article covers the fully distributed setup; <a href="https://www.cnblogs.com/lsdb/p/7297731.html" target="_blank" rel="noopener">see here for the other two</a></p><p>For the fully distributed cluster, assume three hosts h1, h2 and h3. Download <a href="https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/" target="_blank" rel="noopener">ZooKeeper 3.4.X</a><br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">tar -xzvf zookeeper-3.4.x.tar.gz -C <span class="string">'INSTALL_DIR'</span></span><br><span class="line">mkdir -p <span class="string">'ZOOKEEPER_DATA_DIR'</span>   <span class="comment"># e.g. /home/weixiang/data/zookeeper/</span></span><br><span class="line">mkdir -p <span class="string">'ZOOKEEPER_LOGS_DIR'</span>   <span class="comment"># e.g. /home/weixiang/logs/zookeeper/</span></span><br><span class="line"><span class="built_in">cd</span> <span class="string">'INSTALL_DIR/zookeeper-3.4.x/conf'</span></span><br></pre></td></tr></table></figure></p><p>Add ZooKeeper to the environment variables: edit /etc/profile and add <code>export 
ZOOKEEPER_HOME=INSTALL_DIR/zookeeper-3.4.x</code><br><code>export PATH=......:$ZOOKEEPER_HOME/bin</code></p><p>Rename zoo_sample.cfg to zoo.cfg and edit it; dataDir and dataLogDir must point to the data and logs directories created earlier<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">tickTime=2000</span><br><span class="line">dataDir=/home/weixiang/data/zookeeper/</span><br><span class="line">dataLogDir=/home/weixiang/logs/zookeeper/</span><br><span class="line">clientPort=2181</span><br><span class="line">initLimit=5</span><br><span class="line">syncLimit=2</span><br><span class="line">server.1=h1:2888:3888</span><br><span class="line">server.2=h2:2888:3888</span><br><span class="line">server.3=h3:2888:3888</span><br></pre></td></tr></table></figure></p><p>Copy the whole ZooKeeper 3.4.X directory to the other nodes with scp -r "source directory" "destination host directory".<br>In the ZooKeeper data directory on every node (/home/weixiang/data/zookeeper/), create a file named myid containing the node's number (the X of the matching server.X entry in the configuration file). On h2, for example, myid must contain 2.</p><p>Use the script to start, stop, and check status.<br>Start ZooKeeper on every node where it is installed: <code>zkServer.sh start</code></p><p>Check the status with <code>zkServer.sh status</code>, which prints something like:<br>[hadoop@mdw ~]$ zkServer.sh status<br>JMX enabled by default<br>Using config: ……/../conf/zoo.cfg<br>Mode: follower/leader</p><p>Stop the ZooKeeper service on every node where it is installed: <code>zkServer.sh stop</code></p><p>If something goes wrong, check the error messages in zookeeper.out</p><h4 id="Hbase的安装与配置"><a href="#Hbase的安装与配置" class="headerlink" title="Hbase的安装与配置"></a>HBase Installation and Configuration</h4><p>Hadoop is already installed on h1 through h5: h1 is the namenode and h2-h5 are datanodes.<br>ZooKeeper is already installed on h1, h2 and h3.<br>HBase will be installed on h1, h2 and h3, with h1 acting as both master and regionserver and h2-h3 as regionservers.</p><p>First check that the Hadoop, HBase and JDK versions are compatible; consult the <a href="https://hbase.apache.org/book.html#basic.prerequisites" target="_blank" 
rel="noopener">compatibility table</a> to pick suitable versions.</p><p>This article uses hbase 1.3.2.1 <a href="https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/" target="_blank" rel="noopener">(download link)</a> </p><p>Download it on node h1 and unpack it with <code>tar -xzvf hbase-1.3.2.1-bin.tar.gz -C &#39;INSTALL_DIR&#39;</code> (for example /usr/). Then edit hbase-env.sh under /usr/hbase-1.3.2.1/conf and add the following<br><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">export JAVA_HOME=/usr/java/jdk1.8.0_172</span><br><span class="line">export HBASE_LOG_DIR=/usr/hbase-1.3.2.1/logs     # log directory; create it first if it does not exist</span><br><span class="line">export HBASE_MANAGES_ZK=false                    # do not use the ZooKeeper bundled with HBase </span><br><span class="line">export HBASE_CLASSPATH=/usr/hadoop-2.6.5/etc/hadoop # location of the Hadoop configuration files</span><br></pre></td></tr></table></figure></p><p>Edit hbase-site.xml as follows<br><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span> </span><br><span 
class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.rootdir<span class="tag">&lt;/<span class="name">name</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">value</span>&gt;</span>hdfs://h1:9000/hbase<span class="tag">&lt;/<span class="name">value</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.cluster.distributed<span class="tag">&lt;/<span class="name">name</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.zookeeper.quorum<span class="tag">&lt;/<span class="name">name</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">value</span>&gt;</span>h1,h2,h3<span class="tag">&lt;/<span class="name">value</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.zookeeper.property.dataDir<span class="tag">&lt;/<span class="name">name</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">value</span>&gt;</span>/home/weixiang/data/zookeeper<span class="tag">&lt;/<span class="name">value</span>&gt;</span> </span><br><span 
class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.master<span class="tag">&lt;/<span class="name">name</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;<span class="name">value</span>&gt;</span>hdfs://h1:60000<span class="tag">&lt;/<span class="name">value</span>&gt;</span> </span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>  </span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure></p><p>The ZooKeeper-related settings above must match the ZooKeeper installation already in place, and the Hadoop-related settings must match the installed Hadoop.</p><p>Edit the regionservers file and add the host names of the regionservers to it<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">h1</span><br><span class="line"></span><br><span class="line">h2</span><br><span class="line"></span><br><span class="line">h3</span><br></pre></td></tr></table></figure></p><p>Finally, in HBase's conf directory, create a symbolic link named hdfs-site.xml that points to Hadoop's hdfs-site.xml (the link target comes first, the link name second): <code>ln -s /usr/hadoop-2.6.5/etc/hadoop/hdfs-site.xml hdfs-site.xml</code><br>The result looks like this</p><p>lrwxrwxrwx. 
1 weixiang weixiang   42 Aug  7 18:00 hdfs-site.xml -&gt; /usr/hadoop-2.6.5/etc/hadoop/hdfs-site.xml</p><p>HBase is now configured. Use <code>scp -r</code> to copy the hbase directory to the installation directory on every node.<br>On each node, add HBase to the environment variables: edit /etc/profile and add<br><code>export HBASE_HOME=INSTALL_DIR/hbase-1.3.2.1</code><br><code>export PATH=......:$HBASE_HOME/bin</code></p><h4 id="Hbase-的启动和相关脚本说明"><a href="#Hbase-的启动和相关脚本说明" class="headerlink" title="Hbase 的启动和相关脚本说明"></a>Starting HBase and the Related Scripts</h4><p>Script usage in brief:</p><ol><li>Start the cluster: start-hbase.sh </li><li>Stop the cluster: stop-hbase.sh </li><li>Start/stop all regionservers or zookeepers: hbase-daemons.sh start/stop regionserver/zookeeper </li><li>Start/stop a single regionserver or zookeeper: hbase-daemon.sh start/stop regionserver/zookeeper </li><li>Start/stop a master: hbase-daemon.sh start/stop master; whether it becomes the active master depends on whether an active master already exists<br>Two advanced scripts: </li><li>rolling-restart.sh restarts the servers one at a time (a rolling restart) </li><li>graceful_stop.sh moves all regions off a server and then stops/restarts it; it can be used to upgrade versions without downtime </li></ol><p>A few details: </p><ol><li><p>hbase-daemon.sh start master and hbase-daemon.sh start master --backup do the same thing; whether the master becomes backup or active is decided by the master's internal logic </p></li><li><p>stop-hbase.sh does not call hbase-daemons.sh stop regionserver to shut down the regionservers, but it does call hbase-daemons.sh stop zookeeper/master-backup to shut down ZooKeeper and the backup masters; the regionservers are actually shut down through HBaseAdmin's shutdown interface </p></li><li><p>$HBASE_HOME/bin/hbase stop master shuts down the whole cluster, not a single master; to stop just one master use $HBASE_HOME/bin/hbase-daemon.sh stop master </p></li><li><p>$HBASE_HOME/bin/hbase stop regionserver/zookeeper must not be called this way: it fails, and no code path invokes it. $HBASE_HOME/bin/hbase start regionserver/zookeeper, however, does start a regionserver or ZooKeeper, and it is the command that hbase-daemon.sh invokes</p></li></ol><p>The commonly used HBase startup scripts are: </p><ol><li>$HBASE_HOME/bin/start-hbase.sh<br>starts the whole cluster </li><li>$HBASE_HOME/bin/stop-hbase.sh<br>stops the whole cluster </li><li>$HBASE_HOME/bin/hbase-daemons.sh<br>starts or stops all regionservers, zookeepers, or backup masters </li><li>$HBASE_HOME/bin/hbase-daemon.sh<br>starts or stops a single master, regionserver, or zookeeper </li></ol><p>Taking start-hbase.sh as the starting point, we can trace how the scripts call one another </p><p>start-hbase.sh proceeds as follows: </p><ol><li>Run hbase-config.sh (explained below) 
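</li><li>Parse the arguments (only version 0.96 and later accept the single argument autorestart, which restarts the process) </li><li>Call hbase-daemon.sh to start the master; call hbase-daemons.sh to start regionserver, zookeeper and master-backup </li></ol><p>What hbase-config.sh does: </p><p>It loads the relevant configuration, such as the HBASE_HOME directory, the conf directory, the list of regionserver machines and the JAVA_HOME directory, and it sources $HBASE_HOME/conf/hbase-env.sh </p><p>What hbase-env.sh does: </p><p>It mainly configures the JVM and its GC options; it can also configure the log directory and log options, whether HBase should manage ZooKeeper, the directory for process id files, and so on </p><p>What hbase-daemons.sh does: </p><p>Depending on the process to start,<br>for zookeeper it calls zookeepers.sh<br>for regionserver it calls regionservers.sh<br>for master-backup it calls master-backup.sh </p><p>What zookeepers.sh does: </p><p>If HBASE_MANAGES_ZK is "true" in hbase-env.sh, the ZKServerTool class parses the XML configuration file to obtain the list of ZooKeeper nodes (the value of hbase.zookeeper.quorum), and a remote command is then sent to those nodes over SSH: </p><p>cd ${HBASE_HOME};<br>$bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start/stop zookeeper </p><p>What regionservers.sh does: </p><p>Like zookeepers.sh, it reads the list of regionserver machines from ${HBASE_CONF_DIR}/regionservers and then sends a remote command to those machines over SSH:<br>cd ${HBASE_HOME};<br>$bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start/stop regionserver </p><p>What hbase-daemon.sh does: </p><p>zookeepers.sh, regionservers.sh and master-backup.sh all end up calling the local hbase-daemon.sh, which runs as follows:</p><ol><li>Run hbase-config.sh to load the configuration (Java environment, log settings, process ID directory, etc.) </li><li>For the start command: rotate the .out output file, rotate the GC log file, and write the start time plus the ulimit -a output to the log, such as "Mon Nov 26 10:31:42 CST 2012 Starting master on dwxx.yy.taobao" and "..open files                      (-n) 65536.." </li><li>Call $HBASE_HOME/bin/hbase start master/regionserver/zookeeper </li><li>Execute wait, blocking until the process started in step 3 exits </li><li>Execute cleanZNode to delete the node the regionserver registered in ZooKeeper. The purpose: if the regionserver process exits unexpectedly, the 3-minute ZooKeeper heartbeat timeout is avoided and the master can start crash recovery right away </li><li>For the stop command:<br>check by process ID whether the process exists; call kill, then wait until the process is gone </li><li>For the restart command:<br>call stop, then call start</li></ol><h4 id="Hbase简介"><a href="#Hbase简介" class="headerlink" title="Hbase简介"></a>Introduction to HBase</h4><ul><li>HBase is a distributed, column-oriented, open-source database. The technology originates from the Google paper by Chang et al., "Bigtable: A Distributed Storage System for Structured Data".</li><li>Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and uses the HDFS file system as its backing store</li><li>HBase is a subproject of the Apache Hadoop
project.</li><li>HBase differs from ordinary relational databases: it is suited to storing unstructured data. Another difference is that HBase is column-based rather than row-based</li></ul><h4 id="Hbase-逻辑模型"><a href="#Hbase-逻辑模型" class="headerlink" title="Hbase 逻辑模型"></a>HBase Logical Model</h4><p><a href="https://imgchr.com/i/PL5MLQ" target="_blank" rel="noopener"><img src="https://s1.ax1x.com/2018/08/28/PL5MLQ.md.png" alt="PL5MLQ.md.png"></a></p><ul><li>Data is stored in tables</li><li>A table consists of rows and columns; every column belongs to a column family, and the storage unit identified by a row and a column is called a cell</li><li>Each cell keeps multiple versions of the same piece of data, distinguished by timestamp</li></ul><p><strong>Row key</strong>: a row key can be any string of at most 64KB, and rows are stored in lexicographic order of their keys<br>For rows that are often read together, design the row keys carefully so that the rows are stored next to each other</p><ul><li>The row key is the unique identifier of a row in a table and serves as the primary key for retrieving records</li><li>There are only three ways to access rows in a table<blockquote><p>  1. By a single row key<br>  2. By a range of row keys<br>  3. By a full table scan   </p></blockquote></li></ul><p><strong>Column families and columns</strong>:</p><ul><li>A column is written as &lt;column family&gt;:&lt;qualifier&gt;</li><li>HBase stores data on disk by column family; this column-oriented design suits analytical workloads well</li><li>Cells in one column family should preferably share the same read/write pattern (for example, strings of equal length) to improve performance</li><li>The column families must be specified when the table is created, but the number of qualifiers (that is, columns) within a family is not fixed</li></ul><p><strong>Timestamps</strong>:</p><ul><li>A timestamp corresponds to the time of each data operation; it can be generated by the system or assigned explicitly by the user</li><li>HBase supports two ways of reclaiming old versions: 1. keep only a specified number of the most recent versions per cell; 2. keep versions for a specified length of time (for example, 7 days)</li><li>Common time-based client queries: "the latest data as of some moment" or "give me all versions of the data"</li><li>A cell is uniquely identified by row key, column family:qualifier, and timestamp</li><li>Cells are stored as raw bytes and have no data type</li></ul><h4 id="Hbase-物理模型"><a href="#Hbase-物理模型" class="headerlink" title="Hbase 物理模型"></a>HBase Physical Model</h4><p><img src="https://s1.ax1x.com/2018/08/28/PL4crQ.png" alt="PL4crQ.png"></p><p> <strong>Notes on the physical structure:</strong><br> HBase uses HDFS as its backing store, and files in HDFS are hard to modify; even the most basic append operation is difficult to implement. When HBase modifies data it therefore actually inserts a record with a new timestamp, and when it deletes data it only writes a delete marker. Deleted entries and records whose timestamps have expired are dropped later, when storefiles are merged.   </p><p><strong>Region and RegionServer:</strong></p><ul><li>A table is split along the row dimension into a number of regions by row key range</li><li>Each table starts with a single region; once the amount of data exceeds a threshold, the region splits into two</li><li>Physically, all data lives in HDFS, and region servers manage the regions</li><li>A physical node can run only one HRegionServer</li><li>One HRegionServer can manage region instances of multiple tables</li><li>A region instance comprises the HLog and the Stores that hold the data</li><li>HMaster acts as the master control node</li><li>ZooKeeper handles coordination</li></ul><p><strong>HLog:</strong></p><ul><li>Used for disaster recovery (power loss, damaged physical media, etc.)</li><li>A write-ahead log that records all update operations; an operation is recorded in the log before the data is written  </li></ul><p><strong>The -ROOT- and .META.
tables:</strong><br><img src="https://s1.ax1x.com/2018/08/28/PLI9f0.png" alt="PLI9f0.png"></p><ul><li>HBase has two special tables, -ROOT- and .META.</li><li>.META.: records how the regions of user tables are distributed; .META. itself can span multiple regions</li><li>-ROOT-: records the region information of the .META. table; -ROOT- has only one region</li><li>ZooKeeper records the location of the -ROOT- table</li></ul><p><strong>Memstore and storefile:</strong></p><ul><li>A region consists of multiple stores, and each store holds all the data of one column family, which is also why HBase is called a column-oriented database</li><li>A store consists of a memstore in memory and storefiles on disk</li><li>Writes go to the memstore first; when the amount of data in the memstore reaches a threshold, the HRegionServer starts a flush that writes it to a storefile, and each flush produces a separate storefile</li><li>When the number of storefiles grows past a threshold, the system merges them; versions are merged and deletions are applied during the merge, producing a larger storefile</li><li>When a storefile grows past a size threshold, the current region is split in two, and HMaster assigns the halves to region servers to balance the load</li><li>When a client retrieves data, the memstore is searched first and the storefiles only if the data is not found there</li></ul><p><strong>Key-Value format:</strong><br><img src="https://s1.ax1x.com/2018/08/28/PLoLZQ.png" alt="PLoLZQ.png"><br>HBase builds a B+ tree as its index. The leaf nodes of the B+ tree are key-values of the form shown above, while the index nodes are first indexed by the Row in the Key and then, level by level, by the other attributes such as the column family. Records of the same column family therefore fall under the same subtree.<br>Diagram of the physical model:<br><img src="https://s1.ax1x.com/2018/08/28/PLoda9.png" alt="PLoda9.png"></p><h4 id="比较"><a href="#比较" class="headerlink" title="比较"></a>Comparison</h4><p><strong>Hbase vs Oracle:</strong></p><ul><li>Different indexing leads to different behavior</li><li>HBase suits workloads with heavy inserts and concurrent reads</li><li>HBase is bounded by disk transfer rate, Oracle by disk seek time. Every HBase operation can be viewed as an insert, and inserts happen in large batches, so speed depends on the disk transfer rate. Oracle, by contrast, often reads and writes randomly: an update first locates the block holding the data, loads it into memory, modifies it, and writes it back to disk.</li><li>HBase works well for finding the top n items ordered by time</li><li>HBase cannot do complex aggregation; it is suited to simple key-value lookups</li></ul><h4 id="参考文章"><a href="#参考文章" class="headerlink" title="参考文章:"></a>References:</h4><blockquote><p>1: <a href="https://spaces-x.github.io/2018/07/26/hadoop-d-2/#Hadoop%E9%9B%86%E7%BE%A4%E9%83%A8%E7%BD%B2">https://spaces-x.github.io/2018/07/26/hadoop-d-2/#Hadoop%E9%9B%86%E7%BE%A4%E9%83%A8%E7%BD%B2</a><br>2: <a href="https://www.cnblogs.com/lsdb/p/7297731.html" target="_blank" rel="noopener">https://www.cnblogs.com/lsdb/p/7297731.html</a><br>3: <a href="https://blog.csdn.net/gnail_oug/article/details/46981607" target="_blank" 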
rel="noopener">https://blog.csdn.net/gnail_oug/article/details/46981607</a><br>4: <a href="https://www.w3cschool.cn/hbase_doc/" target="_blank" rel="noopener">https://www.w3cschool.cn/hbase_doc/</a><br>5: <a href="https://hbase.apache.org/book.html" target="_blank" rel="noopener">https://hbase.apache.org/book.html</a><br>6: <a href="http://zjushch.iteye.com/blog/1736065" target="_blank" rel="noopener">http://zjushch.iteye.com/blog/1736065</a></p></blockquote>]]></content>
    
    <summary type="html">
    
      
      
        &lt;script type=&quot;text/javascript&quot; src=&quot;toc/js/jquery-1.4.4.min.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;toc/js/jquery.ztree.all-3.5.mi
      
    
    </summary>
    
      <category term="Hbase" scheme="https://spaces-x.github.io/categories/Hbase/"/>
    
    
      <category term="noSQL" scheme="https://spaces-x.github.io/tags/noSQL/"/>
    
      <category term="Hbase" scheme="https://spaces-x.github.io/tags/Hbase/"/>
    
      <category term="列式数据库" scheme="https://spaces-x.github.io/tags/%E5%88%97%E5%BC%8F%E6%95%B0%E6%8D%AE%E5%BA%93/"/>
    
  </entry>
  
  <entry>
    <title>Git</title>
    <link href="https://spaces-x.github.io/2018/08/01/git/"/>
    <id>https://spaces-x.github.io/2018/08/01/git/</id>
    <published>2018-08-01T12:14:00.000Z</published>
    <updated>2018-08-21T05:43:44.483Z</updated>
    
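    <content type="html"><![CDATA[<h3 id="git-简介"><a href="#git-简介" class="headerlink" title="git 简介"></a>Introduction to Git</h3><p>What is Git?</p><p>Git is arguably the most advanced distributed version control system in the world today.</p><p>So what is a version control system?<br>If you have ever written a long document in Microsoft Word, you have surely been through this:<br>&emsp;You want to delete a paragraph but worry that you will not be able to get it back later. The workaround: "Save As..." a new Word file and keep editing, "Save As..." yet another file once enough has changed, and so on, until your Word documents end up looking like this:</p><p><img src="https://cdn.liaoxuefeng.com/cdn/files/attachments/0013848606651673ff1c83932d249118bf8fd5c58c15ca2000/0" alt=""></p><p>A week later you want the deleted text back, but you no longer remember which file held it before the deletion, so you have to go through the files one by one. What a hassle.</p><p>Looking at the pile of files, you would like to keep only the newest one and delete the rest, but you might need one of them some day, so you do not dare. Frustrating.</p><p>Worse still, some parts need to be filled in by a colleague in finance, so you copy the file to a USB stick for her (or email her a copy) and carry on editing your Word file. A day later she sends her file back, and now you have to recall what you changed between sending the file and receiving hers, and merge your changes with her part. Painful.</p><p>So you think: if only there were software that automatically recorded every change to the file and also let colleagues collaborate on it, there would be no pile of similar files to manage and no need to pass files back and forth. To look at a particular change, a glance in the software would be enough. Wouldn't that be convenient?</p><p>Such software would work something like this, recording every change to the file:</p><table><thead><tr><th style="text-align:center">Version</th><th style="text-align:center">File name</th><th style="text-align:center">User</th><th style="text-align:center">Description</th><th style="text-align:center">Date</th></tr></thead><tbody><tr><td style="text-align:center">1</td><td style="text-align:center">service.doc</td><td style="text-align:center">Zhang San</td><td style="text-align:center">Removed software service clause 5</td><td style="text-align:center">7/12 10:38</td></tr><tr><td style="text-align:center">2</td><td style="text-align:center">service.doc</td><td style="text-align:center">Zhang San</td><td style="text-align:center">Added a limit on the number of license users</td><td style="text-align:center">7/12 18:09</td></tr><tr><td style="text-align:center">3</td><td style="text-align:center">service.doc</td><td style="text-align:center">Li Si</td><td style="text-align:center">Finance department adjusted the contract amount</td><td style="text-align:center">7/13 9:51</td></tr><tr><td style="text-align:center">4</td><td style="text-align:center">service.doc</td><td style="text-align:center">Zhang San</td><td style="text-align:center">Extended the free upgrade period</td><td style="text-align:center">7/14 15:17</td></tr></tbody></table><h3 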
id="Git工作原理"><a href="#Git工作原理" class="headerlink" title="Git工作原理"></a>How Git Works</h3><h4 id="工作区（Working-Directory）"><a href="#工作区（Working-Directory）" class="headerlink" title="工作区（Working Directory）"></a>Working Directory</h4><p>The working directory is simply a directory on your computer, usually the one that holds the project.<br><img src="https://i.loli.net/2018/08/13/5b7170abb887a.png" alt="working directory"></p><h4 id="版本库（Repository）"><a href="#版本库（Repository）" class="headerlink" title="版本库（Repository）"></a>Repository</h4><p>The working directory contains a hidden directory, .git, which is not part of the working directory: it is Git's repository.<br>The repository stores many things. The most important are the staging area, called stage (or index), the first branch, master, which Git creates for us automatically, and a pointer to master called HEAD.<br>Step one is to add files with git add, which actually adds the file changes to the staging area;<br>step two is to commit the changes with git commit, which actually commits everything in the staging area to the current branch.</p><p><img src="https://cdn.liaoxuefeng.com/cdn/files/attachments/001384907702917346729e9afbf4127b6dfbae9207af016000/0" alt="工作逻辑流程"></p><h3 id="git-常用命令及作用"><a href="#git-常用命令及作用" class="headerlink" title="git 常用命令及作用"></a>Common Git Commands and What They Do</h3><ol><li>Before using git, everyone must supply some basic personal information so that git can tell committers apart.<br><code>git config --global user.name &quot;your name&quot;</code><br><code>git config --global user.email yourname@example.com</code></li><li>To start a new project, first create a directory; all development for the project then happens inside it.<br>cd workdir<br><code>git init  //creates the .git directory</code><br><code>git add . 
//add every file in the directory to the staging area (index)</code><br><code>git commit -m &quot;commit message&quot; //commit the current staging area (index) to the branch that HEAD points to</code><br><code>git commit -a //a shortcut: stages every modified or deleted tracked file and then commits; newly created (untracked) files are not included</code></li><li>Viewing changes<br><code>git diff --cached //show the differences between the index file and the repository</code><br><code>git diff          //without the --cached option, compares the working tree with the index file (staging area)</code><br><code>git status  //useful before git commit: shows which files have changed</code></li><li>Viewing the log<br><code>git log     //show the brief commit log</code><br><img src="https://s1.ax1x.com/2018/08/14/Pg7NNR.png" alt="log.png"><br><code>git log -p  //prints a very detailed log, including the source changes made by each commit</code><br><img src="https://s1.ax1x.com/2018/08/14/Pg70gK.png" alt="Pg70gK.png"><br>(only part of the detailed output is shown)<br><code>git show $commit_id //show the details of a commit compared with the previous commit, including what was changed</code><br><img src="https://s1.ax1x.com/2018/08/14/PgO57D.png" alt="PgO57D.png"><br>Show the details of a branch<br><code>git show BRANCH_NAME</code><br><img src="https://s1.ax1x.com/2018/08/14/PgLgL8.png" alt="PgLgL8.png"><br><code>git show HEAD^ //show the parent of HEAD</code><br><code>git show HEAD^^ //show the grandparent of HEAD</code><br><code>git show HEAD~4 //show the ancestor four generations above HEAD</code>  </li><li>Branches<br><code>git branch //list the existing branches; the one marked * is the current branch</code><br><code>git branch experimental //create an experimental branch named experimental</code><br><code>git checkout experimental //switch to the experimental branch</code><br>If development on the branch succeeds: modify the code, then<br><code>git commit -a //after improving the code on the experimental branch, commit it on that branch</code><br><code>git checkout master //switch back to the master branch</code><br><code>git merge experimental //the branch has proved successful, so merge the experimental branch into the main branch</code>  <img src="https://s1.ax1x.com/2018/08/14/PgHwIs.png" alt="conflict.png"><br><img src="https://s1.ax1x.com/2018/08/14/PgH3Gt.md.png" alt="a.c"><br>If there is a conflict, the conflicting parts must be fixed by hand; after the fix<br><img src="https://s1.ax1x.com/2018/08/14/PgHfo9.png" alt="a.c"><br><code>git commit -a //finish the merge by committing on the master branch</code><br>If everything is fine after the merge, the experimental branch can be deleted<br><code>git branch -d experimental //the experimental branch has been merged, so it can be deleted safely</code><br>If development on the branch fails:<br>git checkout master<br>git branch -D experimental 
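# the branch proved to be a failure, so use -D to abandon and delete it</li><li>Graphical interface<br><code>gitk</code><br>What gitk shows for the repository from item 5:<br><img src="https://s1.ax1x.com/2018/08/14/PgbkwQ.png" alt="PgbkwQ.png"><br><code>gitk --since=&quot;2 weeks ago&quot; drivers/</code>   shows in the GUI the commit history from two weeks ago until now, limited to the drivers directory  </li><li>Pulling<br>If I trust bob's development skills completely:<br><code>git pull /home/bob/myrepo</code><br>The pull command fetches from the remote git repository and then merges (git-merge) the result into my (rocrocket's) project. git-pull may be refused because of the permissions on the /home/bob directory; the fix is <code>chmod o+rx /home/bob</code><br>If I do not fully trust bob's development skills:<br><code>git fetch /home/bob/myrepo master:bobworks</code><br>This command fetches bob's changes and puts them into a bobworks branch in my (rocrocket's) repository. They go into a separate branch rather than master so that I can review bob's work carefully first. If I am satisfied, I merge it into master; if not, I can simply remove it with git branch -D.<br><code>git whatchanged -p master..bobworks # see what bob has done</code><br><code>git checkout master # switch to the master branch</code><br><code>git pull . bobworks # if I am satisfied after reviewing bob's work, pull merges the bobworks branch into my project</code><br><code>git branch -D bobworks # if I am not satisfied after reviewing bob's work, -D discards the branch</code><br>A few days later, if bob wants to keep helping, he first needs to sync the work I have done in the meantime. Running git pull in the myrepo directory he originally cloned is enough:<br><code>git pull # no arguments needed: at clone time git recorded the location of my (rocrocket's) repository, so it fetches from there directly</code>  </li><li>Remote repositories<br>A git repository can be kept locally and also uploaded to a remote github repository. To make uploading easier, first <a href="https://blog.csdn.net/qq_35246620/article/details/69061355" target="_blank" rel="noopener">configure an ssh key</a><br>Add a remote to the local git repository:<br><code>git remote add origin remote_repository_path (for example git@github.com:spaces-X/paper_version.git)</code><br>Push from the local repository to the remote:<br><code>git push origin local_branch:remote_branch</code><br><a href="https://imgchr.com/i/PgbytA" target="_blank" rel="noopener"><img src="https://s1.ax1x.com/2018/08/14/PgbytA.md.png" alt="PgbytA.md.png"></a><br>In the figure above, the local HEAD-&gt;master is one commit ahead of the remote origin-&gt;master; the push command above brings the remote up to date.<br>Fetch from the remote into the local repository:<br><code>git fetch origin remote_branch:local_branch</code><br><code>git checkout master # switch to the local master branch</code><br><code>git merge branch_name      # merge the branch into master</code>  </li><li><p>Tags and searching<br><code>git tag V3 $commit_id  # from now on V3 can be used in place of the unwieldy commit_id</code><br><img src="https://s1.ax1x.com/2018/08/14/PgOVFH.png" 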
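alt="tag.png">  </p><p>git grep helps us search:<br><code>git grep &quot;print&quot; V3 # search V3 for every line containing print</code><br><code>git grep &quot;print&quot; # with no revision given, search the tracked files in the working tree for lines containing print</code><br><code>git log V3..V7   # show all history after V3 up to and including V7</code><br><code>git log --since=&quot;2 weeks ago&quot;</code> shows all history from two weeks ago until now. For the exact revision syntax, see the help for the git-rev-parse command.</p></li></ol>]]></content>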
    
    <summary type="html">
    
      
      
        &lt;h3 id=&quot;git-简介&quot;&gt;&lt;a href=&quot;#git-简介&quot; class=&quot;headerlink&quot; title=&quot;git 简介&quot;&gt;&lt;/a&gt;git 简介&lt;/h3&gt;&lt;p&gt;Git是什么？&lt;/p&gt;
&lt;p&gt;Git是目前世界上最先进的分布式版本控制系统（没有之一）。&lt;/p&gt;
&lt;p&gt;那什
      
    
    </summary>
    
      <category term="Git 版本管理" scheme="https://spaces-x.github.io/categories/Git-%E7%89%88%E6%9C%AC%E7%AE%A1%E7%90%86/"/>
    
    
      <category term="linux" scheme="https://spaces-x.github.io/tags/linux/"/>
    
      <category term="git" scheme="https://spaces-x.github.io/tags/git/"/>
    
      <category term="版本管理" scheme="https://spaces-x.github.io/tags/%E7%89%88%E6%9C%AC%E7%AE%A1%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>Hadoop Day 2</title>
    <link href="https://spaces-x.github.io/2018/07/26/hadoop-d-2/"/>
    <id>https://spaces-x.github.io/2018/07/26/hadoop-d-2/</id>
    <published>2018-07-26T05:15:46.000Z</published>
    <updated>2018-08-21T05:43:48.339Z</updated>
    
    <content type="html"><![CDATA[<script type="text/javascript" src="toc/js/jquery-1.4.4.min.js"></script><script type="text/javascript" src="toc/js/jquery.ztree.all-3.5.min.js"></script><script type="text/javascript" src="toc/js/ztree_toc.js"></script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><h3 id="Hadoop集群部署"><a href="#Hadoop集群部署" class="headerlink" title="Hadoop集群部署"></a>Hadoop Cluster Deployment</h3><p>Deploying Hadoop involves the following main steps:</p><blockquote><ul><li><a href="https://jingyan.baidu.com/article/20095761d65c67cb0721b4a8.html" target="_blank" rel="noopener">Create a CentOS 7 virtual machine</a> and make 2 copies of it</li><li><a href="https://jingyan.baidu.com/article/03b2f78c6cd4cd5ea337ae11.html" target="_blank" rel="noopener">Configure NAT port forwarding for the virtual machines to make ssh access easier</a></li><li><a href="https://blog.csdn.net/pucao_cug/article/details/71698903#t1" target="_blank" rel="noopener">Edit the hosts file and set up passwordless ssh between the machines</a></li><li>Install <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html" target="_blank" rel="noopener">jdk8</a> and <a href="http://hadoop.apache.org/releases.html" target="_blank" rel="noopener">hadoop</a>, <a href="https://blog.csdn.net/pucao_cug/article/details/71698903#t11" target="_blank" rel="noopener">and edit the configuration files</a></li><li><a href="https://blog.csdn.net/pucao_cug/article/details/71698903#t18" target="_blank" rel="noopener">Start hadoop</a></li></ul></blockquote><h3 id="eclips-hadoop2-6-5-开发环境配置"><a href="#eclips-hadoop2-6-5-开发环境配置" class="headerlink" title="eclips + hadoop2.6.5 开发环境配置"></a>eclipse + hadoop2.6.5 development environment setup</h3><p>Required software:</p><ul><li style="list-style: none"><input type="checkbox" checked> <a href="http://www.eclipse.org/downloads/packages/release/luna/sr2" target="_blank" rel="noopener">eclipse JEE Version:Luna(4.4.2)</a></li><li style="list-style: none"><input type="checkbox" checked> <a href="https://github.com/winghc/hadoop2x-eclipse-plugin" target="_blank" 
rel="noopener">hadoop-eclipse-plugin-2.6.0.jar</a></li></ul><p>For how to configure eclipse, see <a href="https://blog.csdn.net/sl1992/article/details/53171342#1%E4%B8%8B%E8%BD%BD%E6%8F%92%E4%BB%B6%E5%8C%85" target="_blank" rel="noopener">"Setting Up a Hadoop 2.6.4 Development Environment with Eclipse on Windows"</a></p><p>A <a href="https://blog.csdn.net/sl1992/article/details/53171342#8%E5%88%9B%E5%BB%BAhadoop%E4%B8%AD%E7%9A%84mapreduce%E5%B7%A5%E7%A8%8B" target="_blank" rel="noopener">solution</a> for errors you may run into:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">log4j.rootLogger=debug,stdout,R  </span><br><span class="line">log4j.appender.stdout=org.apache.log4j.ConsoleAppender  </span><br><span class="line">log4j.appender.stdout.layout=org.apache.log4j.PatternLayout  </span><br><span class="line">log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n  </span><br><span class="line">log4j.appender.R=org.apache.log4j.RollingFileAppender  </span><br><span class="line">log4j.appender.R.File=mapreduce_test.log  </span><br><span class="line">log4j.appender.R.MaxFileSize=1MB  </span><br><span class="line">log4j.appender.R.MaxBackupIndex=1  </span><br><span class="line">log4j.appender.R.layout=org.apache.log4j.PatternLayout  </span><br><span class="line">log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n  </span><br><span class="line">log4j.logger.com.codefutures=DEBUG</span><br></pre></td></tr></table></figure><h3 id="Hadoop家族"><a href="#Hadoop家族" class="headerlink" 
title="Hadoop家族"></a>The Hadoop Family</h3><blockquote><ul><li>Pig</li><li>Zookeeper</li><li>Hbase</li><li>Hive</li><li>Sqoop</li><li>Avro</li><li>Chukwa</li><li>Cassandra</li></ul></blockquote><p><img src="https://s1.ax1x.com/2018/07/26/PNGULn.jpg" alt="hadoop family"></p><h4 id="Pig"><a href="#Pig" class="headerlink" title="Pig"></a>Pig</h4><p><img src="https://s1.ax1x.com/2018/07/26/PNJq9U.png" alt="Pig"></p><blockquote><ol><li>A Hadoop client</li><li>Uses Pig Latin, a SQL-like, dataflow-oriented language</li><li>Pig Latin supports sorting, filtering, summing, grouping, joining, and similar operations, as well as user-defined functions</li><li>Pig automatically maps Pig Latin to Map-Reduce jobs and submits them to the cluster, sparing users the trouble of writing Java programs</li><li>Three ways to run: the Grunt shell, scripts, and embedded mode</li></ol></blockquote><h4 id="Hbase"><a href="#Hbase" class="headerlink" title="Hbase"></a>Hbase</h4><p><img src="https://s1.ax1x.com/2018/07/26/PNJzH1.png" alt="Hbase"></p><blockquote><ol><li>An open-source implementation of Google Bigtable</li><li>A column-oriented database</li><li>Can be clustered</li><li>Accessible through the shell, the web, an API, and other means</li><li>Suited to workloads with heavy reads and writes (inserts)</li><li>HQL query language</li><li>A typical NoSQL product</li></ol></blockquote><h4 id="Hive"><a href="#Hive" class="headerlink" title="Hive"></a>Hive</h4><p><img src="https://s1.ax1x.com/2018/07/26/PNYJDs.png" alt="Hive"></p><blockquote><ol><li>A data warehouse tool that turns raw structured data stored in Hadoop into Hive tables</li><li>Supports HiveQL, a language almost identical to SQL. Apart from updates, indexes, and transactions, nearly every other SQL feature is supported</li><li>Can be viewed as a mapper from SQL to Map-Reduce</li><li>Provides shell, JDBC/ODBC, Thrift, Web, and other interfaces</li></ol></blockquote><h4 id="Zookeeper"><a href="#Zookeeper" class="headerlink" title="Zookeeper"></a>Zookeeper</h4><p><img src="https://s1.ax1x.com/2018/07/26/PNYyr9.png" alt="PNYyr9.png"></p><blockquote><ol><li>An open-source implementation of Google Chubby</li><li>Coordinates the services of a distributed system, for example confirming that messages arrive correctly, preventing single points of failure, and handling load balancing</li><li>Use cases: Hbase, and automatic Namenode failover</li><li>How it works: a leader, followers, and an election process</li></ol></blockquote><h4 id="Sqoop"><a href="#Sqoop" class="headerlink" title="Sqoop"></a>Sqoop</h4><p><img src="https://s1.ax1x.com/2018/07/26/PNYR56.png" alt="PNYR56.png"></p><blockquote><ol><li>Exchanges data between Hadoop and relational databases</li><li>Connects to relational databases through the JDBC interface</li></ol></blockquote><h4 id="Avro"><a href="#Avro" 
class="headerlink" title="Avro"></a>Avro</h4><blockquote><ol><li>A data serialization tool whose development was led by Doug Cutting, the creator of Hadoop</li><li>Built for applications that exchange data in bulk. It supports binary serialization and can process large amounts of data conveniently and quickly</li><li>Friendly to dynamic languages: the mechanisms Avro provides let dynamic languages handle Avro data easily</li><li>Thrift interface</li></ol></blockquote><h4 id="Chukwa"><a href="#Chukwa" class="headerlink" title="Chukwa"></a>Chukwa</h4><blockquote><ol><li>A data collection and analysis framework built on top of Hadoop</li><li>Mainly used for log collection and analysis</li><li>"Agents" installed on the collection nodes gather the raw log data</li><li>The agents send the data to collectors</li><li>The collectors periodically write the data into the Hadoop cluster</li><li>Scheduled Map-Reduce jobs then process and analyze the data</li><li>The Hadoop Infrastructure Care Center (HICC) finally displays the data</li></ol></blockquote><h4 id="Cassandra"><a href="#Cassandra" class="headerlink" title="Cassandra"></a>Cassandra</h4><blockquote><ol><li>A NoSQL, distributed key-value database contributed by Facebook</li><li>Like Hbase, it borrows from the ideas behind Google Bigtable</li><li>Designed with sequential writes only and no random writes, to meet the performance needs of heavy workloads</li></ol></blockquote><h3 id="hadoop-实例运行"><a href="#hadoop-实例运行" class="headerlink" title="hadoop 实例运行"></a>Running hadoop examples</h3><p>Code:<br><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span 
class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Example 1: word count</span></span><br><span 
class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * Licensed to the Apache Software Foundation (ASF) under one</span></span><br><span class="line"><span class="comment"> * or more contributor license agreements.  See the NOTICE file</span></span><br><span class="line"><span class="comment"> * distributed with this work for additional information</span></span><br><span class="line"><span class="comment"> * regarding copyright ownership.  The ASF licenses this file</span></span><br><span class="line"><span class="comment"> * to you under the Apache License, Version 2.0 (the</span></span><br><span class="line"><span class="comment"> * "License"); you may not use this file except in compliance</span></span><br><span class="line"><span class="comment"> * with the License.  You may obtain a copy of the License at</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> *     http://www.apache.org/licenses/LICENSE-2.0</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * Unless required by applicable law or agreed to in writing, software</span></span><br><span class="line"><span class="comment"> * distributed under the License is distributed on an "AS IS" BASIS,</span></span><br><span class="line"><span class="comment"> * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.</span></span><br><span class="line"><span class="comment"> * See the License for the specific language governing permissions and</span></span><br><span class="line"><span class="comment"> * limitations under the License.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="keyword">package</span> org.apache.hadoop.examples;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> 
java.io.IOException;</span><br><span class="line"><span class="keyword">import</span> java.util.StringTokenizer;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.conf.Configuration;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.fs.Path;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.IntWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Job;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Mapper;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Reducer;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.input.FileInputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.util.GenericOptionsParser;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">WordCount</span> </span>&#123;</span><br><span class="line"></span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">TokenizerMapper</span>                         // extends the <span class="title">Mapper</span> class and implements the <span class="title">map</span> method</span></span><br><span class="line"><span class="class">       <span class="keyword">extends</span> <span class="title">Mapper</span>&lt;<span class="title">Object</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span 
class="title">IntWritable</span>&gt;</span>&#123;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">final</span> <span class="keyword">static</span> IntWritable one = <span class="keyword">new</span> IntWritable(<span class="number">1</span>);</span><br><span class="line">    <span class="keyword">private</span> Text word = <span class="keyword">new</span> Text();</span><br><span class="line">      </span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(Object key, Text value, Context context</span></span></span><br><span class="line"><span class="function"><span class="params">                    )</span> <span class="keyword">throws</span> IOException, InterruptedException </span>&#123;</span><br><span class="line">      StringTokenizer itr = <span class="keyword">new</span> StringTokenizer(value.toString());</span><br><span class="line">      <span class="keyword">while</span> (itr.hasMoreTokens()) &#123;</span><br><span class="line">        word.set(itr.nextToken());</span><br><span class="line">        context.write(word, one);</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">    </span><br><span class="line">  <span class="keyword">public</span> <span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">IntSumReducer</span>                           // extends the <span class="title">Reducer</span> class and implements the <span class="title">reduce</span> method</span></span><br><span class="line"><span class="class">       <span class="keyword">extends</span> <span class="title">Reducer</span>&lt;<span class="title">Text</span>,<span class="title">IntWritable</span>,<span class="title">Text</span>,<span class="title">IntWritable</span>&gt; 
</span>&#123;  </span><br><span class="line">    <span class="keyword">private</span> IntWritable result = <span class="keyword">new</span> IntWritable();</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key, Iterable&lt;IntWritable&gt; values, </span></span></span><br><span class="line"><span class="function"><span class="params">                       Context context</span></span></span><br><span class="line"><span class="function"><span class="params">                       )</span> <span class="keyword">throws</span> IOException, InterruptedException </span>&#123;</span><br><span class="line">      <span class="keyword">int</span> sum = <span class="number">0</span>;</span><br><span class="line">      <span class="keyword">for</span> (IntWritable val : values) &#123;</span><br><span class="line">        sum += val.get();</span><br><span class="line">      &#125;</span><br><span class="line">      result.set(sum);</span><br><span class="line">      context.write(key, result);</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">    Configuration conf = <span class="keyword">new</span> Configuration();</span><br><span class="line">    String[] otherArgs = <span class="keyword">new</span> GenericOptionsParser(conf, args).getRemainingArgs();</span><br><span class="line">    <span class="keyword">if</span> (otherArgs.length &lt; <span class="number">2</span>) &#123;</span><br><span class="line">      System.err.println(<span 
class="string">"Usage: wordcount &lt;in&gt; [&lt;in&gt;...] &lt;out&gt;"</span>);</span><br><span class="line">      System.exit(<span class="number">2</span>);</span><br><span class="line">    &#125;</span><br><span class="line">    Job job = Job.getInstance(conf, <span class="string">"word count"</span>);</span><br><span class="line">    job.setJarByClass(WordCount.class);</span><br><span class="line">    job.setMapperClass(TokenizerMapper.class);</span><br><span class="line">    job.setCombinerClass(IntSumReducer.class);</span><br><span class="line">    job.setReducerClass(IntSumReducer.class);</span><br><span class="line">    job.setOutputKeyClass(Text.class);</span><br><span class="line">    job.setOutputValueClass(IntWritable.class);</span><br><span class="line">    <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; otherArgs.length - <span class="number">1</span>; ++i) &#123;</span><br><span class="line">      FileInputFormat.addInputPath(job, <span class="keyword">new</span> Path(otherArgs[i]));</span><br><span class="line">    &#125;</span><br><span class="line">    FileOutputFormat.setOutputPath(job,</span><br><span class="line">      <span class="keyword">new</span> Path(otherArgs[otherArgs.length - <span class="number">1</span>]));</span><br><span class="line">    System.exit(job.waitForCompletion(<span class="keyword">true</span>) ? 
<span class="number">0</span> : <span class="number">1</span>);</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span 
class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Example 2: organize a phone call list</span></span><br><span class="line"><span class="keyword">import</span> java.io.IOException;</span><br><span class="line"><span class="keyword">import</span> java.io.InputStream;</span><br><span class="line"><span class="keyword">import</span> java.io.OutputStream;</span><br><span class="line"><span class="keyword">import</span> java.util.StringTokenizer;</span><br><span class="line"><span class="keyword">import</span> javax.sound.midi.SysexMessage;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">//import javax.tools.Tool;</span></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.util.Tool;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.conf.Configuration;</span><br><span class="line"><span class="keyword">import</span> 
org.apache.hadoop.conf.Configured;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.fs.Path;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.IntWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Job;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Mapper;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Reducer;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.input.FileInputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.util.GenericOptionsParser;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.util.ToolRunner;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Test_2</span> <span class="keyword">extends</span> <span class="title">Configured</span> <span class="keyword">implements</span> <span class="title">Tool</span></span>&#123;</span><br><span class="line">    <span class="keyword">enum</span> Counter</span><br><span class="line">    &#123;</span><br><span class="line">        LINESKIP, <span class="comment">// number of malformed lines skipped</span></span><br><span 
class="line">    &#125;</span><br><span class="line">    <span class="keyword">public</span> <span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">Map</span> <span class="keyword">extends</span> <span class="title">Mapper</span>&lt;<span class="title">LongWritable</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span class="title">Text</span>&gt;</span>&#123;</span><br><span class="line">        <span class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(LongWritable key, Text value,</span></span></span><br><span class="line"><span class="function"><span class="params">                Mapper&lt;LongWritable, Text, Text, Text&gt;.Context context)</span></span></span><br><span class="line"><span class="function">                <span class="keyword">throws</span> IOException, InterruptedException</span>&#123;</span><br><span class="line">            <span class="comment">// TODO Auto-generated method stub</span></span><br><span class="line">            <span class="keyword">try</span> &#123;</span><br><span class="line">                String line = value.toString();</span><br><span class="line">                String[] linesplit = line.split(<span class="string">" "</span>);</span><br><span class="line">                String anum = linesplit[<span class="number">0</span>];</span><br><span class="line">                String bnum = linesplit[<span class="number">1</span>];</span><br><span class="line">                context.write(<span class="keyword">new</span> Text(bnum), <span class="keyword">new</span> Text(anum));</span><br><span class="line">            &#125; <span class="keyword">catch</span> (java.lang.ArrayIndexOutOfBoundsException e) &#123;</span><br><span class="line">                <span class="comment">// <span 
class="doctag">TODO:</span> handle exception</span></span><br><span class="line">                context.getCounter(Counter.LINESKIP).increment(<span class="number">1</span>);</span><br><span class="line">                <span class="keyword">return</span>;</span><br><span class="line">            &#125;       </span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">public</span> <span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">Reduce</span> <span class="keyword">extends</span> <span class="title">Reducer</span>&lt;<span class="title">Text</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span class="title">Text</span>&gt;</span>&#123;</span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key,Iterable&lt;Text&gt; values,Context context)</span> <span class="keyword">throws</span> IOException, InterruptedException</span></span><br><span class="line"><span class="function">        </span>&#123;</span><br><span class="line">            String valueString;</span><br><span class="line">            String out=<span class="string">""</span>;</span><br><span class="line">            <span class="keyword">for</span> (Text value:values) &#123;</span><br><span class="line">                valueString = value.toString();</span><br><span class="line">                out += valueString+<span class="string">"|"</span>; </span><br><span class="line">            &#125;</span><br><span class="line">            context.write(key, <span class="keyword">new</span> Text(out));</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">public</span> 
<span class="keyword">int</span> <span class="title">run</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">        Configuration conf = getConf();</span><br><span class="line">        Job job = <span class="keyword">new</span> Job(conf, <span class="string">"Test_2"</span>);</span><br><span class="line">        job.setJarByClass(Test_2.class);</span><br><span class="line">        <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i &lt; args.length-<span class="number">1</span>; i++) &#123;</span><br><span class="line">            FileInputFormat.addInputPath(job, <span class="keyword">new</span> Path(args[i]));</span><br><span class="line">        &#125;</span><br><span class="line">        FileOutputFormat.setOutputPath(job, <span class="keyword">new</span> Path(args[args.length-<span class="number">1</span>]));</span><br><span class="line">        job.setMapperClass(Map.class);</span><br><span class="line">        job.setReducerClass(Reduce.class);</span><br><span class="line"><span class="comment">//      job.setOutputFormatClass(TextOutputFormat.class);</span></span><br><span class="line">        job.setOutputKeyClass(Text.class);</span><br><span class="line">        job.setOutputValueClass(Text.class);</span><br><span class="line">        job.waitForCompletion(<span class="keyword">true</span>);</span><br><span class="line">        <span class="keyword">return</span> job.isSuccessful()?<span class="number">0</span>:<span class="number">1</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">        
System.out.println(args.length);</span><br><span class="line">        <span class="keyword">int</span> res = ToolRunner.run(<span class="keyword">new</span> Configuration(),<span class="keyword">new</span> Test_2(),args);</span><br><span class="line">        System.exit(res);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>]]></content>
    
    
      <category term="Hadoop" scheme="https://spaces-x.github.io/categories/Hadoop/"/>
    
    
      <category term="Hadoop部署" scheme="https://spaces-x.github.io/tags/Hadoop%E9%83%A8%E7%BD%B2/"/>
    
      <category term="Eclipse" scheme="https://spaces-x.github.io/tags/Eclipse/"/>
    
      <category term="Map-Reduce" scheme="https://spaces-x.github.io/tags/Map-Reduce/"/>
    
  </entry>
  
  <entry>
    <title>GFW</title>
    <link href="https://spaces-x.github.io/2018/07/21/GFW/"/>
    <id>https://spaces-x.github.io/2018/07/21/GFW/</id>
    <published>2018-07-21T07:36:31.000Z</published>
    <updated>2018-08-21T05:43:46.314Z</updated>
    
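    <content type="html"><![CDATA[<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><h3 id="常规的-http-或者是-https-请求被墙-假设它没有被墙-的网站"><a href="#常规的-http-或者是-https-请求被墙-假设它没有被墙-的网站" class="headerlink" title="常规的 http 或者是 https 请求被墙(假设它没有被墙)的网站"></a>How an ordinary HTTP or HTTPS request reaches a blocked site (assuming it were not blocked)</h3><ol><li>First, you type google.com into the browser and press Enter.</li><li>The browser issues a DNS query for the IP address of google.com (a TCP connection needs an IP address; domain names exist only because they are easier for people to remember). The DNS server looks up the address and returns it to the browser.</li><li>With the IP address in hand, the browser opens a TCP connection to it; after the three-way handshake the connection is established.</li><li>Now HTTP takes over. The browser sends the HTTP request headers to Google's server (the TCP connection is up, so data can flow). The request headers are just a block of text.<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">GET / HTTP/1.1</span><br><span class="line">Host: www.google.com</span><br><span class="line">User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6)</span><br><span class="line">Gecko/20050225 Firefox/1.0.1</span><br><span class="line">Connection: Keep-Alive</span><br><span class="line">...</span><br></pre></td></tr></table></figure>With plain HTTP, <strong>these request headers travel unencrypted</strong>. With HTTPS the headers themselves are encrypted, but the TLS handshake that precedes them usually carries the domain name in plaintext in the SNI field of the ClientHello, so <strong>the domain name is visible on the wire either way</strong>.</li><li>google.com receives the request (for HTTPS, the key exchange has already taken place during the TLS handshake, which is not covered here) and sends the data back.</li></ol><h3 id="在上述过程中墙的作用"><a href="#在上述过程中墙的作用" class="headerlink" title="在上述过程中墙的作用"></a>What the wall does during this process</h3><ol><li>When we type google.com, the browser first has to send a DNS query for its IP address. Without that address it cannot talk to the real google.com at all, so the wall's earliest technique was DNS poisoning. The browser can send all the DNS queries it likes; the domestic DNS servers simply answer with a wrong IP address, perhaps one in Iceland or Australia. The browser is left confused: the address may not have any server bound to it, and the result is an unreachable host. 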
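But DNS queries are sent from our own machine. If DNS answers can be poisoned, why not find out the correct IP address of google.com myself and hand it to the browser directly? That is exactly what people did when they first tried to get around the wall. On Windows there is a hosts file under <code>C:\Windows\System32\drivers\etc</code>; on macOS it is <code>/etc/hosts</code>. If the operating system finds an IP address for a domain name there, it does not query a DNS server at all, so everyone rushed to find the real IP addresses of google.com and put them in this file. This probably worked for a while, though I never experienced it myself: by the time I wanted to get around the wall, the IP addresses of google.com had already been blocked🧐. One thing I forgot to mention: there are many public DNS servers on the Internet, all free, the best known being 8.8.8.8, run by Google. You can set your operating system's DNS server yourself, and for a period it was presumably enough to change it to 8.8.8.8, since Google does not return wrong addresses. So later 8.8.8.8 was blocked as well👽.</li><li>The wall kept upgrading. Since people could get hold of the real IP addresses of google.com, it blocked every known one. But Google's servers can change IP addresses (try resolving google.com several times; the results may differ), so blocking only the known addresses cannot stop access completely. The wall therefore turned to the traffic itself. <strong>With HTTP the request headers are sent in plaintext, and with HTTPS the SNI field of the TLS ClientHello is usually plaintext too; both contain google.com</strong>. This is awkward, because the domain name cannot be hidden. The wall is delighted: every packet passing through gets opened and checked for a forbidden destination, and if it finds one the connection goes no further. The wall impersonates the google.com server and sends your computer a TCP RST; your TCP stack resets the connection unconditionally on seeing it. That is why you sometimes get the correct IP address for google.com and still see the browser report that the connection was reset. That RST came from the wall itself, quite an honor😀. The wall also impersonates the client and sends an RST to the google.com server, which assumes it came from you and closes the TCP connection unconditionally as well.</li></ol><h3 id="SS如何穿墙"><a href="#SS如何穿墙" class="headerlink" title="SS如何穿墙"></a>How SS gets through the wall</h3><p>There is a proxy protocol called <code>socks5</code> that can help you get through a firewall: <code>socks5</code> relays the data between the two sides.<br>shadowsocks splits socks5 into two parts:</p><blockquote><p>client --&gt; ssclient --&gt; ssserver --&gt; server</p></blockquote><p>The client hands its data to ssclient, which is why the socks5 proxy address you configure is the address ssclient listens on. ssclient encrypts everything, DNS-related data included, and sends it to ssserver (ss has no handshake phase, so there is no obvious signature that identifies the traffic as ss). ssserver decrypts the data, learns which domain the client wants, issues the DNS query for its IP address (which is why your VPS has to be outside the country), and connects to that address. When the server responds, ssserver does not interpret the data; it simply encrypts it and passes it back to ssclient. ssclient decrypts it and hands it to the browser, which does the actual interpretation (HTTPS, for instance, still needs its own handshake).</p><h3 id="服务器的购买以及shadowsocks配置"><a href="#服务器的购买以及shadowsocks配置" class="headerlink" title="服务器的购买以及shadowsocks配置"></a>Buying a server and configuring shadowsocks</h3><p><a href="http://blog.sina.com.cn/s/blog_17639fae00102xat4.html" 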
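target="_blank" rel="noopener">Buying a server</a><br>The link above includes BandwagonHost's one-click ss setup, but that built-in setup is too basic: it lacks many encryption modes, which makes the IP easy to get blocked. Use the link below to configure ss or ssr instead.</p><p><a href="https://teddysun.com/486.html/comment-page-10" target="_blank" rel="noopener">Configuring a shadowsocks server</a><br>The script in this tutorial bundles the various shadowsocks versions; just pick one during installation.</p><p><a href="https://zhuanlan.zhihu.com/p/35147877" target="_blank" rel="noopener">Choosing an encryption mode and configuring the shadowsocks client</a></p>]]></content>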
    
    
      <category term="GFW" scheme="https://spaces-x.github.io/categories/GFW/"/>
    
    
      <category term="Great Firewall" scheme="https://spaces-x.github.io/tags/Great-Firewall/"/>
    
      <category term="shadowsocks" scheme="https://spaces-x.github.io/tags/shadowsocks/"/>
    
  </entry>
  
  <entry>
    <title>Hadoop Day 1</title>
    <link href="https://spaces-x.github.io/2018/07/16/hadoop/"/>
    <id>https://spaces-x.github.io/2018/07/16/hadoop/</id>
    <published>2018-07-16T10:36:01.000Z</published>
    <updated>2018-08-21T05:43:49.884Z</updated>
    
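    <content type="html"><![CDATA[<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><h3 id="Hadoop起源"><a href="#Hadoop起源" class="headerlink" title="Hadoop起源:"></a>Origins of Hadoop:</h3><p>Google's approach to low cost:</p><blockquote><ol><li>No supercomputers and no dedicated storage appliances (compare Taobao's path of dropping IBM, EMC, and Oracle)</li><li>Large numbers of ordinary PC servers, providing clustered services with redundancy</li><li>Multiple data centers around the world, some with their own power plants</li><li>Carriers (China Unicom, China Telecom) pay Google rather than the other way around</li></ol></blockquote><p><img src="https://s1.ax1x.com/2018/07/16/PQ5HbD.png" alt="google hard problem"><br>Hadoop can be thought of as a knock-off of Google's infrastructure. It is based on three Google papers (which address the problems in the figure above):</p><ol><li>GFS (Google File System)</li><li>MapReduce</li><li>Bigtable</li></ol><p>GFS is the prototype of HDFS, and Bigtable is the prototype of HBase.</p><p>MapReduce is closely tied to PageRank, which addresses the problem of quantifying the value of a web page: Google builds a mathematical model of page value and uses it to order search results (covered below). Solving that model involves multiplying matrices with millions of rows and columns, which no single machine can do within a response time measured in seconds. That need motivated the Map-Reduce idea of distributed processing, and it is where Map-Reduce in Hadoop comes from.</p><h3 id="倒排索引："><a href="#倒排索引：" class="headerlink" title="倒排索引："></a>Inverted index:</h3><p>Google searches over an enormous amount of data. Intuitively one would expect a search to scan the whole database, but that could not deliver Google's millisecond-level response times. Google relies on an inverted index instead. As the name suggests, an inverted index works the opposite way from a normal one: rather than looking up attribute values from a record, it looks up the positions of records from an attribute value.<br>A simple example in English; these are the texts being indexed:</p><ul><li>T<sub>0</sub>: “it is what it is”</li><li>T<sub>1</sub>: “what is it”</li><li>T<sub>2</sub>: “it is a banana”</li></ul><p>Tokenizing them gives the following inverted index:</p><blockquote><p>“a”:      {2}<br>“banana”: {2}<br>“is”:     {0, 1, 2}<br>“it”:     {0, 1, 2}<br>“what”:   {0, 1}  </p></blockquote><p>Searching for “what is it” then becomes taking the intersection of the sets for its keywords, that is:<br>$$\{0,1\} \cap \{0,1,2\} \cap \{0,1,2\} = \{0,1\}$$</p><h3 id="PageRank"><a href="#PageRank" class="headerlink" title="PageRank:"></a>PageRank:</h3><p><a href="https://en.wikipedia.org/wiki/PageRank" target="_blank" rel="noopener">PageRank</a> quantifies the value of different web pages, mainly by counting how many other pages link to a given page. It is similar to how a journal's impact factor is computed: the more other pages link to a page, that is, the more often it is cited by other pages, the higher its PageRank.</p><p>The PageRank algorithm in detail:<br><img src="https://s1.ax1x.com/2018/07/16/PQIJR1.png" alt="pagerank"><br><img src="https://s1.ax1x.com/2018/07/16/PQIUsK.png" alt="pagerank"><br><img 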
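src="https://s1.ax1x.com/2018/07/16/PQILLT.png" alt="pagerank"></p><p>In the last figure, q is the PageRank solution being sought: q is the eigenvector of the matrix G with eigenvalue 1. It can be computed by initializing q<sup>cur</sup> randomly and iterating until it converges.</p>]]></content>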
    
    
      <category term="Hadoop" scheme="https://spaces-x.github.io/categories/Hadoop/"/>
    
    
      <category term="Hadoop" scheme="https://spaces-x.github.io/tags/Hadoop/"/>
    
      <category term="Google,PageRank" scheme="https://spaces-x.github.io/tags/Google-PageRank/"/>
    
  </entry>
  
  <entry>
    <title>Networking Basics: Analyzing TCP Connection Establishment</title>
    <link href="https://spaces-x.github.io/2018/07/16/tcp/"/>
    <id>https://spaces-x.github.io/2018/07/16/tcp/</id>
    <published>2018-07-16T03:43:32.000Z</published>
    <updated>2018-08-21T05:43:45.496Z</updated>
    
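    <content type="html"><![CDATA[<h3 id="问题描述："><a href="#问题描述：" class="headerlink" title="问题描述："></a>Problem description:</h3><p>A Java client and server communicate over sockets; the server uses NIO.</p><blockquote><p>1. Intermittently, a client completes the three-way handshake with the server, but the server's selector never sees the connection.<br>2. When the problem occurs, many connections hit it at the same moment.<br>3. The selector is never destroyed and rebuilt; the same one is used throughout.<br>4. Some failures always occur right after the program starts, and then recur intermittently.</p></blockquote><h3 id="正常的TCP建立连接三次握手的过程："><a href="#正常的TCP建立连接三次握手的过程：" class="headerlink" title="正常的TCP建立连接三次握手的过程："></a>The normal TCP three-way handshake:</h3><p><img src="https://s1.ax1x.com/2018/08/08/PsgpjK.png" alt="tcp三次握手"></p><ul><li>Step 1: the client sends a SYN to the server to start the handshake;</li><li>Step 2: the server receives the SYN and replies to the client with SYN+ACK;</li><li>Step 3: the client receives the SYN+ACK and replies with an ACK acknowledging it (at this point the connection on the client's port 56911 is already ESTABLISHED).</li></ul><p>From the description, this looks like the accept queue (the queue of fully established connections, explained below) filling up during TCP connection setup, especially symptoms 2 and 4. To confirm this, I immediately ran netstat -s | egrep "listen" to check the queue overflow statistics:</p><p><img src="https://s1.ax1x.com/2018/08/08/PsgPBD.png" alt=""></p><p>After checking several times, the overflowed count kept increasing, so the accept queue on the server was clearly overflowing.<br>Next, check how the OS handles an overflow:</p><p><img src="https://s1.ax1x.com/2018/08/08/Psge3t.png" alt=""></p><p><strong>tcp_abort_on_overflow = 0 means that if the accept queue is full at step 3 of the handshake, the server drops the ACK sent by the client (from the server's point of view the connection has not been established yet).</strong></p><p>To prove that the exceptions in the client application were related to a full accept queue, I first changed tcp_abort_on_overflow to 1. With 1, if the accept queue is full at step 3, the server sends a reset packet to the client, abandoning the handshake and the connection (which was never established on the server side anyway).</p><p>On retesting, the client exceptions now included many connection reset by peer errors, which proves that this was the cause of the client errors (rigorous logic that quickly confirms the key point of the problem).</p><p>The developers then went through the Java source code and found that the default socket backlog (the value that controls the size of the accept queue, detailed below) is 50. They increased it and reran the test; over more than 12 hours of stress testing the error did not occur once, and the overflowed counter stopped increasing.</p><p>That solved the problem. In short: after the TCP three-way handshake there is an accept queue, and a connection must enter this queue before the listening application can accept it. The default backlog is 50, which fills up easily. When it is full, the server ignores the client's ACK at step 3 of the handshake (after a while the server resends the step 2 SYN+ACK to the client), and if the connection never gets into the queue, it fails.</p><blockquote><p>But solving the problem is not enough. It is worth reviewing the whole process: which pieces of knowledge involved did I lack or understand poorly, and besides the exceptions above, are there clearer indicators for spotting and confirming this problem?</p></blockquote><h3 id="深入理解TCP握手过程中建连接的流程和队列"><a href="#深入理解TCP握手过程中建连接的流程和队列" class="headerlink" title="深入理解TCP握手过程中建连接的流程和队列"></a><font 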
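color="#FF7F50">Understanding the connection setup flow and queues in the TCP handshake</font></h3><p><img src="https://s1.ax1x.com/2018/08/08/PsgKu8.jpg" alt=""><br>As the figure shows, there are two queues: the SYN queue (half-open connections) and the accept queue (fully established connections).</p><p>In the three-way handshake, when the server receives the client's SYN in step 1, it puts the connection information into the SYN queue and replies with SYN+ACK (step 2).</p><p>In step 3 the server receives the client's ACK. If the accept queue is not full, the server moves the connection information from the SYN queue into the accept queue; otherwise it does whatever tcp_abort_on_overflow specifies.</p><p>If the accept queue is full and tcp_abort_on_overflow is 0, the server resends the SYN+ACK to the client after a while (that is, it repeats step 2 of the handshake). If the client's timeout is short, the client easily ends up with an error.</p><p>On our OS the default number of retries for step 2 is 2 (the CentOS default is 5):<br><img src="https://s1.ax1x.com/2018/08/08/PsgdDU.png" alt=""></p><h3 id="如果TCP连接队列溢出，有哪些指标可以看呢？"><a href="#如果TCP连接队列溢出，有哪些指标可以看呢？" class="headerlink" title="如果TCP连接队列溢出，有哪些指标可以看呢？"></a><font color="#FF7F50">Which indicators show that a TCP connection queue is overflowing?</font></h3><p>The troubleshooting above was roundabout and may be confusing. If a similar problem shows up again, is there a faster and more definite way to confirm it? (Concrete, tangible things help reinforce our understanding and retention of the concepts.)</p><h4 id="netstat-s"><a href="#netstat-s" class="headerlink" title="netstat -s"></a>netstat -s</h4><p><img src="https://s1.ax1x.com/2018/08/08/PsgwbF.jpg" alt=""><br>The "667399 times" shown above, for example, is the number of times the accept queue has overflowed. Run the command every few seconds; if the number keeps growing, the accept queue is definitely filling up from time to time.</p><h4 id="ss-命令"><a href="#ss-命令" class="headerlink" title="ss 命令"></a>The ss command</h4><p><img src="https://s1.ax1x.com/2018/08/08/PsgBE4.jpg" alt=""><br>The Send-Q value in the second column is 50, meaning the accept queue on the listening port shown in the third column holds at most 50 connections; Recv-Q in the first column is how much of the accept queue is currently in use.</p><p>The size of the accept queue is determined by min(backlog, somaxconn). 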
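backlog is passed in by the application when the listening socket is created, and somaxconn is an OS-level system parameter.</p><p>At this point we can connect this back to our code: when you create a ServerSocket in Java, for instance, you can pass in the backlog value:<br><img src="https://s1.ax1x.com/2018/08/08/PsgbxP.jpg" alt=""></p><h4 id="半连接队列的大小取决于：max-64-proc-sys-net-ipv4-tcp-max-syn-backlog-，不同版本的os会有些差异。"><a href="#半连接队列的大小取决于：max-64-proc-sys-net-ipv4-tcp-max-syn-backlog-，不同版本的os会有些差异。" class="headerlink" title="半连接队列的大小取决于：max(64,  /proc/sys/net/ipv4/tcp_max_syn_backlog)，不同版本的os会有些差异。"></a>The size of the SYN queue is determined by max(64, /proc/sys/net/ipv4/tcp_max_syn_backlog), with some differences between OS versions.</h4><blockquote><p>When writing code, we never think about this backlog, or most of the time we simply do not set it (so it defaults to 50) and ignore it. First, this is a blind spot. Second, you may once have seen the parameter in some article and remembered it for a while, only to forget it later: the knowledge was never connected to anything and never became part of a system. But if, like me, you first suffer through the problem, then let that pressure and pain drive you to find out why, and manage to reason the answer through from the code level down to the OS level, then you have really grasped it, and it becomes a solid anchor from which your knowledge of TCP and performance can keep growing.</p></blockquote><h4 id="netstat-命令"><a href="#netstat-命令" class="headerlink" title="netstat 命令"></a>The netstat command</h4><p>Like ss, netstat also shows Send-Q and Recv-Q. For a connection that is not in the LISTEN state, however, Recv-Q is the number of bytes that have been received and are still buffered, not yet read by the process, and Send-Q is the number of bytes in the send queue not yet acknowledged by the remote host.</p><p><img src="https://s1.ax1x.com/2018/08/08/Psgcgx.jpg" alt=""><br>The Recv-Q shown by netstat -tn has nothing to do with the accept queue or the SYN queue. It is called out here because it is easily confused with the Recv-Q shown by ss -lnt.</p><p>For example, if the Recv-Q shown by netstat -t below has a lot of data piled up, the usual cause is that the CPU cannot keep up:<br><img src="https://s1.ax1x.com/2018/08/08/PsgfbD.jpg" alt=""><br>The above approaches the accept queue through concrete tools and indicators (an engineering-efficiency approach).</p><h4 id="实践验证一下上面的理解"><a href="#实践验证一下上面的理解" class="headerlink" title="实践验证一下上面的理解"></a><font color="#FF7F50">Verifying the understanding in practice</font></h4><p>Change the backlog in the Java code to 10 (the smaller it is, the easier it overflows) and keep running the load. The client starts reporting exceptions again, and on the server the ss command shows:</p><p><img src="https://s1.ax1x.com/2018/08/08/Psg4Ve.jpg" alt=""><br>Following the reasoning above, the accept queue of the service on port 3306 holds at most 10 connections, yet 11 are in the queue or waiting to enter it. One connection cannot get in and has to overflow, and indeed the overflow counter keeps rising.</p><h4 id="Tomcat和Nginx中的Accept队列参数"><a href="#Tomcat和Nginx中的Accept队列参数" class="headerlink" title="Tomcat和Nginx中的Accept队列参数"></a><font color="#FF7F50">Accept queue parameters in Tomcat and Nginx</font></h4><p>Tomcat uses short-lived connections by default, and its backlog (in Tomcat terminology, Accept 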
count) defaults to 200 in Ali-tomcat and 100 in Apache Tomcat.</p><p><img src="https://s1.ax1x.com/2018/08/08/Psg78I.jpg" alt=""><br>The Nginx default is 511:<br><img src="https://s1.ax1x.com/2018/08/08/PsgH2t.jpg" alt=""><br>Because Nginx runs in multi-process mode, port 8085 shows up several times: multiple processes listen on the same port to avoid context switches as far as possible and improve performance.</p><h4 id="总结"><a href="#总结" class="headerlink" title="总结"></a><font color="#FF7F50">Summary</font></h4><p>Overflow of the accept queue and the SYN queue is easy to overlook but critical, and it is especially likely to hit applications that use short-lived connections (such as Nginx and PHP, although they also support persistent connections). Once a queue overflows, CPU and thread states all look normal, yet throughput will not go up. From the client's side the rt is high (rt = network + queueing + actual service time), while the actual service time recorded in the server logs is short.</p><p>The default backlog in the JDK, Netty, and some other frameworks is fairly small, which can limit performance in some situations.</p><p>I hope this article helps you understand the concepts, principles, and roles of the SYN queue and the accept queue in TCP connection setup and, more importantly, which indicators expose these problems clearly (engineering practice helps reinforce the understanding of the theory).</p><p>Every concrete problem is also the best opportunity to learn. Reading alone never gives a deep enough understanding, so make the most of each concrete problem: trace it from beginning to end when you run into one, and treat it as a chance to master that piece of knowledge.</p><h4 id="参考文章"><a href="#参考文章" class="headerlink" title="参考文章:"></a>References:</h4><blockquote><p><a href="http://www.cnxct.com/something-about-phpfpm-s-backlog/" target="_blank" rel="noopener">http://www.cnxct.com/something-about-phpfpm-s-backlog/</a><br><a href="http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html" target="_blank" rel="noopener">http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html</a><br><a href="http://www.cnblogs.com/zengkefu/p/5606696.html" target="_blank" rel="noopener">http://www.cnblogs.com/zengkefu/p/5606696.html</a><br><a href="http://jaseywang.me/2014/07/20/tcp-queue-%E7%9A%84%E4%B8%80%E4%BA%9B%E9%97%AE%E9%A2%98/" target="_blank" rel="noopener">http://jaseywang.me/2014/07/20/tcp-queue-%E7%9A%84%E4%B8%80%E4%BA%9B%E9%97%AE%E9%A2%98/</a><br><a href="http://jin-yang.github.io/blog/network-synack-queue.html#" target="_blank" rel="noopener">http://jin-yang.github.io/blog/network-synack-queue.html#</a><br><a href="http://blog.chinaunix.net/uid-20662820-id-4154399.html" target="_blank" 
rel="noopener">http://blog.chinaunix.net/uid-20662820-id-4154399.html</a></p></blockquote>]]></content>
    
    
      <category term="计算机网络" scheme="https://spaces-x.github.io/categories/%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C/"/>
    
    
      <category term="TCP,Socket通信" scheme="https://spaces-x.github.io/tags/TCP-Socket%E9%80%9A%E4%BF%A1/"/>
    
      <category term="计算机网络" scheme="https://spaces-x.github.io/tags/%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C/"/>
    
  </entry>
  
  <entry>
    <title>Welcome</title>
    <link href="https://spaces-x.github.io/2018/07/15/welcome/"/>
    <id>https://spaces-x.github.io/2018/07/15/welcome/</id>
    <published>2018-07-15T11:37:35.000Z</published>
    <updated>2019-01-18T15:02:13.162Z</updated>
    
    <content type="html"><![CDATA[<p>Welcome to Space-X, a static blog built with hexo, mainly for sharing experiences from everyday life and study.</p><h2 id="学习"><a href="#学习" class="headerlink" title="学习"></a>Learning</h2><p>Learning is hard.</p><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default"></script><p>My <a href="https://github.com/spaces-x" target="_blank" rel="noopener">GitHub</a></p><p><a href="https://www.coursera.org/" target="_blank" rel="noopener">Coursera</a> open courses</p><h2 id="生活"><a href="#生活" class="headerlink" title="生活"></a>Life</h2><p>Life is simple.</p><p>Nothing in life is more relaxing than watching my favorite, <a href="https://www.bilibili.com/" target="_blank" rel="noopener">bilibili</a><br>Douban <a href="https://book.douban.com/" target="_blank" rel="noopener">Books</a> <a href="https://movie.douban.com/" target="_blank" rel="noopener">Movies</a></p><h3 id="日剧"><a href="#日剧" class="headerlink" title="日剧"></a>Japanese drama</h3><p><img src="https://s1.ax1x.com/2018/07/26/PNlGut.jpg" alt="unnatural"></p><p><a href="https://www.bilibili.com/bangumi/play/ep204577" target="_blank" rel="noopener">Unnatural</a></p><p>It tells the story of Mikoto Misumi, a forensic pathologist at the "Unnatural Death Investigation Laboratory", who works with her colleagues to find the true cause of unnatural deaths and so helps the people involved.<br>It also touches on a number of real problems in society. Personally, I think it is very good.<br>Whenever the song Lemon starts playing, I am close to tears. There are also many lines that I like and that are worth thinking about.</p><iframe frameborder="no" border="0" marginwidth="0" marginheight="0" width="550" height="100" src="//music.163.com/outchain/player?type=2&id=536622304&auto=1&height=66"></iframe><h3 id="毕业"><a href="#毕业" class="headerlink" title="毕业"></a>Graduation</h3><p><a href="https://www.bilibili.com/video/av25515914" target="_blank" rel="noopener">Forever 1413</a><br>I am very lucky to have met you all~</p>]]></content>
    
    
      <category term="welcome" scheme="https://spaces-x.github.io/categories/welcome/"/>
    
    
      <category term="Learning" scheme="https://spaces-x.github.io/tags/Learning/"/>
    
      <category term="Life" scheme="https://spaces-x.github.io/tags/Life/"/>
    
      <category term="hobbies" scheme="https://spaces-x.github.io/tags/hobbies/"/>
    
  </entry>
  
</feed>
