Usage
Common construction
private HttpClient newHttpClient() {
Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder
.<ConnectionSocketFactory>create()
.register("http", PlainConnectionSocketFactory.INSTANCE)
.register("https", Objects.nonNull(this.connectionSocketFactory) ? this.connectionSocketFactory : SSLConnectionSocketFactory.getSocketFactory()).build();
// Configure the connection pool
PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager(socketFactoryRegistry);
// Maximum total connections
connManager.setMaxTotal(this.maxConnPerTotal);
// Maximum connections per route
connManager.setDefaultMaxPerRoute(this.maxConnPerRoute);
return HttpClients.custom().setKeepAliveStrategy(new DefaultDurationConnectionKeepAliveStrategy())
.setConnectionManager(connManager).build();
}
Request configuration
protected RequestConfig buildRequestConfig() {
return RequestConfig.custom()
.setConnectTimeout(this.connectTimeout)
.setSocketTimeout(this.socketTimeout)
.setCookieSpec(CookieSpecs.STANDARD)
.build();
}
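The configuration above bounds the connect and socket phases only. As a hedged addition, RequestConfig in HttpClient 4.3+ also offers a connection-request timeout, which limits how long a caller waits to lease a connection from the pool. The class name and the 5000 ms value below are assumptions for illustration and do not come from the original code; the snippet needs httpclient 4.x on the classpath.

```java
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;

final class RequestConfigSketch {
    // Hypothetical helper: the same settings as above plus a pool-lease timeout.
    static RequestConfig build(int connectTimeoutMs, int socketTimeoutMs) {
        return RequestConfig.custom()
                // Assumed value: give up after 5 s instead of waiting indefinitely for a pooled connection
                .setConnectionRequestTimeout(5000)
                .setConnectTimeout(connectTimeoutMs)
                .setSocketTimeout(socketTimeoutMs)
                .setCookieSpec(CookieSpecs.STANDARD)
                .build();
    }
}
```

With such a limit in place, an exhausted pool surfaces as an exception in the caller rather than as a thread parked without a deadline.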
Response handling
Note: EntityUtils.toString closes the content stream when it finishes, which releases the connection
private final ResponseHandler<String> responseHandler = response -> {
int status = response.getStatusLine().getStatusCode();
HttpEntity entity;
if (status >= HttpStatus.SC_OK && status < HttpStatus.SC_MULTIPLE_CHOICES) {
entity = response.getEntity();
return entity != null ? EntityUtils.toString(entity) : null;
} else {
String errorMsg = null;
if ((entity = response.getEntity()) != null) {
errorMsg = EntityUtils.toString(entity);
}
throw new HttpResponseException(status, errorMsg);
}
};
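Model
PoolEntry
Holds one pooled connection together with its route
- route: the route, for example org.apache.http.conn.routing.HttpRoute
- conn: one connection in the pool
- state: state information. It can be set when a connection is released and specified again when one is leased, so this field can be used to customize the connection reuse policy
- created: creation timestamp
- updated: last-update timestamp
- validityDeadline: timestamp after which the PoolEntry is no longer valid
- expiry: expiration timestamp. It plays the same role as validityDeadline, except that expiry can be modified while validityDeadline cannot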
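The relationship between the two deadlines can be sketched as follows. This is a simplified, hypothetical model (PoolEntryModel is not the real PoolEntry class); it assumes that every update clamps expiry to validityDeadline, which is how the fields are described above.

```java
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model of the two deadline fields; not the real PoolEntry. */
final class PoolEntryModel {
    private final long validityDeadline; // fixed at construction
    private long updated;                // last time the expiry was recomputed
    private long expiry;                 // adjustable, but never later than validityDeadline

    PoolEntryModel(long nowMillis, long timeToLiveMillis) {
        this.updated = nowMillis;
        this.validityDeadline = timeToLiveMillis > 0 ? nowMillis + timeToLiveMillis : Long.MAX_VALUE;
        this.expiry = this.validityDeadline;
    }

    /** Recomputes expiry from a keep-alive duration; a non-positive duration means "no keep-alive limit". */
    void updateExpiry(long nowMillis, long keepAlive, TimeUnit unit) {
        this.updated = nowMillis;
        long newExpiry = keepAlive > 0 ? nowMillis + unit.toMillis(keepAlive) : Long.MAX_VALUE;
        this.expiry = Math.min(newExpiry, this.validityDeadline);
    }

    boolean isExpired(long nowMillis) { return nowMillis >= expiry; }
    long getExpiry() { return expiry; }
    long getUpdated() { return updated; }
    long getValidityDeadline() { return validityDeadline; }
}
```

In this model a keep-alive hint from the server can shorten the lifetime of an entry, but can never extend it past the time-to-live chosen when the entry was created.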
RouteSpecificPool
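The per-route connection pool; a subset of the connection pool CPool
Properties
- available: a linked list of reusable PoolEntry objects (each holding a pooled connection and its route). Entries move here from leased when a leased connection is released
- leased: the set of leased entries. An entry arrives here either when a new connection is created, which happens while the number allocated in this RouteSpecificPool is below maxPerRoute and the total is below maxTotal, or when a reusable connection is taken from available
- pending: when the number allocated in this RouteSpecificPool has reached maxPerRoute, or the total has reached maxTotal, the future of the PoolEntry request is parked in pending until a resource frees up and it is woken. If a timeout is set and no resource is obtained in time, a timeout exception is thrown: throw new TimeoutException("Timeout waiting for connection"). Note: if no timeout is set, the request waits forever
Methods
- getFree: returns a free resource. If the result is not null, the LeaseRequest is marked completed (isDone=true) and the resource is moved from available to leased in CPool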
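The interplay of available, leased and pending described above can be sketched with a small single-route model. MiniRoutePool is a hypothetical teaching class built on java.util.concurrent, not the HttpCore implementation; connections are represented by plain integers, and only the exception message is taken from the text above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical single-route pool; a teaching model, not the HttpCore implementation. */
final class MiniRoutePool {
    private final int maxPerRoute;
    private final Deque<Integer> available = new ArrayDeque<>(); // idle, reusable entries
    private final Set<Integer> leased = new HashSet<>();         // entries handed out to callers
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();     // waiters here play the role of "pending"
    private int nextId = 0;

    MiniRoutePool(int maxPerRoute) { this.maxPerRoute = maxPerRoute; }

    /** A timeoutMillis <= 0 means "wait forever". */
    int lease(long timeoutMillis) throws InterruptedException, TimeoutException {
        lock.lock();
        try {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                Integer entry = available.pollFirst();            // reuse an idle entry first
                if (entry == null && leased.size() < maxPerRoute) {
                    entry = nextId++;                             // room left: create a new entry
                }
                if (entry != null) {
                    leased.add(entry);
                    return entry;
                }
                if (timeoutMillis <= 0) {
                    released.await();                             // no deadline: park until a release
                } else {
                    if (remainingNanos <= 0) {
                        throw new TimeoutException("Timeout waiting for connection");
                    }
                    remainingNanos = released.awaitNanos(remainingNanos);
                }
            }
        } finally {
            lock.unlock();
        }
    }

    /** Moves an entry from leased back to available and wakes one waiter. */
    void release(int entry) {
        lock.lock();
        try {
            if (leased.remove(entry)) {
                available.addFirst(entry);
                released.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    int leasedCount() { lock.lock(); try { return leased.size(); } finally { lock.unlock(); } }
    int availableCount() { lock.lock(); try { return available.size(); } finally { lock.unlock(); } }
}
```

If every leased entry is leaked (never passed to release), a caller using lease(0) parks on the condition indefinitely, while a caller with a positive timeout gets the TimeoutException instead.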
CPool
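The connection pool
Properties
- maxTotal: the global maximum number of resources
- available: RouteSpecificPool.available is a subset of it
- leased: RouteSpecificPool.leased is a subset of it
- pending: RouteSpecificPool.pending is a subset of it
- leasingRequests: linked list of lease requests in progress. A LeaseRequest is added to this list while it has not completed and processPendingRequest has not finished with it
- completedRequests: queue of completed requests. A LeaseRequest is added to this queue once it completes
- totalUsed=maxTotal-pending-leased
Methods
- fireCallbacks: processes the completedRequests queue
- release: releases a leased resource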
HttpResponseProxy
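A proxy for the response. It enhances the plain response by replacing its entity with the proxy ResponseEntityProxy. The main enhancements of ResponseEntityProxy are:
- It implements the EofSensorWatcher interface, so it can watch the InputStream and release the connection when the stream is closed, aborted, or reaches EOF
- It extends HttpEntityWrapper and overrides getContent to wrap the original InputStream in an EofSensorInputStream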
public HttpResponseProxy(final HttpResponse original, final ConnectionHolder connHolder) {
this.original = original;
this.connHolder = connHolder;
// Enhance the response entity
ResponseEntityProxy.enchance(original, connHolder);
}
public static void enchance(final HttpResponse response, final ConnectionHolder connHolder) {
final HttpEntity entity = response.getEntity();
if (entity != null && entity.isStreaming() && connHolder != null) {
response.setEntity(new ResponseEntityProxy(entity, connHolder));
}
}
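The idea behind wrapping the content stream can be sketched with the JDK alone. ReleasingInputStream and its Runnable callback are hypothetical stand-ins for EofSensorInputStream and the watcher; the sketch only shows the principle that reaching EOF or closing the stream triggers the release exactly once.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for the EOF-sensing stream idea; not the HttpClient class. */
final class ReleasingInputStream extends FilterInputStream {
    private final Runnable releaseConnection; // assumed callback that returns the connection to the pool
    private boolean released = false;

    ReleasingInputStream(InputStream in, Runnable releaseConnection) {
        super(in);
        this.releaseConnection = releaseConnection;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == -1) {
            releaseOnce(); // EOF detected
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            releaseOnce(); // EOF detected
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            releaseOnce(); // closing early also releases
        }
    }

    private void releaseOnce() {
        if (!released) {
            released = true;
            releaseConnection.run();
        }
    }
}
```

A caller that neither reads such a stream to the end nor closes it never triggers the callback, which is the leak scenario discussed later in this document.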
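Flow
Connection pool: acquiring a connection
Why is an available connection closed only when totalAvailable > freeCapacity - 1?
Because once execution reaches this point, a new connection is created immediately afterwards, which reduces freeCapacity by 1. Without the -1, the connection count could exceed the limit by 1 (each route can overflow by 1, so the more routes, the larger the overflow)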
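The off-by-one argument above can be checked with plain counters. The sketch below is a hypothetical model: EvictionCheck, its parameters, and the assumption that freeCapacity equals maxTotal minus the connections currently in use are all introduced here for illustration.

```java
/** Hypothetical counter-only model of the eviction check; no real connections involved. */
final class EvictionCheck {
    /**
     * Returns the number of pooled connections (in use + idle) after one more is created.
     * subtractOne selects the comparison "available > freeCapacity - 1";
     * otherwise "available > freeCapacity" is used.
     */
    static int totalAfterCreate(int maxTotal, int inUse, int available, boolean subtractOne) {
        int freeCapacity = Math.max(maxTotal - inUse, 0); // assumption: capacity not currently in use
        if (freeCapacity == 0) {
            throw new IllegalStateException("no free capacity; the request would wait in pending");
        }
        int threshold = subtractOne ? freeCapacity - 1 : freeCapacity;
        if (available > threshold && available > 0) {
            available--; // close one idle connection to make room
        }
        inUse++;         // the newly created connection is handed out immediately
        return inUse + available;
    }
}
```

With maxTotal = 3, one connection in use and two idle, the -1 variant evicts one idle connection and ends at 3 connections, while the variant without -1 evicts nothing and ends at 4.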
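Sending the request to the server
- If staleConnectionCheckEnabled is on, check whether the connection is stale and close it if so
- If HttpExecutionAware is not null, call back org.apache.http.client.methods.HttpExecutionAware#setCancellable
- If HttpExecutionAware is not null, check whether the request has been aborted; if so, throw new RequestAbortedException("Request aborted")
- If the connection is not open, establish the route to the target: org.apache.http.impl.execchain.MainClientExec#establishRoute
- Set the socketTimeout
- Check again whether the request has been aborted
- Handle authentication: org.apache.http.impl.auth.HttpAuthenticator#generateAuthResponse
- Execute the request: org.apache.http.protocol.HttpRequestExecutor
- Catch IOException/HttpException/RuntimeException; if one occurs, close the connection
- Apply the connection reuse strategy: org.apache.http.ConnectionReuseStrategy#keepAlive
- If authentication is needed: org.apache.http.impl.execchain.MainClientExec#needAuthentication
  - If the connection can be reused, consume the response entity and close its stream
  - Otherwise, close the connection
- Handle the userToken
- If the response HttpEntity is null, or is not streaming (org.apache.http.HttpEntity#isStreaming)
  - Release the connection and return a response proxy (without the connection connHolder): org.apache.http.impl.execchain.HttpResponseProxy#HttpResponseProxy
- Otherwise, return a response proxy (with the connection connHolder): org.apache.http.impl.execchain.HttpResponseProxy#HttpResponseProxy
- Catch exceptions
  - HttpException/IOException/RuntimeException/Error: release the connection
  - ConnectionShutdownException: the connection does not need to be released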
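Releasing the connection
Summary of when httpclient releases a connection
- When one of the following exceptions occurs while executing the request
  - During request execution: IOException/HttpException/RuntimeException
  - During the whole request + response cycle: HttpException/IOException/RuntimeException/Error
- After the request completes, when the response HttpEntity is null
- After the request completes, when the response HttpEntity is not null but is not streaming
- After the request completes, once the HttpEntity has been fully consumed, for example by EntityUtils.toString(entity)
Note: after the request completes, if the response HttpEntity is not null and is streaming, the connection is not released automatically; the client must release it itself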
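For callers that only need part of the response, one way to release the connection explicitly is sketched below, assuming HttpClient 4.3+ and a CloseableHttpClient. StatusOnlyRequest and fetchStatus are hypothetical names; the library calls used (CloseableHttpClient#execute, EntityUtils#consume, CloseableHttpResponse#close) are existing HttpClient 4.x APIs, and the snippet needs httpclient 4.x on the classpath.

```java
import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;

final class StatusOnlyRequest {
    /** Hypothetical helper: returns only the status code, but still hands the connection back. */
    static int fetchStatus(CloseableHttpClient client, String uri) throws IOException {
        // try-with-resources closes the response on every path, including exceptions
        try (CloseableHttpResponse response = client.execute(new HttpGet(uri))) {
            // Drain a streaming body if there is one (for example an error page) so the
            // connection can be reused; consume() does nothing when the entity is null
            EntityUtils.consume(response.getEntity());
            return response.getStatusLine().getStatusCode();
        }
    }
}
```

Draining before closing lets the pool keep the connection alive for reuse; closing without draining still frees the pool slot, but the underlying connection is typically discarded rather than reused.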
A production connection-leak case
Tomcat: tomcat-embed-core-8.5.72.jar
Client code
Long-polling thread
new KafkaThread("KM-LONG-POLLING", new LongPollingTask(), true).start();
class LongPollingTask implements Runnable {
@Override
public void run() {
while (true) {
try {
int httpResponseCode = RestClient.longPollingStatus(KM_CONFIG_LISTENER_URL);
if (HttpStatus.SC_CREATED == httpResponseCode) {
CONFIG.topicsAndClusterInfo = loadFromKM();
CONFIG.version.getAndIncrement();
log.info("KM Config Has Change:{} ", CONFIG.topicsAndClusterInfo);
}
log.debug("KM long polling .................");
} catch (Exception e) {
log.error("KM long polling failed , ", e);
}
}
}
}
Requesting data from the server
public static int longPollingStatus(String uri) {
try {
HttpResponse httpResponse = LONG_REST_CLIENT.get(uri, HttpResponse.class);
return httpResponse.getStatusLine().getStatusCode();
} catch (Exception e) {
}
return HttpStatus.SC_BAD_GATEWAY;
}
HttpClient utility class
- socketTimeout: 60 s
- connection pool size maxPerRoute: 2
public <T> T get(String uri, Class<T> clazz, Map<String, ?> params, Map<String, String> headers) throws RestClientException {
String aUri = buildUrl(uri, params);
log.debug("Start GET uri[{}] with params[{}] header[{}]", uri, params, headers);
String body = null;
try {
HttpGet httpGet = buildHttpGet(aUri, headers);
log.debug("Done GET uri[{}] with params[{}] header[{}], and response is {}", uri, params, headers, body);
if (clazz == HttpResponse.class) {
return (T) httpClient.execute(httpGet);
}
body = executeRequest(httpGet);
if (clazz == String.class) {
return (T) body;
}
if (body != null) {
return JacksonUtil.from(body, clazz);
}
} catch (Exception e) {
log.warn("Occur error when GET uri[{}] with params[{}] headers[{}], response is {}, error: {}", uri, params, headers, body, e);
throw new RestClientException(e.getMessage(), e);
}
return null;
}
Server code
Controller
@RequestMapping("/listener")
public void fetchTopicAndClusterInfo(@RequestParam String appName,
HttpServletRequest request,
HttpServletResponse response) {
topicAndClusterInfoManager.addAsyncRequest(renameAppName(appName), request, response);
}
Asynchronous request
public void addAsyncRequest(String appName, HttpServletRequest request, HttpServletResponse response) {
AsyncContext asyncContext = request.startAsync(request, response);
AsyncTask asyncTask = new AsyncTask(asyncContext, true);
APPID_CONTEXT.put(appName, asyncTask);
// Write a 304 response after 30 s
TIMEOUT_CHECKER.schedule(() -> {
if (asyncTask.isTimeout()) {
try {
APPID_CONTEXT.remove(appName, asyncTask);
response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
asyncContext.complete();
} catch (Exception e) {
}
}
}, 30, TimeUnit.SECONDS);
}
Why does the connection leak in this case?
The leak does not always occur: of 9 production instances, 3 leaked
Thread dump from the scene
"KM-LONG-POLLING" #90 daemon prio=5 os_prio=0 cpu=7591.61ms elapsed=3429768.95s tid=0x00007f52e0a86d30 nid=0x11f waiting on condition [0x00007f52ce2c8000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.10/Native Method)
- parking to wait for <0x00001000bead9b18> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.10/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.10/AbstractQueuedSynchronizer.java:2081)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
- locked <0x00001000de891e20> (a org.apache.http.pool.AbstractConnPool$2)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:304)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:280)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at ....framework.net.RestClient.get(RestClient.java:315)
at ....framework.net.RestClient.get(RestClient.java:161)
at ....framework.net.RestClient.get(RestClient.java:154)
at ....kafka.core.util.RestClient.longPollingStatus(RestClient.java:31)
at ....kafka.core.metadata.KafkaMetaDataManager$LongPollingTask.run(KafkaMetaDataManager.java:171)
at java.lang.Thread.run(java.base@11.0.10/Thread.java:829)
Locked ownable synchronizers:
- None
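All 3 machines were stuck waiting to acquire a connection. The thread runs an infinite loop and should not be in the WAITING state; moreover, the task is single-threaded and the pool size is 2, so it should never have to wait for a connection. This points to a connection leak. But why?
Client timeout?
- The client timeout is 60 s
- The server's scheduled task fires at 30 s, so the client is bound to receive status code 304 at around 30 s
- The client does not time out; ruled out
Client not closing the connection itself?
- A normal response has no HttpEntity, so the connection is released automatically; ruled out
- Whether the client hits a timeout exception or receives data normally, httpclient releases the connection automatically; ruled out
- Reproducing locally and capturing packets showed that the server indeed did not answer normally: it returned a 500 status code, which matches the scenario in which the connection is not released automatically
Root cause located: a 5xx response from the server triggers the connection leak.
Why does the server respond with a 5xx error?
A look at the server's asynchronous request code turns up something fishy. The code is org.apache.catalina.connector.Request#startAsync(javax.servlet.ServletRequest, javax.servlet.ServletResponse):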
@Override
public AsyncContext startAsync(ServletRequest request,
ServletResponse response) {
if (!isAsyncSupported()) {
IllegalStateException ise =
new IllegalStateException(sm.getString("request.asyncNotSupported"));
log.warn(sm.getString("coyoteRequest.noAsync",
StringUtils.join(getNonAsyncClassNames())), ise);
throw ise;
}
if (asyncContext == null) {
asyncContext = new AsyncContextImpl(this);
}
asyncContext.setStarted(getContext(), request, response,
request==getRequest() && response==getResponse().getResponse());
asyncContext.setTimeout(getConnector().getAsyncTimeout());
return asyncContext;
}
Sure enough, Tomcat's asynchronous request implementation has its own timeout, and that timeout happens to be 30 s
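Summary
Causes of the connection leak
- The client did not release the connection itself
- The 30 s delay of the server's long-polling scheduled task coincides with Tomcat's 30 s asynchronous request timeout
- When the data is not ready, the server's scheduled task (ScheduledThreadPoolExecutor) fires at 30 s. While tasks are being dispatched, some have not yet been scheduled to run when Tomcat's asynchronous request timeout expires, so Tomcat responds to the client first. The client does not release the connection on receiving this error response, and the connection leaks
Solutions
- Release the connection explicitly
- Set the client timeout below 30 s
- Timeout monitoring: run a scheduled task that periodically checks whether a connection has timed out and releases it if so
The third option is recommended. For a similar implementation see the tomcat-jdbc connection pool, which has a mechanism for evicting timed-out connections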
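A minimal sketch of the timeout-monitoring idea, using only the JDK. LeaseWatchdog, its method names and the reclaim callback are assumptions made for illustration; how a reclaimed lease is actually closed (for example by closing the response) is left to the callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical lease watchdog; names and callback are assumptions, not an existing API. */
final class LeaseWatchdog<C> {
    private final Map<C, Long> leasedAt = new ConcurrentHashMap<>(); // lease start time per connection
    private final long maxLeaseMillis;
    private final Consumer<C> reclaim; // e.g. close the response so the connection is released

    LeaseWatchdog(long maxLeaseMillis, Consumer<C> reclaim) {
        this.maxLeaseMillis = maxLeaseMillis;
        this.reclaim = reclaim;
    }

    void onLease(C conn, long nowMillis) { leasedAt.put(conn, nowMillis); }

    void onRelease(C conn) { leasedAt.remove(conn); }

    /** Reclaims every lease held longer than maxLeaseMillis and returns what was reclaimed. */
    List<C> sweep(long nowMillis) {
        List<C> reclaimed = new ArrayList<>();
        for (Map.Entry<C, Long> e : leasedAt.entrySet()) {
            boolean overdue = nowMillis - e.getValue() > maxLeaseMillis;
            // remove(key, value) guards against a concurrent release or re-lease
            if (overdue && leasedAt.remove(e.getKey(), e.getValue())) {
                reclaim.accept(e.getKey());
                reclaimed.add(e.getKey());
            }
        }
        return reclaimed;
    }
}
```

In a real client, sweep(System.currentTimeMillis()) would be driven periodically, for example by a ScheduledExecutorService, and maxLeaseMillis should be chosen longer than the longest legitimate request, such as the 60 s socket timeout in the case above.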