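Because of work requirements, I had been using WebSocket at my company to connect to a large language model, parsing its streaming response and returning the data. A colleague responsible for natural language processing then told me that a POST request can also return the model's answer as a stream. After wiring this up in a Java program and debugging it with Postman, I found that the Chinese text in the response was garbled. I could not find the cause at first, and other work came up, so I shelved the problem. When I had some spare time I came back to it, tried a different approach, and finally got rid of the garbled text. I am sharing this small demo here.
First, the parameters that need to be passed to the model: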
{
    "chatId": "my_chatId",
    "stream": true,
    "detail": true,
    "responseChatItemId": "my_responseChatItemId",
    "variables": {
        "uid": "unique ID",
        "name": "any name"
    },
    "messages": [
        {
            "role": "user",
            "content": "question text"
        } # append earlier history messages after this entry
    ]
}
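The messages array is where the conversation history goes. As a rough sketch only (the HistoryMessages class and its message and buildMessages helpers are hypothetical names that are not part of the original demo, and plain Maps stand in for the JSON library), earlier turns can be copied in order and the new user question appended last:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HistoryMessages {

    // Builds one {"role": ..., "content": ...} entry (hypothetical helper).
    static Map<String, String> message(String role, String content) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("role", role);
        m.put("content", content);
        return m;
    }

    // Copies earlier turns (oldest first) and appends the new user question last.
    // The history list passed in is not modified.
    static List<Map<String, String>> buildMessages(List<Map<String, String>> history, String question) {
        List<Map<String, String>> messages = new ArrayList<>(history);
        messages.add(message("user", question));
        return messages;
    }

    public static void main(String[] args) {
        List<Map<String, String>> history = List.of(
                message("user", "What is SSE?"),
                message("assistant", "Server-Sent Events, a one-way HTTP stream."));
        System.out.println(buildMessages(history, "How do I parse it in Java?"));
    }
}
```

The resulting list can then be converted into the JSON array that is placed under the messages key of the request body.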
1. Client implementation one: connecting to the model (forwarding the model's raw messages directly)
public void callAIStreamOrg(String userMessage, SseEmitter emitter) {
    OkHttpClient client = new OkHttpClient();
    MediaType mediaType = MediaType.parse("application/json");

    // Build the request body
    JSONObject requestBody = new JSONObject();
    requestBody.put("chatId", "my_chatId");
    requestBody.put("stream", true);
    requestBody.put("detail", true);
    requestBody.put("responseChatItemId", "my_responseChatItemId");

    JSONObject variables = new JSONObject();
    variables.put("uid", "edison_56465123843");
    variables.put("name", "edison");
    requestBody.put("variables", variables);

    JSONArray messages = new JSONArray();
    JSONObject message = new JSONObject();
    message.put("role", "user");
    message.put("content", userMessage);
    messages.add(message);
    requestBody.put("messages", messages);

    RequestBody body = RequestBody.create(mediaType, requestBody.toJSONString());
    Request request = new Request.Builder()
            .url("http://IP:3000/api/v1/chat/completions") // replace with your own model address
            .post(body)
            .addHeader("Authorization", "Bearer xxxxxxxxxxxxxxxxxxx") // replace with your own token
            .addHeader("Content-Type", "application/json")
            .build();

    client.newCall(request).enqueue(new Callback() {
        @Override
        public void onFailure(Call call, IOException e) {
            try {
                emitter.completeWithError(e);
            } catch (Exception ex) {
                log.error("Failed to process message: {}", ex.getMessage());
            }
        }

        @Override
        public void onResponse(Call call, Response response) {
            // Decode explicitly as UTF-8 (import java.nio.charset.StandardCharsets).
            // Without a charset, InputStreamReader uses the platform default,
            // which can garble Chinese text.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(response.body().byteStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("data: ")) {
                        String json = line.substring(6);
                        if ("[DONE]".equals(json)) break; // end-of-stream marker
                        emitter.send(json);
                    }
                }
                emitter.complete();
            } catch (Exception e) {
                try {
                    emitter.completeWithError(e);
                } catch (Exception ex) {
                    log.error("Failed to process message: {}", ex.getMessage());
                }
            }
        }
    });
}
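This forwards the data following OpenAI's streaming response convention: a received data payload of [DONE] marks the end of the stream. Since this is only a demo, I did not build the message history from the request parameters; the single message is hard-coded in the client class. If you need conversation history, you will have to build it yourself.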
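The line-by-line rule described above can be tried without a real model. The following is a minimal sketch, assuming a hypothetical SseLineParser class and a hand-written sample stream in place of the HTTP response. It decodes the bytes explicitly as UTF-8, because an InputStreamReader created without a charset falls back to the platform default, which is one common source of garbled Chinese text:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SseLineParser {

    private static final String DATA_PREFIX = "data: ";
    private static final String DONE = "[DONE]";

    // Reads lines as UTF-8 and returns every "data: " payload up to the [DONE] marker.
    static List<String> readPayloads(InputStream in) {
        List<String> payloads = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith(DATA_PREFIX)) continue; // skip blank lines and other fields
                String payload = line.substring(DATA_PREFIX.length());
                if (DONE.equals(payload)) break;             // end of stream
                payloads.add(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return payloads;
    }

    public static void main(String[] args) {
        // Sample stream; \u4f60\u597d and \u4e16\u754c are Chinese words written as Unicode escapes.
        String stream = "data: {\"content\":\"\u4f60\u597d\"}\n\n"
                + "data: {\"content\":\"\u4e16\u754c\"}\n\n"
                + "data: [DONE]\n\n"
                + "data: ignored\n";
        InputStream in = new ByteArrayInputStream(stream.getBytes(StandardCharsets.UTF_8));
        readPayloads(in).forEach(System.out::println);
    }
}
```

Anything after the [DONE] line is never read, which matches the break in the loop of the client code.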
Example of the raw messages returned:
{
    "id": "",
    "object": "",
    "created": 0,
    "model": "",
    "choices": [
        {
            "delta": {
                "role": "assistant",
                "reasoning_content": "streamed reasoning data..."
            },
            "index": 0,
            "finish_reason": null
        }
    ]
}
{
    "id": "",
    "object": "",
    "created": 0,
    "model": "",
    "choices": [
        {
            "delta": {
                "role": "assistant",
                "content": "streamed answer data..."
            },
            "index": 0,
            "finish_reason": null # becomes "stop" once the answer has finished
        }
    ]
}
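2. Client implementation two: connecting to the model (processing messages before forwarding)
After receiving a message we can process it and then forward it in our own format. For example, we can define a custom response body that distinguishes the current state of the stream (started, answering, finished, and so on).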
public void callAIStream(String userMessage, SseEmitter emitter) {
    OkHttpClient client = new OkHttpClient();
    MediaType mediaType = MediaType.parse("application/json");

    // Build the request body
    JSONObject requestBody = new JSONObject();
    requestBody.put("chatId", "my_chatId");
    requestBody.put("stream", true);
    requestBody.put("detail", true);
    requestBody.put("responseChatItemId", "my_responseChatItemId");

    JSONObject variables = new JSONObject();
    variables.put("uid", "edison_5641563123512635");
    variables.put("name", "edison");
    requestBody.put("variables", variables);

    JSONArray messages = new JSONArray();
    JSONObject message = new JSONObject();
    message.put("role", "user");
    message.put("content", userMessage);
    messages.add(message);
    requestBody.put("messages", messages);

    RequestBody body = RequestBody.create(mediaType, requestBody.toJSONString());
    Request request = new Request.Builder()
            .url("http://localhost:8080/api/v1/chat/completions")
            .post(body)
            .addHeader("Authorization", "Bearer xxxxxxxxxxxxxxxxxxxxxxxx")
            .addHeader("Content-Type", "application/json")
            .build();

    client.newCall(request).enqueue(new Callback() {
        @Override
        public void onFailure(@NotNull Call call, @NotNull IOException e) {
            try {
                emitter.completeWithError(e);
            } catch (Exception ex) {
                log.error("Failed to process message: {}", ex.getMessage());
            }
        }

        @Override
        public void onResponse(@NotNull Call call, @NotNull Response response) {
            // Decode explicitly as UTF-8 (import java.nio.charset.StandardCharsets).
            // Without a charset, InputStreamReader uses the platform default,
            // which can garble Chinese text.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(response.body().byteStream(), StandardCharsets.UTF_8))) {
                StringBuilder contentBuffer = new StringBuilder();      // full answer sent so far
                StringBuilder thinkBuffer = new StringBuilder();        // full reasoning sent so far
                StringBuilder tempContentBuffer = new StringBuilder();  // pending answer fragment
                StringBuilder tempThinkingBuffer = new StringBuilder(); // pending reasoning fragment

                emitter.send(createJsonMessage("start", null, null));

                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("data: ")) {
                        String jsonStr = line.substring(6);
                        if ("[DONE]".equals(jsonStr)) break; // end-of-stream marker

                        JSONObject json = JSONObject.parseObject(jsonStr);
                        JSONArray choices = json.getJSONArray("choices");
                        if (choices != null && !choices.isEmpty()) {
                            JSONObject delta = choices.getJSONObject(0).getJSONObject("delta");
                            if (delta != null) {
                                // Reasoning content (reasoning_content); the null check also
                                // covers a key that is present with a null value
                                String thinking = delta.getString("reasoning_content");
                                if (thinking != null) {
                                    tempThinkingBuffer.append(thinking);
                                    if (shouldFlush(tempThinkingBuffer.toString())) {
                                        thinkBuffer.append(tempThinkingBuffer);
                                        emitter.send(createJsonMessage("thinking", contentBuffer.toString(), thinkBuffer.toString()));
                                        tempThinkingBuffer.setLength(0);
                                    }
                                }
                                // Regular answer content (content)
                                String content = delta.getString("content");
                                if (content != null) {
                                    tempContentBuffer.append(content);
                                    if (shouldFlush(tempContentBuffer.toString())) {
                                        contentBuffer.append(tempContentBuffer);
                                        emitter.send(createJsonMessage("sending", contentBuffer.toString(), thinkBuffer.toString()));
                                        tempContentBuffer.setLength(0);
                                    }
                                }
                            }
                        }
                    }
                }

                // Send whatever is still pending
                if (tempThinkingBuffer.length() > 0) {
                    thinkBuffer.append(tempThinkingBuffer);
                    emitter.send(createJsonMessage("thinking", contentBuffer.toString(), thinkBuffer.toString()));
                }
                if (tempContentBuffer.length() > 0) {
                    contentBuffer.append(tempContentBuffer);
                    emitter.send(createJsonMessage("sending", contentBuffer.toString(), thinkBuffer.toString()));
                }

                emitter.send(createJsonMessage("end", null, null));
                emitter.complete();
            } catch (Exception e) {
                try {
                    emitter.completeWithError(e);
                } catch (Exception ex) {
                    log.error("Failed to process message: {}", ex.getMessage());
                }
            }
        }
    });
}
/**
 * Decides whether the pending text should be flushed to the client.
 * @param text the pending text
 * @return true if the text should be flushed
 */
private boolean shouldFlush(String text) {
    // Punctuation marks that trigger a flush; extend as needed
    String punctuationRegex = "[。!?;]";
    boolean hasPunctuation = Pattern.compile(punctuationRegex).matcher(text).find();
    boolean lengthEnough = text.length() >= 15;
    return hasPunctuation || lengthEnough;
}

private String createJsonMessage(String status, String content, String thinkContent) {
    JSONObject msg = new JSONObject();
    msg.put("status", status);
    msg.put("content", content);
    msg.put("think_content", thinkContent);
    return msg.toJSONString();
}
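In this code I concatenate the returned data: StringBuilder joins the scattered answer fragments, and I control when to send, forwarding only once the pending fragment reaches my threshold (a punctuation mark or 15 characters). This also reduces the number of messages pushed over the network. Both the reasoning content and the answer content are handled this way.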
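To see the buffering rule in isolation, here is a small self-contained sketch. The FlushBufferDemo class, the batch helper, and the sample chunks are illustrative assumptions and not part of the client above; the punctuation set is also an assumed one (full-width and ASCII sentence-ending marks). It feeds a list of fragments through the same "punctuation or 15 characters" check and records the cumulative text that would be sent at each flush:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FlushBufferDemo {

    // Punctuation that triggers a flush (assumed set, written as Unicode escapes; extend as needed).
    private static final Pattern PUNCTUATION = Pattern.compile("[\u3002\uff01\uff1f\uff1b!?;]");
    private static final int MAX_PENDING = 15;

    // True when the pending text is long enough or contains a punctuation mark.
    static boolean shouldFlush(CharSequence pending) {
        return pending.length() >= MAX_PENDING || PUNCTUATION.matcher(pending).find();
    }

    // Returns the cumulative text that would be sent at each flush.
    static List<String> batch(List<String> chunks) {
        List<String> sent = new ArrayList<>();
        StringBuilder total = new StringBuilder();   // everything sent so far
        StringBuilder pending = new StringBuilder(); // fragments not yet sent
        for (String chunk : chunks) {
            pending.append(chunk);
            if (shouldFlush(pending)) {
                total.append(pending);
                pending.setLength(0);
                sent.add(total.toString());
            }
        }
        if (pending.length() > 0) { // send whatever is left at the end of the stream
            total.append(pending);
            sent.add(total.toString());
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(batch(List.of("Hel", "lo", "!", " How", " are", " you")));
    }
}
```

With the six sample fragments, only two messages are produced: one when the exclamation mark arrives and one for the tail at the end. Compiling the pattern once as a constant, rather than on every call, is a small design choice that avoids repeated regex compilation on a hot path.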
3. Controller layer implementation
import com.edison.interaction.client.OpenAIStreamClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
@RequestMapping("/ai")
public class AITestController {

    private final OpenAIStreamClient aiClient = new OpenAIStreamClient();

    // Forwards processed messages (implementation two)
    @GetMapping("/stream")
    public SseEmitter streamAI(@RequestParam("message") String message) {
        SseEmitter emitter = new SseEmitter(0L); // no timeout
        aiClient.callAIStream(message, emitter);
        return emitter;
    }

    // Forwards the model's raw messages (implementation one)
    @GetMapping("/streamOrg")
    public SseEmitter streamAIOrg(@RequestParam("message") String message) {
        SseEmitter emitter = new SseEmitter(0L); // no timeout
        aiClient.callAIStreamOrg(message, emitter);
        return emitter;
    }
}
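With this, a Java program can forward and process the model's streamed output. Further requirements, such as persisting conversation history or tracing the sources of the model's answers, need to be implemented according to your specific business needs.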