Handling Streaming Responses from a Large Language Model over POST in Java

Latest recommended article published on 2025-07-13 15:16:25

License: CC 4.0 BY-SA

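Because of work requirements, my company had been using WebSocket to connect to the large model: we parsed its streaming response and then returned the data. A colleague responsible for natural language processing then told me that a POST request can also return the model's answer as a stream. After I wired this up in Java and tested it with Postman, the Chinese text in the response came back garbled. I could not find the cause, and other work came up, so I put the problem aside. Later, in my spare time, I came back to it, tried a different approach, and finally fixed the garbled text. Here is the small demo.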

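First, the parameters that need to be passed to the large model. To include conversation history, append further entries to the messages array:
{
    "chatId": "my_chatId",
    "stream": true,
    "detail": true,
    "responseChatItemId": "my_responseChatItemId",
    "variables": {
        "uid": "unique ID",
        "name": "any name"
    },
    "messages": [
        {
            "role": "user",
            "content": "question text"
        }
    ]
}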
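
As a rough illustration of how the messages array could be assembled when earlier turns are available, here is a minimal sketch using only JDK collections. The class and method names (ChatMessages, message, withHistory) are hypothetical, and the oldest-first ordering with the new question last is an assumption borrowed from OpenAI-style APIs, not something this endpoint's documentation confirms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChatMessages {

    /** Builds one entry of the messages array (hypothetical helper). */
    public static Map<String, String> message(String role, String content) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("role", role);
        entry.put("content", content);
        return entry;
    }

    /**
     * Returns a new list holding the earlier turns (assumed oldest first)
     * followed by the new user question. The input list is not modified.
     */
    public static List<Map<String, String>> withHistory(
            List<Map<String, String>> history, String question) {
        List<Map<String, String>> messages = new ArrayList<>(history);
        messages.add(message("user", question));
        return messages;
    }
}
```

The resulting list has the same shape as the messages field above, so it can be placed into the request body with whichever JSON library the project uses.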

1. Client implementation, version one (forward the model's raw messages directly)

public void callAIStreamOrg(String userMessage, SseEmitter emitter) {
        OkHttpClient client = new OkHttpClient();

        MediaType mediaType = MediaType.parse("application/json");
        JSONObject requestBody = new JSONObject();

        requestBody.put("chatId", "my_chatId");
        requestBody.put("stream", true);
        requestBody.put("detail", true);
        requestBody.put("responseChatItemId", "my_responseChatItemId");

        JSONObject variables = new JSONObject();
        variables.put("uid", "edison_56465123843");
        variables.put("name", "edison");
        requestBody.put("variables", variables);

        JSONArray messages = new JSONArray();
        JSONObject message = new JSONObject();
        message.put("role", "user");
        message.put("content", userMessage);
        messages.add(message);
        requestBody.put("messages", messages);

        RequestBody body = RequestBody.create(mediaType, requestBody.toJSONString());
        Request request = new Request.Builder()
                .url("http://IP:3000/api/v1/chat/completions")  // replace with your own model endpoint
                .post(body)
                .addHeader("Authorization", "Bearer xxxxxxxxxxxxxxxxxxx")  // replace with your own token
                .addHeader("Content-Type", "application/json")
                .build();

        client.newCall(request).enqueue(new Callback() {
            @Override
            public void onFailure(Call call, IOException e) {
                try {
                    emitter.completeWithError(e);
                } catch (Exception ex) {
                    log.error("Failed to handle message: {}", ex.getMessage());
                }
            }

            @Override
            public void onResponse(Call call, Response response) {
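                // Decode as UTF-8 explicitly (requires import java.nio.charset.StandardCharsets);
                // relying on the platform default charset can garble Chinese text.
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.body().byteStream(), StandardCharsets.UTF_8))) {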
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (line.startsWith("data: ")) {
                            String json = line.substring(6);
                            if ("[DONE]".equals(json)) break;
                            emitter.send(json);
                        }
                    }
                    emitter.complete();
                } catch (Exception e) {
                    try {
                        emitter.completeWithError(e);
                    } catch (Exception ex) {
                        log.error("Failed to handle message: {}", ex.getMessage());
                    }
                }
            }
        });
    }



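This forwards the data according to OpenAI's streaming response convention: when the received data is [DONE], the stream is finished. Because this is only a demo, I did not build the history messages from the request parameters; the request is hard-coded in the client class. If you need conversation history, you will have to build it yourself.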
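
The line handling described above can be isolated into a small, testable helper. The following is a minimal sketch using only the JDK; the class and method names (SseLineParser, readPayloads) are hypothetical. It also decodes the body explicitly as UTF-8, on the assumption that decoding with the platform default charset is one plausible cause of garbled Chinese text; the original write-up does not confirm the exact cause.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SseLineParser {

    private static final String DATA_PREFIX = "data: ";
    private static final String DONE_MARKER = "[DONE]";

    /**
     * Reads a streamed body line by line and returns the payload of every
     * "data: " line that appears before the "[DONE]" marker.
     */
    public static List<String> readPayloads(InputStream body) throws IOException {
        List<String> payloads = new ArrayList<>();
        // Decode explicitly as UTF-8 instead of the platform default charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith(DATA_PREFIX)) {
                    continue; // skip blank separator lines and other fields
                }
                String payload = line.substring(DATA_PREFIX.length());
                if (DONE_MARKER.equals(payload)) {
                    break; // end of stream marker
                }
                payloads.add(payload);
            }
        }
        return payloads;
    }
}
```

In the OkHttp callback, the stream passed in would be response.body().byteStream(), and each returned payload corresponds to one emitter.send call.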

Examples of the raw messages returned:

{
    "id": "",
    "object": "",
    "created": 0,
    "model": "",
    "choices": [
        {
            "delta": {
                "role": "assistant",
                "reasoning_content": "streamed reasoning text..."
            },
            "index": 0,
            "finish_reason": null
        }
    ]
}



{
    "id": "",
    "object": "",
    "created": 0,
    "model": "",
    "choices": [
        {
            "delta": {
                "role": "assistant",
                "content": "streamed answer text..."
            },
            "index": 0,
            "finish_reason": null
        }
    ]
}

finish_reason stays null while the answer is streaming and is returned as stop once the answer ends.

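2. Client implementation, version two (process the messages before forwarding)

After receiving a message, we can process it and forward it in a format of our own. For example, we can define a custom response body that distinguishes the current state of the stream (started, answering, finished, and so on).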

public void callAIStream(String userMessage, SseEmitter emitter) {
        OkHttpClient client = new OkHttpClient();

        MediaType mediaType = MediaType.parse("application/json");
        JSONObject requestBody = new JSONObject();

        requestBody.put("chatId", "my_chatId");
        requestBody.put("stream", true);
        requestBody.put("detail", true);
        requestBody.put("responseChatItemId", "my_responseChatItemId");

        JSONObject variables = new JSONObject();
        variables.put("uid", "edison_5641563123512635");
        variables.put("name", "edison");
        requestBody.put("variables", variables);

        JSONArray messages = new JSONArray();
        JSONObject message = new JSONObject();
        message.put("role", "user");
        message.put("content", userMessage);
        messages.add(message);
        requestBody.put("messages", messages);

        RequestBody body = RequestBody.create(mediaType, requestBody.toJSONString());
        Request request = new Request.Builder()
                .url("http://localhost:8080/api/v1/chat/completions")
                .post(body)
                .addHeader("Authorization", "Bearer xxxxxxxxxxxxxxxxxxxxxxxx")
                .addHeader("Content-Type", "application/json")
                .build();

        client.newCall(request).enqueue(new Callback() {
            @Override
            public void onFailure(@NotNull Call call, @NotNull IOException e) {
                try {
                    emitter.completeWithError(e);
                } catch (Exception ex) {
                    log.error("Failed to handle message: {}", ex.getMessage());
                }
            }

            @Override
            public void onResponse(@NotNull Call call, @NotNull Response response) {
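                // Decode as UTF-8 explicitly (requires import java.nio.charset.StandardCharsets);
                // relying on the platform default charset can garble Chinese text.
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.body().byteStream(), StandardCharsets.UTF_8))) {
                    StringBuilder contentBuffer = new StringBuilder();           // full answer sent so far
                    StringBuilder thinkBuffer = new StringBuilder();             // full reasoning sent so far
                    StringBuilder tempContentBuffer = new StringBuilder();       // answer text not yet sent
                    StringBuilder tempThinkingBuffer = new StringBuilder();      // reasoning text not yet sent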

                    emitter.send(createJsonMessage("start", null, null));

                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (line.startsWith("data: ")) {
                            String jsonStr = line.substring(6);
                            if ("[DONE]".equals(jsonStr)) break;

                            JSONObject json = JSONObject.parseObject(jsonStr);
                            JSONArray choices = json.getJSONArray("choices");
                            if (choices != null && !choices.isEmpty()) {
                                JSONObject delta = choices.getJSONObject(0).getJSONObject("delta");
                                if (delta != null) {
                                    // Handle reasoning text (reasoning_content); a missing or null value is skipped
                                    // so that the literal string "null" is never appended to the buffer
                                    String thinking = delta.getString("reasoning_content");
                                    if (thinking != null) {
                                        tempThinkingBuffer.append(thinking);
                                        if (shouldFlush(tempThinkingBuffer.toString())) {
                                            thinkBuffer.append(tempThinkingBuffer);
                                            emitter.send(createJsonMessage("thinking", contentBuffer.toString(), thinkBuffer.toString()));
                                            tempThinkingBuffer.setLength(0);
                                        }
                                    }

                                    // Handle the normal answer text (content), with the same null check
                                    String content = delta.getString("content");
                                    if (content != null) {
                                        tempContentBuffer.append(content);
                                        if (shouldFlush(tempContentBuffer.toString())) {
                                            contentBuffer.append(tempContentBuffer);
                                            emitter.send(createJsonMessage("sending", contentBuffer.toString(), thinkBuffer.toString()));
                                            tempContentBuffer.setLength(0);
                                        }
                                    }
                                }
                            }
                        }
                    }

                    // Send whatever is still buffered and has not been sent yet
                    if (tempThinkingBuffer.length() > 0) {
                        thinkBuffer.append(tempThinkingBuffer);
                        emitter.send(createJsonMessage("thinking", contentBuffer.toString(), thinkBuffer.toString()));
                    }

                    if (tempContentBuffer.length() > 0) {
                        contentBuffer.append(tempContentBuffer);
                        emitter.send(createJsonMessage("sending", contentBuffer.toString(), thinkBuffer.toString()));
                    }

                    emitter.send(createJsonMessage("end", null, null));
                    emitter.complete();
                } catch (Exception e) {
                    try {
                        emitter.completeWithError(e);
                    } catch (Exception ex) {
                        log.error("Failed to handle message: {}", ex.getMessage());
                    }
                }
            }




        });
    }



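    // Sentence-ending Chinese punctuation; extend as needed. Compiled once instead of on every call.
private static final Pattern FLUSH_PUNCTUATION = Pattern.compile("[。！？；]");

    /**
     * Decides whether the buffered text should be flushed to the client.
     * @param text the buffered text
     * @return true if the text contains sentence-ending punctuation or is at least 15 characters long
     */
private boolean shouldFlush(String text) {
        boolean hasPunctuation = FLUSH_PUNCTUATION.matcher(text).find();
        boolean lengthEnough = text.length() >= 15;

        return hasPunctuation || lengthEnough;
}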

private String createJsonMessage(String status, String content, String thinkContent){
        JSONObject msg = new JSONObject();
        msg.put("status", status);
        msg.put("content", content);
        msg.put("think_content", thinkContent);
        return msg.toJSONString();
}

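In this code I concatenate the returned data: StringBuilder joins the scattered answer fragments, and a message is sent only once the temporary buffer reaches a threshold you control (here, a sentence-ending punctuation mark or 15 characters), which also reduces the number of network sends. Each message carries the full text accumulated so far, not just the newest fragment. Both the reasoning content and the answer content are handled this way.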
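
To see the buffering rule in isolation, here is a minimal sketch that replays a list of fragments and records what would be sent after each flush. It uses only the JDK; the names FlushBufferDemo, simulate and MAX_PENDING_CHARS are hypothetical, and the punctuation set is written with Unicode escapes purely so the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FlushBufferDemo {

    // Chinese full stop, exclamation mark, question mark and semicolon.
    private static final Pattern SENTENCE_END =
            Pattern.compile("[\u3002\uff01\uff1f\uff1b]");
    private static final int MAX_PENDING_CHARS = 15;

    /** Same rule as shouldFlush in the article: punctuation found, or 15+ characters pending. */
    static boolean shouldFlush(CharSequence pending) {
        return SENTENCE_END.matcher(pending).find()
                || pending.length() >= MAX_PENDING_CHARS;
    }

    /** Returns the cumulative text that would be sent after each flush. */
    public static List<String> simulate(List<String> chunks) {
        List<String> sent = new ArrayList<>();
        StringBuilder total = new StringBuilder();   // everything sent so far
        StringBuilder pending = new StringBuilder(); // received but not yet sent
        for (String chunk : chunks) {
            pending.append(chunk);
            if (shouldFlush(pending)) {
                total.append(pending);
                sent.add(total.toString());
                pending.setLength(0);
            }
        }
        // Send any remainder once the stream ends.
        if (pending.length() > 0) {
            total.append(pending);
            sent.add(total.toString());
        }
        return sent;
    }
}
```

Raising the length threshold means fewer, larger messages; lowering it makes the output feel more immediate at the cost of more sends.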

3. Controller layer implementation

import com.edison.interaction.client.OpenAIStreamClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
@RequestMapping("/ai")
public class AITestController {

    private final OpenAIStreamClient aiClient = new OpenAIStreamClient();

    @GetMapping("/stream")
    public SseEmitter streamAI(@RequestParam("message") String message) {
        SseEmitter emitter = new SseEmitter(0L); // 0 = no timeout
        aiClient.callAIStream(message, emitter);
        return emitter;
    }

    @GetMapping("/streamOrg")
    public SseEmitter streamAIOrg(@RequestParam("message") String message) {
        SseEmitter emitter = new SseEmitter(0L); // 0 = no timeout
        aiClient.callAIStreamOrg(message, emitter);
        return emitter;
    }
}

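With this, a Java program can forward and process the large model's responses. Further requirements, such as persisting conversation history or tracing the sources of the model's answers, have to be implemented according to your specific business needs.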