2. Flink源码解析—Flink编译

Flink 源码系列

Flink 源码编译

  1. Flink 源码环境搞定后,开始编译 Flink 源码

  2. cd 到 Flink 源码路径,执行 maven 命令。

    可以提前添加 阿里云的maven镜像。在 maven安装路径的 ./conf/settings.xml 的 mirrors 节点下添加。

        <mirror>
          <id>aliyunmaven</id>
          <mirrorOf>central</mirrorOf>
          <name>阿里云 Maven 镜像</name>
          <url>https://maven.aliyun.com/repository/public</url>
        </mirror>

    执行以下命令:

    mvn clean install -DskipTests -Dfast
  3. 在 build-target 路径下获取源码编译后的jar包。这个路径和官网的 jar包路径一致。

    root@admin-1:~/flink# cd build-target
    root@admin-1:~/flink/build-target# ll
    total 52
    drwxr-xr-x  9 root root  4096 May 28 00:53 ./
    drwxr-xr-x  3 root root  4096 May 28 00:53 ../
    -rw-r--r--  1 root root 11357 May 27 23:12 LICENSE
    -rw-r--r--  1 root root  1309 May 27 23:12 README.txt
    drwxr-xr-x  2 root root  4096 May 29 20:50 bin/
    drwxr-xr-x  2 root root  4096 May 29 20:32 conf/
    drwxr-xr-x  7 root root  4096 May 28 00:53 examples/
    drwxr-xr-x  2 root root  4096 May 28 00:53 lib/
    drwxr-xr-x  2 root root  4096 May 29 20:49 log/
    drwxr-xr-x  3 root root  4096 May 28 00:53 opt/
    drwxr-xr-x 10 root root  4096 May 28 00:53 plugins/

Flink集群启动入口

  1. 整体参考 Flink 的 standalone 模式进行源码编译调试。

  2. Flink 的 standalone 整体分为 jobManager 和 taskManager。对应的启动脚本分别是:./bin/jobmanager.sh 和 ./bin/taskmanager.sh

  3. 流程上首先启动 jobmanager.sh,然后启动 taskManager。

  4. 首先查看 Jobmanager 的启动脚本

jobmanager.sh 启动脚本解析

  1. 第一行内容参考下方代码。上面讲述了 JobManager 的启动参数。因为是编译运行 Flink源码环境,而非部署。因此 启动参数是 start-foreground ,前端模式。

    USAGE="Usage: jobmanager.sh ((start|start-foreground) [host] [webui-port])|stop|stop-all"
  2. start-foreground 对应的 shell 代码。根据 if 逻辑,查看 flink-console.sh。${FLINK_BIN_DIR} 参数是在 config.sh 里引入的。config.sh 是通过 start-cluster.sh 引入的。这里就是 flink 的编译产出路径。

    ENTRYPOINT=standalonesession
    ​
    if [[ $STARTSTOP == "start-foreground" ]]; then
        exec "${FLINK_BIN_DIR}"/flink-console.sh $ENTRYPOINT "${args[@]}"
    else
        "${FLINK_BIN_DIR}"/flink-daemon.sh $STARTSTOP $ENTRYPOINT "${args[@]}"
    fi
  3. 查看flink-console.sh 脚本代码。直接查找 standalonesession 相关的代码。获取 CLASS_TO_RUN 的参数。最后一行是执行代码,这里面脚本参数都是 各种解析。

    USAGE="Usage: flink-console.sh (taskexecutor|zookeeper|historyserver|standalonesession|standalonejob|kubernetes-session|kubernetes-application|kubernetes-taskmanager) [args]"
        (standalonesession)
            CLASS_TO_RUN=org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint
        ;;
    echo "$JAVA_RUN" $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}"
    ​
    #exec "$JAVA_RUN" $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}"
  4. 为了快速获得 jobManager 启动参数,可以将 最后一行注释。换成 echo 进行打印。执行 jobmanager.sh start-foreground。获取以下参数

    java -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/log/flink-root-standalonesession-0-admin-1.log -Dlog4j.configuration=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties -Dlog4j.configurationFile=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties -Dlogback.configurationFile=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/logback-console.xml -classpath /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-cep-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-connector-files-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-csv-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-json-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-scala_2.12-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-shaded-zookeeper-3.5.9.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-api-java-uber-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-planner-loader-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-runtime-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-1.2-api-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-api-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-core-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-slf4j-impl-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-dist-1.15.0.jar::: org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint --configDir /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf --executionMode cluster -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b

jobmanager 启动参数解析

-Xmx1073741824:设置 JVM 最大堆内存为 1073741824 字节
-Xms1073741824:设置 JVM 初始堆内存也是 1 GB
-XX:MaxMetaspaceSize=268435456:设置 元空间(Metaspace)最大为 256 MB 
-Dlog.file=/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/log/flink-root-standalonesession-0-admin-1.log:flink的jobmanager的输出日志路径
-Dlog4j.configuration=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties:flink的日志配置脚本
-classpath:脚本启动环境
org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint:启动类入口
--configDir /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf:配置路径参考
--executionMode cluster:集群模式
-D jobmanager.memory.off-heap.size=134217728b:为 JobManager 配置 堆外内存(off-heap) 128 MB启动的各类参数
-D jobmanager.memory.jvm-overhead.min=201326592b
-D jobmanager.memory.jvm-overhead.max=201326592b:为 JVM 保留 额外开销空间(JVM Overhead):192 MB(固定上下限)
-D jobmanager.memory.jvm-metaspace.size=268435456b:为 JVM 的 Metaspace(类元空间) 预留 256 MB
-D jobmanager.memory.heap.size=1073741824b:设置 JobManager JVM 堆内存为 1 GB

设置 jobmanager 的 idea 启动参数

  1. 双击shift,查找 StandaloneSessionClusterEntrypoin 启动类

  2. 右键点击 modify Run Configuration 按钮。

    1. vm框填入:

      -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/log/flink-root-standalonesession-0-admin-1.log -Dlog4j.configuration=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties -Dlog4j.configurationFile=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties -Dlogback.configurationFile=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/logback-console.xml -classpath /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-cep-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-connector-files-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-csv-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-json-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-scala_2.12-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-shaded-zookeeper-3.5.9.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-api-java-uber-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-planner-loader-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-runtime-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-1.2-api-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-api-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-core-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-slf4j-impl-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-dist-1.15.0.jar
    2. CLI ARGUMENTS 框填入

      --configDir /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf --executionMode cluster -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
  3. 点击 run。正常就能运行,并在控制台打印日志。根据控制台日志,能够访问 jobManager 网址。

  4. 同样的顺序,来解析 taskmanager 的启动参数

taskManager.sh 启动脚本解析

  1. 执行 taskmanager.sh start-foreground 获取参数

    java -XX:+UseG1GC -Xmx536870902 -Xms536870902 -XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/log/flink-root-taskexecutor-0-admin-1.log -Dlog4j.configuration=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties -Dlog4j.configurationFile=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/log4j-console.properties -Dlogback.configurationFile=file:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf/logback-console.xml -classpath /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-cep-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-connector-files-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-csv-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-json-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-scala_2.12-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-shaded-zookeeper-3.5.9.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-api-java-uber-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-planner-loader-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-table-runtime-1.15.0.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-1.2-api-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-api-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-core-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/log4j-slf4j-impl-2.17.1.jar:/root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/lib/flink-dist-1.15.0.jar::: org.apache.flink.runtime.taskexecutor.TaskManagerRunner --configDir /root/flink/flink-dist/target/flink-1.15.0-bin/flink-1.15.0/conf -D taskmanager.memory.network.min=134217730b -D taskmanager.cpu.cores=1.0 -D taskmanager.memory.task.off-heap.size=0b -D taskmanager.memory.jvm-metaspace.size=268435456b -D external-resources=none -D taskmanager.memory.jvm-overhead.min=201326592b -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=134217730b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=536870920b -D taskmanager.memory.task.heap.size=402653174b -D taskmanager.numberOfTaskSlots=1 -D taskmanager.memory.jvm-overhead.max=201326592b
  2. 大部分参数类似,不做介绍

    -XX:MaxDirectMemorySize=268435458:是一个 JVM 参数,用于控制 直接内存(Direct Memory)的最大大小,也叫 堆外内存限制。
    
    -Dtaskmanager.memory.network.min=134217730b
    限制网络缓冲区的最小内存(约 128MB),用于 shuffle 和传输 buffer。
    
    -Dtaskmanager.cpu.cores=1.0
    声明 TaskManager 使用的 CPU 核数(用于资源调度,不做硬性限制)。
    
    -Dtaskmanager.memory.task.off-heap.size=0b
    每个 task slot 可用的堆外内存大小,设置为 0 表示不使用。
    
    -Dtaskmanager.memory.jvm-metaspace.size=268435456b
    JVM 元空间大小,约 256MB,用于类元数据的存储。
    
    -Dexternal-resources=none
    声明不使用 GPU 等外部资源。
    
    -Dtaskmanager.memory.jvm-overhead.min=201326592b
    JVM overhead(栈、JIT 等)的最小值,约 192MB。
    
    -Dtaskmanager.memory.framework.off-heap.size=134217728b
    Flink 框架组件的堆外内存,约 128MB(非 task 使用)。
    
    -Dtaskmanager.memory.network.max=134217730b
    网络缓冲区最大内存(与 min 相同,固定为约 128MB)。
    
    -Dtaskmanager.memory.framework.heap.size=134217728b
    Flink 框架组件的堆内内存,约 128MB。
    
    -Dtaskmanager.memory.managed.size=536870920b
    Flink 管理内存大小(用于 RocksDB、spill、缓存等),约 512MB。
    
    -Dtaskmanager.memory.task.heap.size=402653174b
    task slot 可用的堆内内存大小,约 384MB。
    
    -Dtaskmanager.numberOfTaskSlots=1
    每个 TaskManager 的 slot 数为 1,表示资源不再切分。
    
    -Dtaskmanager.memory.jvm-overhead.max=201326592b
    JVM overhead 的最大值,约 192MB(与 min 相同表示固定)。

设置 taskmanager 的 idea 启动参数

查找 org.apache.flink.runtime.taskexecutor.TaskManagerRunner 类。配置上述配置,并运行。如果有报错,控制台查看错误,并根据原因,deepseek一下。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值