ShuSheng LLM Practical Camp Challenge - Beginner Island - Python

Original article · last modified 2024-08-08 06:34:13

License: CC 4.0 BY-SA

Tags: #python #programming-language

First published 2024-07-29 23:51:23

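Learning goals:

Review Python
Review NumPy
Challenge task 1: implement a wordcount function that counts how many times each word appears in an English string. Return a dictionary whose keys are the words and whose values are the corresponding counts.
Challenge task 2: connect to the remote development machine from local VS Code, debug the wordcount function you wrote above on the development machine, go through the full debugging workflow, and write up debug notes (screenshots required).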

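Learning content:

1. Review Python

Practical Camp basic review material
 Official Python documentation
 wtfpython: a collection of surprising Python behaviors

2. Review NumPy

Practical Camp NumPy material
 NumPy 1.26 documentation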

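I also went over the broadcasting mechanism again.

Two arrays are broadcastable if they satisfy the following rules:

Shapes are compared starting from the trailing (last) dimension and moving toward the first.
At each position, the two dimension sizes must be equal, or one of them must be 1, or one of them must not exist.

If the arrays are broadcastable:

If x and y have different numbers of dimensions, the shape with fewer dimensions is padded with 1s on the left.
Then, for each dimension, the result size is the larger of the two sizes; values along a size-1 dimension are conceptually repeated to match (NumPy does not actually copy them), and the element-wise computation is carried out.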
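
The rules can be checked on a small case. The following is a minimal sketch, with array shapes chosen purely for illustration; it assumes NumPy 1.20 or later for `np.broadcast_shapes`.

```python
import numpy as np

x = np.arange(3).reshape(3, 1)   # shape (3, 1)
y = np.arange(4)                 # shape (4,), treated as (1, 4) after left-padding

# Each dimension takes the larger size: (3, 1) and (1, 4) give (3, 4).
result = x + y
print(result.shape)

# np.broadcast_shapes applies the same rule to the shapes alone.
print(np.broadcast_shapes((3, 1), (4,)))

# Trailing sizes 2 and 3 are neither equal nor 1, so this pair is not broadcastable.
try:
    np.ones((3, 2)) + np.ones((3,))
except ValueError as err:
    print("not broadcastable:", err)
```

Here `result[i, j]` equals `i + j`, because `x` is repeated along the columns and `y` along the rows.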

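It is worth noting that since NumPy 2 was released, many libraries have had compatibility problems with it; if you run into them, pin the older series with pip install 'numpy<2'.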
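
One way to notice the situation early is to check the installed major version before running anything else. This is only a sketch: the helper names `major_version` and `needs_numpy_pin` are made up for illustration, and whether a NumPy 2 install is actually a problem depends on the other libraries in the environment.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def needs_numpy_pin() -> bool:
    """Return True if NumPy 2 or later is installed (hypothetical helper)."""
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        # NumPy is not installed at all, so there is nothing to pin.
        return False
    return major_version(installed) >= 2


if needs_numpy_pin():
    print("NumPy 2.x detected; consider: pip install 'numpy<2'")
```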

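3. Challenge task 1: implement a wordcount function

Count how many times each word appears in an English string. Return a dictionary whose keys are the words and whose values are the corresponding counts.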

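import string

def wordcount(input_string):
    # Remove punctuation first. string.punctuation only covers ASCII
    # punctuation, so emoji and the curly apostrophe (’) are left in place.
    input_string = input_string.translate(str.maketrans('', '', string.punctuation))

    # Convert everything to lowercase so that "Welcome" and "welcome" match.
    input_string = input_string.lower()

    # Split the string on whitespace into a list of words.
    words = input_string.split()

    # Create the result dictionary.
    word_counts = {}

    # Walk the list with each word as a key: the first occurrence sets the
    # count to 1, and every later occurrence adds 1.
    for word in words:
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

    # Return the dictionary.
    return word_counts


# Test
if __name__ == "__main__":
    text = """☁️ Welcome  user, welcome to the ShuSheng LLM Practical Camp Course!
            😀 Let’s go on a journey through ShuSheng Island together.
        """
    print(wordcount(text))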
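
For comparison, here is a more compact variant built on `re` and `collections.Counter`. It is a sketch rather than a drop-in replacement: the name `wordcount_counter` is made up, and the regular expression is an assumption about what counts as a word (runs of ASCII letters, optionally with one internal apostrophe). Unlike the version above, it keeps "it's" and "its" as separate keys and ignores emoji and digits.

```python
import re
from collections import Counter


def wordcount_counter(input_string: str) -> dict:
    # Lowercase, then pull out runs of letters with an optional internal
    # apostrophe part, e.g. "daughter's". Everything else is skipped.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", input_string.lower())
    # Counter tallies the occurrences; dict() gives a plain dictionary back.
    return dict(Counter(words))


print(wordcount_counter("Hello, hello world!"))  # {'hello': 2, 'world': 1}
```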

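4. Challenge task 2: debugging on the development machine

4.1 Connecting to the InternStudio development machine from local VS Code

First, install the Remote-SSH extension.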

After installation, open Remote Explorer, create a new SSH connection under the SSH section, and configure it in $HOME/.ssh/config:

Host internlm1
    HostName ssh.intern-ai.org.cn
    User root
    Port 48438
    AddKeysToAgent yes
    IdentityFile "SomewhereOnMyMachine/key"
    ServerAliveInterval 180
    ServerAliveCountMax 5
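
To double-check what such an entry contains, the file can be read with a few lines of Python. This is a rough sketch, not a full ssh_config parser: `parse_ssh_config` is a made-up helper that ignores `Match`, `Include` and wildcard patterns, and the sample text simply mirrors the entry above. It accepts both `Keyword value` and `Keyword=value`, since ssh allows either form.

```python
import re

SAMPLE_CONFIG = """
Host internlm1
    HostName ssh.intern-ai.org.cn
    User root
    Port 48438
    IdentityFile "SomewhereOnMyMachine/key"
    ServerAliveInterval=180
"""

# A keyword, then either "=" (with optional spaces) or plain whitespace, then the value.
OPTION_PATTERN = re.compile(r"^(\w+)(?:\s*=\s*|\s+)(.*)$")


def parse_ssh_config(text: str) -> dict:
    """Return {host: {option: value}} for simple Host blocks (hypothetical helper)."""
    hosts = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = OPTION_PATTERN.match(line)
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip().strip('"')
        if key.lower() == "host":
            current = hosts.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return hosts


entry = parse_ssh_config(SAMPLE_CONFIG)["internlm1"]
print(entry["HostName"], entry["Port"])
```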

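A dialog then asks for the SSH connection command. After you press Enter, it asks which SSH configuration file to update; choosing the first (default) one is fine, though you can also create a separate SSH configuration file if you need one.

The connection command for a development machine is listed under "SSH Connection" for that machine in the development machine console. Copy the login command into the VS Code dialog and press Enter, and VS Code starts connecting to the InternStudio server. Remember to switch back and copy the SSH password as well, since it is needed shortly.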

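Once connected, open Extensions in the remote VS Code window and install the Python extension on the remote development machine; Python debugging depends on it later. You can also install all of your local VS Code extensions on the development machine in one click.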

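4.2 Opening a terminal in VS Code

Click the area at the bottom of the VS Code window that shows an X and a ! to quickly open the panel, then switch to TERMINAL.
Tip: the + at the upper right opens a new terminal.

4.3 Debugging Python with VS Code

4.3.1 Opening the file count_word.py

In VS Code, open the root folder directly, or whichever folder contains the Python file you want to debug. You may be asked for the password again. The steps below use the root folder as the example: click Open Folder, or use File -> Open Folder from the menu at the upper left.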

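Select the interpreter: click Select Interpreter in the lower-right corner, and VS Code automatically scans for the interpreters of all Python environments on the development machine. Here we just choose the `xt` environment created in the Linux task of the previous section.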

conda activate xt

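4.3.2 Setting breakpoints

Click next to a line number to add a red dot; this is a breakpoint (if no red dot appears, check that the Python extension is installed correctly). When execution reaches that line it pauses, so you can inspect variable values, step through the code, and so on.

4.3.3 Starting the debugger

Click "Run and Debug" in the VS Code sidebar, then click the "Run and Debug" button, or press F5.

You are then asked to choose a debugger and a debug configuration; to debug a single Python file, choosing Python File is enough. The program then runs until it reaches the first breakpoint and pauses there.

4.3.4 Inspecting variables

When execution is paused at a breakpoint, you can view and modify variable values. The "Variables" section of the "Run and Debug" sidebar lists every variable in the current scope along with its value.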

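4.3.5 Stepping through the code

Use the buttons at the top of the "Run and Debug" view to step through the code line by line and see the effect of each line. For example, after the for loop has run one iteration, both word and word_counts have changed.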
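
What the Variables pane shows while stepping can be approximated without a debugger by recording the loop state after each iteration. The sketch below is only an illustration; `trace_wordcount` is a made-up helper, not part of the task code.

```python
def trace_wordcount(words):
    """Return a list of (word, copy of word_counts) pairs, one per loop iteration."""
    word_counts = {}
    snapshots = []
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1
        # Copy the dictionary so that later iterations do not alter this snapshot.
        snapshots.append((word, dict(word_counts)))
    return snapshots


for step, (word, counts) in enumerate(trace_wordcount(["got", "this", "got"]), start=1):
    print(f"iteration {step}: word={word!r}, word_counts={counts}")
```

Each printed line corresponds to one pause at the end of the loop body: `word` holds the current item and `word_counts` has grown or been incremented, while the `words` list itself stays the same.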

4.6 Using the debug panel

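1: continue: resume execution until the next breakpoint (F5)
2: step over: run the current line without entering any function or method it calls (F10)
3: step into: enter the function or method. If the current line calls a function or method, the debugger moves into it; otherwise this behaves like step over (F11)
4: step out: leave the current function or method and return to the caller (Shift+F11)
5: restart: restart the debug session (Ctrl+Shift+F5)
6: stop: end the debug session (Shift+F5)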
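
A small script with one helper function makes the difference between these buttons easy to try out. This is a practice sketch with made-up names (`normalize`, `count_words`); set a breakpoint on the line that calls `normalize` and compare the buttons there.

```python
def normalize(word):
    # "Step into" on the calling line lands here;
    # "step out" runs the rest of this function and returns to the caller.
    return word.strip().lower()


def count_words(words):
    counts = {}
    for raw in words:
        # Breakpoint here: "step over" runs normalize() in one go,
        # while "step into" enters normalize().
        word = normalize(raw)
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_words(["Panda", "panda ", "Toy"]))  # {'panda': 2, 'toy': 1}
```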

4.3.6 Fixing the error and re-running

Once you have found the error in the code, fix it and run the program again.

4.3.7 Final result

import string

text = """
Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and
super cute, and its face has a friendly look. It's
a bit small for what I paid though. I think there
might be other options that are bigger for the
same price. It arrived a day earlier than expected,
so I got to play with it myself before I gave it
to her.
"""

def wordcount(input_string):
    # Remove punctuation (apostrophes included, so "It's" becomes "its").
    input_string = input_string.translate(str.maketrans('', '', string.punctuation))

    # Convert all uppercase letters to lowercase.
    input_string = input_string.lower()

    # Split the string on whitespace into a list of words.
    words = input_string.split()

    # Create an empty dict for the counts.
    word_counts = {}

    # Scan the list and tally each word.
    for word in words:
        # If the word is already in the dictionary, increment its count
        if word in word_counts:
            word_counts[word] += 1
        # Otherwise, add the word to the dictionary with a count of 1
        else:
            word_counts[word] = 1

    # Return the finished dict.
    return word_counts


print(wordcount(text))

 {'got': 2, 'this': 1, 'panda': 1, 'plush': 1, 'toy': 1, 'for': 3, 'my': 1, 'daughters': 1, 'birthday': 1, 'who': 1, 'loves': 1, 'it': 5, 'and': 3, 'takes': 1, 'everywhere': 1, 'its': 3, 'soft': 1, 'super': 1, 'cute': 1, 'face': 1, 'has': 1, 'a': 3, 'friendly': 1, 'look': 1, 'bit': 1, 'small': 1, 'what': 1, 'i': 4, 'paid': 1, 'though': 1, 'think': 1, 'there': 1, 'might': 1, 'be': 1, 'other': 1, 'options': 1, 'that': 1, 'are': 1, 'bigger': 1, 'the': 1, 'same': 1, 'price': 1, 'arrived': 1, 'day': 1, 'earlier': 1, 'than': 1, 'expected': 1, 'so': 1, 'to': 2, 'play': 1, 'with': 1, 'myself': 1, 'before': 1, 'gave': 1, 'her': 1}
