Amazon Redshift Machine Learning and Data Sharing: A Complete Guide
Published: 2025-08-31 01:06:13

#### 1. Applying and Interpreting Amazon Redshift ML Models
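Building and applying models are the central steps in machine learning. Taking race-speed inference as an example, the following query selects data from the `race.speed_inference` table: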
```sql
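SELECT race.fnc_will_human_win_prob(
    -- The leading feature arguments are missing from the original listing;
    -- they must be passed in the same order as the model's training columns.
    ...,
    human_speed)
FROM
    race.speed_inference
LIMIT 5;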
```
The `fnc_will_human_win_prob` function returns the probability that the human wins the race. Part of its output is shown below:

| probabilities | labels |
| --- | --- |
| [0.98792028, 0.01207972] | ["f", "t"] |
| [0.99937975, 0.00062025] | ["f", "t"] |
| [0.91194165, 0.08805832] | ["t", "f"] |
| [0.95782197, 0.04217804] | ["t", "f"] |
| [0.93414819, 0.06585180] | ["t", "f"] |
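
The order of the labels changes from row to row (the most probable label comes first), so each probability must be read against the label in the same position. In the first two rows, "f" (the human does not win) has a probability above 0.98, so the human is very unlikely to win. In the last three rows, "t" is the most probable label, with probabilities between 0.91 and 0.96, so the human is likely to win.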
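
Because the label order is not fixed, it is safer to look up a probability by label than by position. The minimal Python sketch below does this for the five rows shown above. It assumes that the label `"t"` means "the human wins", and `win_probability` is a hypothetical helper for illustration, not part of Redshift ML.

```python
def win_probability(probabilities, labels, positive="t"):
    """Return the probability paired with `positive` (hypothetical helper).

    Assumes "t" means "the human wins"; labels are matched by name because
    their order differs between rows.
    """
    return dict(zip(labels, probabilities))[positive]


# The five (probabilities, labels) pairs from the result table above.
rows = [
    ([0.98792028, 0.01207972], ["f", "t"]),
    ([0.99937975, 0.00062025], ["f", "t"]),
    ([0.91194165, 0.08805832], ["t", "f"]),
    ([0.95782197, 0.04217804], ["t", "f"]),
    ([0.93414819, 0.06585180], ["t", "f"]),
]

for probs, labels in rows:
    p = win_probability(probs, labels)
    print(f"P(human wins) = {p:.4f}, predicted label: {'t' if p >= 0.5 else 'f'}")
```

Running it prints win probabilities of 0.0121 and 0.0006 for the first two rows and 0.9119, 0.9578, and 0.9341 for the last three.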
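To see how much each feature contributes to the predictions, use the `explain_model` function. It draws on Amazon SageMaker Clarify to produce a model explainability report that contains a Shapley value for every model feature; the higher a feature's Shapley value, the greater its influence on the prediction. Example code: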
```sql
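-- EXPLAIN_MODEL takes the model name ('schema_name.model_name'); if the model was
-- created under a different name than its prediction function, pass that name instead.
SELECT explain_model('race.fnc_will_human_win_prob');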
```
The result is shown below:

| explain_model |
| --- |
| {"explanations":{"kernel_shap":{"label0":{ "expected_value":-0.5694439538988466, "global_shap_values": { "city":0.2865426473431818, "human_distance":0.8485933955733828, "human_speed":0.4954490773124456, "machine_distance":0.8925393014624781, "machine_speed":0.7125560417928333, "road_condition":1.0487996886952989, "weather":1.460974788708901} }}},"version":"1.0"} |
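
The results show that `city` (0.29) and `human_speed` (0.50) have much less influence than `road_condition` (1.05) and `weather` (1.46). This suggests that the outcome depends more on how well the human or the machine copes with weather and road conditions than on the city or the human's speed.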
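
To rank the features instead of comparing them by eye, the JSON report can be parsed directly. This minimal Python sketch assumes the report has been copied out of the query result as a string (the value below is the output shown above) and uses only the standard `json` module.

```python
import json

# explain_model output copied from the result table above.
raw = (
    '{"explanations":{"kernel_shap":{"label0":{"expected_value":-0.5694439538988466,'
    '"global_shap_values":{"city":0.2865426473431818,"human_distance":0.8485933955733828,'
    '"human_speed":0.4954490773124456,"machine_distance":0.8925393014624781,'
    '"machine_speed":0.7125560417928333,"road_condition":1.0487996886952989,'
    '"weather":1.460974788708901}}}},"version":"1.0"}'
)

report = json.loads(raw)
shap = report["explanations"]["kernel_shap"]["label0"]["global_shap_values"]

# Sort features by global Shapley value, most influential first.
ranked = sorted(shap.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:<17} {value:.2f}")
```

The ranking puts `weather` (1.46) and `road_condition` (1.05) first, followed closely by `machine_distance` (0.89) and `human_distance` (0.85), with `city` (0.29) last.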
#### 2. Predicting Student Outcomes with Amazon Redshift ML
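Using a student learning dataset, you can build a model that predicts each student's final result: pass, fail, withdrawn, or distinction. The steps are as follows: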
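1. **Create the training dataset**: Build a table from historical data that combines student demographics, activity data collected by the learning management system (LMS), and student assessment scores. Example code: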
```sql
-- Training table: each student_info row enriched with LMS click totals
-- and assessment scores.
CREATE TABLE tbl_student_lmsactivities_and_score AS
SELECT
    st.school_id, st.id_student, st.code_module,
    st.code_presentation, st.gender, st.region,
    st.highest_education, st.imd_band, st.age_band,
    st.num_of_prev_attempts, st.studied_credits,
    st.disability, st.final_result,  -- final_result is the label to predict
    st_lms_clicks.sum_of_clicks, scores.total_score,
    scores.mean_score
FROM
    openlearnm.student_info st
-- Total LMS clicks per student, module, and presentation
LEFT JOIN
    (SELECT school_id, code_module,
            code_presentation, id_student,
            sum(sum_click) AS sum_of_clicks
     FROM
         openlearnm.student_lms
     GROUP BY 1, 2, 3, 4) st_lms_clicks
    ON  st.school_id = st_lms_clicks.school_id
    AND st.code_module = st_lms_clicks.code_module
    AND st.code_presentation = st_lms_clicks.code_presentation
    AND st.id_student = st_lms_clicks.id_student
-- Total and average assessment score per student
LEFT JOIN
    (SELECT
         school_id, id_student,
         sum(score) AS total_score,
         avg(score) AS mean_score
     FROM
         openlearnm.student_assessment
     GROUP BY 1, 2) scores
    ON  st.school_id = scores.school_id
    AND st.id_student = scores.id_student
;
```
2. **Run the `CREATE MODEL` command**, which invokes Amazon SageMaker Autopilot. Example code:
```sql
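CREATE MODEL student_result
FROM tbl_student_lmsactivities_and_score
TARGET final_result            -- column to predict
FUNCTION fnc_final_result      -- SQL function created for inference
IAM_ROLE default
SETTINGS (
    S3_BUCKET 'my-ml-bucket',  -- placeholder: use your own bucket (S3 bucket names cannot contain underscores)
    MAX_RUNTIME 10800          -- training time limit, in seconds
);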
```
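Here `MAX_RUNTIME` is set to 10800 seconds (3 hours), twice the default of 5400 seconds (90 minutes), so that there is enough time to finish training and generate the model explainability report.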
3. **Run the `SHOW MODEL` command** to view details of the model that Amazon SageMaker Autopilot selected. Example code:
```sql
show model student_result;
```
The output is shown below:

| Key | Value |
| --- | --- |
| Model Name | student_result |
| Schema Name | public |
| Owner | model_create_user |
| Creation Time | Sun, 05.03.2024 18:54:11 |
| Model State | READY |
| validation:accuracy | 0.870610 |
| Estimated Cost | 29.814964 |
| TRAINING DATA: Table | tbl_student_lmsactivities_and_score |
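
To make the joins in step 1 concrete, here is a minimal in-memory Python sketch of what the `CREATE TABLE AS` statement computes. The rows are hypothetical toy values, only a few columns are kept, and plain dictionaries stand in for the Redshift tables; it illustrates the LEFT JOIN and aggregation logic and is not a replacement for the SQL.

```python
from collections import defaultdict

# Hypothetical toy rows: column names follow the SQL in step 1, values are made up.
student_info = [
    {"school_id": 1, "id_student": 11, "code_module": "AAA",
     "code_presentation": "2013J", "final_result": "Pass"},
    {"school_id": 1, "id_student": 12, "code_module": "AAA",
     "code_presentation": "2013J", "final_result": "Withdrawn"},
]
student_lms = [
    {"school_id": 1, "id_student": 11, "code_module": "AAA",
     "code_presentation": "2013J", "sum_click": 4},
    {"school_id": 1, "id_student": 11, "code_module": "AAA",
     "code_presentation": "2013J", "sum_click": 6},
]
student_assessment = [
    {"school_id": 1, "id_student": 11, "score": 80},
    {"school_id": 1, "id_student": 11, "score": 90},
    {"school_id": 1, "id_student": 12, "score": 40},
]

# Subquery st_lms_clicks: SUM(sum_click) per (school_id, code_module, code_presentation, id_student).
clicks = defaultdict(int)
for r in student_lms:
    key = (r["school_id"], r["code_module"], r["code_presentation"], r["id_student"])
    clicks[key] += r["sum_click"]

# Subquery scores: collect scores per (school_id, id_student) for SUM and AVG.
score_lists = defaultdict(list)
for r in student_assessment:
    score_lists[(r["school_id"], r["id_student"])].append(r["score"])

# LEFT JOINs: every student_info row is kept; a missing match yields None (SQL NULL).
training = []
for st in student_info:
    lms_key = (st["school_id"], st["code_module"], st["code_presentation"], st["id_student"])
    scores = score_lists.get((st["school_id"], st["id_student"]))
    training.append({
        **st,
        "sum_of_clicks": clicks.get(lms_key),  # .get does not insert a default
        "total_score": sum(scores) if scores else None,
        "mean_score": sum(scores) / len(scores) if scores else None,
    })

for row in training:
    print(row["id_student"], row["final_result"], row["sum_of_clicks"],
          row["total_score"], row["mean_score"])
```

Student 12 has an assessment score but no LMS activity, so `sum_of_clicks` is `None`, just as the LEFT JOIN leaves it NULL in Redshift. Keeping such rows, rather than dropping them as an inner join would, preserves students (for example, those who withdrew early) whose outcomes the model still needs to learn.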