Spark: SQL: JSON Operation

9th August 2021 at 11:20am

get_json_object

官方文档。示例:

> SELECT get_json_object('{"a":"b"}', '$.a');
b
> SELECT get_json_object('{"a": []}', '$.a[0]');
NULL
> SELECT get_json_object('[1, 2]', '$[1]');
2

返回的数据类型为 STRING。取不到数据时返回 NULL

from_json

可以从 JSON 中提取出一个 struct value。官方文档。下面给一个复杂示例。

对于这样一个 JSON:

{
  "viewed_objects": [
    {
      "chat_session_id": "471699156198480128",
      "chat_dialogue_id": "471699264186260352",
      "node_point_id": "",
      "if_history": 0,
      "button_details": [
        {
          "button_id": "feedback_display_btn0",
          "button_type": "btn_feedback_display",
          "button_text": "Ya"
        },
        {
          "button_id": "feedback_display_btn1",
          "button_type": "btn_feedback_display",
          "button_text": "Tidak"
        }
      ]
    }
  ]
}

假如上述的 JSON 是在 data 字段中,结合 LATERAL VIEWEXPLODE 可以写出复杂语句:

SELECT
    button.button_id   AS button_id
  , button.button_text AS button_text
  , button.button_type AS button_type
FROM table1
  LATERAL VIEW EXPLODE(from_json(
    get_json_object(data, "$.viewed_objects[0].button_details"),
    'ARRAY<STRUCT<button_id: STRING, button_text: STRING, button_type: STRING>>'
  )) AS button;