Update some stuff & remove unneeded code like the Python stuff; it is in the web app.
This commit is contained in:
yuanhau 2025-05-20 20:58:13 +08:00
parent bc9a63f6ab
commit 62fa31ae4a
24 changed files with 104 additions and 2580 deletions

@ -1,24 +1,32 @@
# Scraping line today home
# Scraping Line Today's home page system
This took me some time, but they use a fancy system for pulling news data.
## Endpoint on news.yuanhau.com aka this repo (Cached results)
### /api/home/uuid_lt/action?query=${query}
Fetches the UUID of each listing for the query.
### /api/home/lt/${query}
Fetches the UUIDs, then returns the news for them.
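A minimal sketch of how a client might build these two cached URLs. Only the host and the paths come from the list above; the `https` scheme, the helper names `uuidListUrl` and `newsUrl`, and any particular query value are assumptions for illustration.

```typescript
// Assumption: the cached API is served over HTTPS on this host.
const CACHE_BASE = "https://news.yuanhau.com";

// Builds the URL that returns the UUIDs of the listings for a query.
function uuidListUrl(query: string): string {
  return `${CACHE_BASE}/api/home/uuid_lt/action?query=${encodeURIComponent(query)}`;
}

// Builds the URL that looks up the UUIDs and returns the news itself.
function newsUrl(query: string): string {
  return `${CACHE_BASE}/api/home/lt/${encodeURIComponent(query)}`;
}
```

Encoding the query keeps spaces and non-ASCII characters from breaking the path or the query string.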
## Main endpoint
For local Taiwan news they use this url: https://today.line.me/_next/data/v1/tw/v3/tab/domestic.json?tabs=domestic
From the _next? I thought that is static? I mean it maybe is, it is just providing with the URLs that the client will be fetching to the server, which is a bit fun.
From `_next`? I thought that was static. Maybe it is: it just provides the URLs that the client will then fetch from the server, which is a bit fun.
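As a rough sketch, the URL above could be built from a tab name. Only the `domestic` tab is confirmed here; that other tab names follow the same pattern is an untested assumption, and `tabDataUrl` is a hypothetical helper name.

```typescript
// Only "domestic" is known to work; other tab names are an assumption.
function tabDataUrl(tab: string): string {
  const encoded = encodeURIComponent(tab);
  return `https://today.line.me/_next/data/v1/tw/v3/tab/${encoded}.json?tabs=${encoded}`;
}
```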
Here is a JSON snippet:
```json
{
"id": "682b0cef1b1269f8dec93e60",
"id": "the-news-id",
"type": "HIGHLIGHT",
"containerStyle": "Header",
"name": "國內話題:新北重大車禍",
"name": "國內話題:topic",
"source": "LISTING",
"header": {
"title": "新北重大死傷車禍",
"title": "the top title here",
"hasCompositeTitle": false,
"subTitle": "一輛小客車19日下午撞上放學人群造成多名學童、大人送醫至少3死10多傷肇事的78歲男子當場昏迷。"
"subTitle": "the news subtitle here"
},
"listings": [
{
@ -43,9 +51,9 @@ This api can be used for fetching the news from them, however, there is an issue
And viewing the JSON, oh would you look at that.
```JSON
{
"id": "262862833",
"title": "派駐芬蘭遭白委扯焦慮症 林昶佐現身喊話",
"publisher": "太報",
"id": "news-id",
"title": "news-title",
"publisher": "news-publisher",
"publisherId": "101366",
"publishTimeUnix": 1747670221000,
"contentType": "GENERAL",
@ -58,7 +66,7 @@ And viewing the JSON, oh would you look at that.
},
"categoryId": 100262,
"categoryName": "國內",
"shortDescription": "前立委林昶佐右二將出任駐芬蘭代表民眾黨立委林憶君卻質疑罹患焦慮症不適合去北歐。翻攝畫面前立委林昶佐將接任駐芬蘭代表民眾黨立委林憶君今5/19質詢指出林林昶佐曾患焦慮症北歐國家日常短病症容易發作質疑是否適合。林昶佐晚間現身直播節目向病友喊話要對自己有信心「絕對可以回復到正常生活包括工作」。林憶君指出1990年芬蘭是全球自殺率最高國家而且北歐國家的日照很短病症容易發作..."
"shortDescription": "The article's short description"
},
```
The URL hash is just what my scraper needs :D
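For reading the rest of that article object, here is a small sketch typed from the fields visible in the snippet. The interface name `ArticleSummary` and the helper `publishedAt` are hypothetical, the real object has more fields than shown, and treating `publishTimeUnix` as milliseconds is an assumption based on its 13 digits.

```typescript
// Fields visible in the snippet above; elided fields are left out on purpose.
interface ArticleSummary {
  id: string;
  title: string;
  publisher: string;
  publisherId: string;
  publishTimeUnix: number; // assumed to be milliseconds since the Unix epoch
  categoryId: number;
  categoryName: string;
  shortDescription: string;
}

// Converts the publish timestamp into a Date object.
function publishedAt(article: Pick<ArticleSummary, "publishTimeUnix">): Date {
  return new Date(article.publishTimeUnix);
}

// The timestamp from the snippet, as an ISO string in UTC.
const exampleIso = publishedAt({ publishTimeUnix: 1747670221000 }).toISOString();
// exampleIso is "2025-05-19T15:57:01.000Z"
```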