{ "cells": [ { "cell_type": "markdown", "id": "83885e86-1ccb-46ec-bee9-a33f3b541569", "metadata": {}, "source": [ "# Summary of the Hackathon Analyses for the Website\n", "\n", "- possibly for display on the website\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "9bd1686f-9bbc-4c05-a5f5-e0c4ce653fb2", "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import altair as alt" ] }, { "cell_type": "markdown", "id": "81780c9a-7721-438b-9726-ff5a70910ce8", "metadata": {}, "source": [ "## Data preparation\n", "\n", "Dump of the database from 25 March 2023. The individual tables of the database are read in separately. In addition, all information that belongs directly to a tweet is collected in a single data object. The GIS data for the individual police stations (\"police_stations\") is still incomplete and may need to be checked again.\n", "\n" ] }, { "cell_type": "code", "execution_count": 45, "id": "fcc48831-7999-4d79-b722-736715b1ced6", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "((479991, 3), (151690, 8), (151690, 4), (13327, 3), (163, 7))" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merge the tables from the old (~2021) and new (~2022) scraper\n", "\n", "## cols: hashtag, url, mention (same for both)\n", "tweets_meta = pd.concat([pd.read_csv(\"data/entity_old.tsv\", sep = \"\\t\"), # data from old scraper\n", " pd.read_csv(\"data/tweets.csv\")]) # data from new scraper\n", "\n", "## cols: id, tweet_text, created_at, user_id; only a subset of the old table is used (the same tsv is read again in the next step)\n", "tweets_text = pd.concat([pd.read_csv(\"data/tweet_old.tsv\", sep = \"\\t\")[['id','tweet_text', 'created_at', 'user_id']].rename(columns = {\"id\":\"tweet_id\"}),\n", " pd.read_csv(\"data/tweets-1679742698645.csv\")])\n", "\n", "## cols: id, like_count, retweet_count, reply_count, 
quote_count; only subset from old table\n", "tweets_statistics = pd.concat([pd.read_csv(\"data/tweet_old.tsv\", sep = \"\\t\")[['id', 'like_count', 'retweet_count', 'reply_count', 'quote_count']].rename(columns = {\"id\":\"tweet_id\"}),\n", " pd.read_csv(\"data/tweets-1679742620302.csv\")])\n", "\n", "## cols: user_id, handle, user_name; column names do not match between old and new data. In the new data set username and handle even seem to be swapped (inverse order)\n", "## Info: Only a small number of user_ids appear in both data sets; where they do, the username has occasionally changed, so the tables cannot simply be merged\n", "tweets_user = pd.read_csv(\"data/user_old.tsv\", \n", " sep = \"\\t\").rename(columns = {\"id\":\"user_id\",\"name\": \"user_name\"} # uniform names\n", " ).merge(pd.read_csv(\"data/tweets-1679742702794.csv\" # merge with renamed new data\n", " ).rename(columns = {\"username\":\"handle\", \"handle\": \"user_name\"}), # swap the column names\n", " on = \"user_id\", # user_id as matching column\n", " how = \"outer\", # keep all unique user_ids\n", " suffixes = [\"_2021\", \"_2022\"]) # identify which data set the username and handle came from\n", "\n", "## Some usernames corresponding to one user_id have changed over time. 
For easier handling only the latest username and handle are kept.\n", "tweets_user = tweets_user.assign(handle = tweets_user['handle_2022'].fillna(tweets_user['handle_2021']), # prefer the 2022 value, fall back to 2021\n", " user_name = tweets_user['user_name_2022'].fillna(tweets_user['user_name_2021'])\n", " ).drop(['handle_2021', 'handle_2022', 'user_name_2021', 'user_name_2022'], axis = 1) # no longer needed\n", "\n", "## additional information concerning the police stations\n", "## cols: handle, Name, Typ, Bundesland, Stadt, LAT, LONG\n", "police_stations = pd.read_csv(\"data/polizei_accounts_geo.csv\", sep = \"\\t\" \n", " ).rename(columns = {\"Polizei Account\": \"handle\"})\n", "\n", "tweets_meta.shape, tweets_statistics.shape, tweets_text.shape, tweets_user.shape, police_stations.shape" ] }, { "cell_type": "markdown", "id": "0f7b2b95-0a6c-42c6-a308-5f68d4ba94b9", "metadata": {}, "source": [ "All tweet-related information can now be stored in a single data frame:" ] }, { "cell_type": "code", "execution_count": 24, "id": "f30c2799-02c6-4e6a-ae36-9e039545b6b3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Merge statistics, tweet text and user information in one data frame\n", "tweets_combined = pd.merge(tweets_statistics, \n", " tweets_text,\n", " on = 'tweet_id').merge(tweets_user, on = 'user_id'\n", " ).drop(['id'], axis = 1) # drop unnecessary id column (redundant to index)\n", " " ] }, { "cell_type": "code", "execution_count": 49, "id": "bd407aba-eec1-41ed-bff9-4c5fcdf6cb9d", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>tweet_id</th><th>like_count</th><th>retweet_count</th><th>reply_count</th><th>quote_count</th><th>measured_at</th><th>is_deleted</th><th>tweet_text</th><th>created_at</th><th>user_id</th><th>handle</th><th>user_name</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>1321021123463663616</td><td>2</td><td>1</td><td>2</td><td>0</td><td>NaT</td><td>&lt;NA&gt;</td><td>@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) a...</td><td>2020-10-27 09:29:13</td><td>778895426007203840</td><td>polizei_ol</td><td>Polizei Oldenburg-Stadt/Ammerland</td></tr>\n",
" <tr><th>1</th><td>1321037834246066181</td><td>2</td><td>0</td><td>0</td><td>0</td><td>NaT</td><td>&lt;NA&gt;</td><td>@mahanna196 Ja. *sr</td><td>2020-10-27 10:35:38</td><td>778895426007203840</td><td>polizei_ol</td><td>Polizei Oldenburg-Stadt/Ammerland</td></tr>\n",
" <tr><th>2</th><td>1321068234955776000</td><td>19</td><td>3</td><td>3</td><td>0</td><td>NaT</td><td>&lt;NA&gt;</td><td>#Aktuell Auf dem ehem. Bundeswehrkrankenhausgelände in #Rostrup wurde ein Sprengsatz gefunden. F...</td><td>2020-10-27 12:36:26</td><td>778895426007203840</td><td>polizei_ol</td><td>Polizei Oldenburg-Stadt/Ammerland</td></tr>\n",
" <tr><th>3</th><td>1321073940199100416</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NaT</td><td>&lt;NA&gt;</td><td>@Emma36166433 Bitte lesen Sie unseren Tweet 2/2 *sr</td><td>2020-10-27 12:59:06</td><td>778895426007203840</td><td>polizei_ol</td><td>Polizei Oldenburg-Stadt/Ammerland</td></tr>\n",
" <tr><th>4</th><td>1321088646506754049</td><td>2</td><td>0</td><td>0</td><td>0</td><td>NaT</td><td>&lt;NA&gt;</td><td>In der vergangenen Woche wurde die Wohnung des Tatverdächtigen durchsucht. Dabei stellten die Be...</td><td>2020-10-27 13:57:32</td><td>778895426007203840</td><td>polizei_ol</td><td>Polizei Oldenburg-Stadt/Ammerland</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>151685</th><td>1625828803804004354</td><td>5</td><td>1</td><td>1</td><td>0</td><td>2023-02-19 13:40:36</td><td>False</td><td>#Sicherheit durch #Sichtbarkeit\\nUnsere #Dir3 hat zu diesem Thema wieder einmal die Puppen tanze...</td><td>2023-02-15 12:06:07</td><td>1168873095614160896</td><td>polizeiberlin_p</td><td>Polizei Berlin Prävention</td></tr>\n",
" <tr><th>151686</th><td>1628004105623900167</td><td>2</td><td>0</td><td>0</td><td>0</td><td>2023-02-25 13:14:49</td><td>False</td><td>Unser Präventionsteam vom #A44 berät heute und morgen tagsüber zum Thema Alkohol &amp; Drogen + ...</td><td>2023-02-21 12:10:00</td><td>1168873095614160896</td><td>polizeiberlin_p</td><td>Polizei Berlin Prävention</td></tr>\n",
" <tr><th>151687</th><td>1628004810183016448</td><td>6</td><td>0</td><td>0</td><td>0</td><td>2023-02-25 13:14:49</td><td>False</td><td>Auch unser #A52 war heute aktiv und hat zum Thema Alkohol &amp; Drogen im Straßenverkehr beraten...</td><td>2023-02-21 12:12:48</td><td>1168873095614160896</td><td>polizeiberlin_p</td><td>Polizei Berlin Prävention</td></tr>\n",
" <tr><th>151688</th><td>1628352896352878593</td><td>2</td><td>0</td><td>0</td><td>0</td><td>2023-02-26 13:15:05</td><td>False</td><td>Gestern führte unser #A13 in einer Wohnsiedlung einen Präventionseinsatz zum Thema „Wohnraumeinb...</td><td>2023-02-22 11:15:58</td><td>1168873095614160896</td><td>polizeiberlin_p</td><td>Polizei Berlin Prävention</td></tr>\n",
" <tr><th>151689</th><td>1628709531998998529</td><td>10</td><td>1</td><td>0</td><td>0</td><td>2023-02-27 12:17:33</td><td>False</td><td>Auf dem Gelände der @BUFAStudios (Oberlandstr. 26-35) findet heute die #Seniorenmesse vom Bezirk...</td><td>2023-02-23 10:53:07</td><td>1168873095614160896</td><td>polizeiberlin_p</td><td>Polizei Berlin Prävention</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>151690 rows × 12 columns</p>\n",
"</div>
" ], "text/plain": [ " tweet_id like_count retweet_count reply_count \\\n", "0 1321021123463663616 2 1 2 \n", "1 1321037834246066181 2 0 0 \n", "2 1321068234955776000 19 3 3 \n", "3 1321073940199100416 0 0 0 \n", "4 1321088646506754049 2 0 0 \n", "... ... ... ... ... \n", "151685 1625828803804004354 5 1 1 \n", "151686 1628004105623900167 2 0 0 \n", "151687 1628004810183016448 6 0 0 \n", "151688 1628352896352878593 2 0 0 \n", "151689 1628709531998998529 10 1 0 \n", "\n", " quote_count measured_at is_deleted \\\n", "0 0 NaT \n", "1 0 NaT \n", "2 0 NaT \n", "3 0 NaT \n", "4 0 NaT \n", "... ... ... ... \n", "151685 0 2023-02-19 13:40:36 False \n", "151686 0 2023-02-25 13:14:49 False \n", "151687 0 2023-02-25 13:14:49 False \n", "151688 0 2023-02-26 13:15:05 False \n", "151689 0 2023-02-27 12:17:33 False \n", "\n", " tweet_text \\\n", "0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) a... \n", "1 @mahanna196 Ja. *sr \n", "2 #Aktuell Auf dem ehem. Bundeswehrkrankenhausgelände in #Rostrup wurde ein Sprengsatz gefunden. F... \n", "3 @Emma36166433 Bitte lesen Sie unseren Tweet 2/2 *sr \n", "4 In der vergangenen Woche wurde die Wohnung des Tatverdächtigen durchsucht. Dabei stellten die Be... \n", "... ... \n", "151685 #Sicherheit durch #Sichtbarkeit\\nUnsere #Dir3 hat zu diesem Thema wieder einmal die Puppen tanze... \n", "151686 Unser Präventionsteam vom #A44 berät heute und morgen tagsüber zum Thema Alkohol & Drogen + ... \n", "151687 Auch unser #A52 war heute aktiv und hat zum Thema Alkohol & Drogen im Straßenverkehr beraten... \n", "151688 Gestern führte unser #A13 in einer Wohnsiedlung einen Präventionseinsatz zum Thema „Wohnraumeinb... \n", "151689 Auf dem Gelände der @BUFAStudios (Oberlandstr. 26-35) findet heute die #Seniorenmesse vom Bezirk... 
\n", "\n", " created_at user_id handle \\\n", "0 2020-10-27 09:29:13 778895426007203840 polizei_ol \n", "1 2020-10-27 10:35:38 778895426007203840 polizei_ol \n", "2 2020-10-27 12:36:26 778895426007203840 polizei_ol \n", "3 2020-10-27 12:59:06 778895426007203840 polizei_ol \n", "4 2020-10-27 13:57:32 778895426007203840 polizei_ol \n", "... ... ... ... \n", "151685 2023-02-15 12:06:07 1168873095614160896 polizeiberlin_p \n", "151686 2023-02-21 12:10:00 1168873095614160896 polizeiberlin_p \n", "151687 2023-02-21 12:12:48 1168873095614160896 polizeiberlin_p \n", "151688 2023-02-22 11:15:58 1168873095614160896 polizeiberlin_p \n", "151689 2023-02-23 10:53:07 1168873095614160896 polizeiberlin_p \n", "\n", " user_name \n", "0 Polizei Oldenburg-Stadt/Ammerland \n", "1 Polizei Oldenburg-Stadt/Ammerland \n", "2 Polizei Oldenburg-Stadt/Ammerland \n", "3 Polizei Oldenburg-Stadt/Ammerland \n", "4 Polizei Oldenburg-Stadt/Ammerland \n", "... ... \n", "151685 Polizei Berlin Prävention \n", "151686 Polizei Berlin Prävention \n", "151687 Polizei Berlin Prävention \n", "151688 Polizei Berlin Prävention \n", "151689 Polizei Berlin Prävention \n", "\n", "[151690 rows x 12 columns]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert columns to appropriate data types\n", "# Missing counts are filled with the sentinel -99 so that the columns can be cast to int\n", "tweets_combined[['like_count', 'retweet_count', 'reply_count', 'quote_count']] = tweets_combined[['like_count', 'retweet_count', 'reply_count', 'quote_count']].fillna(-99).astype(int)\n", "tweets_combined = tweets_combined.assign(measured_at = pd.to_datetime(tweets_combined['measured_at']), # parse timestamps as datetime\n", " created_at = pd.to_datetime(tweets_combined['created_at']),\n", " handle = tweets_combined['handle'].str.lower(), # handle to lower case\n", " is_deleted = tweets_combined['is_deleted'].astype('boolean')) # is_deleted as nullable boolean\n", "tweets_combined#.to_csv(\"data/tweets_all_combined.csv\")" ] }, { "cell_type": "markdown", "id": 
"91dfb8bb-15dc-4b2c-9c5f-3eab18d78ef8", "metadata": { "tags": [] }, "source": [ "### Adjacency matrix of mentions\n", " \n", "Information that is not directly contained in the data: which accounts are mentioned. Mentions are only marked in the tweet text with @handle." ] }, { "cell_type": "code", "execution_count": 53, "id": "5d8bf730-3c8f-4143-b405-c95f1914f54b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Auch wir schließen uns dem Apell an! \\n\\n#Ukra...\n", "1 @BWeltenbummler Sehr schwer zu sagen. Die Evak...\n", "2 Halten Sie durch – die Evakuierung ist fast ab...\n", "3 Halten Sie durch – die Evakuierung ist fast ab...\n", "4 RT @drkberlin_iuk: 🚨 In enger Abstimmung mit d...\n", "Name: tweet_text, dtype: object" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TODO" ] }, { "cell_type": "markdown", "id": "0c242090-0748-488c-b604-f521030f468f", "metadata": { "tags": [] }, "source": [ "## Metadata\n", "\n", "Which data form the basis?\n" ] }, { "cell_type": "code", "execution_count": 112, "id": "0e5eb455-6b12-4572-8f5e-f328a94bd797", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "hashtag 267255\n", "url 141594\n", "mention 71142\n", "Name: entity_type, dtype: int64" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_meta[\"entity_type\"].value_counts()\n", "# tweets_meta[tweets_meta['entity_type'] == \"mention\"]" ] }, { "cell_type": "markdown", "id": "ef440301-cf89-4e80-8801-eb853d636190", "metadata": { "tags": [] }, "source": [ "In total we have 151690 unique tweets:" ] }, { "cell_type": "code", "execution_count": 113, "id": "5a438e7f-8735-40bb-b450-2ce168f0f67a", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "151690" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_combined[\"tweet_id\"].nunique() # number of unique tweets" ] }, { 
"cell_type": "code", "execution_count": 10, "id": "4f1e8c6c-3610-436e-899e-4d0307259230", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The tweets were collected from 2020-10-27 to 2023-03-16, a span of 870 days (with short interruptions).\n" ] } ], "source": [ "start, end = tweets_combined['created_at'].min(), tweets_combined['created_at'].max() # first and last tweet timestamp\n", "print(f\"The tweets were collected from {start.date()} to {end.date()}, a span of {(end - start).days} days (with short interruptions).\")\n", "# tweets_combined[tweets_combined['created_at'] == tweets_combined['created_at'].max()] # tweets from the last day" ] }, { "cell_type": "markdown", "id": "d8b47a60-1535-4d03-913a-73e897bc18df", "metadata": { "tags": [] }, "source": [ "Which police accounts tweeted the most?" ] }, { "cell_type": "code", "execution_count": 11, "id": "9373552e-6baf-46df-ae16-c63603e20a83", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>handle</th><th>count</th><th>Name</th><th>Typ</th><th>Bundesland</th><th>Stadt</th><th>LAT</th><th>LONG</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>11</th><td>polizei_ffm</td><td>5512</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>0</th><td>polizeisachsen</td><td>5340</td><td>Polizei Sachsen</td><td>Polizei</td><td>Sachsen</td><td>Dresden</td><td>51.0493286</td><td>13.7381437</td></tr>\n",
" <tr><th>3</th><td>polizei_nrw_do</td><td>4895</td><td>Polizei NRW DO</td><td>Polizei</td><td>Nordrhein-Westfalen</td><td>Dortmund</td><td>51.5142273</td><td>7.4652789</td></tr>\n",
" <tr><th>92</th><td>polizeibb</td><td>4323</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>61</th><td>polizeihamburg</td><td>4042</td><td>Polizei Hamburg</td><td>Polizei</td><td>Hamburg</td><td>Hamburg</td><td>53.550341</td><td>10.000654</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " handle count Name Typ Bundesland \\\n", "11 polizei_ffm 5512 NaN NaN NaN \n", "0 polizeisachsen 5340 Polizei Sachsen Polizei Sachsen \n", "3 polizei_nrw_do 4895 Polizei NRW DO Polizei Nordrhein-Westfalen \n", "92 polizeibb 4323 NaN NaN NaN \n", "61 polizeihamburg 4042 Polizei Hamburg Polizei Hamburg \n", "\n", " Stadt LAT LONG \n", "11 NaN NaN NaN \n", "0 Dresden 51.0493286 13.7381437 \n", "3 Dortmund 51.5142273 7.4652789 \n", "92 NaN NaN NaN \n", "61 Hamburg 53.550341 10.000654 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Count tweets per account, attach the police station metadata and sort by activity\n", "tweets_agg = tweets_combined.groupby(by = [\"user_id\", \"user_name\", \"handle\"]\n", " )[\"user_id\"].aggregate(['count']\n", " ).merge(police_stations,\n", " on = \"handle\",\n", " how = \"left\"\n", " ).sort_values(['count'], ascending=False)\n", "activy_police_vis = tweets_agg[0:50] # the 50 most active accounts\n", "activy_police_vis.head()" ] }, { "cell_type": "markdown", "id": "9cf5f544-706b-41af-b785-7023f04e3ecb", "metadata": { "tags": [] }, "source": [ "Visualization of the most active police stations:" ] }, { "cell_type": "code", "execution_count": 13, "id": "b1c39196-d1cc-4f82-8e01-7529e7b3046f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Horizontal bar chart of the 15 most active accounts, sorted by tweet count\n", "barchart = alt.Chart(activy_police_vis[0:15]).mark_bar().encode(\n", " x = 'count:Q',\n", " y = alt.Y('handle:O', sort = '-x'),\n", ")\n", "barchart " ] }, { "cell_type": "markdown", "id": "90f686ff-93c6-44d9-9761-feb35dfe9d1d", "metadata": { "tags": [] }, "source": [ "Which tweets attract a particularly large amount of attention?" ] }, { "cell_type": "code", "execution_count": 14, "id": "d0549250-b11f-4762-8500-1134c53303b4", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>index</th><th>tweet_id</th><th>like_count</th><th>retweet_count</th><th>reply_count</th><th>quote_count</th><th>measured_at</th><th>is_deleted</th><th>tweet_text</th><th>created_at</th><th>user_id</th><th>handle</th><th>user_name</th><th>Name</th><th>Typ</th><th>Bundesland</th><th>Stadt</th><th>LAT</th><th>LONG</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>3053</td><td>1609539240458878979</td><td>21455</td><td>1845</td><td>3643</td><td>341</td><td>2023-01-05 14:44:34</td><td>False</td><td>Die Gewalt, die unsere Kolleginnen &amp; Kollegen in der Silvesternacht erleben mussten, ist une...</td><td>2023-01-01 13:17:13</td><td>2397974054</td><td>polizeiberlin</td><td>Polizei Berlin</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>1</th><td>1331</td><td>1355179228396879872</td><td>19186</td><td>3386</td><td>1203</td><td>628</td><td>NaT</td><td>NaN</td><td>An diejenigen, die vergangene Nacht in eine Schule in #Gesundbrunnen eingebrochen sind und 242 T...</td><td>2021-01-29 15:41:20</td><td>2397974054</td><td>polizeiberlin</td><td>Polizei Berlin</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>2</th><td>91693</td><td>1505620459148173316</td><td>15708</td><td>7098</td><td>186</td><td>540</td><td>2022-03-24 20:15:08</td><td>False</td><td>WICHTIGE Info:\\nÜber das Internet wird derzeit ein Video verbreitet, in dem von einem Überfall a...</td><td>2022-03-20 19:01:05</td><td>2389161066</td><td>polizei_nrw_bn</td><td>Polizei NRW BN</td><td>Polizei NRW BN</td><td>Polizei</td><td>Nordrhein-Westfalen</td><td>Bonn</td><td>50.735851</td><td>7.10066</td></tr>\n",
" <tr><th>3</th><td>91695</td><td>1505620666476896259</td><td>10337</td><td>1539</td><td>59</td><td>35</td><td>2022-03-24 20:15:08</td><td>False</td><td>Die Experten gehen derzeit davon aus, dass es sich um ein absichtliches \"Fake-Video\" handelt, da...</td><td>2022-03-20 19:01:54</td><td>2389161066</td><td>polizei_nrw_bn</td><td>Polizei NRW BN</td><td>Polizei NRW BN</td><td>Polizei</td><td>Nordrhein-Westfalen</td><td>Bonn</td><td>50.735851</td><td>7.10066</td></tr>\n",
" <tr><th>4</th><td>122631</td><td>1359098196434292739</td><td>9471</td><td>642</td><td>128</td><td>102</td><td>NaT</td><td>NaN</td><td>Weil wir dich schieben! @BVG_Kampagne 😉 https://t.co/N8kdlCxhz2</td><td>2021-02-09 11:13:55</td><td>4876039738</td><td>bpol_b</td><td>Bundespolizei Berlin</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>151685</th><td>7569</td><td>1332625325654757377</td><td>-99</td><td>-99</td><td>-99</td><td>-99</td><td>NaT</td><td>NaN</td><td>Sinken die Temperaturen ❄, steigt zeitgleich das Risiko für Verkehrsteilnehmer. Höchste Zeit zu ...</td><td>2020-11-28 10:00:11</td><td>223758384</td><td>polizeisachsen</td><td>Polizei Sachsen</td><td>Polizei Sachsen</td><td>Polizei</td><td>Sachsen</td><td>Dresden</td><td>51.0493286</td><td>13.7381437</td></tr>\n",
" <tr><th>151686</th><td>7572</td><td>1332738525507186692</td><td>-99</td><td>-99</td><td>-99</td><td>-99</td><td>NaT</td><td>NaN</td><td>📺Am Sonntag, um 19:50 Uhr, geht es bei #KripoLive im \\n@mdrde\\n auch um die Fahndung nach einem ...</td><td>2020-11-28 17:30:00</td><td>223758384</td><td>polizeisachsen</td><td>Polizei Sachsen</td><td>Polizei Sachsen</td><td>Polizei</td><td>Sachsen</td><td>Dresden</td><td>51.0493286</td><td>13.7381437</td></tr>\n",
" <tr><th>151687</th><td>144702</td><td>1465679768494526467</td><td>-99</td><td>-99</td><td>-99</td><td>-99</td><td>NaT</td><td>NaN</td><td>Musik verbindet!\\nUnser #Adventskalender der #Bundespolizei startet morgen ➡ https://t.co/V6CaTV...</td><td>2021-11-30 13:51:02</td><td>4876085224</td><td>bpol_nord</td><td>Bundespolizei Nord</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>151688</th><td>144701</td><td>1464124290605977600</td><td>-99</td><td>-99</td><td>-99</td><td>-99</td><td>NaT</td><td>NaN</td><td>@gretchen_hann Hallo, diese Frage kann die Bundespolizei Spezialkräfte besser beantworten. Richt...</td><td>2021-11-26 06:50:07</td><td>4876085224</td><td>bpol_nord</td><td>Bundespolizei Nord</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>151689</th><td>66854</td><td>1376453040283209728</td><td>-99</td><td>-99</td><td>-99</td><td>-99</td><td>NaT</td><td>NaN</td><td>#Bönen #Holzwickede - Verstöße gegen Coronaschutzverordnung: Polizei löst Gaststättenabend und F...</td><td>2021-03-29 08:35:52</td><td>2389263558</td><td>polizei_nrw_un</td><td>Polizei NRW UN</td><td>Polizei NRW UN</td><td>Polizei</td><td>Nordrhein-Westfalen</td><td>Unna</td><td>51.5348835</td><td>7.689014</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>151690 rows × 19 columns</p>\n",
"</div>
" ], "text/plain": [ " index tweet_id like_count retweet_count reply_count \\\n", "0 3053 1609539240458878979 21455 1845 3643 \n", "1 1331 1355179228396879872 19186 3386 1203 \n", "2 91693 1505620459148173316 15708 7098 186 \n", "3 91695 1505620666476896259 10337 1539 59 \n", "4 122631 1359098196434292739 9471 642 128 \n", "... ... ... ... ... ... \n", "151685 7569 1332625325654757377 -99 -99 -99 \n", "151686 7572 1332738525507186692 -99 -99 -99 \n", "151687 144702 1465679768494526467 -99 -99 -99 \n", "151688 144701 1464124290605977600 -99 -99 -99 \n", "151689 66854 1376453040283209728 -99 -99 -99 \n", "\n", " quote_count measured_at is_deleted \\\n", "0 341 2023-01-05 14:44:34 False \n", "1 628 NaT NaN \n", "2 540 2022-03-24 20:15:08 False \n", "3 35 2022-03-24 20:15:08 False \n", "4 102 NaT NaN \n", "... ... ... ... \n", "151685 -99 NaT NaN \n", "151686 -99 NaT NaN \n", "151687 -99 NaT NaN \n", "151688 -99 NaT NaN \n", "151689 -99 NaT NaN \n", "\n", " tweet_text \\\n", "0 Die Gewalt, die unsere Kolleginnen & Kollegen in der Silvesternacht erleben mussten, ist une... \n", "1 An diejenigen, die vergangene Nacht in eine Schule in #Gesundbrunnen eingebrochen sind und 242 T... \n", "2 WICHTIGE Info:\\nÜber das Internet wird derzeit ein Video verbreitet, in dem von einem Überfall a... \n", "3 Die Experten gehen derzeit davon aus, dass es sich um ein absichtliches \"Fake-Video\" handelt, da... \n", "4 Weil wir dich schieben! @BVG_Kampagne 😉 https://t.co/N8kdlCxhz2 \n", "... ... \n", "151685 Sinken die Temperaturen ❄, steigt zeitgleich das Risiko für Verkehrsteilnehmer. Höchste Zeit zu ... \n", "151686 📺Am Sonntag, um 19:50 Uhr, geht es bei #KripoLive im \\n@mdrde\\n auch um die Fahndung nach einem ... \n", "151687 Musik verbindet!\\nUnser #Adventskalender der #Bundespolizei startet morgen ➡ https://t.co/V6CaTV... \n", "151688 @gretchen_hann Hallo, diese Frage kann die Bundespolizei Spezialkräfte besser beantworten. Richt... 
\n", "151689 #Bönen #Holzwickede - Verstöße gegen Coronaschutzverordnung: Polizei löst Gaststättenabend und F... \n", "\n", " created_at user_id handle user_name \\\n", "0 2023-01-01 13:17:13 2397974054 polizeiberlin Polizei Berlin \n", "1 2021-01-29 15:41:20 2397974054 polizeiberlin Polizei Berlin \n", "2 2022-03-20 19:01:05 2389161066 polizei_nrw_bn Polizei NRW BN \n", "3 2022-03-20 19:01:54 2389161066 polizei_nrw_bn Polizei NRW BN \n", "4 2021-02-09 11:13:55 4876039738 bpol_b Bundespolizei Berlin \n", "... ... ... ... ... \n", "151685 2020-11-28 10:00:11 223758384 polizeisachsen Polizei Sachsen \n", "151686 2020-11-28 17:30:00 223758384 polizeisachsen Polizei Sachsen \n", "151687 2021-11-30 13:51:02 4876085224 bpol_nord Bundespolizei Nord \n", "151688 2021-11-26 06:50:07 4876085224 bpol_nord Bundespolizei Nord \n", "151689 2021-03-29 08:35:52 2389263558 polizei_nrw_un Polizei NRW UN \n", "\n", " Name Typ Bundesland Stadt LAT \\\n", "0 NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN \n", "2 Polizei NRW BN Polizei Nordrhein-Westfalen Bonn 50.735851 \n", "3 Polizei NRW BN Polizei Nordrhein-Westfalen Bonn 50.735851 \n", "4 NaN NaN NaN NaN NaN \n", "... ... ... ... ... ... \n", "151685 Polizei Sachsen Polizei Sachsen Dresden 51.0493286 \n", "151686 Polizei Sachsen Polizei Sachsen Dresden 51.0493286 \n", "151687 NaN NaN NaN NaN NaN \n", "151688 NaN NaN NaN NaN NaN \n", "151689 Polizei NRW UN Polizei Nordrhein-Westfalen Unna 51.5348835 \n", "\n", " LONG \n", "0 NaN \n", "1 NaN \n", "2 7.10066 \n", "3 7.10066 \n", "4 NaN \n", "... ... 
\n", "151685 13.7381437 \n", "151686 13.7381437 \n", "151687 NaN \n", "151688 NaN \n", "151689 7.689014 \n", "\n", "[151690 rows x 19 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_attention = tweets_combined.merge(police_stations,\n", " on = \"handle\",\n", " how = \"left\")\n", "pd.options.display.max_colwidth = 100\n", "tweets_attention.sort_values('like_count', ascending = False).reset_index()\n", "\n" ] }, { "cell_type": "code", "execution_count": 42, "id": "621a3b74-e909-435c-8820-b38b63aa4893", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>user_id</th><th>handle</th><th>user_name</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>1000004686156652545</td><td>6jannik9</td><td>Systemstratege:</td></tr>\n",
" <tr><th>1</th><td>1000043230870867969</td><td>LSollik</td><td>Physiolucy</td></tr>\n",
" <tr><th>2</th><td>1000405847460151296</td><td>Achim1949Hans</td><td>Systemstratege:</td></tr>\n",
" <tr><th>3</th><td>1000460805719121921</td><td>WahreW</td><td>WahreWorte</td></tr>\n",
" <tr><th>4</th><td>1000744009638252544</td><td>derD1ck3</td><td>Ⓓ①ⓒⓚ①③ (🏡)</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>11554</th><td>99931264</td><td>Havok1975</td><td>Systemstratege:</td></tr>\n",
" <tr><th>11555</th><td>999542638226403328</td><td>Madame_de_Saxe</td><td>Systemstratege:</td></tr>\n",
" <tr><th>11556</th><td>999901133282754560</td><td>tungstendie74</td><td>Systemstratege:</td></tr>\n",
" <tr><th>11557</th><td>999904275080794112</td><td>_danielheim</td><td>Systemstratege:</td></tr>\n",
" <tr><th>11558</th><td>999955376454930432</td><td>amyman6010</td><td>Systemstratege:</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>11559 rows × 3 columns</p>\n",
"</div>
" ], "text/plain": [ " user_id handle user_name\n", "0 1000004686156652545 6jannik9 Systemstratege: \n", "1 1000043230870867969 LSollik Physiolucy\n", "2 1000405847460151296 Achim1949Hans Systemstratege: \n", "3 1000460805719121921 WahreW WahreWorte\n", "4 1000744009638252544 derD1ck3 Ⓓ①ⓒⓚ①③ (🏡)\n", "... ... ... ...\n", "11554 99931264 Havok1975 Systemstratege: \n", "11555 999542638226403328 Madame_de_Saxe Systemstratege: \n", "11556 999901133282754560 tungstendie74 Systemstratege: \n", "11557 999904275080794112 _danielheim Systemstratege: \n", "11558 999955376454930432 amyman6010 Systemstratege: \n", "\n", "[11559 rows x 3 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [] } ], "metadata": { "kernelspec": { "display_name": "python-scientific kernel", "language": "python", "name": "python-scientific" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }