{ "cells": [ { "cell_type": "markdown", "id": "83885e86-1ccb-46ec-bee9-a33f3b541569", "metadata": {}, "source": [ "# Summary of the hackathon analyses for the website\n", "\n", "- possibly for display on the website\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "9bd1686f-9bbc-4c05-a5f5-e0c4ce653fb2", "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import altair as alt" ] }, { "cell_type": "markdown", "id": "81780c9a-7721-438b-9726-ff5a70910ce8", "metadata": {}, "source": [ "## Data preparation\n", "\n", "Database dump from 25 March 2023. The individual tables of the database are read in separately. In addition, all information that belongs directly to a tweet is collected in a single data object. The GIS data for the individual police stations (\"police_stations\") is still incomplete and may need to be checked again.\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "e312a975-3921-44ee-a7c5-37736678bc3f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " tweet_id measured_at like_count reply_count \\\n", "0 1496955054712045581 2022-02-28 22:42:26 13 0 \n", "1 1496957213516214277 2022-02-28 22:42:26 2 0 \n", "2 1496963501201539073 2022-02-28 22:42:26 -1 -1 \n", "3 1496963771054825472 2022-02-28 23:42:27 142 7 \n", "4 1496965696907104258 2022-02-28 23:42:27 0 0 \n", "\n", " retweet_count quote_count is_deleted \\\n", "0 2 0 0 \n", "1 0 0 0 \n", "2 -1 -1 1 \n", "3 8 3 0 \n", "4 11 0 0 \n", "\n", " tweet_text created_at \\\n", "0 Auch wir schließen uns dem Apell an! \\n\\n#Ukra... 2022-02-24 21:07:51 \n", "1 @BWeltenbummler Sehr schwer zu sagen. Die Evak... 2022-02-24 21:16:26 \n", "2 Halten Sie durch – die Evakuierung ist fast ab... 2022-02-24 21:41:25 \n", "3 Halten Sie durch – die Evakuierung ist fast ab... 2022-02-24 21:42:29 \n", "4 RT @drkberlin_iuk: 🚨 In enger Abstimmung mit d... 2022-02-24 21:50:09 \n", "\n", " user_id \n", "0 773438463068766208 \n", "1 2397974054 \n", "2 2398002414 \n", "3 2398002414 \n", "4 2398002414 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_meta = pd.read_csv(\"data/tweets.csv\")\n", "tweets_time = pd.read_csv(\"data/tweets-1679742620302.csv\")\n", "tweets_text = pd.read_csv(\"data/tweets-1679742698645.csv\")\n", "tweets_user = pd.read_csv(\"data/tweets-1679742702794.csv\"\n", " ).rename(columns = {\"username\":\"handle\", # rename columns\n", " \"handle\": \"username\"})\n", "tweets_user = tweets_user.assign(handle = tweets_user['handle'].str.lower()) # convert handles to lower case\n", "tweets_combined = pd.merge(tweets_time, # merge the two tweet related data frames\n", " tweets_text, \n", " how = 'inner', \n", " on = 'tweet_id'\n", " ).drop(['id'], # drop unascessary id column (redundant to index)\n", " axis = 1)\n", "tweets_combined = tweets_combined.assign(measured_at = pd.to_datetime(tweets_combined['measured_at']), # change date to date format\n", " created_at = 
pd.to_datetime(tweets_combined['created_at']))\n", "police_stations = pd.read_csv(\"data/polizei_accounts_geo.csv\", sep = \"\\t\" # additional information on the police stations\n", " ).rename(columns = {\"Polizei Account\": \"handle\"})\n", "tweets_combined.head()" ] }, { "cell_type": "markdown", "id": "91dfb8bb-15dc-4b2c-9c5f-3eab18d78ef8", "metadata": { "tags": [] }, "source": [ "### Adjacency matrix of mentions\n", " \n", "Information that is not directly contained in the data: which accounts are mentioned. This is only marked in the tweet text with @handle." ] }, { "cell_type": "code", "execution_count": 53, "id": "5d8bf730-3c8f-4143-b405-c95f1914f54b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Auch wir schließen uns dem Apell an! \\n\\n#Ukra...\n", "1 @BWeltenbummler Sehr schwer zu sagen. Die Evak...\n", "2 Halten Sie durch – die Evakuierung ist fast ab...\n", "3 Halten Sie durch – die Evakuierung ist fast ab...\n", "4 RT @drkberlin_iuk: 🚨 In enger Abstimmung mit d...\n", "Name: tweet_text, dtype: object" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TODO" ] }, { "cell_type": "markdown", "id": "0c242090-0748-488c-b604-f521030f468f", "metadata": { "tags": [] }, "source": [ "## Metadata \n", "\n", "Which data form the basis?" 
] }, { "cell_type": "code", "execution_count": 7, "id": "0e5eb455-6b12-4572-8f5e-f328a94bd797", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "hashtag 157145\n", "url 88322\n", "mention 36815\n", "Name: entity_type, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_meta[\"entity_type\"].value_counts()\n", "# tweets_meta[tweets_meta['entity_type'] == \"mention\"]" ] }, { "cell_type": "markdown", "id": "ef440301-cf89-4e80-8801-eb853d636190", "metadata": { "tags": [] }, "source": [ "In total we have 84794 unique tweets:" ] }, { "cell_type": "code", "execution_count": 8, "id": "5a438e7f-8735-40bb-b450-2ce168f0f67a", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "84794" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_combined[\"tweet_id\"].nunique() # number of unique tweets" ] }, { "cell_type": "code", "execution_count": 9, "id": "4f1e8c6c-3610-436e-899e-4d0307259230", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The collected tweets were created between 2022-02-24 and 2023-03-16, a span of 384 full days.\n" ] } ], "source": [ "first, last = tweets_combined['created_at'].min(), tweets_combined['created_at'].max() # earliest and latest creation time\n", "print(f\"The collected tweets were created between {first.date()} and {last.date()}, a span of {(last - first).days} full days.\")\n", "# tweets_combined[tweets_combined['created_at'] == tweets_combined['created_at'].max()] # tweets with the latest timestamp" ] }, { "cell_type": "markdown", "id": "d8b47a60-1535-4d03-913a-73e897bc18df", "metadata": { "tags": [] }, "source": [ "Which police accounts tweeted the most?" ] }, { "cell_type": "code", "execution_count": 43, "id": "9373552e-6baf-46df-ae16-c63603e20a83", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " handle count Name Typ Bundesland \\\n", "11 polizei_ffm 2993 NaN NaN NaN \n", "3 polizei_nrw_do 2860 Polizei NRW DO Polizei Nordrhein-Westfalen \n", "0 polizeisachsen 2700 Polizei Sachsen Polizei Sachsen \n", "91 polizeibb 2310 NaN NaN NaN \n", "61 polizeihamburg 2093 Polizei Hamburg Polizei Hamburg \n", "\n", " Stadt LAT LONG \n", "11 NaN NaN NaN \n", "3 Dortmund 51.5142273 7.4652789 \n", "0 Dresden 51.0493286 13.7381437 \n", "91 NaN NaN NaN \n", "61 Hamburg 53.550341 10.000654 " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_agg = tweets_combined.merge(tweets_user,\n", " on = \"user_id\"\n", " ).groupby(by = [\"user_id\", \"handle\", \"username\"]\n", " )[\"user_id\"].aggregate(['count']\n", " ).merge(police_stations, \n", " on = \"handle\",\n", " how = \"left\"\n", " ).sort_values(['count'], \n", " ascending=False)\n", "tweets_agg.shape\n", "activy_police_vis = tweets_agg[0:50]\n", "activy_police_vis.headd()" ] }, { "cell_type": "markdown", "id": "9cf5f544-706b-41af-b785-7023f04e3ecb", "metadata": { "tags": [] }, "source": [ "Visualisierung aktivste Polizeistadtionen:" ] }, { "cell_type": "code", "execution_count": 47, "id": "b1c39196-d1cc-4f82-8e01-7529e7b3046f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "barchart = alt.Chart(activy_police_vis[0:15]).mark_bar().encode(\n", " x = 'count:Q',\n", " y = alt.Y('handle:N', sort = '-x'),\n", ")\n", "barchart " ] }, { "cell_type": "markdown", "id": "90f686ff-93c6-44d9-9761-feb35dfe9d1d", "metadata": { "tags": [] }, "source": [ "Which tweets attract the most attention?" ] }, { "cell_type": "code", "execution_count": 90, "id": "d0549250-b11f-4762-8500-1134c53303b4", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Die Gewalt, die unsere Kolleginnen & Kollegen in der Silvesternacht erleben mussten, ist une...\n", "1 WICHTIGE Info:\\nÜber das Internet wird derzeit ein Video verbreitet, in dem von einem Überfall a...\n", "2 Die Experten gehen derzeit davon aus, dass es sich um ein absichtliches \"Fake-Video\" handelt, da...\n", "3 Auf unserem #A45 in #lichterfelde) befindet sich gerade diese Fundhündin. Sie wurde am Hindenbur...\n", "4 @nexta_tv Wir haben das Video gesichert und leiten den Sachverhalt an die zuständigen Kolleginne...\n", " ... \n", "84789 #Polizeimeldungen #Tagesticker\\n \\nAnhalt-Bitterfeld\\nhttps://t.co/tNLEzztL1o\\n \\nDessau-Roßlau\\...\n", "84790 Am Mittwoch erhielten wir mehrere Anrufe über einen auffälligen Pkw-Fahrer (Reifen quietschen un...\n", "84791 @Jonas5Luisa Kleiner Pro-Tipp von uns: Einfach mal auf den link klicken! ;)*cl\n", "84792 Vermisstensuche nach 27-Jährigem aus Bendorf-Mühlhofen: Wer hat Tobias Wißmann gesehen? 
Ein Foto...\n", "84793 #PolizeiNRW #Köln #Leverkusen : XXX - Infos unter https://t.co/SeWShP2tZE https://t.co/Kopy7w8W3B\n", "Name: tweet_text, Length: 84794, dtype: object" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets_attention = tweets_combined.merge(tweets_user,\n", " on = \"user_id\",\n", " how = \"left\"\n", " ).merge(police_stations,\n", " on = \"handle\",\n", " how = \"left\")\n", "pd.options.display.max_colwidth = 100\n", "tweets_attention.sort_values('like_count', ascending = False).reset_index()['tweet_text']\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "python-scientific kernel", "language": "python", "name": "python-scientific" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }
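
The mention adjacency matrix is still marked as TODO in the notebook. A minimal sketch of one possible approach follows, assuming that mentions are recovered from the tweet text with a regular expression. It deliberately uses only the Python standard library instead of pandas so that it runs on its own. The names `MENTION_RE`, `extract_mentions`, `mention_counts` and `sample_rows` are hypothetical, and the sample rows are invented for illustration; they are not taken from the database dump.

```python
import re
from collections import Counter, defaultdict

# Twitter handles consist of 1-15 letters, digits or underscores.
# The lookbehind skips e-mail addresses such as name@example.org,
# the lookahead rejects strings that are too long to be a handle.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def extract_mentions(text):
    """Return the lower-cased handles mentioned in a tweet text."""
    return [handle.lower() for handle in MENTION_RE.findall(text)]


def mention_counts(rows):
    """Count mentions per (author, mentioned account) pair.

    rows is an iterable of (author_handle, tweet_text) pairs; this input
    shape is an assumption, not the layout of the original tables.
    """
    adjacency = defaultdict(Counter)
    for author, text in rows:
        for mentioned in extract_mentions(text):
            adjacency[author.lower()][mentioned] += 1
    return adjacency


# Invented sample rows, for illustration only.
sample_rows = [
    ("polizei_a", "@BWeltenbummler Sehr schwer zu sagen."),
    ("polizei_a", "RT @drkberlin_iuk: In enger Abstimmung"),
    ("polizei_b", "Danke @BWeltenbummler und @bweltenbummler!"),
    ("polizei_b", "Kontakt: presse@example.org"),
]
adjacency = mention_counts(sample_rows)

for author, targets in adjacency.items():
    print(author, dict(targets))
```

The nested counter is a sparse form of the adjacency matrix: rows are authors, columns are mentioned accounts. With pandas, the same regular expression could be applied to `tweets_combined['tweet_text']` via `Series.str.findall`, and the resulting pairs pivoted into a dense matrix if one is needed for plotting.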