{ "cells": [ { "cell_type": "markdown", "id": "ecb90936-f52e-4e7d-8a8c-a5ef58ad5bd0", "metadata": {}, "source": [ "
\n", "

COPBIRD – TEAM 21

\n", "
" ] }, { "cell_type": "markdown", "id": "a874edd0-07fa-4db0-ab10-6e0c68f81d6a", "metadata": {}, "source": [ "**What is CopBird?** It's a project that evaluates the behavior of the German police on Twitter. This Jupyter notebook was created during the hackathon from May 21 to May 23, 2021. More information on the project can be found [here](https://copbird.org/).\n", "\n", "**Where can I get the data?** Unfortunately, the full data is not published because its usage is restricted to scientific research only. Nevertheless, the tweet IDs can be downloaded [here](https://copbird.org/assets/tweet_id.csv).\n", "\n", "**Where should I place this notebook?** Please put this file in a directory that also contains a folder called \"data\" with all the necessary data in CSV format. Your folder should look like this:\n", "\n", "```\n", ".\n", "├── charts # folder for results, will be created if not existing\n", "│   └── tweets-pro-woche # -- \" --\n", "├── copbird.ipynb # this file\n", "└── data # folder \"data\"\n", "    ├── copbird_table_entity.csv # necessary data files in csv format\n", "    ├── copbird_table_tweet.csv # -- \" --\n", "    ├── copbird_table_user.csv # -- \" --\n", "    └── polizei_accounts_geo.csv # -- \" --\n", "\n", "```\n", "\n", "**How can I use this notebook?** To make sure that everything works properly, all cells should be run in order. Verbose comments should make it understandable for beginners.\n", "\n", "Caution: A message like this indicates that a cell will change your system, e.g. save image files or create folders! \n", "\n", "**Which libraries do I need?** You will need [pandas](https://pandas.pydata.org/) to analyze the data, [altair](https://altair-viz.github.io/) to visualize the data, [vega_datasets](https://github.com/vega/vega-datasets), and [pillow](https://python-pillow.org/), the fork of PIL, the Python Imaging Library. Please install them, e.g. by using the following command: `pip install pandas altair vega_datasets pillow`. 
Additionally, we will use the modules `os` and `glob`, which are part of the standard library and do not need to be installed separately.\n", "\n", "**How can I change the view?** See the [pandas options documentation](https://pandas.pydata.org/docs/user_guide/options.html)." ] }, { "cell_type": "markdown", "id": "a21401e6-93e7-4b41-8c37-4c81cfacc896", "metadata": {}, "source": [ "
\n", "

0. Preparation

\n", "
" ] }, { "cell_type": "code", "execution_count": 191, "id": "24a00ce9-8147-4a32-9db9-eeea5caa0a48", "metadata": {}, "outputs": [], "source": [ "import pandas as pd # analysis\n", "import altair as alt # visualization \n", "\n", "import os # work with files and folders" ] }, { "cell_type": "code", "execution_count": 192, "id": "ba5baa54-4068-452a-9b62-8051ff7163a3", "metadata": {}, "outputs": [], "source": [ "# settings\n", "\n", "# suppress decimal places in floats\n", "pd.options.display.float_format = '{:,.0f}'.format\n", "\n", "# show full column contents instead of truncating them\n", "pd.set_option('display.max_colwidth', 0)" ] }, { "cell_type": "code", "execution_count": 1, "id": "374a0b10-938f-4cf7-aa6a-072d13442791", "metadata": { "tags": [] }, "outputs": [], "source": [ "# import datasets\n", "entities = pd.read_csv(\"data/copbird_table_entity.csv\")\n", 
"tweets = pd.read_csv(\"data/copbird_table_tweet.csv\")\n", "users = pd.read_csv(\"data/copbird_table_user.csv\")\n", "locations = pd.read_csv(\"data/polizei_accounts_geo.csv\", sep = \"\\t\")\n" ] }, { "cell_type": "markdown", "id": "69cfd2a4-db85-4c69-8b86-3682ebb735a1", "metadata": {}, "source": [ "
\n", "

1. Exploration
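
The locations table explored in this section uses '-' as a placeholder for unknown coordinates, which keeps the LAT and LONG columns as text. The sketch below shows one possible way to turn them into numbers; it runs on a small made-up frame (`locations_demo` is an assumption, not the real data) and is not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the locations table; the values only mimic its structure.
locations_demo = pd.DataFrame({
    'Polizei Account': ['bpol_11', 'bpol_b'],
    'LAT': ['-', '52.520007'],
    'LONG': ['-', '13.404954'],
})

# errors='coerce' turns every value that is not a number (such as '-') into NaN.
for col in ['LAT', 'LONG']:
    locations_demo[col] = pd.to_numeric(locations_demo[col], errors='coerce')

print(locations_demo.dtypes)
```

With numeric coordinates, the accounts could later be placed on a map without further cleaning.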

\n", "
" ] }, { "cell_type": "code", "execution_count": 194, "id": "d161f148-44b0-4ce0-98fc-91e2cd2df506", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 131424 rows, 3 columns\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | tweet_id | tag | entity_type |
|---|---|---|---|
| 0 | 1321021123463663616 | mahanna196 | mention |
| 1 | 1321025127388188673 | bka | mention |
| 2 | 1321028108665950208 | StrupeitVolker | mention |
| 3 | 1321029199998656513 | bka | mention |
| 4 | 1321032307277443072 | Sitewinder | mention |
\n", "
" ], "text/plain": [ " tweet_id tag entity_type\n", "0 1321021123463663616 mahanna196 mention \n", "1 1321025127388188673 bka mention \n", "2 1321028108665950208 StrupeitVolker mention \n", "3 1321029199998656513 bka mention \n", "4 1321032307277443072 Sitewinder mention " ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# explore entities\n", "print(f\"shape: {entities.shape[0]} rows, {entities.shape[1]} columns\")\n", "entities.head()" ] }, { "cell_type": "code", "execution_count": 195, "id": "59a7e6be-4983-4e42-b466-247971eb7fb8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "hashtag 71313\n", "url 35635\n", "mention 24476\n", "Name: entity_type, dtype: int64" ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# explore column entity_type of entities:\n", "# show all entity types and corresponding amount of values\n", "entities['entity_type'].value_counts()" ] }, { "cell_type": "code", "execution_count": 196, "id": "3bb8c9ac-f91b-4958-8941-da0cd031f0aa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 45001 rows, 8 columns\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | id | tweet_text | created_at | user_id | like_count | retweet_count | reply_count | quote_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 1321021123463663616 | @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr | 2020-10-27 09:29:13 | 778895426007203840 | 2 | 1 | 2 | 0 |
| 1 | 1321023114071969792 | #Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎️(030) 4664-911666\\n\\n#PM & Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH | 2020-10-27 09:37:08 | 2397974054 | 20 | 24 | 4 | 1 |
| 2 | 1321025127388188673 | RT @bka: EUROPE´S MOST WANTED – Sexualstraftäter nach Vergewaltigung einer Minderjährigen gesucht! \\n➡️https://t.co/CoaTgx9qAR \\n➡️https://t.… | 2020-10-27 09:45:08 | 2397974054 | NaN | NaN | NaN | NaN |
| 3 | 1321028108665950208 | @StrupeitVolker Wir verstehen nicht so recht was Sie wollen, aber kennen Sie das mit dem Glashaus? | 2020-10-27 09:56:59 | 2810902381 | 55 | 2 | 3 | 0 |
| 4 | 1321029199998656513 | Wir unterstützen das @bka bei der #Öffentlichkeitsfahndung nach einem Tatverdächtigen zur Vergewaltigung einer Minderjährigen. Foto und Personenbeschreibung des Mannes finden Sie hier: https://t.co/YP8bLuakMF https://t.co/ooh75YQjgX | 2020-10-27 10:01:19 | 223758384 | 16 | 9 | 5 | 0 |
\n", "
" ], "text/plain": [ " id \\\n", "0 1321021123463663616 \n", "1 1321023114071969792 \n", "2 1321025127388188673 \n", "3 1321028108665950208 \n", "4 1321029199998656513 \n", "\n", " tweet_text \\\n", "0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n", "1 #Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎️(030) 4664-911666\\n\\n#PM & Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH \n", "2 RT @bka: EUROPE´S MOST WANTED – Sexualstraftäter nach Vergewaltigung einer Minderjährigen gesucht! \\n➡️https://t.co/CoaTgx9qAR \\n➡️https://t.… \n", "3 @StrupeitVolker Wir verstehen nicht so recht was Sie wollen, aber kennen Sie das mit dem Glashaus? \n", "4 Wir unterstützen das @bka bei der #Öffentlichkeitsfahndung nach einem Tatverdächtigen zur Vergewaltigung einer Minderjährigen. Foto und Personenbeschreibung des Mannes finden Sie hier: https://t.co/YP8bLuakMF https://t.co/ooh75YQjgX \n", "\n", " created_at user_id like_count retweet_count \\\n", "0 2020-10-27 09:29:13 778895426007203840 2 1 \n", "1 2020-10-27 09:37:08 2397974054 20 24 \n", "2 2020-10-27 09:45:08 2397974054 NaN NaN \n", "3 2020-10-27 09:56:59 2810902381 55 2 \n", "4 2020-10-27 10:01:19 223758384 16 9 \n", "\n", " reply_count quote_count \n", "0 2 0 \n", "1 4 1 \n", "2 NaN NaN \n", "3 3 0 \n", "4 5 0 " ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# explore tweets\n", "print(f\"shape: {tweets.shape[0]} rows, {tweets.shape[1]} columns\")\n", "tweets.head(5)" ] }, { "cell_type": "code", "execution_count": 197, "id": "4915825c-e527-474c-9fd8-f77180f7a6bc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. 
*sr'" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show tweet example\n", "tweets['tweet_text'][0]" ] }, { "cell_type": "code", "execution_count": 198, "id": "1348d325-e4cd-4691-aac3-add03247c6e4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 161 rows, 3 columns\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | id | name | handle |
|---|---|---|---|
| 0 | 1032561433102434304 | Polizei Wittlich | PolizeiWittlich |
| 1 | 1143867545226764293 | Bayerisches Landeskriminalamt | LKA_Bayern |
| 2 | 1169206134189830145 | Polizei Stendal | Polizei_SDL |
| 3 | 1184024283342950401 | Polizei Ravensburg | PolizeiRV |
| 4 | 1232548941889228808 | Polizei Bad Nenndorf | Polizei_BadN |
\n", "
" ], "text/plain": [ " id name handle\n", "0 1032561433102434304 Polizei Wittlich PolizeiWittlich\n", "1 1143867545226764293 Bayerisches Landeskriminalamt LKA_Bayern \n", "2 1169206134189830145 Polizei Stendal Polizei_SDL \n", "3 1184024283342950401 Polizei Ravensburg PolizeiRV \n", "4 1232548941889228808 Polizei Bad Nenndorf Polizei_BadN " ] }, "execution_count": 198, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# explore users\n", "print(f\"shape: {users.shape[0]} rows, {users.shape[1]} columns\")\n", "users.head()" ] }, { "cell_type": "code", "execution_count": 199, "id": "b2626471-cece-4e31-bcfd-d5e86b5d9f2b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 163 rows, 7 columns\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | Polizei Account | Name | Typ | Bundesland | Stadt | LAT | LONG |
|---|---|---|---|---|---|---|---|
| 0 | bpol_11 | Bundespolizei Spezialkräfte | Bundespolizei | - | - | - | NaN |
| 1 | bpol_bepo | Bundesbereitschaftspolizei | Bundesbereitschaftspolizei | - | - | - | - |
| 2 | bpol_air_fra | Bundespolizei Flughafen Frankfurt am Main | Bundespolizei | Hessen | Frankfurt am Main | 50.1109221 | 8.6821267 |
| 3 | bpol_b | Bundespolizei Berlin | Bundespolizei | Berlin | Berlin | 52.520007 | 13.404954 |
| 4 | bpol_b_einsatz | Bundespolizei Berlin Einsatz | Bundespolizei | Berlin | Berlin | 52.520007 | 13.404954 |
\n", "
" ], "text/plain": [ " Polizei Account Name \\\n", "0 bpol_11 Bundespolizei Spezialkräfte \n", "1 bpol_bepo Bundesbereitschaftspolizei \n", "2 bpol_air_fra Bundespolizei Flughafen Frankfurt am Main \n", "3 bpol_b Bundespolizei Berlin \n", "4 bpol_b_einsatz Bundespolizei Berlin Einsatz \n", "\n", " Typ Bundesland Stadt LAT \\\n", "0 Bundespolizei - - - \n", "1 Bundesbereitschaftspolizei - - - \n", "2 Bundespolizei Hessen Frankfurt am Main 50.1109221 \n", "3 Bundespolizei Berlin Berlin 52.520007 \n", "4 Bundespolizei Berlin Berlin 52.520007 \n", "\n", " LONG \n", "0 NaN \n", "1 - \n", "2 8.6821267 \n", "3 13.404954 \n", "4 13.404954 " ] }, "execution_count": 199, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# explore locations\n", "print(f\"shape: {locations.shape[0]} rows, {locations.shape[1]} columns\")\n", "locations.head()" ] }, { "cell_type": "markdown", "id": "34f3bac2-342b-43c6-a931-35c1816a6cfc", "metadata": {}, "source": [ "
\n", "

2. Combine tweets and users to working dataframe df
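
A left merge keeps every tweet, even when no matching user exists, so it can be worth checking how many rows stayed unmatched. The sketch below illustrates this on made-up toy frames (`tweets_demo` and `users_demo` are assumptions, not the real tables), using the `indicator`, `validate` and `suffixes` parameters of `DataFrame.merge`. It is meant as an illustration, not as the method used in this notebook.

```python
import pandas as pd

# Toy stand-ins for the tweets and users tables (values are made up).
tweets_demo = pd.DataFrame({
    'id': [1, 2, 3],
    'user_id': [10, 10, 99],
})
users_demo = pd.DataFrame({
    'id': [10, 20],
    'handle': ['polizei_a', 'polizei_b'],
})

merged = tweets_demo.merge(
    users_demo,
    how='left',
    left_on='user_id',
    right_on='id',
    suffixes=('_tweet', '_user'),  # readable names instead of the default _x / _y
    validate='many_to_one',        # raises if a user id occurs twice in users_demo
    indicator=True,                # adds a _merge column: 'both' or 'left_only'
)

# Tweets whose user_id has no match in the users table
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched), 'tweet(s) without a matching user')
```

Explicit suffixes also make a later rename of `id_x` and `id_y` unnecessary.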

\n", "
" ] }, { "cell_type": "code", "execution_count": 200, "id": "fe313e4e-8480-4777-949f-aa571c2a90f4", "metadata": {}, "outputs": [], "source": [ "# merge dataframes tweets and users\n", "df = tweets.merge(users, how = \"left\", left_on = \"user_id\", right_on=\"id\")" ] }, { "cell_type": "code", "execution_count": 201, "id": "5d6eede0-206c-466c-a3f1-a2defbddfc31", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | id_x | tweet_text | created_at | user_id | like_count | retweet_count | reply_count | quote_count | id_y | name | handle |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1321021123463663616 | @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr | 2020-10-27 09:29:13 | 778895426007203840 | 2 | 1 | 2 | 0 | 778895426007203840 | Polizei Oldenburg-Stadt/Ammerl | Polizei_OL |
\n", "
" ], "text/plain": [ " id_x \\\n", "0 1321021123463663616 \n", "\n", " tweet_text \\\n", "0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n", "\n", " created_at user_id like_count retweet_count \\\n", "0 2020-10-27 09:29:13 778895426007203840 2 1 \n", "\n", " reply_count quote_count id_y \\\n", "0 2 0 778895426007203840 \n", "\n", " name handle \n", "0 Polizei Oldenburg-Stadt/Ammerl Polizei_OL " ] }, "execution_count": 201, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# have a look at new dataframe\n", "df.head(1)" ] }, { "cell_type": "code", "execution_count": 202, "id": "acb08a67-2b97-4d19-9d5a-5da0770a8cc9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | tweet_id | tweet_text | created_at | user_id | like_count | retweet_count | reply_count | quote_count | name | handle |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1321021123463663616 | @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr | 2020-10-27 09:29:13 | 778895426007203840 | 2 | 1 | 2 | 0 | Polizei Oldenburg-Stadt/Ammerl | Polizei_OL |
| 1 | 1321023114071969792 | #Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎️(030) 4664-911666\\n\\n#PM & Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH | 2020-10-27 09:37:08 | 2397974054 | 20 | 24 | 4 | 1 | Polizei Berlin | polizeiberlin |
\n", "
" ], "text/plain": [ " tweet_id \\\n", "0 1321021123463663616 \n", "1 1321023114071969792 \n", "\n", " tweet_text \\\n", "0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n", "1 #Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎️(030) 4664-911666\\n\\n#PM & Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH \n", "\n", " created_at user_id like_count retweet_count \\\n", "0 2020-10-27 09:29:13 778895426007203840 2 1 \n", "1 2020-10-27 09:37:08 2397974054 20 24 \n", "\n", " reply_count quote_count name handle \n", "0 2 0 Polizei Oldenburg-Stadt/Ammerl Polizei_OL \n", "1 4 1 Polizei Berlin polizeiberlin " ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# necessary adjustments\n", "\n", "# rename columns\n", "df = df.rename(columns={\"id_x\": \"tweet_id\"})\n", "\n", "# drop duplicate columns\n", "df = df.drop(columns=\"id_y\")\n", "\n", "# show dataframe again\n", "df.head(2)" ] }, { "cell_type": "code", "execution_count": 203, "id": "eb599e3f-f4b7-4ba9-8417-44b3e02129de", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tweet_id int64 \n", "tweet_text object \n", "created_at object \n", "user_id int64 \n", "like_count float64\n", "retweet_count float64\n", "reply_count float64\n", "quote_count float64\n", "name object \n", "handle object \n", "dtype: object" ] }, "execution_count": 203, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show datatypes of new dataframe\n", "df.dtypes" ] }, { "cell_type": "code", "execution_count": 204, "id": "c88eed1b-1792-4e06-94ea-c42bff1d75dc", "metadata": {}, "outputs": [], "source": [ "# convert date column to datetime format\n", "df['created_at'] = pd.to_datetime(df['created_at'])" ] }, { "cell_type": "code", 
"execution_count": 205, "id": "4eae0e3d-04a7-4ebe-9d9b-595dff0f336a", "metadata": {}, "outputs": [], "source": [ "# add location details\n", "\n", "# preparation: necessary because values are spelled differently in columns needed for merge\n", "locations['Polizei Account'] = locations[\"Polizei Account\"].str.replace(' ', '') # delete spaces \n", "df['handle'] = df['handle'].str.lower() # convert everything to lower case\n", "\n", "# merge tables\n", "df = df.merge(locations, how = \"left\", left_on = \"handle\", right_on=\"Polizei Account\")" ] }, { "cell_type": "code", "execution_count": 206, "id": "544aa48c-ce72-403b-a841-8e0ebd54b9bb", "metadata": {}, "outputs": [], "source": [ "# add column with week number\n", "df['week'] = df['created_at'].dt.isocalendar().week" ] }, { "cell_type": "code", "execution_count": 207, "id": "8bba2fc0-5743-4f00-948b-f08939f36c3b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | tweet_id | tweet_text | created_at | user_id | like_count | retweet_count | reply_count | quote_count | name | handle | Polizei Account | Name | Typ | Bundesland | Stadt | LAT | LONG | week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1321021123463663616 | @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr | 2020-10-27 09:29:13 | 778895426007203840 | 2 | 1 | 2 | 0 | Polizei Oldenburg-Stadt/Ammerl | polizei_ol | polizei_ol | Polizei Oldenburg-Stadt/Ammerland | Polizei | Niedersachsen | Oldenburg | 53.1389753 | 8.2146017 | 44 |
\n", "
" ], "text/plain": [ " tweet_id \\\n", "0 1321021123463663616 \n", "\n", " tweet_text \\\n", "0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n", "\n", " created_at user_id like_count retweet_count \\\n", "0 2020-10-27 09:29:13 778895426007203840 2 1 \n", "\n", " reply_count quote_count name handle \\\n", "0 2 0 Polizei Oldenburg-Stadt/Ammerl polizei_ol \n", "\n", " Polizei Account Name Typ Bundesland \\\n", "0 polizei_ol Polizei Oldenburg-Stadt/Ammerland Polizei Niedersachsen \n", "\n", " Stadt LAT LONG week \n", "0 Oldenburg 53.1389753 8.2146017 44 " ] }, "execution_count": 207, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show new dataframe\n", "df.head(1)" ] }, { "cell_type": "markdown", "id": "b216163f-95af-4ece-a743-a2299390578d", "metadata": {}, "source": [ "
\n", "

3. Analyze: Which are the 50 most active police accounts?
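
Counting tweets per account and keeping the most active ones can be sketched compactly with `groupby` and `nlargest`. The example below uses a made-up toy frame (`demo` and its handles are assumptions) and keeps the top 2 instead of the top 50; it is one possible way to express the ranking, not necessarily the one used in this notebook.

```python
import pandas as pd

# Toy data: one row per tweet, with the handle of the posting account.
demo = pd.DataFrame({
    'handle': ['a', 'a', 'a', 'b', 'c', 'c'],
    'tweet_id': [1, 2, 3, 4, 5, 6],
})

# Count tweets per account
tweet_counts = (
    demo.groupby('handle')['tweet_id']
        .count()
        .rename('tweet_count')
        .reset_index()
)

# Keep the accounts with the most tweets (2 here, 50 in the analysis)
top_2 = tweet_counts.nlargest(2, 'tweet_count')
print(top_2)
```

`nlargest` returns the rows already sorted in descending order, so no separate `sort_values` call is needed.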

\n", "
" ] }, { "cell_type": "code", "execution_count": 208, "id": "e4ee6e95-636e-42fd-8c34-33813f8f518a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | name | handle | user_id | tweet_count |
|---|---|---|---|---|
| 0 | Bayerisches Landeskriminalamt | lka_bayern | 1143867545226764293 | 84 |
| 1 | Bundesbereitschaftspolizei | bpol_bepo | 4876078570 | 29 |
| 2 | Bundespolizei Baden-Württember | bpol_bw | 3169257933 | 488 |
| 3 | Bundespolizei Bayern | bpol_by | 3169867654 | 285 |
| 4 | Bundespolizei Berlin | bpol_b | 4876039738 | 115 |
\n", "
" ], "text/plain": [ " name handle user_id \\\n", "0 Bayerisches Landeskriminalamt lka_bayern 1143867545226764293 \n", "1 Bundesbereitschaftspolizei bpol_bepo 4876078570 \n", "2 Bundespolizei Baden-Württember bpol_bw 3169257933 \n", "3 Bundespolizei Bayern bpol_by 3169867654 \n", "4 Bundespolizei Berlin bpol_b 4876039738 \n", "\n", " tweet_count \n", "0 84 \n", "1 29 \n", "2 488 \n", "3 285 \n", "4 115 " ] }, "execution_count": 208, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# prepare dataframe for visualization\n", "df_vis = df.groupby(['name', 'handle', 'user_id']).agg({\"tweet_id\": 'count'}).reset_index()\n", "\n", "# rename columns\n", "df_vis = df_vis.rename(columns = {'tweet_id': 'tweet_count'})\n", "\n", "# show df_vis\n", "df_vis.head()" ] }, { "cell_type": "code", "execution_count": 209, "id": "b2f82d88-4798-4dbd-8785-ffeddb7f44bc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "161" ] }, "execution_count": 209, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# how many accounts are in dataset?\n", "df_vis.shape[0]" ] }, { "cell_type": "code", "execution_count": 210, "id": "ab29714f-aae3-4726-9b79-472c84fe7f28", "metadata": {}, "outputs": [], "source": [ "# only use 50 accounts with most tweets in dataset \n", "df_vis = df_vis.sort_values(by='tweet_count', ascending = False)[0:50]" ] }, { "cell_type": "markdown", "id": "d576b91a-dc34-4e8c-b6d6-9066ee6322df", "metadata": {}, "source": [ "Caution: If you remove the '#' symbols in lines 2,3 and 16, the following code will save a png file called \"barchart_most_active_50\" in a new folder named \"charts\". If you don't change anything, the chart will be shown in this notebook. " ] }, { "cell_type": "code", "execution_count": null, "id": "d4143652-c694-44ac-97ba-4c5f18b45191", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ], "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create the folder if it does not exist yet\n", "#if not os.path.exists('charts'):\n", " #os.makedirs('charts')\n", "\n", "# draw bar chart\n", "bar = alt.Chart(df_vis).mark_bar().encode(\n", " x=alt.X('tweet_count:Q'),\n", " y=alt.Y('name:O', sort='-x'),\n", " tooltip = 'tweet_count'\n", ")\n", "\n", "rule = alt.Chart(df_vis).mark_rule(color='red').encode(\n", " x='mean(tweet_count):Q'\n", ")\n", "\n", "(bar + rule).properties(width=600)#.save(\"charts/barchart_most_active_50.png\", format = \"png\")" ] }, { "cell_type": "code", "execution_count": 212, "id": "714ffb8b-61a2-4597-bc8d-d280c0edffdc", "metadata": {}, "outputs": [], "source": [ "# create list with 50 accounts with most tweets for later usage\n", "top_50 = list(df_vis.user_id.unique())\n", "\n", "# create dataset only of 50 top accounts\n", "df_50 = df[df['user_id'].isin(top_50)]" ] }, { "cell_type": "markdown", "id": "78119358-efe5-4c22-b7ea-b04c9eafb66d", "metadata": {}, "source": [ "
\n", "

4. Analyze: Which police department posts how many tweets, and when?
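
Because the data spans the turn of the year, a week number alone cannot tell week 1 of 2021 from week 1 of any other year. One possible way to keep the years apart is to group by ISO year and ISO week together. The sketch below shows this on made-up timestamps (`demo` is an assumption); the analysis in this section groups by week only.

```python
import pandas as pd

# Toy data: three tweets of one account around the turn of the year.
demo = pd.DataFrame({
    'handle': ['polizei_a'] * 3,
    'created_at': pd.to_datetime([
        '2020-12-30 10:00:00',
        '2021-01-05 09:00:00',
        '2021-01-06 12:00:00',
    ]),
})

# isocalendar() returns the ISO year, week and weekday for every timestamp
iso = demo['created_at'].dt.isocalendar()
demo['iso_year'] = iso['year']
demo['week'] = iso['week']

# Number of tweets per account, ISO year and ISO week
per_week = (
    demo.groupby(['handle', 'iso_year', 'week'])
        .size()
        .reset_index(name='tweet_count')
)
print(per_week)
```

Sorting by `iso_year` and `week` then yields a chronological order for plotting.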

\n", "
" ] }, { "cell_type": "code", "execution_count": 213, "id": "212b4468-7bcb-4dce-9be9-d5b83b44cb28", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
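| | handle | user_id | created_at | tweet_id (count) |
|---|---|---|---|---|
| 0 | bpol_bw | 3169257933 | 2020-11-09 06:47:09 | 1 |
| 1 | bpol_bw | 3169257933 | 2020-11-09 09:03:03 | 1 |
| 2 | bpol_bw | 3169257933 | 2020-11-09 09:13:18 | 1 |
| 3 | bpol_bw | 3169257933 | 2020-11-09 09:24:05 | 1 |
| 4 | bpol_bw | 3169257933 | 2020-11-09 14:58:43 | 1 |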
\n", "
" ], "text/plain": [ " handle user_id created_at tweet_id\n", " count\n", "0 bpol_bw 3169257933 2020-11-09 06:47:09 1 \n", "1 bpol_bw 3169257933 2020-11-09 09:03:03 1 \n", "2 bpol_bw 3169257933 2020-11-09 09:13:18 1 \n", "3 bpol_bw 3169257933 2020-11-09 09:24:05 1 \n", "4 bpol_bw 3169257933 2020-11-09 14:58:43 1 " ] }, "execution_count": 213, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# limit to 50 most active accounts\n", "df_vis = df_50[['created_at', 'user_id', 'handle', 'tweet_id']]\n", "\n", "# count tweets over time\n", "df_vis = df_vis.groupby(['handle', 'user_id', 'created_at']).agg({\"tweet_id\": ['count']}).reset_index()\n", "\n", "# have a look at new created df_vis\n", "df_vis.head()" ] }, { "cell_type": "code", "execution_count": 214, "id": "f12e0efb-66a9-4cad-a5dc-5179abe84ae0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
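| | handle | user_id | created_at | tweet_count | week |
|---|---|---|---|---|---|
| 0 | bpol_bw | 3169257933 | 2020-11-09 06:47:09 | 1 | 46 |
| 1 | bpol_bw | 3169257933 | 2020-11-09 09:03:03 | 1 | 46 |
| 2 | bpol_bw | 3169257933 | 2020-11-09 09:13:18 | 1 | 46 |
| 3 | bpol_bw | 3169257933 | 2020-11-09 09:24:05 | 1 | 46 |
| 4 | bpol_bw | 3169257933 | 2020-11-09 14:58:43 | 1 | 46 |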
\n", "
" ], "text/plain": [ " handle user_id created_at tweet_count week\n", "0 bpol_bw 3169257933 2020-11-09 06:47:09 1 46 \n", "1 bpol_bw 3169257933 2020-11-09 09:03:03 1 46 \n", "2 bpol_bw 3169257933 2020-11-09 09:13:18 1 46 \n", "3 bpol_bw 3169257933 2020-11-09 09:24:05 1 46 \n", "4 bpol_bw 3169257933 2020-11-09 14:58:43 1 46 " ] }, "execution_count": 214, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rename columns\n", "df_vis.columns = ['handle', 'user_id', 'created_at', 'tweet_count']\n", "\n", "# add week column\n", "df_vis['week'] = df_vis['created_at'].dt.isocalendar().week\n", "\n", "# again show df_vis\n", "df_vis.head()" ] }, { "cell_type": "code", "execution_count": 215, "id": "a2e329c1-50e7-4d27-87d7-fbe527463a0c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
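| | handle | user_id | week | tweet_count |
|---|---|---|---|---|
| 0 | bpol_bw | 3169257933 | 1 | 6 |
| 1 | bpol_bw | 3169257933 | 2 | 3 |
| 2 | bpol_bw | 3169257933 | 3 | 33 |
| 3 | bpol_bw | 3169257933 | 4 | 26 |
| 4 | bpol_bw | 3169257933 | 5 | 7 |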
\n", "
" ], "text/plain": [ " handle user_id week tweet_count\n", "0 bpol_bw 3169257933 1 6 \n", "1 bpol_bw 3169257933 2 3 \n", "2 bpol_bw 3169257933 3 33 \n", "3 bpol_bw 3169257933 4 26 \n", "4 bpol_bw 3169257933 5 7 " ] }, "execution_count": 215, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# group by week to get number of tweets per week\n", "df_vis = df_vis.groupby(['handle', 'user_id', 'week']).agg({'tweet_count': 'count'}).reset_index()\n", "\n", "# again show df_vis\n", "df_vis.head()" ] }, { "cell_type": "markdown", "id": "e8680194-ead5-4ba2-9ea5-2f8d61f48a0b", "metadata": {}, "source": [ "Caution: If you remove the '#' symbols in lines 2, 3 and 11, the following code will save a png file called \"aktive-nach-wochen\" in a folder named \"charts\". If you don't change anything, the chart will be shown in this notebook. (Press shift+L to show line numbers.) " ] }, { "cell_type": "code", "execution_count": 216, "id": "ca1068f1-faab-4504-b0c9-fa1d758d574f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ], "execution_count": 216, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create the folder if it does not exist yet\n", "#if not os.path.exists('charts'):\n", "# os.makedirs('charts')\n", "\n", "# show chart\n", "alt.Chart(df_vis).mark_line().encode(\n", " x='week',\n", " y=alt.Y('tweet_count'),\n", " color = 'handle',\n", " tooltip = ['tweet_count','user_id', 'handle', 'week']\n", ").interactive().properties(width=800)#.save(\"charts/aktive-nach-wochen.png\", format = 'png')" ] }, { "cell_type": "markdown", "id": "70a8d3dc-5703-4eb0-80e8-ccde16a9255e", "metadata": {}, "source": [ "**Caution: The chart is not ideal, because there are no values between calendar weeks 19 and 44. In addition, weeks 44-53 refer to the year 2020, while weeks 1-19 refer to the year 2021.**" ] }, { "cell_type": "markdown", "id": "ec2a02ac-c004-4713-8fa5-c0a2449c1917", "metadata": {}, "source": [ "**Exploring the line chart via its tooltips raises further questions:**\n", "\n", "* What happened in Karlsruhe in weeks 5, 13 and 47?\n", "* What happened in Frankfurt am Main in weeks 5, 18, 45 and 50?\n", "* What happened in Dortmund in week 9?\n", "* What happened in Mannheim in weeks 12 and 14?\n", "* What happened in Sachsen in week 17?\n", "* What happened in Mülheim an der Ruhr in week 46?\n", "* What happened in Bremen in week 49?\n", "* What happened in Gelsenkirchen in week 49?" ] }, { "cell_type": "markdown", "id": "92585603-ee61-4cb3-bedc-ae6e863c8492", "metadata": {}, "source": [ "
\n", "

4. Analyze: What happened in Karlsruhe (in calendar weeks 5, 13, 47)?
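
The weekly helper in this section renames several count columns. `DataFrame.rename` ignores keys that do not match any column, so a misspelled column name goes unnoticed. The sketch below, on a made-up frame (`demo` and the misspelled key are assumptions), shows how `errors='raise'` can surface such a typo.

```python
import pandas as pd

# Toy frame with two of the count columns
demo = pd.DataFrame({'like_count': [2], 'reply_count': [1]})

# A misspelled key is ignored by default: no error, and nothing is renamed.
silent = demo.rename(columns={'replay_count': 'replies'})

# With errors='raise', the same typo is reported as a KeyError.
try:
    demo.rename(columns={'replay_count': 'replies'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(silent.columns), typo_detected)
```

Using `errors='raise'` while developing makes it easier to notice when a column was not renamed as intended.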

\n", "
" ] }, { "cell_type": "code", "execution_count": 217, "id": "bb918908-1f9e-49bc-b99f-75c5ec36fd30", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id \\\n", "109 1321119171825012736 \n", "\n", " tweet_text \\\n", "109 Die #Staatsanwaltschaft Ka hat am Sa bzw. So beim zuständigen Amtsgericht #Haftbefehle gegen zwei Männer erwirkt. Dem 18-Jährigen wird versuchter Totschlag vorgeworfen, dem 19-Jährigen gefährliche Körperverletzung. Zur PM: https://t.co/4MrESOTo3b\\n\\nEure #Polizei #Karlsruhe https://t.co/RZwXmI3VPf \n", "\n", " created_at user_id like_count retweet_count reply_count \\\n", "109 2020-10-27 15:58:50 3029998264 NaN NaN NaN \n", "\n", " quote_count name handle Polizei Account \\\n", "109 NaN Polizei Karlsruhe polizei_ka polizei_ka \n", "\n", " Name Typ Bundesland Stadt LAT \\\n", "109 Polizei Karlsruhe Polizei Baden-Württemberg Karlsruhe 49.0068705 \n", "\n", " LONG week \n", "109 8.4034195 44 " ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# filter dataset of 50 most active accounts, only include rows where value in 'handle column' is 'polizei_ka'\n", "df_vis = df_50[df_50['handle']=='polizei_ka']\n", "\n", "# have a look at dataframe\n", "df_vis.head(1)" ] }, { "cell_type": "code", "execution_count": 218, "id": "f182e57a-6181-4a43-b7b0-a0656c86f1e9", "metadata": {}, "outputs": [], "source": [ "# create function to create new dataframes filtered by week\n", "def create_df_by_week(df,week):\n", " \n", " # create dataframe for selected week of input df\n", " df = df[df['week']==week]\n", " \n", " # \n", " df = df[['tweet_id', 'created_at', 'tweet_text', 'like_count', 'retweet_count', 'reply_count', 'quote_count']]\n", " \n", " df = df.rename(columns = {'like_count': 'likes', \n", " 'retweet_count': 'retweets', \n", " 'replie_count': 'replies',\n", " 'quote_count': 'quotes'})\n", " \n", " return df" ] }, { "cell_type": "markdown", "id": "0fe3234a-fc7d-4d0c-a3bc-5d791eac4963", "metadata": {}, "source": [ "KW 5" ] }, { "cell_type": "code", "execution_count": 219, "id": "e7f92fb7-f0d7-4bb5-be3a-cf70a425956e", "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 115 columns, 7 rows\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id created_at \\\n", "21304 1356148296654479361 2021-02-01 07:52:04 \n", "21422 1356195468406087684 2021-02-01 10:59:31 \n", "\n", " tweet_text \\\n", "21304 @LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅 \n", "21422 #GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz & #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5 \n", "\n", " likes retweets reply_count quotes \n", "21304 0 0 0 0 \n", "21422 14 1 0 1 " ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create dataframe\n", "df_ka_5 = create_df_by_week(df_vis,5)\n", "\n", "# print shape\n", "print(f\"shape: {df_ka_5.shape[0]} columns, {df_ka_5.shape[1]} rows\")\n", "\n", "# have a look at dataframe\n", "df_ka_5.head(2)" ] }, { "cell_type": "code", "execution_count": 220, "id": "31034da0-d624-4bbf-88ea-127a28283d1e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 220, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show chart\n", "alt.Chart(df_ka_5).mark_circle(size=60).encode(\n", " x='created_at',\n", " y='likes:Q',\n", " tooltip=['tweet_id:N','tweet_text:N','likes:Q', 'created_at:T'],\n", " color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),\n", ").interactive().properties(width=600) # .save('charts/df_ka_5.html', format = 'html')" ] }, { "cell_type": "markdown", "id": "7174e71d-e465-4f48-ba01-35169a0f95db", "metadata": {}, "source": [ "Calendar week 13" ] }, { "cell_type": "code", "execution_count": 221, "id": "2475abd0-0154-428f-97fb-80e75de6d53e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 130 rows, 7 columns\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id created_at \\\n", "33985 1376421994133127168 2021-03-29 06:32:30 \n", "33999 1376425288435957760 2021-03-29 06:45:36 \n", "\n", " tweet_text \\\n", "33985 Wir setzen mit unserer Kampagne „NICHT BEI UNS!“ ein klares Zeichen ⚠️ gegen #Diskriminierung und #Extremismus. Das Thema betrifft uns alle. Schaut Euch den ersten Clip an!“ #NICHTBEIUNS! #PolizeiBW Link zur Pressemitteilung: https://t.co/D1yLwdnBmS https://t.co/rgx5mksK0S \n", "33999 @filderbussard Normalerweise nicht, aber das gleicht sich ja über die Jahre so oder so aus 😊 \n", "\n", " likes retweets reply_count quotes \n", "33985 194 17 160 116 \n", "33999 1 0 0 0 " ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create dataframe\n", "df_ka_13 = create_df_by_week(df_vis,13)\n", "\n", "# print shape\n", "print(f\"shape: {df_ka_13.shape[0]} columns, {df_ka_13.shape[1]} rows\")\n", "\n", "# have a look at dataframe\n", "df_ka_13.head(2)" ] }, { "cell_type": "code", "execution_count": 222, "id": "f42d466a-ec1d-48c9-a479-2bd46952ac3f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show chart \n", "alt.Chart(df_ka_13).mark_circle(size=60).encode(\n", " x='created_at',\n", " y='likes',\n", " tooltip=['tweet_id','tweet_text','likes', 'created_at'],\n", " color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),\n", ").interactive().properties(width=600)" ] }, { "cell_type": "markdown", "id": "4008a4f6-1ce1-41b4-9f6b-f0572aef1255", "metadata": {}, "source": [ "Calendar week 47" ] }, { "cell_type": "code", "execution_count": 223, "id": "b1c99acc-40ea-4819-8f32-06cbf80f8a32", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: 115 rows, 7 columns\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id created_at \\\n", "21304 1356148296654479361 2021-02-01 07:52:04 \n", "21422 1356195468406087684 2021-02-01 10:59:31 \n", "\n", " tweet_text \\\n", "21304 @LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅 \n", "21422 #GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz & #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5 \n", "\n", " likes retweets reply_count quotes \n", "21304 0 0 0 0 \n", "21422 14 1 0 1 " ] }, "execution_count": 223, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create dataframe\n", "df_ka_47 = create_df_by_week(df_vis,5)\n", "\n", "# print shape\n", "print(f\"shape: {df_ka_47.shape[0]} columns, {df_ka_47.shape[1]} rows\")\n", "\n", "# have a look at dataframe\n", "df_ka_47.head(2)" ] }, { "cell_type": "code", "execution_count": 224, "id": "86f5ae85-c3d6-46ef-abc7-12379aa9cf7f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show chart\n", "alt.Chart(df_ka_47).mark_circle(size=60).encode(\n", " x='created_at',\n", " y='likes',\n", " tooltip=['tweet_id','tweet_text','likes:Q', 'created_at'],\n", " color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),\n", ").interactive().properties(width=600)" ] }, { "cell_type": "markdown", "id": "2356a635-cf76-46a6-a59b-fc6652ea1006", "metadata": {}, "source": [ "
\n", "

5. Create map: Which police account tweeted how much, and when?

\n", "
" ] }, { "cell_type": "code", "execution_count": 225, "id": "b97b26f7-2fa4-4790-9af7-3ae4e0219508", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Polizei 3452\n", "Bundespolizei 228 \n", "Landeskriminalamt 106 \n", "Polizeipräsidium 35 \n", "Bundesbereitschaftspolizei 10 \n", "Name: Typ, dtype: int64" ] }, "execution_count": 225, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add column containing year\n", "df_cities = df\n", "df_cities['year'] = df['created_at'].dt.isocalendar().year\n", "\n", "# count tweets per city and week\n", "df_cities = df_cities.groupby(['name', 'handle', 'Typ', 'Bundesland', 'Stadt', 'LAT', 'LONG', 'year', 'week']).agg({'tweet_id': 'count'}).reset_index()\n", "\n", "# show available types and how many of them exist in dataframe\n", "df_cities['Typ'].value_counts()" ] }, { "cell_type": "code", "execution_count": 226, "id": "bebed7d8-f98c-46e4-b123-47b2d48ffc0a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name handle Typ Bundesland Stadt \\\n", "344 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n", "345 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n", "346 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n", "347 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n", "348 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n", "\n", " LAT LONG year week tweet_id \n", "344 48.836689 10.097116 2020 44 10 \n", "345 48.836689 10.097116 2020 45 6 \n", "346 48.836689 10.097116 2020 46 5 \n", "347 48.836689 10.097116 2020 47 4 \n", "348 48.836689 10.097116 2020 48 7 " ] }, "execution_count": 226, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove tweets that have unwanted types (~ means not)\n", "df_cities = df_cities[~df_cities['Typ'].isin([\"Landeskriminalamt\", \"Bundesbereitschaftspolizei\", \"Bundespolizei\"])]\n", "\n", "# have a look at dataframe\n", "df_cities.head()" ] }, { "cell_type": "code", "execution_count": 227, "id": "9dc7cdc4-85e5-4144-a5e8-095c41cdf79b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "29" ] }, "execution_count": 227, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# how many weeks do have data? \n", "len(df_cities['week'].unique())" ] }, { "cell_type": "markdown", "id": "cfacf660-b159-4c29-8203-65934017ffba", "metadata": {}, "source": [ "Caution: The following code will create a subfolder in a folder called \"charts\" and save images in png format there! 
" ] }, { "cell_type": "code", "execution_count": 229, "id": "32a15314-f700-4481-90b1-1445907e0c98", "metadata": {}, "outputs": [], "source": [ "# create folders if they do not already exist\n", "if not os.path.exists('charts/tweets-pro-woche'):\n", " os.makedirs('charts/tweets-pro-woche')\n", "\n", "# load world map\n", "from vega_datasets import data\n", "\n", "# create and export png maps\n", "for i in range(1,54):\n", " \n", " # filter df_cities by week and save to dataframe \"tweet_count\"\n", " tweet_count = df_cities[df_cities['week'] == i].reset_index()\n", " tweet_count = tweet_count.rename(columns=({'tweet_id': 'Anzahl Tweets'}))\n", " \n", " try:\n", " # get year if data available, else pass\n", " year = tweet_count['year'][0]\n", " except:\n", " pass\n", "\n", " # save geodata from vega_datasets to variable \"countries\"\n", " countries = alt.topo_feature(data.world_110m.url, 'countries')\n", " \n", " # define basic values appropriate for map of Germany\n", " projection = 'mercator' # select Mercator projection\n", " scale = 1800 # Magnify\n", " center = [10,51.5] # [lon, lat]\n", " clip_extent = [[0, 0], [600, 600]] # [[left, top], [right, bottom]]\n", "\n", " # create background map\n", " background = alt.Chart(countries).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", " ).project(\n", " type = projection,\n", " scale = scale, \n", " center = center, \n", " clipExtent= clip_extent, \n", " ).properties(\n", " title=f'So viel twitterte die Polizei im Jahr {year} in Kalenderwoche {i}',\n", " width=600, height=600\n", " )\n", "\n", " # create points\n", " points = alt.Chart(tweet_count).mark_circle().encode(\n", " longitude='LONG:Q',\n", " latitude='LAT:Q',\n", " size=alt.Size('Anzahl Tweets:Q'),\n", " color=alt.Color('week', scale=alt.Scale(domain=['week'], range=['#154889']), legend=None),\n", " tooltip=['handle:N','name:N','Stadt:N','Anzahl Tweets:Q','LAT:Q','LONG:Q'],\n", " ).project(\n", " type= projection,\n", " scale= scale,\n", " 
center= center,\n", " clipExtent= clip_extent,\n", " )\n", "\n", " # export background map and points to png files in subfolders\n", " (background + points).save(f\"charts/tweets-pro-woche/pol_cities_kw-{i:02d}.png\", format = 'png') " ] }, { "cell_type": "code", "execution_count": 230, "id": "42694a70-8225-43cd-a1eb-3bbc9c4ae984", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]\n" ] }, { "data": { "text/plain": [ "['charts/tweets-pro-woche/pol_cities_kw-01.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-02.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-03.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-04.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-05.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-06.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-07.png']" ] }, "execution_count": 230, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# print every week for which data is available\n", "list_weeks_with_data = sorted(df_cities['week'].unique())\n", "print(list_weeks_with_data)\n", "\n", "# get all images in directory\n", "import glob\n", "imgs = sorted(glob.glob(\"charts/tweets-pro-woche/*.png\"))\n", "\n", "# sort images\n", "imgs = sorted(imgs)\n", "\n", "# show first items in image list as an example (remove square brackets and numbers to get full list)\n", "imgs[0:7]" ] }, { "cell_type": "code", "execution_count": 231, "id": "1be5ca11-362d-4783-886c-79ea87681da8", "metadata": {}, "outputs": [], "source": [ "# manually create list of images (due to missing values and dates from different years, this is fastest method)\n", "\n", "imgs = ['charts/tweets-pro-woche/pol_cities_kw-49.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-44.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-45.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-46.png',\n", " 
'charts/tweets-pro-woche/pol_cities_kw-47.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-48.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-49.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-50.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-51.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-52.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-53.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-01.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-02.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-03.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-04.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-05.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-06.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-07.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-08.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-09.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-10.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-11.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-12.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-13.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-14.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-15.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-16.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-17.png',\n", " 'charts/tweets-pro-woche/pol_cities_kw-18.png'\n", "]" ] }, { "cell_type": "markdown", "id": "7d4a7e8d-fce1-4058-9309-54c91e360aa1", "metadata": {}, "source": [ "Caution: The following code will save a gif in your charts folder: \"map_tweets_per_week.gif\"! 
" ] }, { "cell_type": "code", "execution_count": 232, "id": "9c98a418-6b21-4170-824f-53cb81331cc3", "metadata": {}, "outputs": [], "source": [ "# create gif of maps\n", "\n", "# import python pillow library\n", "from PIL import Image\n", "\n", "# Create the frames\n", "frames = []\n", "\n", "# loop through images and append each to list of frames\n", "for i in imgs:\n", " new_frame = Image.open(i)\n", " frames.append(new_frame)\n", "\n", "# create folder if not already exists\n", "if not os.path.exists('charts'):\n", " os.makedirs('charts')\n", "\n", "# save into a GIF file that loops forever\n", "frames[0].save('charts/map_tweets_per_week.gif', format='GIF',\n", " append_images=frames[1:],\n", " save_all=True,\n", " duration=300, loop=0)" ] }, { "cell_type": "code", "execution_count": null, "id": "ca7bd59d-1f6a-4660-9fca-5248f78704ae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "python-scientific kernel", "language": "python", "name": "python-scientific" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "toc-autonumbering": false, "toc-showcode": false, "toc-showmarkdowntxt": false }, "nbformat": 4, "nbformat_minor": 5 }