{
"cells": [
{
"cell_type": "markdown",
"id": "ecb90936-f52e-4e7d-8a8c-a5ef58ad5bd0",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2><b>COPBIRD TEAM 21</b></h2>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "a874edd0-07fa-4db0-ab10-6e0c68f81d6a",
"metadata": {},
"source": [
"**What is CopBird?** It's a project that evaluates the behavior of the German police on Twitter. This jupyter notebook was created during the hackathon from May 21 to May 23, 2021. More information on the project can be found [here](https://copbird.org/).\n",
"\n",
"**Where can I get the data?** Unfortunately, the full data is not published because its usage is restricted to scientific research only. Nevertheless, the tweet IDs can be downloaded [here](https://copbird.org/assets/tweet_id.csv).\n",
"\n",
"**Where should I place this notebook?** Please put this file in a directory that contains also a folder called \"data\" including all the necessary data in csv format. Your folder should look like this:\n",
"\n",
"```\n",
".\n",
"├── charts # folder for results, will be created if not existing\n",
"│   └── tweets-pro-woche # -- \" --\n",
"├── copbird.ipynb # this file\n",
"└── data # folder \"data\"\n",
"    ├── copbird_table_entity.csv # necessary data files in csv format\n",
"    ├── copbird_table_tweet.csv # -- \" --\n",
"    ├── copbird_table_user.csv # -- \" --\n",
"    └── polizei_accounts_geo.csv # -- \" --\n",
"\n",
"```\n",
"\n",
"**How can I use this notebook?** To make sure that everythink works properly, all cells should be run in order. Verbose comments should make it understandable for noobs.\n",
"\n",
"<code style=\"background:#ffbdbd;color:#680E0E;font-weight:bold\">Caution: A message like this indicates if a cell will change your system, e.g. save image files or create folders! </code>\n",
"\n",
"**Which libraries do I need?** You will need [pandas](https://pandas.pydata.org/) to analyze the data, [altair](https://altair-viz.github.io/) to visualize the data, [vega_datasets](https://github.com/vega/vega-datasets), and [pillow](https://python-pillow.org/), the fork of PIL, the Python Imaging Library. Please install them, e.g. by using the following command: `pip install pandas altair vega_datasets pillow`. Additionally, we will use the modules `os` and `glob` as parts of the standard library which do not need to be installed separately.\n",
"\n",
"**How can I change the view?** https://pandas.pydata.org/docs/user_guide/options.html"
]
},
{
"cell_type": "markdown",
"id": "a21401e6-93e7-4b41-8c37-4c81cfacc896",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>0. Preparation</h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 191,
"id": "24a00ce9-8147-4a32-9db9-eeea5caa0a48",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd # analysis\n",
"import altair as alt # visualization \n",
"\n",
"import os # work with files and folders"
]
},
{
"cell_type": "code",
"execution_count": 192,
"id": "ba5baa54-4068-452a-9b62-8051ff7163a3",
"metadata": {},
"outputs": [],
"source": [
"# settings\n",
"\n",
"# suppress decimal places in floats (= keine Nachkommastellen anzeigen)\n",
"pd.options.display.float_format = '{:,.0f}'.format\n",
"\n",
"# wrap text with no whitespace\n",
"pd.set_option('display.max_colwidth', 0)"
]
},
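{
"cell_type": "markdown",
"id": "sketch-check-data-files-text",
"metadata": {},
"source": [
"A minimal sketch for checking the folder layout described at the top before the data is loaded. It assumes the four file names from the folder tree; the variable names `expected_files` and `missing_files` are made up for this illustration. The cell only reports missing files and does not change anything on your system."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-check-data-files-code",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path # standard library, no installation needed\n",
"\n",
"# assumption: file names as listed in the folder tree at the top\n",
"expected_files = [\n",
"    \"copbird_table_entity.csv\",\n",
"    \"copbird_table_tweet.csv\",\n",
"    \"copbird_table_user.csv\",\n",
"    \"polizei_accounts_geo.csv\",\n",
"]\n",
"\n",
"# collect every expected file that is not present in the folder \"data\"\n",
"missing_files = [name for name in expected_files if not (Path(\"data\") / name).is_file()]\n",
"\n",
"if missing_files:\n",
"    print(f\"missing in folder 'data': {missing_files}\")\n",
"else:\n",
"    print(\"all data files found\")"
]
},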
{
"cell_type": "code",
"execution_count": 1,
"id": "374a0b10-938f-4cf7-aa6a-072d13442791",
"metadata": {
"tags": []
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'pd' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[1], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# import datasets\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m entities \u001b[38;5;241m=\u001b[39m \u001b[43mpd\u001b[49m\u001b[38;5;241m.\u001b[39mread_csv(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdata/copbird_table_entity.csv\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 3\u001b[0m tweets \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mread_csv(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdata/copbird_table_tweet.csv\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 4\u001b[0m users \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mread_csv(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdata/copbird_table_user.csv\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
"\u001b[0;31mNameError\u001b[0m: name 'pd' is not defined"
]
}
],
"source": [
"# import datasets\n",
"entities = pd.read_csv(\"data/copbird_table_entity.csv\")\n",
"tweets = pd.read_csv(\"data/copbird_table_tweet.csv\")\n",
"users = pd.read_csv(\"data/copbird_table_user.csv\")\n",
"locations = pd.read_csv(\"data/polizei_accounts_geo.csv\", sep = \"\\t\")\n"
]
},
{
"cell_type": "markdown",
"id": "69cfd2a4-db85-4c69-8b86-3682ebb735a1",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>1. Exploration</h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 194,
"id": "d161f148-44b0-4ce0-98fc-91e2cd2df506",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 131424 rows, 3 columns\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>tag</th>\n",
" <th>entity_type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1321021123463663616</td>\n",
" <td>mahanna196</td>\n",
" <td>mention</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1321025127388188673</td>\n",
" <td>bka</td>\n",
" <td>mention</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1321028108665950208</td>\n",
" <td>StrupeitVolker</td>\n",
" <td>mention</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1321029199998656513</td>\n",
" <td>bka</td>\n",
" <td>mention</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1321032307277443072</td>\n",
" <td>Sitewinder</td>\n",
" <td>mention</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id tag entity_type\n",
"0 1321021123463663616 mahanna196 mention \n",
"1 1321025127388188673 bka mention \n",
"2 1321028108665950208 StrupeitVolker mention \n",
"3 1321029199998656513 bka mention \n",
"4 1321032307277443072 Sitewinder mention "
]
},
"execution_count": 194,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# explore entities\n",
"print(f\"shape: {entities.shape[0]} rows, {entities.shape[1]} columns\")\n",
"entities.head()"
]
},
{
"cell_type": "code",
"execution_count": 195,
"id": "59a7e6be-4983-4e42-b466-247971eb7fb8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"hashtag 71313\n",
"url 35635\n",
"mention 24476\n",
"Name: entity_type, dtype: int64"
]
},
"execution_count": 195,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# explore column entity_type of entities:\n",
"# show all entity types and corresponding amount of values\n",
"entities['entity_type'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 196,
"id": "3bb8c9ac-f91b-4958-8941-da0cd031f0aa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 45001 rows, 8 columns\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>tweet_text</th>\n",
" <th>created_at</th>\n",
" <th>user_id</th>\n",
" <th>like_count</th>\n",
" <th>retweet_count</th>\n",
" <th>reply_count</th>\n",
" <th>quote_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1321021123463663616</td>\n",
" <td>@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr</td>\n",
" <td>2020-10-27 09:29:13</td>\n",
" <td>778895426007203840</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1321023114071969792</td>\n",
" <td>#Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎(030) 4664-911666\\n\\n#PM &amp;amp; Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH</td>\n",
" <td>2020-10-27 09:37:08</td>\n",
" <td>2397974054</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1321025127388188673</td>\n",
" <td>RT @bka: EUROPE´S MOST WANTED Sexualstraftäter nach Vergewaltigung einer Minderjährigen gesucht! \\n➡https://t.co/CoaTgx9qAR \\n➡https://t.…</td>\n",
" <td>2020-10-27 09:45:08</td>\n",
" <td>2397974054</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1321028108665950208</td>\n",
" <td>@StrupeitVolker Wir verstehen nicht so recht was Sie wollen, aber kennen Sie das mit dem Glashaus?</td>\n",
" <td>2020-10-27 09:56:59</td>\n",
" <td>2810902381</td>\n",
" <td>55</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1321029199998656513</td>\n",
" <td>Wir unterstützen das @bka bei der #Öffentlichkeitsfahndung nach einem Tatverdächtigen zur Vergewaltigung einer Minderjährigen. Foto und Personenbeschreibung des Mannes finden Sie hier: https://t.co/YP8bLuakMF https://t.co/ooh75YQjgX</td>\n",
" <td>2020-10-27 10:01:19</td>\n",
" <td>223758384</td>\n",
" <td>16</td>\n",
" <td>9</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id \\\n",
"0 1321021123463663616 \n",
"1 1321023114071969792 \n",
"2 1321025127388188673 \n",
"3 1321028108665950208 \n",
"4 1321029199998656513 \n",
"\n",
" tweet_text \\\n",
"0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n",
"1 #Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎(030) 4664-911666\\n\\n#PM &amp; Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH \n",
"2 RT @bka: EUROPE´S MOST WANTED Sexualstraftäter nach Vergewaltigung einer Minderjährigen gesucht! \\n➡https://t.co/CoaTgx9qAR \\n➡https://t.… \n",
"3 @StrupeitVolker Wir verstehen nicht so recht was Sie wollen, aber kennen Sie das mit dem Glashaus? \n",
"4 Wir unterstützen das @bka bei der #Öffentlichkeitsfahndung nach einem Tatverdächtigen zur Vergewaltigung einer Minderjährigen. Foto und Personenbeschreibung des Mannes finden Sie hier: https://t.co/YP8bLuakMF https://t.co/ooh75YQjgX \n",
"\n",
" created_at user_id like_count retweet_count \\\n",
"0 2020-10-27 09:29:13 778895426007203840 2 1 \n",
"1 2020-10-27 09:37:08 2397974054 20 24 \n",
"2 2020-10-27 09:45:08 2397974054 NaN NaN \n",
"3 2020-10-27 09:56:59 2810902381 55 2 \n",
"4 2020-10-27 10:01:19 223758384 16 9 \n",
"\n",
" reply_count quote_count \n",
"0 2 0 \n",
"1 4 1 \n",
"2 NaN NaN \n",
"3 3 0 \n",
"4 5 0 "
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# explore tweets\n",
"print(f\"shape: {tweets.shape[0]} rows, {tweets.shape[1]} columns\")\n",
"tweets.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 197,
"id": "4915825c-e527-474c-9fd8-f77180f7a6bc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr'"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show tweet example\n",
"tweets['tweet_text'][0]"
]
},
{
"cell_type": "code",
"execution_count": 198,
"id": "1348d325-e4cd-4691-aac3-add03247c6e4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 161 rows, 3 columns\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1032561433102434304</td>\n",
" <td>Polizei Wittlich</td>\n",
" <td>PolizeiWittlich</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1143867545226764293</td>\n",
" <td>Bayerisches Landeskriminalamt</td>\n",
" <td>LKA_Bayern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1169206134189830145</td>\n",
" <td>Polizei Stendal</td>\n",
" <td>Polizei_SDL</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1184024283342950401</td>\n",
" <td>Polizei Ravensburg</td>\n",
" <td>PolizeiRV</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1232548941889228808</td>\n",
" <td>Polizei Bad Nenndorf</td>\n",
" <td>Polizei_BadN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name handle\n",
"0 1032561433102434304 Polizei Wittlich PolizeiWittlich\n",
"1 1143867545226764293 Bayerisches Landeskriminalamt LKA_Bayern \n",
"2 1169206134189830145 Polizei Stendal Polizei_SDL \n",
"3 1184024283342950401 Polizei Ravensburg PolizeiRV \n",
"4 1232548941889228808 Polizei Bad Nenndorf Polizei_BadN "
]
},
"execution_count": 198,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# explore users\n",
"print(f\"shape: {users.shape[0]} rows, {users.shape[1]} columns\")\n",
"users.head()"
]
},
{
"cell_type": "code",
"execution_count": 199,
"id": "b2626471-cece-4e31-bcfd-d5e86b5d9f2b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 163 rows, 7 columns\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Polizei Account</th>\n",
" <th>Name</th>\n",
" <th>Typ</th>\n",
" <th>Bundesland</th>\n",
" <th>Stadt</th>\n",
" <th>LAT</th>\n",
" <th>LONG</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>bpol_11</td>\n",
" <td>Bundespolizei Spezialkräfte</td>\n",
" <td>Bundespolizei</td>\n",
" <td>-</td>\n",
" <td>-</td>\n",
" <td>-</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>bpol_bepo</td>\n",
" <td>Bundesbereitschaftspolizei</td>\n",
" <td>Bundesbereitschaftspolizei</td>\n",
" <td>-</td>\n",
" <td>-</td>\n",
" <td>-</td>\n",
" <td>-</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>bpol_air_fra</td>\n",
" <td>Bundespolizei Flughafen Frankfurt am Main</td>\n",
" <td>Bundespolizei</td>\n",
" <td>Hessen</td>\n",
" <td>Frankfurt am Main</td>\n",
" <td>50.1109221</td>\n",
" <td>8.6821267</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>bpol_b</td>\n",
" <td>Bundespolizei Berlin</td>\n",
" <td>Bundespolizei</td>\n",
" <td>Berlin</td>\n",
" <td>Berlin</td>\n",
" <td>52.520007</td>\n",
" <td>13.404954</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>bpol_b_einsatz</td>\n",
" <td>Bundespolizei Berlin Einsatz</td>\n",
" <td>Bundespolizei</td>\n",
" <td>Berlin</td>\n",
" <td>Berlin</td>\n",
" <td>52.520007</td>\n",
" <td>13.404954</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Polizei Account Name \\\n",
"0 bpol_11 Bundespolizei Spezialkräfte \n",
"1 bpol_bepo Bundesbereitschaftspolizei \n",
"2 bpol_air_fra Bundespolizei Flughafen Frankfurt am Main \n",
"3 bpol_b Bundespolizei Berlin \n",
"4 bpol_b_einsatz Bundespolizei Berlin Einsatz \n",
"\n",
" Typ Bundesland Stadt LAT \\\n",
"0 Bundespolizei - - - \n",
"1 Bundesbereitschaftspolizei - - - \n",
"2 Bundespolizei Hessen Frankfurt am Main 50.1109221 \n",
"3 Bundespolizei Berlin Berlin 52.520007 \n",
"4 Bundespolizei Berlin Berlin 52.520007 \n",
"\n",
" LONG \n",
"0 NaN \n",
"1 - \n",
"2 8.6821267 \n",
"3 13.404954 \n",
"4 13.404954 "
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# explore locations\n",
"print(f\"shape: {locations.shape[0]} rows, {locations.shape[1]} columns\")\n",
"locations.head()"
]
},
{
"cell_type": "markdown",
"id": "34f3bac2-342b-43c6-a931-35c1816a6cfc",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>2. Combine tweets and users to working dataframe <b>df</b> </h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 200,
"id": "fe313e4e-8480-4777-949f-aa571c2a90f4",
"metadata": {},
"outputs": [],
"source": [
"# merge dataframes tweets and users\n",
"df = tweets.merge(users, how = \"left\", left_on = \"user_id\", right_on=\"id\")"
]
},
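{
"cell_type": "markdown",
"id": "sketch-validate-merge-text",
"metadata": {},
"source": [
"The left merge above keeps every tweet, even if no matching user exists. The following cell is a minimal sketch on made-up toy data (the dataframes `toy_tweets` and `toy_users` and their values are assumptions for illustration) of how pandas can check such a merge: `validate` guards against duplicate keys on the user side and `indicator` marks rows without a match."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-validate-merge-code",
"metadata": {},
"outputs": [],
"source": [
"# toy data with made-up values to illustrate the left merge used above\n",
"toy_tweets = pd.DataFrame({\"id\": [1, 2, 3], \"user_id\": [10, 10, 99]})\n",
"toy_users = pd.DataFrame({\"id\": [10, 20], \"handle\": [\"a\", \"b\"]})\n",
"\n",
"# validate=\"many_to_one\" raises a MergeError if an id occurs more than once in toy_users\n",
"# indicator=True adds a column \"_merge\" that records where each row was found\n",
"toy = toy_tweets.merge(toy_users, how=\"left\", left_on=\"user_id\", right_on=\"id\",\n",
"                       validate=\"many_to_one\", indicator=True)\n",
"\n",
"# \"left_only\" counts the tweets without a matching user\n",
"toy[\"_merge\"].value_counts()"
]
},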
{
"cell_type": "code",
"execution_count": 201,
"id": "5d6eede0-206c-466c-a3f1-a2defbddfc31",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id_x</th>\n",
" <th>tweet_text</th>\n",
" <th>created_at</th>\n",
" <th>user_id</th>\n",
" <th>like_count</th>\n",
" <th>retweet_count</th>\n",
" <th>reply_count</th>\n",
" <th>quote_count</th>\n",
" <th>id_y</th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1321021123463663616</td>\n",
" <td>@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr</td>\n",
" <td>2020-10-27 09:29:13</td>\n",
" <td>778895426007203840</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>778895426007203840</td>\n",
" <td>Polizei Oldenburg-Stadt/Ammerl</td>\n",
" <td>Polizei_OL</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id_x \\\n",
"0 1321021123463663616 \n",
"\n",
" tweet_text \\\n",
"0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n",
"\n",
" created_at user_id like_count retweet_count \\\n",
"0 2020-10-27 09:29:13 778895426007203840 2 1 \n",
"\n",
" reply_count quote_count id_y \\\n",
"0 2 0 778895426007203840 \n",
"\n",
" name handle \n",
"0 Polizei Oldenburg-Stadt/Ammerl Polizei_OL "
]
},
"execution_count": 201,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# have a look at new dataframe\n",
"df.head(1)"
]
},
{
"cell_type": "code",
"execution_count": 202,
"id": "acb08a67-2b97-4d19-9d5a-5da0770a8cc9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>tweet_text</th>\n",
" <th>created_at</th>\n",
" <th>user_id</th>\n",
" <th>like_count</th>\n",
" <th>retweet_count</th>\n",
" <th>reply_count</th>\n",
" <th>quote_count</th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1321021123463663616</td>\n",
" <td>@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr</td>\n",
" <td>2020-10-27 09:29:13</td>\n",
" <td>778895426007203840</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>Polizei Oldenburg-Stadt/Ammerl</td>\n",
" <td>Polizei_OL</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1321023114071969792</td>\n",
" <td>#Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎(030) 4664-911666\\n\\n#PM &amp;amp; Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH</td>\n",
" <td>2020-10-27 09:37:08</td>\n",
" <td>2397974054</td>\n",
" <td>20</td>\n",
" <td>24</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Polizei Berlin</td>\n",
" <td>polizeiberlin</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id \\\n",
"0 1321021123463663616 \n",
"1 1321023114071969792 \n",
"\n",
" tweet_text \\\n",
"0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n",
"1 #Zeugengesucht\\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\\n\\n☎(030) 4664-911666\\n\\n#PM &amp; Foto:\\nhttps://t.co/cwzVsRWdCN\\n\\n^tsm https://t.co/JdeEh04UAH \n",
"\n",
" created_at user_id like_count retweet_count \\\n",
"0 2020-10-27 09:29:13 778895426007203840 2 1 \n",
"1 2020-10-27 09:37:08 2397974054 20 24 \n",
"\n",
" reply_count quote_count name handle \n",
"0 2 0 Polizei Oldenburg-Stadt/Ammerl Polizei_OL \n",
"1 4 1 Polizei Berlin polizeiberlin "
]
},
"execution_count": 202,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# necessary adjustments\n",
"\n",
"# rename columns\n",
"df = df.rename(columns={\"id_x\": \"tweet_id\"})\n",
"\n",
"# drop duplicate columns\n",
"df = df.drop(columns=\"id_y\")\n",
"\n",
"# show dataframe again\n",
"df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 203,
"id": "eb599e3f-f4b7-4ba9-8417-44b3e02129de",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tweet_id int64 \n",
"tweet_text object \n",
"created_at object \n",
"user_id int64 \n",
"like_count float64\n",
"retweet_count float64\n",
"reply_count float64\n",
"quote_count float64\n",
"name object \n",
"handle object \n",
"dtype: object"
]
},
"execution_count": 203,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show datatypes of new dataframe\n",
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 204,
"id": "c88eed1b-1792-4e06-94ea-c42bff1d75dc",
"metadata": {},
"outputs": [],
"source": [
"# convert date column to datetime format\n",
"df['created_at'] = pd.to_datetime(df['created_at'])"
]
},
{
"cell_type": "code",
"execution_count": 205,
"id": "4eae0e3d-04a7-4ebe-9d9b-595dff0f336a",
"metadata": {},
"outputs": [],
"source": [
"# add location details\n",
"\n",
"# preparation: necessary because values are spelled differently in columns needed for merge\n",
"locations['Polizei Account'] = locations[\"Polizei Account\"].str.replace(' ', '') # delete spaces \n",
"df['handle'] = df['handle'].str.lower() # convert everything to lower case\n",
"\n",
"# merge tables\n",
"df = df.merge(locations, how = \"left\", left_on = \"handle\", right_on=\"Polizei Account\")"
]
},
{
"cell_type": "code",
"execution_count": 206,
"id": "544aa48c-ce72-403b-a841-8e0ebd54b9bb",
"metadata": {},
"outputs": [],
"source": [
"# add column with week number\n",
"df['week'] = df['created_at'].dt.isocalendar().week"
]
},
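{
"cell_type": "markdown",
"id": "sketch-iso-year-week-text",
"metadata": {},
"source": [
"The week number alone does not say which year a tweet belongs to, and days around New Year can belong to the previous ISO year (1 January 2021, for example, falls in ISO week 53 of 2020). The following cell is a minimal sketch on made-up dates, assuming a combined year-week label is wanted for sorting or plotting; `toy_dates` and `iso` are hypothetical names."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-iso-year-week-code",
"metadata": {},
"outputs": [],
"source": [
"# made-up example dates around the turn of the year\n",
"toy_dates = pd.Series(pd.to_datetime([\"2020-10-27\", \"2021-01-01\", \"2021-01-04\"]))\n",
"\n",
"# isocalendar() returns a dataframe with the columns year, week and day\n",
"iso = toy_dates.dt.isocalendar()\n",
"\n",
"# combine ISO year and week into a label such as \"2020-W44\" that sorts chronologically\n",
"iso[\"year\"].astype(str) + \"-W\" + iso[\"week\"].astype(str).str.zfill(2)"
]
},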
{
"cell_type": "code",
"execution_count": 207,
"id": "8bba2fc0-5743-4f00-948b-f08939f36c3b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>tweet_text</th>\n",
" <th>created_at</th>\n",
" <th>user_id</th>\n",
" <th>like_count</th>\n",
" <th>retweet_count</th>\n",
" <th>reply_count</th>\n",
" <th>quote_count</th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" <th>Polizei Account</th>\n",
" <th>Name</th>\n",
" <th>Typ</th>\n",
" <th>Bundesland</th>\n",
" <th>Stadt</th>\n",
" <th>LAT</th>\n",
" <th>LONG</th>\n",
" <th>week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1321021123463663616</td>\n",
" <td>@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr</td>\n",
" <td>2020-10-27 09:29:13</td>\n",
" <td>778895426007203840</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>Polizei Oldenburg-Stadt/Ammerl</td>\n",
" <td>polizei_ol</td>\n",
" <td>polizei_ol</td>\n",
" <td>Polizei Oldenburg-Stadt/Ammerland</td>\n",
" <td>Polizei</td>\n",
" <td>Niedersachsen</td>\n",
" <td>Oldenburg</td>\n",
" <td>53.1389753</td>\n",
" <td>8.2146017</td>\n",
" <td>44</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id \\\n",
"0 1321021123463663616 \n",
"\n",
" tweet_text \\\n",
"0 @mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr \n",
"\n",
" created_at user_id like_count retweet_count \\\n",
"0 2020-10-27 09:29:13 778895426007203840 2 1 \n",
"\n",
" reply_count quote_count name handle \\\n",
"0 2 0 Polizei Oldenburg-Stadt/Ammerl polizei_ol \n",
"\n",
" Polizei Account Name Typ Bundesland \\\n",
"0 polizei_ol Polizei Oldenburg-Stadt/Ammerland Polizei Niedersachsen \n",
"\n",
" Stadt LAT LONG week \n",
"0 Oldenburg 53.1389753 8.2146017 44 "
]
},
"execution_count": 207,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show new dataframe\n",
"df.head(1)"
]
},
{
"cell_type": "markdown",
"id": "b216163f-95af-4ece-a743-a2299390578d",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>3. Analyze: <b>Welches sind die 50 aktivsten Polizei-Accounts?</b></h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 208,
"id": "e4ee6e95-636e-42fd-8c34-33813f8f518a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" <th>user_id</th>\n",
" <th>tweet_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bayerisches Landeskriminalamt</td>\n",
" <td>lka_bayern</td>\n",
" <td>1143867545226764293</td>\n",
" <td>84</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Bundesbereitschaftspolizei</td>\n",
" <td>bpol_bepo</td>\n",
" <td>4876078570</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Bundespolizei Baden-Württember</td>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Bundespolizei Bayern</td>\n",
" <td>bpol_by</td>\n",
" <td>3169867654</td>\n",
" <td>285</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Bundespolizei Berlin</td>\n",
" <td>bpol_b</td>\n",
" <td>4876039738</td>\n",
" <td>115</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name handle user_id \\\n",
"0 Bayerisches Landeskriminalamt lka_bayern 1143867545226764293 \n",
"1 Bundesbereitschaftspolizei bpol_bepo 4876078570 \n",
"2 Bundespolizei Baden-Württember bpol_bw 3169257933 \n",
"3 Bundespolizei Bayern bpol_by 3169867654 \n",
"4 Bundespolizei Berlin bpol_b 4876039738 \n",
"\n",
" tweet_count \n",
"0 84 \n",
"1 29 \n",
"2 488 \n",
"3 285 \n",
"4 115 "
]
},
"execution_count": 208,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# prepare dataframe for visualization\n",
"df_vis = df.groupby(['name', 'handle', 'user_id']).agg({\"tweet_id\": 'count'}).reset_index()\n",
"\n",
"# rename columns\n",
"df_vis = df_vis.rename(columns = {'tweet_id': 'tweet_count'})\n",
"\n",
"# show df_vis\n",
"df_vis.head()"
]
},
{
"cell_type": "code",
"execution_count": 209,
"id": "b2f82d88-4798-4dbd-8785-ffeddb7f44bc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"161"
]
},
"execution_count": 209,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# how many accounts are in dataset?\n",
"df_vis.shape[0]"
]
},
{
"cell_type": "code",
"execution_count": 210,
"id": "ab29714f-aae3-4726-9b79-472c84fe7f28",
"metadata": {},
"outputs": [],
"source": [
"# only use 50 accounts with most tweets in dataset \n",
"df_vis = df_vis.sort_values(by='tweet_count', ascending = False)[0:50]"
]
},
{
"cell_type": "markdown",
"id": "d576b91a-dc34-4e8c-b6d6-9066ee6322df",
"metadata": {},
"source": [
"<code style=\"background:#ffbdbd;color:#680E0E;font-weight:bold\">Caution: If you remove the '#' symbols in lines 2,3 and 16, the following code will save a png file called \"barchart_most_active_50\" in a new folder named \"charts\". If you don't change anything, the chart will be shown in this notebook. </code>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4143652-c694-44ac-97ba-4c5f18b45191",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"<div id=\"altair-viz-a62b0001ecef4e7d8d0adf002c00288e\"></div>\n",
"<script type=\"text/javascript\">\n",
" (function(spec, embedOpt){\n",
" let outputDiv = document.currentScript.previousElementSibling;\n",
" if (outputDiv.id !== \"altair-viz-a62b0001ecef4e7d8d0adf002c00288e\") {\n",
" outputDiv = document.getElementById(\"altair-viz-a62b0001ecef4e7d8d0adf002c00288e\");\n",
" }\n",
" const paths = {\n",
" \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
" \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
" \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
" \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
" };\n",
"\n",
" function loadScript(lib) {\n",
" return new Promise(function(resolve, reject) {\n",
" var s = document.createElement('script');\n",
" s.src = paths[lib];\n",
" s.async = true;\n",
" s.onload = () => resolve(paths[lib]);\n",
" s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" });\n",
" }\n",
"\n",
" function showError(err) {\n",
" outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
" throw err;\n",
" }\n",
"\n",
" function displayChart(vegaEmbed) {\n",
" vegaEmbed(outputDiv, spec, embedOpt)\n",
" .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
" }\n",
"\n",
" if(typeof define === \"function\" && define.amd) {\n",
" requirejs.config({paths});\n",
" require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
" } else if (typeof vegaEmbed === \"function\") {\n",
" displayChart(vegaEmbed);\n",
" } else {\n",
" loadScript(\"vega\")\n",
" .then(() => loadScript(\"vega-lite\"))\n",
" .then(() => loadScript(\"vega-embed\"))\n",
" .catch(showError)\n",
" .then(() => displayChart(vegaEmbed));\n",
" }\n",
" })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"tooltip\": {\"type\": \"quantitative\", \"field\": \"tweet_count\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"tweet_count\"}, \"y\": {\"type\": \"ordinal\", \"field\": \"name\", \"sort\": \"-x\"}}}, {\"mark\": {\"type\": \"rule\", \"color\": \"red\"}, \"encoding\": {\"x\": {\"type\": \"quantitative\", \"aggregate\": \"mean\", \"field\": \"tweet_count\"}}}], \"data\": {\"name\": \"data-3b69f6ad7b64245577c2ce5ae191a09d\"}, \"width\": 600, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-3b69f6ad7b64245577c2ce5ae191a09d\": [{\"name\": \"Polizei Frankfurt\", \"handle\": \"polizei_ffm\", \"user_id\": 2272909014, \"tweet_count\": 1735}, {\"name\": \"Polizei Sachsen\", \"handle\": \"polizeisachsen\", \"user_id\": 223758384, \"tweet_count\": 1677}, {\"name\": \"Polizei Brandenburg\", \"handle\": \"polizeibb\", \"user_id\": 720244303566483456, \"tweet_count\": 1393}, {\"name\": \"Polizei NRW DO\", \"handle\": \"polizei_nrw_do\", \"user_id\": 769128278, \"tweet_count\": 1373}, {\"name\": \"Polizei Mannheim\", \"handle\": \"polizeimannheim\", \"user_id\": 4201961439, \"tweet_count\": 1139}, {\"name\": \"Polizei M\\u00fcnchen\", \"handle\": \"polizeimuenchen\", \"user_id\": 2810902381, \"tweet_count\": 1125}, {\"name\": \"Polizei Hamburg\", \"handle\": \"polizeihamburg\", \"user_id\": 2904886151, \"tweet_count\": 1040}, {\"name\": \"Polizei Mittelfranken\", \"handle\": \"polizeimfr\", \"user_id\": 800718568572612608, \"tweet_count\": 1009}, {\"name\": \"Polizei Karlsruhe\", \"handle\": \"polizei_ka\", \"user_id\": 3029998264, \"tweet_count\": 1006}, {\"name\": \"Polizei Mittelhessen\", \"handle\": \"polizei_mh\", \"user_id\": 4923370289, \"tweet_count\": 978}, {\"name\": \"Polizei Bremen\", \"handle\": \"bremenpolizei\", \"user_id\": 808666671468658688, \"tweet_count\": 910}, {\"name\": 
\"Polizei Hannover\", \"handle\": \"polizei_h\", \"user_id\": 770652658566852608, \"tweet_count\": 907}, {\"name\": \"Polizei Magdeburg\", \"handle\": \"polizei_md\", \"user_id\": 2849730251, \"tweet_count\": 751}, {\"name\": \"Polizei Unterfranken\", \"handle\": \"polizeiufr\", \"user_id\": 725206557709979648, \"tweet_count\": 710}, {\"name\": \"Polizei Berlin\", \"handle\": \"polizeiberlin\", \"user_id\": 2397974054, \"tweet_count\": 708}, {\"name\": \"Polizei NRW BO\", \"handle\": \"polizei_nrw_bo\", \"user_id\": 2389155192, \"tweet_count\": 706}, {\"name\": \"Polizei NRW K\", \"handle\": \"polizei_nrw_k\", \"user_id\": 259607457, \"tweet_count\": 698}, {\"name\": \"Polizei Dessau-Ro\\u00dflau\", \"handle\": \"polizei_dero\", \"user_id\": 4703631856, \"tweet_count\": 557}, {\"name\": \"Polizei NRW MS\", \"handle\": \"polizei_nrw_ms\", \"user_id\": 2284811875, \"tweet_count\": 547}, {\"name\": \"Polizei Stuttgart\", \"handle\": \"pp_stuttgart\", \"user_id\": 424895827, \"tweet_count\": 528}, {\"name\": \"Polizei NRW DU\", \"handle\": \"polizei_nrw_du\", \"user_id\": 2389222849, \"tweet_count\": 515}, {\"name\": \"Polizei Th\\u00fcringen\", \"handle\": \"polizei_thuer\", \"user_id\": 3064348636, \"tweet_count\": 496}, {\"name\": \"Polizei Ulm\", \"handle\": \"polizeiul\", \"user_id\": 783322939580092418, \"tweet_count\": 495}, {\"name\": \"Bundespolizei Baden-W\\u00fcrttember\", \"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"tweet_count\": 488}, {\"name\": \"Polizei Rheinpfalz\", \"handle\": \"pp_rheinpfalz\", \"user_id\": 2176104583, \"tweet_count\": 486}, {\"name\": \"polizei_nrw_me\", \"handle\": \"polizei_nrw_me\", \"user_id\": 2389359068, \"tweet_count\": 471}, {\"name\": \"Polizei G\\u00f6ttingen\", \"handle\": \"polizei_goe\", \"user_id\": 772751356230823936, \"tweet_count\": 445}, {\"name\": \"Polizei NRW UN\", \"handle\": \"polizei_nrw_un\", \"user_id\": 2389263558, \"tweet_count\": 419}, {\"name\": \"Polizei Nordhessen\", \"handle\": \"polizei_nh\", 
\"user_id\": 3165841996, \"tweet_count\": 410}, {\"name\": \"Polizei NRW BN\", \"handle\": \"pol
"</script>"
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 211,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create folder if not already exists\n",
"#if not os.path.exists('charts'):\n",
" #os.makedirs('charts')\n",
"\n",
"# draw bar chart\n",
"bar = alt.Chart(df_vis).mark_bar().encode(\n",
" x=alt.X('tweet_count:Q'),\n",
" y=alt.Y('name:O', sort='-x'),\n",
" tooltip = 'tweet_count'\n",
")\n",
"\n",
"rule = alt.Chart(df_vis).mark_rule(color='red').encode(\n",
" x='mean(tweet_count):Q'\n",
")\n",
"\n",
"(bar + rule).properties(width=600)#.save(\"barchart_most_active_50.png\", format = \"png\")"
]
},
{
"cell_type": "code",
"execution_count": 212,
"id": "714ffb8b-61a2-4597-bc8d-d280c0edffdc",
"metadata": {},
"outputs": [],
"source": [
"# create list with 50 accounts with most tweets for later usage\n",
"top_50 = list(df_vis.user_id.unique())\n",
"\n",
"# create dataset only of 50 top accounts\n",
"df_50 = df[df['user_id'].isin(top_50)==True]"
]
},
{
"cell_type": "markdown",
"id": "78119358-efe5-4c22-b7ea-b04c9eafb66d",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>4. Analyze: <b>Welche Dienststelle setzt wann wie viele Tweets ab?</b></h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 213,
"id": "212b4468-7bcb-4dce-9be9-d5b83b44cb28",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th>handle</th>\n",
" <th>user_id</th>\n",
" <th>created_at</th>\n",
" <th>tweet_id</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th>count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 06:47:09</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 09:03:03</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 09:13:18</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 09:24:05</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 14:58:43</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" handle user_id created_at tweet_id\n",
" count\n",
"0 bpol_bw 3169257933 2020-11-09 06:47:09 1 \n",
"1 bpol_bw 3169257933 2020-11-09 09:03:03 1 \n",
"2 bpol_bw 3169257933 2020-11-09 09:13:18 1 \n",
"3 bpol_bw 3169257933 2020-11-09 09:24:05 1 \n",
"4 bpol_bw 3169257933 2020-11-09 14:58:43 1 "
]
},
"execution_count": 213,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# limit to 50 most active accounts\n",
"df_vis = df_50[['created_at', 'user_id', 'handle', 'tweet_id']]\n",
"\n",
"# count tweets over time\n",
"df_vis = df_vis.groupby(['handle', 'user_id', 'created_at']).agg({\"tweet_id\": ['count']}).reset_index()\n",
"\n",
"# have a look at new created df_vis\n",
"df_vis.head()"
]
},
{
"cell_type": "code",
"execution_count": 214,
"id": "f12e0efb-66a9-4cad-a5dc-5179abe84ae0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>handle</th>\n",
" <th>user_id</th>\n",
" <th>created_at</th>\n",
" <th>tweet_count</th>\n",
" <th>week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 06:47:09</td>\n",
" <td>1</td>\n",
" <td>46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 09:03:03</td>\n",
" <td>1</td>\n",
" <td>46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 09:13:18</td>\n",
" <td>1</td>\n",
" <td>46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 09:24:05</td>\n",
" <td>1</td>\n",
" <td>46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2020-11-09 14:58:43</td>\n",
" <td>1</td>\n",
" <td>46</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" handle user_id created_at tweet_count week\n",
"0 bpol_bw 3169257933 2020-11-09 06:47:09 1 46 \n",
"1 bpol_bw 3169257933 2020-11-09 09:03:03 1 46 \n",
"2 bpol_bw 3169257933 2020-11-09 09:13:18 1 46 \n",
"3 bpol_bw 3169257933 2020-11-09 09:24:05 1 46 \n",
"4 bpol_bw 3169257933 2020-11-09 14:58:43 1 46 "
]
},
"execution_count": 214,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# rename columns\n",
"df_vis.columns = ['handle', 'user_id', 'created_at', 'tweet_count']\n",
"\n",
"# add week column\n",
"df_vis['week'] = df_vis['created_at'].dt.isocalendar().week\n",
"\n",
"# again show df_vis\n",
"df_vis.head()"
]
},
{
"cell_type": "code",
"execution_count": 215,
"id": "a2e329c1-50e7-4d27-87d7-fbe527463a0c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>handle</th>\n",
" <th>user_id</th>\n",
" <th>week</th>\n",
" <th>tweet_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>3</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>4</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>bpol_bw</td>\n",
" <td>3169257933</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" handle user_id week tweet_count\n",
"0 bpol_bw 3169257933 1 6 \n",
"1 bpol_bw 3169257933 2 3 \n",
"2 bpol_bw 3169257933 3 33 \n",
"3 bpol_bw 3169257933 4 26 \n",
"4 bpol_bw 3169257933 5 7 "
]
},
"execution_count": 215,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# group by week to get number of tweets per week\n",
"df_vis = df_vis.groupby(['handle', 'user_id', 'week']).agg({'tweet_count': 'count'}).reset_index()\n",
"\n",
"# again show df_vis\n",
"df_vis.head()"
]
},
{
"cell_type": "markdown",
"id": "e8680194-ead5-4ba2-9ea5-2f8d61f48a0b",
"metadata": {},
"source": [
"<code style=\"background:#ffbdbd;color:#680E0E;font-weight:bold\">Caution: If you remove the '#' symbols in lines 2,3 and 7, the following code will save a png file called \"barchart_most_active_50\" in a folder named \"charts\". If you don't change anything, the chart will be shown in this notebook. (Press shift+L to show line numbers.) </code>"
]
},
{
"cell_type": "code",
"execution_count": 216,
"id": "ca1068f1-faab-4504-b0c9-fa1d758d574f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"<div id=\"altair-viz-59e8082e3c1149aaabdacaa82dc0b3c1\"></div>\n",
"<script type=\"text/javascript\">\n",
" (function(spec, embedOpt){\n",
" let outputDiv = document.currentScript.previousElementSibling;\n",
" if (outputDiv.id !== \"altair-viz-59e8082e3c1149aaabdacaa82dc0b3c1\") {\n",
" outputDiv = document.getElementById(\"altair-viz-59e8082e3c1149aaabdacaa82dc0b3c1\");\n",
" }\n",
" const paths = {\n",
" \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
" \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
" \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
" \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
" };\n",
"\n",
" function loadScript(lib) {\n",
" return new Promise(function(resolve, reject) {\n",
" var s = document.createElement('script');\n",
" s.src = paths[lib];\n",
" s.async = true;\n",
" s.onload = () => resolve(paths[lib]);\n",
" s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" });\n",
" }\n",
"\n",
" function showError(err) {\n",
" outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
" throw err;\n",
" }\n",
"\n",
" function displayChart(vegaEmbed) {\n",
" vegaEmbed(outputDiv, spec, embedOpt)\n",
" .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
" }\n",
"\n",
" if(typeof define === \"function\" && define.amd) {\n",
" requirejs.config({paths});\n",
" require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
" } else if (typeof vegaEmbed === \"function\") {\n",
" displayChart(vegaEmbed);\n",
" } else {\n",
" loadScript(\"vega\")\n",
" .then(() => loadScript(\"vega-lite\"))\n",
" .then(() => loadScript(\"vega-embed\"))\n",
" .catch(showError)\n",
" .then(() => displayChart(vegaEmbed));\n",
" }\n",
" })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-c556bff0366b24935f22da5e1f494a9d\"}, \"mark\": \"line\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"handle\"}, \"tooltip\": [{\"type\": \"quantitative\", \"field\": \"tweet_count\"}, {\"type\": \"quantitative\", \"field\": \"user_id\"}, {\"type\": \"nominal\", \"field\": \"handle\"}, {\"type\": \"quantitative\", \"field\": \"week\"}], \"x\": {\"type\": \"quantitative\", \"field\": \"week\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"tweet_count\"}}, \"selection\": {\"selector012\": {\"type\": \"interval\", \"bind\": \"scales\", \"encodings\": [\"x\", \"y\"]}}, \"width\": 800, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-c556bff0366b24935f22da5e1f494a9d\": [{\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 1, \"tweet_count\": 6}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 2, \"tweet_count\": 3}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 3, \"tweet_count\": 33}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 4, \"tweet_count\": 26}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 5, \"tweet_count\": 7}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 6, \"tweet_count\": 17}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 7, \"tweet_count\": 10}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 8, \"tweet_count\": 6}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 9, \"tweet_count\": 9}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 10, \"tweet_count\": 24}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 11, \"tweet_count\": 25}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 12, \"tweet_count\": 22}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 13, \"tweet_count\": 15}, {\"handle\": \"bpol_bw\", 
\"user_id\": 3169257933, \"week\": 14, \"tweet_count\": 15}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 15, \"tweet_count\": 27}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 16, \"tweet_count\": 6}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 17, \"tweet_count\": 12}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 18, \"tweet_count\": 23}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 19, \"tweet_count\": 12}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 46, \"tweet_count\": 33}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 47, \"tweet_count\": 24}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 48, \"tweet_count\": 31}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 49, \"tweet_count\": 31}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 50, \"tweet_count\": 34}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 51, \"tweet_count\": 21}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 52, \"tweet_count\": 10}, {\"handle\": \"bpol_bw\", \"user_id\": 3169257933, \"week\": 53, \"tweet_count\": 6}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 1, \"tweet_count\": 1}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 3, \"tweet_count\": 17}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 4, \"tweet_count\": 15}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 5, \"tweet_count\": 3}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 6, \"tweet_count\": 15}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 7, \"tweet_count\": 14}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 8, \"tweet_count\": 10}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 10, \"tweet_count\": 15}, {\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 11, \"tweet_count\": 10}, 
{\"handle\": \"bpol_kueste\", \"user_id\": 4876076194, \"week\": 12, \"tweet_count\": 20}, {
"</script>"
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 216,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create folder if not already exists\n",
"#if not os.path.exists('charts'):\n",
"# os.makedirs('charts')\n",
"\n",
"# show chart\n",
"alt.Chart(df_vis).mark_line().encode(\n",
" x='week',\n",
" y=alt.Y('tweet_count'),\n",
" color = 'handle',\n",
" tooltip = ['tweet_count','user_id', 'handle', 'week']\n",
").interactive().properties(width=800)#.save(\"charts/aktive-nach-wochen.png\", format = 'png')"
]
},
{
"cell_type": "markdown",
"id": "70a8d3dc-5703-4eb0-80e8-ccde16a9255e",
"metadata": {},
"source": [
"**Achtung: Darstellung nicht ideal, da Werte zwischen KW 19 und 44 nicht existieren. Außerdem beziehen sich KW 44-53 auf das Jahr 2020, 1-19 auf das Jahr 2021**"
]
},
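{
"cell_type": "markdown",
"id": "iso-year-week-sketch",
"metadata": {},
"source": [
"One possible way to address the caveat above is to label each tweet with its ISO year *and* ISO week instead of the week number alone, so that weeks from 2020 and 2021 sort chronologically. The following is only a minimal sketch using Python's standard library. The helper name `iso_year_week` is hypothetical and not part of this notebook, and the third sample timestamp is made up for illustration.\n",
"\n",
"```python\n",
"from datetime import datetime\n",
"\n",
"\n",
"def iso_year_week(ts):\n",
"    # hypothetical helper: build a sortable 'YYYY-Www' label from a timestamp\n",
"    iso_year, iso_week, _ = ts.isocalendar()\n",
"    return f'{iso_year}-W{iso_week:02d}'\n",
"\n",
"\n",
"# the first two timestamps appear in the outputs of this notebook,\n",
"# the third one is an invented example for the turn of the year\n",
"samples = [\n",
"    datetime(2021, 2, 1, 7, 52, 4),\n",
"    datetime(2020, 11, 9, 6, 47, 9),\n",
"    datetime(2021, 1, 1, 12, 0, 0),\n",
"]\n",
"\n",
"# plain string sorting now yields chronological order across both years\n",
"labels = sorted(iso_year_week(ts) for ts in samples)\n",
"print(labels)  # ['2020-W46', '2020-W53', '2021-W05']\n",
"```\n",
"\n",
"Note that 1 January 2021 still belongs to ISO week 53 of 2020. Applied to this notebook, such a label could presumably replace the plain `week` number on the x-axis; this has not been tested against the full dataset."
]
},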
{
"cell_type": "markdown",
"id": "ec2a02ac-c004-4713-8fa5-c0a2449c1917",
"metadata": {},
"source": [
"**Durch die Exploration des Line Charts über Tooltip-Anzeigen ergeben sich weitere Fragen:**\n",
"\n",
"* Was war in KW 5 und 13 und 47 in Karlsruhe los?\n",
"* Was war in KW 5 und 18, 45 und 50 Frankfurt a.M. los?\n",
"* Was war in KW 9 in Dortmund los?\n",
"* Was war in KW 12 und KW 14 in Mannheim los?\n",
"* Was war in KW 17 in Sachsen los?\n",
"* Was war in KW 46 in Mülheim an der Ruhr los?\n",
"* Was war in KW 49 in Bremen los?\n",
"* Was war in KW 49 in Gelsenkirchen los?"
]
},
{
"cell_type": "markdown",
"id": "92585603-ee61-4cb3-bedc-ae6e863c8492",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>4. Analyze: <b>Was war los in Karlsruhe</b> (in den Kalenderwochen 5, 13, 47)?</h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 217,
"id": "bb918908-1f9e-49bc-b99f-75c5ec36fd30",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>tweet_text</th>\n",
" <th>created_at</th>\n",
" <th>user_id</th>\n",
" <th>like_count</th>\n",
" <th>retweet_count</th>\n",
" <th>reply_count</th>\n",
" <th>quote_count</th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" <th>Polizei Account</th>\n",
" <th>Name</th>\n",
" <th>Typ</th>\n",
" <th>Bundesland</th>\n",
" <th>Stadt</th>\n",
" <th>LAT</th>\n",
" <th>LONG</th>\n",
" <th>week</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>1321119171825012736</td>\n",
" <td>Die #Staatsanwaltschaft Ka hat am Sa bzw. So beim zuständigen Amtsgericht #Haftbefehle gegen zwei Männer erwirkt. Dem 18-Jährigen wird versuchter Totschlag vorgeworfen, dem 19-Jährigen gefährliche Körperverletzung. Zur PM: https://t.co/4MrESOTo3b\\n\\nEure #Polizei #Karlsruhe https://t.co/RZwXmI3VPf</td>\n",
" <td>2020-10-27 15:58:50</td>\n",
" <td>3029998264</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Polizei Karlsruhe</td>\n",
" <td>polizei_ka</td>\n",
" <td>polizei_ka</td>\n",
" <td>Polizei Karlsruhe</td>\n",
" <td>Polizei</td>\n",
" <td>Baden-Württemberg</td>\n",
" <td>Karlsruhe</td>\n",
" <td>49.0068705</td>\n",
" <td>8.4034195</td>\n",
" <td>44</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id \\\n",
"109 1321119171825012736 \n",
"\n",
" tweet_text \\\n",
"109 Die #Staatsanwaltschaft Ka hat am Sa bzw. So beim zuständigen Amtsgericht #Haftbefehle gegen zwei Männer erwirkt. Dem 18-Jährigen wird versuchter Totschlag vorgeworfen, dem 19-Jährigen gefährliche Körperverletzung. Zur PM: https://t.co/4MrESOTo3b\\n\\nEure #Polizei #Karlsruhe https://t.co/RZwXmI3VPf \n",
"\n",
" created_at user_id like_count retweet_count reply_count \\\n",
"109 2020-10-27 15:58:50 3029998264 NaN NaN NaN \n",
"\n",
" quote_count name handle Polizei Account \\\n",
"109 NaN Polizei Karlsruhe polizei_ka polizei_ka \n",
"\n",
" Name Typ Bundesland Stadt LAT \\\n",
"109 Polizei Karlsruhe Polizei Baden-Württemberg Karlsruhe 49.0068705 \n",
"\n",
" LONG week \n",
"109 8.4034195 44 "
]
},
"execution_count": 217,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# filter dataset of 50 most active accounts, only include rows where value in 'handle column' is 'polizei_ka'\n",
"df_vis = df_50[df_50['handle']=='polizei_ka']\n",
"\n",
"# have a look at dataframe\n",
"df_vis.head(1)"
]
},
{
"cell_type": "code",
"execution_count": 218,
"id": "f182e57a-6181-4a43-b7b0-a0656c86f1e9",
"metadata": {},
"outputs": [],
"source": [
"# create function to create new dataframes filtered by week\n",
"def create_df_by_week(df,week):\n",
" \n",
" # create dataframe for selected week of input df\n",
" df = df[df['week']==week]\n",
" \n",
" # \n",
" df = df[['tweet_id', 'created_at', 'tweet_text', 'like_count', 'retweet_count', 'reply_count', 'quote_count']]\n",
" \n",
" df = df.rename(columns = {'like_count': 'likes', \n",
" 'retweet_count': 'retweets', \n",
" 'replie_count': 'replies',\n",
" 'quote_count': 'quotes'})\n",
" \n",
" return df"
]
},
{
"cell_type": "markdown",
"id": "0fe3234a-fc7d-4d0c-a3bc-5d791eac4963",
"metadata": {},
"source": [
"<code style=\"background:#ffe0b2;color:#f57c00;font-weight:bold\">KW 5</code>"
]
},
{
"cell_type": "code",
"execution_count": 219,
"id": "e7f92fb7-f0d7-4bb5-be3a-cf70a425956e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 115 columns, 7 rows\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>created_at</th>\n",
" <th>tweet_text</th>\n",
" <th>likes</th>\n",
" <th>retweets</th>\n",
" <th>reply_count</th>\n",
" <th>quotes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>21304</th>\n",
" <td>1356148296654479361</td>\n",
" <td>2021-02-01 07:52:04</td>\n",
" <td>@LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21422</th>\n",
" <td>1356195468406087684</td>\n",
" <td>2021-02-01 10:59:31</td>\n",
" <td>#GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp;amp; #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5</td>\n",
" <td>14</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id created_at \\\n",
"21304 1356148296654479361 2021-02-01 07:52:04 \n",
"21422 1356195468406087684 2021-02-01 10:59:31 \n",
"\n",
" tweet_text \\\n",
"21304 @LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅 \n",
"21422 #GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp; #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5 \n",
"\n",
" likes retweets reply_count quotes \n",
"21304 0 0 0 0 \n",
"21422 14 1 0 1 "
]
},
"execution_count": 219,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create dataframe\n",
"df_ka_5 = create_df_by_week(df_vis,5)\n",
"\n",
"# print shape\n",
"print(f\"shape: {df_ka_5.shape[0]} columns, {df_ka_5.shape[1]} rows\")\n",
"\n",
"# have a look at dataframe\n",
"df_ka_5.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 220,
"id": "31034da0-d624-4bbf-88ea-127a28283d1e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"<div id=\"altair-viz-4c2cb39badb64cd2acdc9e92a137af05\"></div>\n",
"<script type=\"text/javascript\">\n",
" (function(spec, embedOpt){\n",
" let outputDiv = document.currentScript.previousElementSibling;\n",
" if (outputDiv.id !== \"altair-viz-4c2cb39badb64cd2acdc9e92a137af05\") {\n",
" outputDiv = document.getElementById(\"altair-viz-4c2cb39badb64cd2acdc9e92a137af05\");\n",
" }\n",
" const paths = {\n",
" \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
" \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
" \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
" \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
" };\n",
"\n",
" function loadScript(lib) {\n",
" return new Promise(function(resolve, reject) {\n",
" var s = document.createElement('script');\n",
" s.src = paths[lib];\n",
" s.async = true;\n",
" s.onload = () => resolve(paths[lib]);\n",
" s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" });\n",
" }\n",
"\n",
" function showError(err) {\n",
" outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
" throw err;\n",
" }\n",
"\n",
" function displayChart(vegaEmbed) {\n",
" vegaEmbed(outputDiv, spec, embedOpt)\n",
" .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
" }\n",
"\n",
" if(typeof define === \"function\" && define.amd) {\n",
" requirejs.config({paths});\n",
" require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
" } else if (typeof vegaEmbed === \"function\") {\n",
" displayChart(vegaEmbed);\n",
" } else {\n",
" loadScript(\"vega\")\n",
" .then(() => loadScript(\"vega-lite\"))\n",
" .then(() => loadScript(\"vega-embed\"))\n",
" .catch(showError)\n",
" .then(() => displayChart(vegaEmbed));\n",
" }\n",
" })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-62d8294dd9a872d6a0521c797cdbe258\"}, \"mark\": {\"type\": \"circle\", \"size\": 60}, \"encoding\": {\"color\": {\"type\": \"temporal\", \"field\": \"created_at\", \"legend\": null, \"scale\": {\"scheme\": \"inferno\"}}, \"tooltip\": [{\"type\": \"nominal\", \"field\": \"tweet_id\"}, {\"type\": \"nominal\", \"field\": \"tweet_text\"}, {\"type\": \"quantitative\", \"field\": \"likes\"}, {\"type\": \"temporal\", \"field\": \"created_at\"}], \"x\": {\"type\": \"temporal\", \"field\": \"created_at\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"likes\"}}, \"selection\": {\"selector013\": {\"type\": \"interval\", \"bind\": \"scales\", \"encodings\": [\"x\", \"y\"]}}, \"width\": 600, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-62d8294dd9a872d6a0521c797cdbe258\": [{\"tweet_id\": 1356148296654479361, \"created_at\": \"2021-02-01T07:52:04\", \"tweet_text\": \"@LaPapper Der Tweet wurde gel\\u00f6scht, wir k\\u00f6nnen leider nicht mehr sehen, auf was Sie sich bezogen haben \\ud83d\\ude05\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1356195468406087684, \"created_at\": \"2021-02-01T10:59:31\", \"tweet_text\": \"#Gesch\\u00e4digterGesucht: Ein alkoholisierter 43-J\\u00e4hriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp; #Marktplatz einen \\u00e4lteren Fahrgast angegriffen. Zwei jugendliche M\\u00e4dchen griffen zum Gl\\u00fcck ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5\", \"likes\": 14.0, \"retweets\": 1.0, \"reply_count\": 0.0, \"quotes\": 1.0}, {\"tweet_id\": 1356222090475679752, \"created_at\": \"2021-02-01T12:45:18\", \"tweet_text\": \"@nicidienase Wenn Sie uns Uhrzeit und Adresse sagen, k\\u00f6nnen wir Ihnen vielleicht mitteilen, was los war. 
Vorbildlich ist das jedenfalls nat\\u00fcrlich nicht, je nach Einsatz aber numal erforderlich, denn in Luft aufl\\u00f6sen k\\u00f6nnen wir unsere Fahrzeuge leider auch nicht...\", \"likes\": 26.0, \"retweets\": 0.0, \"reply_count\": 3.0, \"quotes\": 0.0}, {\"tweet_id\": 1356227085077983232, \"created_at\": \"2021-02-01T13:05:09\", \"tweet_text\": \"@kayxz76 Dort war ein Einsatz aufgrund einer psychisch auff\\u00e4lligen Person.\", \"likes\": 2.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 1356227341933023232, \"created_at\": \"2021-02-01T13:06:10\", \"tweet_text\": \"@kayxz76 Gerne und Danke ebenso \\ud83d\\udc4d\", \"likes\": 1.0, \"retweets\": 0.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1356229958767681538, \"created_at\": \"2021-02-01T13:16:34\", \"tweet_text\": \"@nicidienase Danke f\\u00fcr die Angaben. Sollen wir es als Beschwerde weiterleiten?\", \"likes\": null, \"retweets\": null, \"reply_count\": null, \"quotes\": null}, {\"tweet_id\": 1356230195804561408, \"created_at\": \"2021-02-01T13:17:30\", \"tweet_text\": \"@nicidienase Danke f\\u00fcr die Angaben... Zum Mittagessen holen sind Sonderrechte nat\\u00fcrlich nicht gedacht. Sollen wir es als Beschwerde weiterleiten?\", \"likes\": 6.0, \"retweets\": 0.0, \"reply_count\": 3.0, \"quotes\": 0.0}, {\"tweet_id\": 1356231946423197704, \"created_at\": \"2021-02-01T13:24:28\", \"tweet_text\": \"@Lanschier @nicidienase Sorry, auch wenn Sie in diesem Fall offenbar Recht haben, aber allgemein stimmt es nicht, was Sie schreiben. \\nWenn Sonderrechte berechtigt wahrgenommen werden, dann ist es keine Owi, denn u. A. wir sind dann von den Vorschriften der StVO befreit, siehe \\u00a735 StVO.\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 3.0, \"quotes\": 0.0}, {\"tweet_id\": 1356240301011120130, \"created_at\": \"2021-02-01T13:57:40\", \"tweet_text\": \"@mikele_gross @nicidienase Ja, wir m\\u00fcssen es trotzdem von @nicidienase wissen... 
Denn entsprechend m\\u00fcssen wir es steuern.\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 1356257509305
"</script>"
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 220,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show chart\n",
"alt.Chart(df_ka_5).mark_circle(size=60).encode(\n",
" x='created_at',\n",
" y='likes:Q',\n",
" tooltip=['tweet_id:N','tweet_text:N','likes:Q', 'created_at:T'],\n",
" color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),\n",
").interactive().properties(width=600) # .save('charts/df_ka_5.html', format = 'html')"
]
},
{
"cell_type": "markdown",
"id": "7174e71d-e465-4f48-ba01-35169a0f95db",
"metadata": {},
"source": [
"<code style=\"background:#ffe0b2;color:#f57c00;font-weight:bold\">KW 13</code>"
]
},
{
"cell_type": "code",
"execution_count": 221,
"id": "2475abd0-0154-428f-97fb-80e75de6d53e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 130 columns, 7 rows\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>created_at</th>\n",
" <th>tweet_text</th>\n",
" <th>likes</th>\n",
" <th>retweets</th>\n",
" <th>reply_count</th>\n",
" <th>quotes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>33985</th>\n",
" <td>1376421994133127168</td>\n",
" <td>2021-03-29 06:32:30</td>\n",
" <td>Wir setzen mit unserer Kampagne „NICHT BEI UNS!“ ein klares Zeichen ⚠️ gegen #Diskriminierung und #Extremismus. Das Thema betrifft uns alle. Schaut Euch den ersten Clip an!“ #NICHTBEIUNS! #PolizeiBW Link zur Pressemitteilung: https://t.co/D1yLwdnBmS https://t.co/rgx5mksK0S</td>\n",
" <td>194</td>\n",
" <td>17</td>\n",
" <td>160</td>\n",
" <td>116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33999</th>\n",
" <td>1376425288435957760</td>\n",
" <td>2021-03-29 06:45:36</td>\n",
" <td>@filderbussard Normalerweise nicht, aber das gleicht sich ja über die Jahre so oder so aus 😊</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id created_at \\\n",
"33985 1376421994133127168 2021-03-29 06:32:30 \n",
"33999 1376425288435957760 2021-03-29 06:45:36 \n",
"\n",
" tweet_text \\\n",
"33985 Wir setzen mit unserer Kampagne „NICHT BEI UNS!“ ein klares Zeichen ⚠️ gegen #Diskriminierung und #Extremismus. Das Thema betrifft uns alle. Schaut Euch den ersten Clip an!“ #NICHTBEIUNS! #PolizeiBW Link zur Pressemitteilung: https://t.co/D1yLwdnBmS https://t.co/rgx5mksK0S \n",
"33999 @filderbussard Normalerweise nicht, aber das gleicht sich ja über die Jahre so oder so aus 😊 \n",
"\n",
" likes retweets reply_count quotes \n",
"33985 194 17 160 116 \n",
"33999 1 0 0 0 "
]
},
"execution_count": 221,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create dataframe\n",
"df_ka_13 = create_df_by_week(df_vis,13)\n",
"\n",
"# print shape\n",
"print(f\"shape: {df_ka_13.shape[0]} columns, {df_ka_13.shape[1]} rows\")\n",
"\n",
"# have a look at dataframe\n",
"df_ka_13.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 222,
"id": "f42d466a-ec1d-48c9-a479-2bd46952ac3f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"<div id=\"altair-viz-58d187874c3346b79e2d20334a01fda7\"></div>\n",
"<script type=\"text/javascript\">\n",
" (function(spec, embedOpt){\n",
" let outputDiv = document.currentScript.previousElementSibling;\n",
" if (outputDiv.id !== \"altair-viz-58d187874c3346b79e2d20334a01fda7\") {\n",
" outputDiv = document.getElementById(\"altair-viz-58d187874c3346b79e2d20334a01fda7\");\n",
" }\n",
" const paths = {\n",
" \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
" \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
" \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
" \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
" };\n",
"\n",
" function loadScript(lib) {\n",
" return new Promise(function(resolve, reject) {\n",
" var s = document.createElement('script');\n",
" s.src = paths[lib];\n",
" s.async = true;\n",
" s.onload = () => resolve(paths[lib]);\n",
" s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" });\n",
" }\n",
"\n",
" function showError(err) {\n",
" outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
" throw err;\n",
" }\n",
"\n",
" function displayChart(vegaEmbed) {\n",
" vegaEmbed(outputDiv, spec, embedOpt)\n",
" .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
" }\n",
"\n",
" if(typeof define === \"function\" && define.amd) {\n",
" requirejs.config({paths});\n",
" require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
" } else if (typeof vegaEmbed === \"function\") {\n",
" displayChart(vegaEmbed);\n",
" } else {\n",
" loadScript(\"vega\")\n",
" .then(() => loadScript(\"vega-lite\"))\n",
" .then(() => loadScript(\"vega-embed\"))\n",
" .catch(showError)\n",
" .then(() => displayChart(vegaEmbed));\n",
" }\n",
" })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-7674d02706a94df4d29c2af8e0dc2167\"}, \"mark\": {\"type\": \"circle\", \"size\": 60}, \"encoding\": {\"color\": {\"type\": \"temporal\", \"field\": \"created_at\", \"legend\": null, \"scale\": {\"scheme\": \"inferno\"}}, \"tooltip\": [{\"type\": \"quantitative\", \"field\": \"tweet_id\"}, {\"type\": \"nominal\", \"field\": \"tweet_text\"}, {\"type\": \"quantitative\", \"field\": \"likes\"}, {\"type\": \"temporal\", \"field\": \"created_at\"}], \"x\": {\"type\": \"temporal\", \"field\": \"created_at\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"likes\"}}, \"selection\": {\"selector014\": {\"type\": \"interval\", \"bind\": \"scales\", \"encodings\": [\"x\", \"y\"]}}, \"width\": 600, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-7674d02706a94df4d29c2af8e0dc2167\": [{\"tweet_id\": 1376421994133127168, \"created_at\": \"2021-03-29T06:32:30\", \"tweet_text\": \"Wir setzen mit unserer Kampagne \\u201eNICHT BEI UNS!\\u201c ein klares Zeichen \\u26a0\\ufe0f gegen #Diskriminierung und #Extremismus. Das Thema betrifft uns alle. Schaut Euch den ersten Clip an!\\u201c #NICHTBEIUNS! 
#PolizeiBW Link zur Pressemitteilung: https://t.co/D1yLwdnBmS https://t.co/rgx5mksK0S\", \"likes\": 194.0, \"retweets\": 17.0, \"reply_count\": 160.0, \"quotes\": 116.0}, {\"tweet_id\": 1376425288435957760, \"created_at\": \"2021-03-29T06:45:36\", \"tweet_text\": \"@filderbussard Normalerweise nicht, aber das gleicht sich ja \\u00fcber die Jahre so oder so aus \\ud83d\\ude0a\", \"likes\": 1.0, \"retweets\": 0.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1376426846938615810, \"created_at\": \"2021-03-29T06:51:47\", \"tweet_text\": \"@jojo_pollich @amazonDE Das sieht stark nach Fake aus - am besten nicht auf den Link klicken!\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1376432512734482432, \"created_at\": \"2021-03-29T07:14:18\", \"tweet_text\": \"@_stk @MeinungZuAlles Wir hatten unseren Standpunkt doch erkl\\u00e4rt...\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 1376451874627448834, \"created_at\": \"2021-03-29T08:31:14\", \"tweet_text\": \"@Havergoe Der Clip wird auch polizeiintern gesteuert. Die Kollegen und Kolleginnen sind mit den Inhalten vertraut.\", \"likes\": 2.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 1376453148852772865, \"created_at\": \"2021-03-29T08:36:18\", \"tweet_text\": \"@Arqticz Bitte die Pressemitteilung lesen. Genau damit ist das Problem n\\u00e4mlich nicht gel\\u00f6st.\", \"likes\": 9.0, \"retweets\": 0.0, \"reply_count\": 5.0, \"quotes\": 1.0}, {\"tweet_id\": 1376453992381812738, \"created_at\": \"2021-03-29T08:39:39\", \"tweet_text\": \"@_stk @MeinungZuAlles Wie wir schon geschrieben haben, hatten wir diese Diskussion doch bereits und wir werden sie nicht erneut f\\u00fchren. 
Was Sie tun bleibt Ihnen \\u00fcberlassen.\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 1376473483715227648, \"created_at\": \"2021-03-29T09:57:06\", \"tweet_text\": \"Einsatz Samstagfr\\u00fch in der Karlsruher Oststadt: Nach Bedrohung mit Messer Reizstoff in Gesicht gespr\\u00fcht, zur PM: https://t.co/KPS8ztkPSc\\n\\nEure #Polizei #Karlsruhe https://t.co/VERUmwEuV8\", \"likes\": 4.0, \"retweets\": 1.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1376484033002110977, \"created_at\": \"2021-03-29T10:39:01\", \"tweet_text\": \"@ldeniz_ @RegierungBW Genau das soll nicht die L\\u00f6sung sein, deswegen ja auch \\\"Nicht bei uns\\\"...\", \"likes\": 6.0, \"retweets\": 1.0, \"reply_count\": 2.0, \"quotes\": 1.0}, {\"tweet_id\": 1376485430674198537, \"created_at\": \"2021-03-29T10:44:35\", \"tweet_text\": \"@Tokata01 Ein solches Urteil ist uns ehrlich gesagt nicht gel\\u00e4ufig. Wir kennen nur z. einen aktuellen Fall, in dem eine Polizistin suspendiert wurde, weil sie NICHT widersprochen hat.
"</script>"
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show chart \n",
"alt.Chart(df_ka_13).mark_circle(size=60).encode(\n",
" x='created_at',\n",
" y='likes',\n",
" tooltip=['tweet_id','tweet_text','likes', 'created_at'],\n",
" color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),\n",
").interactive().properties(width=600)"
]
},
{
"cell_type": "markdown",
"id": "4008a4f6-1ce1-41b4-9f6b-f0572aef1255",
"metadata": {},
"source": [
"<code style=\"background:#ffe0b2;color:#f57c00;font-weight:bold\">KW 47</code>"
]
},
{
"cell_type": "code",
"execution_count": 223,
"id": "b1c99acc-40ea-4819-8f32-06cbf80f8a32",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: 115 columns, 7 rows\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tweet_id</th>\n",
" <th>created_at</th>\n",
" <th>tweet_text</th>\n",
" <th>likes</th>\n",
" <th>retweets</th>\n",
" <th>reply_count</th>\n",
" <th>quotes</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>21304</th>\n",
" <td>1356148296654479361</td>\n",
" <td>2021-02-01 07:52:04</td>\n",
" <td>@LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21422</th>\n",
" <td>1356195468406087684</td>\n",
" <td>2021-02-01 10:59:31</td>\n",
" <td>#GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp;amp; #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5</td>\n",
" <td>14</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tweet_id created_at \\\n",
"21304 1356148296654479361 2021-02-01 07:52:04 \n",
"21422 1356195468406087684 2021-02-01 10:59:31 \n",
"\n",
" tweet_text \\\n",
"21304 @LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅 \n",
"21422 #GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp; #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5 \n",
"\n",
" likes retweets reply_count quotes \n",
"21304 0 0 0 0 \n",
"21422 14 1 0 1 "
]
},
"execution_count": 223,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create dataframe\n",
"df_ka_47 = create_df_by_week(df_vis,5)\n",
"\n",
"# print shape\n",
"print(f\"shape: {df_ka_47.shape[0]} columns, {df_ka_47.shape[1]} rows\")\n",
"\n",
"# have a look at dataframe\n",
"df_ka_47.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 224,
"id": "86f5ae85-c3d6-46ef-abc7-12379aa9cf7f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"<div id=\"altair-viz-c03a271c4d9940559f713d528122291a\"></div>\n",
"<script type=\"text/javascript\">\n",
" (function(spec, embedOpt){\n",
" let outputDiv = document.currentScript.previousElementSibling;\n",
" if (outputDiv.id !== \"altair-viz-c03a271c4d9940559f713d528122291a\") {\n",
" outputDiv = document.getElementById(\"altair-viz-c03a271c4d9940559f713d528122291a\");\n",
" }\n",
" const paths = {\n",
" \"vega\": \"https://cdn.jsdelivr.net/npm//vega@5?noext\",\n",
" \"vega-lib\": \"https://cdn.jsdelivr.net/npm//vega-lib?noext\",\n",
" \"vega-lite\": \"https://cdn.jsdelivr.net/npm//vega-lite@4.8.1?noext\",\n",
" \"vega-embed\": \"https://cdn.jsdelivr.net/npm//vega-embed@6?noext\",\n",
" };\n",
"\n",
" function loadScript(lib) {\n",
" return new Promise(function(resolve, reject) {\n",
" var s = document.createElement('script');\n",
" s.src = paths[lib];\n",
" s.async = true;\n",
" s.onload = () => resolve(paths[lib]);\n",
" s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
" document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
" });\n",
" }\n",
"\n",
" function showError(err) {\n",
" outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
" throw err;\n",
" }\n",
"\n",
" function displayChart(vegaEmbed) {\n",
" vegaEmbed(outputDiv, spec, embedOpt)\n",
" .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
" }\n",
"\n",
" if(typeof define === \"function\" && define.amd) {\n",
" requirejs.config({paths});\n",
" require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
" } else if (typeof vegaEmbed === \"function\") {\n",
" displayChart(vegaEmbed);\n",
" } else {\n",
" loadScript(\"vega\")\n",
" .then(() => loadScript(\"vega-lite\"))\n",
" .then(() => loadScript(\"vega-embed\"))\n",
" .catch(showError)\n",
" .then(() => displayChart(vegaEmbed));\n",
" }\n",
" })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-62d8294dd9a872d6a0521c797cdbe258\"}, \"mark\": {\"type\": \"circle\", \"size\": 60}, \"encoding\": {\"color\": {\"type\": \"temporal\", \"field\": \"created_at\", \"legend\": null, \"scale\": {\"scheme\": \"inferno\"}}, \"tooltip\": [{\"type\": \"quantitative\", \"field\": \"tweet_id\"}, {\"type\": \"nominal\", \"field\": \"tweet_text\"}, {\"type\": \"quantitative\", \"field\": \"likes\"}, {\"type\": \"temporal\", \"field\": \"created_at\"}], \"x\": {\"type\": \"temporal\", \"field\": \"created_at\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"likes\"}}, \"selection\": {\"selector015\": {\"type\": \"interval\", \"bind\": \"scales\", \"encodings\": [\"x\", \"y\"]}}, \"width\": 600, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-62d8294dd9a872d6a0521c797cdbe258\": [{\"tweet_id\": 1356148296654479361, \"created_at\": \"2021-02-01T07:52:04\", \"tweet_text\": \"@LaPapper Der Tweet wurde gel\\u00f6scht, wir k\\u00f6nnen leider nicht mehr sehen, auf was Sie sich bezogen haben \\ud83d\\ude05\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1356195468406087684, \"created_at\": \"2021-02-01T10:59:31\", \"tweet_text\": \"#Gesch\\u00e4digterGesucht: Ein alkoholisierter 43-J\\u00e4hriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp; #Marktplatz einen \\u00e4lteren Fahrgast angegriffen. Zwei jugendliche M\\u00e4dchen griffen zum Gl\\u00fcck ein. #ZivileHelden\\n\\nPM: https://t.co/8qUfvYSBoH\\n\\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5\", \"likes\": 14.0, \"retweets\": 1.0, \"reply_count\": 0.0, \"quotes\": 1.0}, {\"tweet_id\": 1356222090475679752, \"created_at\": \"2021-02-01T12:45:18\", \"tweet_text\": \"@nicidienase Wenn Sie uns Uhrzeit und Adresse sagen, k\\u00f6nnen wir Ihnen vielleicht mitteilen, was los war. 
Vorbildlich ist das jedenfalls nat\\u00fcrlich nicht, je nach Einsatz aber numal erforderlich, denn in Luft aufl\\u00f6sen k\\u00f6nnen wir unsere Fahrzeuge leider auch nicht...\", \"likes\": 26.0, \"retweets\": 0.0, \"reply_count\": 3.0, \"quotes\": 0.0}, {\"tweet_id\": 1356227085077983232, \"created_at\": \"2021-02-01T13:05:09\", \"tweet_text\": \"@kayxz76 Dort war ein Einsatz aufgrund einer psychisch auff\\u00e4lligen Person.\", \"likes\": 2.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 1356227341933023232, \"created_at\": \"2021-02-01T13:06:10\", \"tweet_text\": \"@kayxz76 Gerne und Danke ebenso \\ud83d\\udc4d\", \"likes\": 1.0, \"retweets\": 0.0, \"reply_count\": 0.0, \"quotes\": 0.0}, {\"tweet_id\": 1356229958767681538, \"created_at\": \"2021-02-01T13:16:34\", \"tweet_text\": \"@nicidienase Danke f\\u00fcr die Angaben. Sollen wir es als Beschwerde weiterleiten?\", \"likes\": null, \"retweets\": null, \"reply_count\": null, \"quotes\": null}, {\"tweet_id\": 1356230195804561408, \"created_at\": \"2021-02-01T13:17:30\", \"tweet_text\": \"@nicidienase Danke f\\u00fcr die Angaben... Zum Mittagessen holen sind Sonderrechte nat\\u00fcrlich nicht gedacht. Sollen wir es als Beschwerde weiterleiten?\", \"likes\": 6.0, \"retweets\": 0.0, \"reply_count\": 3.0, \"quotes\": 0.0}, {\"tweet_id\": 1356231946423197704, \"created_at\": \"2021-02-01T13:24:28\", \"tweet_text\": \"@Lanschier @nicidienase Sorry, auch wenn Sie in diesem Fall offenbar Recht haben, aber allgemein stimmt es nicht, was Sie schreiben. \\nWenn Sonderrechte berechtigt wahrgenommen werden, dann ist es keine Owi, denn u. A. wir sind dann von den Vorschriften der StVO befreit, siehe \\u00a735 StVO.\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 3.0, \"quotes\": 0.0}, {\"tweet_id\": 1356240301011120130, \"created_at\": \"2021-02-01T13:57:40\", \"tweet_text\": \"@mikele_gross @nicidienase Ja, wir m\\u00fcssen es trotzdem von @nicidienase wissen... 
Denn entsprechend m\\u00fcssen wir es steuern.\", \"likes\": 0.0, \"retweets\": 0.0, \"reply_count\": 1.0, \"quotes\": 0.0}, {\"tweet_id\": 13562575
"</script>"
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 224,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show chart\n",
"alt.Chart(df_ka_47).mark_circle(size=60).encode(\n",
" x='created_at',\n",
" y='likes',\n",
" tooltip=['tweet_id','tweet_text','likes:Q', 'created_at'],\n",
" color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),\n",
").interactive().properties(width=600)"
]
},
{
"cell_type": "markdown",
"id": "2356a635-cf76-46a6-a59b-fc6652ea1006",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <h2>5. Create map: <b>Wann twitterte welche Polizei wie viel</b>?</h2>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 225,
"id": "b97b26f7-2fa4-4790-9af7-3ae4e0219508",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Polizei 3452\n",
"Bundespolizei 228 \n",
"Landeskriminalamt 106 \n",
"Polizeipräsidium 35 \n",
"Bundesbereitschaftspolizei 10 \n",
"Name: Typ, dtype: int64"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add column containing year\n",
"df_cities = df\n",
"df_cities['year'] = df['created_at'].dt.isocalendar().year\n",
"\n",
"# count tweets per city and week\n",
"df_cities = df_cities.groupby(['name', 'handle', 'Typ', 'Bundesland', 'Stadt', 'LAT', 'LONG', 'year', 'week']).agg({'tweet_id': 'count'}).reset_index()\n",
"\n",
"# show available types and how many of them exist in dataframe\n",
"df_cities['Typ'].value_counts()"
]
},
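  {
   "cell_type": "markdown",
   "id": "sketch-iso-year-week",
   "metadata": {},
   "source": [
    "**Side note (a sketch, not part of the original analysis):** the cell above takes the year from `isocalendar()`, i.e. the ISO year. Assuming the `week` column was derived from `isocalendar()` as well, both values refer to the same calendar, which matters around New Year: 1 January 2021 still belongs to week 53 of ISO year 2020. The snippet below illustrates this with the standard library only; the helper name `iso_year_week` is made up for this example.\n",
    "\n",
    "```python\n",
    "from datetime import date\n",
    "\n",
    "def iso_year_week(d):\n",
    "    # hypothetical helper: return (ISO year, ISO week) for a date\n",
    "    iso = d.isocalendar()\n",
    "    return iso[0], iso[1]\n",
    "\n",
    "print(iso_year_week(date(2021, 1, 1)))  # (2020, 53)\n",
    "print(iso_year_week(date(2021, 1, 4)))  # (2021, 1)\n",
    "```\n",
    "\n",
    "Grouping by the calendar year (`dt.year`) instead would split week 53 across two years, so pairing the ISO year with the ISO week is probably the safer choice here."
   ]
  },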
{
"cell_type": "code",
"execution_count": 226,
"id": "bebed7d8-f98c-46e4-b123-47b2d48ffc0a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>handle</th>\n",
" <th>Typ</th>\n",
" <th>Bundesland</th>\n",
" <th>Stadt</th>\n",
" <th>LAT</th>\n",
" <th>LONG</th>\n",
" <th>year</th>\n",
" <th>week</th>\n",
" <th>tweet_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>344</th>\n",
" <td>Polizei Aalen</td>\n",
" <td>polizeiaalen</td>\n",
" <td>Polizei</td>\n",
" <td>Baden-Württemberg</td>\n",
" <td>Aalen</td>\n",
" <td>48.836689</td>\n",
" <td>10.097116</td>\n",
" <td>2020</td>\n",
" <td>44</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>345</th>\n",
" <td>Polizei Aalen</td>\n",
" <td>polizeiaalen</td>\n",
" <td>Polizei</td>\n",
" <td>Baden-Württemberg</td>\n",
" <td>Aalen</td>\n",
" <td>48.836689</td>\n",
" <td>10.097116</td>\n",
" <td>2020</td>\n",
" <td>45</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>346</th>\n",
" <td>Polizei Aalen</td>\n",
" <td>polizeiaalen</td>\n",
" <td>Polizei</td>\n",
" <td>Baden-Württemberg</td>\n",
" <td>Aalen</td>\n",
" <td>48.836689</td>\n",
" <td>10.097116</td>\n",
" <td>2020</td>\n",
" <td>46</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>347</th>\n",
" <td>Polizei Aalen</td>\n",
" <td>polizeiaalen</td>\n",
" <td>Polizei</td>\n",
" <td>Baden-Württemberg</td>\n",
" <td>Aalen</td>\n",
" <td>48.836689</td>\n",
" <td>10.097116</td>\n",
" <td>2020</td>\n",
" <td>47</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>348</th>\n",
" <td>Polizei Aalen</td>\n",
" <td>polizeiaalen</td>\n",
" <td>Polizei</td>\n",
" <td>Baden-Württemberg</td>\n",
" <td>Aalen</td>\n",
" <td>48.836689</td>\n",
" <td>10.097116</td>\n",
" <td>2020</td>\n",
" <td>48</td>\n",
" <td>7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name handle Typ Bundesland Stadt \\\n",
"344 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n",
"345 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n",
"346 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n",
"347 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n",
"348 Polizei Aalen polizeiaalen Polizei Baden-Württemberg Aalen \n",
"\n",
" LAT LONG year week tweet_id \n",
"344 48.836689 10.097116 2020 44 10 \n",
"345 48.836689 10.097116 2020 45 6 \n",
"346 48.836689 10.097116 2020 46 5 \n",
"347 48.836689 10.097116 2020 47 4 \n",
"348 48.836689 10.097116 2020 48 7 "
]
},
"execution_count": 226,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# remove tweets that have unwanted types (~ means not)\n",
"df_cities = df_cities[~df_cities['Typ'].isin([\"Landeskriminalamt\", \"Bundesbereitschaftspolizei\", \"Bundespolizei\"])]\n",
"\n",
"# have a look at dataframe\n",
"df_cities.head()"
]
},
{
"cell_type": "code",
"execution_count": 227,
"id": "9dc7cdc4-85e5-4144-a5e8-095c41cdf79b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"29"
]
},
"execution_count": 227,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# how many weeks do have data? \n",
"len(df_cities['week'].unique())"
]
},
{
"cell_type": "markdown",
"id": "cfacf660-b159-4c29-8203-65934017ffba",
"metadata": {},
"source": [
"<code style=\"background:#ffbdbd;color:#680E0E;font-weight:bold\">Caution: The following code will create a subfolder in a folder called \"charts\" and save images in png format there! </code>"
]
},
{
"cell_type": "code",
"execution_count": 229,
"id": "32a15314-f700-4481-90b1-1445907e0c98",
"metadata": {},
"outputs": [],
"source": [
"# create folders if they do not already exist\n",
"if not os.path.exists('charts/tweets-pro-woche'):\n",
" os.makedirs('charts/tweets-pro-woche')\n",
"\n",
"# load world map\n",
"from vega_datasets import data\n",
"\n",
"# create and export png maps\n",
"for i in range(1,54):\n",
" \n",
" # filter df_cities by week and save to dataframe \"tweet_count\"\n",
" tweet_count = df_cities[df_cities['week'] == i].reset_index()\n",
" tweet_count = tweet_count.rename(columns=({'tweet_id': 'Anzahl Tweets'}))\n",
" \n",
" try:\n",
" # get year if data available, else pass\n",
" year = tweet_count['year'][0]\n",
" except:\n",
" pass\n",
"\n",
" # save geodata from vega_datasets to variable \"countries\"\n",
" countries = alt.topo_feature(data.world_110m.url, 'countries')\n",
" \n",
" # define basic values appropriate for map of Germany\n",
" projection = 'mercator' # select Mercator projection\n",
" scale = 1800 # Magnify\n",
" center = [10,51.5] # [lon, lat]\n",
" clip_extent = [[0, 0], [600, 600]] # [[left, top], [right, bottom]]\n",
"\n",
" # create background map\n",
" background = alt.Chart(countries).mark_geoshape(\n",
" fill='lightgray',\n",
" stroke='white'\n",
" ).project(\n",
" type = projection,\n",
" scale = scale, \n",
" center = center, \n",
" clipExtent= clip_extent, \n",
" ).properties(\n",
" title=f'So viel twitterte die Polizei im Jahr {year} in Kalenderwoche {i}',\n",
" width=600, height=600\n",
" )\n",
"\n",
" # create points\n",
" points = alt.Chart(tweet_count).mark_circle().encode(\n",
" longitude='LONG:Q',\n",
" latitude='LAT:Q',\n",
" size=alt.Size('Anzahl Tweets:Q'),\n",
" color=alt.Color('week', scale=alt.Scale(domain=['week'], range=['#154889']), legend=None),\n",
" tooltip=['handle:N','name:N','Stadt:N','Anzahl Tweets:Q','LAT:Q','LONG:Q'],\n",
" ).project(\n",
" type= projection,\n",
" scale= scale,\n",
" center= center,\n",
" clipExtent= clip_extent,\n",
" )\n",
"\n",
" # export background map and points to png files in subfolders\n",
" (background + points).save(f\"charts/tweets-pro-woche/pol_cities_kw-{i:02d}.png\", format = 'png') "
]
},
{
"cell_type": "code",
"execution_count": 230,
"id": "42694a70-8225-43cd-a1eb-3bbc9c4ae984",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]\n"
]
},
{
"data": {
"text/plain": [
"['charts/tweets-pro-woche/pol_cities_kw-01.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-02.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-03.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-04.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-05.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-06.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-07.png']"
]
},
"execution_count": 230,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# print every week for which data is available\n",
"list_weeks_with_data = sorted(df_cities['week'].unique())\n",
"print(list_weeks_with_data)\n",
"\n",
"# get all images in directory\n",
"import glob\n",
"imgs = sorted(glob.glob(\"charts/tweets-pro-woche/*.png\"))\n",
"\n",
"# sort images\n",
"imgs = sorted(imgs)\n",
"\n",
"# show first items in image list as an example (remove square brackets and numbers to get full list)\n",
"imgs[0:7]"
]
},
{
"cell_type": "code",
"execution_count": 231,
"id": "1be5ca11-362d-4783-886c-79ea87681da8",
"metadata": {},
"outputs": [],
"source": [
"# manually create list of images (due to missing values and dates from different years, this is fastest method)\n",
"\n",
"imgs = ['charts/tweets-pro-woche/pol_cities_kw-49.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-44.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-45.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-46.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-47.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-48.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-49.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-50.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-51.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-52.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-53.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-01.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-02.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-03.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-04.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-05.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-06.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-07.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-08.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-09.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-10.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-11.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-12.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-13.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-14.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-15.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-16.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-17.png',\n",
" 'charts/tweets-pro-woche/pol_cities_kw-18.png'\n",
"]"
]
},
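  {
   "cell_type": "markdown",
   "id": "sketch-ordered-frame-paths",
   "metadata": {},
   "source": [
    "**Alternative to the hand-written list (a sketch under assumptions):** sorting `(year, week)` pairs gives the chronological order without typing every path. The example assumes such pairs are available, e.g. from `df_cities[['year', 'week']].drop_duplicates()`; the function name `ordered_frame_paths` and the sample pairs are made up for illustration.\n",
    "\n",
    "```python\n",
    "def ordered_frame_paths(year_week_pairs, folder='charts/tweets-pro-woche'):\n",
    "    # hypothetical helper: sort (year, week) pairs chronologically\n",
    "    # and map each pair to the file name pattern used by the export loop\n",
    "    paths = []\n",
    "    for year, week in sorted(set(year_week_pairs)):\n",
    "        paths.append(f'{folder}/pol_cities_kw-{week:02d}.png')\n",
    "    return paths\n",
    "\n",
    "# sample pairs for illustration only\n",
    "pairs = [(2021, 1), (2020, 53), (2020, 44), (2021, 2)]\n",
    "print(ordered_frame_paths(pairs))\n",
    "```\n",
    "\n",
    "Because the exported file names contain only the week number, this only works as long as no week number occurs in both years; otherwise the year would have to become part of the file name."
   ]
  },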
{
"cell_type": "markdown",
"id": "7d4a7e8d-fce1-4058-9309-54c91e360aa1",
"metadata": {},
"source": [
"<code style=\"background:#ffbdbd;color:#680E0E;font-weight:bold\">Caution: The following code will save a gif in your charts folder: \"map_tweets_per_week.gif\"! </code>"
]
},
{
"cell_type": "code",
"execution_count": 232,
"id": "9c98a418-6b21-4170-824f-53cb81331cc3",
"metadata": {},
"outputs": [],
"source": [
"# create gif of maps\n",
"\n",
"# import python pillow library\n",
"from PIL import Image\n",
"\n",
"# Create the frames\n",
"frames = []\n",
"\n",
"# loop through images and append each to list of frames\n",
"for i in imgs:\n",
" new_frame = Image.open(i)\n",
" frames.append(new_frame)\n",
"\n",
"# create folder if not already exists\n",
"if not os.path.exists('charts'):\n",
" os.makedirs('charts')\n",
"\n",
"# save into a GIF file that loops forever\n",
"frames[0].save('charts/map_tweets_per_week.gif', format='GIF',\n",
" append_images=frames[1:],\n",
" save_all=True,\n",
" duration=300, loop=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca7bd59d-1f6a-4660-9fca-5248f78704ae",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "python-scientific kernel",
"language": "python",
"name": "python-scientific"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
},
"toc-autonumbering": false,
"toc-showcode": false,
"toc-showmarkdowntxt": false
},
"nbformat": 4,
"nbformat_minor": 5
}