<div class="alert alert-block alert-warning">
    <h2><b>COPBIRD – TEAM 21</b></h2>
</div>

**What is CopBird?** It's a project that evaluates the behavior of the German police on Twitter. This jupyter notebook was created during the hackathon from May 21 to May 23, 2021. More information on the project can be found [here](https://copbird.org/).

**Where can I get the data?** Unfortunately, the full data is not published because its usage is restricted to scientific research only. Nevertheless, the tweet IDs can be downloaded [here](https://copbird.org/assets/tweet_id.csv).

**Where should I place this notebook?** Please put this file in a directory that contains also a folder called "data" including all the necessary data in csv format. Your folder should look like this:

```
.
├── charts                         # folder for results, will be created if not existing
│   └── tweets-pro-woche           # -- " --
├── copbird.ipynb                  # this file
└── data                           # folder "data"
    ├── copbird_table_entity.csv   # necessary data files in csv format
    ├── copbird_table_tweet.csv    # -- " --
    ├── copbird_table_user.csv     # -- " --
    └── polizei_accounts_geo.csv   # -- " --

```

**How can I use this notebook?** To make sure that everythink works properly, all cells should be run in order. Verbose comments should make it understandable for noobs.

<code style="background:#ffbdbd;color:#680E0E;font-weight:bold">Caution: A message like this indicates if a cell will change your system, e.g. save image files or create folders! </code>

**Which libraries do I need?** You will need [pandas](https://pandas.pydata.org/) to analyze the data, [altair](https://altair-viz.github.io/) to visualize the data, [vega_datasets](https://github.com/vega/vega-datasets), and [pillow](https://python-pillow.org/), the fork of PIL, the Python Imaging Library. Please install them, e.g. by using the following command: `pip install pandas altair vega_datasets pillow`. Additionally, we will use the modules `os` and `glob` as parts of the standard library which do not need to be installed separately.

**How can I change the view?** https://pandas.pydata.org/docs/user_guide/options.html

<div class="alert alert-block alert-warning">
    <h2>0. Preparation</h2>
</div>

In [191]:
import pandas as pd   # analysis
import altair as alt  # visualization 

import os             # work with files and folders

In [192]:
# settings

# suppress decimal places in floats (= keine Nachkommastellen anzeigen)
pd.options.display.float_format = '{:,.0f}'.format

# wrap text with no whitespace
pd.set_option('display.max_colwidth', 0)

In [1]:
# import datasets
entities = pd.read_csv("data/copbird_table_entity.csv")
tweets = pd.read_csv("data/copbird_table_tweet.csv")
users = pd.read_csv("data/copbird_table_user.csv")
locations = pd.read_csv("data/polizei_accounts_geo.csv", sep = "\t")


NameError: name 'pd' is not defined

<div class="alert alert-block alert-warning">
    <h2>1. Exploration</h2>
</div>

In [194]:
# explore entities
print(f"shape: {entities.shape[0]} rows, {entities.shape[1]} columns")
entities.head()

shape: 131424 rows, 3 columns


Unnamed: 0,tweet_id,tag,entity_type
0,1321021123463663616,mahanna196,mention
1,1321025127388188673,bka,mention
2,1321028108665950208,StrupeitVolker,mention
3,1321029199998656513,bka,mention
4,1321032307277443072,Sitewinder,mention


In [195]:
# explore column entity_type of entities:
# show all entity types and corresponding amount of values
entities['entity_type'].value_counts()

hashtag    71313
url        35635
mention    24476
Name: entity_type, dtype: int64

In [196]:
# explore tweets
print(f"shape: {tweets.shape[0]} rows, {tweets.shape[1]} columns")
tweets.head(5)

shape: 45001 rows, 8 columns


Unnamed: 0,id,tweet_text,created_at,user_id,like_count,retweet_count,reply_count,quote_count
0,1321021123463663616,"@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr",2020-10-27 09:29:13,778895426007203840,2.0,1.0,2.0,0.0
1,1321023114071969792,"#Zeugengesucht\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\n\n☎️(030) 4664-911666\n\n#PM &amp; Foto:\nhttps://t.co/cwzVsRWdCN\n\n^tsm https://t.co/JdeEh04UAH",2020-10-27 09:37:08,2397974054,20.0,24.0,4.0,1.0
2,1321025127388188673,RT @bka: EUROPE´S MOST WANTED – Sexualstraftäter nach Vergewaltigung einer Minderjährigen gesucht! \n➡️https://t.co/CoaTgx9qAR \n➡️https://t.…,2020-10-27 09:45:08,2397974054,,,,
3,1321028108665950208,"@StrupeitVolker Wir verstehen nicht so recht was Sie wollen, aber kennen Sie das mit dem Glashaus?",2020-10-27 09:56:59,2810902381,55.0,2.0,3.0,0.0
4,1321029199998656513,Wir unterstützen das @bka bei der #Öffentlichkeitsfahndung nach einem Tatverdächtigen zur Vergewaltigung einer Minderjährigen. Foto und Personenbeschreibung des Mannes finden Sie hier: https://t.co/YP8bLuakMF https://t.co/ooh75YQjgX,2020-10-27 10:01:19,223758384,16.0,9.0,5.0,0.0


In [197]:
# show tweet example
tweets['tweet_text'][0]

'@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr'

In [198]:
# explore users
print(f"shape: {users.shape[0]} rows, {users.shape[1]} columns")
users.head()

shape: 161 rows, 3 columns


Unnamed: 0,id,name,handle
0,1032561433102434304,Polizei Wittlich,PolizeiWittlich
1,1143867545226764293,Bayerisches Landeskriminalamt,LKA_Bayern
2,1169206134189830145,Polizei Stendal,Polizei_SDL
3,1184024283342950401,Polizei Ravensburg,PolizeiRV
4,1232548941889228808,Polizei Bad Nenndorf,Polizei_BadN


In [199]:
# explore locations
print(f"shape: {locations.shape[0]} rows, {locations.shape[1]} columns")
locations.head()

shape: 163 rows, 7 columns


Unnamed: 0,Polizei Account,Name,Typ,Bundesland,Stadt,LAT,LONG
0,bpol_11,Bundespolizei Spezialkräfte,Bundespolizei,-,-,-,
1,bpol_bepo,Bundesbereitschaftspolizei,Bundesbereitschaftspolizei,-,-,-,-
2,bpol_air_fra,Bundespolizei Flughafen Frankfurt am Main,Bundespolizei,Hessen,Frankfurt am Main,50.1109221,8.6821267
3,bpol_b,Bundespolizei Berlin,Bundespolizei,Berlin,Berlin,52.520007,13.404954
4,bpol_b_einsatz,Bundespolizei Berlin Einsatz,Bundespolizei,Berlin,Berlin,52.520007,13.404954


<div class="alert alert-block alert-warning">
    <h2>2. Combine tweets and users to working dataframe <b>df</b> </h2>
</div>

In [200]:
# merge dataframes tweets and users
df = tweets.merge(users, how = "left", left_on = "user_id", right_on="id")

In [201]:
# have a look at new dataframe
df.head(1)

Unnamed: 0,id_x,tweet_text,created_at,user_id,like_count,retweet_count,reply_count,quote_count,id_y,name,handle
0,1321021123463663616,"@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr",2020-10-27 09:29:13,778895426007203840,2,1,2,0,778895426007203840,Polizei Oldenburg-Stadt/Ammerl,Polizei_OL


In [202]:
# necessary adjustments

# rename columns
df = df.rename(columns={"id_x": "tweet_id"})

# drop duplicate columns
df = df.drop(columns="id_y")

# show dataframe again
df.head(2)

Unnamed: 0,tweet_id,tweet_text,created_at,user_id,like_count,retweet_count,reply_count,quote_count,name,handle
0,1321021123463663616,"@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr",2020-10-27 09:29:13,778895426007203840,2,1,2,0,Polizei Oldenburg-Stadt/Ammerl,Polizei_OL
1,1321023114071969792,"#Zeugengesucht\nDie Hintergründe zu dem Tötungsdelikt in #Gesundbrunnen sind bislang unklar. Unsere 6. #MoKo sucht daher nach Zeugen, die Hinweise zu der Tötung von Mila SIMIC geben können.\n\n☎️(030) 4664-911666\n\n#PM &amp; Foto:\nhttps://t.co/cwzVsRWdCN\n\n^tsm https://t.co/JdeEh04UAH",2020-10-27 09:37:08,2397974054,20,24,4,1,Polizei Berlin,polizeiberlin


In [203]:
# show datatypes of new dataframe
df.dtypes

tweet_id         int64  
tweet_text       object 
created_at       object 
user_id          int64  
like_count       float64
retweet_count    float64
reply_count      float64
quote_count      float64
name             object 
handle           object 
dtype: object

In [204]:
# convert date column to datetime format
df['created_at'] = pd.to_datetime(df['created_at'])

In [205]:
# add location details

# preparation: necessary because values are spelled differently in columns needed for merge
locations['Polizei Account'] = locations["Polizei Account"].str.replace(' ', '') # delete spaces 
df['handle'] = df['handle'].str.lower() # convert everything to lower case

# merge tables
df = df.merge(locations, how = "left", left_on = "handle", right_on="Polizei Account")

In [206]:
# add column with week number
df['week'] = df['created_at'].dt.isocalendar().week

In [207]:
# show new dataframe
df.head(1)

Unnamed: 0,tweet_id,tweet_text,created_at,user_id,like_count,retweet_count,reply_count,quote_count,name,handle,Polizei Account,Name,Typ,Bundesland,Stadt,LAT,LONG,week
0,1321021123463663616,"@mahanna196 Da die Stadt keine Ausnahme für Radfahrer aufgeführt hat, gilt diese (Stand jetzt) auch für Radfahrer. *sr",2020-10-27 09:29:13,778895426007203840,2,1,2,0,Polizei Oldenburg-Stadt/Ammerl,polizei_ol,polizei_ol,Polizei Oldenburg-Stadt/Ammerland,Polizei,Niedersachsen,Oldenburg,53.1389753,8.2146017,44


<div class="alert alert-block alert-warning">
    <h2>3. Analyze: <b>Welches sind die 50 aktivsten Polizei-Accounts?</b></h2>
</div>

In [208]:
# prepare dataframe for visualization
df_vis = df.groupby(['name', 'handle', 'user_id']).agg({"tweet_id": 'count'}).reset_index()

# rename columns
df_vis = df_vis.rename(columns = {'tweet_id': 'tweet_count'})

# show df_vis
df_vis.head()

Unnamed: 0,name,handle,user_id,tweet_count
0,Bayerisches Landeskriminalamt,lka_bayern,1143867545226764293,84
1,Bundesbereitschaftspolizei,bpol_bepo,4876078570,29
2,Bundespolizei Baden-Württember,bpol_bw,3169257933,488
3,Bundespolizei Bayern,bpol_by,3169867654,285
4,Bundespolizei Berlin,bpol_b,4876039738,115


In [209]:
# how many accounts are in dataset?
df_vis.shape[0]

161

In [210]:
# only use 50 accounts with most tweets in dataset 
df_vis = df_vis.sort_values(by='tweet_count', ascending = False)[0:50]

<code style="background:#ffbdbd;color:#680E0E;font-weight:bold">Caution: If you remove the '#' symbols in lines 2,3 and 16, the following code will save a png file called "barchart_most_active_50" in a new folder named "charts". If you don't change anything, the chart will be shown in this notebook. </code>

In [None]:
# create folder if not already exists
#if not os.path.exists('charts'):
    #os.makedirs('charts')

# draw bar chart
bar = alt.Chart(df_vis).mark_bar().encode(
    x=alt.X('tweet_count:Q'),
    y=alt.Y('name:O', sort='-x'),
    tooltip = 'tweet_count'
)

rule = alt.Chart(df_vis).mark_rule(color='red').encode(
    x='mean(tweet_count):Q'
)

(bar + rule).properties(width=600)#.save("barchart_most_active_50.png", format = "png")

In [212]:
# create list with 50 accounts with most tweets for later usage
top_50 = list(df_vis.user_id.unique())

# create dataset only of 50 top accounts
df_50 = df[df['user_id'].isin(top_50)==True]

<div class="alert alert-block alert-warning">
    <h2>4. Analyze: <b>Welche Dienststelle setzt wann wie viele Tweets ab?</b></h2>
</div>

In [213]:
# limit to 50 most active accounts
df_vis = df_50[['created_at', 'user_id', 'handle', 'tweet_id']]

# count tweets over time
df_vis = df_vis.groupby(['handle', 'user_id', 'created_at']).agg({"tweet_id": ['count']}).reset_index()

# have a look at new created df_vis
df_vis.head()

Unnamed: 0_level_0,handle,user_id,created_at,tweet_id
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,count
0,bpol_bw,3169257933,2020-11-09 06:47:09,1
1,bpol_bw,3169257933,2020-11-09 09:03:03,1
2,bpol_bw,3169257933,2020-11-09 09:13:18,1
3,bpol_bw,3169257933,2020-11-09 09:24:05,1
4,bpol_bw,3169257933,2020-11-09 14:58:43,1


In [214]:
# rename columns
df_vis.columns = ['handle', 'user_id', 'created_at', 'tweet_count']

# add week column
df_vis['week'] = df_vis['created_at'].dt.isocalendar().week

# again show df_vis
df_vis.head()

Unnamed: 0,handle,user_id,created_at,tweet_count,week
0,bpol_bw,3169257933,2020-11-09 06:47:09,1,46
1,bpol_bw,3169257933,2020-11-09 09:03:03,1,46
2,bpol_bw,3169257933,2020-11-09 09:13:18,1,46
3,bpol_bw,3169257933,2020-11-09 09:24:05,1,46
4,bpol_bw,3169257933,2020-11-09 14:58:43,1,46


In [215]:
# group by week to get number of tweets per week
df_vis = df_vis.groupby(['handle', 'user_id', 'week']).agg({'tweet_count': 'count'}).reset_index()

# again show df_vis
df_vis.head()

Unnamed: 0,handle,user_id,week,tweet_count
0,bpol_bw,3169257933,1,6
1,bpol_bw,3169257933,2,3
2,bpol_bw,3169257933,3,33
3,bpol_bw,3169257933,4,26
4,bpol_bw,3169257933,5,7


<code style="background:#ffbdbd;color:#680E0E;font-weight:bold">Caution: If you remove the '#' symbols in lines 2,3 and 7, the following code will save a png file called "barchart_most_active_50" in a folder named "charts". If you don't change anything, the chart will be shown in this notebook. (Press shift+L to show line numbers.) </code>

In [216]:
# create folder if not already exists
#if not os.path.exists('charts'):
#    os.makedirs('charts')

# show chart
alt.Chart(df_vis).mark_line().encode(
    x='week',
    y=alt.Y('tweet_count'),
    color = 'handle',
    tooltip = ['tweet_count','user_id', 'handle', 'week']
).interactive().properties(width=800)#.save("charts/aktive-nach-wochen.png", format = 'png')

**Achtung: Darstellung nicht ideal, da Werte zwischen KW 19 und 44 nicht existieren. Außerdem beziehen sich KW 44-53 auf das Jahr 2020, 1-19 auf das Jahr 2021**

**Durch die Exploration des Line Charts über Tooltip-Anzeigen ergeben sich weitere Fragen:**

* Was war in KW 5 und 13 und 47 in Karlsruhe los?
* Was war in KW 5 und 18, 45 und 50 Frankfurt a.M. los?
* Was war in KW 9 in Dortmund los?
* Was war in KW 12 und KW 14 in Mannheim los?
* Was war in KW 17 in Sachsen los?
* Was war in KW 46 in Mülheim an der Ruhr los?
* Was war in KW 49 in Bremen los?
* Was war in KW 49 in Gelsenkirchen los?

<div class="alert alert-block alert-warning">
    <h2>4. Analyze: <b>Was war los in Karlsruhe</b> (in den Kalenderwochen 5, 13, 47)?</h2>
</div>

In [217]:
# filter dataset of 50 most active accounts, only include rows where value in 'handle column' is 'polizei_ka'
df_vis = df_50[df_50['handle']=='polizei_ka']

# have a look at dataframe
df_vis.head(1)

Unnamed: 0,tweet_id,tweet_text,created_at,user_id,like_count,retweet_count,reply_count,quote_count,name,handle,Polizei Account,Name,Typ,Bundesland,Stadt,LAT,LONG,week
109,1321119171825012736,"Die #Staatsanwaltschaft Ka hat am Sa bzw. So beim zuständigen Amtsgericht #Haftbefehle gegen zwei Männer erwirkt. Dem 18-Jährigen wird versuchter Totschlag vorgeworfen, dem 19-Jährigen gefährliche Körperverletzung. Zur PM: https://t.co/4MrESOTo3b\n\nEure #Polizei #Karlsruhe https://t.co/RZwXmI3VPf",2020-10-27 15:58:50,3029998264,,,,,Polizei Karlsruhe,polizei_ka,polizei_ka,Polizei Karlsruhe,Polizei,Baden-Württemberg,Karlsruhe,49.0068705,8.4034195,44


In [218]:
# create function to create new dataframes filtered by week
def create_df_by_week(df,week):
    
    # create dataframe for selected week of input df
    df = df[df['week']==week]
    
    # 
    df = df[['tweet_id', 'created_at', 'tweet_text', 'like_count', 'retweet_count', 'reply_count', 'quote_count']]
    
    df = df.rename(columns = {'like_count': 'likes', 
                             'retweet_count': 'retweets', 
                             'replie_count': 'replies',
                             'quote_count': 'quotes'})
    
    return df

<code style="background:#ffe0b2;color:#f57c00;font-weight:bold">KW 5</code>

In [219]:
# create dataframe
df_ka_5 = create_df_by_week(df_vis,5)

# print shape
print(f"shape: {df_ka_5.shape[0]} columns, {df_ka_5.shape[1]} rows")

# have a look at dataframe
df_ka_5.head(2)

shape: 115 columns, 7 rows


Unnamed: 0,tweet_id,created_at,tweet_text,likes,retweets,reply_count,quotes
21304,1356148296654479361,2021-02-01 07:52:04,"@LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅",0,0,0,0
21422,1356195468406087684,2021-02-01 10:59:31,#GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp; #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\n\nPM: https://t.co/8qUfvYSBoH\n\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5,14,1,0,1


In [220]:
# show chart
alt.Chart(df_ka_5).mark_circle(size=60).encode(
    x='created_at',
    y='likes:Q',
    tooltip=['tweet_id:N','tweet_text:N','likes:Q', 'created_at:T'],
    color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),
).interactive().properties(width=600) # .save('charts/df_ka_5.html', format = 'html')

<code style="background:#ffe0b2;color:#f57c00;font-weight:bold">KW 13</code>

In [221]:
# create dataframe
df_ka_13 = create_df_by_week(df_vis,13)

# print shape
print(f"shape: {df_ka_13.shape[0]} columns, {df_ka_13.shape[1]} rows")

# have a look at dataframe
df_ka_13.head(2)

shape: 130 columns, 7 rows


Unnamed: 0,tweet_id,created_at,tweet_text,likes,retweets,reply_count,quotes
33985,1376421994133127168,2021-03-29 06:32:30,Wir setzen mit unserer Kampagne „NICHT BEI UNS!“ ein klares Zeichen ⚠️ gegen #Diskriminierung und #Extremismus. Das Thema betrifft uns alle. Schaut Euch den ersten Clip an!“ #NICHTBEIUNS! #PolizeiBW Link zur Pressemitteilung: https://t.co/D1yLwdnBmS https://t.co/rgx5mksK0S,194,17,160,116
33999,1376425288435957760,2021-03-29 06:45:36,"@filderbussard Normalerweise nicht, aber das gleicht sich ja über die Jahre so oder so aus 😊",1,0,0,0


In [222]:
# show chart 
alt.Chart(df_ka_13).mark_circle(size=60).encode(
    x='created_at',
    y='likes',
    tooltip=['tweet_id','tweet_text','likes', 'created_at'],
    color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),
).interactive().properties(width=600)

<code style="background:#ffe0b2;color:#f57c00;font-weight:bold">KW 47</code>

In [223]:
# create dataframe
df_ka_47 = create_df_by_week(df_vis,5)

# print shape
print(f"shape: {df_ka_47.shape[0]} columns, {df_ka_47.shape[1]} rows")

# have a look at dataframe
df_ka_47.head(2)

shape: 115 columns, 7 rows


Unnamed: 0,tweet_id,created_at,tweet_text,likes,retweets,reply_count,quotes
21304,1356148296654479361,2021-02-01 07:52:04,"@LaPapper Der Tweet wurde gelöscht, wir können leider nicht mehr sehen, auf was Sie sich bezogen haben 😅",0,0,0,0
21422,1356195468406087684,2021-02-01 10:59:31,#GeschädigterGesucht: Ein alkoholisierter 43-Jähriger hat am Samstagmittag in einer Bahn zwischen #Kronenplatz &amp; #Marktplatz einen älteren Fahrgast angegriffen. Zwei jugendliche Mädchen griffen zum Glück ein. #ZivileHelden\n\nPM: https://t.co/8qUfvYSBoH\n\nEure #Polizei #Karlsruhe https://t.co/depnkQrYF5,14,1,0,1


In [224]:
# show chart
alt.Chart(df_ka_47).mark_circle(size=60).encode(
    x='created_at',
    y='likes',
    tooltip=['tweet_id','tweet_text','likes:Q', 'created_at'],
    color = alt.Color('created_at', scale=alt.Scale(scheme='inferno'), legend=None),
).interactive().properties(width=600)

<div class="alert alert-block alert-warning">
    <h2>5. Create map: <b>Wann twitterte welche Polizei wie viel</b>?</h2>
</div>

In [225]:
# add column containing year
df_cities = df
df_cities['year'] = df['created_at'].dt.isocalendar().year

# count tweets per city and week
df_cities = df_cities.groupby(['name', 'handle', 'Typ', 'Bundesland', 'Stadt', 'LAT', 'LONG', 'year', 'week']).agg({'tweet_id': 'count'}).reset_index()

# show available types and how many of them exist in dataframe
df_cities['Typ'].value_counts()

Polizei                       3452
Bundespolizei                 228 
Landeskriminalamt             106 
Polizeipräsidium              35  
Bundesbereitschaftspolizei    10  
Name: Typ, dtype: int64

In [226]:
# remove tweets that have unwanted types (~ means not)
df_cities = df_cities[~df_cities['Typ'].isin(["Landeskriminalamt", "Bundesbereitschaftspolizei", "Bundespolizei"])]

# have a look at dataframe
df_cities.head()

Unnamed: 0,name,handle,Typ,Bundesland,Stadt,LAT,LONG,year,week,tweet_id
344,Polizei Aalen,polizeiaalen,Polizei,Baden-Württemberg,Aalen,48.836689,10.097116,2020,44,10
345,Polizei Aalen,polizeiaalen,Polizei,Baden-Württemberg,Aalen,48.836689,10.097116,2020,45,6
346,Polizei Aalen,polizeiaalen,Polizei,Baden-Württemberg,Aalen,48.836689,10.097116,2020,46,5
347,Polizei Aalen,polizeiaalen,Polizei,Baden-Württemberg,Aalen,48.836689,10.097116,2020,47,4
348,Polizei Aalen,polizeiaalen,Polizei,Baden-Württemberg,Aalen,48.836689,10.097116,2020,48,7


In [227]:
# how many weeks do have data? 
len(df_cities['week'].unique())

29

<code style="background:#ffbdbd;color:#680E0E;font-weight:bold">Caution: The following code will create a subfolder in a folder called "charts" and save images in png format there! </code>

In [229]:
# create folders if they do not already exist
if not os.path.exists('charts/tweets-pro-woche'):
    os.makedirs('charts/tweets-pro-woche')

# load world map
from vega_datasets import data

# create and export png maps
for i in range(1,54):
    
    # filter df_cities by week and save to dataframe "tweet_count"
    tweet_count = df_cities[df_cities['week'] == i].reset_index()
    tweet_count = tweet_count.rename(columns=({'tweet_id': 'Anzahl Tweets'}))
    
    try:
    # get year if data available, else pass
        year = tweet_count['year'][0]
    except:
        pass

    # save geodata from vega_datasets to variable "countries"
    countries = alt.topo_feature(data.world_110m.url, 'countries')
    
    # define basic values appropriate for map of Germany
    projection = 'mercator'             # select Mercator projection
    scale = 1800                        # Magnify
    center = [10,51.5]                  # [lon, lat]
    clip_extent = [[0, 0], [600, 600]]  # [[left, top], [right, bottom]]

    # create background map
    background = alt.Chart(countries).mark_geoshape(
        fill='lightgray',
        stroke='white'
    ).project(
        type = projection,
        scale = scale,                          
        center = center,                     
        clipExtent= clip_extent,    
    ).properties(
        title=f'So viel twitterte die Polizei im Jahr {year} in Kalenderwoche {i}',
        width=600, height=600
    )

    # create points
    points = alt.Chart(tweet_count).mark_circle().encode(
        longitude='LONG:Q',
        latitude='LAT:Q',
        size=alt.Size('Anzahl Tweets:Q'),
        color=alt.Color('week', scale=alt.Scale(domain=['week'], range=['#154889']), legend=None),
        tooltip=['handle:N','name:N','Stadt:N','Anzahl Tweets:Q','LAT:Q','LONG:Q'],
        ).project(
        type= projection,
        scale= scale,
        center= center,
        clipExtent= clip_extent,
    )

    # export background map and points to png files in subfolders
    (background + points).save(f"charts/tweets-pro-woche/pol_cities_kw-{i:02d}.png", format = 'png')    

In [230]:
# print every week for which data is available
list_weeks_with_data = sorted(df_cities['week'].unique())
print(list_weeks_with_data)

# get all images in directory
import glob
imgs = sorted(glob.glob("charts/tweets-pro-woche/*.png"))

# sort images
imgs = sorted(imgs)

# show first items in image list as an example (remove square brackets and numbers to get full list)
imgs[0:7]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]


['charts/tweets-pro-woche/pol_cities_kw-01.png',
 'charts/tweets-pro-woche/pol_cities_kw-02.png',
 'charts/tweets-pro-woche/pol_cities_kw-03.png',
 'charts/tweets-pro-woche/pol_cities_kw-04.png',
 'charts/tweets-pro-woche/pol_cities_kw-05.png',
 'charts/tweets-pro-woche/pol_cities_kw-06.png',
 'charts/tweets-pro-woche/pol_cities_kw-07.png']

In [231]:
# manually create list of images (due to missing values and dates from different years, this is fastest method)

imgs = ['charts/tweets-pro-woche/pol_cities_kw-49.png',
 'charts/tweets-pro-woche/pol_cities_kw-44.png',
 'charts/tweets-pro-woche/pol_cities_kw-45.png',
 'charts/tweets-pro-woche/pol_cities_kw-46.png',
 'charts/tweets-pro-woche/pol_cities_kw-47.png',
 'charts/tweets-pro-woche/pol_cities_kw-48.png',
 'charts/tweets-pro-woche/pol_cities_kw-49.png',
 'charts/tweets-pro-woche/pol_cities_kw-50.png',
 'charts/tweets-pro-woche/pol_cities_kw-51.png',
 'charts/tweets-pro-woche/pol_cities_kw-52.png',
 'charts/tweets-pro-woche/pol_cities_kw-53.png',
 'charts/tweets-pro-woche/pol_cities_kw-01.png',
 'charts/tweets-pro-woche/pol_cities_kw-02.png',
 'charts/tweets-pro-woche/pol_cities_kw-03.png',
 'charts/tweets-pro-woche/pol_cities_kw-04.png',
 'charts/tweets-pro-woche/pol_cities_kw-05.png',
 'charts/tweets-pro-woche/pol_cities_kw-06.png',
 'charts/tweets-pro-woche/pol_cities_kw-07.png',
 'charts/tweets-pro-woche/pol_cities_kw-08.png',
 'charts/tweets-pro-woche/pol_cities_kw-09.png',
 'charts/tweets-pro-woche/pol_cities_kw-10.png',
 'charts/tweets-pro-woche/pol_cities_kw-11.png',
 'charts/tweets-pro-woche/pol_cities_kw-12.png',
 'charts/tweets-pro-woche/pol_cities_kw-13.png',
 'charts/tweets-pro-woche/pol_cities_kw-14.png',
 'charts/tweets-pro-woche/pol_cities_kw-15.png',
 'charts/tweets-pro-woche/pol_cities_kw-16.png',
 'charts/tweets-pro-woche/pol_cities_kw-17.png',
 'charts/tweets-pro-woche/pol_cities_kw-18.png'
]

<code style="background:#ffbdbd;color:#680E0E;font-weight:bold">Caution: The following code will save a gif in your charts folder: "map_tweets_per_week.gif"! </code>

In [232]:
# create gif of maps

# import python pillow library
from PIL import Image

# Create the frames
frames = []

# loop through images and append each to list of frames
for i in imgs:
    new_frame = Image.open(i)
    frames.append(new_frame)

# create folder if not already exists
if not os.path.exists('charts'):
    os.makedirs('charts')

# save into a GIF file that loops forever
frames[0].save('charts/map_tweets_per_week.gif', format='GIF',
               append_images=frames[1:],
               save_all=True,
               duration=300, loop=0)