-
Posted on
Introducing Mikochi: a minimalist remote file browser
Like many people working in DevOps, I have picked up the bad habit of playing with servers and containers in my free time. One of the things I run is a media server, which I use to access my collection of movies and shows (that I evidently own and ripped myself). To make my life easier with this, I built a web application that lets me browse, manage, and stream/download files. It is called Mikochi, and it received its first stable release last week.
Problem statement
My media server was initially running Jellyfin. It is a pretty nice piece of software that probably fits the needs of many people. Sadly for me, it focuses a lot on areas I didn't care about (metadata, transcoding, etc.) while being lackluster at plain file management.
The feature I need is basic FTP-like management from a browser: listing the contents of folders, navigating between them, and downloading, renaming, deleting, and uploading files.
In addition to that, I also wanted a search function that could lead me to any file or directory on the server.
Since it's replacing a media server, the last requirement was streaming. I don't use in-browser streaming much (it doesn't always support fancy codecs like HEVC), so I just needed to be able to open streams in a media player like VLC or MPV, which is easier.
Frontend
One of my aims with this project was to get back into frontend development, since I hadn't touched a line of JavaScript in a while. I decided to use Preact, a React alternative weighing only 3kb.
Preact was a great surprise. I expected a framework that small to be too good to be true, but it works well. I didn’t experience any trouble learning it since it is almost the same API as React and didn’t encounter any performance issues or unexplainable crashes. I will definitely try to use it again for future projects.
The complete JS bundle size ends up being ~36kb, barely more than the icon pack that I use.
The character who gave this software its name
Backend
The backend was made using Go, which has been one of my main languages for the past 5 years. I used the Gin framework to handle the regular HTTP boilerplate, which worked admirably.
The only pain point was re-implementing JWT authentication. I decided not to use a library for it because I felt it might not handle one edge case well: I need tokens passed as GET parameters for streaming requests, since VLC isn't going to set an Authorization header. It's not particularly complex, but it is a lot of code. On the other hand, I was pleasantly surprised that streaming files “just works” in a single line of code:
c.File(pathInDataDir)
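For illustration, here is a minimal sketch of the token-in-query-string idea described above, written as a Gin middleware. It is not the actual Mikochi code (which rolls its own JWT handling); this sketch leans on the golang-jwt library instead, and the “token” query parameter name is an assumption.

package middleware

import (
	"net/http"
	"strings"

	"github.com/gin-gonic/gin"
	"github.com/golang-jwt/jwt/v5"
)

// RequireJWT accepts a token either from the Authorization header or from a
// "token" query parameter, so media players like VLC can authenticate
// streaming URLs they cannot attach headers to.
func RequireJWT(secret []byte) gin.HandlerFunc {
	return func(c *gin.Context) {
		raw := strings.TrimPrefix(c.GetHeader("Authorization"), "Bearer ")
		if raw == "" {
			raw = c.Query("token") // hypothetical parameter name
		}
		token, err := jwt.Parse(raw, func(t *jwt.Token) (interface{}, error) {
			return secret, nil // HMAC-signed tokens assumed
		})
		if err != nil || !token.Valid {
			c.AbortWithStatus(http.StatusUnauthorized)
			return
		}
		c.Next()
	}
}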
Running it
If you’re interested in trying out Mikochi, it can be launched with just a Docker image:
docker run \
  -p 8080:8080 \
  -v $(PWD)/data:/data \
  -e DATA_DIR="/data" \
  -e USERNAME=alicegg \
  -e PASSWORD=horsebatterysomething \
  zer0tonin/mikochi:latest
Compiled binaries are also available on GitHub. And for those who love fighting with Ingresses and PersistentVolumeClaims, there's a Helm chart available.
-
Posted on
Specialization considered harmful
It is sometimes recommended that software engineers learn “depth-first” and seek to specialize early in their careers. I think this advice is misguided. In my opinion, having a wide range of knowledge is in many cases more important than being extremely good at one very specialized task. I will use this article to make the case for avoiding specialization as a software engineer.
Photo by Kenny Eliason
It's not just about practice hours
A common misconception when learning a new skill is that, since it might take 10,000 hours to master it, the best thing to do is to start practicing as early as possible and with as much focus as possible. Reality, however, is not as simple.
It may be true that just putting in a lot of focused practice hours will lead to amazing results for problems that are very constrained in scope (like chess). However, in fields with a broad and frequently evolving set of problems, experience working on very diverse subjects will often serve you better than intense specialization.
One of the reasons is that many problems that look unrelated at first sight share similar patterns. Being exposed to a wide variety of problems lets you spot those patterns and transfer solutions from one domain to another.
This is why history has many records of generalists achieving breakthroughs across fields. For example, Benoit Mandelbrot first noticed the concept of fractals while studying probability distributions in financial markets; he later found applications of the concept to many patterns that appear in nature, such as coastlines, clouds, and blood vessels.
Tech changes, fast
Many people underestimate how fast the world of software engineering can change. Some extremely popular concepts like “DevOps” were pretty much not a thing 10 years ago. 20 years ago, I doubt anyone would have known what differentiated a “frontend developer” from a “backend developer”. Even if you zoom in on very specific technologies, things are changing every year: React code written in 2023 doesn't have much in common with React code written in 2015.
Photo by Lorenzo Herrera
Being a generalist allows you to adapt much faster to change; it can be seen as a way of “learning to learn”. Once you have been exposed to many problems and solutions, picking up new tools and adapting to changes in the field becomes easier and easier.
There’s more to software engineering than code
Most importantly, learning the ins and outs of a programming language and tech stack is not what brings value. Software engineering is the art of using computers to solve problems, and those problems are generally not about computers, but about businesses and people. In my experience, this is a point that is easy to miss when over-focusing on the depth of a specific technology.
This is also where a lot of people who make a career change and transition late into the tech industry have an edge. They can compensate for their late start by being more aware of the reality of business and the needs of organizations.
-
Posted on
Why does diversification matter for long-term investors? Meet Shannon's Demon
Any introduction to finance will mention that diversification is extremely important. Intuitively, it is easy to understand that diversification reduces risk. If I own stocks in two companies, and one of them goes bankrupt, I lose less than if I had invested all my money in it. However, what appears less intuitive is that diversification can itself increase a portfolio's returns. This phenomenon is known as Shannon's Demon, named after its inventor Claude Shannon, who is also famous for his work on information theory and cryptography.
Photo by micheile dot com
Let's take the following scenario: I am an investor who can purchase two assets with completely random and unpredictable returns. My crystal ball is broken, so I cannot know in advance which of the two assets will perform better. They can be modeled by a random walk. Any change to my investment portfolio will cost a 1% transaction fee.
I can use the following two strategies:
- put 100% of my money in one of the two assets and hope it performs well
- put 50% of my money in each asset and rebalance the portfolio every 6 months to keep each position at 50%
A cherry-picked example of the returns of two random assets and a balanced portfolio (in red)
I ran a Monte Carlo experiment that simulates the above scenario 100,000 times over 20 years (5,040 trading days). After inspecting the final returns, I observed the following:
- in nearly 100% of outcomes, the balanced portfolio beats investing everything in the worst-performing asset
- in 70% of outcomes, the balanced portfolio also beats investing everything in the best-performing asset
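For the curious, here is a simplified sketch (in Go) of the kind of simulation behind these numbers. It is not the exact code used for the article: the daily volatility, the number of trials, and the way fees are accounted for are assumptions made for illustration. Setting one asset's volatility to zero turns it into the cash-plus-volatile-asset scenario discussed further below.

package main

import (
	"fmt"
	"math"
	"math/rand"
)

const (
	days      = 5040  // 20 years of trading days
	rebalance = 126   // rebalance roughly every 6 months
	fee       = 0.01  // 1% fee on the amount traded
	dailyVol  = 0.02  // assumed daily volatility of each asset
	trials    = 10000 // fewer than the article's 100,000, for speed
)

func main() {
	balancedWins := 0
	for t := 0; t < trials; t++ {
		priceX, priceY := 1.0, 1.0
		holdX, holdY := 10000.0, 10000.0           // shares held when going all-in on X or Y
		posX, posY := 5000.0/priceX, 5000.0/priceY // shares in the 50/50 portfolio

		for d := 1; d <= days; d++ {
			// Independent log-normal daily returns: an unpredictable random walk.
			priceX *= math.Exp(dailyVol * rand.NormFloat64())
			priceY *= math.Exp(dailyVol * rand.NormFloat64())

			if d%rebalance == 0 {
				// Sell part of the asset that ran up, buy the one that lagged,
				// paying the transaction fee on the traded amount.
				valX, valY := posX*priceX, posY*priceY
				target := (valX + valY) / 2
				traded := 2 * math.Abs(valX-target)
				total := valX + valY - traded*fee
				posX, posY = (total/2)/priceX, (total/2)/priceY
			}
		}

		balanced := posX*priceX + posY*priceY
		best := math.Max(holdX*priceX, holdY*priceY)
		if balanced > best {
			balancedWins++
		}
	}
	fmt.Printf("balanced portfolio beat the best single asset in %.1f%% of trials\n",
		100*float64(balancedWins)/float64(trials))
}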
Explanation
The second observation might be surprising, but it can be easily explained. The trick is that regular rebalancing creates a mechanical way to “buy low and sell high”.
Let’s illustrate it step by step:
- at the start, I own 500 shares of asset X at 10$ each, and 500 shares of Y at the same price:

  Name  Price  Position  Value
  X     10$    500       5,000
  Y     10$    500       5,000

- after 6 months, my position in X has experienced explosive growth and X now trades at 50$, while Y is stable at 10$:

  Name  Price  Position  Value
  X     50$    500       25,000
  Y     10$    500       5,000

- I rebalance the portfolio, and now own 15,000$ of X and 15,000$ of Y:

  Name  Price  Position  Value
  X     50$    300       15,000
  Y     10$    1,500     15,000

- 6 months later, X has performed poorly and is back at 10$, while Y is now worth 20$:

  Name  Price  Position  Value
  X     10$    300       3,000
  Y     20$    1,500     30,000

- I rebalance again, and now own 16,500$ of X and 16,500$ of Y:

  Name  Price  Position  Value
  X     10$    1,650     16,500
  Y     20$    825       16,500

Over this scenario, my portfolio gained 23,000$. If I had invested everything into Y, I would have gained only 10,000$. And if I had bought and held X over the same period, I would not have made any profit.
Caveat 1: correlated assets
One important thing to consider is that this only works with assets that are not positively correlated. A positive correlation means that both assets tend to move up at the same time and down at the same time. In that situation, the balanced portfolio will still perform better than the worst of the two assets, but it will most of the time underperform the best-performing one.
The returns of two positively correlated assets and a balanced portfolio (in red) containing both
In practice, many asset returns are highly correlated with each other. For example, in the stock market, we can observe periods where the large majority of stocks tend to move up (bull markets) or down (bear markets). This makes building a diversified stock portfolio more complicated than just randomly picking multiple stocks to purchase. This is also why it is often advised to diversify across multiple asset classes (e.g. stocks, bonds, commodities, and cash).
A common way to make use of Shannon's Demon is to pair an asset with a stable price (e.g. a money market fund) with a volatile one. A portfolio with 50% cash and 50% in a volatile asset will not only cut the risk in half, but will sometimes outperform being 100% in the volatile asset. Simulating 10,000 random walks over 20 years, the balanced portfolio beats investing everything in the random walk in 91% of cases.
A simulation of a stable and a volatile asset and the resulting balanced portfolio
An alternative use of Shannon's Demon is to profit from negatively correlated positions. This can be done by taking two positively correlated assets and being long on one and short on the other. This strategy is called pair trading, and it is not something you should try at home.
An example of two negatively correlated assets and the resulting portfolio
Caveat 2: real life is not as simple
Obviously, there's more to portfolio management than just diversification and rebalancing. First, the 50/50 split I used in this article is rarely optimal. The usual calculation for optimal position sizing is the Kelly criterion, which would need an entire article to cover properly.
In the previous examples, I used a fixed 6-month period between rebalances, chosen for completely arbitrary reasons. An optimized rebalancing strategy would need to take asset variances, investment time frames, and transaction costs into account to determine the rebalancing intervals.
Lastly, while simulations based on random walks are a useful mathematical tool, they might not reflect real market conditions. A portfolio composed of assets that are likely to depreciate over time is unlikely to be profitable, no matter how diversified it is. This is why asset purchases should be carefully researched, possibly with the help of a professional financial advisor (not me).
-
Posted on
Learning Dutch online in 2023
Back in 2020, I left France to live in the Netherlands. Since then, I have spent some time learning the language and reached an intermediate (B1-B2) level. While learning Dutch is not a hard requirement to live and work in the country, it does make day-to-day life a bit more convenient. There are plenty of resources out there to learn languages, so here’s my pick of tools that are worth spending your time on.
Photo by Artem Shuba
Duolingo
Duolingo is probably the most famous language-learning app around. Despite its popularity, many people in the language-learning community dismiss it as just a toy. It is, in my opinion, actually great for beginners. One of its main benefits is that it focuses on complete sentences and will drill the sentence structure of your target language into your head.
However, once you have acquired a basic grasp of your target language's grammar, the application loses its usefulness. Duolingo is sadly extremely slow at introducing new vocabulary, and going through the whole course is unnecessarily time-consuming, which I guess is a consequence of its subscription business model. The gamification makes it addictive enough that I reached a 300-day learning streak, but I would honestly not recommend using it for more than six months.
Anki
Anki is a very often recommended option in language-learning communities. It is a spaced repetition app that shows you flashcards and asks whether you know the correct translation. Unlike Duolingo, Anki is essentially focused on vocabulary.
It is free and open-source software, and it gives you a lot of customization options, which lets you introduce new words at the rhythm you want. A few Dutch decks are available online, but many Anki users prefer creating their own decks with words they encounter in target-language content.
Anki didn't stick with me; I think the issue was the lack of variety in the exercises. Doing the same thing with every flashcard turned the activity into a chore (instead of the addictive game that Duolingo was).
Memrise
Memrise is my personal favorite. Like Anki, it is a spaced repetition app focused on memorizing vocabulary, but it is much more enjoyable than Anki because it alternates between different exercise variations. On top of that, its repetition algorithm seems smarter, making for a more efficient learning experience.
It contains seven official courses, for a total of more than 2,000 words, which is largely enough to start engaging with native content. Additional content is provided by the community.
Just like with Duolingo, some features are locked behind a subscription.
Italki
Unlike the previous apps, Italki is not focused on providing exercises but on matching students with language teachers. Some could say it is the “Uber” of language learning, providing cheap and personalized courses. It is a pretty good way of getting into some conversation practice, which is especially useful if you are not in the country. The experience will vary from teacher to teacher, but I had a great time using it.
Native content
Of course, apps and courses are a great help when learning a language, but exposure to native content is what will help the most on the journey toward fluency. In the beginning, I found NOS Jeugdjournaal very helpful. It is the kids' version of the NOS news and comes with shorter articles, simpler vocabulary, and less depressing news. Most articles are accompanied by video clips, which also help build your listening skills, and their daily “Ochtendjournaal” can easily fit into a routine.
For more advanced learners, there are quite a few Dutch TV shows and movies around. Among my personal favorites are the show Dirty Lines, about two brothers creating the first Dutch phone-sex line, and the movie De Oost, which covers the story of a Dutch soldier in the Indonesian War of Independence. You can find a list of Dutch movies on IMDb.
-
Posted on
How much can you really get out of a 4$ VPS?
When starting a new project, evaluating the budget needed for cloud hosting can be tricky. On one hand, you will hear horror stories of people waking up to an unexpected 100k$ bill from their provider. On the other hand, you will see providers advertising prices as low as 4$ per month for a virtual machine. In this article, I will load test one of those 4$ VPSes (from an unnamed provider) to figure out whether the promise of running production on such a low budget is realistic.
Photo by Annie Spratt
The test application
For this test, I designed a simple CRUD application in Go. It mimics a blogging application and lets users create posts, list the latest posts, and display a single post. In other words, it has the following three routes:
- a GET / route that renders an HTML template and shows the titles of the 10 latest posts
- a GET /<post_id> route that renders an HTML template and shows the title and body of the selected post
- a POST / route that accepts a JSON payload with the post title and body, timestamps it, stores it in the database, and redirects to GET /
For the database, I chose MongoDB. I picked it because it is simple to set up, popular, and claims to be web scale.
The application was developed without any particular performance optimizations. The only database-specific optimization was an index on the post timestamps, which allows listing the latest posts reasonably fast.
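To make the setup more concrete, here is a rough sketch of what such an application could look like. It is not the exact code used for the test: it assumes the Gin framework (the same one used in Mikochi above) and the official Go MongoDB driver, and the database, collection, and template names are made up.

package main

import (
	"context"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

type Post struct {
	ID        primitive.ObjectID `bson:"_id,omitempty" json:"id"`
	Title     string             `bson:"title" json:"title"`
	Body      string             `bson:"body" json:"body"`
	Timestamp time.Time          `bson:"timestamp" json:"timestamp"`
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://mongo:27017"))
	if err != nil {
		panic(err)
	}
	posts := client.Database("blog").Collection("posts")

	// The one optimization mentioned above: an index on the post timestamps.
	if _, err := posts.Indexes().CreateOne(ctx, mongo.IndexModel{
		Keys: bson.D{{Key: "timestamp", Value: -1}},
	}); err != nil {
		panic(err)
	}

	r := gin.Default()
	r.LoadHTMLGlob("templates/*")

	// GET /: titles of the 10 latest posts.
	r.GET("/", func(c *gin.Context) {
		opts := options.Find().
			SetSort(bson.D{{Key: "timestamp", Value: -1}}).
			SetLimit(10)
		cur, err := posts.Find(c, bson.D{}, opts)
		if err != nil {
			c.AbortWithStatus(http.StatusInternalServerError)
			return
		}
		var latest []Post
		if err := cur.All(c, &latest); err != nil {
			c.AbortWithStatus(http.StatusInternalServerError)
			return
		}
		c.HTML(http.StatusOK, "index.html", gin.H{"posts": latest})
	})

	// GET /<post_id>: title and body of a single post.
	r.GET("/:id", func(c *gin.Context) {
		id, err := primitive.ObjectIDFromHex(c.Param("id"))
		if err != nil {
			c.AbortWithStatus(http.StatusBadRequest)
			return
		}
		var post Post
		if err := posts.FindOne(c, bson.M{"_id": id}).Decode(&post); err != nil {
			c.AbortWithStatus(http.StatusNotFound)
			return
		}
		c.HTML(http.StatusOK, "post.html", gin.H{"post": post})
	})

	// POST /: timestamp the post, store it, and redirect to GET /.
	r.POST("/", func(c *gin.Context) {
		var post Post
		if err := c.BindJSON(&post); err != nil {
			return // BindJSON already responded with a 400
		}
		post.Timestamp = time.Now()
		if _, err := posts.InsertOne(c, post); err != nil {
			c.AbortWithStatus(http.StatusInternalServerError)
			return
		}
		c.Redirect(http.StatusSeeOther, "/")
	})

	r.Run(":8080")
}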
Both the application and MongoDB were deployed using Docker with docker-compose.
The load test
I used k6 to perform the load test. k6 is a tool that generates “virtual users” who continuously run test scenarios defined in JavaScript.
I defined two scenarios:
- 10% of the virtual users would create posts
- 90% would display the latest 10 posts, and then open one of those posts (picked at random)
The test would progressively ramp up until it reached 50 virtual users and then come back down, for a total test duration of 1min30s.
Launching the test from my local computer, k6 managed to successfully run more than 94k requests, with an average duration of 21ms per request and a 95th percentile duration of 33ms. While this test didn't reach the point where the server starts failing, a closer look at the data already gives more insight. I exported the k6 metrics to a CSV file and used pandas to analyze the data. Plotting the request durations against the number of requests per second, we can observe that the duration starts spiking when k6 sends around 1,300 requests per second.
During the tests, we can identify a potential bottleneck: the CPU load increases with the number of virtual users and quickly reaches 100%. This is shown in the htop screenshot below, with both mongo and the application itself requesting most of the available CPU. In contrast, both RAM and disk throughput stayed well below the system's capacity.
Conclusion and limitations
This test shows that, as long as you don't plan on building the next Twitter, a very cheap VPS might be fine for the start of a project. However, the results might differ significantly for real-life applications, which contain complex business logic requiring more resources than a simple CRUD application. On top of that, more networking overhead is bound to happen when clients connect from different IPs and use TLS, which was not the case in this test.
The best way to determine the hosting budget for a real application would be to test it until failure using a distributed k6 setup. This can be done using the k6 operator for Kubernetes or the (somewhat expensive) k6 cloud.
You can find the source code for the application, k6 scenarios and analysis script used in this article on my GitHub.