Pace of Play Algorithm (Popalgo)
Home Page - 1/25/2024
Popalgo: The History
Intro
Back in the heyday of Kegpost fantasy, there was more than one week where I spent an embarrassing amount of time writing a prediction email that would only be seen by 11 of my friends and offered no value beyond entertainment; to an outsider, there would be no debate about that. A funny thing would happen, though, that only we would be able to understand. At some point during the week, while putting the email together, it would no longer be considered an embarrassing amount of time. That’s just how much time it takes to produce an elite product that you can be proud of. That’s the only way you get the Night Before Fantasy or the Coughlin/Karol Patriot gif. You put in an embarrassing amount of time and effort until it can no longer be considered embarrassing because of the product you end up producing on the field. This is Popalgo in a nutshell, and before we get into what went into the 3.0 version launched this season, we have to dig into how we got here. This is long and way more for me than for you guys, but here we go.
History
I really wanted to start at the beginning: mid-January 2022, when I needed to fill the void left by college football ending, so I tried to really hit the lab, as Coach would say, and come up with data-driven plays.
At this point in my life, I KNOW we can’t beat the books. I’ve learned enough through ENAD about the industry and how it works to understand that whatever harebrained idea we come up with, we will NOT be able to consistently hit at a high enough rate to make it a sustainable strategy. It truly is a David/Goliath type situation. They have way more resources and are way smarter than you; it’s silly, borderline stupid, to put in any real amount of time trying to compete. I understood this part and still wanted to move forward, mostly for fun, but obviously with tempered expectations. What I grossly underestimated at first, and learned very quickly, was the books’ biggest weapon ON TOP of everything they already have: the vig. That 2.5% (the edge over a coin flip you need just to clear it) is a massive hurdle, and anybody who thinks otherwise hasn’t spent enough time in the mud. I am almost glad I underestimated it at first, because I probably would have never started had I known how futile an effort it is to truly compete with Goliath.
The Original Pace of Play Algorithm (Popalgo OG) - Jan 2022
The idea was simple: take past CBB totals data and use it to come up with a formula that would have produced the best results, in the hopes that it would continue producing those results moving forward. I called it Popalgo, short for Pace of Play algorithm, because I started with calculating how pace of play (possessions per 40) affected team and subsequent game totals. I added a few other data points like defensive Pace of Play (PoP), pts scored, pts given up, FGA per 40, Def FGA per 40, 3PA, 3P%, Def 3PA/%, and maybe a couple others like macro home/away splits and conference/non-conference splits. I’m not entirely sure what went into it and I do regret not saving it, but this was essentially it, with the above stats weighted in a way that if I had used the algorithm starting December 1st, I’d have hit something like 58%, and if I had started January 1st, it would have been 60%. Those aren’t the exact numbers, but it was something like that. It was data driven, it took me a whole week to gather the data and run the numbers, and I came up with a formula that would have produced excellent results.
I was feeling real good and confident, and then we went “live,” so to speak, on a random Saturday in late January. That’s when disaster struck. We went something like 18-6 that first day and 12-5 on Sunday. We finally made it.
Obviously, as the sample size grew to meaningful numbers, we regressed to the mean very quickly. As I tracked the next two weeks and we lost all the money we made that first weekend and then some, it was painfully apparent that the higher conviction plays were not doing any better than the lower ones, and the more games we tracked, the more obvious it became that we were hovering around 50% across the board, a LONG way from the 52.5% needed to be profitable.
I think we shut it down after a few weeks, and it became a fun little memory, mainly because ENAD missed the lucky beginning and only got on board after we were essentially lighting money on fire thinking we could recapture the magic of the first weekend.
Below is a crude and simplified diagram of how I came up with the first version of the algo.
The Black Box
This represents the process of coming up with the actual algorithm itself. I call it a black box because you throw numbers in and run a bunch of BCS (best case scenario) tests to try and weigh the value of the data, creating a singular formula that spits out a number that is more accurate than the number the book spits out. Basically, the Black Box in these flowcharts is not a dataset or a table; it represents the process used to build the algorithm.
As I added more and more stats, the more complicated and time-consuming this got, obviously. Even though I was backtracking aggregate stats, I kept this up for a few weeks, which meant updating the aggregate stats manually, and since I was only adding 2-3 games per team and backtracking anyway, it barely changed the data. Looking back, this was a massive waste of time.
Ultimately, it’s a series of algorithms used to determine the best “weight” for a given data point or data set, producing a formula that spits out a final number when you plug in the two teams’ corresponding data. The concept is that the further our number lands from the book’s total, the higher the conviction of the play should theoretically be. I’ll explain more about the black box below, because I wasn’t really doing anything special at this point. I basically ran a bunch of tests “weighing” the importance of Pace of Play offense & defense and FGA and FGM (I think I eventually added 3Ps as well, but I’m not sure). The problem was that I was making a big mistake when running these tests and using the results to create the algo, which I explain below and, after the fact, deemed the backtracking fallacy.
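To make the black box less abstract, here is a rough sketch (Python instead of my actual scripts, with made-up column names and weight ranges) of what one of these weight-search tests boils down to: try a grid of weights, grade each combo against the historical totals, and keep whatever would have hit best.

```python
import itertools

# Hypothetical historical rows: each game has both teams' combined aggregate
# stats, the book's total, and the actual final total.
games = [
    {"pop_sum": 142.1, "def_pop_sum": 139.8, "book_total": 147.5, "final_total": 151},
    {"pop_sum": 131.4, "def_pop_sum": 133.0, "book_total": 138.5, "final_total": 132},
    # ... the real table had every D1 vs D1 game in the window
]

def record(w_pop, w_def, rows):
    """Count over/under wins if we bet whenever our number differs from the book's."""
    wins = losses = 0
    for g in rows:
        our_total = w_pop * g["pop_sum"] + w_def * g["def_pop_sum"]
        if our_total == g["book_total"] or g["final_total"] == g["book_total"]:
            continue  # no play / push
        bet_over = our_total > g["book_total"]
        went_over = g["final_total"] > g["book_total"]
        wins += bet_over == went_over
        losses += bet_over != went_over
    return wins, losses

# "BCS test": brute-force a grid of weights and keep whatever would have hit best.
best = None
for w_pop, w_def in itertools.product([x / 1000 for x in range(450, 551)], repeat=2):
    wins, losses = record(w_pop, w_def, games)
    pct = wins / (wins + losses) if wins + losses else 0.0
    if best is None or pct > best[0]:
        best = (pct, w_pop, w_def)

print(f"best win% {best[0]:.3f} with weights pop={best[1]}, def_pop={best[2]}")
```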
To create that formula, I was taking the aggregate base stats above for each team over the last month or so, then using those numbers to run tests to determine how to weight them in a formula that would have produced the best results (highest win%) in that same timeframe. It wasn’t until the offseason after we shut it down, when I thought about how to improve the algo and how I could have done better, that it hit me like a ton of bricks. I am sure there’s a real name for it, but I called it the backtracking fallacy. I tried to explain this to TT in a bar one time, and after 15 minutes of me rambling, he hit me with a, “I have no fucking clue what you’re talking about, but I love your enthusiasm.”
Backtracking Fallacy
Let’s say the date is Jan 20th, ‘22, and the teams’ Pace of Play (possessions per 40 min) and Defensive Pace of Play are the only two stats we have available. I’m trying to come up with an algorithm based on those two stats. After running the optimization numbers and BCS tests, I come to the conclusion that Pace of Play should be weighted at 1.061x and DefPoP at 1.073x (x being pts scored/gm), ultimately producing a final total number when you add both teams together that produces positive results against the book’s line. How I got those numbers is by running the Black Box tests to determine what would have produced the best results from 12/1/21 through 1/19/22 (roughly 16 games played per team).
Here’s the problem: in order to properly test what would have produced the best results, you need to “run” the algorithm with the data that was available at the time of each game, and those are the results your black box calculations need to grade against. Otherwise, you are essentially using the data to confirm the data. Realizing this after the fact was very important, because once I figured it out I understood much better the magnitude of this project and how much time and effort needs to go into doing it the right way. This led to what I call the Imitation Game offseason.
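In code terms, the fix looks something like this: every stat lookup takes a cutoff date so a game can only “see” what a team had done before it was played, and the window you fit weights on stays separate from the window you grade them on. This is just a sketch with hypothetical column names, not the actual scripts.

```python
from datetime import date

# Hypothetical game log: one row per team per game, in chronological order.
game_log = [
    {"team": "Purdue", "date": date(2021, 12, 2), "possessions": 68, "points": 77},
    {"team": "Purdue", "date": date(2021, 12, 5), "possessions": 71, "points": 84},
    # ... every team, every game
]

def stats_as_of(team, cutoff):
    """Per-game averages using ONLY games played before the cutoff date."""
    rows = [g for g in game_log if g["team"] == team and g["date"] < cutoff]
    if not rows:
        return None
    n = len(rows)
    return {
        "pop": sum(r["possessions"] for r in rows) / n,
        "ppg": sum(r["points"] for r in rows) / n,
    }

# The wrong way (what OG Popalgo did): grade Dec 1 - Jan 19 games using
# aggregates that already include those same games.
# The right way: for a game on Jan 5, call stats_as_of(team, date(2022, 1, 5))
# so the formula only sees what it could have known that morning, and keep the
# window you fit weights on separate from the window you grade them on.
print(stats_as_of("Purdue", date(2021, 12, 10)))
```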
The Autopsy of Season 1: What went right vs. what went wrong
I’m not going to get too detailed because, looking back, I don’t even consider this a real attempt at coming up with a legitimate and sustainable CBB totals algo. What went right is that I started it and kept it going for a few weeks, which allowed me to realize what actually needs to go into something like this to make it even close to viable. What went wrong is everything else, but I highlighted the backtracking part above because it really shined a light on how I would need to run and operate the “black box” tests the next season and the amount of work and prep that needs to go into those calculations. Hint: I still severely underestimated it the next year, but made massive improvements that would later set the foundation for 3.0.
Popalgo version 2.0 - Jan 2023
There is zero reason why I should have resurrected this hack of a project and put any more time into it, but I showed all the haters by doing just that in the offseason. It wasn’t consistent or anything, but every now and then I would spend a couple hours creating the infrastructure for what we have today.
The Imitation Game
The Imitation Game is the OG algorithm story, and what BC was ultimately able to create was a real Turing point in tech and computer history. Trying to crack the code was a massive guess-n-check game, and BC focused all of his effort on building a machine that would run those tests to try and come up with the code. He knew that due to the insane number of possible permutations, it would be a waste of time to try and crack the code without the help of what was essentially the first computer. Whether it’s trying to crack a Nazi code or beat the books on totals, the concept is the same: it’s a massive and complex game of guess and check. Having learned the backtracking fallacy lesson from the previous season, I thought I understood what it would take to do this right and build a system that could run Guess n Check tests, optimization tests, BCS tests (Best Case Scenario), whatever you want to call them, and make sure it had the ability to “run” A LOT of tests optimizing the data available to come up with the most accurate predictive final number.
Infrastructure
I always kind of knew that this was the case, but it really hit home when I was trying to rev this up again the next year. It’s just a massive game of guess and check, and you need to figure out how to use the technology available to play the guess and check game better than the books.
This was the area where I felt like I could actually make some real headway, as it’s right up my alley career-wise: find a way to utilize technology to make life easier. That starts and ends with the infrastructure and the database. I needed to be able to run a lot of these Turing Tests in order to out-algo Goliath. Last year, of course, I manually extracted the data, aggregated it by team, and then ran the tests after the fact. This year, we needed to run the tests using the data available at the time, which means that if I kept manually extracting the team and game data, I would need to do it every single day, store that info somewhere, and use THOSE numbers to run the tests. This meant the database needed to be buttoned up. Keep in mind, I am also writing the code in the black box to run these tests, and it gets real complicated real quick when you need to check it against all these different tables. The database is also getting larger daily, and even larger if you want to add other datasets.
It can get away from you very quickly (which it ultimately did), so I put a ton of focus on the data extraction and database creation side. It took me until January of ‘23 to even start working on the black box portion of the algorithm, which is ultimately how the formula is created. This was deliberate, as I had started to understand the importance of setting yourself up for success, because it becomes way too difficult once you are deep in the weeds. Below is the initial flowchart of the 2.0 Infrastructure initiative, and it took me way longer to get to this point than I had anticipated.
At this point, the more I focused on the database and infrastructure factor, the more important it became. It was apparent that if I wanted to come up with a viable algo, the ability to test-run a mass amount of data, as well as introduce new data easily and efficiently, was vital to coming up with something sustainable. The black box formula creation is the engine that creates Popalgo, but without the database feeding it the information needed to create a successful formula, it means nothing. Hence, the more serious I got about this aspect, the more I wanted to get it right, and the more time was spent on it.
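For the curious, the shape of the database is roughly the sketch below (sqlite here as a stand-in; the real thing lives in SQL Server with a lot more columns, and the names are illustrative). The key idea is the daily “as of” snapshot table, so every test can be run with the data that actually existed that morning.

```python
import sqlite3

# A minimal sketch of the kind of tables the black box tests read from.
con = sqlite3.connect("popalgo_sketch.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS game_results (
    game_date     TEXT,
    home_team     TEXT,
    away_team     TEXT,
    home_pts      INTEGER,
    away_pts      INTEGER,
    opening_total REAL,
    closing_total REAL
);

-- One row per team per day: aggregate stats "as of" that morning, so tests can
-- always be run with the data that was actually available at the time.
CREATE TABLE IF NOT EXISTS team_stats_daily (
    as_of_date   TEXT,
    team         TEXT,
    games_played INTEGER,
    pop_40       REAL,   -- possessions per 40
    fga_40       REAL,
    pts_40       REAL,
    PRIMARY KEY (as_of_date, team)
);
""")
con.commit()
```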
Data Extraction Process
I moved everything to SQL Server, brushed up on my coding (I am not great, but nothing here is too complicated as of yet), and then started writing extraction scripts to get the necessary games and CBB aggregate data into the database and split that information into the proper tables. This was a much bigger undertaking than I had originally imagined, and after all was said and done it still required quite a bit of manual manipulation daily/weekly to get it right. It ended up being a bit too much for me, and I constantly fell behind and cut corners to try and “catch up.” This was a big factor in why the second version never really got off the ground and why that year’s algo never produced anything substantially viable.
I also ran into issues with the sources I was pulling the data from. I pulled from multiple sources for the aggregate data, the game results, and the books’ lines, and those different sources led to a pile of problems that took time to resolve: different naming conventions for teams, different number/character formatting, and inconsistencies and changes in the source files themselves. I made A LOT of progress in 3.0 resolving the majority of these problems, but I can still do a better job in the offseason. Ultimately, the solution is to buy the raw game-outcome data and perform the calculations yourself to get the aggregate and other necessary data, rather than relying on other sources to do it for you. That is what I did for the most part this year, but I still have plenty of room for improvement on this front.
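The team-name problem in particular is worth a quick sketch, because it breaks every join if you don’t handle it. The fix is boring: one alias map that every source gets pushed through before anything touches the database (the entries below are just examples, not my actual mapping).

```python
# Different sources spell teams differently ("St. Mary's (CA)" vs "Saint Mary's"),
# which breaks joins between the lines table and the results table.
ALIASES = {
    "st. mary's (ca)": "Saint Mary's",
    "saint mary's": "Saint Mary's",
    "uconn": "Connecticut",
    "connecticut": "Connecticut",
    "miami (fl)": "Miami FL",
    "miami fl": "Miami FL",
}

def normalize_team(raw: str) -> str:
    """Map a source-specific team name to the one canonical name used in the DB."""
    key = raw.strip().lower().replace("&amp;", "&")
    return ALIASES.get(key, raw.strip())

print(normalize_team(" UConn "))   # -> Connecticut
print(normalize_team("Gonzaga"))   # unknown names pass through unchanged
```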
Shutdown of Popalgo version 2.0
At the end of that season, around late January/early February, this endeavor kind of fizzled out due to my inability to put in anywhere close to the time needed to make it happen. My visions of grandeur had me re-releasing Popalgo to the boys just in time for the tournament, but by the time I got around to seriously trying to run the Turing tests in the black box to come up with a formula, it was very apparent that it wasn’t going to produce something worthy enough to inform the boys we were back. Some of the other reasons why Popalgo was not in the cards that year:
Tracking Errors - I was no longer backtracking, which was a major step in the right direction, but I still had an issue with properly tracking how my CBB totals algorithm would have performed had I “gone live.” Because I never really went live, so to speak, I was just running all these tests on the ever-changing data available, and the algorithm was constantly changing to try and produce the best results. It wasn’t until a little too late that I realized that in order to prove it viable, I needed to choose one version of the algo, track those results, and with every tweak or change, never go back and revise the past results, even though I obviously would want to because of the better numbers. This was something I took VERY seriously this year, meticulously tracking the algo results without ever changing past results after the improvements and additions that are happening constantly. Because I never truly went live, as they say, it was near impossible to create and prove a formula that had positive results. At the end of the day, the goal is to bet and make money on these games, so we need to re-create those exact conditions, which I was not doing. I was also pulling opening line data after the fact, which gives you inaccurate results due to the human aspect of actually having to log in and place the bets on the games. So even if I had tracked it properly, it still would have been a tad misleading. I resolved these issues for this year and made it a point to track everything the right way.
Defining Algorithm Success - I knew in my mind what it meant to come up with a viable algorithm that we could use to make money consistently, but I had a real conundrum in translating that into the black box testing formulas to quickly determine the best combination of data needed to come up with one final number.
Let’s think about what success would look like. Reminder: due to the vig, you need at least a 52.4% winning percentage to be profitable; anything below that and you are losing money. Below are hypothetical results for what would be considered a “successful” algorithm over a sample size of 1,000 games.
Conviction Score | # of Games (n=1,000) | Winning % | Base Net Units +/-
0+ (All Games)   | 1,000                | 51%       | -29.0
1+               | 800                  | 52%       | -6.4
2+               | 500                  | 53%       | +6.5
3+               | 150                  | 54%       | +5.1
4+               | 50                   | 55%       | +2.8
Conviction score refers to how confident you are in the outcome. In this scenario, the algorithm would have spit out 50 of the 1,000 games where its final total number differed from the book’s line by more than 4%. These 50 games should ultimately have a better winning percentage than the games where the algorithm came up with a number closer to the book’s total.
At the end of the day, you define success by how much money you make, so winning percentage and the number of units made are obviously the key factors. For the above example, let’s assume we are using Base Net Units instead of Dynamic Net Units (Base Net = the same exact unit size across the board; Dynamic = varying unit size based on the predicted probability of winning vs. losing).
The above is an excellent example of what we would be looking for in a successful algorithm, and one we can build and improve upon, assuming the results hold as the sample size increases. Note: in a college basketball season, there are 362 teams (this season), resulting in roughly 5,500 D1 vs D1 matchups.
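For reference, the Base Net Units column above is just standard -110 pricing: risk 1.1 units to win 1.0, which is also where the 52.4% break-even number comes from (1.1 / 2.1 ≈ 0.524). A quick sketch that reproduces the column:

```python
# Reproducing the "Base Net Units" column above: every bet risks 1.1 units to
# win 1.0 (standard -110 juice).
def base_net_units(n_games: int, win_pct: float, risk: float = 1.1, to_win: float = 1.0) -> float:
    wins = n_games * win_pct
    losses = n_games - wins
    return wins * to_win - losses * risk

rows = [("0+", 1000, 0.51), ("1+", 800, 0.52), ("2+", 500, 0.53),
        ("3+", 150, 0.54), ("4+", 50, 0.55)]
for conviction, n, pct in rows:
    print(f"{conviction}: {base_net_units(n, pct):+.1f} units")
# 0+: -29.0, 1+: -6.4, 2+: +6.5, 3+: +5.1, 4+: +2.8 (rounding)
```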
This may seem like a simple enough chart to define success, but it is not so easy to plug into the calculations when trying to find the right balance of statistics in the black box test scenarios, and it gets way more complicated. I am going to go more into this concept below in 3.0, but I wanted to show what I was ultimately looking for when trying to find a base algorithm to build off of and improve. Unfortunately, at this point in the process, I was nowhere near replicating anything close to this chart with a sufficient sample size, hence the version 2 failure, which ultimately led to the second shutdown.
Also, don’t even get me started on when I started manually tracking how I was doing when it came to predicting line movements. It was all over the place and even more definitive proof that this version wasn’t going to go-live anytime soon.
Time & Effort - The more I got into the database and infrastructure project in December of 2022, the better I understood how much time and effort actually needs to go into this. This was the same exact realization I had with the first version 11 months prior, and even though I thought I knew it, I still grossly underestimated what’s needed. I was doing this on the side, in whatever free time I had, on top of a job and raising a 1-year-old. At some point it became a tad disheartening the more I realized what it would actually take to get it right, and then it kind of became a self-fulfilling prophecy: I knew I wouldn’t be able to put in the time needed in the time-frame for that season, which hurt my morale in continuing the project, knowing I couldn’t put in the time and effort truly needed. This aspect was the biggest difference in why I believe we got off the ground this year with 3.0.
Popalgo 3.0 - November 2023
Everything else was kind of a joke up until this point. It was necessary for me to understand what needs to go into it, and it allowed me to set up somewhat of a framework for housing the data needed for testing in order to create the algorithm, but the amount of work put in over the first 2 years was exceeded in about 2 weeks, and then I just kept on going.
What is Popalgo? - The Black Box
Let’s pretend we are locked in a room and only have Pace of Play and PPG (which is actually points per 40 min; the overtime calculation is added at the end, but let’s not worry about that for now). Also, there are only 4 or 5 teams to make it easier, and we are 2 months into the season (roughly December 10th, each team having played about 7 or 8 games).
How can we weight Pace of Play in a manner that would have brought us the best over/under total results against the opening line over the past 2 months? That is our best guess at how we should weight it moving forward.
Obviously, even with more teams and a greater sample size, you are putting a lot of faith in this one stat, which is a recipe for disaster. By Dec 10th, with 360+ teams, you are looking at an in-year sample size of roughly n=1,400, and you can almost guarantee all of these numbers are hovering around 50% regardless of how granular you get with the weight calculations. You can kind of see where we are heading, though, and it can get very spicy very quickly.
Let’s add one more stat for these teams, FGA per 40.
Okay, so this is the gist of what makes up the algo and what the mission statement would be. It would obviously not be static; it would adjust with new data and larger sample sizes to ensure it stays fluid and gets us the best possible results. You want to make sure that whatever formula you create has the ability to keep up with the data and be dynamic rather than static. This is very important, as you can imagine.
Now, at this point you just keep building off whatever algorithm you have by adding different elements like other basic stats, trends, groupings, historical data, etc., which I will get into below, but let’s address the elephant in the room before we move on. Obviously, using overall winning % as the end-all/be-all for weighted Pace of Play & weighted FGA_per_40 is flawed, and we can’t be using that as the benchmark.
Why is it flawed? Let’s look at the below example of just Pace of Play (how many possessions per team per game), but with the conviction score added. If you are going to keep adding stats and other attributes, you want to make sure you are adding them to an algorithm that works, not something that is broken (below, blue is an example of the former and orange an example of the latter, even though the overall win % is very different).
Obviously the end-goal is to make as much money as possible so I need to convert this into one number, but let’s hold off on that for now because this is one stat and we need to build off of it and keep building. So with this example above, even though 53.9% win percentage is the highest on the list of weighted Pace of Play, the numbers show that as the conviction score grows, the win % actually decreases. This tells us that the algorithm is broken and isn’t really worth building off.
The blue numbers above, on the other hand, show an upward trend, with the higher conviction scores having a better win%. This is ultimately what we want to build off of, because it means we are moving in the right direction with the algorithm and we can be confident in using this weighted pace of play data point as long as it maintains this trend as the sample size grows.
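The blue-vs-orange check is easy to automate: bucket the graded plays by conviction score and make sure the win% actually climbs with conviction. A minimal sketch (made-up plays, not real results):

```python
# Bucket graded plays by conviction score and check that win% climbs with it.
plays = [
    {"conviction": 0.5, "won": True}, {"conviction": 1.2, "won": False},
    {"conviction": 2.4, "won": True}, {"conviction": 3.1, "won": True},
    {"conviction": 4.6, "won": True}, {"conviction": 0.8, "won": False},
]

def win_pct_by_bucket(plays, thresholds=(0, 1, 2, 3, 4)):
    """Win% for all plays at or above each conviction threshold (cumulative buckets)."""
    out = {}
    for t in thresholds:
        bucket = [p for p in plays if p["conviction"] >= t]
        if bucket:
            out[f"{t}+"] = sum(p["won"] for p in bucket) / len(bucket)
    return out

buckets = win_pct_by_bucket(plays)
print(buckets)

# "Blue" behavior: win% is non-decreasing as conviction rises -> worth building on.
# "Orange" behavior: it drops -> the weighting is broken, even if overall win% looks good.
is_blue = all(a <= b for a, b in zip(buckets.values(), list(buckets.values())[1:]))
print("worth building on:", is_blue)
```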
Okay, now that we have one stat down, we can start adding other attributes to the algorithm. Keep in mind, this is built in a way that constantly updates and changes with new data being added. This is the stage we want to be at when we start adding other data like FGA_40 or 3PA_40 or FTA_per_40 or Offensive_Efficiency or DefPPG_40 or Defensive_FGA_40 or Def_3PA_40 or Defensive_Efficiency, etc... These are what I call the Base Stats. Stats that are being accumulated and tested over the course of the CURRENT season.
Why are we using avg +/- standard deviation? The short answer is that this is how we started and it’s easy to grasp. It’s not ideal to use avg +/- std, but it’s a fine place to start. As we add more to it, it...
Base Stats
Just like the stats above, Pace of Play (possessions per 40) and FGA per 40, Base Stats refer to the stats of the current year. For this scenario we are going to use December 10th as the go-live date, meaning the Base Stats are the aggregate stats per team for the 8-10 games played from the start of the season through December 10th.
Yes, this is a tad hypocritical given I had a whole section about backtracking, but you need to start the season off somewhere. I took the aggregate base stats (above) per team from the start of the season through Dec 10 (7-8 games per team) and ran the tests seen above to come up with an algo we can build off of. From here on out, the data changes and we are no longer backtracking but tracking daily, with dynamic numbers that update as the sample size increases and the data grows. Theoretically, this means the algo should improve as time goes on and the sample size increases. Now Base_Stats are in.
Now, we need to keep going. The next dataset I added was Trends (TR). All the infrastructure work and setup allowed me to add these additional stats relatively easily, which lets us consistently improve the algo without overhauling everything. This is low-key what I consider the most valuable aspect of Popalgo and why it was so important to put in the time and effort up front on the tech infrastructure, the extract scripts, the database setup, and the black box testing processes. Without the ability to keep improving the algo efficiently, there is zero shot I am writing this right now. The first two years, I was nowhere close to where I was at this point in mid-to-late November of last year (2023). There is still a ton of room for improvement and quite a few inefficiencies, but the strides made on this front from late October to early December were massive. Make no mistake, without the TMT fund result, the amount of time I was spending on this in that time-frame was borderline absurd, but I truly don’t think there’s any other way if you want to actually do this right and compete.
Trends
Just like we kept adding base stats after Pace of Play and FGA per 40, we can also add whole datasets the same way with the same goal: incrementally increase the KPIs by adding more data/tests. Trends were the 2nd major dataset group I added. (This is a little revisionist history, as I was also manually running historical numbers to make sure I wasn’t hitting flukes, and I did incorporate historical/macro datasets into the algo as well; see below.)
I take all the Base Stats (BS), plug them into the Trends dataset (TR), and slightly improve the output. Rinse, wash, repeat. With the trends, I simplified things by consolidating a 3-5-7 trend (last 3 games, last 5 games, last 7 games) and plugging that into the Base_Stats to get more granular. Obviously, at the beginning the trend data is not that different from the base data due to the small sample size, but as the season goes on, the trends take the base stats from above and improve them because of the focused granularity. So now we are running the 1_3_5 game algo into the Base_Stats, as well as the 3_5_7 algo, which has proven to increase the numbers across the board. Another successful dataset addition. Now that we have a system to add datasets and track them properly, let’s keep going. At this point, it’s December 8th, 2023, you have a system down, and Coach is bothering you non-stop for picks.
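A rough sketch of the 3-5-7 idea, with placeholder blend weights (the real weights come out of the black box tests, not this snippet):

```python
# Rolling 3/5/7 game averages blended with the season-long base stat.
def rolling_trend(values, windows=(3, 5, 7)):
    """Return {window: average of the last `window` values} for one stat."""
    return {w: sum(values[-w:]) / min(w, len(values)) for w in windows if values}

def blended_stat(season_avg, game_values, trend_weight=0.35):
    """Blend the season-long average with an equal-weight mix of the 3/5/7 trends."""
    trends = rolling_trend(game_values)
    trend_avg = sum(trends.values()) / len(trends)
    return (1 - trend_weight) * season_avg + trend_weight * trend_avg

fga_40_by_game = [58, 61, 55, 63, 66, 60, 64, 59]  # a team's FGA per 40, game by game
print(blended_stat(sum(fga_40_by_game) / len(fga_40_by_game), fga_40_by_game))
```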
Groupings
Taking individual data-points, and grouping them in a manner to allow for more accurate/granular testing and results. Let’s use the below example for one of the groupings.
Below is a simplified example of a final number (a grouping in and of itself) for a projected offensive and defensive datapoint. When you break the teams down into the 4 groups below, it allows us to re-run the above and look at different matchups between the quadrants in an attempt to slightly improve the algo: when a Q1 team plays another Q1 team, how does that differ from a Q1 playing a Q4 team, for example? As you can imagine, there are a million different combinations you can use to group teams to make the testing easier and more efficient.
We also KenPomed the teams into 4 different groups so we can see the differences between high-rated teams playing each other versus low-rated teams, and every mix & match you can think of. Similarly, I added a timeline grouping, non-conference play vs conference play, and attempted to add tournament play (though that failed miserably - see below).
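A sketch of the quadrant idea: rank the teams on some rating, cut them into 4 tiers, and label every matchup by its tier pair so results can be graded per bucket (Q1 vs Q1, Q1 vs Q4, and so on). The ratings and team list below are made up.

```python
# Cut teams into quartiles by a KenPom-style rating and label each matchup.
ratings = {"Duke": 28.5, "Purdue": 30.1, "Wofford": 2.3, "Rider": -4.0,
           "Gonzaga": 24.9, "Akron": 6.7, "Drake": 10.2, "NJIT": -8.8}

def quartile_labels(ratings, n_groups=4):
    """Assign Q1 (best) .. Q4 (worst) by rating rank."""
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    size = -(-len(ranked) // n_groups)  # ceiling division
    return {team: f"Q{i // size + 1}" for i, team in enumerate(ranked)}

tiers = quartile_labels(ratings)

def matchup_bucket(team_a, team_b):
    """Order-independent bucket label, e.g. 'Q1_vs_Q4'."""
    qa, qb = sorted([tiers[team_a], tiers[team_b]])
    return f"{qa}_vs_{qb}"

print(matchup_bucket("Duke", "NJIT"))    # Q1_vs_Q4
print(matchup_bucket("Akron", "Drake"))  # Q2_vs_Q3
```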
Ultimately, there were a few grouping additions that were tested and added to the algo, building upon what we had already created. You plug these tests into the base + trends above, and then you can even group on top of that by combining the different groupings. You can see how it gets out of hand quickly, but being diligent and documenting everything allows you to get way more granular and get much better results.
What is described above in the Base_Stats & Trends sections is ultimately grouping at the data and algo levels. This is about taking that mentality and applying it to individual teams, specific matchups, time of year, etc. Yes, technically a lot of this is supposed to be taken into account in the tests already, but there is not enough direction and focus in those tests, so this way gets you slightly better results. Building upon the algo to consistently get slightly better results is the entire strategy. Grouping datapoints like this allows us to do that, and it ultimately played a big role in the leap from 54% to 55.5%.
Groupings (GR) are ultimately a stepping stone and a shortcut toward where we want to end up. Ideally, you want one number that captures how good, efficient, fast, etc. a team is against a specific opponent that has its own single number for those stats, adjusted for whether that game is being played on December 10th as opposed to February 28th. Grouping different data points lets you get a little more accurate and acts as a bridge until you ultimately find the absolute best correlation numbers that produce the top results. There are a lot of things this allowed us to incorporate this year, and the more we keep chipping away and adding granularity, the more we can convert these datasets into regular base_stats individual numbers, resulting in absolute top-tier results. (i.e., theoretically 8 groups should be more accurate and give you better results than 4, and that holds for all datasets and calculations as well)
We are adding datasets and running countless tests to determine the best way to weight everything to give us the best results. These tests also use new information, so they theoretically should be improving at the individual level, which allows us to improve at the top level as well. There are two more major datasets, like the three above, that play a role in the calculation. Note: some of these overlap and it doesn’t look nearly as pretty as above, because it’s all over the place in the SQL database; I am just trying to use the pictures and diagrams to make it easy to understand everything that goes into the algo. Also, a lot of these don’t come into play right away at the soft go-live on December 9th, but gradually get included throughout the season.
Macro & Historical Data
These are two datasets that are kind of off to the side, incorporated at the end or manually checked against. I need to do a better job of getting both of them integrated with the other data tests, and my inability to do that this season is what I believe played the biggest factor in the downfall of March, which is explained in detail later in this doc.
Macro refers to overall-game-level adjustments. Ultimately, these need to be included in the other datasets, but I took a shortcut and used this bucket to make up for some missing elements. One example is overtime. College games go to overtime around 5.5% of the time (0.6% double OT), and that number increases the closer the line of the game is (i.e., a -3 game has a higher chance of going to OT than a -12 game). The average overtime produces about 18.5 points, so all of this is calculated after the fact. It’s in the macro section because I calculated it from overall averages rather than individual teams, which is ultimately where it needs to live. This information needs to be moved into base stats and calculated by team rather than as a blanket number; for example, a Virginia game going to OT is going to produce far fewer points than a Bama game going to OT. I started doing this a little bit towards the end, but it wasn’t nearly up to snuff.
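As a rough sketch, the blanket OT adjustment is just an expected-value bump: roughly 5.5% of games times ~18.5 extra points is about a point added to the projected total, a bit more for coin-flip spreads. The close-game multiplier below is a placeholder, not the tested number.

```python
# Expected points added to a projected total from the chance of overtime.
BASE_OT_RATE = 0.055   # ~5.5% of games go to OT
AVG_OT_POINTS = 18.5   # average points added by an OT period

def expected_ot_points(spread: float) -> float:
    """Expected points added by potential OT, nudging the rate up for closer spreads."""
    ot_rate = BASE_OT_RATE * (1.5 if abs(spread) <= 3 else 1.0)  # placeholder bump
    return ot_rate * AVG_OT_POINTS

print(expected_ot_points(-3))   # ~1.5 points for a coin-flip game
print(expected_ot_points(-12))  # ~1.0 point for a double-digit spread
```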
Another category in the Macro dataset is Home/Away splits. I incorporated this a little bit, but not nearly individualized enough per team/per conference or even per individualized base_stats like FTA per game which is one stat that I can see having the biggest difference between home and away games.
The last part of the Macro dataset, which ultimately led to the Popalgo 3.0 downfall, was handling the different phases of the season. The big ones are the conference tournament games, obviously the big tournament games, and, from what we learned this week, the 10 days or so leading up to the conference tournaments. These ended up being the downfall. I absolutely underestimated and miscalculated this portion of the lab work, and that will be the biggest factor we need to address in the offseason. There’s a whole section below about March which covers it in more detail, but this offers a great segue, because it is tied to the historical dataset, which I failed to properly incorporate into the algorithm, and boy did it come back to haunt me come March.
Historical -
December 9th, 2023 - Soft Go-Live
What is a go-live and why is it important? When you are running all these tests, the algorithm and its results are constantly changing, and because of that you are constantly improving the past numbers. That is severely misleading, because you are constantly re-grading yourself against results you have already optimized for.
This is charlatan stuff unless we can see results. Anybody can just say shit, but here are the results for all games Dec. 9th* through Feb 29th (n=3,447, roughly ⅔ of the entire CBB season).
(Results figure: broken out by dataset layer - Base/Trends, Groupings, Historical, Macro.)
The framework of what needed to be done tech/data-wise was there, but I still grossly underestimated the time and effort needed to actually do it right. Looking back, it is more embarrassing how much of a joke I thought the OG Popalgo was at this time, when in reality this iteration was much more of a joke. The first one was kind of a bit, for fun; this one I took a little seriously, thought through, and then ended up putting together a joke of a run. It is much closer to the first try than the 3rd, and it’s not even close.
Overview - How 3.0 Works
- Massive game of guess n check. The concept is simple: how do you weight stats to determine how many total points will be scored? Where the guess n check comes into play is in testing and weighing different statistical combinations to see if you can come up with one number that is proven to be “better” than the number the book comes up with.
The Basics
- How to play guess n check with computers and compete with the books?
- This is exactly what the books are doing, but they have way more money and way smarter people to do it; that’s why I started with a small niche. You can’t really compete unless you get lucky, stay at it, and consistently have the ability to improve.
- Run best-case scenario tests. You can do this in Excel, and there are some ad hoc things I run in Excel, but Excel is not a database, so I usually run these in SQL Server (an expense, btw). In order to run GnC tests or best-case tests, you need to have a database of results and stats on certain days.
- i.e., just an example: let’s say FGA_per_20 is weighted 0.025x and I want to change it to 0.03x (this is a very simplified version of the algo). In order to test whether that is a “good” change, I need to “run” that formula ON the specific day to see how it affects the results.
- Why? FGA_per_20 by team changes with every game, AND FGA_per_20 is involved in trends, groupings, matchups, and macro (this will be explained below). So if you want to know how the algo would have handled results on December 10th, the algo needs to be run as of December 9th, using the data from Dec. 9th and before.
- This means you need an organized database with team stats, game results, etc. (It actually boils down to just needing to extract the game results with the stats of that game; from there you can calculate everything else - see the sketch after this list. That is where I’m trying to get to, but I’m not there yet, and I still use a few tables that combine results and aggregate team stats for me. This is a goal for 4.0: improve the database.)
- This means extraction scripts pulling from online sources, which is no easy task and took a while to get right. I started with 2.0 and built off it, but basically you need to pull results and put the data in the right tables, and you also need to pull lines (opening, closing), so it matters when you run it.
- For 3.0 I kept it a 15 minute process to pull opening lines on purpose - Can’t make it too easy
- I am attempting to move from SQL Server to Python - Hoping to get this done in the summer
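Tying the bullets above together, here’s a toy version of how a weight tweak gets graded: same slate of games, same as-of-date snapshot, old weight vs new weight. The snapshot values, the formula, and the weights below are all made up for illustration.

```python
# Grade a weight change for games played on Dec 10th using ONLY the stat
# snapshot from Dec 9th, not today's season-to-date numbers.
snapshots = {
    # (as_of_date, team) -> stats available that morning
    ("2023-12-09", "Duke"):   {"fga_20": 29.4, "pts_20": 38.1},
    ("2023-12-09", "Baylor"): {"fga_20": 27.8, "pts_20": 36.5},
}
games_dec_10 = [{"home": "Duke", "away": "Baylor", "opening_total": 148.5, "final_total": 152}]

def project_total(home, away, as_of, w_fga, w_pts):
    # Toy formula: weight the per-20 sums and scale to a full-game number.
    h, a = snapshots[(as_of, home)], snapshots[(as_of, away)]
    return (w_fga * (h["fga_20"] + a["fga_20"]) + w_pts * (h["pts_20"] + a["pts_20"])) * 2

def grade(games, as_of, w_fga, w_pts):
    wins = 0
    for g in games:
        ours = project_total(g["home"], g["away"], as_of, w_fga, w_pts)
        bet_over = ours > g["opening_total"]
        wins += bet_over == (g["final_total"] > g["opening_total"])
    return wins / len(games)

# Compare the current weight against the proposed tweak on the same slate.
print("old:", grade(games_dec_10, "2023-12-09", w_fga=0.025, w_pts=1.0))
print("new:", grade(games_dec_10, "2023-12-09", w_fga=0.030, w_pts=1.0))
```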
Infrastructure
- Infrastructure - The biggest thing I learned is that if you are going to do this right, you need to set yourself up for success when it comes to consistent improvement. You’re not going to magically come up with a formula that beats the books, but you may be able to find a niche and stay a little bit ahead of them as long as you keep at it and consistently improve. The ability to improve and adapt starts with the infrastructure. The 1st year was a joke; the 2nd year fell apart because I didn’t spend enough time, and one of the reasons was that it took WAY too long to implement changes. Optimization tests were clunky and I just couldn’t keep up.
- The Build Out - Notre Dame Stadium ‘97 Addition - We added 20k seats from ‘95-‘97 without having to move games (i.e., no Bears-in-Champaign situation) and did it all for under 60 mil. I’m not saying the architect had the foresight, but a major part of this was that he created a stadium that was relatively easy to build upon without disrupting the core. The ability to build on the algo without going crazy or spending every waking hour is essential, and it’s probably the facet of Popalgo I am most proud of, even though I am not close to where I think I can be (4.0?). Yes, more proud of this aspect than I am of the actual results from Dec 9-Feb 29. Results don’t matter.
- Let’s keep using FGA_per_20 and results (Pts_per_20) as examples. Right now, those are the only two specific stats I’ve mentioned. It’s much easier to think about it in a way where you start small and build from there. If you just had these two and ran a shitload of tests to be able to weight FGA_per_20 vs Pts_per_20 slightly better than the books, you are still going to lose money, no question. Realistically, you may be able to move it from 50.0% to 50.1% over the course of a season, and that 0.1% is probably a stretch. You would also need to get in at the opening line and be super diligent, etc. Basically, you won’t be able to get to 52.5% (realistically you need around 54-55% to make it sustainable).
- Luckily, those are just 2 data points. You need to build off this and keep improving the algorithm, GnC, one stat at a time (you can’t really go one at a time because there’s too much, but it’s easy to think about it like that). So for our example, let’s say we were also given 3PA_per_20 and 3PM_per_20. Now we can run GnC best-fit tests using all 4 of these data points to get a slightly better number against the books in both win/loss % and Line_Movements. You just rinse, wash, repeat, but you need to make sure you create a system to easily add, test, and modify the algo based on the different data additions and parameters set. As you can imagine, it gets very hairy and complicated very quickly, and THAT is where you can get the +4-5% needed to make it sustainable.
- The Weeds & The Spiderverse - The ability to build and improve is held in such high regard because pretty soon you will start getting to the weeds. Tyler asked me one time if the boys could do this stuff, and the answer is unequivocally yes. You would all come to the same conclusions and get here eventually. There is very little I am doing that is math-y: some things surrounding averages and certain standard deviation samples, mostly dealing with outliers, that I had to hop on Reddit for math help with, but for the most part it’s nothing special, and you would all get here eventually if you had to. What I have going for me is the tech part, but that is a huge part of it, as you can imagine. Basically, as you build out different stats and run best-case tests with every one of them, you will eventually come to the weeds, and this is where you separate the men from the boys, as the boomers would say.
- Let’s continue with our examples and add the hypothetical that it is December 8th, 2023. So we are weighing teams’ Field Goal Attempts, Points, 3PA, 3PM, and then let’s add any stat you can think of. Keep in mind it’s 12/8 and every team has racked up 7-8 games at this point (I found a way to only track D1 vs D1 games). You can also add Possessions_per_20, which is basically Pace of Play (Popalgo!), and then maybe Efficiency, FTA/FTM, and of course the defensive equivalents of all these stats, and now you have about 12 data points per team collected over the course of 8 games. Keep in mind, these are non-conference games with a relatively wide range of matchups (i.e., Duke vs Abilene Christian). Okay, now what? You basically hit a point of diminishing returns where the stats aren’t helping, and no matter how many best-case scenarios you run including all of them, it doesn’t move the needle or give you an edge in determining future results. How do you take these basic stats and make them more powerful, while also adding the other data available, in order to come up with the best possible number? This is what I call getting into the weeds: it takes the data available, adds historical data, and then takes it to a whole new level to come up with a number to beat the books. THIS part is really what separates the men from the boys. The last one was more the babies from the toddlers.
- Groupings
- Matchups
- Trends
- Macro
- Historical
- The Spiderverse - The above sections are listed out, but it’s not that simple, and the more you think about it, the more everything is interconnected.
- For example, let’s just take FGA_per_20 as our ONE data point. You can run that in trends (3_5_7, 1_2_3, 5_10_20, etc.), and you can run that in groupings (Tier1_Tier3, slow_fast, T1slow_T2_Med, etc.). Historical is connected to groupings with like-minded teams in the past, trends can be tied to groupings, trends can be tied to matchups, historical, etc.
- All of these different pods can be intertwined as they are not mutually exclusive. This just makes things so much more complicated to track and test, but is absolutely necessary in trying to beat the books because you know this is the type of shit they are doing, but I promise you they are much more buttoned up than I am.
- Track & Feel - In order to make this sustainable and keep up with the books, you need to constantly track and improve. The algorithm on December 9th is WAY different than the algorithm on March 10th. Now, in order to keep a sane head, keep it sustainable, and re-create results for 4.0, you need to track religiously. For example, the main algo from Feb 10-Feb 28 (n=850) would have produced slightly less successful results if run from Dec 9-Jan 9 (n=700). That means if I ran this again, I need to either figure out how to tie the two together to maximize results, or I need to introduce the changes throughout the year and use the timeline as a basis. This requires a meticulous level of tracking and planning, which I knew beforehand.
- Changes
- Tweaks vs Adjustments
- Tweaks/Additions
- Major Adjustments - Jan 8th, Feb 9th, Mar 3rd
- Timeline Tracking
- Closing Line vs Final Score - In the movie The Imitation Game, BC is doing the exact same thing I am: trying to create an algorithm, in his case to decipher the Nazi code. In his defense, he was literally starting from scratch. Definitely a Turing point in history when it comes to computers and technology. Anyway, there’s a scene where a switchboard girl slips to BC and co., at a dance no less, that the German switchboard girl on the other end signs off all her messages with L-O-V-E. This information led to Knightley and crew sprinting to the lab and breaking the code within like 5 minutes. What does this have to do with P3.0? I was running a lot of the above tests trying out different weights, and I was weighting the final score very heavily overall, and 100% in A LOT of the calculations. Just like BC/KK used the L-O-V-E information to seriously cut down on the number of combinations the computer had to test, aka “Guess and Check,” adding a closing line number expedited the process and made things far easier. The std dev of final scores is about 10.5, but the std dev of the closing number is a much tighter 4.5. This allowed us to expedite the process, making it easier to get to the Mendoza number. “You don’t have to be faster than the bear, just faster than your hiking buddy.”
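A toy illustration (made-up numbers) of why the closing line is such a useful target: the same projection’s error is way noisier against final scores than against closing numbers, so candidate weightings separate from each other much faster when you grade them against the close, while final-score results still get tracked since that’s what actually pays.

```python
import statistics

# Compare one projection's error against the closing line vs the final score.
games = [
    {"our_total": 145.2, "closing_total": 146.5, "final_total": 139},
    {"our_total": 152.0, "closing_total": 149.5, "final_total": 160},
    {"our_total": 137.8, "closing_total": 138.0, "final_total": 131},
    {"our_total": 160.4, "closing_total": 157.5, "final_total": 171},
]

def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

err_vs_close = [g["our_total"] - g["closing_total"] for g in games]
err_vs_final = [g["our_total"] - g["final_total"] for g in games]

# Grading candidates on the tighter target (closing line) cuts down the
# guess-and-check space; final-score results still get tracked, since that's
# what actually pays.
print("rmse vs close:", rmse(err_vs_close))
print("rmse vs final:", rmse(err_vs_final))
print("spread of finals around close:", statistics.pstdev(
    g["final_total"] - g["closing_total"] for g in games))
```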
Observations
- Results Don’t Matter - This is really the only “strategy,” and it’s more to stroke my ego and feel like I contributed something with my sports knowledge. In reality, it’s all about weighting individual stats; this was a direction I started out in, and it ended up working to some extent, hence I kept building on it. Obviously, the “strategy” portion can only take you so far, as you will find out in the “What Went Wrong: Late Feb-Early March” section.
- Above, we used the FGA_per_20 example and weighted it out. In reality, that is a major stat used, and one of the things Popalgo does differently than the books is the TMT mantra of Results Don’t Matter: Popalgo weights attempts more than actual results. The idea is that there are 362 teams and about 3,000 players that get meaningful minutes, and they’re all basically the same. If a team scores 80 pts_per_20 (+1 stdev) on 28 fga_per_20 (-1 stdev), the algo is going to lean toward that team regressing to the mean, so to speak. Now, it’s difficult to isolate this exact strategy as a key difference, but I did spend a week one time trying to see how to get slightly closer to the opening line, and lowering the importance of attempts (which in turn raises the importance of results) got me closer to the opening line, but also lowered the overall win% across the board along with the line movement predictions.
- Results Don’t Matter _ Part 2
- This is
Results
- Dec 9th - Jan 23rd
- The Fund
- Why start the fund - Data
- The Results
- The Good
- The Bad
Today's Plays (1/25/2024)
Tomorrow's Plays (1/26/2024)