Description
Key Learnings
- Learn the basics of large language models and how they can generate new data for AEC.
- Learn how procedural design techniques work with tile-based design systems.
- Learn how optimization and LLMs can be coupled to generate new building designs.
- Learn about intuitive, no-code design exploration and generative techniques.
Speakers
- Adam Gaier: Adam Gaier is a research scientist at the Autodesk AI Lab, where he pursues research at the intersection of evolutionary computation and machine learning. He received master's degrees in evolutionary computing and robotics, and a PhD focused on tackling expensive design problems through the fusion of machine learning, quality diversity, and neuro-evolution approaches. His work has received recognition at top venues across these fields, including a spotlight talk at NeurIPS (machine learning) and multiple best paper awards at GECCO (evolutionary computation) and AIAA (aerodynamic design optimization). His current research focuses on the use of large language models and evolutionary optimization for architectural design.
- James Stoddart: Jim Stoddart is a Principal Research Scientist in the AEC Industry Futures group within Autodesk Research and a core member of The Living, an Autodesk Studio. His work explores how new design, construction, and material technologies will lead to a better built environment for our planet and its inhabitants. With Industry Futures, Jim has collaborated with a wide range of industry leaders to demonstrate the value of new design technologies—including bio-materials, machine learning, and Generative Design—on real-world applications and built projects. He has authored several peer-reviewed papers, holds patents for key generative design applications, and contributes articles to industry publications. His design projects have been widely published, won multiple AEC design and innovation awards, and have been exhibited at international venues.
- Lorenzo Villaggi: Lorenzo Villaggi is a Principal Research Scientist within the AEC Industry Futures group at Autodesk Research. His work focuses on novel data-driven design approaches, reusable design intelligence, advanced visualization, and sustainability. Recent projects include a net zero carbon and affordable housing development for Factory_OS in California, the NIS engine Factory for Airbus in Hamburg, the Alkmaar Affordable Housing District for Van Wijnen in the Netherlands, the Autodesk Mars Office in Toronto, and the Embodied Computation Lab for Princeton University.
JIM STODDART: Welcome to our session, TileGPT: Generative AI Tools for Intuitive Building Design Exploration. First, the safe harbor statement, just to quickly note that everything we'll be presenting today is a research prototype. None of these features exist on any current or future product roadmap at this time. So please do not make any purchasing decisions based on what we're sharing today.
A quick round of introductions. Adam Gaier is a Principal Research Scientist within the AI Lab at Autodesk Research. My name is Jim Stoddart. I'm a Principal Research Scientist in the AEC Industry Futures group at Autodesk Research. And our colleague, Lorenzo Villaggi, is also a Principal Research Scientist within AEC Industry Futures and will join us for the in-person session at AU.
Today, we're going to show you some ongoing work within Autodesk Research exploring how large language models can be combined with generative design to create novel ways of exploring and interacting with a data-driven design process. The roots of this project lie in a research collaboration with our customer, Factory_OS, a leading manufacturer of volumetric modular housing based in the Bay Area. With them, we investigated how tools like generative design could accelerate their development process, and we looked for opportunities to provide intuitive ways of working with their catalog of modular housing units.
For context, here's a video of a Factory_OS building under construction. These volumetric units are built and fully finished offsite, in Factory_OS's facility in Vallejo, California. Modules are then trucked to a project site. They're lifted and stacked in place. And then, finally, utilities are connected, and the facade and the roof are installed. In the end, you have a complete building in less time than conventional on-site construction.
Our investigation with Factory_OS was exploring the opportunity to do what we called "multi-scale generative design," where a project could be quickly configured. You select a location, a building site, a pro forma, and then automation would generate and explore design options. This included site layout, with multiple buildings and landscape. It would populate each of those buildings with Factory_OS units and generate structural framing. Each of those designs would be evaluated and compared based on competing objectives around cost, carbon, and habitability. And an optimization loop would identify and elevate high-performing solutions as good candidates for construction.
Zooming in, the problem of designing with a unitized construction system might seem like an easy task. It's like stacking LEGOs, after all. But, in reality, these combinatorial problems are really tough to solve. There's a huge number of possible configurations. And there are rules that constrain certain combinations of modules. All of this makes it hard to fit within a typical parametric/generative model. And while you could build a complex model that embeds all these rules, these can quickly become unwieldy, and difficult to maintain, and difficult to generalize to new problems or projects.
So we asked, what if we could make a smart and generalizable system that designs with rules, but that doesn't require any coding or explicit writing of those rules? We call this idea "example-based design." The designer just has to provide a catalog of parts and create some examples of how those should fit together, functionally and formally. And the rest is handled by automation, which can digest each example, extract the inherent rules, and then use those to generate new combinations based on those learned constraints. The output is a diverse set of design permutations.
So how does this work? We're using an algorithm called "wave function collapse," or WFC for short. It's a procedural content generation method that was published in 2016 and that largely adapts earlier work developed by Paul Merrell in 2009, called "model synthesis." It takes a piece of constrained data as an input; here, a small pixel image of a branching plant with flowers. It extracts the local adjacency rules and synthesizes new outputs that are locally similar to the original.
One advantage here is that this process works with any form of constrained data. You can find examples in one dimension, where people have generated music or poetry; lots of examples in two dimensions for image and texture generation; and in 3D. You can even use it on structured, non-grid-based data like meshes. This is the basic pipeline for a WFC solver. You have two inputs, a desired output size and one or more examples. The solver learns the patterns, and it will generate an output of the desired size that obeys the learned constraints.
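To make the rule-learning step concrete, here is a minimal sketch in Python of how adjacency rules can be extracted from a single example grid. The tile names and the tiny grid are invented for illustration; this is not the presenters' implementation.

```python
# A minimal sketch (not the actual solver) of learning local adjacency rules
# from one example grid, as a WFC-style method does.
from collections import defaultdict

example = [
    ["grass", "grass",  "stem", "grass"],
    ["grass", "flower", "stem", "flower"],
    ["grass", "grass",  "stem", "grass"],
]

DIRECTIONS = {"right": (0, 1), "down": (1, 0)}  # left/up are the mirrored rules

def extract_rules(grid):
    """Return {(tile, direction): set of neighbor tiles ever seen in the example}."""
    rules = defaultdict(set)
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for name, (dr, dc) in DIRECTIONS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    rules[(grid[r][c], name)].add(grid[nr][nc])
    return rules

rules = extract_rules(example)
print(rules[("stem", "down")])  # tiles allowed directly below a "stem" tile
```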
To use this with 3D modular building data, we just need to encode the design into a compatible input format. Here, we're capturing the module ID, its position in the design, and any transformation, like a rotation or mirror. On the output side, a decoding step translates the output data back into geometry. And we can also do things like compute metrics directly from the raw data, without the need for geometry or simulation. Lastly, to make this useful within a generative framework, we also need to build in controls.
For this, we've included variable weights that can increase or decrease the frequency of a tile appearing and pre-constraints, where we can have the solver start from some known configuration to induce the formation of a specific geometric feature. Here, you can see it creates a staggered configuration on one end of the building.
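As a rough sketch of what that encoding and those controls might look like, here is a hypothetical data structure; the field names and catalog IDs are illustrative only, not the actual schema.

```python
# A hypothetical encoding of one module placement plus the solver controls
# described above (tile weights and pre-constraints). Names are illustrative.
from dataclasses import dataclass

@dataclass
class ModulePlacement:
    module_id: str           # catalog ID, e.g. "FOS-STUDIO-A" (made up)
    position: tuple           # (x, y, z) cell indices in the building grid
    rotation_deg: int = 0     # 0 / 90 / 180 / 270
    mirrored: bool = False

# Control inputs to the solver:
tile_weights = {"FOS-STUDIO-A": 1.0, "FOS-2BR-B": 2.5}   # bias how often a tile appears
pre_constraints = [                                       # fixed cells that seed a feature,
    ModulePlacement("FOS-STUDIO-A", (0, 0, 3)),           # e.g. a staggered end condition
    ModulePlacement("FOS-STUDIO-A", (0, 1, 4)),
]
```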
So why would we use this process instead of a conventional parametric model? Mainly because it's intuitive. All you need is a single example design: no code, no node graphs. And, from that, we can automatically generate each of these variations. We can also take it further by inputting multiple examples, and the system will generate hybridized options that combine features into novel arrangements.
We can also combine this with a unit catalog of pre-computed values to rapidly compute metrics from a design without the need for expensive simulation. Here, we get floor plans, cost, pro forma, and embodied carbon instantly as the model updates. As part of our work with Factory_OS, we exposed this functionality in Forma via an extension, which will allow them to quickly evaluate new project developments with buildings generated from their catalog. For more info on this work, please check out our other class, From Prototype to Platform: Delivering New Design Capabilities on Autodesk Forma.
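Because every catalog unit carries pre-computed values, a design metric is just an aggregation over the decoded design. A minimal sketch, with placeholder numbers rather than real Factory_OS data:

```python
# Computing design metrics from a catalog of pre-computed per-unit values,
# with no geometry or simulation. All values here are placeholders.
catalog = {
    "studio": {"cost_usd": 95_000,  "embodied_co2_kg": 14_000, "rent_usd_mo": 1_800},
    "one_br": {"cost_usd": 130_000, "embodied_co2_kg": 19_000, "rent_usd_mo": 2_400},
}

design = ["studio"] * 24 + ["one_br"] * 16   # unit list decoded from the solver output

totals = {
    metric: sum(catalog[unit][metric] for unit in design)
    for metric in ("cost_usd", "embodied_co2_kg", "rent_usd_mo")
}
print(totals)  # updates instantly as the design changes
```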
This idea of no-code, example-based design was pretty exciting. And it led us to ask: why couldn't we take the intuitive control of this system and apply it to more complex design problems that are even harder to represent with a conventional generative model? We wanted to know whether we could go from a building layout, which is a fairly straightforward 2.5D problem, to a much more complicated full-site design that included site layouts with multiple varied buildings and landscape design.
So why design at the level of the site? Well, we know that sustainable design benefits from a holistic approach, not just designing each building independently. Almost 40% of the carbon in a new development exists outside of the building envelope. And factors like the arrangement of buildings on a site, through effects like self-shading or impact on microclimate, can significantly impact the operational energy of individual buildings.
So we developed a new tile set that included new building tiles that could create more varied apartments and building layouts, while still being compliant with volumetric modularization, and new green space tiles that could generate a variety of landscape profiles and depths for supporting larger carbon sequestering plantings. Here, we see some example sites being generated by the wave function collapse solver and some of the variation possible within the new tile set.
But more design flexibility comes at the cost of increased complexity. We went from the eight to ten tiles of the building design space to 38 base tiles. And to add to that, each tile type can have up to eight variants through mirror and rotation transformations. This results in 212 possible states for every position in our design solutions, each with its own adjacency rules and control weights. This makes for a massive number of potential configurations and a huge design space to search through for optimal design candidates.
While the WFC solver can still handle this complexity, you start to see much longer compute times. And because WFC operates on local conditions, the burden of interpreting or defining large-scale patterns, like the shapes of building footprints, still falls mostly on the user. Tasks like manually fine-tuning generated solutions for improved performance are nearly impossible with this many inputs.
So this led us to ask: could we use AI to solve this challenge and make navigating potential solutions and their performance trade-offs much more intuitive? So I'll pass it off, now, to Adam, to dive into our prototype solution, which we call TileGPT.
ADAM GAIER: Thanks, Jim. So our partnership with Factory_OS is not just about improving efficiencies. It's about reimagining the design process, itself. We've taken disparate strands of research and integrated them into a powerful new hybrid approach to generative design. This approach combines the strengths of example-based procedural design, diversity-based optimization, and large language models. Through example-based procedural design like wave function collapse, we generate designs that strictly adhere to architectural constraints.
Using diversity-based optimization in combination with WFC, we can explore a vast landscape of design possibilities, with generative design tools capturing the best of what the algorithm can generate. Then we fine-tune the language model using this dataset. This allows us to have intuitive, high-level control over designs using natural language queries. So what's revolutionary here is how these elements feed into each other. The language model's output serves as high-level guidance, which then goes back into our WFC system for refinement and constraint satisfaction.
This approach draws from topics we've been investigating at Autodesk Research for some time and packages it all into an integrated workflow for generative design. By marrying the low data requirements and strict constraint satisfaction of WFC with the intuitive control mechanisms of language models, we're unlocking a new paradigm in generative design. So let's pull back the curtain and delve into how each of these elements works under the hood.
So let's start with WFC. Wave function collapse acts on a grid, with the goal of filling all the cells with tiles that obey adjacency rules learned from samples. So we can set some initial tiles by hand, for instance, designating the bottom tiles as the street of our site. Now, within each cell, we track the probability of each tile type. The probability of each tile type is set at the start of the algorithm, and this base probability is the same across all cells.
Any tiles which are not allowed because of the neighboring tiles, through these adjacency constraints, are removed from the probability distribution, and the probabilities of that cell are recalculated. So these same base probabilities are applied to every single cell in the grid, all the same, and then adjusted for any neighbors that they have.
We then compute an entropy score for each cell, which corresponds to how certain the outcome is, given the probability distribution. Then, from those cells with the lowest entropy scores, that is, the most certain, one cell is chosen. A tile type is chosen based on the probabilities in that cell, and the tile type is fixed, or collapsed.
Once the tile is collapsed, we update the allowed tiles of the neighbors, updating their probabilities, and repeat this process, choosing the minimum entropy cell, collapsing it to a legal tile, and updating the probabilities of the neighbors. And we repeat this process until the grid is entirely full, giving us a site layout.
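To make these steps concrete, here is a condensed sketch of the loop just described, assuming the adjacency rules and base tile weights come from the example-learning step. The `allowed` predicate is a stand-in, and contradiction handling and backtracking are omitted for brevity; this is not the presenters' implementation.

```python
# A condensed sketch of the WFC loop: track per-cell tile probabilities,
# collapse the lowest-entropy cell, and propagate constraints to its neighbors.
import math, random

def entropy(weights):
    total = sum(weights.values())
    return -sum((w / total) * math.log(w / total) for w in weights.values() if w > 0)

def solve(width, height, base_weights, allowed):
    # Each cell holds either a collapsed tile (str) or a dict of candidate weights.
    cells = {(r, c): dict(base_weights) for r in range(height) for c in range(width)}
    while any(isinstance(v, dict) for v in cells.values()):
        # 1. pick the uncollapsed cell with the lowest entropy (most certain outcome)
        pos = min((p for p, v in cells.items() if isinstance(v, dict)),
                  key=lambda p: entropy(cells[p]))
        # 2. collapse it: sample a tile proportionally to the remaining weights
        options = cells[pos]
        tile = random.choices(list(options), weights=list(options.values()))[0]
        cells[pos] = tile
        # 3. propagate: remove now-illegal tiles from each neighbor's distribution
        r, c = pos
        for nb in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if isinstance(cells.get(nb), dict):
                cells[nb] = {t: w for t, w in cells[nb].items()
                             if allowed(tile, pos, t, nb)}
    return cells  # a fully collapsed grid: one tile per cell
```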
Now, a generative pretrained transformer or a GPT model also works on a discrete space, a vector, which we're going to fold into a grid. Language models like GPT operate on tokens representing single words, letters, or parts of words. In our case, the tokens will each represent a tile type. So, again, we're going to give the model a starting point. Here's the first street tile of our site.
So, while WFC chooses tiles to collapse based on an entropy calculation, GPT chooses tiles one at a time, in order. And, like wave function collapse, each tile type has a probability of being chosen. But how that probability is derived is different. Rather than only considering the adjacent tiles, a GPT model considers all the tiles produced already. These tiles are called the "context."
Right now, we just have that initial tile in our context, but it's the tiles gathered here, not just the immediate neighbors, which determine the probabilities of placing each tile type in the next position. To generate a new tile, the GPT model reads in the context and, based on the statistics learned during training, adjusts the probability of each possible next tile type. It then chooses a tile based on that probability distribution. The new tile is added to the context, and a new probability distribution is calculated.
The probability of each tile changes after each token is added, until we reach the end of the context length, at which point the context shifts and only the most recent tiles are used. This global awareness, knowing everything that came before, combined with the local awareness, knowing what's adjacent to each tile, gives us the kind of control we just couldn't get from WFC. So, if we're interested in site-level features, like a lot of small parks, we can ask for that.
And the model will place more individual patches of green space on the site. Because wave function collapse only works on immediate adjacencies, it could only ever give us more or less total green space, not affect how that green space was distributed over the site.
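A minimal sketch of that autoregressive loop over tile tokens, where `next_tile_probs` stands in for a trained transformer rather than any real model:

```python
# Autoregressive tile generation: the model conditions on the whole context of
# previously placed tiles, not just the immediate neighbors.
import random

def generate_site(next_tile_probs, start_tiles, n_cells, context_len=256):
    tiles = list(start_tiles)              # e.g. the first street tile(s)
    while len(tiles) < n_cells:
        context = tiles[-context_len:]     # only the most recent tiles fit in context
        probs = next_tile_probs(context)   # dict: tile type -> probability
        choice = random.choices(list(probs), weights=list(probs.values()))[0]
        tiles.append(choice)
    return tiles                           # a flat vector, folded back into the grid
```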
So how, concretely, do we ask for something like a lot of small parks? Well, we're going to be using something called "cross-attention." We use a text encoder, another pretrained model, to convert a phrase like "a lot of small parks" into a vector of numbers. This vector is integrated into the model so that, when determining the probability for the next token, the model looks at both the previously generated tokens and the numerical version of this prompt.
When training the model, we can label each site in our dataset with these natural language prompts, teaching the model to associate certain prompts with certain site attributes. Then, when we want to generate a site with certain attributes, we can just ask for it in natural language. So rather than trying to tune tile weights or particular building elements, we can simply ask for high-level features.
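One common way to wire this up, sketched here in PyTorch style rather than as the actual TileGPT architecture, is a decoder block whose queries come from the tile tokens and whose keys and values come from the encoded prompt. Causal masking and layer norms are omitted for brevity.

```python
# A simplified prompt-conditioned decoder block: self-attention over the tile
# context, then cross-attention to the encoded text prompt.
import torch
import torch.nn as nn

class PromptConditionedBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tile_tokens, prompt_embedding):
        # self-attention over the tiles generated so far (the "context")
        x = tile_tokens + self.self_attn(tile_tokens, tile_tokens, tile_tokens)[0]
        # cross-attention: queries from the tiles, keys/values from the encoded prompt
        x = x + self.cross_attn(x, prompt_embedding, prompt_embedding)[0]
        return x + self.ff(x)

# prompt_embedding would come from a pretrained, typically frozen, text encoder
# that turns a phrase like "a lot of small parks" into a sequence of vectors.
```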
So you see that, at a high level, both wave function collapse and GPT operate on a similar conceptual plane. They calculate cell-wise probabilities, choose a tile to place, and update the grid as they go. However, when you dig deeper, the differences in their training and usability become glaringly apparent. So this is where TileGPT comes in, at the intersection of these two paradigms. We're weaving together WFC's low data requirements and guaranteed constraint satisfaction with GPT's intuitive control mechanisms.
A WFC model requires just one example to train, which makes it very quick and very data-efficient. In contrast, GPT models require huge datasets to be effective. WFC ensures that designs strictly adhere to constraints; it is, essentially, a constraint satisfaction model. GPT, however, lacks this rigidity, operating on probabilistic assumptions with no guarantees.
But WFC offers limited control, allowing you only to influence the frequency of low-level design elements. GPT, on the other hand, provides flexible high-level control through natural language prompts. So what we're proposing is an elegant fusion of these contrasting strengths. Imagine a generative design system that's both data-efficient and intuitively controlled, where WFC's constraint satisfaction meets GPT's natural language prowess. And this is what we're striving for with TileGPT.
So, with TileGPT, we envision a generative design system that marries WFC's reliable constraint satisfaction with GPT's user-friendly natural language interface. And this fusion aims to revolutionize how we approach design problems. But now we hit a roadblock. We need a dataset to train this model, and not just any data; we need the right kind of data.
So there are two main challenges here: first, sourcing the data; and second, ensuring that the data is both high quality and representative of the variety we aim to cover. So could we use just wave function collapse to generate a dataset? Just initialize a WFC model with varied tile weights, let it churn out site layouts, and then label the layouts based on the attributes we care about? Well, it's a good starting point, but this method has pitfalls.
The generated sites often have big quality issues. Think lots of wasted space that could have been utilized for buildings or landscaping. And the attributes that are produced tend to skew towards the average, resulting in a bland data set lacking in extremes, limiting what our model can produce. The graphs to the side here, they show the distributions of attributes you get just by repeating WFC with different weights and random seeds. It only creates a handful of these more extreme examples that we're most interested in, like many units or a lot of sequestered carbon.
And if we don't have interesting examples in it, we can't expect our model to produce interesting designs. So we have a problem. We have a vision for TileGPT, but need the right data to bring it to life. So sourcing that right data is the next leap we need to make. So here's where optimization-based generative design comes in. So we're going to optimize site layouts through a mix of evolutionary optimization and wave function collapse. By optimizing the starting conditions of WFC, that small set of initial tiles we give WFC at the start, and the tile weights, we can push it towards one design region or another.
So we're going to use an optimization algorithm called MAP-Elites, which explicitly searches for diversity along with high performance to generate a collection of varied solutions. MAP-Elites searches for new solutions that fill a grid, or map, whose bins and axes are defined by attributes. So one axis can be the number of units and the other the size of the largest park. When a solution is generated, we evaluate it, get its location in this attribute map, and store the solution there, one per bin: sites with a lot of units and a very big park, sites with few units and a very big park, sites with few units and a small park, every combination.
And, at the start, we're going to seed the map with some initial solutions generated randomly. And then we can begin optimization in an evolutionary fashion. We start by selecting an existing solution from the grid, varying that solution by altering the probability weights or the fixed tiles, and evaluating it to get its performance and its location in this attribute grid. And then, if a solution already exists in that same cell, the two are compared according to performance.
The better-performing one is placed in the bin, and the other is thrown out. Here, we're going to be judging sites by the amount of empty space, preferring those which make more use of the site. Running this algorithm produces a set of increasingly high-performing solutions that span the range of attribute values, giving us a high-performing dataset perfectly balanced across attribute labels. MAP-Elites allows us to generate a balanced set of labeled data that's just perfect for training.
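A compact sketch of that loop, where `evaluate` and `vary` are assumed wrappers around the WFC pipeline rather than real functions:

```python
# MAP-Elites as described: keep one elite per attribute bin, vary existing
# elites, and replace a bin's occupant only if the new solution performs better.
import random

def map_elites(evaluate, vary, initial_solutions, iterations=10_000):
    # evaluate(solution) -> (performance, bin_key),
    # e.g. bin_key = (unit_count_bin, largest_park_bin)
    archive = {}                                    # bin_key -> (performance, solution)
    for s in initial_solutions:                     # seed the map with random solutions
        perf, key = evaluate(s)
        if key not in archive or perf > archive[key][0]:
            archive[key] = (perf, s)
    for _ in range(iterations):
        parent = random.choice(list(archive.values()))[1]
        child = vary(parent)                        # mutate WFC weights or fixed seed tiles
        perf, key = evaluate(child)
        if key not in archive or perf > archive[key][0]:
            archive[key] = (perf, child)            # better use of the site wins the bin
    return archive                                  # balanced, high-performing training set
```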
So, now that we have our dataset, let's refocus on the generative aspect. Remember, we're dealing with this complex set of 212 unique tiles, each with their own set of adjacency rules. So training a GPT model on this level of granularity not only adds computational overhead, but also distracts from our primary goal, pursuing a global-level optimization and exploration.
In this framework, the role of the GPT model is to act as a strategic overseer, focusing on abstract design choices rather than micromanaging tile adjacencies. So, to alleviate this complexity, we've abstracted the tiles into eight functional categories: landscaping, building core, corridors, et cetera. This abstraction allows the GPT model to operate on a level that's both computationally efficient and aligned with its strength in handling high-level representations.
So we can offload the details to WFC after giving it a rough, high-level design. And here lies a symbiosis; on the left, a layout generated by the GPT model using our abstracted eight categories; on the right, the same layout fleshed out by wave function collapse, accommodating the intricate requirements of 212 tiles. So, in this way, we enable the GPT model to ideate at a high level, focusing on spatial relationships based on prompted features and performance, while relying on WFC for the detailed execution. So, this way, we can marry abstraction with precision.
So here's that generative process unfolded. The GPT model, guided by a text prompt, iteratively selects tiles from the simplified set of eight categories. These tile choices serve as a high-level blueprint. That blueprint is then handed over to wave function collapse, which fills in the details by selecting from this comprehensive 212-tile set.
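The handoff between the two stages can be sketched roughly as follows; `coarse_model` and `wfc_solver` are hypothetical wrappers, not actual APIs:

```python
# Two-stage generation: GPT drafts a coarse layout over 8 functional categories,
# then WFC resolves it into the full 212-tile vocabulary.
def generate_detailed_site(prompt, coarse_model, wfc_solver, grid_shape):
    # 1. high-level blueprint: one of 8 categories per cell, conditioned on the prompt
    coarse_layout = coarse_model.generate(prompt=prompt, shape=grid_shape)
    # 2. detailed execution: restrict each cell's candidates to tiles in its category,
    #    then let WFC pick concrete tiles that satisfy the adjacency rules
    constraints = {cell: category for cell, category in coarse_layout.items()}
    return wfc_solver.solve(shape=grid_shape, category_constraints=constraints)
```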
What we're witnessing is a seamless collaboration. GPT generates the overarching design, while WFC ensures its viability and completeness. So let's look at this in practice. Initiating the process with a textual prompt, like "a site with many small parks," directly influences the generated layouts to meet those criteria. Altering the text prompt, say to "a site with many units," then results in corresponding variations in design.
The system is also capable of generating sites with particular performance attributes. So, in this way, we can specify we want "a site with a lot of sequestered carbon," and this guides the generation process towards that goal. And this isn't just a one-shot kind of system; it's very interactive. Using the technique of inpainting found in other generative AI workflows for images, you can erase a portion of a site and request that it be filled with an alternate design with specific features.
So here's our first site with a small park and many units. And you get this initial design. But it still looks like there's a lot of space up there at the top we could use for landscaping or building. So we can highlight the area at the top and regenerate it using another prompt, like "a lot of sequestered carbon." And then this gives us some options regenerated from that previous design.
So, down here, at C, it put in a lot of parks, but the number of units is really reduced. And we still need to have some people living there. So I think, of these, I like A the best. So we'll take A and select it. But I think we can do a bit better. I'd like to break up this big building here to open up the site a bit, but I don't want to lose too many units. So I can highlight this area and then say I want to fill it in with many units and some sequestered carbon. And then, here, at B, is a site that I'm pretty satisfied with: a lot of units, but still plenty of carbon sequestration.
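In essence, the inpainting interaction keeps everything outside the selection fixed and regenerates only the erased cells under the new prompt. A minimal sketch, with `tile_gpt_generate` standing in for the conditioned generation step above:

```python
# Inpainting-style editing: fix the tiles outside the selected region, clear the
# region, and regenerate it conditioned on a new prompt and the kept tiles.
def inpaint_region(site, region_cells, new_prompt, tile_gpt_generate):
    kept = {cell: tile for cell, tile in site.items() if cell not in region_cells}
    return tile_gpt_generate(prompt=new_prompt, fixed_tiles=kept)
```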
So these layouts are then exportable for subsequent evaluation and modification in standard design software like Revit or Forma. So here's that site layout we just made, brought into Forma, with an accompanying daylight analysis run on it. The examples that we've talked about today mostly center around landscaping and carbon, but the same technique could be used for other attributes, like privacy, or sunlight, or unit mix.
So, today, we've articulated a hybrid model that interlaces generative design and generative AI, taking the best of both. And rather than replacing generative design, generative AI can be embedded within it, making the entire system more accessible and useful. In this framework, generative design takes on the crucial task of synthesizing high-quality training data.
Generative AI comes into play for on-the-fly design generation and interaction, providing an unprecedented avenue for exploration. Then generative design reenters the scene, serving as a touchstone for validation and precise adjustments.
Our approach fuses the direct manipulation capabilities of traditional interfaces with the intuitive nature of natural language input. This offers nuanced control and flexibility, not just in geometric configurations, but also in achieving various performance objectives. The real revolution here is that we're not merely generating designs. We're beginning to build models that learn the fundamental relationships between geometry and performance.
This preempts the need for the aimless, post-hoc scrutiny of design options that generative design often leads us into. Instead, we pave a path for purposeful exploration, allowing for a synthesis of both control directives and serendipitous design outcomes. Thank you.