Description
Key Learnings
- Incorporate design objectives into AI-generated content
Speaker
- Vishu Bhooshan — Vishu is an Associate at Zaha Hadid Architects. He co-administers the Computation and Design group (ZHACODE) in London, where he leads the development of a state-of-the-art, proprietary computational code framework to synthesize high-performance façade and roof geometries and, consequently, enable their structural optimisation, parametric modelling, and coordination with Building Information Modelling (BIM). The framework also assimilates field-tested research and development in early-stage design optioneering, robotic construction technologies, and the digital upgrade of historical design and construction techniques in timber and masonry. Additionally, it powers applied research in emerging technologies of machine learning and artificial intelligence, geographic information systems, and spatial data analytics. Since joining Zaha Hadid Architects in 2013, he has been involved in several design competitions and commissions ranging from research prototypes, products, galleries, stadiums, metro stations, residential buildings, and masterplans to designing for the metaverse and gaming industry. Vishu is currently a Lecturer on the Architectural Computation post-graduate programme at The Bartlett, University College London (UCL). He has taught and presented at several international workshops and professional CAD conferences. In the past few years, Vishu has received awards for excellence in computational design and research, such as the 2022 Digital Futures Young Award and the 2022 Best Young Research Paper award at the International Conference on Structures and Architecture (as co-author), while publishing many more research papers in the field over the last decade with ZHA.
VISHU BHOOSHAN: Hello, everyone. I'm Vishu Bhooshan from Zaha Hadid Architects, presenting this session on tectonism via AI at Zaha Hadid Architects; the session is sponsored by HP. I'm affiliated with two organizations: the Computation and Design group at Zaha Hadid Architects, and University College London (UCL), where I teach as a lecturer on the Architectural Computation course.
In both of these roles I do research in computation and design on a day-to-day basis, but I also look for ways to disseminate knowledge through conferences like this one and by teaching at workshops and universities. A bit about the team I work with at Zaha Hadid Architects: CODE is an acronym for Computation and Design. It was started in 2007 by Patrik Schumacher, Shajay Bhooshan, and Nils Fischer as a project-independent research group, initially looking into novel technologies of digital and robotic manufacture as well as geometry processing and the rationalization of geometry.
As you can see, the group initially worked at the scale of small pavilions to understand the technologies, both on the design-creation side and on the design-delivery side. As the research matured on both fronts, the application scale went up from pavilions to buildings such as arenas and stadiums, and on to larger-scale masterplans. The team currently has 20 people, most of them architects by background but with varied interests in computational technologies: geometry processing, machine learning, architectural geometry and fabrication, parametric detailing, robotic 3D printing, and so on.
Typically, a research trend matures in the office in this way: you start with a project-independent research topic, develop design toolkits around it, and test those on pilots or special-interest projects. Once a toolkit has matured and proven its robustness on a small-scale project, it is deployed onto larger-scale projects across the office.
We currently have four strands of research in the team, at various levels of maturity. The first, and oldest, is high-performance geometry, looking specifically at architectural geometry that is structure- and fabrication-aligned; this is now being applied on large-scale projects. The second is participatory design systems, which also include game technologies; this one has been around for eight to ten years, and the same set of toolkits is gradually being embedded into the web and the metaverse.
Today's presentation focuses on the last strand, machine learning and AI, which is more recent but has still been around for three to four years. The agenda is tectonism via AI: our early beginnings, what AI can do, what we did with AI, what we are currently doing with it, what is next in the pipeline, and the outlook. I will conclude with a summary of all of these at the end.
So, what can AI do? Like any other research trend in the office, this started with early beginnings and pilot collaborations, looking at early-stage image-generation pipelines. We started with GANs trained on photo data sets we had in the office from built projects, interior and exterior photos, augmented slightly with what is available on the internet, to create quick animated blends between the various projects in the office and get our feet wet.
Then we started looking at diffusion models: giving a prompt and, again, training or augmenting the model with data sets from the office to create these kinds of image outputs. At the time we were looking at DALL-E and DALL-E 2 by OpenAI, Midjourney, and Stable Diffusion, and more recently also Adobe Firefly. As with any other research trajectory, we generally collaborate with a pioneer in the field to better understand the technology behind it. In this case, for the diffusion models, we did a collaboration with Refik Anadol Studio called Architecting the Metaverse, using a pre-release of DALL-E 2.
The data set used to train, or to augment the training data, was what we had in house. Apart from images, we also added 3D model data sets and renders, so that we had a large array of data. It was further augmented with publicly available images of buildings from Flickr.
Once the training was done, it generated these kinds of spatial, or geometric, image outputs, similar to the aura of the office: they relate to the kinds of geometries we generate in our designs. Once we had these various outputs, and since the 3D side of things was not yet mature and still under development, we looked at designerly ways to recreate them in the interfaces we normally design in, such as Autodesk Maya.
These were some early tests of how these spaces could be reinterpreted with designer input. We also created 3D models based on the AI images to understand the spatial qualities of the spaces the images suggested. Those were the initial beginnings. Next, we will look at what we are doing with AI. Because the technology is image based, we wanted to use it as an early-stage design-assist tool.
Building on Stable Diffusion checkpoints, we built our own LoRA models on top: various models for exteriors, for masterplans, for interiors, and for graphic design. This gives an overview of the various models we have for exterior facade systems, categorizing our 3D geometries and renders by criteria such as program, structural system, louvers, and facade system, and creating these various LoRAs so that we can call upon them with a specific prompt or blend them, as you will see in the subsequent slides.
Similar LoRA models were set up for graphical learning so that we could quickly create these kinds of graphical outputs as well. All of this integrates into design pipelines because of the accelerated training provided by the hardware we have in the office, such as NVIDIA RTX cards and HP workstations.
What this enables is for each of these LoRAs to be trained in 45 minutes to 1.5 hours, which is very quick. There are three methods: you use a single LoRA model, or you use a combination of them; on average, the whole process takes about an hour. In the design-assist tools it is embedded like this: you have a segmentation map and a ControlNet canny-edge input, and then you have prompts built from specific tags.
We will see how the tagging was done later on, but based on these tags the model generates an output. For the same set of inputs we can quickly generate variations, which is very useful for early-stage design: we can produce multiple option iterations each week and then pick the one we want to take forward. The same is done for masterplan models: we use the same methods, training on existing masterplans we have in the office, and either combine multiple LoRAs or use a single one.
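To make the workflow concrete, here is a minimal sketch of such a pipeline using the open-source Hugging Face diffusers library, combining a Stable Diffusion checkpoint, a canny-edge ControlNet, and a custom LoRA. This is an illustration of the general technique described above, not ZHA's proprietary tooling; the model paths, the LoRA file name, and the tag-style prompt are placeholders.

```python
# Illustrative sketch: Stable Diffusion + canny ControlNet + a custom LoRA
# via the Hugging Face `diffusers` library. File names are placeholders.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Derive a canny-edge control image from a viewport capture.
viewport = np.array(Image.open("viewport_capture.png").convert("RGB"))
gray = cv2.cvtColor(viewport, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Load a hypothetical exterior-facade LoRA (placeholder file name).
pipe.load_lora_weights("zha_exterior_lora.safetensors")

prompt = "office tower, facade, timber tectonics, louvers"  # tag-style prompt
images = pipe(
    prompt,
    image=control_image,
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.8},  # weight of the LoRA vs. base model
).images
images[0].save("variation_01.png")
```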
An interesting aspect is that we also weight the prompt against the weights of the diffusion model, which generates a wide range of option iterations and lets us choose how to develop the design further, as you can see on the right. As the weight increases, the output moves closer to the images trained from the office data set; as the weight decreases, the output becomes more generic. The same applies for masterplans, shown here in another view, again with various outputs based on the weights.
Now, getting a bit deeper into workflows and use cases: as I mentioned, we tag an input image with specific tags. The tags here cover typologies, in this case program: commercial offices, residential, sports, hospitality, and so on. But we also add tectonic tags, such as whether the design is to be made with timber, robotic hot-wire cutting, 3D printing, and so on. These tags become part of the prompts used to generate the outputs, as you can see; they become keywords like office, facade, et cetera.
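As an illustration of this tagging step, the snippet below shows one way a training image could carry typology and tectonic tags and how they might be flattened into a caption for LoRA training. The keys, values, and file naming are assumptions; the actual ZHA tag schema is not shown in the session.

```python
# Hypothetical tag record for one training render; keys and values are assumptions.
sample_record = {
    "image": "render_0421.png",
    "program": "commercial office",
    "structural_system": "diagrid",          # hypothetical value
    "facade_system": "glazed with louvers",
    "tectonic": "timber, glulam",
    "caption": "office, facade, diagrid, timber, glulam, louvers",
}

# Many LoRA trainers read one plain-text caption file per image, so the tag set
# is flattened into a comma-separated caption string.
with open("render_0421.txt", "w") as f:
    f.write(sample_record["caption"])
```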
We also use ControlNets to refine the images further, looking at canny edges, depth maps, segmentation maps, and normal maps to create much more refined outputs. As for the style library, taking the exterior LoRA as an example: the input is a 3D massing, and the user chooses from these exterior LoRAs.
They can choose one LoRA and get a generated image, or they can combine multiple LoRA models and quickly generate variations. This accelerates the early stages of the design process, so people can develop their designs based on what they see in these images.
Similarly, these are various outputs for materiality, again for early stages of design, looking at glass, concrete, and timber and giving the designer an indication of where they might want to go in terms of materiality. These workflows also enable us to produce short snippets of video. This is from a project for the Dongdaemun Design Plaza in Seoul, where we created NFTs as five- to ten-second animations, also looking at the various tectonics of the same space, whether in concrete, timber, et cetera.
To get deeper into how we integrate this into the software we use in the office: we have our own software-agnostic spatial technology stack called zSpace, a core framework we develop independently of any particular software. All the methods and logic live in this core framework, which makes it easier to create extensions, or plugins, for platforms like Autodesk Maya using their APIs, as well as NVIDIA Omniverse and others, and to build applications in the specific software the rest of the office uses.
We use Autodesk Maya a lot for early stages of design, so we integrated this toolkit and workflow there. As you can see, the plugin picks up an image from the Maya viewport and provides the interface on the right to pick the LoRA model, the prompts, and the weighting, and then returns an output image generated with Stable Diffusion.
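The sketch below illustrates the general idea of such a Maya integration: capture the active viewport with maya.cmds and hand the image, the chosen LoRA, and a prompt to a diffusion back end. It is not the actual zSpace plugin; the local endpoint URL, the payload schema, and the helper name viewport_to_diffusion are assumptions.

```python
# A minimal sketch, not the zSpace plugin: capture the Maya viewport and post it
# to a hypothetical local diffusion service.
import base64
import requests
from maya import cmds

def viewport_to_diffusion(prompt, lora="exterior", out_path="C:/temp/viewport.png"):
    # Capture the active viewport as a single-frame image.
    cmds.playblast(
        frame=[cmds.currentTime(query=True)],
        format="image",
        completeFilename=out_path,
        viewer=False,
        showOrnaments=False,
        percent=100,
    )
    with open(out_path, "rb") as f:
        payload = {
            "prompt": prompt,
            "lora": lora,                        # which style LoRA to apply
            "image": base64.b64encode(f.read()).decode(),
        }
    # Hypothetical local inference server wrapping the diffusion pipeline.
    response = requests.post("http://localhost:7860/generate", json=payload)
    return response.json()

# Example: preview the current massing with the interior LoRA.
# viewport_to_diffusion("atrium, interior, concrete shell, daylight", lora="interior")
```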
This quick video shows that as the designer makes changes on the right, they also get a quick preview of what those changes mean in terms of spatial quality. The designer gets this feedback immediately and can make design changes accordingly, in this case using the interior LoRA. We also integrated the shape-modeling techniques with other diffusion models such as Midjourney and DALL-E; this example shows how it works with Midjourney.
A similar example shows Autodesk Maya integrated with the masterplan LoRA. Here the designer is making changes to a masterplan massing and is quickly able to generate the images on the left, getting visual feedback while designing. We are also looking at pixel streaming, where the designer works with segmented models, drawn or modeled in Maya or other design software, and can quickly review the outputs on the left.
That is what we have done previously and are currently doing. Now let's look at what we are most recently doing with AI in the company, which is to teach tectonics to AI. A bit about what tectonism is: tectonism is a subsidiary style of parametricism, the design theory behind the projects in the office. What it does, specifically, is make performance criteria visible in the shape and heighten them stylistically. These performance criteria can be structural, fabrication-related, environmental, spatial, and so on. That is what forms tectonism.
Through the projects developed in the office we have come to know its benefits: high performance in terms of geometry, but also in terms of the user experience and the interactions people have in these projects. This is our project with the tallest atrium in the world, Leeza SOHO, and we are learning from these benefits of tectonism.
Tectonic projects are also structurally aligned. This is a project we developed with the Block Research Group, incremental3D, and [INAUDIBLE]: a 3D-printed bridge that stands in pure compression. It highlights the tectonic aspects of 3D printing within a structural system working purely in compression.
This allows the bridge to use less material, while plugging novel contemporary fabrication technologies like 3D printing into the ancient wisdom of masonry. To understand this in more detail, please join my other session, Unifying Workflows With OpenUSD, where I delve deeper into high-performance geometry and tectonism.
So how are we teaching tectonics to AI? As mentioned, we wanted to embed structural, environmental, and fabrication-tectonic features into the diffusion models. The previous workflow took an input image, whether a depth map or a rendered image, and generated an output based on a prompt. To teach tectonics to the diffusion model, we had to introduce additional steps: a structural AI model, a fabrication-related AI model, and an environmental AI model.
To create the data set, this was first experimented with in a DigitalFUTURES workshop in Shanghai, where we used Ameba and Peregrine as our topology optimization software to create a data set of 3D models. Ameba gives a more solid 3D geometry, while Peregrine gives you topology-optimized center lines. This was combined with publicly available structural models, so as to gain a better understanding of the various types of structural systems out there, also in relation to material.
We then developed tectonic details related to the materials we wanted to explore for these towers: timber, concrete, and steel. Multiple geometry data sets were generated using these material-based tectonic principles. For the environmental aspects, we again looked at various scenarios: whether a facade is south or west facing, the programmatic distribution in the tower, whether spaces are residential or office, and the floor heights.
Based on those criteria, we set up tools to make parametric variations, looking at what the balcony system or the louver system would be, et cetera. We generated these kinds of models to be used as a data set; this is a rendered output of one such data set. For the training, the tagging was done by taking multiple images in a 360-degree sweep, with the angle and position of the camera used as tags.
Beyond that, we also embedded structural tags: anything generated with Ameba or Peregrine was tagged as coming from the structural model, and attributes such as the shape of the geometry were used as tags too. The same was done on the fabrication side, covering digital timber, concrete, and metal, with subcategories inside each, such as developable surfaces, glulam, or bentwood, to give more detail on fabrication.
Similarly for the environment, we tagged whether the louvers or balconies are horizontal or vertical, where they are located, and on which side. All of that was embedded as training tags. This gives an overview of the data set and the three AI models we built on top of it.
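For illustration, a single training caption for this tower data set might look like the following, combining camera, structural, fabrication, and environmental tags into one string. The tag vocabulary and file naming are assumptions.

```python
# Hypothetical caption for one tower render, built from the tag groups above.
caption = ", ".join([
    "tower",
    "camera azimuth 135", "camera height mid",    # view position tags
    "ameba", "topology optimized",                # structural model tags
    "digital timber", "glulam",                   # fabrication tags
    "vertical louvers", "south facing balconies", # environmental tags
])
with open("tower_0042.txt", "w") as f:            # one caption file per render
    f.write(caption)
```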
This was then trained using Kohya on NVIDIA RTX cards, which let us train these models very quickly, in 30 to 40 minutes; some of the AI visualization outputs are on the right. It was also powered by workstations provided by HP, which further speeds up the training process.
Here is a visualization of the outputs: a video showing how, based on these prompts, we can quickly generate variations on six tower models; I will show more detail in subsequent slides. This shows how the same model was used to create tectonic details in the visualizations, and this video gives an overview of the workflow, how it was developed, and the generated outputs.
As I mentioned, these are the various data sets: structural, fabrication, and environmental. To understand this better, it is showcased at the HP booth; please do visit us so we can give you more insight into what is happening in the back end. Once the training was done, an initial 2D-to-3D test was also run, but that is still at a very early stage.
You can still see the outputs we were getting, but the more substantial results were on the AI visualization side, which was able to create these kinds of variations very quickly based on the tags and the prompts.
Moving on to what we want AI to do next: to assist in spatial content creation for cities. What does that mean? To get started, we also wanted to look at AI with 3D, at what geometrical learning could be and how we can learn from geometrical features.
One of the examples we looked at, again, was topology optimization. Typically, topology optimization takes about 20 to 30 minutes to set up and run the simulation; because it is a finite element analysis, you also have to create a high-resolution geometry, which is not very conducive to early stages of design. So we looked at whether we could predict topology optimization results, even if the prediction just gives you two classes: material needed here, no material needed there.
Even if such a prediction is not 100% accurate, it at least pushes us into the right domain quickly. We tried to make these predictions from locally calculable geometric features such as mesh distance to boundaries, angle of the load, et cetera. This was done on a training set for a chair we designed with topology optimization, and as you can see, the prediction accuracy currently varies between roughly 70% and 85%, which is already a good ballpark for early-stage design.
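A minimal sketch of this kind of prediction, assuming a supervised two-class setup over per-vertex geometric features, is shown below. The feature set, file names, and the choice of a random-forest classifier are assumptions; the talk does not specify the model used.

```python
# Sketch: predict "material" vs. "no material" per mesh vertex from locally
# computable geometric features. Feature layout and file paths are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Each row: features for one vertex of the chair training set, e.g.
# [distance_to_boundary, distance_to_support, distance_to_load, load_angle]
X = np.load("chair_vertex_features.npy")   # placeholder path
y = np.load("chair_topopt_labels.npy")     # 1 = keep material, 0 = remove

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.2f}")  # ~0.70-0.85 in the talk

# At design time, the same features can be computed for a new mesh in seconds,
# giving an instant material/no-material preview instead of a 20-30 minute FEA run.
```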
We want to refine this further, make it more robust and accurate, and work on a larger data set. The reason we want to get into the 3D side is to assist in spatial tile-set creation for cities or districts. This is from a game we recently developed with Epic Games, and we want to use AI to assist in the creation of these tiles. What AI would require for that is a data set of procedurally generated content.
In this case, the models are generated in Maya as a sequence of 3D operators, so a large language model can learn the sequence of operators and try to recreate them, or create combinations of them, to quickly generate these towers. And because most of our projects are designed in Autodesk Maya, which stores a history of operators, we can expand the data set to the 3,000 to 4,000 projects we have in the office, classified across typologies such as towers, bridges, and cultural buildings.
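The sketch below illustrates how a Maya model's construction history could be serialized into an operator sequence suitable for language-model training. It is an assumption-laden illustration of the idea, not ZHA's production pipeline; the tokenization scheme and the example shape name are hypothetical.

```python
# Sketch: serialize a Maya shape's construction history into operator tokens.
from maya import cmds

def operator_sequence(shape):
    """Return the chain of history nodes (operators) behind a shape as tokens."""
    tokens = []
    for node in cmds.listHistory(shape, pruneDagObjects=True) or []:
        node_type = cmds.nodeType(node)            # e.g. polyExtrudeFace, polyBevel3
        # Record a few representative attributes as a compact token.
        attrs = cmds.listAttr(node, keyable=True) or []
        tokens.append(f"{node_type}({','.join(attrs[:3])})")
    return tokens

# Example: tokenized history of a tower massing, ready to be written into a
# text corpus of "operator sentences" for language-model training.
# print(operator_sequence("towerMassingShape"))  # hypothetical shape name
```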
The data set could also be assisted by gameplay. This is Project Merlin, which we developed with UEFN to create spatial assets for buildings and landscape, again based on tile sets. These could also be used as data points for training, because they are created procedurally and sequentially.
To train these large language models, we can use existing language models, as you can see on the left with NVIDIA's, but we also require hardware that enables it. HP is able to provide us with such hardware to create these kinds of training data sets and to run the training on the data sets we have.
The large language models, as I mentioned, can also learn sequences of operators for the tectonic aspects: not only creating a shape, but creating shapes that are tectonically aware, whether for 3D printing, robotic hot-wire cutting, digital timber, et cetera. Based on projects we have done previously, we have an extensive set of methods that already does this procedural generation, and we want to train large language models on those so they can create 3D geometry.
The goal, once we have such geometries or tile sets, is to use AI to create these kinds of combinatorics: generating variations from simple tile sets and then evaluating which ones are feasible and which are not. This is an example of the combinatorics we did for a housing project; it shows the application and is not AI generated.
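As a toy illustration of tile-set combinatorics, the snippet below enumerates aggregations from a small tile catalogue and filters them with a feasibility rule. The tiles, the number of slots, and the rule itself are invented for illustration.

```python
# Toy combinatorics: enumerate tile aggregations and keep only feasible ones.
from itertools import product

tiles = ["housing", "office", "retail", "void"]   # illustrative tile catalogue
slots = 4                                         # e.g. four plots in a block

def feasible(combo):
    # Hypothetical rule: at least one housing tile and no more than one void.
    return combo.count("housing") >= 1 and combo.count("void") <= 1

options = [c for c in product(tiles, repeat=slots) if feasible(c)]
print(len(options), "feasible aggregations out of", len(tiles) ** slots)
```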
The reason we are developing this is to generate aggregations for city districts from simple tile sets, which is where we are heading: aggregations based on programmatic differences, such as housing or offices, and on how they integrate with existing conditions in the city, whether a water body, landscape, et cetera.
In summary, we looked at how we are using AI diffusion models to create spatial outputs similar to the projects we have in the office, in the style or aura of the office. We looked at how we are teaching tectonics to AI: integrating structural and fabrication models into diffusion models so as to get early-stage visualizations of the structure and of environmental features such as louvers.
We also saw how this work is being accelerated by integration with publicly available AI models and by great improvements in hardware from companies like HP, which speeds up the training process and saves us time. And we saw how AI is becoming more 3D, with crowdsourced content and procedurally generated data sets that can be used to train large language models to quickly generate tile sets and combinatorics for cities.
So let's join together and collaborate to create these blueprints for future cities. This session was brought to you by Z by HP, AMD, and NVIDIA. Please visit the HP booth, where we can showcase the various solutions they have for AI and how they could be integrated into your workflows.
Also visit the booth if you want a better understanding of the tectonics-for-AI part, which is showcased there as 3D prints and data-set models. I hope to see some of you there. Thank you. Thanks a lot, Robert, for your attention.